asm inline assembly

Inline assembly

  Inline assembly is assembly code embedded in C/C++ source. Unlike a standalone assembly source file, which contains nothing but assembly, it sits inside a C/C++ function and works with the surrounding C/C++ code.

One, gcc inline assembly

  The format of gcc inline assembly is as follows:

    asm ( assembly statements
        : output operands        // optional
        : input operands         // optional
        : clobbered registers    // optional
        );

  We use a simple example to understand its format (gcc_add.c):

    #include <stdio.h>

    int main()
    {
        int a=1, b=2, c=0;

        // a needlessly roundabout add
        asm(
            "addl %2, %0"       // 1
            : "=r"(c)           // 2  (r: keep the destination in a register, since addl cannot take two memory operands)
            : "0"(a), "g"(b)    // 3
            : "memory");        // 4

        printf("c is now: %d\n", c);
        return 0;
    }

  In this inline assembly, the comments // 1 to // 4 mark its four parts:

  1. Part 1 is the assembly statement, enclosed in double quotation marks. Multiple statements are separated by ; or \n\t.
  2. Part 2 is the output operand, written as "=?"(var). var can be any variable (the result is stored in it), and ? is a constraint that tells gcc what to use to stand in for this operand in the assembly. The common constraints are:

    • a, b, c, d, S, D: the eax, ebx, ecx, edx, esi, edi registers respectively
    • r: any general-purpose register (whichever one is free)
    • m: memory
    • i: an immediate value (a constant; only valid for input operands)
    • g: a register, memory, or an immediate value (gcc decides)

    Inside the assembly statement, operands are written as % followed by their index. Indexes start at 0 and run through the output list first, then the input list. To distinguish registers from operands, register names are written with two % signs, such as %%eax.

  3. Part 3 is the input operand list, written as "?"(var). Besides the constraints above, ? can also be the index of an output operand, which means the input shares a location with that output, so var initializes it. In the program above, %0 and %1 are the same location, initialized to 1 (the value of a).
  4. Part 4 lists the registers that the assembly code modifies but that do not appear in the input/output lists, so that gcc will not keep live values in these "dangerous" registers across the statement. You can also write "memory" to indicate that the assembly modifies memory, so variables that gcc has cached in registers must be read again afterward.

  The effect of the inline assembly in gcc_add.c is that the sum of a and b is stored in c. Of course, this is only a sample program; nobody would write an addition this way in practice. Inline assembly is generally reserved for cases where there is no alternative.
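As a hedged sketch of the constraint letters described above (not part of the original program), the two hypothetical helpers below use gcc extended asm on x86. `asm_add` uses a read-write `"+r"` operand instead of a matching constraint, and `asm_mul_wide` pins operands to eax and edx with `a` and `d`, because `mull` always multiplies by eax and leaves the 64-bit product in edx:eax. The function names, the `__asm__` spelling (which also works under strict -std=c99), and the plain-C fallback for non-x86 targets are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a + b via addl, with sum as a read-write operand. */
int asm_add(int a, int b)
{
    int sum = a;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(sum)      /* %0: read and written, kept in a register */
            : "g"(b)         /* %1: register, memory, or immediate */
            : "cc");         /* addl changes the flags */
#else
    sum += b;                /* fallback for other targets (assumption) */
#endif
    return sum;
}

/* Hypothetical helper: full 32x32 -> 64-bit unsigned product via mull. */
uint64_t asm_mul_wide(uint32_t x, uint32_t y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)   /* %0 = eax (low half), %1 = edx (high half) */
            : "0"(x), "r"(y)       /* %2 shares eax with %0; %3 in any free register */
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;        /* fallback for other targets (assumption) */
#endif
}

/* Small self-check showing the intended use. */
void asm_helpers_selfcheck(void)
{
    assert(asm_add(1, 2) == 3);
    assert(asm_mul_wide(0xFFFFFFFFu, 2u) == 0x1FFFFFFFEull);
}
```

Neither helper needs a "memory" clobber, because the assembly touches only its listed operands and the flags.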

Two, VC inline assembly

  gcc's inline assembly syntax is rather complicated and tends to overwhelm beginners. VC (Microsoft Visual C++) inline assembly is much simpler:

    __asm {
        assembly statements
    }

  An example program is as follows (vc_add.c):

    #include <stdio.h>

    int main()
    {
        int a=1, b=2, c=0;

        // a needlessly roundabout add
        __asm {
            push eax        // save eax

            mov eax, a      // eax = a;
            add eax, b      // eax = eax + b;
            mov c, eax      // c = eax;

            pop eax         // restore eax
        }

        printf("c is now: %d\n", c);
        return 0;
    }

  In VC inline assembly, local variables can be referred to directly by name, which is much more convenient. However, some names are reserved inside VC inline assembly, such as size, and using them as variable names causes an error (rename b to size and the program above fails to compile). So choose names carefully!

  VC has no input/output operand lists, so you cannot tell the compiler how the assembly uses each register. The example saves and restores eax with push/pop to be safe. Microsoft's documentation states that eax, ebx, ecx, edx, esi, and edi do not need to be preserved inside an __asm block, while other registers you modify (such as ebp, esp, and the segment registers) must be restored. The price is that the compiler cannot keep values in registers across an __asm block. gcc, by contrast, uses the input/output and clobber lists to allocate registers around the assembly and optimize it, so gcc inline assembly generally integrates more efficiently with the surrounding code.
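A minimal sketch of the same addition wrapped in a function, assuming you want one source file that builds both with 32-bit VC and with other compilers. The name `vc_add` and the plain-C fallback are hypothetical. The guard is there because `__asm` blocks are only accepted by the 32-bit x86 VC compiler.

```c
#include <assert.h>

/* Hypothetical helper: the addition from vc_add.c as a function. */
int vc_add(int a, int b)
{
    int c;
#if defined(_MSC_VER) && defined(_M_IX86)
    /* 32-bit VC only: __asm is not available for x64 or ARM targets. */
    __asm {
        push eax        // save eax, as in vc_add.c
        mov eax, a      // eax = a
        add eax, b      // eax = eax + b
        mov c, eax      // c = eax
        pop eax         // restore eax
    }
#else
    c = a + b;          /* fallback for other compilers (assumption) */
#endif
    return c;
}

/* Small self-check showing the intended use. */
void vc_add_selfcheck(void)
{
    assert(vc_add(1, 2) == 3);
}
```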

Three, why use inline assembly

  The main purpose of inline assembly is efficiency. Suppose a text-comparison program, diff, spends 99% of its time in the function strcmp. If an inline-assembly strcmp runs twice as fast as the C implementation, then an expert's effort on this one small function makes the entire program almost twice as fast, which is well worth the trouble.
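The "almost twice" figure can be checked with Amdahl's law: if a fraction p of the run time becomes s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). The sketch below is only that arithmetic; the function name `overall_speedup` is hypothetical.

```c
#include <assert.h>

/* Overall speedup when a fraction p of the run time is made s times faster. */
double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.99 and s = 2 this gives 1 / (0.01 + 0.495), or roughly 1.98.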

  Another purpose is to do what C cannot express, such as I/O operations. The direct modification of the esp register mentioned in the previous article must also be done in assembly.
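As a small illustration of something standard C has no syntax for, the sketch below reads the stack pointer register with gcc inline assembly. The name `read_stack_pointer` is hypothetical, and the fallback for non-x86 targets (the address of a local variable) is only an approximation added so the code builds everywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the current value of the stack pointer. */
uintptr_t read_stack_pointer(void)
{
    uintptr_t sp;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r"(sp));   /* 64-bit: rsp */
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ volatile("movl %%esp, %0" : "=r"(sp));   /* 32-bit: esp */
#else
    volatile char marker = 0;        /* approximation for other targets */
    sp = (uintptr_t)&marker;
#endif
    assert(sp != 0);
    return sp;
}
```

The result should lie close to the address of any local variable in the caller, since both live on the same stack.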

Four, memcpy

  The best teacher for learning gcc inline assembly is the Linux kernel. It contains short, tight inline assembly versions of many common small functions such as memcpy, strlen, and strcpy. For example, the memcpy function in Linux 2.6.37:

    // located in /arch/x86/boot/compressed/misc.c
    void *memcpy(void *dest, const void *src, size_t n)
    {
        int d0, d1, d2;
        asm volatile(
            "rep ; movsl\n\t"
            "movl %4,%%ecx\n\t"
            "rep ; movsb\n\t"
            : "=&c" (d0), "=&D" (d1), "=&S" (d2)
            : "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
            : "memory");

        return dest;
    }

  Compared with gcc_add.c, this function is much more complicated:

  • The keyword volatile tells gcc not to move or delete this inline assembly.
  • rep; movsl works as follows (assuming the direction flag is clear):

      while (ecx) {
          copy 4 bytes from (%esi) to (%edi);
          esi += 4;
          edi += 4;
          ecx--;
      }

    rep; movsb is similar, except that each step copies a byte rather than a double word (4 bytes), so esi and edi advance by 1.

  • "=&D" (d1) is not there to capture the final value of edi in d1. Its purpose is to tell gcc that edi is changed by the assembly, so gcc must not assume edi still holds dest afterward and reuse it as dest. A register used as an input cannot also be listed as a clobber, so a dummy output is used instead. The & marks the output as early-clobber: it is written before all inputs have been consumed, so gcc must not place an unrelated input such as n & 3 in the same register. With optimization enabled, gcc discards d0, d1, and d2, because the values written to them are never used.

  memcpy first copies double word by double word, and then copies whatever remains (fewer than 4 bytes) one byte at a time. The d_printf I eventually implemented is modeled on this function.
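To experiment with this routine in user space, here is a hedged adaptation under the hypothetical name `asm_copy`, so it does not collide with the C library's memcpy. The 32-bit branch follows the kernel listing. The 64-bit branch applies the same pattern with quad words (movsq, n >> 3, n & 7), and the byte loop is a fallback for other targets. Both of those are assumptions added for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical adaptation of the kernel routine under a different name. */
void *asm_copy(void *dest, const void *src, size_t n)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__LP64__)
    long d0, d1, d2;
    __asm__ volatile(
        "rep ; movsq\n\t"          /* copy n / 8 quad words */
        "movq %4,%%rcx\n\t"        /* reload the counter with n % 8 */
        "rep ; movsb\n\t"          /* copy the remaining bytes */
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
        : "memory");
#elif defined(__GNUC__) && defined(__i386__)
    int d0, d1, d2;
    __asm__ volatile(
        "rep ; movsl\n\t"          /* copy n / 4 double words */
        "movl %4,%%ecx\n\t"        /* reload the counter with n % 4 */
        "rep ; movsb\n\t"          /* copy the remaining bytes */
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
        : "memory");
#else
    unsigned char *d = dest;       /* portable fallback (assumption) */
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return dest;
}

/* Small self-check: the copy must match what memcmp sees. */
void asm_copy_selfcheck(void)
{
    char src[] = "hello, inline asm";
    char dst[sizeof src] = {0};
    assert(asm_copy(dst, src, sizeof src) == dst);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

Like memcpy, this sketch assumes that the source and destination regions do not overlap.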


Origin blog.csdn.net/wyyy2088511/article/details/111566758