https://software.intel.com/en-us/articles/introduction-to-x64-assembly
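Different assemblers
MASM
Microsoft Macro Assembler, developed by Microsoft and used mainly for development on Windows. The name MASM usually refers to the assembler for 16-bit/32-bit targets, while ML64 is the MASM variant for 64-bit targets.
NASM/YASM
Netwide Assembler, a popular assembler on Linux. It is used mainly for Linux development but also works on Windows.
YASM is a complete rewrite of NASM under the BSD license and is designed to be compatible with NASM syntax.
FASM
Flat Assembler, another assembler, available on both Windows and Linux, though it is not the default assembler on either platform.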
Available registers
https://wiki.osdev.org/CPU_Registers_x86-64
The 64-bit versions of the ‘original’ x86 registers are named:
- rax - register a extended
- rbx - register b extended
- rcx - register c extended
- rdx - register d extended
- rbp - register base pointer (start of stack)
- rsp - register stack pointer (current location in stack, growing downwards)
- rsi - register source index (source for data copies)
- rdi - register destination index (destination for data copies)
The registers added for 64-bit mode are named:
- r8 - register 8
- r9 - register 9
- r10 - register 10
- r11 - register 11
- r12 - register 12
- r13 - register 13
- r14 - register 14
- r15 - register 15
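In addition, eax names the low 32 bits of rax, ax the low 16 bits, al the low 8 bits, and ah the upper 8 bits of those low 16 bits (bits 8 to 15). The r-prefixed registers follow a different and more regular scheme: r8d is the low 32 bits, r8w the low 16 bits, and r8b the low 8 bits.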
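The naming scheme can be illustrated with a small Python sketch that treats a register as a 64-bit integer and extracts each named view with a mask. The function name `register_views` is made up for this example, and it models only reads of the views, not what the hardware does on writes.

```python
MASK64 = (1 << 64) - 1

def register_views(rax):
    """Return the named sub-register views of a 64-bit value held in rax."""
    rax &= MASK64
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,  # low 32 bits
        "ax": rax & 0xFFFF,       # low 16 bits
        "al": rax & 0xFF,         # low 8 bits
        "ah": (rax >> 8) & 0xFF,  # bits 8..15, the high byte of ax
    }

for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

The same masks apply to r8d, r8w, and r8b; only the names differ, and there is no counterpart of ah for the r-prefixed registers.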
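Function calling conventions
The calling convention described in Intel's article (the Microsoft x64 convention) differs from the one used on Linux, so keep this in mind when reading code.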
Intel C Style function calling:
For the Microsoft* x64 calling convention, the additional register space let fastcall be the only calling convention (under x86 there were many: stdcall, thiscall, fastcall, cdecl, etc.). The rules for interfacing with C/C++ style functions:
- RCX, RDX, R8, R9 are used for integer and pointer arguments in that order left to right.
- XMM0, 1, 2, and 3 are used for floating point arguments.
- Additional arguments are pushed on the stack right to left.
- Parameters less than 64 bits long are not zero extended; the high bits contain garbage.
- It is the caller’s responsibility to allocate 32 bytes of “shadow space” (for storing RCX, RDX, R8, and R9 if needed) before calling the function.
- It is the caller’s responsibility to clean the stack after the call.
- Integer return values (similar to x86) are returned in RAX if 64 bits or less.
- Floating point return values are returned in XMM0.
- Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.
- The stack is 16-byte aligned. The “call” instruction pushes an 8-byte return address, so all non-leaf functions must adjust the stack by a value of the form 16n+8 when allocating stack space.
- Registers RAX, RCX, RDX, R8, R9, R10, and R11 are considered volatile and must be considered destroyed on function calls.
- RBX, RBP, RDI, RSI, R12, R13, R14, and R15 must be saved in any function using them.
- Note there is no calling convention for the floating point (and thus MMX) registers.
- Further details (varargs, exception handling, stack unwinding) are at Microsoft’s site.
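As a rough illustration of the register rules above, the following Python sketch maps a list of argument kinds to the place where the callee finds each one. It assumes every argument is a scalar that fits in 8 bytes, and it relies on the Microsoft convention assigning registers by argument position (the second argument uses RDX or XMM1 regardless of what the first one was). The function name and the "int"/"fp" labels are invented for this example.

```python
MS_INT_REGS = ["rcx", "rdx", "r8", "r9"]
MS_FP_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]

def ms_x64_arg_locations(kinds):
    """kinds: "int" or "fp" for each argument, listed left to right."""
    locations = []
    for position, kind in enumerate(kinds):
        if position < 4:
            # The slot is chosen by position; the kind picks the register file.
            regs = MS_INT_REGS if kind == "int" else MS_FP_REGS
            locations.append(regs[position])
        else:
            # On entry to the callee: 8 bytes of return address, then the
            # 32-byte shadow space, then the stack arguments.
            offset = 8 + 32 + 8 * (position - 4)
            locations.append(f"[rsp+{offset}]")
    return locations

print(ms_x64_arg_locations(["int", "fp", "int", "fp", "int"]))
```

Struct arguments, varargs, and hidden return-value pointers are deliberately left out of this sketch.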
Linux ABI:
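Argument order:
Integers: rdi, rsi, rdx, rcx, r8, r9
Floating point: xmm0~xmm7
Arguments that do not fit in registers are pushed onto the stack from right to left, so that they are read back in natural left-to-right order.
Return address: at [rsp] on entry to the callee; rsp must be 16-byte aligned before the call.
Callee-saved registers: rbp, rbx, r12~r15
The callee must also preserve the control bits of MXCSR and the x87 control word. x87 instructions are rarely used in 64-bit code, so this seldom requires any work.
Return values:
Integers: rax, or rdx:rax
Floating point: xmm0, or xmm1:xmm0 (a scalar float or double fits in xmm0 alone, so xmm1 is not needed for it)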
https://cs.lmu.edu/~ray/notes/nasmtutorial/
- From left to right, pass as many parameters as will fit in registers. The order in which registers are allocated, are:
  - For integers and pointers, rdi, rsi, rdx, rcx, r8, r9.
  - For floating-point (float, double), xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7.
- Additional parameters are pushed on the stack, right to left, and are to be removed by the caller after the call.
- After the parameters are pushed, the call instruction is made, so when the called function gets control, the return address is at [rsp], the first memory parameter is at [rsp+8], etc.
- The stack pointer rsp must be aligned to a 16-byte boundary before making a call. Fine, but the process of making a call pushes the return address (8 bytes) on the stack, so when a function gets control, rsp is not aligned. You have to make that extra space yourself, by pushing something or subtracting 8 from rsp.
- The only registers that the called function is required to preserve (the callee-save registers) are: rbp, rbx, r12, r13, r14, r15. All others are free to be changed by the called function.
- The callee is also supposed to save the control bits of the MXCSR and the x87 control word, but x87 instructions are rare in 64-bit code so you probably don’t have to worry about this.
- Integers are returned in rax or rdx:rax, and floating point values are returned in xmm0 or xmm1:xmm0.
- In addition, Linux defines the 128 bytes just beyond the top of the stack (the addresses below rsp) as the ‘red zone’. A function may use this area freely for local variables without adjusting rsp.
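The allocation rules above can be sketched in Python. Unlike a position-based scheme, the Linux convention keeps separate counters for integer and floating-point registers. The sketch assumes every argument is a scalar that occupies one 8-byte slot; the function name and the "int"/"fp" labels are invented for illustration.

```python
SYSV_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP_REGS = [f"xmm{i}" for i in range(8)]

def sysv_arg_locations(kinds):
    """kinds: "int" or "fp" for each argument, listed left to right."""
    int_used = fp_used = stack_slots = 0
    locations = []
    for kind in kinds:
        if kind == "int" and int_used < len(SYSV_INT_REGS):
            locations.append(SYSV_INT_REGS[int_used])
            int_used += 1
        elif kind == "fp" and fp_used < len(SYSV_FP_REGS):
            locations.append(SYSV_FP_REGS[fp_used])
            fp_used += 1
        else:
            # Pushed right to left, so the leftmost memory argument sits
            # right above the return address, at [rsp+8] on entry.
            locations.append(f"[rsp+{8 + 8 * stack_slots}]")
            stack_slots += 1
    return locations

print(sysv_arg_locations(["fp", "int", "fp"]))
print(sysv_arg_locations(["int"] * 8))
```

Structs, long double, and varargs follow additional rules that this sketch does not cover.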
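Addressing modes
$Imm is an immediate value; it involves no register or memory access.
ra names a register; R[ra] is shorthand for the contents of register ra.
Imm(rb, ri, s) denotes M[Imm + R[rb] + R[ri] * s], where M[…] stands for the contents of memory at the given address.
- Imm: immediate offset (displacement)
- rb: base register
- ri: index register
- s: scale factor, which must be 1, 2, 4, or 8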
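The address computation in Imm(rb, ri, s) is plain arithmetic, which a short Python sketch makes concrete. The function `effective_address` is a made-up helper; it takes the register contents R[rb] and R[ri] as integers and wraps the result to 64 bits.

```python
MASK64 = (1 << 64) - 1

def effective_address(imm=0, rb=0, ri=0, s=1):
    """Compute Imm + R[rb] + R[ri] * s for the operand form Imm(rb, ri, s)."""
    if s not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (imm + rb + ri * s) & MASK64

# 8(%rdx, %rcx, 4) with %rdx = 0x1000 and %rcx = 3
print(hex(effective_address(imm=8, rb=0x1000, ri=3, s=4)))
```

An instruction such as movq then reads or writes M[...] at that address, whereas leaq stores the address itself.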
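Condition-code register
Suppose ADD has just computed t = a + b. (x86-64 also has a SUB instruction; in two's complement, a - b can equivalently be computed as a plus the negation of b, that is, a NEG followed by an ADD.) Then:
CF: (unsigned) t < (unsigned) a; set when the unsigned addition carries out of the most significant bit, i.e. unsigned overflow
ZF: t == 0; the result is zero
SF: t < 0; the result is negative
OF: (a < 0 == b < 0) && (t < 0 != a < 0); a and b have the same sign but the result has a different sign, i.e. signed overflow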
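The four definitions can be checked with a Python sketch that performs a 64-bit addition and derives each flag exactly as written above. The function name `add_flags` is invented for this example; Python integers are unbounded, so the result is masked to 64 bits by hand.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def add_flags(a, b):
    """Return (t, flags) for the 64-bit addition t = a + b."""
    a &= MASK64
    b &= MASK64
    t = (a + b) & MASK64
    flags = {
        "CF": t < a,          # unsigned result wrapped around
        "ZF": t == 0,
        "SF": bool(t & SIGN), # sign bit of the result
        # Same sign going in, different sign coming out.
        "OF": (a & SIGN) == (b & SIGN) and (t & SIGN) != (a & SIGN),
    }
    return t, flags

print(add_flags(MASK64, 1))       # -1 + 1 as signed, or max + 1 as unsigned
print(add_flags((1 << 63) - 1, 1))  # largest signed value + 1
```

The first call carries out without signed overflow (CF and ZF set); the second overflows as a signed addition without a carry (SF and OF set).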
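Instruction list
- Data movement instructions
  - MOV(b, w, l, q, absq)
    - params: S, D
    - result: D <- S
    - movl with a register destination also zeroes the upper 32 bits of that register; movabsq loads a full 64-bit immediate into a register
  - MOVZ(bw, bl, wl, bq, wq)
    - params: S, D
    - result: D <- ZeroExtend(S)
    - Moves a narrower source into a wider destination, filling the upper bits with 0. The source must be narrower than the destination. There is no movzlq because movl already does the same job: it zeroes the upper 32 bits.
  - MOVS(bw, bl, wl, bq, wq, lq)
    - Like MOVZ, but fills the upper bits with copies of the sign bit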
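The difference between MOVZ and MOVS shows up only when the source has its top bit set. A minimal Python sketch, with helper names chosen for this example, models both extensions on bit patterns:

```python
def zero_extend(value, from_bits):
    """MOVZ-style: keep the low from_bits bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """MOVS-style: replicate the sign bit of the source into the upper bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits          # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)  # back to an unsigned bit pattern

byte = 0x80                              # -128 as a signed byte
print(hex(zero_extend(byte, 8)))         # movzbq: 0x80
print(hex(sign_extend(byte, 8, 64)))     # movsbq: 0xffffffffffffff80
```

For a source whose top bit is clear, such as 0x7F, both functions return the same value.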
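- Stack instructions
  - pushq
    - params: S
    - result: R[%rsp] <- R[%rsp] - 8; M[R[%rsp]] <- S
    - Pushes the operand onto the stack: the stack pointer is first moved down to make room, then S is written into the freed slot
  - popq
    - params: D
    - result: D <- M[R[%rsp]]; R[%rsp] <- R[%rsp] + 8
    - Pops the top of the stack into the destination D; the steps are the reverse of a push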
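The two result lines translate almost directly into code. The following Python sketch models memory as a dictionary from addresses to 8-byte values; the class `StackModel` and its starting address are assumptions made for this example.

```python
class StackModel:
    """Toy model of %rsp plus memory, enough to trace pushq and popq."""

    def __init__(self, rsp=0x1000):
        self.rsp = rsp
        self.mem = {}  # address -> 8-byte value

    def pushq(self, value):
        self.rsp -= 8               # R[%rsp] <- R[%rsp] - 8
        self.mem[self.rsp] = value  # M[R[%rsp]] <- S

    def popq(self):
        value = self.mem[self.rsp]  # D <- M[R[%rsp]]
        self.rsp += 8               # R[%rsp] <- R[%rsp] + 8
        return value

stack = StackModel()
stack.pushq(0x11)
stack.pushq(0x22)
print(hex(stack.rsp), hex(stack.popq()), hex(stack.popq()), hex(stack.rsp))
```

Values come back in last-in, first-out order, and the stack pointer returns to where it started. Note that popq does not erase the old slot; the value stays in memory below rsp until something overwrites it.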
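- Arithmetic operations
  - leaq
    - params: S, D
    - result: D <- &S
    - Computes the address according to the addressing-mode rules but does not read memory at that address; the computed address itself is written to D
  - INC, DEC, NEG, NOT
    - params: D
    - Adds 1 to D, subtracts 1 from D, negates D (two's complement), or complements D bitwise, and writes the result back to D
  - ADD, SUB, IMUL, XOR, OR, AND
    - params: S, D
    - Computes D + S, D - S, and so on, and writes the result to D
  - SAL, SHL, SHR, SAR
    - params: k, D
    - Shifts D left or right by k, arithmetically or logically, and writes the result to D; k is an immediate or the %cl register. SAL and SHL are the same left shift, SHR fills the vacated bits with zeros, and SAR fills them with copies of the sign bit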
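The two right shifts differ only for values whose sign bit is set. A minimal Python sketch, assuming 64-bit operands and a shift count k between 0 and 63 (the function names mirror the mnemonics but are otherwise ordinary helpers):

```python
MASK64 = (1 << 64) - 1

def shr(d, k):
    """Logical right shift: vacated high bits are filled with 0."""
    return (d & MASK64) >> k

def sar(d, k):
    """Arithmetic right shift: vacated high bits copy the sign bit."""
    d &= MASK64
    if d & (1 << 63):
        d -= 1 << 64           # reinterpret the bit pattern as negative
    return (d >> k) & MASK64   # Python's >> on negative ints is arithmetic

minus_eight = -8 & MASK64
print(hex(shr(minus_eight, 1)))  # 0x7ffffffffffffffc
print(hex(sar(minus_eight, 1)))  # 0xfffffffffffffffc, i.e. -4
```

For non-negative values the two functions agree, which is why compilers pick SHR for unsigned types and SAR for signed ones.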
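- Special arithmetic operations that use implicitly assigned registers
  - imulq, mulq
    - params: S
    - result: R[%rdx]:R[%rax] <- S * R[%rax]
    - The i prefix marks the signed version. The product S * %rax can be up to 128 bits wide; the high 64 bits go to %rdx and the low 64 bits to %rax
  - cqto
    - no params
    - Fills %rdx with copies of the sign bit of %rax. The division instructions below take a 128-bit dividend in %rdx:%rax, so when the dividend is only 64 bits wide, its upper half (%rdx) must first be filled with zeros or with the sign bit. Typically xor %rdx, %rdx clears it for unsigned division, and cqto sign-extends it for signed division
  - idivq, divq
    - params: S
    - R[%rdx] <- R[%rdx]:R[%rax] mod S
    - R[%rax] <- R[%rdx]:R[%rax] / S
    - Both results are computed from the original value of %rdx:%rax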
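Because Python integers are unbounded, the 128-bit intermediate values are easy to model. The sketch below follows the register conventions above for mulq, cqto, and idivq; the function names mirror the mnemonics, each returns the new (rdx, rax) pair (or just rdx for cqto), and hardware faults such as quotient overflow are not modeled.

```python
MASK64 = (1 << 64) - 1

def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def mulq(rax, s):
    """Unsigned multiply: returns (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (s & MASK64)
    return product >> 64, product & MASK64

def cqto(rax):
    """Returns the new rdx: all ones if rax is negative, else zero."""
    return MASK64 if (rax >> 63) & 1 else 0

def idivq(rdx, rax, s):
    """Signed divide of rdx:rax by s: returns (remainder, quotient)."""
    dividend = to_signed((rdx << 64) | rax, 128)
    divisor = to_signed(s, 64)
    quotient = abs(dividend) // abs(divisor)  # truncate toward zero
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return remainder & MASK64, quotient & MASK64

rax = -7 & MASK64
rdx, rax = idivq(cqto(rax), rax, 2)
print(to_signed(rax, 64), to_signed(rdx, 64))  # quotient -3, remainder -1
```

The quotient is rounded toward zero and the remainder takes the sign of the dividend, which differs from Python's own `//` and `%` on negative numbers.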
- Comparison instructions, used only to update the condition-code register
  - CMP(b, w, l, q)
    - params: S1, S2
    - Sets the condition codes based on S2 - S1
  - TEST(b, w, l, q)
    - params: S1, S2
    - Sets the condition codes based on S1 & S2
- Setting a value from the condition codes
  - set(e, z, ne, nz, s, ns, g, ge, nle, nl, l, le, nge, ng, a, nbe, ae, nb, b, nae, be, na)
    - e = equal, z = zero (e and z are synonyms), s = sign (sign bit set, i.e. negative)
    - g = greater, l = less (signed comparison)
    - a = above, b = below (unsigned comparison)
    - n = not, negating the condition that follows (so nl and ge are synonyms: "not less" is the same as "greater or equal")
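How the suffixes map onto the flags can be sketched in Python. The first function derives the flags that a `cmpq S1, S2` would produce from S2 - S1; the table then expresses a few of the conditions in terms of those flags, using the standard x86 definitions. Only one name per synonym group is listed, and the names `cmp_flags` and `CONDITIONS` are invented for this example.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def cmp_flags(s1, s2):
    """Flags after `cmpq S1, S2`, which evaluates S2 - S1."""
    s1 &= MASK64
    s2 &= MASK64
    t = (s2 - s1) & MASK64
    return {
        "CF": s2 < s1,                 # unsigned borrow
        "ZF": t == 0,
        "SF": bool(t & SIGN),
        # Operands differ in sign and the result's sign differs from S2's.
        "OF": ((s2 ^ s1) & (s2 ^ t) & SIGN) != 0,
    }

CONDITIONS = {
    "e":  lambda f: f["ZF"],
    "ne": lambda f: not f["ZF"],
    "s":  lambda f: f["SF"],
    "g":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "ge": lambda f: f["SF"] == f["OF"],
    "l":  lambda f: f["SF"] != f["OF"],
    "le": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "a":  lambda f: not f["CF"] and not f["ZF"],
    "ae": lambda f: not f["CF"],
    "b":  lambda f: f["CF"],
    "be": lambda f: f["CF"] or f["ZF"],
}

# Compare S2 = 0xFFFF...FFFF with S1 = 1.
flags = cmp_flags(1, MASK64)
print(CONDITIONS["l"](flags))  # True: as signed values, -1 < 1
print(CONDITIONS["a"](flags))  # True: as unsigned values, S2 is far above 1
```

The same bit pattern is "less" under the signed conditions and "above" under the unsigned ones, which is why both families exist.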
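- Jump instructions: jump to the given label; the assembler and linker replace labels with actual addresses
  - jmp, je, jne, jz, jnz, js, jns, jg, jnle, jge, jnl, jl, jnge, jle, jng, ja, jnbe, jae, jnb, jb, jnae, jbe, jna
    - Like set, the jump instructions come in many condition-code variants that test the same conditions. The difference is that set writes the outcome of the test to its destination, whereas a conditional jump transfers control when the test is true
    - jmp is an unconditional jump. Its target can be indirect (read from a register or memory) or a label (a direct target, which is a fixed address once linking is done). The conditional variants can only jump to a label
    - Addressing
      - PC-relative: the target is encoded as an offset from the address of the next instruction. This form is common because the encoding is compact; in its short form it is one opcode byte plus a one-byte offset
      - Absolute
    - (You will sometimes see the sequence repz; retq. The meaning of ret is obvious. rep is normally a prefix for repeated string operations, but in front of ret it does nothing. The prefix is there to work around an implementation issue in AMD processors, whose branch prediction could otherwise go wrong and slow the code down. It has no other effect.)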
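- Conditional move instructions
  - cmov(e, z, ne, nz, s, ns, g, nle, ge, nl, l, nge, le, ng, a, nbe, ae, nb, b, nae, be, na)
    - params: S, R; the source can be a register or a memory location, but the destination must be a register
    - The suffixes have the same meaning as the conditions above
    - A conditional move can be faster than a conditional jump because there is no branch to mispredict, so compilers tend to use it for branches whose only effect is an assignment. Deciding whether a branch is free of side effects is not simple, however, and a conditional move requires both candidate values to be computed in advance, which can sometimes cost more than the jump would. These trade-offs belong to the subject of compiler design.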
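The requirement that both candidate values exist before the selection can be illustrated in Python. The second function below mimics the shape of code a compiler might emit with a conditional move: both results are computed unconditionally and only the final selection depends on the comparison. Both function names are invented, and Python itself of course does not execute a cmov here.

```python
def absdiff_branch(x, y):
    """Jump-style: only one of the two subtractions is evaluated."""
    if x < y:
        return y - x
    return x - y

def absdiff_cmov(x, y):
    """cmov-style: both values are computed, then one is selected."""
    result = x - y        # value for the x >= y case
    alternative = y - x   # computed before the test, whether needed or not
    if x < y:             # stands in for a single cmovl
        result = alternative
    return result

print(absdiff_branch(3, 10), absdiff_cmov(3, 10))
```

This transformation is safe only because both subtractions are harmless. If one arm dereferenced a pointer that may be null, evaluating it unconditionally would fault, which is one reason a compiler must fall back to a jump.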
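- Procedure call instructions
  - call
    - params: function name (a label)
    - effect: pushes the return address (the address of the instruction after the call, which is the value of rip at that point) onto the stack and jumps to the function
  - ret
    - params: none
    - effect: pops the return address from the top of the stack and jumps to it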
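A small Python sketch can trace what call and ret do to rip and rsp. The machine state is a plain dictionary, the addresses are arbitrary, and the 5-byte instruction length is an assumption that matches the common form of call with a 32-bit relative target.

```python
def call(state, target, call_length=5):
    """Push the address of the next instruction, then jump to target."""
    return_address = state["rip"] + call_length
    state["rsp"] -= 8
    state["mem"][state["rsp"]] = return_address
    state["rip"] = target

def ret(state):
    """Pop the return address from the stack and jump to it."""
    state["rip"] = state["mem"][state["rsp"]]
    state["rsp"] += 8

state = {"rip": 0x401000, "rsp": 0x7000, "mem": {}}
call(state, 0x402000)
print(hex(state["rip"]), hex(state["rsp"]), hex(state["mem"][state["rsp"]]))
ret(state)
print(hex(state["rip"]), hex(state["rsp"]))
```

After the call, execution continues at the target and the return address sits at [rsp]; after ret, execution resumes right behind the call and rsp is back at its old value.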
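- Floating-point move instructions (SSE/AVX instructions)
  - vmovss, vmovsd
    - params (source, destination): X, M / M, X
      - X is a 128-bit XMM register
      - M is a memory address. vmovss moves a single-precision value and vmovsd a double-precision value, so M must refer to 4 or 8 bytes respectively
  - vmovaps
    - params (source, destination): X, X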
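- Floating-point conversion instructions
  - vcvtt(ss2si, sd2si, ss2siq, sd2siq)
    - ss is single precision, sd is double precision, si is a 32-bit integer, and siq is a 64-bit (quad word) integer
    - params (source, destination): X/M, R
    - Converts a floating-point value to an integer and stores it in the destination register; the fractional part is truncated, i.e. the value is rounded toward zero
  - vcvt(si2ss, si2sd, si2ssq, si2sdq)
    - params (source 1, source 2, destination): M/R, X, X
    - Converts an integer to a floating-point value. Source 2 only supplies the upper bits of the destination, which the conversion leaves alone; in practice it is usually the same XMM register as the destination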
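Truncation is easy to confuse with rounding down, so here is a short Python comparison. `math.trunc` rounds toward zero like the vcvtt family, while `math.floor` rounds toward negative infinity. The wrapper name `cvtt_to_int` is made up, and NaN, infinities, and out-of-range inputs are not modeled.

```python
import math

def cvtt_to_int(x):
    """Truncating float-to-integer conversion (rounds toward zero)."""
    return math.trunc(x)

for x in (2.9, -2.9, 0.5, -0.5):
    print(x, cvtt_to_int(x), math.floor(x))
```

The two functions agree for positive inputs and differ for negative ones: -2.9 truncates to -2 but floors to -3.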
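- Floating-point arithmetic instructions (AVX instructions)
  - vaddss, vsubss, vmulss, vdivss, vminss, vmaxss, sqrtss
    - These cover addition, subtraction, multiplication, division, minimum, maximum, and square root
    - All of them except sqrtss take three operands: S1, S2, D
    - D <- S2 op S1, or D <- sqrt(S1) for sqrtss
  - vxorps, vandps
    - Bitwise operations on all 128 bits of the registers: xor is exclusive or, and is bitwise and
    - params: S1, S2, D
  - vucomiss, vucomisd
    - Compare floating-point values. Like CMP, they set the condition codes based on S2 - S1
    - When S1 or S2 is NaN, the parity flag PF is also set, and the comparison is considered unordered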
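The flag outcomes of the floating-point comparison can be written out as a small Python function. It follows the operand order used above (S2 is compared against S1) and the x86 rule that an unordered result sets ZF, PF, and CF together. The function name `ucomisd_flags` is invented for this sketch.

```python
import math

def ucomisd_flags(s1, s2):
    """Flags after comparing S2 with S1, as in `vucomisd S1, S2`."""
    if math.isnan(s1) or math.isnan(s2):
        return {"ZF": True, "PF": True, "CF": True}    # unordered
    if s2 < s1:
        return {"ZF": False, "PF": False, "CF": True}
    if s2 == s1:
        return {"ZF": True, "PF": False, "CF": False}
    return {"ZF": False, "PF": False, "CF": False}     # s2 > s1

print(ucomisd_flags(1.0, 2.0))
print(ucomisd_flags(2.0, 1.0))
print(ucomisd_flags(float("nan"), 1.0))
```

Because only ZF and CF distinguish the ordered cases, code that follows such a comparison uses the unsigned conditions (a, ae, b, be) and checks PF first, for example with jp, when NaN inputs are possible.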