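05|Computer Instructions
Computer instruction set
Intel and ARM architectures
Five common categories of instructions
- Arithmetic instructions
- Data transfer instructions
- Logic instructions
- Conditional branch instructions
- Unconditional jump instructions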
Steps to view the assembly:
- Open Ubuntu on Windows 10
- sudo apt-get install build-essential (installs gcc)
- gcc --version
- vim test.c
- gcc -g -c test.c
- objdump -d -M intel -S test.o
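06|Instruction Jumps
How does the CPU execute instructions?
- PC register (program counter register), also called the instruction address register: holds the memory address of the next instruction to be executed.
- Instruction register: holds the instruction currently being executed.
- Status register (condition code register): uses individual flag bits to record the results of the CPU's arithmetic or logic operations.
- General-purpose registers: hold data, addresses, and so on; examples are integer registers, floating-point registers, vector registers, and address registers.
When a program runs, the CPU uses the address in the PC register to read the instruction to execute from memory into the instruction register and executes it. The PC then advances by the length of that instruction, and the CPU goes on to read the next instruction in sequence. A jump instruction modifies the address stored in the PC register.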
Switch/case in assembly
paradise@PradiseXPS:~$ cat switch.c
#include <stdio.h>
int main()
{
    int a = 3;
    int x = 0;
    switch(a)
    {
    case 0:
        a = 0;
        break;
    case 1:
        a = 1;
        break;
    case 2:
        a = 2;
        break;
    case 3:
        a = 3;
        break;
    case 4:
        a = 4;
        break;
    case 5:
        a = 5;
        break;
    case 6:
        a = 6;
        break;
    default:
        a = 0;
    }
    return 0;
}
paradise@PradiseXPS:~$ gcc -g -c switch.c
paradise@PradiseXPS:~$ ls
print.c print.o switch.c switch.o test.c test.o
paradise@PradiseXPS:~$ objdump -d -M intel -S switch.o
switch.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
#include <stdio.h>
int main()
{
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
int a = 3;
4: c7 45 f8 03 00 00 00 mov DWORD PTR [rbp-0x8],0x3
int x = 0;
b: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
switch(a)
12: 83 7d f8 06 cmp DWORD PTR [rbp-0x8],0x6
16: 77 63 ja 7b <main+0x7b>
18: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
1b: 48 8d 14 85 00 00 00 lea rdx,[rax*4+0x0]
22: 00
23: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 2a <main+0x2a>
2a: 8b 04 02 mov eax,DWORD PTR [rdx+rax*1]
2d: 48 63 d0 movsxd rdx,eax
30: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 37 <main+0x37>
37: 48 01 d0 add rax,rdx
3a: ff e0 jmp rax
{
case 0:
a = 0;
3c: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
break;
43: eb 3d jmp 82 <main+0x82>
case 1:
a = 1;
45: c7 45 f8 01 00 00 00 mov DWORD PTR [rbp-0x8],0x1
break;
4c: eb 34 jmp 82 <main+0x82>
case 2:
a = 2;
4e: c7 45 f8 02 00 00 00 mov DWORD PTR [rbp-0x8],0x2
break;
55: eb 2b jmp 82 <main+0x82>
case 3:
a = 3;
57: c7 45 f8 03 00 00 00 mov DWORD PTR [rbp-0x8],0x3
break;
5e: eb 22 jmp 82 <main+0x82>
case 4:
a = 4;
60: c7 45 f8 04 00 00 00 mov DWORD PTR [rbp-0x8],0x4
break;
67: eb 19 jmp 82 <main+0x82>
case 5:
a = 5;
69: c7 45 f8 05 00 00 00 mov DWORD PTR [rbp-0x8],0x5
break;
70: eb 10 jmp 82 <main+0x82>
case 6:
a = 6;
72: c7 45 f8 06 00 00 00 mov DWORD PTR [rbp-0x8],0x6
break;
79: eb 07 jmp 82 <main+0x82>
default:
a = 0;
7b: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
}
return 0;
82: b8 00 00 00 00 mov eax,0x0
}
87: 5d pop rbp
88: c3 ret
paradise@PradiseXPS:~$
Supplementary material
paradise@PradiseXPS:~$ objdump -H
Usage: objdump <option(s)> <file(s)>
Display information from object <file(s)>.
At least one of the following switches must be given:
-a, --archive-headers Display archive header information
-f, --file-headers Display the contents of the overall file header
-p, --private-headers Display object format specific file header contents
-P, --private=OPT,OPT... Display object format specific contents
-h, --[section-]headers Display the contents of the section headers
-x, --all-headers Display the contents of all headers
-d, --disassemble Display assembler contents of executable sections
-D, --disassemble-all Display assembler contents of all sections
-S, --source Intermix source code with disassembly
-s, --full-contents Display the full contents of all sections requested
-g, --debugging Display debug information in object file
-e, --debugging-tags Display debug information using ctags style
-G, --stabs Display (in raw form) any STABS info in the file
-W[lLiaprmfFsoRtUuTgAckK] or
--dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
=frames-interp,=str,=loc,=Ranges,=pubtypes,
=gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
=addr,=cu_index,=links,=follow-links]
Display DWARF info in the file
-t, --syms Display the contents of the symbol table(s)
-T, --dynamic-syms Display the contents of the dynamic symbol table
-r, --reloc Display the relocation entries in the file
-R, --dynamic-reloc Display the dynamic relocation entries in the file
@<file> Read options from <file>
-v, --version Display this program's version number
-i, --info List object formats and architectures supported
-H, --help Display this information
The following switches are optional:
-b, --target=BFDNAME Specify the target object format as BFDNAME
-m, --architecture=MACHINE Specify the target architecture as MACHINE
-j, --section=NAME Only display information for section NAME
-M, --disassembler-options=OPT Pass text OPT on to the disassembler
-EB --endian=big Assume big endian format when disassembling
-EL --endian=little Assume little endian format when disassembling
--file-start-context Include context from start of file (with -S)
-I, --include=DIR Add DIR to search list for source files
-l, --line-numbers Include line numbers and filenames in output
-F, --file-offsets Include file offsets when displaying information
-C, --demangle[=STYLE] Decode mangled/processed symbol names
The STYLE, if specified, can be `auto', `gnu',
`lucid', `arm', `hp', `edg', `gnu-v3', `java'
or `gnat'
-w, --wide Format output for more than 80 columns
-z, --disassemble-zeroes Do not skip blocks of zeroes when disassembling
--start-address=ADDR Only process data whose address is >= ADDR
--stop-address=ADDR Only process data whose address is <= ADDR
--prefix-addresses Print complete address alongside disassembly
--[no-]show-raw-insn Display hex alongside symbolic disassembly
--insn-width=WIDTH Display WIDTH bytes on a single line for -d
--adjust-vma=OFFSET Add OFFSET to all displayed section addresses
--special-syms Include special symbols in symbol dumps
--inlines Print all inlines for source line (with -l)
--prefix=PREFIX Add PREFIX to absolute paths for -S
--prefix-strip=LEVEL Strip initial directory names for -S
--dwarf-depth=N Do not display DIEs at depth N or greater
--dwarf-start=N Display DIEs starting with N, at the same depth
or deeper
--dwarf-check Make additional dwarf internal consistency checks.
objdump: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 a.out-i386-linux pei-i386 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-x86-64 pe-bigobj-x86-64 pe-i386 plugin srec symbolsrec verilog tekhex binary ihex
objdump: supported architectures: i386 i386:x86-64 i386:x64-32 i8086 i386:intel i386:x86-64:intel i386:x64-32:intel i386:nacl i386:x86-64:nacl i386:x64-32:nacl iamcu iamcu:intel l1om l1om:intel k1om k1om:intel plugin
The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
x86-64 Disassemble in 64bit mode
i386 Disassemble in 32bit mode
i8086 Disassemble in 16bit mode
att Display instruction in AT&T syntax
intel Display instruction in Intel syntax
att-mnemonic
Display instruction in AT&T mnemonic
intel-mnemonic
Display instruction in Intel mnemonic
addr64 Assume 64bit address size
addr32 Assume 32bit address size
addr16 Assume 16bit address size
data32 Assume 32bit data size
data16 Assume 16bit data size
suffix Always display instruction suffix in AT&T syntax
amd64 Display instruction in AMD64 ISA
intel64 Display instruction in Intel64 ISA
Report bugs to <http://www.sourceware.org/bugzilla/>.
paradise@PradiseXPS:~$
07|Function Calls
Stack overflow
Example program for a function call
paradise@PradiseXPS:~$ cat example.c
// function_example.c
#include <stdio.h>
int static add(int a,int b)
{
    return a + b;
}
int main()
{
    int x = 5;
    int y = 10;
    int u = add(x,y);
}
paradise@PradiseXPS:~$ objdump -d -M intel -S example.o
example.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <add>:
// function_example.c
#include <stdio.h>
int static add(int a,int b)
{
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
7: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
return a + b;
a: 8b 55 fc mov edx,DWORD PTR [rbp-0x4]
d: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
10: 01 d0 add eax,edx
}
12: 5d pop rbp
13: c3 ret
0000000000000014 <main>:
int main()
{
14: 55 push rbp
15: 48 89 e5 mov rbp,rsp
18: 48 83 ec 10 sub rsp,0x10
int x = 5;
1c: c7 45 f4 05 00 00 00 mov DWORD PTR [rbp-0xc],0x5
int y = 10;
23: c7 45 f8 0a 00 00 00 mov DWORD PTR [rbp-0x8],0xa
int u = add(x,y);
2a: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
2d: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
30: 89 d6 mov esi,edx
32: 89 c7 mov edi,eax
34: e8 c7 ff ff ff call 0 <add>
39: 89 45 fc mov DWORD PTR [rbp-0x4],eax
3c: b8 00 00 00 00 mov eax,0x0
}
41: c9 leave
42: c3 ret
paradise@PradiseXPS:~$
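The operand that follows the call instruction is, as with a jump, the address of the code to jump to.
The add function
On entry, the function executes one push instruction and one mov instruction.
On exit, the function executes one pop instruction and one ret instruction.
These four instructions are what actually perform the stack push and pop for the call.
A function call differs from a conditional jump in that, after jumping to the target address, execution still has to come back and continue from where it left off.
To support this, a region of memory is set aside and used as a stack, a LIFO (last in, first out) data structure.
Stack frame: all of the memory that a function A occupies on the stack is the stack frame of function A.
push rbp pushes the frame base address of the calling function (here, main) onto the top of the stack.
mov rbp,rsp copies the value of the stack pointer rsp into rbp; rsp always points to the top of the stack.
rbp - register base pointer (start of stack): the frame pointer, a register that holds the location of the current stack frame.
rsp - register stack pointer (current location in stack, growing downwards): the stack pointer.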
Summary
This chapter left me in a fog and a bit lost.
I need to come back to this when I have time and review assembly language.
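08|ELF and Static Linking
The objdump command
Executable file (executable program)
Object file
Linker
Running gcc with the -o option (and without -c) produces the corresponding executable file.
"C source code - assembly code - machine code"
- Compile, assemble, link
- The loader loads the executable file into memory; the CPU then reads instructions and data from memory and starts executing the program.
ELF format and linking: understanding the linking process
ELF (Executable and Linkable Format)
With a loader that can parse the PE format, it becomes possible to run Windows programs on Linux; Wine is an example.
WSL (Windows Subsystem for Linux) can parse and load ELF files.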
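Questions to think about after class:
readelf: reads a program's symbol table (for example, readelf -s)
objdump: reads the relocation table (objdump -r)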
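09|Program Loading
Loading a program into memory:
- The memory occupied by a loaded executable should be contiguous
- Many programs need to be loaded at the same time, and a program must not be allowed to dictate where in memory it is loaded
Memory segmentation
Memory fragmentation
Memory swapping
Memory paging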
Question to think about after class: how is a Java program loaded into memory?
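10|Dynamic Linking
Dynamic linking (dynamic link)
Static linking (static link)
Shared libraries
dll: dynamic-link library
so: shared object
Relative address
PLT: Procedure Linkage Table
GOT: Global Offset Table
Although the physical memory holding a shared library's code is shared, its data is kept separately by each application that dynamically links against it.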
Question to think about after class
Saves memory space
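11|Binary Encoding
Sign-magnitude representation
Two's complement
ASCII: uses 128 of the values available in 8 binary bits to represent 128 distinct characters.
When storing data, use binary serialization: for integers and floating-point numbers alike, binary serialization takes considerably less space than storing the values as text.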
Character set (charset)
Character encoding
UTF
Two 锟斤拷 blades in hand, crying out 烫烫烫;
treading a thousand 屯屯屯 underfoot, smiling at a world of 锘锘锘.
12|Understanding Circuits
History of information transmission
Relay
Inverter