05|Computer Instructions

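Instruction set: the set of machine instructions a given CPU understands.
Different architectures, such as Intel and ARM, use different instruction sets.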

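The five common categories of instructions

  1. Arithmetic instructions
  2. Data transfer instructions
  3. Logical instructions
  4. Conditional branch instructions
  5. Unconditional jump instructions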
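
As a rough illustration of the five categories, the C function below annotates each statement with the kind of instruction it would typically compile to. The name `categories_demo` is hypothetical, and the mnemonics in the comments are x86 examples only; the actual instructions depend on the compiler, the optimization level, and the architecture.

```c
/* Hypothetical helper: each statement is annotated with the instruction
   category an unoptimized build would typically use for it. */
int categories_demo(int a, int b)
{
    int sum = a + b;          /* arithmetic, e.g. add */
    int copy = sum;           /* data transfer, e.g. mov */
    int masked = copy & 0xF;  /* logical, e.g. and */
    if (masked > 8) {         /* conditional branch, e.g. cmp followed by a jcc */
        masked = 8;
    }
    goto done;                /* unconditional jump, e.g. jmp */
done:
    return masked;
}
```

Compiling a file containing this function with `gcc -g -c` and disassembling it with `objdump -d -M intel -S` is one way to check which instructions your compiler actually chose.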

[Figure 1: Geek Time "Principles of Computer Organization" column notes (part 2)]

  1. Open Ubuntu on Windows 10
  2. sudo apt-get install build-essential (installs gcc)
  3. gcc --version
  4. vim test.c
  5. gcc -g -c test.c
  6. objdump -d -M intel -S test.o
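
The notes do not show what went into test.c, so the file below is only an assumed stand-in that is small enough to read in the disassembly. The function name `add_demo` is hypothetical. Because `gcc -g -c` only compiles to an object file, no `main` is needed for the steps above.

```c
/* test.c: assumed contents, not the file used in the original notes. */
int add_demo(void)
{
    int a = 1;
    int b = 2;
    a = a + b;   /* look for the mov and add instructions in the objdump output */
    return a;
}
```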

06|Instruction Jumps

How does the CPU execute instructions?

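  1. PC register (program counter register), also called the instruction address register:
    holds the memory address of the next instruction to be executed.
  2. Instruction register: holds the instruction currently being executed.
  3. Condition code register (status register): uses individual flag bits to hold the results of the CPU's arithmetic or logical operations.
  4. General-purpose registers: hold data, addresses, and so on; examples include integer registers, floating-point registers, vector registers, and address registers.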

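When a program runs, the CPU uses the address in the PC register to read the next instruction from memory into the instruction register and executes it. The PC is then incremented by the length of that instruction, so the following instruction is read in sequence. A jump instruction instead overwrites the address stored in the PC register.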
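
The fetch, increment, and execute cycle described above can be sketched as a toy interpreter. Everything here is invented for illustration: the four opcodes, the `Instr` struct, and the `run_toy_cpu` function do not correspond to any real instruction set, and every instruction is assumed to occupy exactly one slot.

```c
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not a real ISA). */
enum { OP_LOAD, OP_ADD, OP_JMP, OP_HALT };

typedef struct {
    int opcode;
    int operand;
} Instr;

/* Runs `program` and returns the accumulator value when it halts.
   `max_steps` guards against programs that never reach OP_HALT. */
int run_toy_cpu(const Instr *program, size_t length, int max_steps)
{
    size_t pc = 0;  /* program counter: index of the next instruction */
    int acc = 0;    /* a single general-purpose register */

    for (int step = 0; step < max_steps && pc < length; step++) {
        Instr ir = program[pc];  /* fetch into the "instruction register" */
        pc += 1;                 /* advance by the instruction length (one slot) */

        switch (ir.opcode) {
        case OP_LOAD: acc = ir.operand; break;
        case OP_ADD:  acc += ir.operand; break;
        case OP_JMP:  pc = (size_t)ir.operand; break;  /* a jump overwrites the PC */
        case OP_HALT: return acc;
        }
    }
    return acc;
}
```

With the program `LOAD 1; JMP 3; ADD 100; ADD 5; HALT`, the jump skips the `ADD 100` in slot 2, so the function returns 6.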

Assembly for switch/case

  paradise@PradiseXPS:~$ cat switch.c
  #include <stdio.h>
  int main()
  {
      int a = 3;
      int x = 0;
      switch(a)
      {
      case 0:
          a = 0;
          break;
      case 1:
          a = 1;
          break;
      case 2:
          a = 2;
          break;
      case 3:
          a = 3;
          break;
      case 4:
          a = 4;
          break;
      case 5:
          a = 5;
          break;
      case 6:
          a = 6;
          break;
      default:
          a = 0;
      }
      return 0;
  }
  paradise@PradiseXPS:~$ gcc -g -c switch.c
  paradise@PradiseXPS:~$ ls
  print.c print.o switch.c switch.o test.c test.o
  paradise@PradiseXPS:~$ objdump -d -M intel -S switch.o
  switch.o: file format elf64-x86-64
  Disassembly of section .text:
  0000000000000000 <main>:
  #include <stdio.h>
  int main()
  {
  0: 55 push rbp
  1: 48 89 e5 mov rbp,rsp
  int a = 3;
  4: c7 45 f8 03 00 00 00 mov DWORD PTR [rbp-0x8],0x3
  int x = 0;
  b: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
  switch(a)
  12: 83 7d f8 06 cmp DWORD PTR [rbp-0x8],0x6
  16: 77 63 ja 7b <main+0x7b>
  18: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
  1b: 48 8d 14 85 00 00 00 lea rdx,[rax*4+0x0]
  22: 00
  23: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 2a <main+0x2a>
  2a: 8b 04 02 mov eax,DWORD PTR [rdx+rax*1]
  2d: 48 63 d0 movsxd rdx,eax
  30: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 37 <main+0x37>
  37: 48 01 d0 add rax,rdx
  3a: ff e0 jmp rax
  {
  case 0:
  a = 0;
  3c: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
  break;
  43: eb 3d jmp 82 <main+0x82>
  case 1:
  a = 1;
  45: c7 45 f8 01 00 00 00 mov DWORD PTR [rbp-0x8],0x1
  break;
  4c: eb 34 jmp 82 <main+0x82>
  case 2:
  a = 2;
  4e: c7 45 f8 02 00 00 00 mov DWORD PTR [rbp-0x8],0x2
  break;
  55: eb 2b jmp 82 <main+0x82>
  case 3:
  a = 3;
  57: c7 45 f8 03 00 00 00 mov DWORD PTR [rbp-0x8],0x3
  break;
  5e: eb 22 jmp 82 <main+0x82>
  case 4:
  a = 4;
  60: c7 45 f8 04 00 00 00 mov DWORD PTR [rbp-0x8],0x4
  break;
  67: eb 19 jmp 82 <main+0x82>
  case 5:
  a = 5;
  69: c7 45 f8 05 00 00 00 mov DWORD PTR [rbp-0x8],0x5
  break;
  70: eb 10 jmp 82 <main+0x82>
  case 6:
  a = 6;
  72: c7 45 f8 06 00 00 00 mov DWORD PTR [rbp-0x8],0x6
  break;
  79: eb 07 jmp 82 <main+0x82>
  default:
  a = 0;
  7b: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
  }
  return 0;
  82: b8 00 00 00 00 mov eax,0x0
  }
  87: 5d pop rbp
  88: c3 ret
  paradise@PradiseXPS:~$
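
In the listing above the compiler does not compare `a` against each case in turn. It does one range check (`cmp ... 0x6` then `ja`), loads an entry from a table indexed by `a`, and jumps through it with `jmp rax`. The sketch below mimics that shape in portable C with an array of function pointers. The names `dispatch`, `Handler`, and `set0` through `set6` are hypothetical, and real compilers store code offsets in the table rather than function pointers.

```c
typedef int (*Handler)(void);

/* One handler per case label; each returns the value the case assigns to a. */
static int set0(void) { return 0; }
static int set1(void) { return 1; }
static int set2(void) { return 2; }
static int set3(void) { return 3; }
static int set4(void) { return 4; }
static int set5(void) { return 5; }
static int set6(void) { return 6; }
static int set_default(void) { return 0; }

int dispatch(int a)
{
    static const Handler table[] = { set0, set1, set2, set3, set4, set5, set6 };

    /* Like cmp + ja: a single unsigned comparison sends both negative
       values and values above 6 to the default branch. */
    if ((unsigned)a > 6u) {
        return set_default();
    }
    return table[a]();  /* like the indexed table load followed by jmp rax */
}
```

The cost of the dispatch stays the same no matter how many cases there are, which is why compilers tend to prefer a table when the case values are dense.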

Supplementary material

  paradise@PradiseXPS:~$ objdump -H
  Usage: objdump <option(s)> <file(s)>
  Display information from object <file(s)>.
  At least one of the following switches must be given:
  -a, --archive-headers Display archive header information
  -f, --file-headers Display the contents of the overall file header
  -p, --private-headers Display object format specific file header contents
  -P, --private=OPT,OPT... Display object format specific contents
  -h, --[section-]headers Display the contents of the section headers
  -x, --all-headers Display the contents of all headers
  -d, --disassemble Display assembler contents of executable sections
  -D, --disassemble-all Display assembler contents of all sections
  -S, --source Intermix source code with disassembly
  -s, --full-contents Display the full contents of all sections requested
  -g, --debugging Display debug information in object file
  -e, --debugging-tags Display debug information using ctags style
  -G, --stabs Display (in raw form) any STABS info in the file
  -W[lLiaprmfFsoRtUuTgAckK] or
  --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
  =frames-interp,=str,=loc,=Ranges,=pubtypes,
  =gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
  =addr,=cu_index,=links,=follow-links]
  Display DWARF info in the file
  -t, --syms Display the contents of the symbol table(s)
  -T, --dynamic-syms Display the contents of the dynamic symbol table
  -r, --reloc Display the relocation entries in the file
  -R, --dynamic-reloc Display the dynamic relocation entries in the file
  @<file> Read options from <file>
  -v, --version Display this program's version number
  -i, --info List object formats and architectures supported
  -H, --help Display this information
  The following switches are optional:
  -b, --target=BFDNAME Specify the target object format as BFDNAME
  -m, --architecture=MACHINE Specify the target architecture as MACHINE
  -j, --section=NAME Only display information for section NAME
  -M, --disassembler-options=OPT Pass text OPT on to the disassembler
  -EB --endian=big Assume big endian format when disassembling
  -EL --endian=little Assume little endian format when disassembling
  --file-start-context Include context from start of file (with -S)
  -I, --include=DIR Add DIR to search list for source files
  -l, --line-numbers Include line numbers and filenames in output
  -F, --file-offsets Include file offsets when displaying information
  -C, --demangle[=STYLE] Decode mangled/processed symbol names
  The STYLE, if specified, can be `auto', `gnu',
  `lucid', `arm', `hp', `edg', `gnu-v3', `java'
  or `gnat'
  -w, --wide Format output for more than 80 columns
  -z, --disassemble-zeroes Do not skip blocks of zeroes when disassembling
  --start-address=ADDR Only process data whose address is >= ADDR
  --stop-address=ADDR Only process data whose address is <= ADDR
  --prefix-addresses Print complete address alongside disassembly
  --[no-]show-raw-insn Display hex alongside symbolic disassembly
  --insn-width=WIDTH Display WIDTH bytes on a single line for -d
  --adjust-vma=OFFSET Add OFFSET to all displayed section addresses
  --special-syms Include special symbols in symbol dumps
  --inlines Print all inlines for source line (with -l)
  --prefix=PREFIX Add PREFIX to absolute paths for -S
  --prefix-strip=LEVEL Strip initial directory names for -S
  --dwarf-depth=N Do not display DIEs at depth N or greater
  --dwarf-start=N Display DIEs starting with N, at the same depth
  or deeper
  --dwarf-check Make additional dwarf internal consistency checks.
  objdump: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 a.out-i386-linux pei-i386 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-x86-64 pe-bigobj-x86-64 pe-i386 plugin srec symbolsrec verilog tekhex binary ihex
  objdump: supported architectures: i386 i386:x86-64 i386:x64-32 i8086 i386:intel i386:x86-64:intel i386:x64-32:intel i386:nacl i386:x86-64:nacl i386:x64-32:nacl iamcu iamcu:intel l1om l1om:intel k1om k1om:intel plugin
  The following i386/x86-64 specific disassembler options are supported for use
  with the -M switch (multiple options should be separated by commas):
  x86-64 Disassemble in 64bit mode
  i386 Disassemble in 32bit mode
  i8086 Disassemble in 16bit mode
  att Display instruction in AT&T syntax
  intel Display instruction in Intel syntax
  att-mnemonic
  Display instruction in AT&T mnemonic
  intel-mnemonic
  Display instruction in Intel mnemonic
  addr64 Assume 64bit address size
  addr32 Assume 32bit address size
  addr16 Assume 16bit address size
  data32 Assume 32bit data size
  data16 Assume 16bit data size
  suffix Always display instruction suffix in AT&T syntax
  amd64 Display instruction in AMD64 ISA
  intel64 Display instruction in Intel64 ISA
  Report bugs to <http://www.sourceware.org/bugzilla/>.
  paradise@PradiseXPS:~$

07|Function Calls

Stack overflow

Example program for function calls

  paradise@PradiseXPS:~$ cat example.c
  // function_example.c
  #include <stdio.h>
  int static add(int a,int b)
  {
      return a + b;
  }
  int main()
  {
      int x = 5;
      int y = 10;
      int u = add(x,y);
  }
  paradise@PradiseXPS:~$ objdump -d -M intel -S example.o
  example.o: file format elf64-x86-64
  Disassembly of section .text:
  0000000000000000 <add>:
  // function_example.c
  #include <stdio.h>
  int static add(int a,int b)
  {
  0: 55 push rbp
  1: 48 89 e5 mov rbp,rsp
  4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
  7: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
  return a + b;
  a: 8b 55 fc mov edx,DWORD PTR [rbp-0x4]
  d: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
  10: 01 d0 add eax,edx
  }
  12: 5d pop rbp
  13: c3 ret
  0000000000000014 <main>:
  int main()
  {
  14: 55 push rbp
  15: 48 89 e5 mov rbp,rsp
  18: 48 83 ec 10 sub rsp,0x10
  int x = 5;
  1c: c7 45 f4 05 00 00 00 mov DWORD PTR [rbp-0xc],0x5
  int y = 10;
  23: c7 45 f8 0a 00 00 00 mov DWORD PTR [rbp-0x8],0xa
  int u = add(x,y);
  2a: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
  2d: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
  30: 89 d6 mov esi,edx
  32: 89 c7 mov edi,eax
  34: e8 c7 ff ff ff call 0 <add>
  39: 89 45 fc mov DWORD PTR [rbp-0x4],eax
  3c: b8 00 00 00 00 mov eax,0x0
  }
  41: c9 leave
  42: c3 ret
  paradise@PradiseXPS:~$

As with a jump, the operand that follows the call instruction is the address of the code to jump to.

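The add function

The function begins by executing a push instruction and a mov instruction.
It ends by executing a pop instruction and a ret instruction.
Together, these four instructions are the stack push and pop operations of the call.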

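A function call differs from a conditional jump in that, after jumping to the target address, execution must later come back and continue from where the call was made.
To remember where to return, a region of memory is set aside and used as a stack, a LIFO (last in, first out) data structure.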

Stack frame: all of the stack memory occupied by a function A is function A's stack frame.

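push rbp: pushes the base address of the caller's stack frame (here, main's) onto the top of the stack.
mov rbp,rsp: copies the value of the stack pointer rsp into rbp; rsp always points to the top of the stack.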

rbp: base pointer (frame pointer) register; holds the base address of the current stack frame.
rsp: stack pointer register; holds the current top of the stack, which grows downward toward lower addresses.
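
To make the roles of rbp and rsp more concrete, here is a small model of a downward-growing stack with the same prologue and epilogue steps. It is a sketch under stated assumptions: `Machine`, `machine_init`, `push`, `pop`, `enter_frame`, and `leave_frame` are hypothetical names, rsp and rbp are array indexes rather than real addresses, and the 16-slot size is arbitrary.

```c
#include <stddef.h>

#define STACK_SLOTS 16

typedef struct {
    long slots[STACK_SLOTS];
    size_t rsp;  /* index of the current top of the stack */
    size_t rbp;  /* index of the current frame's base */
} Machine;

void machine_init(Machine *m)
{
    m->rsp = STACK_SLOTS;  /* empty stack: rsp starts just past the highest slot */
    m->rbp = STACK_SLOTS;
}

/* Returns 1 on success, 0 on stack overflow (no free slot left). */
int push(Machine *m, long value)
{
    if (m->rsp == 0) return 0;
    m->slots[--m->rsp] = value;  /* the stack grows toward lower indexes */
    return 1;
}

/* The caller must not pop an empty stack. */
long pop(Machine *m)
{
    return m->slots[m->rsp++];
}

/* Function prologue: push rbp ; mov rbp,rsp */
int enter_frame(Machine *m)
{
    if (!push(m, (long)m->rbp)) return 0;
    m->rbp = m->rsp;
    return 1;
}

/* Function epilogue: mov rsp,rbp ; pop rbp (together, what leave does) */
void leave_frame(Machine *m)
{
    m->rsp = m->rbp;
    m->rbp = (size_t)pop(m);
}
```

A call would first push the return address (0x39 in the listing for main), then run `enter_frame`; on the way out, `leave_frame` restores the caller's rbp and a final `pop` recovers the return address, which is what ret does. The failing `push` is the model's version of a stack overflow: calls nested so deeply that the stack region runs out.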

Summary

This part left me fairly lost and confused.
I need to come back when I have time and review assembly.

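08|ELF and Static Linking

The objdump command

Executable program
Object file
Linker

Running gcc with the -o option (and without -c) compiles and links in one step, producing an executable file with the given name.

"C code - assembly code - machine code"

  1. Compile, assemble, link
  2. The loader loads the executable file into memory; the CPU then reads instructions and data from memory and starts executing the program.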


The ELF format and linking: understanding the linking process

ELF: Executable and Linkable Format
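
Every ELF file starts with the four magic bytes 0x7f, 'E', 'L', 'F', and the fifth byte says whether it is a 32-bit or a 64-bit file. The function below is a minimal sketch of that first check a loader might make; the name `elf_class_bits` is hypothetical, and a real loader goes on to parse the rest of the header.

```c
#include <stddef.h>

/* Returns 32 or 64 for a buffer that starts with a valid ELF identification
   of that class, and 0 for anything else. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    if (len < 5) return 0;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') {
        return 0;                /* not an ELF file */
    }
    if (buf[4] == 1) return 32;  /* ELFCLASS32 */
    if (buf[4] == 2) return 64;  /* ELFCLASS64 */
    return 0;
}
```

For an object file reported by objdump as elf64-x86-64, the first five bytes would make this function return 64. A Windows PE file starts with "MZ" instead, so it returns 0.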

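If we had a loader that could parse the PE format, we could run Windows programs on Linux; this is the idea behind Wine.
WSL (Windows Subsystem for Linux) can parse and load ELF files.

Questions to think about:

readelf reads a program's symbol table
objdump reads the relocation table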

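09|Program Loading

Loading a program into memory:

  1. The memory an executable occupies once loaded should be contiguous
  2. We need to load many programs at the same time, and a program must not be allowed to dictate where in memory it is loaded

Memory segmentation

Memory fragmentation
Memory swapping

Memory paging

Question to think about: how is a Java program loaded into memory?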
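
The notes only name paging, so the following is a sketch of the basic idea rather than of any particular system: a virtual address is split into a page number and an offset, and a table maps each page to a physical frame. The 4 KiB page size, the flat one-level table, and the names `page_number`, `page_offset`, and `translate` are all assumptions made for the example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumes 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* The high bits of a virtual address select the page. */
uint32_t page_number(uint32_t vaddr) { return vaddr >> PAGE_SHIFT; }

/* The low 12 bits are the offset inside that page. */
uint32_t page_offset(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1u); }

/* Translates through a flat table mapping virtual page -> physical frame.
   The caller must ensure the page number is a valid index into page_table. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t frame = page_table[page_number(vaddr)];
    return (frame << PAGE_SHIFT) | page_offset(vaddr);
}
```

With a table that maps page 1 to frame 9, the virtual address 0x1234 becomes the physical address 0x9234. Because pages are small and fixed in size, neighbouring pages of a program do not have to be contiguous in physical memory.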

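10|Dynamic Linking

Dynamic linking
Static linking
Shared libraries

DLL: dynamic-link library
so: shared object

Relative address

PLT: Procedure Linkage Table
GOT: Global Offset Table
Although the physical memory holding a shared library's code is shared, each application that dynamically links against the library has its own copy of the data section.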
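
The indirection behind the PLT and GOT can be imitated with a table of function pointers that is filled in at load time instead of at link time. This is only an analogy under stated assumptions: `lib_add`, `got`, `resolve_symbols`, and `call_add_via_got` are hypothetical names, and the real mechanism works on addresses patched by the dynamic linker, often lazily on first call.

```c
/* Stand-in for a function that lives in a shared library. */
static int lib_add(int a, int b) { return a + b; }

typedef int (*BinaryFn)(int, int);

/* A one-slot "GOT": per-process data that records where the function is. */
static BinaryFn got[1];

/* Plays the dynamic loader's role: store the function's actual address. */
void resolve_symbols(void) { got[0] = lib_add; }

/* Plays the PLT stub's role: call through the table, not a fixed address. */
int call_add_via_got(int a, int b) { return got[0](a, b); }
```

Because the calling code only refers to the table slot, the library's code can be loaded at a different address in each process and still be shared, while the table itself belongs to the per-process data.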

Question to think about

Saves memory space

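11|Binary Encoding

Integers: converting between binary and hexadecimal
Negative numbers

Sign-magnitude representation

Two's complement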
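
The difference between the two representations of negative numbers is easiest to see by decoding the same 8-bit pattern both ways. The two function names below are hypothetical, and 8 bits is chosen only to keep the numbers small.

```c
#include <stdint.h>

/* Sign-magnitude: the top bit is the sign, the low 7 bits are the magnitude. */
int decode_sign_magnitude(uint8_t bits)
{
    int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}

/* Two's complement: the top bit carries a weight of -128 instead of a sign. */
int decode_twos_complement(uint8_t bits)
{
    return (bits & 0x7F) - ((bits & 0x80) ? 128 : 0);
}
```

The pattern 0x81 decodes to -1 in sign-magnitude but to -127 in two's complement, where -1 is 0xFF. Sign-magnitude also has two zeros (0x00 and 0x80), while in two's complement 0x80 is -128, so ordinary binary addition works without special cases.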

ASCII: uses 128 of the values an 8-bit binary number can hold and maps them to 128 different characters.

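When storing data, use binary serialization. Whether the values are integers or floating-point numbers, binary serialization usually takes noticeably less space than storing them as text.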
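
A quick way to see the space difference is to compare how many bytes one integer needs as decimal text and as a raw 32-bit value. The helper names `text_size` and `binary_size` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes needed to store `value` as decimal text, without a terminator. */
size_t text_size(int32_t value)
{
    char buf[16];  /* large enough for any 32-bit integer, sign included */
    int n = snprintf(buf, sizeof buf, "%d", (int)value);
    return (size_t)n;
}

/* Bytes needed to store the same value as a raw binary int32. */
size_t binary_size(void)
{
    return sizeof(int32_t);
}
```

The largest 32-bit integer, 2147483647, takes 10 bytes as text and 4 bytes in binary. Small values such as 7 are shorter as text, but text also needs a separator between values, so the saving applies to data sets as a whole rather than to every single number.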

Character set (charset)
Character encoding

UTF

Wielding a pair of 锟斤拷 in hand, crying out 烫烫烫;
treading on a thousand 屯屯屯, laughing as all things turn to 锘锘锘.
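
The garbled characters in this rhyme each come from specific bytes being decoded with the wrong encoding. As one worked case, the sketch below writes out the bytes behind 锟斤拷: the Unicode replacement character U+FFFD is EF BF BD in UTF-8, and two of them in a row, read as GBK two-byte pairs (EF BF, BD EF, BF BD), display as 锟, 斤, 拷. The function name `two_replacement_chars` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* U+FFFD (the replacement character) encoded as UTF-8. */
static const unsigned char REPLACEMENT_UTF8[3] = { 0xEF, 0xBF, 0xBD };

/* Writes two replacement characters back to back into `out`, which must
   have room for 6 bytes. Returns the number of bytes written. */
size_t two_replacement_chars(unsigned char *out)
{
    memcpy(out, REPLACEMENT_UTF8, 3);
    memcpy(out + 3, REPLACEMENT_UTF8, 3);
    return 6;
}
```

The others follow the same pattern: 烫 is the byte pair CC CC and 屯 is CD CD, fill values that Microsoft's debug runtime uses for uninitialized stack and heap memory, and 锘 is what the first two bytes of the UTF-8 byte order mark (EF BB BF) look like when read as GBK.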

12|Understanding Circuits

The history of information transmission
Relay
Inverter

13|Adders

14|Multipliers

15|Floating-Point and Fixed-Point Numbers