1) Leveling up with gdb
2) Stack mechanics
3) Pass params by “channeling”
4) Recursion
- ulimit -a 可以看到stack size栈大小
- Tail call和 Tail recursion
5) Debugging stack misuses
- gdb watchpoint
- gets, the sequel

1) Leveling up with gdb

Examining stack frames

Printing registers, auto-display

undisplay可以取消显示

Pro-tip: breakpoint commands

2) Stack mechanics

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int leaf(int arr[], int unused)
{
    int val = arr[0];
    return val;
}
int locals(int a, int b)
{
    // single local variables likely to end up registers
    // array will be placed on stack
    int arr[] = {1, 2, 3};
    int x = a * 14;
    int y = b - x;
    int z = y * b;
    int q = 10 + z + a;
    int p = q + b;
    return leaf(arr, p);
}
int calls(const char *param)
{
    return strspn(param, "aeiou") +  atoi(param);
}
int main(int argc, char *argv[])
{
    printf("This program demonstrates various behaviors of the runtime stack.\n");
    int val = calls("Stanford");
    val += locals(7, 1);
    printf("All done, goodbye!\n");
    return val;
}

main函数

calls函数

%rbp保存栈地址。
此处存的代码段字符串：

main中调用了calls函数，main中下一行指令的地址是0x0000555555555225，进入calls函数后，打印当前的栈中的数据：

可以看到就是0x0000555555555225，也就是main中下一行指令的地址。而main在调用calls函数之前，push了%rbx，%rbx中的内容是：
，正好是栈的上一个8byte中存的内容。

leaf函数

传入参数为7和1，，arr存入了栈stack中，很自然，因为不是malloc之类的函数在堆heap中分配出来的，所以在栈中，
ni运行到<+32>处，查看对应栈地址中的数据：

就是对应的arr的1、2、3。其他在寄存器中的局部变量简单一些。

calls函数中先讲%rbp和%rbx寄存器中的值压入栈中，%rbp是栈底，%rbp和%rbx也是callee-saved寄存器，也就是需要被调者（子函数）事先保存的寄存器，所以如calls函数之后，会先将%rbp和%rbx压入栈中。执行<+25>处的指令前，栈底地址为0，执行此处地址后：

调用了atoi函数，并讲%rax的值mov到%rbp中，仍然时0，保存了strspn的返回值。也就是说atoi函数返回值是0。后面讲0xa放到了%edx中，也就是作为strtol的第三个参数10。esi为第二个参数，是NULL，也就是0，%rbx放到了%rdi为第一个参数，里面的值是0x55555555600a，也就是字符串”Stanford”的地址。
继续运行到调用strtol结束：

%rbp中保存了strspn的返回值，%rax中保存了刚刚结束的strtol函数的返回值，二者加起来，就是calls函数的返回值，保存到rax中，然后后面三行汇编指令就是恢复栈的状态，然后retq返回。结束calls函数。

参考深入理解计算机系统书的3.7.5 Local Storage in Registers。
在编译器的设计中有个概念叫做“被调用者保存”和“调用者保存”，可以近似的按子函数保存和父函数保存对应来理解，这一概念的出现完全是由于寄存器资源个数有限造成的（为了寄存器复用）。当父函数在调用子函数时，由于子函数可能访问到父函数用于保存数值的寄存器，为了互不干扰和造成覆盖，编译器就制定了相应的规则。%rbx, %rbp, 和 %r12–%r15被划分为“被调用者保存”寄存器，同样的，这些寄存器上有值，在使用前需要被调用者（子函数）自己想办法帮调用者（父函数）进行备份，具体方法就是子函数在覆盖他们之前，先进行入栈备份，等子函数返回时，再出栈还原父函数运行时这些寄存器上的旧值；其他所有的寄存器，除了%rsp，都被划分为“调用者保存”寄存器，顾名思义，这些寄存器上面存储的值，需要调用者（父函数）自己想办法先备份好，否则过会子函数直接使用这些寄存器时将无情的覆盖。如何备份？当然是事先压入栈中，等子函数调用完，再通过出栈恢复这些寄存器原本在父函数运行时的旧值；
简而言之，寄存器要被无数父函数子函数重复使用，如何合理使用寄存器？那只能是通过栈来进行备份，这样才能使得寄存器在特定的运行时段存储相应合理的值。区别仅仅是具体某个寄存器的备份工作是由父函数来操作还是由子函数来操作，这就由编译器来进行约定俗成。
可以看到locals函数有很多局部变量，main函数中只有一个局部变量val，保存在%rbx寄存器中，其他寄存器中的信息不需要保存，而locals函数中没有用到rbx寄存器，而是直接使用了调用者保存寄存器，也就是被调用者拥有的寄存器。

这是atoi函数的原型，是一个inline函数（内联函数）。

如图中看到的，代码中没有callq atoi的相关代码，只callq了atoi函数中的strtol函数，也就是说，对于inline函数，编译器会直接将其源码展开在调用的地方，而不是去call调用它。

3) Pass params by “channeling”

Review the code in channel.c . The init_array and sum_array function each declare a stack array. The init function sets the array values, the sum function adds the array values. Neither init nor sum takes any arguments. There is a warning from the compiler about sum_array , what is it?

The program calls init_array() then sum_array() . Despite no explicit pass/return between the two calls, the array seems to be magically passed as intended. How are these functions communicating?

main的汇编代码，运行到init_array之前。

可以看到init_array和sum_array是连着调用的，没有进行任何寄存器准备。
进入init_array：

关于cltq：https://www.coder.work/article/6786387。cltq等效于movslq %eax, %rax。
根据上图，数组是保存在栈中，执行完for循环后，栈的地址及数据如下：

结束函数时，rsp加了0x38，也就是从0x7fffffffddc0变成了0x7fffffffddf8：

retq之后，rsp再次加8，变成了0x7fffffffde00。

进入sum_array：

继续输入si运行

注意在main中进入sum_array时，call指令将下一条指令的地址入栈了，栈顶寄存器rsp自减 8 bytes，从0x7fffffffde00变成了0x7fffffffddf8，
可以看到0x7fffffffddf8处保存了main中call sum_array的下一条指令的地址0x00005555555552ba，然后是在0x7fffffffddf0处push rbx指令产生的0x0000555555555350。
继续执行

rsp + 0x30之后，变成了0x7fffffffddc0，也就是init_array函数存放数组的地方，至此，这个问题就非常清晰了，这两个函数的stack里面的sums数组都是同一个地址，所以能成功运行。

The program’s second attempt to channel the array from init to sum fails. What is different about this case?

第二次运行sum_array函数并执行完for循环之后：

0x7fffffffddc0处仍然是数组地址。

返回之后rsp为0x7fffffffde00，

调用puts函数之前数组地址中的数据仍然没有被破坏。

执行puts之后，数组内容已经被破坏了，而进入sum_array之前的rsp内容仍然是0x7fffffffde00，

第二次sum_array中的nums数组仍然指向0x7fffffffddc0，但是里面的内容已经在前面被破坏，所以输出不一样了。

4) Recursion

unsigned int是32位，算13的阶乘的时候开始溢出了。

Let’s push things a bit further. Look at the code and trace the behavior of the call
factorial(-1) . Do you and your partner agree on the expected result? Run the program
on the argument -1 and see its result. Didn’t I once tell you that a segmentation fault can
only come from memory mismanagement and use of bad numbers isn’t usually
catastrophic? How then is an invalid number causing a memory problem? Run fact
under gdb and use backtrace to find out where the program was executing at the time
of the crash. Does this shed some light on what has gone wrong?
Get into gdb and figure out how big each factorial stack frame is. Here are two
different strategies; try both!
Disassemble factorial and scan its instructions. You are looking for those
instructions that change the %rsp . Each pushq and callq instruction copies an 8-
byte value to the stack and decrements %rsp . The %rsp can also be explicitly
moved to open up space for local variables or scratch results. (Factorial has no such
manual adjustment of %rsp . The only data stored in its frame are saved registers
placed there by push/call instructions.) Find those instructions in factorial and
sum the total number of bytes stored in its stack frame.
Set a breakpoint on factorial , let a few calls through, and then use the gdb
commands info frame 1 and info frame 2 to examine the top two stack
frames. Look for the frame addresses labeled Stack frame at and subtract the
two to compute the size of a single frame.
These two approaches should give you the same answer. Do they?

ulimit -a 可以看到stack size栈大小

Tail call和 Tail recursion

-O2级优化之后

5) Debugging stack misuses

可以看到i在数组的第5个位置上，当i为4的时候，数组的前4个已经初始化为0了，继续执行

i为5，判断条件是大于5才退出循环，所以继续跳转到前面，继续ni执行，下面执行到了关键的地方。

这时如果执行这条指令，那么i的这个位置上的5会被赋值为0，那么后面的jle指令就会让程序继续跳转到的地方，所以是无限循环。

造成这个的原因是，i被保存在了stack中，i被保存的地址是在arr数组内部，而且arr的元素都被赋值为了一个小于N_ELEMS的数字。
如果i是从0开始，那么就不会重叠，就不是无限循环。
这种错误叫做 “stomping” or “clobbering” memory