GCC -O Optimization Options: Problems GCC -O2 Can Cause in Multithreaded Code (Linux/Unix System Programming)

The code is as follows:

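#include <stdio.h>
#include "mythreads.h"   // helper header (not shown); Pthread_create is presumably a pthread_create wrapper
int done = 0;            // shared flag: written by worker, polled by main; plain int, no synchronization
void do_useless(int num){   // not used in this example
    int sum = 0;
    for(int i=0; i< num; i++){
        for (int j = 0; j < num; ++j) {
            for (int k = 0; k < num; ++k) {
                sum += 1;
            }
        }
    }
}
void* worker(void* arg) {
    printf("this should print first\n");
    done = 1;            // tell main that the worker has run
    printf("end \n");
    return NULL;
}
int main(int argc, char *argv[]) {
    pthread_t p;
    Pthread_create(&p, NULL, worker, NULL);
    while (done == 0)    // busy-wait until worker sets done
    ;
    printf("this should print last\n");
    return 0;
}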

The do_useless function is not used here; you can ignore it.

Test environment:

gcc (Debian 7.3.0-19) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

With the default optimization level (-O0), part of the disassembly of main is:
 d2c:    48 8d 15 ac ff ff ff     lea    -0x54(%rip),%rdx        # cdf <worker>
 d33:    be 00 00 00 00           mov    $0x0,%esi
 d38:    48 89 c7                 mov    %rax,%rdi
 d3b:    e8 a2 fe ff ff           callq  be2 <Pthread_create>
 d40:    90                       nop
 d41:    8b 05 3d 13 20 00        mov    0x20133d(%rip),%eax        # 202084 <done>
 d47:    85 c0                    test   %eax,%eax
 d49:    74 f6                    je     d41 <main+0x2d>
 d4b:    48 8d 3d cf 00 00 00     lea    0xcf(%rip),%rdi        # e21 <_IO_stdin_used+0x41>
 d52:    e8 b9 fa ff ff           callq  810 <puts@plt>

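Nothing unusual here. The instruction at d41, which objdump annotates with 202084 <done>, loads the value of done from memory into eax. test then checks whether eax is 0, and if it is, je jumps back to d41 to load done again. The logic is clear and correct: done is re-read from memory on every iteration. With -O2, however, things look different.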

After -O2 optimization, part of the disassembly of main looks like this (it loops forever and never exits):
 90d:    8b 05 71 17 20 00        mov    0x201771(%rip),%eax        # 202084 <done>
 913:    85 c0                    test   %eax,%eax
 915:    75 09                    jne    920 <main+0x30>
 917:    eb fe                    jmp    917 <main+0x27>
 919:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
 920:    48 8d 3d 5e 04 00 00     lea    0x45e(%rip),%rdi        # d85 <_IO_stdin_used+0x35>
 927:    e8 e4 fe ff ff           callq  810 <puts@plt>

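After -O2 the logic is less obvious. The instruction at 90d loads done into eax, and the next two check whether eax is 0. If it is not, jne jumps straight to the final printf (at 920; GCC has turned this printf into a call to puts). If it is 0, execution reaches 917, which is jmp 917: a jump to itself that never reloads done, so the program spins forever. Intuitively such a simple program should not hang, yet after optimization it does. The optimization itself is legitimate, though: done is a plain int accessed without any synchronization, so the compiler reasons as if the code were single-threaded. Nothing inside the loop modifies done, so loading it once is enough. In C11 terms, one thread writing done while another reads it without synchronization is a data race, which is undefined behavior, so the compiler is not required to preserve the repeated reads. The result resembles the visibility problem in Java: the worker thread has written done, but the main thread never sees that write.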