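Overview

  • panic changes a program's control flow. When a function calls panic, it immediately stops executing the rest of its code and then runs the deferred function calls (defer) registered in the current Goroutine, walking up through its callers;

  • recover stops the crash caused by a panic. It only takes effect when called from a deferred function; calling it anywhere else does nothing;

Each running goroutine keeps a head pointer to its defer linked list and also a head pointer to a panic linked list. The panic list links _panic structs one after another, just as the defer list links its own records. When a new panic occurs, a new _panic struct is inserted at the head of the list, so the panic at the head is the one currently executing.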

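Observed behavior

  • panic only triggers the deferred calls of the current Goroutine;
  • recover only takes effect when called inside a deferred function;
  • panic may be called again, several times, inside deferred functions (nested panics);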
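The third point can be seen in a small sketch. The function name nestedPanics and the labels in the trace are made up for illustration; the example assumes only the standard behavior that recover returns the value of the most recent panic.

```go
package main

import "fmt"

// nestedPanics panics, then panics again while the first panic is unwinding.
// It returns the value finally seen by recover and the order in which the
// deferred functions ran. (Hypothetical helper, for illustration only.)
func nestedPanics() (recovered interface{}, order []string) {
	defer func() {
		order = append(order, "outer")
		recovered = recover() // sees the most recent panic
	}()
	defer func() {
		order = append(order, "middle")
		panic("panic again") // a new panic raised inside a deferred call
	}()
	defer func() {
		order = append(order, "inner")
	}()
	panic("panic once")
}

func main() {
	recovered, order := nestedPanics()
	fmt.Println(order, recovered)
}
```

Running it should print [inner middle outer] panic again: the deferred functions run last-in, first-out, and the second panic replaces the first as the one that recover observes.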

No effect across goroutines

    package main

    import "time"

    func main() {
        defer println("in main") // never printed: the panic below kills the process first
        go func() {
            defer println("in goroutine")
            panic("")
        }()

        time.Sleep(1 * time.Second)
    }

    $ go run main.go
    in goroutine
    panic:
    ...

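The runtime.deferproc function behind the defer keyword associates each deferred function with the Goroutine that executed the defer statement. So when the program crashes, it is only natural that just the deferred calls of the panicking Goroutine are run; the "in main" defer belongs to the main Goroutine and is never executed.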
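Because deferred calls are tied to their own goroutine, a panic in a spawned goroutine has to be recovered inside that goroutine. The following is a minimal sketch of that idea; runGuarded is a hypothetical helper name, not something from the runtime or the standard library.

```go
package main

import "fmt"

// runGuarded (hypothetical helper) runs task in a new goroutine and recovers
// inside that same goroutine, reporting any panic to the caller as an error.
func runGuarded(task func()) error {
	errCh := make(chan error, 1)
	go func() {
		defer func() {
			// This deferred function belongs to the new goroutine,
			// so it can see a panic raised by task.
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("recovered: %v", r)
				return
			}
			errCh <- nil
		}()
		task()
	}()
	return <-errCh
}

func main() {
	fmt.Println(runGuarded(func() { panic("boom") }))
	fmt.Println(runGuarded(func() {}))
}
```

The first call should print recovered: boom and the second <nil>. A recover placed in main instead would never see the panic, which is the behavior shown in the example above.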

Crash recovery that does not work

    package main

    import "fmt"

    func main() {
        defer fmt.Println("in main")
        // recover runs here in main's normal flow, before any panic exists
        // and outside a deferred function, so it returns nil.
        if err := recover(); err != nil {
            fmt.Println(err)
        }

        panic("unknown err")
    }

    $ go run main.go
    in main
    panic: unknown err

    goroutine 1 [running]:
    main.main()
    ...
    exit status 2
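For contrast, here is a sketch of where recover does and does not work. The function names direct, indirect and helperRecover are invented for this example. It relies on the rule from the Go specification that recover returns nil when it is not called directly by a deferred function.

```go
package main

import "fmt"

// helperRecover calls recover, but it is only called BY a deferred function;
// it is not the deferred function itself.
func helperRecover() interface{} {
	return recover()
}

// direct recovers in the deferred function itself, so the panic stops here.
func direct() (r interface{}) {
	defer func() { r = recover() }()
	panic("unknown err")
}

// indirect first tries helperRecover, which sits one call level too deep and
// gets nil; the deferred function registered first then recovers directly.
func indirect() (viaHelper, viaDirect interface{}) {
	defer func() { viaDirect = recover() }()
	defer func() { viaHelper = helperRecover() }()
	panic("unknown err")
}

func main() {
	fmt.Println(direct())
	fmt.Println(indirect())
}
```

This should print unknown err followed by <nil> unknown err: only the recover called directly from a deferred function stops the panic.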

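In the Go runtime source, a panic is represented by the runtime._panic data structure (panic itself is a built-in function, not a keyword). Every call to panic creates one of the structures below to hold the relevant information: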

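    type _panic struct {
        argp      unsafe.Pointer
        arg       interface{}
        link      *_panic
        recovered bool
        aborted   bool

        pc        uintptr
        sp        unsafe.Pointer
        goexit    bool
    }

  1. argp points to the arguments of the deferred call being run during the panic;
  2. arg is the argument passed to panic;
  3. link points to the runtime._panic created by an earlier panic call;
  4. recovered records whether this runtime._panic has been recovered by recover;
  5. aborted records whether this panic was forcibly aborted;
  6. pc, sp and goexit are used when a panic interacts with runtime.Goexit, as the Goexit branch in the full gopanic listing further down shows.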

Program crash

How does panic terminate the program? The compiler translates a call to the built-in panic into runtime.gopanic, which runs through the following steps:

  1. Create a new runtime._panic and insert it at the head of the Goroutine's _panic list;
  2. In a loop, keep taking runtime._defer records from the current Goroutine's _defer list and call runtime.reflectcall to run each deferred function;
  3. Call runtime.fatalpanic to abort the whole program;

    func gopanic(e interface{}) {
        gp := getg()
        ...
        // Step 1: push a new _panic onto the head of the goroutine's list.
        var p _panic
        p.arg = e
        p.link = gp._panic
        gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

        // Step 2: run the deferred calls one by one.
        for {
            d := gp._defer
            if d == nil {
                break
            }

            d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

            reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))

            d._panic = nil
            d.fn = nil
            gp._defer = d.link

            freedefer(d)
            if p.recovered {
                ...
            }
        }

        // Step 3: no defer recovered the panic, so abort the program.
        fatalpanic(gp._panic)
        *(*int)(nil) = 0
    }

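The main flow inside panic is:

  • Get the g (the goroutine) of the current caller;
  • Walk and run the deferred functions registered on g;
  • If a deferred function calls recover and a panic is in progress, mark the panic as recovered;
  • While walking the defers, if the panic is found to be marked recovered, take that defer's sp and pc and store them in two fields of g, sigcode0 and sigcode1;
  • Call runtime.mcall to switch to m->g0 and jump to the recovery function, passing the g obtained earlier as its argument. runtime.mcall is implemented in src/runtime/asm_xxx.s in the Go source tree, where xxx is the platform, e.g. amd64.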
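The flow above can be mimicked with a toy model in ordinary Go. This is only a sketch: toyG, toyPanic and toyDefer are invented stand-ins for the runtime's g, _panic and _defer, recovery is modeled by setting a flag directly, and nothing here touches the real runtime.

```go
package main

import "fmt"

// toyPanic and toyDefer are simplified stand-ins for runtime._panic and
// runtime._defer; the real structures carry many more fields.
type toyPanic struct {
	arg       interface{}
	link      *toyPanic
	recovered bool
}

type toyDefer struct {
	fn   func(p *toyPanic)
	link *toyDefer
}

// toyG models only the two list heads kept on a goroutine.
type toyG struct {
	deferHead *toyDefer
	panicHead *toyPanic
}

// pushDefer inserts at the head, so the newest defer runs first.
func (g *toyG) pushDefer(fn func(p *toyPanic)) {
	g.deferHead = &toyDefer{fn: fn, link: g.deferHead}
}

// gopanic pushes a panic record, then pops and runs defers until one marks
// the panic recovered. It reports whether that happened; the real runtime
// would call fatalpanic instead of returning false.
func (g *toyG) gopanic(arg interface{}) bool {
	p := &toyPanic{arg: arg, link: g.panicHead}
	g.panicHead = p
	for g.deferHead != nil {
		d := g.deferHead
		g.deferHead = d.link
		d.fn(p)
		if p.recovered {
			g.panicHead = p.link
			return true
		}
	}
	return false
}

// run registers three defers; the second one plays the role of recover.
func run() (trace []string, recovered bool, remaining int) {
	g := &toyG{}
	g.pushDefer(func(p *toyPanic) { trace = append(trace, "first") })
	g.pushDefer(func(p *toyPanic) {
		trace = append(trace, "second")
		p.recovered = true
	})
	g.pushDefer(func(p *toyPanic) { trace = append(trace, "third") })
	recovered = g.gopanic("boom")
	for d := g.deferHead; d != nil; d = d.link {
		remaining++
	}
	return trace, recovered, remaining
}

func main() {
	fmt.Println(run())
}
```

In this model the loop stops as soon as a defer sets recovered, leaving one defer on the list. In the real runtime that remaining defer is later run by deferreturn once recovery has switched back to the function's return path.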

What causes a panic

1. Index out of range (array or slice)

    package main

    import "fmt"

    func f() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        var bar = []int{1}
        fmt.Println(bar[1]) // index 1 is out of range for a slice of length 1
    }

    func main() {
        f()
        fmt.Println("exit")
    }

2. Dereferencing an uninitialized or nil pointer

    package main

    import (
        "fmt"
    )

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        var b *int
        fmt.Println(*b) // b is nil, so the dereference panics
    }

    func main() {
        foo()
        fmt.Println("exit")
    }

3. Sending on a channel that is already closed

    package main

    import "fmt"

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        var bar = make(chan int, 1)
        close(bar)
        bar <- 1 // send on closed channel
    }

    func main() {
        foo()
        fmt.Println("exit")
    }

Concurrent reads and writes on the same map (a fatal error that recover cannot catch)

    package main

    import "fmt"

    func foo() {
        // Neither recover below ever fires: the runtime reports
        // "fatal error: concurrent map writes" as a fatal error,
        // not as a panic, and the whole process exits.
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        var bar = make(map[int]int)
        go func() {
            defer func() {
                if err := recover(); err != nil {
                    fmt.Println(err)
                }
            }()
            for {
                bar[1] = 1
            }
        }()
        for {
            bar[1] = 1
        }
    }

    func main() {
        foo()
        fmt.Println("exit")
    }

Wherever a map is read and written concurrently, guard it with a lock such as sync.Mutex or sync.RWMutex.
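One way to add that lock is sketched below with sync.RWMutex. The counter type and the hammer function are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical map wrapper guarded by a sync.RWMutex.
type counter struct {
	mu sync.RWMutex
	m  map[int]int
}

func newCounter() *counter {
	return &counter{m: make(map[int]int)}
}

// add takes the write lock because it mutates the map.
func (c *counter) add(key, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] += delta
}

// get only needs the read lock.
func (c *counter) get(key int) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

// hammer starts several goroutines that each add 1 to the same key n times
// and returns the final value once all of them have finished.
func hammer(workers, n int) int {
	c := newCounter()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				c.add(1, 1)
			}
		}()
	}
	wg.Wait()
	return c.get(1)
}

func main() {
	fmt.Println(hammer(8, 1000))
}
```

With the lock in place the writes are serialized, so the program should print 8000 instead of crashing with a concurrent map writes error.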

4. Type assertion

    package main

    import (
        "fmt"
    )

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        var i interface{} = "abc"
        _ = i.([]string) // i holds a string, so the single-value assertion panics
    }

    func main() {
        foo()
        fmt.Println("exit")
    }
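The recoverable cases above all surface as values that implement the runtime.Error interface, while a value passed to panic by user code usually does not. Below is a sketch that converts panics into errors and tells the two apart; catch, isRuntimeError, sink and the trigger functions are names invented for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the results of the faulting expressions in use.
var sink interface{}

func indexOutOfRange() { s := []int{1}; i := 1; sink = s[i] }
func nilDereference()  { var p *int; sink = *p }
func sendOnClosed()    { ch := make(chan int, 1); close(ch); ch <- 1 }
func badAssertion()    { var i interface{} = "abc"; sink = i.([]string) }
func explicitPanic()   { panic("custom") }

// catch runs f and converts a panic into an error value.
func catch(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if re, ok := r.(runtime.Error); ok {
				err = re // raised by the runtime itself
				return
			}
			err = fmt.Errorf("panic: %v", r) // raised by an explicit panic call
		}
	}()
	f()
	return nil
}

// isRuntimeError reports whether err came from the runtime.
func isRuntimeError(err error) bool {
	_, ok := err.(runtime.Error)
	return ok
}

func main() {
	for _, f := range []func(){indexOutOfRange, nilDereference, sendOnClosed, badAssertion, explicitPanic} {
		err := catch(f)
		fmt.Println(isRuntimeError(err), err)
	}
}
```

The first four lines of output should start with true and the last with false. The exact error messages depend on the Go version, so they are not reproduced here.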

How recover is implemented

    func gorecover(argp uintptr) interface{} {
        // Must be in a function running as part of a deferred call during the panic.
        // Must be called from the topmost function of the call
        // (the function used in the defer statement).
        // p.argp is the argument pointer of that topmost deferred function call.
        // Compare against argp reported by caller.
        // If they match, the caller is the one who can recover.
        gp := getg()
        p := gp._panic
        if p != nil && !p.recovered && argp == uintptr(p.argp) {
            p.recovered = true
            return p.arg
        }
        return nil
    }

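recover first checks whether gp (the current goroutine) is in the middle of a panic; if it is not, it returns nil right away. Calling recover in normal flow therefore does nothing except spend a few CPU cycles.

If the current goroutine really is panicking, the panic has not been recovered yet, and the caller is the topmost deferred function (the argp comparison), recover sets recovered to true. After each deferred call returns, the panic loop checks the recovered flag; if it is true, the loop leaves the panic flow and normal execution resumes.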
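Two consequences of those checks can be observed from user code. The sketch below uses invented function names: a second recover in the same deferred function returns nil because the panic is already marked recovered, and recover returns nil when no panic is in flight.

```go
package main

import "fmt"

// recoverTwice calls recover twice in the same deferred function.
func recoverTwice() (first, second interface{}) {
	defer func() {
		first = recover()  // marks the panic recovered and returns its argument
		second = recover() // the panic is already marked recovered, so this is nil
	}()
	panic("boom")
}

// recoverWithoutPanic calls recover from a deferred function while the
// goroutine is not panicking.
func recoverWithoutPanic() (called bool, r interface{}) {
	defer func() {
		called = true
		r = recover() // no panic on the goroutine, so this is nil
	}()
	return
}

func main() {
	fmt.Println(recoverTwice())
	fmt.Println(recoverWithoutPanic())
}
```

This should print boom <nil> and then true <nil>.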

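The recover function only sets the recovered flag, so how does the goroutine actually return from the panic?

Looking at gopanic: after reflectcall has run the deferred function that called recover, gopanic records the current defer's pc and sp and then calls recovery through mcall;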

    func gopanic(e interface{}) {
        gp := getg()
        if gp.m.curg != gp {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic on system stack")
        }

        if gp.m.mallocing != 0 {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic during malloc")
        }
        if gp.m.preemptoff != "" {
            print("panic: ")
            printany(e)
            print("\n")
            print("preempt off reason: ")
            print(gp.m.preemptoff)
            print("\n")
            throw("panic during preemptoff")
        }
        if gp.m.locks != 0 {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic holding locks")
        }

        var p _panic
        p.arg = e
        p.link = gp._panic
        gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

        atomic.Xadd(&runningPanicDefers, 1)

        // By calculating getcallerpc/getcallersp here, we avoid scanning the
        // gopanic frame (stack scanning is slow...)
        addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))

        for {
            d := gp._defer
            if d == nil {
                break
            }

            // If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
            // take defer off list. An earlier panic will not continue running, but we will make sure below that an
            // earlier Goexit does continue running.
            if d.started {
                if d._panic != nil {
                    d._panic.aborted = true
                }
                d._panic = nil
                if !d.openDefer {
                    // For open-coded defers, we need to process the
                    // defer again, in case there are any other defers
                    // to call in the frame (not including the defer
                    // call that caused the panic).
                    d.fn = nil
                    gp._defer = d.link
                    freedefer(d)
                    continue
                }
            }

            // Mark defer as started, but keep on list, so that traceback
            // can find and update the defer's argument frame if stack growth
            // or a garbage collection happens before reflectcall starts executing d.fn.
            d.started = true

            // Record the panic that is running the defer.
            // If there is a new panic during the deferred call, that panic
            // will find d in the list and will mark d._panic (this panic) aborted.
            d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

            done := true
            if d.openDefer {
                done = runOpenDeferFrame(gp, d)
                if done && !d._panic.recovered {
                    addOneOpenDeferFrame(gp, 0, nil)
                }
            } else {
                p.argp = unsafe.Pointer(getargp(0))
                reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
            }
            p.argp = nil

            // reflectcall did not panic. Remove d.
            if gp._defer != d {
                throw("bad defer entry in panic")
            }
            d._panic = nil

            // trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
            //GC()

            pc := d.pc
            sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
            if done {
                d.fn = nil
                gp._defer = d.link
                freedefer(d)
            }
            if p.recovered {
                gp._panic = p.link
                if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
                    // A normal recover would bypass/abort the Goexit. Instead,
                    // we return to the processing loop of the Goexit.
                    gp.sigcode0 = uintptr(gp._panic.sp)
                    gp.sigcode1 = uintptr(gp._panic.pc)
                    mcall(recovery)
                    throw("bypassed recovery failed") // mcall should not return
                }
                atomic.Xadd(&runningPanicDefers, -1)

                if done {
                    // Remove any remaining non-started, open-coded
                    // defer entries after a recover, since the
                    // corresponding defers will be executed normally
                    // (inline). Any such entry will become stale once
                    // we run the corresponding defers inline and exit
                    // the associated stack frame.
                    d := gp._defer
                    var prev *_defer
                    for d != nil {
                        if d.openDefer {
                            if d.started {
                                // This defer is started but we
                                // are in the middle of a
                                // defer-panic-recover inside of
                                // it, so don't remove it or any
                                // further defer entries
                                break
                            }
                            if prev == nil {
                                gp._defer = d.link
                            } else {
                                prev.link = d.link
                            }
                            newd := d.link
                            freedefer(d)
                            d = newd
                        } else {
                            prev = d
                            d = d.link
                        }
                    }
                }

                gp._panic = p.link
                // Aborted panics are marked but remain on the g.panic list.
                // Remove them from the list.
                for gp._panic != nil && gp._panic.aborted {
                    gp._panic = gp._panic.link
                }
                if gp._panic == nil { // must be done with signal
                    gp.sig = 0
                }
                // Pass information about recovering frame to recovery.
                gp.sigcode0 = uintptr(sp)
                gp.sigcode1 = pc
                mcall(recovery)
                throw("recovery failed") // mcall should not return
            }
        }

        // ran out of deferred calls - old-school panic now
        // Because it is unsafe to call arbitrary user code after freezing
        // the world, we call preprintpanics to invoke all necessary Error
        // and String methods to prepare the panic strings before startpanic.
        preprintpanics(gp._panic)

        fatalpanic(gp._panic) // should not return
        *(*int)(nil) = 0      // not reached
    }

recovery sets the sp, pc and related fields of the current goroutine's sched and then calls gogo to switch to the defer's context.

    // Unwind the stack after a deferred function calls recover
    // after a panic. Then arrange to continue running as though
    // the caller of the deferred function returned normally.
    func recovery(gp *g) {
        // Info about defer passed in G struct.
        sp := gp.sigcode0
        pc := gp.sigcode1

        // d's arguments need to be in the stack.
        if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
            print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
            throw("bad recovery")
        }

        // Make the deferproc for this d return again,
        // this time returning 1. The calling function will
        // jump to the standard return epilogue.
        gp.sched.sp = sp
        gp.sched.pc = pc
        gp.sched.lr = 0
        gp.sched.ret = 1
        gogo(&gp.sched)
    }

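Why do the sp/pc taken from the defer lead back into the function's return path? Let's look at defer.

defer

defer is a statement handled by the compiler, which does two things for it:

  1. It compiles the defer statement into a call to runtime.deferproc(fn). At run time deferproc is called and links every defer onto the goroutine's defer list;
  2. It inserts a call to runtime.deferreturn before the function returns. Note that this is before the final return instruction, not before the "return xxx" statement as a whole, because "return xxx" is not atomic: the result is assigned first, then deferreturn runs, then the function actually returns. At run time this call processes all the defers previously linked onto the defer list.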
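The non-atomic nature of "return xxx" is visible from ordinary code. The sketch below uses two made-up functions: a deferred closure can still change a named result after the return statement has assigned it, but it cannot affect a result that was already copied out of a local variable.

```go
package main

import "fmt"

// namedResult: "return 1" first stores 1 in x, then the deferred function
// increments x, and only then does the function actually return.
func namedResult() (x int) {
	defer func() { x++ }()
	return 1
}

// unnamedResult: "return x" copies x into the unnamed result before the
// deferred function runs, so the later increment of the local x is not seen.
func unnamedResult() int {
	x := 1
	defer func() { x++ }()
	return x
}

func main() {
	fmt.Println(namedResult(), unnamedResult())
}
```

This should print 2 1.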

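deferreturn

deferreturn first checks whether the list holds a defer, then uses jmpdefer to run the deferred function. The magic of jmpdefer is that it returns to the point just before the call to deferreturn, so deferreturn is called once more. If the defer list still has unprocessed entries the cycle repeats; once the list is empty, deferreturn simply returns and defer processing is finished.

How recovery switches from the panic flow back to the normal flow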

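Notes

  1. To handle a panic with recover, the defer must be registered before the panic occurs, and recover only works inside the function invoked by defer. Otherwise recover cannot catch the panic or stop it from propagating.
  2. After recover handles the panic, execution does not resume at the point of the panic; the function that registered the defer returns to its caller once its remaining deferred calls have run.
  3. Multiple defers form a defer stack: the defer statement registered last is called first.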
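The second note is easy to check. In the sketch below (all names are hypothetical), the statement after the panicking call never runs, and control continues in the caller of the function that registered the defer.

```go
package main

import "fmt"

// mayPanic always panics; it stands in for any call that can fail this way.
func mayPanic() {
	panic("x")
}

// work registers the recovering defer before the panic can happen.
func work(trace *[]string) {
	defer func() {
		if r := recover(); r != nil {
			*trace = append(*trace, fmt.Sprintf("recovered: %v", r))
		}
	}()
	*trace = append(*trace, "before panic")
	mayPanic()
	*trace = append(*trace, "after panic") // never reached
}

// caller shows where execution continues once work has recovered.
func caller() []string {
	var trace []string
	work(&trace)
	trace = append(trace, "back in caller")
	return trace
}

func main() {
	fmt.Println(caller())
}
```

This should print [before panic recovered: x back in caller]; the entry "after panic" never appears.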


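recovery puts the pc and sp of the defer that called recover into the current goroutine's sched, sets ret to 1, and then calls gogo to resume the goroutine.

As the earlier code shows, a defer's pc and sp point to the code right after its deferproc call, that is, the next defer statement or the normal flow. But if execution resumes there, wouldn't gogo simply continue running the function body from that point?

It turns out that the code the compiler generates for a defer statement always checks the return value of deferproc. If the value is 0, deferproc succeeded and execution continues with the next defer statement or the code that follows. If the value is not 0, execution jumps to the end of the current function, just before the return.

In other words, after gogo the goroutine behaves as if deferproc had just returned. Because the return value is 1, the check fails and control goes straight to the deferreturn call before the return, so the defer flow is entered again.

Since the defer that called recover has already been unlinked from the defer list, the defers that have not run yet are executed next, and the function finally returns to its caller.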
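The effect described in the last paragraph can be observed without looking at the runtime. In this sketch (the function name unwind and the labels are invented), the defer registered first still runs after a later defer has recovered, and the function then returns normally.

```go
package main

import "fmt"

// unwind registers three defers; the second one recovers the panic.
func unwind() (trace []string) {
	defer func() {
		// Registered first, so it runs last, after the recovery.
		trace = append(trace, "first")
	}()
	defer func() {
		if r := recover(); r != nil {
			trace = append(trace, fmt.Sprintf("second recovered %v", r))
		}
	}()
	defer func() {
		trace = append(trace, "third")
	}()
	panic("boom")
}

func main() {
	fmt.Println(unwind())
}
```

This should print [third second recovered boom first]: the remaining defer runs after the recovery, and unwind returns its named result to main as if it had returned normally.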

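How panic exits

When a panic is not recovered, the runtime prints the panic value and the call stack and finally calls exit(2), terminating the whole process. This matches the "exit status 2" line reported by go run in the earlier example.

Note

recover is only effective inside a deferred function; if it is called outside a defer context, recover simply returns nil.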

References

Golang: 深入理解panic and recover (Golang: understanding panic and recover in depth)
Effective Go
面向信仰编程 (blog)