Overview
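- panic changes the program's control flow. When a function calls panic, the rest of that function stops executing immediately, and the runtime then runs the deferred calls (defer) of the function and of its callers, walking up the call stack within the current goroutine;
- recover stops the crash caused by a panic. It only has an effect when called inside a deferred function; called anywhere else it does nothing and returns nil.
Besides the head pointer of its defer list, the currently running goroutine also holds the head pointer of a panic list. The panic list links _panic structs in the same way the defer list links _defer structs. When a new panic occurs, a new _panic struct is inserted at the head of the list, so the panic at the head is always the one currently being processed.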
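Observed behavior
- panic only triggers the deferred calls of the current goroutine;
- recover only takes effect when called inside a deferred function;
- panic may be called again inside a defer, and such calls can be nested several times;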
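The third point can be checked with a small experiment. The sketch below is only an illustration (the function name nestedPanics is made up for this example): the body panics, two deferred functions panic again, and the deferred function that runs last calls recover, which returns the value of the most recent panic.

```go
package main

import "fmt"

// nestedPanics (hypothetical name) panics in its body and again in two
// deferred functions. It returns whatever recover reports at the end.
func nestedPanics() (recovered interface{}) {
	// Registered first, so it runs last and sees the newest panic.
	defer func() { recovered = recover() }()
	// Registered second, runs second: replaces the panic raised by defer B.
	defer func() { panic("panic in defer A") }()
	// Registered last, runs first: replaces the panic raised by the body.
	defer func() { panic("panic in defer B") }()
	panic("panic in body")
}

func main() {
	fmt.Println(nestedPanics()) // prints "panic in defer A"
}
```

Each new panic is pushed onto the head of the goroutine's panic list, which is why recover sees the latest one.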
No effect across goroutines
    package main

    import "time"

    func main() {
        defer println("in main")
        go func() {
            defer println("in goroutine")
            panic("")
        }()

        time.Sleep(1 * time.Second)
    }

    $ go run main.go
    in goroutine
    panic:
    ...
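The runtime.deferproc function behind the defer keyword associates each deferred call with the goroutine of its caller. It is therefore reasonable that a crash only runs the deferred calls of the goroutine that panicked, which is why "in main" is never printed in the output above.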
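Since a panic can only be recovered inside the goroutine that raised it, a common workaround is to put the recover into the goroutine itself. The following is a minimal sketch under that assumption; runGuarded is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"sync"
)

// runGuarded (hypothetical helper) runs fn in a new goroutine and recovers
// any panic inside that same goroutine. It returns the recovered value,
// or nil if fn did not panic.
func runGuarded(fn func()) interface{} {
	var (
		wg        sync.WaitGroup
		recovered interface{}
	)
	wg.Add(1)
	go func() {
		defer wg.Done()                          // runs last
		defer func() { recovered = recover() }() // runs first
		fn()
	}()
	wg.Wait()
	return recovered
}

func main() {
	r := runGuarded(func() { panic("boom in goroutine") })
	fmt.Println("recovered:", r)
	fmt.Println("main keeps running")
}
```

The WaitGroup is only there so that main can read the recovered value safely; in real code the value would more likely be logged or sent over a channel.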
Recovery that has no effect
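    package main

    import "fmt"

    func main() {
        defer fmt.Println("in main")
        // recover runs before the panic and outside any deferred function,
        // so it returns nil here.
        if err := recover(); err != nil {
            fmt.Println(err)
        }

        panic("unknown err")
    }

    $ go run main.go
    in main
    panic: unknown err

    goroutine 1 [running]:
    main.main()
        ...
    exit status 2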
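For comparison, here is a sketch of how the example above could be rearranged so that the recovery does take effect. The function name run is an assumption of this example; the only real change is that recover is now called directly by a deferred function.

```go
package main

import "fmt"

// run (hypothetical name) panics, but a deferred function calls recover,
// so the panic is stopped and run returns normally.
func run() (msg string) {
	defer func() {
		if err := recover(); err != nil {
			msg = fmt.Sprintf("recovered: %v", err)
		}
	}()
	panic("unknown err")
}

func main() {
	fmt.Println(run()) // prints "recovered: unknown err"
	fmt.Println("in main")
}
```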
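In the Go runtime source code, a panic is represented by the data structure runtime._panic (panic itself is a built-in function rather than a keyword). Every call to panic creates one of the following structures to hold the related information: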
    type _panic struct {
        argp      unsafe.Pointer
        arg       interface{}
        link      *_panic
        recovered bool
        aborted   bool
        pc        uintptr
        sp        unsafe.Pointer
        goexit    bool
    }
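- argp is a pointer to the arguments of the deferred call being run;
- arg is the argument passed to panic;
- link points to the runtime._panic created by an earlier panic call;
- recovered reports whether this runtime._panic has been recovered by recover;
- aborted reports whether this panic has been forcibly aborted;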
Program crash
How does panic terminate the program? The compiler translates a call to panic into a call to runtime.gopanic, which performs the following steps:
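- create a new runtime._panic and add it to the front of the goroutine's _panic list;
- in a loop, keep taking a runtime._defer from the goroutine's _defer list and call runtime.reflectcall to run the deferred function;
- call runtime.fatalpanic to abort the whole program;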
    func gopanic(e interface{}) {
        gp := getg()
        ...
        var p _panic
        p.arg = e
        p.link = gp._panic
        gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

        for {
            d := gp._defer
            if d == nil {
                break
            }

            d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

            reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))

            d._panic = nil
            d.fn = nil
            gp._defer = d.link

            freedefer(d)
            if p.recovered {
                ...
            }
        }

        fatalpanic(gp._panic)
        *(*int)(nil) = 0
    }
The main flow inside panic is as follows:
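- get the g of the current caller, that is, the goroutine;
- walk through and run the deferred functions in g;
- if a deferred function calls recover and finds that a panic is in progress, the panic is marked as recovered;
- while walking the defers, if the panic is found to be marked as recovered, extract the sp and pc of that defer and save them in two fields of g (sigcode0 and sigcode1);
- call runtime.mcall to switch to m->g0 and jump to the recovery function, passing the g obtained earlier as its argument. The code of runtime.mcall lives in src/runtime/asm_xxx.s in the Go source tree, where xxx is the platform, such as amd64.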
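The flow above can be imitated in ordinary Go to make the two linked lists easier to picture. This is a toy model only: toyG, toyPanic, toyDefer and their methods are invented for this sketch and are not the runtime's types, and the stack switching done by mcall and recovery is left out entirely.

```go
package main

import "fmt"

// toyPanic and toyDefer are simplified stand-ins for runtime._panic and
// runtime._defer. They are NOT the real runtime types.
type toyPanic struct {
	arg       interface{}
	link      *toyPanic
	recovered bool
}

type toyDefer struct {
	fn   func(g *toyG)
	link *toyDefer
}

// toyG models the two per-goroutine list heads.
type toyG struct {
	panics *toyPanic
	defers *toyDefer
}

// deferFn pushes fn onto the head of the defer list (LIFO).
func (g *toyG) deferFn(fn func(g *toyG)) {
	g.defers = &toyDefer{fn: fn, link: g.defers}
}

// recoverToy marks the current panic as recovered and returns its argument.
func (g *toyG) recoverToy() interface{} {
	if p := g.panics; p != nil && !p.recovered {
		p.recovered = true
		return p.arg
	}
	return nil
}

// panicToy mimics the loop in gopanic and reports whether the panic was
// recovered. The real runtime would call fatalpanic instead of returning false.
func (g *toyG) panicToy(arg interface{}) bool {
	p := &toyPanic{arg: arg, link: g.panics}
	g.panics = p // new panic goes to the head of the list
	for g.defers != nil {
		d := g.defers
		g.defers = d.link // simplification: unlink before running
		d.fn(g)
		if p.recovered {
			g.panics = p.link
			return true
		}
	}
	return false
}

func main() {
	g := &toyG{}
	g.deferFn(func(g *toyG) { fmt.Println("recovered:", g.recoverToy()) })
	g.deferFn(func(*toyG) { fmt.Println("plain defer runs first") })
	fmt.Println("recovered?", g.panicToy("boom"))
}
```

Running it prints the plain defer first, then the recovering one, then "recovered? true", which mirrors the order in which gopanic walks the defer list.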
Which situations cause a panic
1. Index out of range on an array or slice
    package main

    import "fmt"

    func f() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()

        var bar = []int{1}
        fmt.Println(bar[1]) // index 1 is out of range for a slice of length 1
    }

    func main() {
        f()
        fmt.Println("exit")
    }
2. Dereferencing an uninitialized or nil pointer
    package main

    import (
        "fmt"
    )

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()

        var b *int
        fmt.Println(*b) // b is nil, so dereferencing it panics
    }

    func main() {
        foo()
        fmt.Println("exit")
    }
3. Sending on a channel that has already been closed
    package main

    import "fmt"

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()

        var bar = make(chan int, 1)
        close(bar)
        bar <- 1 // sending on a closed channel panics
    }

    func main() {
        foo()
        fmt.Println("exit")
    }
Concurrent reads and writes on the same map
    package main

    import "fmt"

    func foo() {
        // Note: the runtime reports concurrent map writes as a fatal error,
        // so neither recover below stops the crash.
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()

        var bar = make(map[int]int)
        go func() {
            defer func() {
                if err := recover(); err != nil {
                    fmt.Println(err)
                }
            }()
            for {
                bar[1] = 1
            }
        }()

        for {
            bar[1] = 1
        }
    }

    func main() {
        foo()
        fmt.Println("exit")
    }
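Wherever a map is read and written concurrently, protect the map with a lock. Unlike the other cases in this list, the runtime reports this situation as a fatal error ("fatal error: concurrent map writes") rather than an ordinary panic, so recover cannot intercept it.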
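A minimal sketch of the locking approach, assuming a plain sync.Mutex is acceptable for the workload. The type counter and the function countConcurrently are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// counter (hypothetical type) guards a map with a mutex so that it can be
// used from several goroutines at once.
type counter struct {
	mu sync.Mutex
	m  map[int]int
}

func newCounter() *counter { return &counter{m: make(map[int]int)} }

// inc increments the value stored under key while holding the lock.
func (c *counter) inc(key int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// get reads the value stored under key while holding the lock.
func (c *counter) get(key int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// countConcurrently increments key 1 from several goroutines and returns
// the final value.
func countConcurrently(workers, perWorker int) int {
	c := newCounter()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.inc(1)
			}
		}()
	}
	wg.Wait()
	return c.get(1)
}

func main() {
	fmt.Println(countConcurrently(8, 1000)) // prints 8000
}
```

For read-heavy workloads a sync.RWMutex or sync.Map may fit better; which one to pick depends on the access pattern.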
4. Failed type assertion
    package main

    import (
        "fmt"
    )

    func foo() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()

        var i interface{} = "abc"
        _ = i.([]string)
    }

    func main() {
        foo()
        fmt.Println("exit")
    }
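The panic in the example above comes from the single-value form of the type assertion. The two-value ("comma ok") form reports a mismatch through a boolean instead. A short sketch, with asStrings as a hypothetical helper name:

```go
package main

import "fmt"

// asStrings (hypothetical helper) uses the two-value type assertion, which
// sets ok to false on a mismatch instead of panicking.
func asStrings(v interface{}) ([]string, bool) {
	s, ok := v.([]string)
	return s, ok
}

func main() {
	var i interface{} = "abc"
	if _, ok := asStrings(i); !ok {
		fmt.Println("i does not hold a []string")
	}

	s, ok := asStrings([]string{"a", "b"})
	fmt.Println(s, ok) // prints "[a b] true"
}
```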
How recover is implemented
    func gorecover(argp uintptr) interface{} {
        // Must be in a function running as part of a deferred call during the panic.
        // Must be called from the topmost function of the call
        // (the function used in the defer statement).
        // p.argp is the argument pointer of that topmost deferred function call.
        // Compare against argp reported by caller.
        // If they match, the caller is the one who can recover.
        gp := getg()
        p := gp._panic
        if p != nil && !p.recovered && argp == uintptr(p.argp) {
            p.recovered = true
            return p.arg
        }
        return nil
    }
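recover first checks whether gp (the current goroutine) is in the middle of a panic; if it is not, recover returns nil right away. Calling recover in the normal flow therefore does nothing useful apart from costing a little CPU.
If the current goroutine really is panicking, recover sets recovered to true. After each deferred call, the panic flow checks the recovered flag; if it is true, the runtime leaves the panic flow and resumes normal execution.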
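The checks in gorecover can be observed from user code. The sketch below (probe and helper are names made up for this example) calls recover in three situations: with no panic in progress, from a helper that is one call level below the deferred function, and directly from a deferred function. Only the last call returns the panic value, which matches the argp comparison in gorecover.

```go
package main

import "fmt"

// helper calls recover, but it is not itself the deferred function.
func helper() interface{} { return recover() }

// probe (hypothetical name) reports what recover returns in three situations.
func probe() (outside, indirect, direct interface{}) {
	// No panic is in progress yet, so recover returns nil.
	outside = recover()

	// Runs second: recover is called directly by the deferred function.
	defer func() { direct = recover() }()
	// Runs first: recover is called one level too deep and returns nil.
	defer func() { indirect = helper() }()

	panic("boom")
}

func main() {
	outside, indirect, direct := probe()
	fmt.Println(outside, indirect, direct) // prints "<nil> <nil> boom"
}
```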
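recover only sets the recovered flag, so how does the goroutine return from the panic?
As gopanic shows, after reflectcall has run the deferred function that calls recover, gopanic records the pc and sp of the current defer in the local variables pc and sp, and then calls recovery: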
    func gopanic(e interface{}) {
        gp := getg()
        if gp.m.curg != gp {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic on system stack")
        }

        if gp.m.mallocing != 0 {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic during malloc")
        }
        if gp.m.preemptoff != "" {
            print("panic: ")
            printany(e)
            print("\n")
            print("preempt off reason: ")
            print(gp.m.preemptoff)
            print("\n")
            throw("panic during preemptoff")
        }
        if gp.m.locks != 0 {
            print("panic: ")
            printany(e)
            print("\n")
            throw("panic holding locks")
        }

        var p _panic
        p.arg = e
        p.link = gp._panic
        gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

        atomic.Xadd(&runningPanicDefers, 1)

        // By calculating getcallerpc/getcallersp here, we avoid scanning the
        // gopanic frame (stack scanning is slow...)
        addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))

        for {
            d := gp._defer
            if d == nil {
                break
            }

            // If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
            // take defer off list. An earlier panic will not continue running, but we will make sure below that an
            // earlier Goexit does continue running.
            if d.started {
                if d._panic != nil {
                    d._panic.aborted = true
                }
                d._panic = nil
                if !d.openDefer {
                    // For open-coded defers, we need to process the
                    // defer again, in case there are any other defers
                    // to call in the frame (not including the defer
                    // call that caused the panic).
                    d.fn = nil
                    gp._defer = d.link
                    freedefer(d)
                    continue
                }
            }

            // Mark defer as started, but keep on list, so that traceback
            // can find and update the defer's argument frame if stack growth
            // or a garbage collection happens before reflectcall starts executing d.fn.
            d.started = true

            // Record the panic that is running the defer.
            // If there is a new panic during the deferred call, that panic
            // will find d in the list and will mark d._panic (this panic) aborted.
            d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

            done := true
            if d.openDefer {
                done = runOpenDeferFrame(gp, d)
                if done && !d._panic.recovered {
                    addOneOpenDeferFrame(gp, 0, nil)
                }
            } else {
                p.argp = unsafe.Pointer(getargp(0))
                reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
            }
            p.argp = nil

            // reflectcall did not panic. Remove d.
            if gp._defer != d {
                throw("bad defer entry in panic")
            }
            d._panic = nil

            // trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
            //GC()

            pc := d.pc
            sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
            if done {
                d.fn = nil
                gp._defer = d.link
                freedefer(d)
            }
            if p.recovered {
                gp._panic = p.link
                if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
                    // A normal recover would bypass/abort the Goexit. Instead,
                    // we return to the processing loop of the Goexit.
                    gp.sigcode0 = uintptr(gp._panic.sp)
                    gp.sigcode1 = uintptr(gp._panic.pc)
                    mcall(recovery)
                    throw("bypassed recovery failed") // mcall should not return
                }
                atomic.Xadd(&runningPanicDefers, -1)

                if done {
                    // Remove any remaining non-started, open-coded
                    // defer entries after a recover, since the
                    // corresponding defers will be executed normally
                    // (inline). Any such entry will become stale once
                    // we run the corresponding defers inline and exit
                    // the associated stack frame.
                    d := gp._defer
                    var prev *_defer
                    for d != nil {
                        if d.openDefer {
                            if d.started {
                                // This defer is started but we
                                // are in the middle of a
                                // defer-panic-recover inside of
                                // it, so don't remove it or any
                                // further defer entries
                                break
                            }
                            if prev == nil {
                                gp._defer = d.link
                            } else {
                                prev.link = d.link
                            }
                            newd := d.link
                            freedefer(d)
                            d = newd
                        } else {
                            prev = d
                            d = d.link
                        }
                    }
                }

                gp._panic = p.link
                // Aborted panics are marked but remain on the g.panic list.
                // Remove them from the list.
                for gp._panic != nil && gp._panic.aborted {
                    gp._panic = gp._panic.link
                }
                if gp._panic == nil { // must be done with signal
                    gp.sig = 0
                }
                // Pass information about recovering frame to recovery.
                gp.sigcode0 = uintptr(sp)
                gp.sigcode1 = pc
                mcall(recovery)
                throw("recovery failed") // mcall should not return
            }
        }

        // ran out of deferred calls - old-school panic now
        // Because it is unsafe to call arbitrary user code after freezing
        // the world, we call preprintpanics to invoke all necessary Error
        // and String methods to prepare the panic strings before startpanic.
        preprintpanics(gp._panic)

        fatalpanic(gp._panic) // should not return
        *(*int)(nil) = 0      // not reached
    }
recovery sets the sp, pc and related fields of the current goroutine's sched, then calls gogo to switch back into the context of the defer.
    // Unwind the stack after a deferred function calls recover
    // after a panic. Then arrange to continue running as though
    // the caller of the deferred function returned normally.
    func recovery(gp *g) {
        // Info about defer passed in G struct.
        sp := gp.sigcode0
        pc := gp.sigcode1

        // d's arguments need to be in the stack.
        if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
            print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
            throw("bad recovery")
        }

        // Make the deferproc for this d return again,
        // this time returning 1. The calling function will
        // jump to the standard return epilogue.
        gp.sched.sp = sp
        gp.sched.pc = pc
        gp.sched.lr = 0
        gp.sched.ret = 1
        gogo(&gp.sched)
    }
Why can the sp and pc taken from the defer bring execution back onto the defer's return path? To answer that, let us look at defer.
defer
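defer is a statement addressed to the compiler. It makes the compiler do two things:
- the compiler compiles a defer statement into a call to runtime.deferproc(fn), so at run time runtime.deferproc is called and hangs each defer on the goroutine's defer list;
- the compiler inserts a call to runtime.deferreturn before the function returns (note: before the final return instruction, not before `return xxx`, because the latter is not a single atomic instruction); at run time this starts processing all the defers hung on the defer list.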
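The remark that `return xxx` is not atomic has a visible consequence: the result value is assigned first, then the deferred functions run, and only then does the function actually return. A small sketch (both function names are made up for this example):

```go
package main

import "fmt"

// namedResult: "return 1" first assigns n = 1, then the deferred function
// increments n, and only then does the function return. The caller sees 2.
func namedResult() (n int) {
	defer func() { n++ }()
	return 1
}

// unnamedResult: the return value is copied out of the local n before the
// deferred function runs, so the increment is not visible. The caller sees 1.
func unnamedResult() int {
	n := 1
	defer func() { n++ }()
	return n
}

func main() {
	fmt.Println(namedResult(), unnamedResult()) // prints "2 1"
}
```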
deferreturn
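deferreturn first checks whether the list holds any defer, then uses jmpdefer to run the deferred function. The trick in jmpdefer is that it jumps back to just before the call to deferreturn, so deferreturn runs once more. If the defer list still has unprocessed entries, the cycle repeats; once the list is empty, the function returns and defer processing is finished.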
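How recovery switches from the panic flow back to the normal flow
Notes
1. To handle a panic with recover, the defer must be declared before the panic occurs, and recover is only effective inside a function invoked by defer. Otherwise recover cannot catch the panic and cannot stop it from propagating.
2. After recover has handled the panic, execution does not resume at the point of the panic. The function that declared the defer runs its remaining defers and then returns to its caller.
3. Multiple defers form a defer stack; the defer declared last is called first.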
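Notes 2 and 3 can be verified by recording the order of events. This is a sketch with invented names (mayPanic, risky, events); it collects a trace instead of printing so that the order is easy to inspect.

```go
package main

import "fmt"

func mayPanic() { panic("boom") }

// risky (hypothetical name) registers two defers and then panics.
func risky(ev *[]string) {
	defer func() { *ev = append(*ev, "defer 1") }() // declared first, runs last
	defer func() {                                  // declared last, runs first
		if r := recover(); r != nil {
			*ev = append(*ev, "defer 2 recovered")
		}
	}()
	*ev = append(*ev, "before panic")
	mayPanic()
	*ev = append(*ev, "after panic") // never executed
}

// events runs risky and returns the recorded order of events.
func events() []string {
	var ev []string
	risky(&ev)
	ev = append(ev, "back in caller")
	return ev
}

func main() {
	fmt.Println(events())
	// prints "[before panic defer 2 recovered defer 1 back in caller]"
}
```

The entry "after panic" never appears: after the recovery, risky finishes its remaining defer and returns to events.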
    func recovery(gp *g) {
        // Info about defer passed in G struct.
        sp := gp.sigcode0
        pc := gp.sigcode1

        // d's arguments need to be in the stack.
        if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
            print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
            throw("bad recovery")
        }

        // Make the deferproc for this d return again,
        // this time returning 1. The calling function will
        // jump to the standard return epilogue.
        gp.sched.sp = sp
        gp.sched.pc = pc
        gp.sched.lr = 0
        gp.sched.ret = 1
        gogo(&gp.sched)
    }
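recovery stores the pc and sp of the defer whose function called recover into the current goroutine's sched, sets ret to 1, and then calls gogo to resume the goroutine.
As the earlier code shows, the pc and sp of a defer point to the code right after the deferproc call, that is, the next defer or the normal flow. But with this setup, would gogo not simply take us back to where the function started?
It turns out that the code the compiler generates for a defer statement always checks the return value of deferproc. If the value is 0, deferproc succeeded and execution continues with the next defer statement or the code that follows; if the value is non-zero, execution jumps to the end of the current function, just before the return.
In other words, after gogo it looks as if deferproc had just returned. Since the return value is 1, the check fails and execution goes straight to the deferreturn before the return, so the defer flow is entered again.
Because the defer that called recover has already been removed from the defer list, the remaining defers can be run, and the function finally returns to its caller.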
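How a panic exits
When an unrecovered panic exits, the runtime prints the call stack and finally calls exit(2) to terminate the whole process, which matches the "exit status 2" in the output shown earlier.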
Note
recover is only effective inside a deferred function; if it is not called in a defer context, recover returns nil directly.
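A frequent use of this rule is to turn a panic into an ordinary error at a function boundary. The sketch below assumes that this is the desired policy; safeDiv is a hypothetical function written for this example.

```go
package main

import "fmt"

// safeDiv (hypothetical function) converts a runtime panic, such as an
// integer division by zero, into an error by calling recover inside a
// deferred function and assigning to the named result err.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDiv(6, 3))
	fmt.Println(safeDiv(1, 0))
}
```

The named result err is what makes this work: the deferred function can still assign to it after the panic has been recovered.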