panic and recover

Overview
Behavior
When a panic occurs
How recover is implemented
defer
How a panic exits

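Overview

panic changes a program's control flow. When a function calls panic, the rest of that function stops executing immediately, and the runtime then runs the deferred calls (defer) of the current goroutine, starting with the current function and working up through its callers.

recover stops the crash that a panic would otherwise cause. It only has an effect when called from a deferred function; called anywhere else, it does nothing and returns nil.

Besides the head pointer of its defer list, the currently running goroutine also keeps a head pointer to a panic list. The panic list links _panic structures in the same way the defer list links _defer structures. A new panic is inserted at the head of the list, so the panic at the head is always the one currently being processed.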
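The control flow described in this overview can be observed from ordinary code. The following is a minimal sketch, not taken from the original article; mayPanic and trace are hypothetical names chosen for illustration. It records which steps run when a callee panics and the caller recovers.

```go
package main

import "fmt"

// mayPanic is a hypothetical helper: it panics when fail is true.
func mayPanic(fail bool, log *[]string) {
	defer func() { *log = append(*log, "deferred in mayPanic") }()
	if fail {
		panic("stop")
	}
	*log = append(*log, "rest of mayPanic") // skipped when panicking
}

// trace records which steps run when mayPanic panics.
func trace() []string {
	var log []string
	func() {
		defer func() {
			if r := recover(); r != nil {
				log = append(log, fmt.Sprintf("recovered: %v", r))
			}
		}()
		mayPanic(true, &log)
		log = append(log, "after mayPanic") // skipped: the caller unwinds too
	}()
	return log
}

func main() {
	for _, step := range trace() {
		fmt.Println(step)
	}
}
```

Only the deferred call in mayPanic and the recovering deferred call in the caller run; the statements after the panic and after the call to mayPanic are skipped.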

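Behavior

panic only triggers the deferred calls of the current goroutine;
recover only takes effect when called inside a deferred function;
panic may be called again, nested several levels deep, inside deferred functions;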
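The third point, nested panics, can be illustrated as follows. This is a sketch under the assumption that we recover at the outermost level so the program keeps running; nestedPanics is a hypothetical name. Each deferred function panics again, and the recover that finally runs sees only the most recent panic, the one at the head of the panic list.

```go
package main

import "fmt"

// nestedPanics panics, then panics again inside deferred calls.
// It returns the value that the outermost recover observes.
func nestedPanics() (last interface{}) {
	defer func() {
		last = recover() // sees the most recent panic only
	}()
	defer func() {
		defer func() {
			panic("panic again and again")
		}()
		panic("panic again")
	}()
	panic("panic once")
}

func main() {
	fmt.Println(nestedPanics()) // panic again and again
}
```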

No effect across goroutines

package main

import "time"

func main() {
    defer println("in main")
    go func() {
        defer println("in goroutine")
        panic("")
    }()
    time.Sleep(1 * time.Second)
}
$ go run main.go
in goroutine
panic:
...

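The runtime.deferproc function behind the defer keyword associates each deferred call with the goroutine that registered it. It therefore makes sense that a crash only runs the deferred calls of the goroutine that panicked, which is why "in main" is never printed above.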
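Because deferred calls belong to one goroutine, a recover in main cannot protect a goroutine started from it; the recover has to live inside that goroutine. A minimal sketch of this pattern follows; goSafe is a hypothetical helper name, not a standard library function.

```go
package main

import "fmt"

// goSafe runs fn in a new goroutine and reports the recovered panic value
// (nil if fn returned normally) on the returned channel.
func goSafe(fn func()) <-chan interface{} {
	done := make(chan interface{}, 1)
	go func() {
		defer func() {
			done <- recover() // runs in the goroutine that may panic
		}()
		fn()
	}()
	return done
}

func main() {
	r := <-goSafe(func() { panic("boom") })
	fmt.Println("recovered in goroutine:", r)
}
```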

Recovery that has no effect

package main

import "fmt"

func main() {
    defer fmt.Println("in main")
    if err := recover(); err != nil {
        fmt.Println(err)
    }
    panic("unknown err")
}
$ go run main.go
in main
panic: unknown err
goroutine 1 [running]:
main.main()
    ...
exit status 2
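In the program above, recover runs before panic is called and outside any deferred function, so it returns nil and the later panic still crashes the process. A sketch of the working form moves recover into a deferred function; run is a hypothetical name used for illustration.

```go
package main

import "fmt"

// run panics, but a deferred function recovers and returns the panic value.
func run() (recovered interface{}) {
	defer func() {
		recovered = recover()
	}()
	panic("unknown err")
}

func main() {
	defer fmt.Println("in main")
	fmt.Println("recovered:", run())
}
```

This version prints the recovered value and then "in main", and the process exits normally.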

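panic is a built-in function, not a keyword. In the Go runtime source, each panic is represented by the runtime._panic structure (the layout below is the one from the runtime version this article describes). Every call to panic creates one of these structures to hold the related information: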

type _panic struct {
    argp      unsafe.Pointer
    arg       interface{}
    link      *_panic
    recovered bool
    aborted   bool
    pc        uintptr
    sp        unsafe.Pointer
    goexit    bool
}

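argp is a pointer to the arguments of the deferred call being run during the panic;
arg is the argument passed to panic;
link points to the runtime._panic created by an earlier panic;
recovered indicates whether this runtime._panic has been recovered by recover;
aborted indicates whether this panic has been forcibly aborted;
pc and sp record where to return to in the runtime if this panic is bypassed, and goexit marks an entry created by runtime.Goexit;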

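Program crash

How does panic terminate the program? The compiler turns a call to the built-in panic into a call to runtime.gopanic, which proceeds in the following steps:

Create a new runtime._panic and add it to the front of the goroutine's _panic list;
In a loop, keep taking the next runtime._defer from the goroutine's _defer list and run the deferred function through runtime.reflectcall;
Call runtime.fatalpanic to abort the whole program;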

func gopanic(e interface{}) {
    gp := getg()
    ...
    var p _panic
    p.arg = e
    p.link = gp._panic
    gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
    for {
        d := gp._defer
        if d == nil {
            break
        }
        d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
        reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
        d._panic = nil
        d.fn = nil
        gp._defer = d.link
        freedefer(d)
        if p.recovered {
            ...
        }
    }
    fatalpanic(gp._panic)
    *(*int)(nil) = 0
}

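The main flow inside panic is:

Get the g of the caller, that is, the current goroutine
Walk the deferred functions on g and run them
If a deferred function calls recover and a panic is in progress, mark that panic as recovered
While walking the defers, if the panic has been marked recovered, take that defer's sp and pc and store them in two fields of g, sigcode0 and sigcode1
Call runtime.mcall to switch to m->g0 and jump to the recovery function, passing the g obtained earlier as its argument. runtime.mcall is implemented in src/runtime/asm_xxx.s in the Go source tree, where xxx is the platform, for example amd64.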
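The flow above can be modeled in user code. The sketch below is a toy model, not runtime code: toyG, toyPanic, toyDefer and their methods are hypothetical stand-ins for runtime.g, runtime._panic, runtime._defer, deferproc, gorecover and gopanic, and it leaves out stack switching, open-coded defers and Goexit entirely.

```go
package main

import "fmt"

// Toy stand-ins for runtime._panic, runtime._defer and runtime.g.
type toyPanic struct {
	arg       interface{}
	link      *toyPanic
	recovered bool
}

type toyDefer struct {
	fn   func(g *toyG)
	link *toyDefer
}

type toyG struct {
	panics *toyPanic // head of the panic list
	defers *toyDefer // head of the defer list
}

// deferCall pushes fn onto the head of the defer list (the role of deferproc).
func (g *toyG) deferCall(fn func(g *toyG)) {
	g.defers = &toyDefer{fn: fn, link: g.defers}
}

// recoverToy marks the current panic as recovered (the role of gorecover).
func (g *toyG) recoverToy() interface{} {
	if p := g.panics; p != nil && !p.recovered {
		p.recovered = true
		return p.arg
	}
	return nil
}

// panicToy mirrors the gopanic loop: push a panic, run defers one by one,
// and stop as soon as one of them recovered. It reports whether that happened.
func (g *toyG) panicToy(arg interface{}) bool {
	p := &toyPanic{arg: arg, link: g.panics}
	g.panics = p
	for d := g.defers; d != nil; d = g.defers {
		g.defers = d.link // simplified: unlink first, then run
		d.fn(g)
		if p.recovered {
			g.panics = p.link
			return true // the real runtime would call mcall(recovery) here
		}
	}
	return false // the real runtime would call fatalpanic here
}

func main() {
	g := &toyG{}
	g.deferCall(func(g *toyG) { fmt.Println("registered first, still pending") })
	g.deferCall(func(g *toyG) { fmt.Println("recovered:", g.recoverToy()) })
	fmt.Println("recovered?", g.panicToy("boom"))
}
```

In this model the defer registered first stays on the list after the recovery, which corresponds to the real runtime leaving the remaining defers to be run by deferreturn.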

When a panic occurs

1. Index out of range

package main

import "fmt"

func f() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar = []int{1}
    fmt.Println(bar[1])
}
func main() {
    f()
    fmt.Println("exit")
}
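Panics raised by the runtime itself, such as the out-of-range index above, carry a value that implements runtime.Error, which lets a deferred function tell them apart from values passed to panic by user code. A small sketch follows; classify is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// classify runs fn and reports what kind of panic value, if any, it produced.
func classify(fn func()) (kind string) {
	defer func() {
		switch recover().(type) {
		case nil:
			kind = "no panic"
		case runtime.Error:
			kind = "runtime.Error"
		default:
			kind = "other value"
		}
	}()
	fn()
	return
}

func main() {
	s := []int{1}
	i := 1
	fmt.Println(classify(func() { _ = s[i] }))        // runtime.Error
	fmt.Println(classify(func() { panic("custom") })) // other value
	fmt.Println(classify(func() {}))                  // no panic
}
```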

2. Dereferencing an uninitialized or nil pointer

package main

import (
    "fmt"
)
func foo(){
    defer func(){
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var b *int
    fmt.Println(*b)
}
func main(){
    foo()
    fmt.Println("exit")
}

3. Sending on a `chan` that has already been closed

package main

import "fmt"

func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar = make(chan int, 1)
    close(bar)
    bar <- 1
}
func main() {
    foo()
    fmt.Println("exit")
}

Concurrent reads and writes on the same map

package main
import "fmt"
func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar = make(map[int]int)
    go func() {
        defer func() {
            if err := recover(); err != nil {
                fmt.Println(err)
            }
        }()
        for {
            bar[1] = 1
        }
    }()
    for {
        bar[1] = 1
    }
}
func main() {
    foo()
    fmt.Println("exit")
}

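Code that reads and writes a map concurrently should guard the map with a lock. Note that the runtime reports this case as "fatal error: concurrent map writes" rather than as an ordinary panic, so the recover calls in the example above cannot intercept it and the process exits.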
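One way to add the lock is sketched below with sync.Mutex. lockedMap and countConcurrently are hypothetical names used for illustration; sync.RWMutex or sync.Map are alternatives when reads dominate.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMap is a hypothetical wrapper that serializes access to a map.
type lockedMap struct {
	mu sync.Mutex
	m  map[int]int
}

func (l *lockedMap) add(key, delta int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.m[key] += delta
}

func (l *lockedMap) get(key int) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.m[key]
}

// countConcurrently increments one key from several goroutines.
func countConcurrently(workers, perWorker int) int {
	l := &lockedMap{m: make(map[int]int)}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				l.add(1, 1)
			}
		}()
	}
	wg.Wait()
	return l.get(1)
}

func main() {
	fmt.Println(countConcurrently(4, 1000)) // 4000
}
```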

4. Failed type assertion

package main
import (
    "fmt"
)
func foo(){
    defer func(){
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var i interface{} = "abc"
    _ = i.([]string)
}
func main(){
    foo()
    fmt.Println("exit")
}
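The single-value form of a type assertion panics on a mismatch, as shown above. The two-value ("comma ok") form reports the mismatch through a boolean instead, so no recover is needed. A brief sketch; asStrings is a hypothetical helper name.

```go
package main

import "fmt"

// asStrings returns v as a []string and whether the assertion succeeded.
func asStrings(v interface{}) ([]string, bool) {
	s, ok := v.([]string) // the two-value form never panics
	return s, ok
}

func main() {
	if _, ok := asStrings("abc"); !ok {
		fmt.Println("not a []string, no panic")
	}
	s, ok := asStrings([]string{"a", "b"})
	fmt.Println(s, ok) // [a b] true
}
```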

How recover is implemented

func gorecover(argp uintptr) interface{} {
    // Must be in a function running as part of a deferred call during the panic.
    // Must be called from the topmost function of the call
    // (the function used in the defer statement).
    // p.argp is the argument pointer of that topmost deferred function call.
    // Compare against argp reported by caller.
    // If they match, the caller is the one who can recover.
    gp := getg()
    p := gp._panic
    if p != nil && !p.recovered && argp == uintptr(p.argp) {
        p.recovered = true
        return p.arg
    }
    return nil
}

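recover first checks whether gp (the current goroutine) is in the middle of a panic; if not, it returns nil right away. Calling recover on a normal code path therefore costs a little CPU and achieves nothing.

If the goroutine really is panicking, and the caller is the deferred function itself (the argp comparison in the code), recovered is set to true. After each deferred call finishes, the panic loop checks the recovered flag; if it is true, the loop leaves the panic path and normal execution resumes.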
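The argp comparison in gorecover is what enforces the language rule that recover must be called directly by the deferred function. The sketch below makes this visible from user code; indirect is a hypothetical name. A recover called one frame deeper returns nil, while a recover called directly in a deferred function stops the panic.

```go
package main

import "fmt"

// indirect panics and tries to recover in two ways.
func indirect() (inner, outer interface{}) {
	defer func() {
		outer = recover() // called directly by the deferred function: works
	}()
	defer func() {
		// Called from a function nested inside the deferred function:
		// one frame too deep, so recover returns nil.
		inner = func() interface{} { return recover() }()
	}()
	panic("boom")
}

func main() {
	inner, outer := indirect()
	fmt.Println("nested call:", inner) // nested call: <nil>
	fmt.Println("direct call:", outer) // direct call: boom
}
```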

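recover only sets the recovered flag, so how does the goroutine actually return from the panic?

Looking at gopanic: once reflectcall has run the deferred function that called recover, gopanic stores that defer's pc and sp (in gp.sigcode1 and gp.sigcode0) and then calls recovery: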

func gopanic(e interface{}) {
    gp := getg()
    if gp.m.curg != gp {
        print("panic: ")
        printany(e)
        print("\n")
        throw("panic on system stack")
    }
    if gp.m.mallocing != 0 {
        print("panic: ")
        printany(e)
        print("\n")
        throw("panic during malloc")
    }
    if gp.m.preemptoff != "" {
        print("panic: ")
        printany(e)
        print("\n")
        print("preempt off reason: ")
        print(gp.m.preemptoff)
        print("\n")
        throw("panic during preemptoff")
    }
    if gp.m.locks != 0 {
        print("panic: ")
        printany(e)
        print("\n")
        throw("panic holding locks")
    }
    var p _panic
    p.arg = e
    p.link = gp._panic
    gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
    atomic.Xadd(&runningPanicDefers, 1)
    // By calculating getcallerpc/getcallersp here, we avoid scanning the
    // gopanic frame (stack scanning is slow...)
    addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
    for {
        d := gp._defer
        if d == nil {
            break
        }
        // If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
        // take defer off list. An earlier panic will not continue running, but we will make sure below that an
        // earlier Goexit does continue running.
        if d.started {
            if d._panic != nil {
                d._panic.aborted = true
            }
            d._panic = nil
            if !d.openDefer {
                // For open-coded defers, we need to process the
                // defer again, in case there are any other defers
                // to call in the frame (not including the defer
                // call that caused the panic).
                d.fn = nil
                gp._defer = d.link
                freedefer(d)
                continue
            }
        }
        // Mark defer as started, but keep on list, so that traceback
        // can find and update the defer's argument frame if stack growth
        // or a garbage collection happens before reflectcall starts executing d.fn.
        d.started = true
        // Record the panic that is running the defer.
        // If there is a new panic during the deferred call, that panic
        // will find d in the list and will mark d._panic (this panic) aborted.
        d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
        done := true
        if d.openDefer {
            done = runOpenDeferFrame(gp, d)
            if done && !d._panic.recovered {
                addOneOpenDeferFrame(gp, 0, nil)
            }
        } else {
            p.argp = unsafe.Pointer(getargp(0))
            reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
        }
        p.argp = nil
        // reflectcall did not panic. Remove d.
        if gp._defer != d {
            throw("bad defer entry in panic")
        }
        d._panic = nil
        // trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
        //GC()
        pc := d.pc
        sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
        if done {
            d.fn = nil
            gp._defer = d.link
            freedefer(d)
        }
        if p.recovered {
            gp._panic = p.link
            if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
                // A normal recover would bypass/abort the Goexit.  Instead,
                // we return to the processing loop of the Goexit.
                gp.sigcode0 = uintptr(gp._panic.sp)
                gp.sigcode1 = uintptr(gp._panic.pc)
                mcall(recovery)
                throw("bypassed recovery failed") // mcall should not return
            }
            atomic.Xadd(&runningPanicDefers, -1)
            if done {
                // Remove any remaining non-started, open-coded
                // defer entries after a recover, since the
                // corresponding defers will be executed normally
                // (inline). Any such entry will become stale once
                // we run the corresponding defers inline and exit
                // the associated stack frame.
                d := gp._defer
                var prev *_defer
                for d != nil {
                    if d.openDefer {
                        if d.started {
                            // This defer is started but we
                            // are in the middle of a
                            // defer-panic-recover inside of
                            // it, so don't remove it or any
                            // further defer entries
                            break
                        }
                        if prev == nil {
                            gp._defer = d.link
                        } else {
                            prev.link = d.link
                        }
                        newd := d.link
                        freedefer(d)
                        d = newd
                    } else {
                        prev = d
                        d = d.link
                    }
                }
            }
            gp._panic = p.link
            // Aborted panics are marked but remain on the g.panic list.
            // Remove them from the list.
            for gp._panic != nil && gp._panic.aborted {
                gp._panic = gp._panic.link
            }
            if gp._panic == nil { // must be done with signal
                gp.sig = 0
            }
            // Pass information about recovering frame to recovery.
            gp.sigcode0 = uintptr(sp)
            gp.sigcode1 = pc
            mcall(recovery)
            throw("recovery failed") // mcall should not return
        }
    }
    // ran out of deferred calls - old-school panic now
    // Because it is unsafe to call arbitrary user code after freezing
    // the world, we call preprintpanics to invoke all necessary Error
    // and String methods to prepare the panic strings before startpanic.
    preprintpanics(gp._panic)
    fatalpanic(gp._panic) // should not return
    *(*int)(nil) = 0      // not reached
}

recovery sets the sp, pc and related fields of the current goroutine's sched and calls gogo to switch back into the context recorded in the defer.

// Unwind the stack after a deferred function calls recover
// after a panic. Then arrange to continue running as though
// the caller of the deferred function returned normally.
func recovery(gp *g) {
    // Info about defer passed in G struct.
    sp := gp.sigcode0
    pc := gp.sigcode1
    // d's arguments need to be in the stack.
    if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
        print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
        throw("bad recovery")
    }
    // Make the deferproc for this d return again,
    // this time returning 1. The calling function will
    // jump to the standard return epilogue.
    gp.sched.sp = sp
    gp.sched.pc = pc
    gp.sched.lr = 0
    gp.sched.ret = 1
    gogo(&gp.sched)
}

Why do the sp and pc taken from the defer lead back into the defer return path? Let's look at defer.

defer

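defer is a statement addressed to the compiler. It makes the compiler do two things:

Compile the defer statement into a call to runtime.deferproc(fn). At run time, deferproc links each deferred call onto the goroutine's defer list;
Insert a call to runtime.deferreturn before the function returns (before the actual return instruction, not before the return xxx statement as a whole, because the latter is not a single atomic instruction). At run time this starts processing the defers that were linked onto the defer list earlier.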
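The remark that deferred calls run after the result has been assigned but before the function actually returns has a visible consequence for named results. A minimal sketch; namedResult and unnamedResult are hypothetical names.

```go
package main

import "fmt"

// namedResult: "return 1" first assigns x = 1, then the deferred call
// increments x, and only then does the function return.
func namedResult() (x int) {
	defer func() { x++ }()
	return 1
}

// unnamedResult: the result is copied from x before the deferred call runs,
// so incrementing the local x afterwards does not change it.
func unnamedResult() int {
	x := 1
	defer func() { x++ }()
	return x
}

func main() {
	fmt.Println(namedResult(), unnamedResult()) // 2 1
}
```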

deferreturn

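deferreturn first checks whether the list holds any defer, then uses jmpdefer to run the deferred function. The unusual part of jmpdefer is that it arranges to return to the point just before the deferreturn call, so deferreturn runs again. If the defer list still has unprocessed entries the cycle repeats; once the list is empty, deferreturn returns and defer processing is over.

How recovery switches from the panic path back to normal execution.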

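Notes

1. To handle a panic with recover, the defer must be registered before the panic happens, and recover only works inside a function invoked by defer. Otherwise recover cannot catch the panic and cannot stop it from propagating.
2. After recover handles the panic, execution does not go back to the point of the panic; the function that registered the deferred call returns, and execution continues in its caller after that call.
3. Multiple defers form a defer stack: the defer statement registered last is called first.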
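The last-in, first-out order from note 3 can be checked with a few lines. This is a small sketch; deferOrder is a hypothetical name.

```go
package main

import "fmt"

// deferOrder registers three deferred calls and records the order they run in.
func deferOrder() []int {
	var got []int
	func() {
		for i := 0; i < 3; i++ {
			// The argument i is evaluated when the defer statement executes.
			defer func(n int) { got = append(got, n) }(i)
		}
	}()
	return got
}

func main() {
	fmt.Println(deferOrder()) // [2 1 0]
}
```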

func recovery(gp *g) {
    // Info about defer passed in G struct.
    sp := gp.sigcode0
    pc := gp.sigcode1
    // d's arguments need to be in the stack.
    if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
        print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
        throw("bad recovery")
    }
    // Make the deferproc for this d return again,
    // this time returning 1. The calling function will
    // jump to the standard return epilogue.
    gp.sched.sp = sp
    gp.sched.pc = pc
    gp.sched.lr = 0
    gp.sched.ret = 1
    gogo(&gp.sched)
}

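recovery places the pc and sp of the defer whose function called recover into the current goroutine's sched, sets ret to 1, and then calls gogo to resume execution there.

As the earlier code shows, a defer's pc and sp refer to the code right after its deferproc call, that is, the next defer statement or the normal code path. But if gogo simply resumes there, wouldn't execution end up back near the start of the function and run its body again?

It turns out that the code the compiler generates for a defer statement always checks the return value of deferproc. If it is 0, deferproc succeeded and execution continues with the next defer statement or the code that follows. If it is non-zero, execution jumps to the end of the current function, just before the return.

In other words, after gogo it looks as though deferproc has just returned. Because the return value is 1, the check fails and control goes straight to the deferreturn before the return, so defer processing is entered again.

The defer that called recover has already been removed from the defer list, so the remaining defers run, and finally the function returns to its caller.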
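The effect of this return path is observable without looking at the runtime: after a recover, the defers that were still pending in the same function run, the function returns, and its caller carries on. A sketch under that reading; withRecover and callerSteps are hypothetical names.

```go
package main

import "fmt"

// withRecover panics; its second-registered defer recovers, and the
// first-registered defer still runs afterwards.
func withRecover(steps *[]string) {
	defer func() { *steps = append(*steps, "remaining defer") }()
	defer func() {
		if recover() != nil {
			*steps = append(*steps, "recover")
		}
	}()
	panic("boom")
}

// callerSteps shows that the caller continues after withRecover returns.
func callerSteps() []string {
	var steps []string
	withRecover(&steps)
	steps = append(steps, "caller continues")
	return steps
}

func main() {
	fmt.Println(callerSteps()) // [recover remaining defer caller continues]
}
```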

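How a panic exits

When a panic is not recovered, the runtime prints the panic value and the goroutine's stack trace and finally calls exit(2), terminating the whole process; this matches the "exit status 2" shown earlier.

Note

recover only works inside a deferred function. If it is not called in a defer context, recover simply returns nil.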

