Overview
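- panic changes a program's control flow. When a function calls panic, it immediately stops executing the rest of its code. It then runs deferred calls (defer) in the current goroutine: first those of the panicking function, then those of each caller in turn, working up the stack;
- recover stops the program crash caused by a panic. It only has an effect when called from a deferred function; calling it anywhere else does nothing.

The current goroutine holds the head pointer of a defer linked list. It also holds the head pointer of a panic linked list. The panic list links together _panic structures, in the same way the defer list links _defer structures. When a new panic occurs, a new _panic structure is inserted at the head of the list, so the panic at the head is always the one currently running.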
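The head insertion described above can be sketched with a toy linked list. This is a minimal model under stated assumptions: panicRecord and goroutine are hypothetical types, not the runtime's own. They keep only the value passed to panic and the link to the older record.

```go
package main

import "fmt"

// panicRecord is a hypothetical, simplified stand-in for runtime._panic:
// only the panic value and the link to the older panic are kept.
type panicRecord struct {
	arg  any
	link *panicRecord
}

// goroutine is a hypothetical stand-in for the runtime's g struct; it only
// keeps the head of the panic list.
type goroutine struct {
	panicHead *panicRecord
}

// pushPanic inserts a new record at the head of the list, so the head is
// always the panic currently being handled.
func (g *goroutine) pushPanic(arg any) {
	g.panicHead = &panicRecord{arg: arg, link: g.panicHead}
}

// panicArgs walks the list from the newest panic to the oldest one.
func (g *goroutine) panicArgs() []any {
	var out []any
	for p := g.panicHead; p != nil; p = p.link {
		out = append(out, p.arg)
	}
	return out
}

func main() {
	g := &goroutine{}
	g.pushPanic("first")
	g.pushPanic("second") // e.g. a panic raised inside a deferred call
	fmt.Println(g.panicArgs()) // [second first]
}
```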
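Behavior
- panic only triggers the deferred calls of the current goroutine;
- recover only takes effect when it is called inside a deferred function;
- panic can be called again (nested) inside a deferred function.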
recover does not cross goroutines
package main

import "time"

func main() {
	defer println("in main")
	go func() {
		defer println("in goroutine")
		panic("")
	}()
	time.Sleep(1 * time.Second)
}
$ go run main.go
in goroutine
panic:
...
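The defer keyword compiles to a call to runtime.deferproc, which associates the deferred function with the caller's goroutine. It is therefore reasonable that a crash only runs the deferred calls of the current goroutine. This is why "in goroutine" is printed but "in main" is not: the panic in the child goroutine terminates the whole process before main's defer can run.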
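Because the defer chain belongs to one goroutine, a panic in a child goroutine has to be recovered inside that goroutine. Below is a minimal sketch of that pattern. runIsolated is a hypothetical helper name, not a standard library function.

```go
package main

import (
	"fmt"
	"sync"
)

// runIsolated (hypothetical helper) runs fn in a new goroutine and recovers
// there, because a deferred recover in the parent goroutine never sees it.
func runIsolated(fn func()) (recovered any) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()                          // runs second
		defer func() { recovered = recover() }() // runs first, same goroutine as the panic
		fn()
	}()
	wg.Wait() // wg.Wait orders the write to recovered before this read
	return recovered
}

func main() {
	r := runIsolated(func() { panic("in goroutine") })
	fmt.Println("main still running, recovered:", r)
}
```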
Crash recovery that does not work
package main

import "fmt"

func main() {
	defer fmt.Println("in main")
	if err := recover(); err != nil {
		fmt.Println(err)
	}
	panic("unknown err")
}
$ go run main.go
in main
panic: unknown err
goroutine 1 [running]:
main.main()
...
exit status 2
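In the failing example, recover is called before panic and not from a deferred function, so it returns nil and the crash proceeds. Below is a minimal sketch of the working pattern, where the recover is registered in a deferred function before the panic happens. safeRun is an illustrative name.

```go
package main

import "fmt"

// safeRun registers the recover in a deferred function before panicking,
// so the deferred call sees the panic and stops it.
func safeRun() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	panic("unknown err")
}

func main() {
	defer fmt.Println("in main") // now runs, because the panic was recovered
	fmt.Println(safeRun())
}
```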
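In the Go source code, the panic keyword is represented by the data structure runtime._panic. Each call to panic creates a structure like the following to store the related information: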
type _panic struct {
argp unsafe.Pointer
arg interface{}
link *_panic
recovered bool
aborted bool
pc uintptr
sp unsafe.Pointer
goexit bool
}
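- argp points to the arguments of the deferred call being run during the panic;
- arg is the value passed to panic;
- link points to the runtime._panic structure of the earlier panic;
- recovered indicates whether this runtime._panic has been recovered by recover;
- aborted indicates whether this panic was forcibly aborted, for example by a newer panic raised inside one of its deferred calls;
- pc, sp and goexit are used when a panic is bypassed, mainly so that runtime.Goexit keeps working correctly.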
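Program crash
How does panic terminate the program? The compiler turns the panic keyword into a call to runtime.gopanic, which performs the following steps:
- Create a new runtime._panic structure and add it to the front of the goroutine's _panic list;
- In a loop, repeatedly take a runtime._defer entry from the current goroutine's _defer list and run the deferred function with runtime.reflectcall;
- Call runtime.fatalpanic to abort the whole program.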
func gopanic(e interface{}) {
gp := getg()
...
var p _panic
p.arg = e
p.link = gp._panic
gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
for {
d := gp._defer
if d == nil {
break
}
d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
d._panic = nil
d.fn = nil
gp._defer = d.link
freedefer(d)
if p.recovered {
...
}
}
fatalpanic(gp._panic)
*(*int)(nil) = 0
}
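Internally, panic mainly proceeds as follows:
- Get the g (that is, the goroutine) of the current caller.
- Iterate over the deferred functions in g and run them.
- If a deferred function calls recover and finds that a panic is in progress, the panic is marked as recovered.
- While iterating over the defers, once the panic is marked as recovered, the runtime takes the sp and pc of that defer and saves them in two status fields of g (sigcode0 and sigcode1).
- Call runtime.mcall to switch to m->g0 and jump to the recovery function, passing the g obtained earlier as its argument. The code of runtime.mcall is in src/runtime/asm_xxx.s in the Go source tree, where xxx is the platform, such as amd64.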
Common situations that cause a panic
1. Index out of range
package main

import "fmt"

func f() {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println(err)
		}
	}()
	var bar = []int{1}
	fmt.Println(bar[1])
}

func main() {
	f()
	fmt.Println("exit")
}
2. Dereferencing an uninitialized (nil) pointer
package main

import (
	"fmt"
)

func foo() {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println(err)
		}
	}()
	var b *int
	fmt.Println(*b)
}

func main() {
	foo()
	fmt.Println("exit")
}
3. Sending on a closed chan
package main

import "fmt"

func foo() {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println(err)
		}
	}()
	var bar = make(chan int, 1)
	close(bar)
	bar <- 1
}

func main() {
	foo()
	fmt.Println("exit")
}
Concurrent reads and writes to the same map
package main
import "fmt"
func foo() {
defer func() {
if err := recover(); err != nil {
fmt.Println(err)
}
}()
var bar = make(map[int]int)
go func() {
defer func() {
if err := recover(); err != nil {
fmt.Println(err)
}
}()
for {
bar[1] = 1
}
}()
for {
bar[1] = 1
}
}
func main() {
foo()
fmt.Println("exit")
}
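Concurrent map access is not an ordinary panic. The runtime reports "fatal error: concurrent map writes" (or "concurrent map read and map write") and terminates the process, so the recover calls in the example above cannot catch it. Wherever a map is read and written concurrently, it must be protected with a lock.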
4. Failed type assertion
package main
import (
"fmt"
)
func foo(){
defer func(){
if err := recover(); err != nil {
fmt.Println(err)
}
}()
var i interface{} = "abc"
_ = i.([]string)
}
func main(){
foo()
fmt.Println("exit")
}
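The single-value assertion i.([]string) panics when the dynamic type does not match. The two-value (comma-ok) form reports the mismatch through a boolean instead, so no recover is needed. Below is a short sketch; asStrings is an illustrative name.

```go
package main

import "fmt"

// asStrings uses the two-value form of a type assertion: on a mismatch it
// returns ok == false instead of panicking.
func asStrings(i any) ([]string, bool) {
	s, ok := i.([]string)
	return s, ok
}

func main() {
	var i any = "abc"
	if _, ok := asStrings(i); !ok {
		fmt.Println("not a []string")
	}
}
```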
How recover is implemented
func gorecover(argp uintptr) interface{} {
// Must be in a function running as part of a deferred call during the panic.
// Must be called from the topmost function of the call
// (the function used in the defer statement).
// p.argp is the argument pointer of that topmost deferred function call.
// Compare against argp reported by caller.
// If they match, the caller is the one who can recover.
gp := getg()
p := gp._panic
if p != nil && !p.recovered && argp == uintptr(p.argp) {
p.recovered = true
return p.arg
}
return nil
}
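recover first checks whether gp (the current goroutine) is in the middle of a panic; if not, it returns nil immediately. It also compares argp with the argument pointer recorded in the panic, so only the function called directly by the defer statement can recover. Calling recover on the normal path therefore costs a little CPU and has no other effect.
If the current goroutine is indeed panicking, recovered is set to true. In the panic flow, gopanic checks the recovered flag after each deferred call returns. If the flag is true, gopanic leaves the panic flow and normal execution resumes.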
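recover only sets the recovered flag, so how does the goroutine get back from the panic?
Look at gopanic. After reflectcall has run the deferred function that called recover, gopanic saves that defer's sp and pc into gp.sigcode0 and gp.sigcode1. It then calls recovery through mcall: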
func gopanic(e interface{}) {
gp := getg()
if gp.m.curg != gp {
print("panic: ")
printany(e)
print("\n")
throw("panic on system stack")
}
if gp.m.mallocing != 0 {
print("panic: ")
printany(e)
print("\n")
throw("panic during malloc")
}
if gp.m.preemptoff != "" {
print("panic: ")
printany(e)
print("\n")
print("preempt off reason: ")
print(gp.m.preemptoff)
print("\n")
throw("panic during preemptoff")
}
if gp.m.locks != 0 {
print("panic: ")
printany(e)
print("\n")
throw("panic holding locks")
}
var p _panic
p.arg = e
p.link = gp._panic
gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
atomic.Xadd(&runningPanicDefers, 1)
// By calculating getcallerpc/getcallersp here, we avoid scanning the
// gopanic frame (stack scanning is slow...)
addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
for {
d := gp._defer
if d == nil {
break
}
// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
// take defer off list. An earlier panic will not continue running, but we will make sure below that an
// earlier Goexit does continue running.
if d.started {
if d._panic != nil {
d._panic.aborted = true
}
d._panic = nil
if !d.openDefer {
// For open-coded defers, we need to process the
// defer again, in case there are any other defers
// to call in the frame (not including the defer
// call that caused the panic).
d.fn = nil
gp._defer = d.link
freedefer(d)
continue
}
}
// Mark defer as started, but keep on list, so that traceback
// can find and update the defer's argument frame if stack growth
// or a garbage collection happens before reflectcall starts executing d.fn.
d.started = true
// Record the panic that is running the defer.
// If there is a new panic during the deferred call, that panic
// will find d in the list and will mark d._panic (this panic) aborted.
d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
done := true
if d.openDefer {
done = runOpenDeferFrame(gp, d)
if done && !d._panic.recovered {
addOneOpenDeferFrame(gp, 0, nil)
}
} else {
p.argp = unsafe.Pointer(getargp(0))
reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
}
p.argp = nil
// reflectcall did not panic. Remove d.
if gp._defer != d {
throw("bad defer entry in panic")
}
d._panic = nil
// trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
//GC()
pc := d.pc
sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
if done {
d.fn = nil
gp._defer = d.link
freedefer(d)
}
if p.recovered {
gp._panic = p.link
if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
// A normal recover would bypass/abort the Goexit. Instead,
// we return to the processing loop of the Goexit.
gp.sigcode0 = uintptr(gp._panic.sp)
gp.sigcode1 = uintptr(gp._panic.pc)
mcall(recovery)
throw("bypassed recovery failed") // mcall should not return
}
atomic.Xadd(&runningPanicDefers, -1)
if done {
// Remove any remaining non-started, open-coded
// defer entries after a recover, since the
// corresponding defers will be executed normally
// (inline). Any such entry will become stale once
// we run the corresponding defers inline and exit
// the associated stack frame.
d := gp._defer
var prev *_defer
for d != nil {
if d.openDefer {
if d.started {
// This defer is started but we
// are in the middle of a
// defer-panic-recover inside of
// it, so don't remove it or any
// further defer entries
break
}
if prev == nil {
gp._defer = d.link
} else {
prev.link = d.link
}
newd := d.link
freedefer(d)
d = newd
} else {
prev = d
d = d.link
}
}
}
gp._panic = p.link
// Aborted panics are marked but remain on the g.panic list.
// Remove them from the list.
for gp._panic != nil && gp._panic.aborted {
gp._panic = gp._panic.link
}
if gp._panic == nil { // must be done with signal
gp.sig = 0
}
// Pass information about recovering frame to recovery.
gp.sigcode0 = uintptr(sp)
gp.sigcode1 = pc
mcall(recovery)
throw("recovery failed") // mcall should not return
}
}
// ran out of deferred calls - old-school panic now
// Because it is unsafe to call arbitrary user code after freezing
// the world, we call preprintpanics to invoke all necessary Error
// and String methods to prepare the panic strings before startpanic.
preprintpanics(gp._panic)
fatalpanic(gp._panic) // should not return
*(*int)(nil) = 0 // not reached
}
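In recovery, the runtime sets the sp, pc and related fields of the current goroutine's sched. It then calls gogo to switch back to the context saved in the defer, which is the point right after the corresponding deferproc call.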
// Unwind the stack after a deferred function calls recover
// after a panic. Then arrange to continue running as though
// the caller of the deferred function returned normally.
func recovery(gp *g) {
// Info about defer passed in G struct.
sp := gp.sigcode0
pc := gp.sigcode1
// d's arguments need to be in the stack.
if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
throw("bad recovery")
}
// Make the deferproc for this d return again,
// this time returning 1. The calling function will
// jump to the standard return epilogue.
gp.sched.sp = sp
gp.sched.pc = pc
gp.sched.lr = 0
gp.sched.ret = 1
gogo(&gp.sched)
}
Why do the sp and pc taken from the defer lead back into the function's return path? To answer this, we need to look at how defer works.
defer
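defer is a statement aimed at the compiler; it makes the compiler do two things:
- The compiler compiles the defer statement into a call to runtime.deferproc(fn). At run time, deferproc attaches each defer to the goroutine's defer list. This applies to defers that are not open-coded; since Go 1.14 many defers are open-coded and run inline at function exit, which is what the openDefer branches in gopanic handle.
- The compiler inserts a call to runtime.deferreturn before the function returns. Note that the call goes right before the return instruction itself, not before "return xxx", because "return xxx" is not a single atomic instruction: the result is assigned first, then deferreturn runs, then the function returns. At run time, deferreturn processes all the defers previously attached to the defer list.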
deferreturn
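deferreturn first checks whether the list contains a defer, then calls jmpdefer to run the function the defer statement registered. The clever part of jmpdefer is that, when the deferred function returns, control goes back to just before the call to deferreturn, so deferreturn runs again. If unprocessed defers remain on the list, the cycle repeats; once the list is empty, deferreturn returns and defer processing is finished.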
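The push-at-head and run-until-empty behavior can be sketched with a toy linked list. deferRecord, deferList, deferProc and deferReturn are hypothetical names. This is not the runtime's code: the re-entry trick of jmpdefer is replaced by an ordinary loop, and each entry is unlinked before it is called, roughly as deferreturn does.

```go
package main

import "fmt"

// deferRecord is a hypothetical stand-in for runtime._defer.
type deferRecord struct {
	fn   func()
	link *deferRecord
}

// deferList is a hypothetical per-goroutine defer list.
type deferList struct{ head *deferRecord }

// deferProc pushes a new record at the head, like runtime.deferproc.
func (l *deferList) deferProc(fn func()) {
	l.head = &deferRecord{fn: fn, link: l.head}
}

// deferReturn pops and runs records until the list is empty.
func (l *deferList) deferReturn() {
	for l.head != nil {
		d := l.head
		l.head = d.link // unlink before calling
		d.fn()
	}
}

func main() {
	var l deferList
	var order []int
	for i := 1; i <= 3; i++ {
		i := i // capture a per-iteration copy
		l.deferProc(func() { order = append(order, i) })
	}
	l.deferReturn()
	fmt.Println(order) // [3 2 1]
}
```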
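How recovery switches from the panic flow back to the normal flow
Notes
1. To handle a panic with recover, the defer must be registered before the panic happens, and recover only works inside the function called by defer. Otherwise recover cannot catch the panic and cannot stop it from spreading.
2. After recover handles the panic, execution does not resume at the point of the panic. The function whose deferred call recovered finishes its remaining deferred calls and then returns normally to its caller.
3. Multiple defers form a stack: the defer statement registered last is called first.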
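Note 2 can be observed with a short sketch; divide and reachedEnd are illustrative names. When b is 0, the statement after the division never runs. The deferred function reports the panic through the named result err, and divide returns to its caller.

```go
package main

import "fmt"

// divide recovers from the runtime panic and reports it through the named
// result err. Execution never returns to the line after the division.
func divide(a, b int) (q int, reachedEnd bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	q = a / b         // panics with a runtime error when b == 0
	reachedEnd = true // skipped when the division panics
	return q, reachedEnd, nil
}

func main() {
	fmt.Println(divide(6, 3))
	fmt.Println(divide(1, 0))
}
```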
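recovery takes the pc and sp of the defer whose function called recover and writes them into the current goroutine's sched. It sets ret to 1 and then calls gogo to resume execution.
As the code shows, a defer's pc and sp refer to the code right after deferproc, that is, the next defer statement or the normal flow. But if gogo resumes there, wouldn't the function simply carry on after that defer statement as if nothing had happened?
The answer is that the compiler, for every defer statement, always generates a check of deferproc's return value. If the value is 0, deferproc succeeded, and execution continues with the next defer statement or the following code. If it is non-zero, execution jumps to the end of the current function, just before return.
In other words, after gogo resumes the goroutine, it is as if deferproc had just returned. Since the return value is 1, the check fails and execution jumps straight to the deferreturn before return, so the defer flow is entered again.
The defer that called recover has already been removed from the defer list. The remaining unfinished defers can therefore run, and the function finally returns to its caller.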
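The visible effect is easy to check. Below is a small sketch; run is an illustrative name. A defer registered earlier than the recovering one still runs after the recovery, and then the caller continues normally.

```go
package main

import "fmt"

// run panics after registering two defers. The later defer recovers; the
// earlier defer still runs afterwards, then run returns to its caller.
func run(log *[]string) {
	defer func() { *log = append(*log, "earlier defer") }()
	defer func() {
		if r := recover(); r != nil {
			*log = append(*log, fmt.Sprint("recovered: ", r))
		}
	}()
	*log = append(*log, "before panic")
	panic("boom")
}

func main() {
	var log []string
	run(&log)
	log = append(log, "caller continues")
	fmt.Println(log)
}
```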
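How a panic exits
When a panic is not recovered, the runtime prints the panic value and the goroutine call stacks. It finally calls exit(2) to terminate the whole process, which is why the earlier output ends with "exit status 2".
Note
recover is only effective inside a deferred function; if it is not called in a defer context, it simply returns nil.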