1 简述
jstack是jdk自带的线程堆栈分析工具,用于生成java虚拟机当前时刻的线程快照。线程快照是当前java虚拟机内每一条线程正在执行的方法堆栈的集合,生成线程快照的主要目的是定位线程出现长时间停顿的原因,如线程间死锁、死循环、请求外部资源导致的长时间等待等。 线程出现停顿的时候通过jstack来查看各个线程的调用堆栈,就可以知道没有响应的线程到底在后台做什么事情,或者等待什么资源。
[webuser@TESTENV hsperfdata_webuser]$ jstack -help
Usage:
jstack [-l] <pid>
(to connect to running process) 连接活动线程
jstack -F [-m] [-l] <pid>
(to connect to a hung process) 连接阻塞线程
jstack [-m] [-l] <executable> <core>
(to connect to a core file) 连接dump的文件
jstack [-m] [-l] [server_id@]<remote server IP or hostname>
(to connect to a remote debug server) 连接远程服务器
Options:
-F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
-m to print both java and native frames (mixed mode)
-l long listing. Prints additional information about locks
-h or -help to print this help message
- -F:当进程挂起了,此时 jstack [-l] pid 是没有相应的,这时候可使用此参数来强制打印堆栈信息,强制jstack,一般情况不需要使用。
- m:打印java和native c/c++框架的所有栈信息。可以打印JVM的堆栈,以及Native的栈帧,一般应用排查不需要使用。
- l: 打印关于锁的附加信息。
2 简单使用示例:
可以先通过jps获得Java进程
然后执行 jstack pid 查看指定进程的当前堆栈情况 ```shell [webuser@-TESTENV hsperfdata_webuser]$ jstack 23479 2021-04-22 21:16:08 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.161-b12 mixed mode):[webuser@-TESTENV hsperfdata_webuser]$ jps 23457 xx-test.jar 23479 xx-like.jar
“pool-2-thread-1” #45 prio=5 os_prio=0 tid=0x00007fdfc8b53000 nid=0x7a58 waiting on condition [0x00007fdfa7dfc000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at io.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:577) at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:476) at java.lang.Thread.run(Thread.java:748)
“qtp1383519982-259” #259 prio=5 os_prio=0 tid=0x00007fdfc0445000 nid=0x78c5 waiting on condition [0x00007fdfa1cd0000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c5afbbf0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:875)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:925)
at java.lang.Thread.run(Thread.java:748)
“qtp1383519982-258” #258 prio=5 os_prio=0 tid=0x00007fdfc0444000 nid=0x78c4 waiting on condition [0x00007fdfa1dd1000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c5afbbf0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:875)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:925)
at java.lang.Thread.run(Thread.java:748)
“qtp1383519982-257” #257 prio=5 os_prio=0 tid=0x00007fdfb8154800 nid=0x78c3 waiting on condition [0x00007fdfa1ed2000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c5afbbf0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:875)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:925)
at java.lang.Thread.run(Thread.java:748)
“Attach Listener” #230 daemon prio=9 os_prio=0 tid=0x00007fdfcc00b800 nid=0x75c0 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE
“logback-10” #84 daemon prio=5 os_prio=0 tid=0x00007fdfe0014800 nid=0x5dca waiting on condition [0x00007fdfa22d4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c515fb90> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
截出了部分打印内容
也可以将指定进程的当前堆栈情况记录到某个文件中:
```shell
jstack 23479 > /data/logs/jstack_info.txt
3 伪案例分析
在几乎没有什么调用的情况下,服务器cpu使用率一直居高不下,非常有可能时程序中的某个线程出现了死循环。cpu占用率过高,除了死循环,常见的可能还有 内存泄漏,导致大量full GC、主机cpu超卖。
3.1 死循环
运行下述代码,死循环,造成CPU占用较高的情况
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
/**
* Created by haoll on 2021/4/22
*/
public class Test {
private static Executor executor = Executors.newFixedThreadPool(2);
public static Object block = new Object();
public static void testMethod1(){
Task task1 = new Task();
executor.execute(task1);
}
static class Task implements Runnable{
@Override
public void run() {
synchronized (block){
add();
}
}
public void add(){
int i = 0;
while (true) {
i++;
}
}
}
}
排查开始~
第一步:使用top指令,定位CPU占用较高的进程
[webuser@-TESTENV middleplatform]$ top
top - 21:32:13 up 194 days, 17:42, 1 user, load average: 1.04, 1.91, 2.12
Tasks: 208 total, 2 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu(s): 53.3 us, 1.3 sy, 0.0 ni, 45.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3883016 total, 432172 free, 2665676 used, 785168 buff/cache
KiB Swap: 4063228 total, 3602380 free, 460848 used. 989076 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32687 webuser 20 0 3627128 604488 13824 S 103.3 15.6 7:13.00 java
32665 webuser 20 0 3631304 621724 13896 S 4.0 16.0 0:47.94 java
12 root 20 0 0 0 0 R 0.3 0.0 139:51.84 rcuos/0
871 root 20 0 0 0 0 S 0.3 0.0 0:00.16 kworker/1:1
10413 root 20 0 3575824 25420 772 S 0.3 0.7 387:11.80 java
32673 webuser 20 0 3670260 583704 13892 S 0.3 15.0 0:45.29 java
1 root 20 0 123508 2564 1392 S 0.0 0.1 72:25.10 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:07.44 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 10:36.92 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root rt 0 0 0 0 S 0.0 0.0 0:11.37 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/0
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/1
从上面的结果我们就可以看出来PID=32687的cpu占有率高的离谱<br />**第二步:使用top -Hp ${进程id}指令,查看指定进程下各个线程的cpu使用情况**
[webuser@CSDN-TESTENV middleplatform-interaction]$ top -Hp 32687
top - 21:34:36 up 194 days, 17:44, 1 user, load average: 2.23, 1.95, 2.10
Threads: 66 total, 3 running, 63 sleeping, 0 stopped, 0 zombie
%Cpu(s): 53.5 us, 1.9 sy, 0.0 ni, 44.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3883016 total, 429100 free, 2667968 used, 785948 buff/cache
KiB Swap: 4063228 total, 3602576 free, 460652 used. 986784 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
582 webuser 20 0 3627128 605856 13828 R 99.7 15.6 8:36.79 java
574 webuser 20 0 3627128 605856 13828 R 2.3 15.6 0:10.19 java
32720 webuser 20 0 3627128 605856 13828 S 1.0 15.6 0:22.43 java
572 webuser 20 0 3627128 605856 13828 S 0.7 15.6 0:02.07 java
576 webuser 20 0 3627128 605856 13828 S 0.3 15.6 0:00.95 java
611 webuser 20 0 3627128 605856 13828 S 0.3 15.6 0:00.27 java
630 webuser 20 0 3627128 605856 13828 S 0.3 15.6 0:00.69 java
632 webuser 20 0 3627128 605856 13828 S 0.3 15.6 0:00.76 java
633 webuser 20 0 3627128 605856 13828 R 0.3 15.6 0:00.69 java
32687 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.00 java
32696 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:14.69 java
32702 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.41 java
32706 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.38 java
32708 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.54 java
32710 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.00 java
32711 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.01 java
32716 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.00 java
32724 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:05.37 java
32728 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.04 java
32729 webuser 20 0 3627128 605856 13828 S 0.0 15.6 0:00.30 java sh
发现线程ID为582的cpu占用率为99.7%,继续分析~<br />注:PID虽然名为进程控制符,但其用途不限于特指进程id。如上中的PID列指的就是线程的id。<br />**第三步:使用printf "%x" xxx指令,将582转换为16进制。**
[webuser@-TESTENV middleplatform]$ printf "%x" 582
246
第四步:使用jstack pid指令,查看当前的堆栈信息;并根据上一步得到的16进制的线程id,找到肇事线程。如果线程较多的话,我们可以使用grep指定来根据关键字定位行,如:jstack 32687 | grep -10 246:显示【|】前面的指令打印出的信息中的,含有关键字【246】的那一行以及其前后各10行的数据
[webuser@-TESTENV middleplatform]$ jstack 32687 | grep -10 246
- parking to wait for <0x00000000ff5d1ee0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-3-thread-1" #64 prio=5 os_prio=0 tid=0x00007fec6094d800 nid=0x246 runnable [0x00007fec2b6ce000]
java.lang.Thread.State: RUNNABLE
at net.csdn.interaction.like.controller.Test$Task.add(Test.java:37)
at net.csdn.interaction.like.controller.Test$Task.run(Test.java:30)
- locked <0x00000000ff655cb0> (a java.lang.Object)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"sentinel-metric-aggregator-task-thread-1" #63 daemon prio=5 os_prio=0 tid=0x00007fec60822000 nid=0x244 waiting on condition [0x00007fec2b9cf000]
java.lang.Thread.State: TIMED_WAITING (parking)
--
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:875)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:925)
at java.lang.Thread.run(Thread.java:748)
第五步:分析肇事线程堆栈信息。
- pool-3-thread-1: 线程名。
- prio=5: 线程优先级。
- nid=0x246: 线程十六进制id。
- java.lang.Thread.State: RUNNABLE 线程状态。
- at net.csdn.interaction.like.controller.Test$Task.add(Test.java:37)线程当前所处方法。
第六步:我们已经定义到了问题代码所在的位置,这个时候去查看、解决问题即可。
3.2 死锁
import lombok.SneakyThrows;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
/**
* Created by haoll on 2021/4/23
*/
public class DeadlockDemo {
private static Executor executor = Executors.newFixedThreadPool(2);
private static final Object sync1 = new Object();
private static final Object sync2 = new Object();
static class MyRunnableOne implements Runnable{
@SneakyThrows
@Override
public void run() {
synchronized (sync1){
System.out.println("one线程~获得到锁1~");
Thread.sleep(1000);
synchronized (sync2){
System.out.println("one线程~获取到锁2~");
}
}
}
}
static class MyRunnableTwo implements Runnable{
@SneakyThrows
@Override
public void run() {
synchronized (sync2){
System.out.println("two线程~获得到锁2~");
Thread.sleep(1000);
synchronized (sync1){
System.out.println("two线程~获得到锁1~");
}
}
}
}
public static void deadLock() {
MyRunnableOne myRunnableOne = new MyRunnableOne();
MyRunnableTwo myRunnableTwo = new MyRunnableTwo();
executor.execute(myRunnableOne);
executor.execute(myRunnableTwo);
}
}
上述代码是伪代码,应该不会有人这样去写代码~,一般线上出现死锁的时候,都会造成线上应用卡顿,但卡顿不一定就是死锁造成,要具体问题具体分析。
排查开始~
第一步:使用jps指令,查看所有java进程。
第二步:使用jstack pid指令,查看指定进程的堆栈信息,观察并定位到死锁线程
[webuser@-TESTENV interaction-like]$ jstack 12484 | grep -10 BLOCKED
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-3-thread-2" #67 prio=5 os_prio=0 tid=0x00007fb7985c4800 nid=0x3210 waiting for monitor entry [0x00007fb75c4e3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at net.csdn.interaction.like.controller.DeadlockDemo$MyRunnableTwo.run(DeadlockDemo.java:46)
- waiting to lock <0x00000000c6e7e6e0> (a java.lang.Object)
- locked <0x00000000c6e7e6f0> (a java.lang.Object)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-3-thread-1" #66 prio=5 os_prio=0 tid=0x00007fb7985c3800 nid=0x320f waiting for monitor entry [0x00007fb75c5e4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at net.csdn.interaction.like.controller.DeadlockDemo$MyRunnableOne.run(DeadlockDemo.java:29)
- waiting to lock <0x00000000c6e7e6f0> (a java.lang.Object)
- locked <0x00000000c6e7e6e0> (a java.lang.Object)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"sentinel-metric-aggregator-task-thread-1" #65 daemon prio=5 os_prio=0 tid=0x00007fb7982df800 nid=0x320d waiting on condition [0x00007fb75c8e5000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
这样发生死锁的线程就定位到了。我们可以通过:设置获取锁的超时时间、死锁检测等手段预防死锁发生。但是如果一旦发生了死锁就没法解开了,只能停掉程序,修复bug后再启动服务了。