Common Causes of Hung Java Threads

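Many different problems can leave Java threads hung, until the application can no longer serve requests. Adding a suitable timeout usually mitigates the problem. Here are some examples.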

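  1. Deadlock
    This is very common. If only synchronized (monitor) locks are involved, you can usually find it just by reading a thread dump. If the cycle mixes synchronized locks with AQS-based locks, or involves an external resource such as a database, it is much less obvious.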
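
Besides reading a thread dump by hand, the JVM can be asked for deadlock cycles directly. The following is a minimal sketch using the standard `ThreadMXBean`; the class name `DeadlockProbe`, the helper methods, and the thread names `probe-a` / `probe-b` are made up for illustration. The demo deliberately builds the mixed case mentioned above: one synchronized monitor plus one `ReentrantLock` (which is AQS-based).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

final class DeadlockProbe {

    /** Names of threads the JVM currently reports as deadlocked (empty if none). */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Unlike findMonitorDeadlockedThreads(), this call also follows
        // ownable synchronizers such as ReentrantLock.
        long[] ids = mx.findDeadlockedThreads();
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names; // no deadlock cycle found
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread ended in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Hypothetical demo: deadlocks two daemon threads on purpose, then probes. */
    static List<String> provokeAndDetect() throws InterruptedException {
        final Object monitor = new Object();
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread a = new Thread(() -> {
            synchronized (monitor) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                lock.lock(); // never returns: probe-b owns the lock
            }
        }, "probe-a");
        Thread b = new Thread(() -> {
            lock.lock();
            bothHoldFirst.countDown();
            awaitQuietly(bothHoldFirst);
            synchronized (monitor) { // never entered: probe-a owns the monitor
            }
        }, "probe-b");
        a.setDaemon(true); // daemon threads do not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();

        // Poll for up to about 5 seconds until both threads are blocked.
        for (int i = 0; i < 100; i++) {
            List<String> names = deadlockedThreadNames();
            if (!names.isEmpty()) {
                return names;
            }
            Thread.sleep(50);
        }
        return new ArrayList<>();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A probe like this only sees locks the JVM knows about. A cycle that runs through an external resource, such as a database row lock, will not show up here.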

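  2. Leaked critical resources
    When a critical resource leaks, every later thread that needs it is left waiting at the door. Putting a timeout on that wait may bring short-term relief, but if the resource sits on a path every request must take, the business eventually grinds to a halt anyway.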

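    1. Connection pool leaks
      It is common to see connections that are not returned to the pool after use, or errors that go uncaught so the connection never makes it back. Some frameworks can detect that a connection is no longer referenced and reclaim it automatically. Here is an example of a thread stuck waiting for a pooled connection: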
"DefaultThreadPool-42" daemon prio=10 tid=0x00007fd3fc339000 nid=0x1776 waiting on condition [0x00007fd3c41c4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000783800070> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:133)
    at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:282)
    at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:64)
    at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:177)
    at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:170)
    at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102)
    at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:208)
    at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
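
The two usual defenses against this kind of leak are returning the resource in a `finally` block and bounding the wait when borrowing. The sketch below illustrates both with a toy pool built on a `BlockingQueue`. `TinyPool`, `withResource`, and `available` are hypothetical names for illustration only; this is not how Apache HttpClient's pool is implemented.

```java
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(Collection<T> resources) {
        this.idle = new LinkedBlockingQueue<>(resources);
    }

    /**
     * Borrows a resource, runs the work, and always gives the resource back.
     * Waits at most the given timeout for a free resource instead of parking forever.
     */
    <R> R withResource(long timeout, TimeUnit unit, Function<T, R> work)
            throws InterruptedException, TimeoutException {
        T resource = idle.poll(timeout, unit); // null if nothing became free in time
        if (resource == null) {
            throw new TimeoutException("no pooled resource became available");
        }
        try {
            return work.apply(resource);
        } finally {
            // Runs on normal return and on any exception, so nothing leaks.
            idle.offer(resource);
        }
    }

    /** Number of resources currently sitting idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

With this shape, a caller cannot forget to return the resource, and an exhausted pool surfaces as a `TimeoutException` rather than as a pile of parked threads.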
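    1. AQS resource leaks
      Many of Java's concurrency utilities are built on AQS (AbstractQueuedSynchronizer). If an AQS-backed resource is not released in time, the threads that need it wait forever. The stack of such a waiting thread looks like this: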
"DefaultThreadPool-2" daemon prio=10 tid=0x00007efd841c9800 nid=0x4b25 waiting on condition [0x00007efcfd6d3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007affa8a38> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
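
For a `CountDownLatch` like the one in the dump above, a common cause is a worker that throws before it reaches `countDown()`. A minimal sketch of the usual remedy follows: count down in `finally`, and use the timed `await` so the waiter can give up. The class `LatchGuard` and the method `runAll` are hypothetical names chosen for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchGuard {

    /**
     * Runs each task on its own daemon thread and waits for all of them.
     * Returns true if every task finished (normally or by throwing) within
     * the timeout, false if the wait gave up first.
     */
    static boolean runAll(Runnable[] tasks, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks.length);
        for (Runnable task : tasks) {
            Thread worker = new Thread(() -> {
                try {
                    task.run();
                } finally {
                    // Released even when the task throws, so the waiter is not stranded.
                    done.countDown();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
        // Timed await: returns false instead of parking forever.
        return done.await(timeout, unit);
    }
}
```

The same pattern applies to other AQS-based utilities, for example releasing a `Semaphore` permit or unlocking a `ReentrantLock` in `finally`.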
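    1. Blocked TCP connections
      A lot of Java code still uses blocking I/O (BIO). Without a read timeout, a read can sometimes wait forever. The thread stack looks like this (the JVM reports RUNNABLE because the thread is blocked inside a native call):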
"DefaultThreadPool-52" daemon prio=10 tid=0x00007f18a0015000 nid=0x7739 runnable [0x00007f1822a7b000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:153)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    - locked <0x00000007ab10de98> (a java.io.BufferedInputStream)
    at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:89)
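
For plain sockets, the read timeout is set with `Socket.setSoTimeout`. The sketch below shows the effect against a local server that accepts the connection but never sends anything; `ReadTimeoutDemo` and `readTimesOut` are hypothetical names for this example. Libraries that open sockets on your behalf usually expose their own setting for this; JavaMail, which appears in the dump above, has per-protocol properties such as `mail.smtp.timeout` and `mail.smtp.connectiontimeout`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {

    /** Returns true if the blocking read gave up with SocketTimeoutException. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // A silent peer: the OS completes the TCP handshake via the listen
        // backlog, but nobody ever writes a byte to the connection.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            client.connect(
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()),
                    timeoutMillis); // connect timeout
            client.setSoTimeout(timeoutMillis); // read timeout; 0 would mean wait forever
            InputStream in = client.getInputStream();
            try {
                in.read(); // without setSoTimeout this would sit in socketRead0
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }
}
```

After a `SocketTimeoutException` the socket is still valid, so the caller has to decide whether to retry the read or close the connection.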
