MySQL Druid
一些老系统 MySQL 驱动版本较低,存在一些危险漏洞,于是把一些老应用的 MySQL 驱动程序升级到了 8.0。然而升级后,在并发量较高时,查看监控打点,Druid 连接池拿到连接并执行 SQL 的时间大部分都超过 200ms。
本文详细的分析了这次“破案”的全过程。
对系统进行压测,发现出现大量线程阻塞的情况,线程 dump 信息如下:

  1. "http-nio-5366-exec-48" #210 daemon prio=5 os_prio=0 tid=0x00000000023d0800 nid=0x3be9 waiting for monitor entry [0x00007fa4c1400000]
  2. java.lang.Thread.State: BLOCKED (on object monitor)
  3. at org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.loadClass(TomcatEmbeddedWebappClassLoader.java:66)
  4. - waiting to lock <0x0000000775af0960> (a java.lang.Object)
  5. at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
  6. at com.alibaba.druid.util.Utils.loadClass(Utils.java:220)
  7. at com.alibaba.druid.util.MySqlUtils.getLastPacketReceivedTimeMs(MySqlUtils.java:372)

通过上面的信息得知,有线程 BLOCKED 了,BLOCKED 的位置在 com.alibaba.druid.util.Utils.loadClass(Utils.java:220),于是重点的查看这一块的代码,最终发现了问题。

2021-06-16-09-45-44-158938.png


根本原因分析

  1. public class MySqlUtils {
  2. public static long getLastPacketReceivedTimeMs(Connection conn) throws SQLException {
  3. if (class_connectionImpl == null && !class_connectionImpl_Error) {
  4. try {
  5. class_connectionImpl = Utils.loadClass("com.mysql.jdbc.MySQLConnection");
  6. } catch (Throwable error){
  7. class_connectionImpl_Error = true;
  8. }
  9. }
  10. if (class_connectionImpl == null) {
  11. return -1;
  12. }
  13. if (method_getIO == null && !method_getIO_error) {
  14. try {
  15. method_getIO = class_connectionImpl.getMethod("getIO");
  16. } catch (Throwable error){
  17. method_getIO_error = true;
  18. }
  19. }
  20. if (method_getIO == null) {
  21. return -1;
  22. }
  23. if (class_MysqlIO == null && !class_MysqlIO_Error) {
  24. try {
  25. class_MysqlIO = Utils.loadClass("com.mysql.jdbc.MysqlIO");
  26. } catch (Throwable error){
  27. class_MysqlIO_Error = true;
  28. }
  29. }
  30. if (class_MysqlIO == null) {
  31. return -1;
  32. }
  33. if (method_getLastPacketReceivedTimeMs == null && !method_getLastPacketReceivedTimeMs_error) {
  34. try {
  35. Method method = class_MysqlIO.getDeclaredMethod("getLastPacketReceivedTimeMs");
  36. method.setAccessible(true);
  37. method_getLastPacketReceivedTimeMs = method;
  38. } catch (Throwable error){
  39. method_getLastPacketReceivedTimeMs_error = true;
  40. }
  41. }
  42. if (method_getLastPacketReceivedTimeMs == null) {
  43. return -1;
  44. }
  45. try {
  46. Object connImpl = conn.unwrap(class_connectionImpl);
  47. if (connImpl == null) {
  48. return -1;
  49. }
  50. Object mysqlio = method_getIO.invoke(connImpl);
  51. Long ms = (Long) method_getLastPacketReceivedTimeMs.invoke(mysqlio);
  52. return ms.longValue();
  53. } catch (IllegalArgumentException e) {
  54. throw new SQLException("getLastPacketReceivedTimeMs error", e);
  55. } catch (IllegalAccessException e) {
  56. throw new SQLException("getLastPacketReceivedTimeMs error", e);
  57. } catch (InvocationTargetException e) {
  58. throw new SQLException("getLastPacketReceivedTimeMs error", e);
  59. }
  60. }
  61. }

MySqlUtils 中的getLastPacketReceivedTimeMs()方法会加载com.mysql.jdbc.MySQLConnection这个类,但在 MySQL 驱动 8.0 中类名改为com.mysql.cj.jdbc.ConnectionImpl,所以 MySQL 驱动8.0 中加载不到com.mysql.jdbc.MySQLConnection
getLastPacketReceivedTimeMs()方法实现中,如果Utils.loadClass("com.mysql.jdbc.MySQLConnection")加载不到类并抛出异常,会修改变量 class_connectionImpl_Error,下次调用不会再进行加载

  1. public class Utils {
  2. public static Class<?> loadClass(String className) {
  3. Class<?> clazz = null;
  4. if (className == null) {
  5. return null;
  6. }
  7. try {
  8. return Class.forName(className);
  9. } catch (ClassNotFoundException e) {
  10. // skip
  11. }
  12. ClassLoader ctxClassLoader = Thread.currentThread().getContextClassLoader();
  13. if (ctxClassLoader != null) {
  14. try {
  15. clazz = ctxClassLoader.loadClass(className);
  16. } catch (ClassNotFoundException e) {
  17. // skip
  18. }
  19. }
  20. return clazz;
  21. }
  22. }

但是,在 Utils 的loadClass()方法中同样 catch了ClassNotFoundException,这就导致loadClass()在加载不到类的时候,并不会抛出异常,从而会导致每调用一次getLastPacketReceivedTimeMs()方法,就会加载一次 MySQLConnection这个类
线程 dump 信息中可以看到是在调用 TomcatEmbeddedWebappClassLoaderloadClass()方法时,导致线程阻塞的。

  1. public class TomcatEmbeddedWebappClassLoader extends ParallelWebappClassLoader {
  2. public Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
  3. synchronized (JreCompat.isGraalAvailable() ? this : getClassLoadingLock(name)) {
  4. Class<?> result = findExistingLoadedClass(name);
  5. result = (result != null) ? result : doLoadClass(name);
  6. if (result == null) {
  7. throw new ClassNotFoundException(name);
  8. }
  9. return resolveIfNecessary(result, resolve);
  10. }
  11. }
  12. }

这是因为 TomcatEmbeddedWebappClassLoader 在加载类的时候,会加synchronized锁,这就导致每调用一次getLastPacketReceivedTimeMs()方法,就会加载一次com.mysql.jdbc.MySQLConnection,而又始终加载不到,在加载类的时候会加 synchronized 锁,所以会出现线程阻塞,性能下降的现象。

getLastPacketReceivedTimeMs()方法调用时机

  1. public abstract class DruidAbstractDataSource extends WrapperAdapter implements DruidAbstractDataSourceMBean, DataSource, DataSourceProxy, Serializable {
  2. protected boolean testConnectionInternal(DruidConnectionHolder holder, Connection conn) {
  3. String sqlFile = JdbcSqlStat.getContextSqlFile();
  4. String sqlName = JdbcSqlStat.getContextSqlName();
  5. if (sqlFile != null) {
  6. JdbcSqlStat.setContextSqlFile(null);
  7. }
  8. if (sqlName != null) {
  9. JdbcSqlStat.setContextSqlName(null);
  10. }
  11. try {
  12. if (validConnectionChecker != null) {
  13. boolean valid = validConnectionChecker.isValidConnection(conn, validationQuery, validationQueryTimeout);
  14. long currentTimeMillis = System.currentTimeMillis();
  15. if (holder != null) {
  16. holder.lastValidTimeMillis = currentTimeMillis;
  17. holder.lastExecTimeMillis = currentTimeMillis;
  18. }
  19. if (valid && isMySql) { // unexcepted branch
  20. long lastPacketReceivedTimeMs = MySqlUtils.getLastPacketReceivedTimeMs(conn);
  21. if (lastPacketReceivedTimeMs > 0) {
  22. long mysqlIdleMillis = currentTimeMillis - lastPacketReceivedTimeMs;
  23. if (lastPacketReceivedTimeMs > 0 //
  24. && mysqlIdleMillis >= timeBetweenEvictionRunsMillis) {
  25. discardConnection(holder);
  26. String errorMsg = "discard long time none received connection. "
  27. + ", jdbcUrl : " + jdbcUrl
  28. + ", jdbcUrl : " + jdbcUrl
  29. + ", lastPacketReceivedIdleMillis : " + mysqlIdleMillis;
  30. LOG.error(errorMsg);
  31. return false;
  32. }
  33. }
  34. }
  35. if (valid && onFatalError) {
  36. lock.lock();
  37. try {
  38. if (onFatalError) {
  39. onFatalError = false;
  40. }
  41. } finally {
  42. lock.unlock();
  43. }
  44. }
  45. return valid;
  46. }
  47. if (conn.isClosed()) {
  48. return false;
  49. }
  50. if (null == validationQuery) {
  51. return true;
  52. }
  53. Statement stmt = null;
  54. ResultSet rset = null;
  55. try {
  56. stmt = conn.createStatement();
  57. if (getValidationQueryTimeout() > 0) {
  58. stmt.setQueryTimeout(validationQueryTimeout);
  59. }
  60. rset = stmt.executeQuery(validationQuery);
  61. if (!rset.next()) {
  62. return false;
  63. }
  64. } finally {
  65. JdbcUtils.close(rset);
  66. JdbcUtils.close(stmt);
  67. }
  68. if (onFatalError) {
  69. lock.lock();
  70. try {
  71. if (onFatalError) {
  72. onFatalError = false;
  73. }
  74. } finally {
  75. lock.unlock();
  76. }
  77. }
  78. return true;
  79. } catch (Throwable ex) {
  80. // skip
  81. return false;
  82. } finally {
  83. if (sqlFile != null) {
  84. JdbcSqlStat.setContextSqlFile(sqlFile);
  85. }
  86. if (sqlName != null) {
  87. JdbcSqlStat.setContextSqlName(sqlName);
  88. }
  89. }
  90. }
  91. }

只有DruidAbstractDataSourcetestConnectionInternal()方法中会调用getLastPacketReceivedTimeMs()方法
testConnectionInternal()是用来检测连接是否有效的,在获取连接和归还连接时都有可能会调用该方法,这取决于Druid检测连接是否有效的参数
「Druid检测连接是否有效的参数」:

  • testOnBorrow:每次获取连接时执行validationQuery检测连接是否有效(会影响性能)
  • testOnReturn:每次归还连接时执行validationQuery检测连接是否有效(会影响性能)
  • testWhileIdle:申请连接的时候检测,如果空闲时间大于timeBetweenEvictionRunsMillis,执行validationQuery检测连接是否有效

应用中设置了testOnBorrow=true,每次获取连接时,都会去抢占synchronized锁,所以性能下降的很明显

解决方案

经验证,使用Druid 1.x版本<=1.1.22会出现该bug,解决方案就是升级至Druid 1.x版本>=1.1.23或者Druid 1.2.x版本