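When I got to work this morning, a developer told me that Java hot code updates were succeeding for some processes and failing for others.
    I logged in to the server and checked the application log.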
    Yesterday's class hot update had failed with "Unable to open socket file: target process not responding or HotSpot VM not loaded":
    com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded

    List the Java process IDs with jps:

    [root@txy-zh-ysj-game1 ~]# jps
    19968 ChatApplication
    19873 ChatApplication
    5858 Application
    5989 Application
    17829 Jps
    5800 Application
    19817 ChatApplication
    16814 Application
    5743 Application
    19920 ChatApplication
    20020 ChatApplication
    19764 ChatApplication
    5919 Application

    Dump the thread stacks of one of the affected PIDs:

    [root@txy-zh-ysj-game1 2020-05-19]# jstack 5858
    5858: Unable to open socket file: target process not responding or HotSpot VM not loaded
    The -F option can be used when the target process is not responding
    [root@txy-zh-ysj-game1 2020-05-19]# jstack 5858 -F
    Attaching to core -F from executable 5858, please wait...
    Error attaching to core file: cannot open binary file
    sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
           at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
           at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:286)
           at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:673)
           at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
           at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
           at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
           at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:156)
           at sun.jvm.hotspot.tools.Tool.start(Tool.java:191)
           at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
           at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
           at sun.tools.jstack.JStack.main(JStack.java:106)
    

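    The failure is essentially the same as in the application log: the socket file cannot be opened. (With jstack, -F belongs before the PID, as in jstack -F 5858. Placed after it, the arguments were read as an executable and a core file, which is what produced the second error.)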

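    A web search turned up the following:
    At runtime the JVM creates a directory named hsperfdata_$USER ($USER being the user that started the Java process), by default under /tmp on Linux. The directory holds one file per PID with information about that JVM process.
    Tools such as jps read the PID files under /tmp/hsperfdata_$USER to discover running JVMs.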
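
The discovery step described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how jps is implemented: the function name `list_jvm_pids` is made up for this example, and it simply prints every all-numeric file name found in the given directory.

```shell
# list_jvm_pids [DIR]
# Print the numeric file names found in DIR, one per line, sorted numerically.
# DIR defaults to /tmp/hsperfdata_$USER (hypothetical helper, for illustration).
list_jvm_pids() {
    local dir=${1:-/tmp/hsperfdata_${USER:-$(id -un)}}
    local f name
    [ -d "$dir" ] || return 0          # no directory means no JVMs for this user
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # also skips the literal pattern when nothing matches
        name=${f##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # ignore anything that is not a plain PID
        esac
        printf '%s\n' "$name"
    done | sort -n
}
```

Called as `list_jvm_pids /tmp/hsperfdata_root`, it lists the same PIDs that appear in that directory, independent of whether the processes can still be attached to.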

    But the files under /tmp/hsperfdata_root are all there, so why can't the process be reached?

    [root@txy-zh-ysj-game1 2020-05-19]# ls /tmp/hsperfdata_root/
    16814  19764  19817  19873  19920  19968  20020  5743  5800  5858  5919  5989
    

    Check which files process 5858 has open:
    [root@txy-zh-ysj-game1 ~]# lsof -p 5858
    The output shows that the process is using the file /tmp/.java_pid5858.tmp.

    Next, list everything in /tmp.

    There is no .java_pid5858.tmp file, but there are three similar files belonging to other PIDs.
    Checking the type of those files shows that they are sockets:
    [root@txy-zh-ysj-game1 ~]# file /tmp/.java_pid19764
    /tmp/.java_pid19764: socket
    [root@txy-zh-ysj-game1 ~]# file /tmp/.java_pid19968
    /tmp/.java_pid19968: socket

    Dumping the stack of 19968, whose socket file still exists, works normally:
    [root@txy-zh-ysj-game1 ~]# jstack 19968

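    That more or less confirms the problem: the socket file for process 5858 (shown by lsof as /tmp/.java_pid5858.tmp) is missing from /tmp, so jstack cannot attach and dump the stack. Java hot updates attach through the same socket file, which is why they fail too.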
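
The manual check above can be repeated for every JVM on the host with a small loop. The sketch below assumes the naming seen in this incident (`/tmp/.java_pid<pid>` next to `/tmp/hsperfdata_<user>/<pid>`); the function name `check_attach_sockets` and the `socket-present` / `socket-missing` labels are invented for this example. As far as I understand, the JVM creates the socket only on the first attach, so a missing socket is suspicious only for a process that has been attached to before.

```shell
# check_attach_sockets [HSPERF_DIR] [TMP_DIR]
# For every PID file in HSPERF_DIR, report whether TMP_DIR/.java_pid<pid>
# exists and is a socket (hypothetical helper, for illustration).
check_attach_sockets() {
    local hsperf_dir=${1:-/tmp/hsperfdata_${USER:-$(id -un)}}
    local tmp_dir=${2:-/tmp}
    local f pid
    for f in "$hsperf_dir"/*; do
        [ -f "$f" ] || continue                    # skips an unmatched glob as well
        pid=${f##*/}
        if [ -S "$tmp_dir/.java_pid$pid" ]; then   # -S: exists and is a socket
            printf '%s socket-present\n' "$pid"
        else
            printf '%s socket-missing\n' "$pid"
        fi
    done
}
```

Running `check_attach_sockets /tmp/hsperfdata_root /tmp` before a hot update gives a quick list of the processes that are likely to reject the attach.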

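    Next question: why did the file disappear? I never deleted anything from /tmp.
    Hot updates had been working a few days earlier.
    Looking back, the previous hot update was on May 15, and it succeeded for every process.
    On the 19th, the hot update took effect for only some of the processes.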

    The only thing left was the system log.
    Entries in the system log after May 15:
    May 16 18:56:01 txy-zh-ysj-game1 systemd: Starting Cleanup of Temporary Directories...
    May 16 18:56:01 txy-zh-ysj-game1 systemd: Started Cleanup of Temporary Directories.
    These two entries stand out: they come from systemd-tmpfiles-clean.service.
    Linux automatically cleans up files under /tmp according to a set of rules.
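
To see how often the cleanup has run, the matching log lines can be filtered out of the system log. This is a minimal sketch that assumes the syslog format shown above and the CentOS default log path `/var/log/messages`; the function name `cleanup_runs` is made up for this example.

```shell
# cleanup_runs [LOGFILE]
# Print the timestamp (month, day, time) of every completed tmpfiles cleanup
# found in LOGFILE. Defaults to /var/log/messages (an assumption).
cleanup_runs() {
    local log=${1:-/var/log/messages}
    # -F: match the text literally; awk keeps the three timestamp fields
    grep -F 'Started Cleanup of Temporary Directories' "$log" |
        awk '{print $1, $2, $3}'
}
```

Comparing these timestamps with the time hot updates started failing helps confirm or rule out the cleanup job as the cause.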

    CentOS 6 uses tmpwatch, which is not installed by default (no wonder hot updates never failed for the services we deployed on CentOS 6).

    CentOS 7 cleans temporary files through systemd-tmpfiles-clean.service. The cleanup rules are defined in /usr/lib/tmpfiles.d/tmp.conf, the command it runs is /usr/bin/systemd-tmpfiles --clean, and the schedule is managed by systemd-tmpfiles-clean.timer.

    [root@txy-zh-ysj-game1 2020-05-19]# cat /etc/redhat-release
    CentOS Linux release 7.5.1804 (Core)

    [root@txy-zh-ysj-game1 2020-05-19]# cat /usr/lib/systemd/system/systemd-tmpfiles-clean.timer
    #  This file is part of systemd.
    #
    #  systemd is free software; you can redistribute it and/or modify it
    #  under the terms of the GNU Lesser General Public License as published by
    #  the Free Software Foundation; either version 2.1 of the License, or
    #  (at your option) any later version.
    
    [Unit]
    Description=Daily Cleanup of Temporary Directories
    Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
    
    [Timer]
    OnBootSec=15min
    # run the service 15 minutes after boot
    OnUnitActiveSec=1d
    # then run it again 1 day after it was last activated
    
    [root@txy-zh-ysj-game1 2020-05-19]# cat /usr/lib/systemd/system/systemd-tmpfiles-clean.service
    #  This file is part of systemd.
    #
    #  systemd is free software; you can redistribute it and/or modify it
    #  under the terms of the GNU Lesser General Public License as published by
    #  the Free Software Foundation; either version 2.1 of the License, or
    #  (at your option) any later version.
    
    [Unit]
    Description=Cleanup of Temporary Directories
    Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
    DefaultDependencies=no
    Conflicts=shutdown.target
    After=systemd-readahead-collect.service systemd-readahead-replay.service local-fs.target time-sync.target
    Before=shutdown.target
    
    [Service]
    Type=oneshot
    ExecStart=/usr/bin/systemd-tmpfiles --clean
    IOSchedulingClass=idle
    


    [root@txy-zh-ysj-game1 2020-05-19]# cat /usr/lib/tmpfiles.d/tmp.conf 
    #  This file is part of systemd.
    #
    #  systemd is free software; you can redistribute it and/or modify it
    #  under the terms of the GNU Lesser General Public License as published by
    #  the Free Software Foundation; either version 2.1 of the License, or
    #  (at your option) any later version.
    
    # See tmpfiles.d(5) for details
    
    # Clear tmp directories separately, to make them easier to override
    v /tmp 1777 root root 10d
    v /var/tmp 1777 root root 30d
    
    # Exclude namespace mountpoints created with PrivateTmp=yes
    x /tmp/systemd-private-%b-*
    X /tmp/systemd-private-%b-*/tmp
    x /var/tmp/systemd-private-%b-*
    X /var/tmp/systemd-private-%b-*/tmp
    

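    v  a directory to clean up; the last field (10d, 30d) is the age after which its contents are removed
    x  a path to ignore during cleanup, including everything beneath it; shell globs are allowed
    X  a path to ignore during cleanup, but not the files beneath it; shell globs are allowed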
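
With the `10d` age on /tmp, entries whose timestamps are all older than ten days become eligible for deletion, which would be consistent with long-running JVMs losing their sockets while recently started ones kept theirs. The sketch below lists such candidates with `find`. It is only an approximation: the real systemd-tmpfiles also looks at ctime and honours the x/X exclusions, and the function name `stale_entries` is invented for this example.

```shell
# stale_entries [DIR] [DAYS]
# List entries directly under DIR whose mtime and atime are both at least
# DAYS days old. Approximation only: ctime and x/X rules are not considered.
stale_entries() {
    local dir=${1:-/tmp}
    local days=${2:-10}
    # find's "+N" means "more than N whole days", so +9 matches 10 days or older
    find "$dir" -mindepth 1 -maxdepth 1 \
        -mtime "+$((days - 1))" -atime "+$((days - 1))"
}
```

Running `stale_entries /tmp 10` shortly before the daily timer fires shows which files, dotfiles such as the Java sockets included, are at risk.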

    Detailed documentation for tmpfiles.d:
    http://www.jinbuguo.com/systemd/tmpfiles.d.html

    Final fix:

    echo "x /tmp/hsperfdata*" >>/usr/lib/tmpfiles.d/tmp.conf
    echo "X /tmp/.java_*" >>/usr/lib/tmpfiles.d/tmp.conf
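
Appending with `echo ... >>` adds a duplicate line every time it is rerun. A minimal sketch of an idempotent variant follows; the function name `append_line_once` is made up for this example, and it behaves roughly like the Ansible lineinfile module for this simple case.

```shell
# append_line_once FILE LINE
# Append LINE to FILE unless FILE already contains exactly that line.
append_line_once() {
    local file=$1 line=$2
    # -q quiet, -x whole-line match, -F literal string (so '*' is not a regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For the fix here that would be `append_line_once /usr/lib/tmpfiles.d/tmp.conf 'x /tmp/hsperfdata*'` and `append_line_once /usr/lib/tmpfiles.d/tmp.conf 'X /tmp/.java_*'`, both safe to run repeatedly.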
    

    Finally, I turned this into a playbook and added it to our system initialization so the problem does not come back.
    no_clean_java_tmpfile.yml

    ---
    - hosts: txy-zh-ysj-game1
      remote_user: root
    
      tasks:
      # Only edit tmp.conf on hosts that actually have it
      - name: check that /usr/lib/tmpfiles.d/tmp.conf exists
        stat: path=/usr/lib/tmpfiles.d/tmp.conf
        register: result
    
      - name: add x /tmp/hsperfdata*
        lineinfile: dest=/usr/lib/tmpfiles.d/tmp.conf line='x /tmp/hsperfdata*'
        when: result.stat.exists
        notify: addline_handlers
    
      - name: add X /tmp/.java_*
        lineinfile: dest=/usr/lib/tmpfiles.d/tmp.conf line='X /tmp/.java_*'
        when: result.stat.exists
        notify: addline_handlers
    
      # Runs once at the end of the play if either line was added
      handlers:
      - name: addline_handlers
        service: name=systemd-tmpfiles-clean state=restarted enabled=yes
    
    # - shell: echo "file does not exist"
    #   when: not result.stat.exists