1 Use Case

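Write a script on the Client that remotely monitors the running state of a website hosted on a Web server. When the website is found to be running abnormally, handle it as follows:
(1) Check whether the server is reachable. If the server cannot be reached, log and report the fault.
(2) If the server can be reached, try restarting the Web service. If the restart clears the fault, log it.
(3) If the fault persists after the restart attempt, log and report it.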
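The three rules above can be summarized as a small decision function. This is only a sketch: the function name decide_action and its yes/no arguments are hypothetical and are not part of the scripts developed later.

```shell
#!/bin/bash
# decide_action SERVER_REACHABLE RESTART_FIXED
# Both arguments are "yes" or "no" (hypothetical names, for illustration only).
# Prints the action to take: "log" or "log+report".
decide_action() {
    local server_reachable=$1 restart_fixed=$2
    if [ "$server_reachable" != "yes" ]; then
        echo "log+report"    # rule (1): the server cannot be reached
    elif [ "$restart_fixed" = "yes" ]; then
        echo "log"           # rule (2): the restart cleared the fault
    else
        echo "log+report"    # rule (3): the fault persists after the restart
    fi
}
```

For example, decide_action yes no prints log+report, which corresponds to rule (3).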
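2 Preparation
2.1 Set up the installation-disc software repository on the Client and the WebServer
Taking the Client machine as an example, first insert the installation disc into the optical drive, then run the following commands.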
[root@localhost ~]# hostnamectl set-hostname client
[root@client ~]# mkdir /media/cdrom
[root@client ~]# mount /dev/cdrom /media/cdrom
mount: /dev/sr0 is write-protected, mounting read-only
[root@client ~]# cd /etc/yum.repos.d/
[root@client yum.repos.d]# mkdir ./backup
[root@client yum.repos.d]# mv *.repo ./backup/
[root@client yum.repos.d]# cp ./backup/CentOS-Media.repo ./
[root@client yum.repos.d]# vi CentOS-Media.repo
# Change the following two settings:
gpgcheck=0
enabled=1
# Save the configuration file after editing
[root@client yum.repos.d]# yum clean all
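If you prefer not to edit the file interactively in vi, the same two settings can be changed with sed. The sketch below assumes the .repo file contains a single repository section, because it rewrites every gpgcheck= and enabled= line in the file; the function name enable_media_repo is made up for this example.

```shell
#!/bin/bash
# enable_media_repo FILE: set gpgcheck=0 and enabled=1 in the given .repo file.
# Assumption: the file holds one repository section, so changing every
# matching line is acceptable.
enable_media_repo() {
    local repo_file=$1
    sed -i -e 's/^gpgcheck=.*/gpgcheck=0/' \
           -e 's/^enabled=.*/enabled=1/' "$repo_file"
}

# Example (run as root on the Client):
# enable_media_repo /etc/yum.repos.d/CentOS-Media.repo
```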
2.2 Install httpd on the WebServer and publish a website
[root@server ~]# yum -y install httpd
[root@server ~]# setenforce 0
[root@server ~]# firewall-cmd --permanent --add-service=http
success
[root@server ~]# firewall-cmd --reload
success
[root@server ~]# cd /var/www/html
[root@server html]# touch index.html
[root@server html]# echo '<h1>WELCOME TO MY WEBSITE.</h1>' > /var/www/html/index.html
[root@server html]# systemctl start httpd
[root@server html]# systemctl enable httpd
Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
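3 Checking the running state of the WebServer
3.1 Check whether the website can be accessed
curl is a command-line HTTP client that prints the page it fetches as text, so it can serve as a tool for checking whether a web page is accessible.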
[root@client .ssh]# yum -y install curl
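You can use curl to request a page from the website. The exit status ($?) is 0 when the request succeeds and non-zero when it fails.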
[root@client .ssh]# curl http://192.168.237.202
<h1>WELCOME TO MY WEBSITE.</h1>
[root@client .ssh]# curl http://192.168.237.203
curl: (7) Failed connect to 192.168.237.203:80; No route to host
[root@client .ssh]# curl http://192.168.237.202
<h1>WELCOME TO MY WEBSITE.</h1>
[root@client .ssh]# echo $?
0
[root@client .ssh]# curl http://192.168.237.203
curl: (7) Failed connect to 192.168.237.203:80; No route to host
[root@client .ssh]# echo $?
7
[root@client .ssh]# curl http://192.168.237.202 &> /dev/null
[root@client .ssh]# echo $?
0
[root@client .ssh]# curl http://192.168.237.203 &> /dev/null
[root@client .ssh]# echo $?
7
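Write a script that uses curl to check the state of the website
#!/bin/bash
# A small script that checks the running state of the website
curl http://192.168.237.202 &> /dev/null
if test $? -eq 0
then
    echo "Website http://192.168.237.202 is running normally"
else
    echo "Website http://192.168.237.202 is running abnormally"
fi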
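One limitation of relying on the exit status alone: plain curl exits with 0 whenever the server answers, even if the answer is an HTTP error such as 404 or 500. A possible refinement, shown here only as a sketch, also inspects the HTTP status code and sets a time limit. The function name check_site and the 5-second limit are assumptions made for this example.

```shell
#!/bin/bash
# check_site URL: print "OK" when the URL answers with HTTP 200,
# otherwise print "FAIL" followed by a short reason.
check_site() {
    local url=$1 code
    # -s: no progress output; -o /dev/null: discard the page body;
    # -w '%{http_code}': print only the HTTP status code;
    # --max-time 5: give up after 5 seconds (arbitrary value).
    if ! code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url"); then
        echo "FAIL connection"
    elif [ "$code" = "200" ]; then
        echo "OK"
    else
        echo "FAIL http $code"
    fi
}

# Example:
# check_site http://192.168.237.202
```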
3.2 Check whether the WebServer can be reached
From the Client, the ping command can be used to check whether the WebServer is reachable
[root@client test]# ping -c 1 192.168.237.202
# -c N sets the number of packets that ping sends
PING 192.168.237.202 (192.168.237.202) 56(84) bytes of data.
64 bytes from 192.168.237.202: icmp_seq=1 ttl=64 time=0.430 ms

--- 192.168.237.202 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.430/0.430/0.430/0.000 ms
[root@client test]# echo $?
# If the ping succeeds, $? is 0; otherwise $? is non-zero
0
Write a script that determines whether the WebServer is online
#!/bin/bash
ping -c 1 192.168.237.202 &> /dev/null
if test $? -eq 0
then
    echo "Host 192.168.237.202 is online."
else
    echo "Host 192.168.237.202 is offline."
fi
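A single lost packet would make the script above report the host as offline. A slightly more tolerant variant, sketched below, sends three packets and treats the host as online if any of them is answered. The function names and the values 3 and 2 are choices made for this example, not part of the original script.

```shell
#!/bin/bash
# host_online HOST: succeed if at least one echo request is answered.
host_online() {
    # -c 3: send three packets; -W 2: wait at most 2 seconds for each reply.
    ping -c 3 -W 2 "$1" > /dev/null 2>&1
}

# report_host HOST: print a one-line status message for the host.
report_host() {
    if host_online "$1"; then
        echo "Host $1 is online."
    else
        echo "Host $1 is offline."
    fi
}

# Example:
# report_host 192.168.237.202
```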
3.3 Check whether the Web service is running normally
3.3.1 How to check the running state of a network service
You can use the systemctl status command to check the running state of a service
[root@webserver ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-06-07 08:28:32 CST; 2h 25min ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 4693 (httpd)
   Status: "Total requests: 6; Current requests/sec: 0; Current traffic: 0 B/sec"
    Tasks: 7
   CGroup: /system.slice/httpd.service
           ├─4693 /usr/sbin/httpd -DFOREGROUND
           ├─4694 /usr/sbin/httpd -DFOREGROUND
           ├─4695 /usr/sbin/httpd -DFOREGROUND
           ├─4696 /usr/sbin/httpd -DFOREGROUND
           ├─4697 /usr/sbin/httpd -DFOREGROUND
           ├─4698 /usr/sbin/httpd -DFOREGROUND
           └─4893 /usr/sbin/httpd -DFOREGROUND

Jun 07 08:28:17 webserver systemd[1]: Starting The Apache HTTP Server...
Jun 07 08:28:27 webserver httpd[4693]: AH00558: httpd: Could not reliably determine the serve...age
Jun 07 08:28:32 webserver systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@webserver ~]# echo $?
0
# The exit status is 0 when the service is running normally
[root@webserver ~]# systemctl stop httpd
[root@webserver ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2022-06-07 10:54:10 CST; 4s ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 7226 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 4693 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=0/SUCCESS)
 Main PID: 4693 (code=exited, status=0/SUCCESS)
   Status: "Total requests: 6; Current requests/sec: 0; Current traffic: 0 B/sec"

Jun 07 08:28:17 webserver systemd[1]: Starting The Apache HTTP Server...
Jun 07 08:28:27 webserver httpd[4693]: AH00558: httpd: Could not reliably determine the serve...age
Jun 07 08:28:32 webserver systemd[1]: Started The Apache HTTP Server.
Jun 07 10:54:09 webserver systemd[1]: Stopping The Apache HTTP Server...
Jun 07 10:54:10 webserver systemd[1]: Stopped The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@webserver ~]# echo $?
3
# The exit status is non-zero when the service is not running normally
[root@webserver ~]# systemctl status httpd &> /dev/null
[root@webserver ~]# echo $?
0
You can also use the systemctl is-active command to check the running state of a service
[root@webserver ~]# systemctl start httpd
[root@webserver ~]# systemctl is-active httpd
active
# Prints "active" when the httpd service is running normally
[root@webserver ~]# systemctl stop httpd
[root@webserver ~]# systemctl is-active httpd
inactive
# Prints "inactive" when the httpd service is not running
[root@webserver ~]# systemctl is-active ftpd
unknown
# Prints "unknown" because no ftpd service is installed
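In a script it is usually easier to test the exit status than to compare the printed word. systemctl is-active exits with 0 only when the unit is active, and the --quiet option suppresses the printed state. The following is a minimal sketch; the function names service_running and describe_service are invented for this example.

```shell
#!/bin/bash
# service_running NAME: succeed only when systemd reports the unit as active.
service_running() {
    systemctl is-active --quiet "$1"
}

# describe_service NAME: print a one-line summary of the unit's state.
describe_service() {
    if service_running "$1"; then
        echo "$1 is running"
    else
        echo "$1 is not running"
    fi
}

# Example (on the WebServer):
# describe_service httpd
```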
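3.3.2 Monitoring the WebServer's network services from the Client
Use an SSH connection from the Client to the WebServer to monitor the running state of the network services on the WebServer. The exit status of the remote command is passed back to the Client as the exit status of ssh.
[root@client test]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
root@192.168.237.202's password:
[root@client test]# echo $?
3
[root@client test]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
root@192.168.237.202's password:
[root@client test]# echo $?
0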
Set up passwordless SSH login from the Client to the WebServer, so that a script can run remote commands without being prompted for a password
[root@client test]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:q3B+rg/zRHjRxeJsauvIYCgruYBAE3abrtPKGndnPE4 root@client
The key's randomart image is:
+---[RSA 2048]----+
| o . .. |
|. o o …. |
| o o .o.. |
|. o . .+ |
|. . . So |
|o o. . oo. |
|=+o.= E.o. |
|==oo @ Oo |
|+ B= |
+----[SHA256]-----+
[root@client .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.237.202
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.237.202's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.237.202'"
and check to make sure that only the key(s) you wanted were added.
[root@client .ssh]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
[root@client .ssh]# echo $?
0
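3.3.3 Write a script on the Client that tests the running state of the httpd service on the WebServer
#!/bin/bash
ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
if test $? -eq 0
then
    echo "The httpd service on server 192.168.237.202 is running normally"
else
    echo "The httpd service on server 192.168.237.202 is running abnormally"
fi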
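Note that a non-zero exit status from ssh does not always mean that httpd is down: ssh itself exits with 255 when it cannot connect or authenticate. The sketch below separates the two cases. It also uses BatchMode=yes so that an unattended run fails immediately instead of waiting at a password prompt. The function name remote_httpd_state and the 5-second timeout are assumptions made for this example.

```shell
#!/bin/bash
# remote_httpd_state HOST: print "running", "stopped", or "ssh-failed".
remote_httpd_state() {
    local host=$1 rc=0
    # BatchMode=yes: never prompt for a password;
    # ConnectTimeout=5: give up connecting after 5 seconds (arbitrary value).
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" \
        'systemctl is-active --quiet httpd' || rc=$?
    case $rc in
        0)   echo "running" ;;
        255) echo "ssh-failed" ;;   # ssh could not connect or authenticate
        *)   echo "stopped" ;;      # the remote systemctl reported a non-active state
    esac
}

# Example:
# remote_httpd_state 192.168.237.202
```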
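4 Sending email from a script
4.1 Sending email with the mailx package
1. First prepare an email account and collect the following parameters, taking a QQ mailbox as an example.
(1) Email account: XXXXXXX@qq.com
(2) Domain name of the outgoing mail (SMTP) server: smtp.qq.com
(3) Third-party service authorization code for the mailbox: ** (every mail provider offers one; how to obtain it differs slightly between providers)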

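2. Install the mailx mail client on the Client and configure it
[root@client test]# yum -y install mailx
[root@client test]# vi /etc/mail.rc
# Edit the /etc/mail.rc configuration file and add the following:
# The mailbox that sends the email
set from=14271080@qq.com
# The outgoing mail server
set smtp="smtp.qq.com"
# The mailbox account used for authentication
set smtp-auth-user="14271080@qq.com"
# The third-party SMTP login password (the authorization code)
set smtp-auth-password="*****************"
set smtp-auth=login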
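3. Test sending email with the mail command
[root@client test]# echo "This is a test email" | mail -s "test" 14271080@qq.com
# Notes:
# (1) echo "This is a test email" supplies the message body, which is passed to the mail command through a pipe
# (2) mail sends the email; its -s option sets the subject
# (3) "test" is the subject
# (4) 14271080@qq.com is the recipient address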
4.2 Write a script that tests sending email
#!/bin/bash
DATE=$(date '+%Y-%m-%d %H:%M')
INFO="Server 192.168.237.202 was found offline!"
echo "$DATE $INFO" | mail -s 'Server warning!' 14271080@qq.com
5 Writing the monitoring script

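#!/bin/bash

# mailto: send an email.
# Takes 3 positional parameters: $1 is the body, $2 the subject, $3 the recipient.
mailto()
{
    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
    # Quote the parameters so that a body or subject containing spaces stays one argument
    echo "$DATETIME $1" | mail -s "$2" "$3"
}

curl http://192.168.237.202 &> /dev/null
if test $? -ne 0
then
    # http://192.168.237.202 cannot be accessed
    ping -c 1 192.168.237.202 &> /dev/null
    if test $? -ne 0
    then
        # 192.168.237.202 does not answer ping
        mailto "The website on 192.168.237.202 cannot be accessed and the host does not answer ping; please handle this promptly" "Server warning" 14271080@qq.com
        DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
        echo "$DATETIME Server warning: the website on 192.168.237.202 cannot be accessed and the host does not answer ping" >> /root/example/server.log
    else
        # 192.168.237.202 answers ping
        ssh root@192.168.237.202 "systemctl status httpd" &> /dev/null
        if test $? -eq 0
        then
            # The httpd service on 192.168.237.202 is running normally
            mailto "The website on 192.168.237.202 cannot be accessed; the host answers ping and the httpd service is running normally; please handle this promptly" "Server warning" 14271080@qq.com
            DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
            echo "$DATETIME Server warning: the website on 192.168.237.202 cannot be accessed; the host answers ping and the httpd service is running normally" >> /root/example/server.log
        else
            # The httpd service on 192.168.237.202 is not running normally
            # Restart the httpd service remotely
            ssh root@192.168.237.202 "systemctl restart httpd" &> /dev/null
            # Check the state of the httpd service remotely
            ssh root@192.168.237.202 "systemctl status httpd" &> /dev/null
            if test $? -ne 0
            then
                # Restarting the httpd service failed
                mailto "The website on 192.168.237.202 cannot be accessed; the host answers ping but restarting the httpd service failed; please handle this promptly" "Server warning" 14271080@qq.com
                DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                echo "$DATETIME Server warning: the website on 192.168.237.202 cannot be accessed; the host answers ping but restarting the httpd service failed" >> /root/example/server.log
            else
                # Restarting the httpd service succeeded; check whether the website can be accessed
                curl http://192.168.237.202 &> /dev/null
                if test $? -eq 0
                then
                    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                    echo "$DATETIME Server warning: the website on 192.168.237.202 could not be accessed; it recovered after the httpd service was restarted" >> /root/example/server.log
                else
                    mailto "The website on 192.168.237.202 cannot be accessed; the host answers ping but restarting the httpd service did not clear the fault; please handle this promptly" "Server warning" 14271080@qq.com
                    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                    echo "$DATETIME Server warning: the website on 192.168.237.202 cannot be accessed; the host answers ping but restarting the httpd service did not clear the fault" >> /root/example/server.log
                fi
            fi
        fi
    fi
fi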
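Every branch of the script repeats the same three steps: send the email, build a timestamp, append a log line. One way to cut this repetition, shown only as a sketch, is to wrap those steps in two helper functions. The names log_event, alert, LOG_FILE and ADMIN_MAIL are introduced for this example; the log path and the mail address are the ones used in the script.

```shell
#!/bin/bash
# Both settings can be overridden from the environment.
LOG_FILE=${LOG_FILE:-/root/example/server.log}
ADMIN_MAIL=${ADMIN_MAIL:-14271080@qq.com}

# log_event MESSAGE: append a timestamped warning line to the log file.
log_event() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') Server warning: $1" >> "$LOG_FILE"
}

# alert MESSAGE: write the log entry and also email the administrator.
alert() {
    log_event "$1"
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1" | mail -s "Server warning" "$ADMIN_MAIL"
}

# Example: a branch that must log and report becomes a single call:
# alert "the website on 192.168.237.202 cannot be accessed and the host does not answer ping"
```

Branches that only need a log entry, such as a successful restart, would call log_event instead of alert.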
6 Scheduling test.sh as a crond job on the Client
Edit the /etc/crontab configuration file so that the test.sh script runs every 3 minutes
[root@client ~]# vi /etc/crontab
# Add the following line
*/3 * * * * root /root/example/test.sh
# Then save and exit
[root@client ~]# systemctl restart crond
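One thing still needs improvement: once a website fault is detected, if it is not fixed promptly, test.sh emails the administrator and writes a log entry every 3 minutes, repeating the same alert. Modify the program to solve this problem.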
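One possible approach, offered as a starting point rather than the intended answer, is to remember the state seen by the previous run in a small file and to alert only when the state changes. The function name state_changed and the state-file path are hypothetical.

```shell
#!/bin/bash
# Hypothetical location of the file that stores the last observed state.
STATE_FILE=${STATE_FILE:-/root/example/last_state}

# state_changed NEW_STATE: succeed (and record NEW_STATE) only when it differs
# from the state saved by the previous run; otherwise fail and change nothing.
state_changed() {
    local new_state=$1 old_state=""
    if [ -f "$STATE_FILE" ]; then
        old_state=$(cat "$STATE_FILE")
    fi
    if [ "$new_state" = "$old_state" ]; then
        return 1
    fi
    echo "$new_state" > "$STATE_FILE"
    return 0
}

# Example use inside test.sh:
# if state_changed "unreachable"; then
#     mailto "..." "Server warning" 14271080@qq.com
# fi
```

When the website is healthy, the script would call state_changed "ok" so that the next fault triggers a fresh alert.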
