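1 Application Scenario
Write a script on the Client that remotely monitors the website running on the Web server. When the script finds that the website is not working properly, it handles the fault as follows:
(1) Check whether the server is reachable. If the server cannot be reached, log the fault and report it.
(2) If the server is reachable, try to restart the Web service. If the restart clears the fault, log it.
(3) If the fault persists after the restart, log it and report it.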
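The three cases above amount to a small decision table. The following is a minimal sketch of that table, not the author's script: decide_action is a hypothetical helper that takes the exit status of the reachability check and of the website check made after restarting the Web service (0 meaning success, as with $?) and prints the action the scenario calls for.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original tutorial): maps check results
# to the action required by the scenario. 0 means success, as with $?.
#   $1: exit status of the reachability check (e.g. ping)
#   $2: exit status of the website check after restarting the Web service
decide_action()
{
    reachable=$1
    recovered=$2
    if [ "$reachable" -ne 0 ]; then
        echo "log and report: server unreachable"
    elif [ "$recovered" -eq 0 ]; then
        echo "log: restart fixed the fault"
    else
        echo "log and report: fault persists after restart"
    fi
}
```

For example, decide_action 0 7 prints "log and report: fault persists after restart".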
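2 Preparation
2.1 Configure the installation disc as the software repository on Client and WebServer
Taking Client as an example, first load the installation disc image into the optical drive, then run the following commands.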
[root@localhost ~]# hostnamectl set-hostname client
[root@client ~]# mkdir /media/cdrom
[root@client ~]# mount /dev/cdrom /media/cdrom
mount: /dev/sr0 is write-protected, mounting read-only
[root@client ~]# cd /etc/yum.repos.d/
[root@client yum.repos.d]# mkdir ./backup
[root@client yum.repos.d]# mv *.repo ./backup/
[root@client yum.repos.d]# cp ./backup/CentOS-Media.repo ./
[root@client yum.repos.d]# vi CentOS-Media.repo
#Change the following two settings:
gpgcheck=0
enabled=1
#Save the file after making the changes
[root@client yum.repos.d]# yum clean all
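The vi edit above can also be done non-interactively, which helps when preparing several machines. This is a sketch, assuming the repo file already contains gpgcheck= and enabled= lines (the two keys the tutorial edits); enable_media_repo is a hypothetical helper name.

```shell
#!/bin/bash
# Non-interactive alternative to editing the .repo file in vi (a sketch).
# Assumes the file already contains "gpgcheck=" and "enabled=" lines;
# GNU sed -i rewrites their values in place and leaves other lines alone.
#   $1: path to the repo file, e.g. /etc/yum.repos.d/CentOS-Media.repo
enable_media_repo()
{
    sed -i -e 's/^gpgcheck=.*/gpgcheck=0/' \
           -e 's/^enabled=.*/enabled=1/' "$1"
}
```

Usage: enable_media_repo /etc/yum.repos.d/CentOS-Media.repo, followed by yum clean all as above.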
2.2 Install httpd on WebServer and publish a website
[root@server ~]# yum -y install httpd
[root@server ~]# setenforce 0
[root@server ~]# firewall-cmd --permanent --add-service=http
success
[root@server ~]# firewall-cmd --reload
success
[root@server ~]# cd /var/www/html
[root@server html]# touch index.html
[root@server html]# echo '<h1>WELCOME TO MY WEBSITE.</h1>' > /var/www/html/index.html
[root@server html]# systemctl start httpd
[root@server html]# systemctl enable httpd
Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
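To confirm from the Client that the page was actually published, one option is to fetch it and look for its text. This is a minimal sketch; site_shows is a hypothetical helper, and the expected text is whatever you wrote into index.html.

```shell
#!/bin/bash
# Sketch: confirm the published page is actually served.
# site_shows fetches the page quietly (-s) and succeeds only if the
# body contains the given text (grep -F: plain text, not a regex).
#   $1: URL, e.g. http://192.168.237.202   $2: text expected in the page
site_shows()
{
    curl -s "$1" | grep -qF -- "$2"
}
```

Usage: site_shows http://192.168.237.202 'WELCOME TO MY WEBSITE.' && echo "published"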
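3 Checking the Status of WebServer
3.1 Checking whether the website is accessible
curl is a command-line tool for transferring data to and from URLs, so it can serve as a tool for checking whether a web page is accessible.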
[root@client .ssh]# yum -y install curl
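Use curl to fetch a page from the website. curl's exit status, read from $?, is 0 when the page was fetched and nonzero when it was not (7 means curl could not connect to the host). Redirecting the output with &> /dev/null hides it without changing the exit status, which is exactly what a script needs: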
[root@client .ssh]# curl http://192.168.237.202
<h1>WELCOME TO MY WEBSITE.</h1>
[root@client .ssh]# curl http://192.168.237.203
curl: (7) Failed connect to 192.168.237.203:80; No route to host
[root@client .ssh]# curl http://192.168.237.202
<h1>WELCOME TO MY WEBSITE.</h1>
[root@client .ssh]# echo $?
0
[root@client .ssh]# curl http://192.168.237.203
curl: (7) Failed connect to 192.168.237.203:80; No route to host
[root@client .ssh]# echo $?
7
[root@client .ssh]# curl http://192.168.237.202 &> /dev/null
[root@client .ssh]# echo $?
0
[root@client .ssh]# curl http://192.168.237.203 &> /dev/null
[root@client .ssh]# echo $?
7
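A script can turn curl's exit status into a readable reason for the log. This sketch lists only a few common codes (6, 7, 22, 28) from the curl documentation; describe_curl_status is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: turn a curl exit status ($?) into a readable reason.
# Only a few common codes are listed; see the curl man page for the rest.
#   $1: the exit status returned by curl
describe_curl_status()
{
    case "$1" in
        0)  echo "OK" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect to host" ;;
        22) echo "HTTP error (only reported when curl runs with -f)" ;;
        28) echo "operation timed out" ;;
        *)  echo "curl error $1" ;;
    esac
}
```

With the exit status 7 seen in the transcript above, describe_curl_status 7 prints "failed to connect to host".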
Write a script that uses curl to check the website's status:
#!/bin/bash
#A small script that checks whether the website is running
curl http://192.168.237.202 &> /dev/null
if test $? -eq 0
then
    echo "Website http://192.168.237.202 is running normally"
else
    echo "Website http://192.168.237.202 is not working properly"
fi
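One limitation of the script above: plain curl exits with 0 even when the server answers with an error page such as 404 or 500, so a broken site can still look "normal". A possible refinement, sketched here with the hypothetical helper site_up, asks curl to print only the HTTP status code and treats 2xx and 3xx as healthy. The 10-second --max-time limit is an assumption.

```shell
#!/bin/bash
# Sketch: treat the site as up only if it answers with a 2xx or 3xx status.
# -o /dev/null discards the page, -w '%{http_code}' prints the status code;
# if no connection is made, curl prints 000, which counts as down.
# --max-time keeps a hung server from blocking the script (10 s is an assumption).
#   $1: URL to check
site_up()
{
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
    case "$code" in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Usage: if site_up http://192.168.237.202; then echo "up"; else echo "down"; fi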
3.2 Checking whether WebServer is reachable
On the Client, use the ping command to check whether WebServer can be reached:
[root@client test]# ping -c 1 192.168.237.202
# -c N: the number of packets ping sends before it stops
PING 192.168.237.202 (192.168.237.202) 56(84) bytes of data.
64 bytes from 192.168.237.202: icmp_seq=1 ttl=64 time=0.430 ms
--- 192.168.237.202 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.430/0.430/0.430/0.000 ms
[root@client test]# echo $?
# $? is 0 if the host replied, nonzero otherwise
0
Write a script that checks whether WebServer is online:
#!/bin/bash
ping -c 1 192.168.237.202 &> /dev/null
if test $? -eq 0
then
    echo "Host 192.168.237.202 is online."
else
    echo "Host 192.168.237.202 is offline."
fi
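A single lost ICMP packet would make the script above report the host offline. One way to reduce false alarms, sketched here with the hypothetical helper host_online, is to retry a few times. The three attempts and the 2-second wait (-W, iputils ping) are assumptions to tune.

```shell
#!/bin/bash
# Sketch: ping up to 3 times and report the host online as soon as one
# reply arrives. -W 2 waits at most 2 seconds for each reply (iputils ping).
#   $1: host to check
host_online()
{
    for attempt in 1 2 3; do
        if ping -c 1 -W 2 "$1" > /dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}
```

Usage: host_online 192.168.237.202 && echo "Host 192.168.237.202 is online."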
3.3 Checking whether the Web service is running
3.3.1 Ways to check the status of a network service
Use the systemctl status command to check a service's status:
[root@webserver ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-06-07 08:28:32 CST; 2h 25min ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 4693 (httpd)
Status: "Total requests: 6; Current requests/sec: 0; Current traffic: 0 B/sec"
Tasks: 7
CGroup: /system.slice/httpd.service
├─4693 /usr/sbin/httpd -DFOREGROUND
├─4694 /usr/sbin/httpd -DFOREGROUND
├─4695 /usr/sbin/httpd -DFOREGROUND
├─4696 /usr/sbin/httpd -DFOREGROUND
├─4697 /usr/sbin/httpd -DFOREGROUND
├─4698 /usr/sbin/httpd -DFOREGROUND
└─4893 /usr/sbin/httpd -DFOREGROUND
Jun 07 08:28:17 webserver systemd[1]: Starting The Apache HTTP Server...
Jun 07 08:28:27 webserver httpd[4693]: AH00558: httpd: Could not reliably determine the serve…age
Jun 07 08:28:32 webserver systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@webserver ~]# echo $?
0
#Returns 0 when the service is running
[root@webserver ~]# systemctl stop httpd
[root@webserver ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2022-06-07 10:54:10 CST; 4s ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 7226 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 4693 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=0/SUCCESS)
Main PID: 4693 (code=exited, status=0/SUCCESS)
Status: "Total requests: 6; Current requests/sec: 0; Current traffic: 0 B/sec"
Jun 07 08:28:17 webserver systemd[1]: Starting The Apache HTTP Server...
Jun 07 08:28:27 webserver httpd[4693]: AH00558: httpd: Could not reliably determine the serve…age
Jun 07 08:28:32 webserver systemd[1]: Started The Apache HTTP Server.
Jun 07 10:54:09 webserver systemd[1]: Stopping The Apache HTTP Server...
Jun 07 10:54:10 webserver systemd[1]: Stopped The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@webserver ~]# echo $?
3
#Returns a nonzero value (3 here) when the service is not running
[root@webserver ~]# systemctl status httpd &> /dev/null
[root@webserver ~]# echo $?
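0
#Redirecting the output only hides it; $? still reflects the service state. A 0 here means httpd was active when this command ran (it must have been started again after the stop above); with httpd still stopped, $? would be 3.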
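The exit status of systemctl status follows the LSB convention for init scripts. This sketch decodes the values most relevant here, assuming a recent systemd in which 3 means the unit is not active and 4 means there is no such unit; describe_status_code is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: interpret the exit status of "systemctl status UNIT".
# Assumed meanings (recent systemd, LSB convention):
# 0 = running, 3 = not running, 4 = no such unit. Others are reported as-is.
#   $1: exit status returned by systemctl status
describe_status_code()
{
    case "$1" in
        0) echo "running" ;;
        3) echo "not running" ;;
        4) echo "unknown unit" ;;
        *) echo "other status $1" ;;
    esac
}
```

With the transcript above, describe_status_code 3 prints "not running".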
You can also use the systemctl is-active command to check a service's status:
[root@webserver ~]# systemctl start httpd
[root@webserver ~]# systemctl is-active httpd
active
#Prints "active" when httpd is running
[root@webserver ~]# systemctl stop httpd
[root@webserver ~]# systemctl is-active httpd
inactive
#Prints "inactive" when httpd is not running
[root@webserver ~]# systemctl is-active ftpd
unknown
#Prints "unknown" because the ftpd service is not installed
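Instead of comparing the printed word, a script can use the --quiet option of systemctl is-active, which prints nothing and exits with 0 only when the unit is active. This is a sketch; service_active and report_service are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: check a service without parsing text output.
# "systemctl is-active --quiet UNIT" is silent and exits 0 only if the unit is active.
#   $1: service name, e.g. httpd
service_active()
{
    systemctl is-active --quiet "$1"
}

# Prints a one-line report for the given service.
report_service()
{
    if service_active "$1"; then
        echo "$1 is running"
    else
        echo "$1 is not running"
    fi
}
```

Usage: report_service httpd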
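3.3.2 Monitoring the network service on WebServer from the Client
Use an SSH connection from the Client to WebServer to monitor the status of WebServer's network service. ssh exits with the exit status of the command it ran remotely, so $? on the Client reports the state of httpd on WebServer: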
[root@client test]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
root@192.168.237.202's password:
[root@client test]# echo $?
3
[root@client test]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
root@192.168.237.202's password:
[root@client test]# echo $?
0
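ssh exits with the remote command's status, or with 255 if ssh itself failed (for example, the connection was refused or authentication failed). Telling these apart keeps an SSH problem from being mistaken for a stopped httpd. A minimal sketch; classify_remote_status is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: classify the exit status of
#   ssh root@HOST 'systemctl status httpd &> /dev/null'
# 255 is reserved by ssh for its own errors; other values come from the remote command.
#   $1: exit status returned by ssh
classify_remote_status()
{
    case "$1" in
        0)   echo "httpd running" ;;
        255) echo "ssh failed" ;;
        *)   echo "httpd not running (remote status $1)" ;;
    esac
}
```

For the first run in the transcript above, classify_remote_status 3 prints "httpd not running (remote status 3)".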
Set up passwordless SSH login from the Client to WebServer. Press Enter at each ssh-keygen prompt to accept the default key file and an empty passphrase:
[root@client test]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:q3B+rg/zRHjRxeJsauvIYCgruYBAE3abrtPKGndnPE4 root@client
[root@client .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.237.202
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.237.202's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.237.202'"
and check to make sure that only the key(s) you wanted were added.
[root@client .ssh]# ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
[root@client .ssh]# echo $?
0
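Now that key-based login works, a check that runs unattended (for example from cron) should never stop at a password prompt. In this sketch, BatchMode=yes makes ssh fail instead of asking for a password if key login ever stops working, and ConnectTimeout bounds the wait. The 5-second timeout and the helper name remote_service_active are assumptions.

```shell
#!/bin/bash
# Sketch for unattended use: BatchMode=yes disables password prompts,
# ConnectTimeout=5 limits how long ssh waits for the connection (an assumption).
# The exit status is that of "systemctl is-active --quiet" on the remote host,
# or 255 if ssh itself failed.
#   $1: host, e.g. 192.168.237.202   $2: service name, e.g. httpd
remote_service_active()
{
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" \
        "systemctl is-active --quiet $2"
}
```

Usage: remote_service_active 192.168.237.202 httpd && echo "httpd is running"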
3.3.3 Write a script on the Client that checks the httpd service on WebServer
#!/bin/bash
ssh root@192.168.237.202 'systemctl status httpd &> /dev/null'
if test $? -eq 0
then
    echo "The httpd service on server 192.168.237.202 is running normally"
else
    echo "The httpd service on server 192.168.237.202 is not running properly"
fi
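4 Sending Email from a Script
4.1 Sending email with mailx
1. First prepare an email account and collect its settings. A QQ Mail account is used as the example here.
(1) Email account: XXXXXXX@qq.com
(2) Outgoing mail (SMTP) server: smtp.qq.com
(3) Third-party authorization code for the mailbox: ** (mail services generally provide one, though how to obtain it varies from provider to provider)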
2. Install the mailx mail client on the Client and configure it:
[root@client test]# yum -y install mailx
[root@client test]# vi /etc/mail.rc
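#Edit /etc/mail.rc and add the following lines:
set from=14271080@qq.com
#The mailbox that sends the mail
set smtp="smtp.qq.com"
#The outgoing mail server
set smtp-auth-user="14271080@qq.com"
#The mailbox account used for authentication
set smtp-auth-password="*****************"
#The SMTP password: the third-party authorization code from step 1, not the mailbox login password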
set smtp-auth=login
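When several clients need the same mail settings, they can be appended without an editor. This is a sketch; append_mailrc is a hypothetical helper, and the account, server, and authorization code are passed as parameters so that no real credentials live in the script.

```shell
#!/bin/bash
# Sketch: append the mailx settings shown above to a config file.
#   $1: config file (e.g. /etc/mail.rc)  $2: sender address
#   $3: SMTP server                      $4: authorization code
append_mailrc()
{
    cat >> "$1" <<EOF
set from=$2
set smtp="$3"
set smtp-auth-user="$2"
set smtp-auth-password="$4"
set smtp-auth=login
EOF
}
```

Usage: append_mailrc /etc/mail.rc 14271080@qq.com smtp.qq.com 'your-authorization-code'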
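3. Test sending mail with the mail command:
[root@client test]# echo "This is a test email" | mail -s "test" 14271080@qq.com
#Notes:
#(1) echo "This is a test email" produces the message body, which is piped to the mail command
#(2) mail sends the message; the -s option sets the subject
#(3) "test" is the subject
#(4) 14271080@qq.com is the recipient's address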
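Wrapping the mail command in a function makes it reusable in scripts. Quoting matters here: an unquoted subject such as Server warning would be split into two words, and mail would treat the second word as an extra recipient. A minimal sketch; send_alert is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: send a timestamped message with mail.
# "$2" and "$3" are quoted so a subject containing spaces stays one argument.
#   $1: message body  $2: subject  $3: recipient
send_alert()
{
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" | mail -s "$2" "$3"
}
```

Usage: send_alert "Server 192.168.237.202 found offline!" "Server warning" 14271080@qq.com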
4.2 Write a script to test sending email
#!/bin/bash
DATE=$(date '+%Y-%m-%d %H:%M')
INFO="Server 192.168.237.202 found offline!"
echo "$DATE $INFO" | mail -s 'Server warning!' 14271080@qq.com
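5 Writing the Monitoring Script
Save the following script as /root/example/test.sh, the path that the cron job in section 6 runs. It appends its log to /root/example/server.log, so that directory must exist.
#!/bin/bash
#Monitor the website on WebServer, try to repair it, and log/report faults
HOST=192.168.237.202
ADMIN=14271080@qq.com
LOG=/root/example/server.log

#Define the mailto function, which sends an email.
#It takes 3 positional parameters: $1 is the message body, $2 is the subject, $3 is the recipient
mailto()
{
    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
    echo "$DATETIME $1" | mail -s "$2" "$3"
}

curl http://$HOST &> /dev/null
if test $? -ne 0
then
    #The website http://$HOST cannot be accessed
    ping -c 1 $HOST &> /dev/null
    if test $? -ne 0
    then
        #The host does not answer ping
        mailto "The website on $HOST is not accessible and the host does not answer ping. Please handle it promptly." "Server warning" "$ADMIN"
        DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
        echo "$DATETIME Server warning: the website on $HOST is not accessible and the host does not answer ping" >> "$LOG"
    else
        #The host answers ping: check the httpd service remotely
        ssh root@$HOST "systemctl status httpd" &> /dev/null
        if test $? -eq 0
        then
            #httpd is running, yet the website is not accessible
            mailto "The website on $HOST is not accessible; the host answers ping and httpd is running. Please handle it promptly." "Server warning" "$ADMIN"
            DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
            echo "$DATETIME Server warning: the website on $HOST is not accessible; the host answers ping and httpd is running" >> "$LOG"
        else
            #httpd is not running: restart it remotely
            ssh root@$HOST "systemctl restart httpd" &> /dev/null
            #Check the httpd status remotely again
            ssh root@$HOST "systemctl status httpd" &> /dev/null
            if test $? -ne 0
            then
                #Restarting httpd failed
                mailto "The website on $HOST is not accessible; the host answers ping, but restarting httpd failed. Please handle it promptly." "Server warning" "$ADMIN"
                DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                echo "$DATETIME Server warning: the website on $HOST is not accessible; the host answers ping, but restarting httpd failed" >> "$LOG"
            else
                #httpd restarted: check whether the website is accessible again
                curl http://$HOST &> /dev/null
                if test $? -eq 0
                then
                    #The restart cleared the fault: log it only
                    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                    echo "$DATETIME Server warning: the website on $HOST was not accessible; after restarting httpd it is back to normal" >> "$LOG"
                else
                    #The restart did not clear the fault
                    mailto "The website on $HOST is not accessible; the host answers ping, but restarting httpd did not clear the fault. Please handle it promptly." "Server warning" "$ADMIN"
                    DATETIME=$(date "+%Y-%m-%d %H:%M:%S")
                    echo "$DATETIME Server warning: the website on $HOST is not accessible; the host answers ping, but restarting httpd did not clear the fault" >> "$LOG"
                fi
            fi
        fi
    fi
fi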
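Each branch of the monitoring script repeats the same steps: build a timestamp, append a line to the log, and sometimes send mail. A hypothetical helper, record, could do all three. This is a sketch; the default log path and recipient are assumptions matching the values used in the script above.

```shell
#!/bin/bash
# Sketch: one helper for "timestamp + log line + optional email".
# Defaults are assumptions matching the log path and address used in test.sh.
LOG=${LOG:-/root/example/server.log}
ADMIN=${ADMIN:-14271080@qq.com}
#   $1: "alert" to also email the administrator, anything else to log only
#   $2: message text
record()
{
    line="$(date '+%Y-%m-%d %H:%M:%S') Server warning: $2"
    echo "$line" >> "$LOG"
    if [ "$1" = "alert" ]; then
        echo "$line" | mail -s "Server warning" "$ADMIN"
    fi
}
```

For example, the "host does not answer ping" branch would shrink to record alert "the website on 192.168.237.202 is not accessible and the host does not answer ping", and the "restart fixed it" branch to record log "...".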
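6 Scheduling test.sh as a cron job on the Client
Edit /etc/crontab so that test.sh runs every 3 minutes. Because cron runs the script directly, the script must be executable (for example, chmod +x /root/example/test.sh).
[root@client ~]# vi /etc/crontab
#Add the following line:
*/3 * * * * root /root/example/test.sh
#Then save and exit
[root@client ~]# systemctl restart crond
One thing still needs improving: if a fault is not fixed promptly, test.sh emails the administrator and writes a log entry every 3 minutes, repeating the same reminder over and over. Modify the script to solve this problem.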
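One possible approach to this exercise, sketched rather than prescribed: remember the last reported state in a file and alert only when the state changes. STATE_FILE and state_changed are hypothetical names; any location writable by root works.

```shell
#!/bin/bash
# Sketch: suppress repeated alerts by remembering the last state.
# STATE_FILE is a hypothetical path (an assumption).
STATE_FILE=${STATE_FILE:-/root/example/last_state}
#   $1: current state, e.g. "ok", "host-down", "restart-failed"
# Returns 0 if the state differs from the last recorded one (and records it),
# 1 if it is unchanged, in which case no new alert is needed.
state_changed()
{
    last=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$1" = "$last" ]; then
        return 1
    fi
    echo "$1" > "$STATE_FILE"
    return 0
}
```

test.sh could call state_changed with the name of the fault it found and send mail only when it returns 0, and call state_changed ok on runs where the site is reachable so that the next fault alerts again. A change back to ok also returns 0, which is a natural place to send a recovery notice; skip the mail there if that is not wanted.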