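Performance Monitoring System
SkyWalking is an application performance monitoring (APM) system designed especially for microservices, cloud-native, and container-based (Docker, Kubernetes, Mesos) architectures. Besides application metrics, it also traces distributed call chains. Components with similar functionality include Zipkin, Pinpoint, and CAT.
A few screenshots first, to show what it looks like:
[Figure 1]
[Figure 2]
[Figure 3]
[Figure 4]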

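1. Concepts and Architecture

SkyWalking is an open-source monitoring platform for collecting, analyzing, aggregating, and visualizing data from services and cloud-native infrastructure. It provides a simple way to keep a clear view of a distributed system, even across clouds. It is a modern APM built specifically for cloud-native, container-based distributed systems.
SkyWalking observes applications along three dimensions: service, service instance, and endpoint.
Services and instances need no explanation; an endpoint is a path (URI) within a service.

SkyWalking allows users to understand the topology relationship between Services and Endpoints, to view the metrics of every Service/Service Instance/Endpoint and to set alarm rules.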

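1.1. Architecture

[Figure 5]
SkyWalking is logically split into four parts: Probes, Platform backend, Storage, and UI.
The structure is straightforward: a probe is the agent that collects data and reports it to the backend, the backend processes and stores the data, and the UI displays it.
[Figure 6]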

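2. Download and Installation

SkyWalking comes in two builds: an ES build and a non-ES build. If you decide to use ElasticSearch as the storage, download the ES build.
https://skywalking.apache.org/downloads/
https://archive.apache.org/dist/skywalking/
[Figure 7]
[Figure 8]
The agent directory will later be copied to each machine that runs a service, where it acts as the probe.
The bin directory contains the service startup scripts.
The config directory contains the configuration files.
The oap-libs directory contains the jars the OAP service needs at runtime.
The webapp directory contains the jars the web service needs at runtime.
Next, choose a storage. The supported options are:

  • H2
  • ElasticSearch 6, 7
  • MySQL
  • TiDB
  • InfluxDB

For a monitoring system, rule out H2 and MySQL first. InfluxDB is the recommendation here: it is a time-series database and fits this scenario very well.
However, I am not very familiar with InfluxDB, so this walkthrough uses ElasticSearch 7 for now.
https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-storage.md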

2.1. Installing ElasticSearch

https://www.elastic.co/guide/en/elasticsearch/reference/7.10/targz.html

    # Start
    ./bin/elasticsearch -d -p pid
    # Stop
    pkill -F pid

[Figure 9]
ElasticSearch 7.x expects Java 11 or later (the tar.gz distribution bundles its own JDK), but if the JAVA_HOME environment variable is set, it uses the Java version that variable points to.
Startup typically fails with the following three errors:

    [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
    [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    [3]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

To fix them, append the following to /etc/security/limits.conf:

    * soft nofile 65536
    * hard nofile 65536
    * soft nproc 4096
    * hard nproc 4096

Log in again so the new limits take effect, then check the result with these four commands:

    ulimit -Hn
    ulimit -Sn
    ulimit -Hu
    ulimit -Su

Edit /etc/sysctl.conf and append the following, then run sysctl -p to apply it:

    vm.max_map_count=262144
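
To confirm that the kernel setting took effect before restarting ElasticSearch, the value can be read back and compared with the minimum from the bootstrap error quoted above. The following is a minimal sketch, assuming a Linux host where the setting is exposed at /proc/sys/vm/max_map_count; the class name `MaxMapCountCheck` is made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MaxMapCountCheck {
    // Minimum taken from the ElasticSearch bootstrap error message.
    static final long REQUIRED = 262144L;

    /** Parses the content of the proc file: a single integer followed by a newline. */
    static long parse(String fileContent) {
        return Long.parseLong(fileContent.trim());
    }

    /** True when the value satisfies the bootstrap check. */
    static boolean isSufficient(long actual) {
        return actual >= REQUIRED;
    }

    public static void main(String[] args) throws IOException {
        Path procFile = Paths.get("/proc/sys/vm/max_map_count"); // Linux only
        if (!Files.isReadable(procFile)) {
            System.out.println("Cannot read " + procFile + " (not a Linux host?)");
            return;
        }
        long actual = parse(new String(Files.readAllBytes(procFile), StandardCharsets.UTF_8));
        System.out.println("vm.max_map_count=" + actual + (isSufficient(actual) ? " OK" : " too low"));
    }
}
```

The same pattern would work for the file descriptor limit, although `ulimit -n` in the shell that starts ElasticSearch is the more direct check there.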

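Edit the ES configuration file elasticsearch.yml: uncomment cluster.initial_master_nodes and keep a single node in it (the name must match this node's node.name, so uncomment node.name: node-1 as well).

    cluster.initial_master_nodes: ["node-1"]

To make the node reachable via ip:port, the network setting also has to change:

    network.host: 0.0.0.0

After the changes, the file looks like this:
[Figure 10]
[Figure 11]
At this point, ElasticSearch has started successfully.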
One node is not enough, so the next step builds a cluster from three nodes.
192.168.100.14 config/elasticsearch.yml

    cluster.name: my-monitor
    node.name: node-1
    network.host: 192.168.100.14
    http.port: 9200
    discovery.seed_hosts: ["192.168.100.14:9300", "192.168.100.15:9300", "192.168.100.19:9300"]
    cluster.initial_master_nodes: ["node-1"]

192.168.100.15 config/elasticsearch.yml

    cluster.name: my-monitor
    node.name: node-2
    network.host: 192.168.100.15
    http.port: 9200
    discovery.seed_hosts: ["192.168.100.14:9300", "192.168.100.15:9300", "192.168.100.19:9300"]
    cluster.initial_master_nodes: ["node-1"]

192.168.100.19 config/elasticsearch.yml

    cluster.name: my-monitor
    node.name: node-3
    network.host: 192.168.100.19
    http.port: 9200
    discovery.seed_hosts: ["192.168.100.14:9300", "192.168.100.15:9300", "192.168.100.19:9300"]
    cluster.initial_master_nodes: ["node-1"]
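
Once the three nodes are running, one way to confirm that they formed a single cluster is to call the ElasticSearch `_cluster/health` endpoint on any node and read `number_of_nodes`. The sketch below assumes Java 11+ (for `java.net.http.HttpClient`) and extracts the field with a regular expression to stay dependency-free; the class name `ClusterHealthCheck` is hypothetical, and a real JSON parser would be the sturdier choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClusterHealthCheck {
    // Matches the "number_of_nodes" field of the cluster health response.
    private static final Pattern NODES = Pattern.compile("\"number_of_nodes\"\\s*:\\s*(\\d+)");

    /** Builds the health URL for a node's HTTP port (9200 in the configs above). */
    static URI healthUri(String host, int httpPort) {
        return URI.create("http://" + host + ":" + httpPort + "/_cluster/health");
    }

    /** Returns number_of_nodes from the health JSON, or -1 if the field is absent. */
    static int numberOfNodes(String healthJson) {
        Matcher m = NODES.matcher(healthJson);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(3)).build();
        HttpRequest request = HttpRequest.newBuilder(healthUri("192.168.100.14", 9200))
                .timeout(Duration.ofSeconds(3))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            int nodes = numberOfNodes(response.body());
            System.out.println(nodes == 3 ? "All three nodes joined" : "Unexpected node count: " + nodes);
        } catch (Exception e) {
            System.out.println("Cluster not reachable: " + e);
        }
    }
}
```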

It is also advisable to adjust config/jvm.options on all three nodes:

    -Xms2g
    -Xmx2g

Restart the three nodes one after another (stop, then start):

    pkill -F pid
    ./bin/elasticsearch -d -p pid

[Figure 12]
[Figure 13]
[Figure 14]
Next, set the ES addresses in config/application.yml under the SkyWalking directory:

    storage:
      selector: ${SW_STORAGE:elasticsearch7}
      elasticsearch7:
        nameSpace: ${SW_NAMESPACE:""}
        clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:192.168.100.14:9200,192.168.100.15:9200,192.168.100.19:9200}

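2.2. Installing the Agent

https://github.com/apache/skywalking/blob/v8.2.0/docs/en/setup/service-agent/java-agent/README.md
Copy the agent directory to every machine that runs a service:

    scp -r ./agent chengjs@192.168.100.12:~/

Here it is copied into each service's own directory.
[Figure 15]
The plugins directory holds the plugins the probe uses. SkyWalking plugins are plug-and-play: to enable an optional one, move it from optional-plugins into plugins.
Edit the agent/config/agent.config configuration file; the same settings can also be passed as command-line parameters.
The main settings are the service name and the backend service address:

    agent.service_name=${SW_AGENT_NAME:user-center}
    collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:192.168.100.17:11800}

These values can also be set through environment variables or system properties, for example:

    export SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800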
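
The `${NAME:default}` notation in these files reads as "use the environment variable NAME if it is set, otherwise fall back to the default after the first colon". The sketch below only illustrates that reading; it is not SkyWalking's own implementation, and the class name `PlaceholderDemo` is made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderDemo {
    // ${NAME:default}: NAME is the variable, everything up to '}' is the default (may be empty).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+):([^}]*)\\}");

    /** Replaces every placeholder with the env value if present, else with its default. */
    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fromEnv = env.get(m.group(1));
            String replacement = fromEnv != null ? fromEnv : m.group(2);
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "${SW_AGENT_COLLECTOR_BACKEND_SERVICES:192.168.100.17:11800}";
        // Prints the exported value if the variable is set, otherwise the default address.
        System.out.println(resolve(raw, System.getenv()));
    }
}
```

Note that the default itself may contain colons (as in `192.168.100.17:11800`); only the first colon separates the variable name from the default.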

Finally, attach the probe when starting the service by passing the -javaagent command-line parameter:

    java -javaagent:/path/to/skywalking-agent/skywalking-agent.jar -jar yourApp.jar

For example:

    java -javaagent:./agent/skywalking-agent.jar -Dspring.profiles.active=dev -Xms512m -Xmx1024m -jar demo-0.0.1-SNAPSHOT.jar
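
If it is unclear whether a running service was actually started with the probe, the JVM's own input arguments can be inspected from inside the application. This is a small sketch using the standard `RuntimeMXBean`; the class name `AgentAttachedCheck` is hypothetical, and it only checks the startup flag, not whether the agent connected to the backend.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class AgentAttachedCheck {
    /** True if any JVM argument is a -javaagent flag pointing at skywalking-agent.jar. */
    static boolean hasSkyWalkingAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent:") && arg.contains("skywalking-agent.jar")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // JVM flags such as -Xms512m or -javaagent:..., without the main class and program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(hasSkyWalkingAgent(jvmArgs)
                ? "SkyWalking agent flag present"
                : "SkyWalking agent flag missing");
    }
}
```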

3. Starting the Services

Edit webapp/webapp.yml to change the port and the backend service address:

    server:
      port: 9000
    collector:
      path: /graphql
      ribbon:
        ReadTimeout: 10000
        # Point to all backend's restHost:restPort, split by ,
        listOfServers: 127.0.0.1:12800

Start the services:

    bin/startup.sh

Or start them one at a time:

    bin/oapService.sh
    bin/webappService.sh

Check the log files in the logs directory to see whether startup succeeded.
Then open http://127.0.0.1:9000 in a browser.
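
Besides reading the logs, a quick connectivity probe can tell whether the processes are listening. The sketch below tries a TCP connection to the three ports that appear in this walkthrough: 11800 (the backend address in agent.config), 12800 (the backend address in webapp.yml), and 9000 (the UI). The class name `PortCheck` is hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {11800, 12800, 9000};
        for (int port : ports) {
            System.out.println("127.0.0.1:" + port + " "
                    + (isListening("127.0.0.1", port, 1000) ? "listening" : "not reachable"));
        }
    }
}
```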

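4. Alarms

[Figure 16]
Edit alarm-settings.yml to configure alarm rules and notifications.
https://github.com/apache/skywalking/blob/v8.2.0/docs/en/setup/backend/backend-alarm.md
The focus here is on alarm notifications.
[Figure 17]
[Figure 18]
To send notifications through a DingTalk robot, create a new project that exposes a webhook endpoint for SkyWalking to call (the endpoint URL is registered under webhooks in alarm-settings.yml):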

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <parent>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>2.4.0</version>
            <relativePath/> <!-- lookup parent from repository -->
        </parent>
        <groupId>com.wt.monitor</groupId>
        <artifactId>skywalking-alarm</artifactId>
        <version>1.0.0-SNAPSHOT</version>
        <name>skywalking-alarm</name>
        <properties>
            <java.version>1.8</java.version>
        </properties>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
            </dependency>
            <dependency>
                <groupId>com.aliyun</groupId>
                <artifactId>alibaba-dingtalk-service-sdk</artifactId>
                <version>1.0.1</version>
            </dependency>
            <dependency>
                <groupId>commons-codec</groupId>
                <artifactId>commons-codec</artifactId>
                <version>1.15</version>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.75</version>
            </dependency>
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-maven-plugin</artifactId>
                </plugin>
            </plugins>
        </build>
    </project>

Optional dependency (not recommended):

    <dependency>
        <groupId>org.apache.skywalking</groupId>
        <artifactId>server-core</artifactId>
        <version>8.2.0</version>
    </dependency>

Define the alarm message entity class:

    package com.wt.monitor.skywalking.alarm.domain;

    import lombok.Data;

    import java.io.Serializable;

    @Data
    public class AlarmMessageDTO implements Serializable {

        private int scopeId;

        private String scope;

        /**
         * Target scope entity name
         */
        private String name;

        private String id0;

        private String id1;

        private String ruleName;

        /**
         * Alarm text message
         */
        private String alarmMessage;

        /**
         * Alarm time measured in milliseconds
         */
        private long startTime;
    }

Send the DingTalk robot message:

    package com.wt.monitor.skywalking.alarm.service;

    import com.dingtalk.api.DefaultDingTalkClient;
    import com.dingtalk.api.DingTalkClient;
    import com.dingtalk.api.request.OapiRobotSendRequest;
    import com.taobao.api.ApiException;
    import lombok.extern.slf4j.Slf4j;
    import org.apache.commons.codec.binary.Base64;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Service;

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.security.InvalidKeyException;
    import java.security.NoSuchAlgorithmException;

    @Slf4j
    @Service
    public class DingTalkAlarmService {

        // Robot webhook URL; it is expected to already carry the access_token query parameter.
        @Value("${dingtalk.webhook}")
        private String webhook;

        // Signing secret of the robot.
        @Value("${dingtalk.secret}")
        private String secret;

        public void sendMessage(String content) {
            try {
                // Sign "timestamp\nsecret" with HMAC-SHA256, then Base64- and URL-encode the result.
                Long timestamp = System.currentTimeMillis();
                String stringToSign = timestamp + "\n" + secret;
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
                byte[] signData = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
                String sign = URLEncoder.encode(Base64.encodeBase64String(signData), "UTF-8");

                String serverUrl = webhook + "&timestamp=" + timestamp + "&sign=" + sign;
                DingTalkClient client = new DefaultDingTalkClient(serverUrl);

                // Build and send a plain-text message.
                OapiRobotSendRequest request = new OapiRobotSendRequest();
                request.setMsgtype("text");
                OapiRobotSendRequest.Text text = new OapiRobotSendRequest.Text();
                text.setContent(content);
                request.setText(text);
                client.execute(request);
            } catch (ApiException | NoSuchAlgorithmException
                    | UnsupportedEncodingException | InvalidKeyException e) {
                // Logging with the exception already records the stack trace.
                log.error(e.getMessage(), e);
            }
        }
    }
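
The signing step can be tried on its own, without the DingTalk SDK or commons-codec, which makes it easier to unit-test. The following sketch mirrors the computation in the service above using only the JDK; it assumes Java 11+ (for the `URLEncoder.encode(String, Charset)` overload), and the class name `DingTalkSigner`, the example webhook URL, and the `SECdemo` secret are placeholders invented for this illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class DingTalkSigner {
    /** HMAC-SHA256 over "timestamp\nsecret", keyed with the secret, Base64- then URL-encoded. */
    static String sign(long timestamp, String secret) {
        try {
            String stringToSign = timestamp + "\n" + secret;
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return URLEncoder.encode(Base64.getEncoder().encodeToString(digest), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard JDK algorithm, so this is not expected in practice.
            throw new IllegalStateException(e);
        }
    }

    /** Appends timestamp and sign to a webhook that already has a query string. */
    static String signedUrl(String webhook, long timestamp, String secret) {
        return webhook + "&timestamp=" + timestamp + "&sign=" + sign(timestamp, secret);
    }

    public static void main(String[] args) {
        // Placeholder webhook and secret; substitute the robot's real values.
        String webhook = "https://example.invalid/robot/send?access_token=TOKEN";
        System.out.println(signedUrl(webhook, System.currentTimeMillis(), "SECdemo"));
    }
}
```

Because the timestamp is part of the signed string, a fresh signature has to be computed for every request rather than cached.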

AlarmController.java

    package com.wt.monitor.skywalking.alarm.controller;

    import com.alibaba.fastjson.JSON;
    import com.wt.monitor.skywalking.alarm.domain.AlarmMessageDTO;
    import com.wt.monitor.skywalking.alarm.service.DingTalkAlarmService;
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    import java.text.MessageFormat;
    import java.util.List;

    @Slf4j
    @RestController
    @RequestMapping("/skywalking")
    public class AlarmController {

        @Autowired
        private DingTalkAlarmService dingTalkAlarmService;

        // Webhook endpoint: SkyWalking POSTs a JSON array of alarm messages here.
        @PostMapping("/alarm")
        public void alarm(@RequestBody List<AlarmMessageDTO> alarmMessageDTOList) {
            log.info("Received alarm messages: {}", JSON.toJSONString(alarmMessageDTOList));
            if (null != alarmMessageDTOList) {
                // Forward each alarm to DingTalk as one text message.
                alarmMessageDTOList.forEach(e -> dingTalkAlarmService.sendMessage(
                        MessageFormat.format("-----Alarm from SkyWalking-----\n[Name]: {0}\n[Message]: {1}\n",
                                e.getName(), e.getAlarmMessage())));
            }
        }
    }

[Figure 19]
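
To exercise the webhook without waiting for a real alarm, a hand-written payload can be posted to the controller. This sketch assumes Java 11+ and that the Spring Boot application listens on its default port 8080 on the local machine; the class name `AlarmWebhookSmokeTest` and the sample field values (scope, rule name, message) are illustrative, chosen only to match the fields of `AlarmMessageDTO`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class AlarmWebhookSmokeTest {
    /**
     * Builds a one-element JSON array shaped like AlarmMessageDTO.
     * No JSON escaping is done, so name and alarmMessage must not contain quotes or backslashes.
     */
    static String samplePayload(String name, String alarmMessage, long startTime) {
        return "[{"
                + "\"scopeId\":1,"
                + "\"scope\":\"SERVICE\","
                + "\"name\":\"" + name + "\","
                + "\"id0\":\"\","
                + "\"id1\":\"\","
                + "\"ruleName\":\"service_resp_time_rule\","
                + "\"alarmMessage\":\"" + alarmMessage + "\","
                + "\"startTime\":" + startTime
                + "}]";
    }

    public static void main(String[] args) {
        String body = samplePayload("user-center", "Response time is too high", System.currentTimeMillis());
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(3)).build();
        // Assumed address of the alarm project; adjust host and port to the actual deployment.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:8080/skywalking/alarm"))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
        } catch (Exception e) {
            System.out.println("Webhook not reachable: " + e);
        }
    }
}
```

If the controller and the DingTalk settings are correct, the robot should post the formatted alarm text to the group shortly after the request returns.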