前置条件
当前 datakit 版本 1.4.0
当前案例使用 ddtrace 版本0.78.3
和 0.97.0
(最新版本)进行测试
开启 ddtrace 采集
# {"version": "1.4.1", "desc": "do NOT edit this line"}
[[inputs.ddtrace]]
## DDTrace Agent endpoints register by version respectively.
## Endpoints can be skipped listen by remove them from the list.
## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.
endpoints = ["/v0.3/traces", "/v0.4/traces", "/v0.5/traces"]
## customer_tags is a list of keys contains keys set by client code like span.SetTag(key, value)
## that want to send to data center. Those keys set by client code will take precedence over
## keys in [inputs.ddtrace.tags]. DOT(.) IN KEY WILL BE REPLACED BY DASH(_) WHEN SENDING.
# customer_tags = ["key1", "key2", ...]
## Keep rare tracing resources list switch.
## If some resources are rare enough(not presend in 1 hour), those resource will always send
## to data center and do not consider samplers and filters.
# keep_rare_resource = false
## By default every error presents in span will be send to data center and omit any filters or
## sampler. If you want to get rid of some error status, you can set the error status list here.
# omit_err_status = ["404"]
## Ignore tracing resources map like service:[resources...].
## The service name is the full service name in current application.
## The resource list is regular expressions uses to block resource names.
## If you want to block some resources universally under all services, you can set the
## service name as "*". Note: double quotes "" cannot be omitted.
# [inputs.ddtrace.close_resource]
# service1 = ["resource1", "resource2", ...]
# service2 = ["resource1", "resource2", ...]
# "*" = ["close_resource_under_all_services"]
# ...
## Sampler config uses to set global sampling strategy.
## priority uses to set tracing data propagation level, the valid values are -1, 0, 1
## -1: always reject any tracing data send to datakit
## 0: accept tracing data and calculate with sampling_rate
## 1: always send to data center and do not consider sampling_rate
## sampling_rate used to set global sampling rate
#[inputs.ddtrace.sampler]
# priority = -1
# sampling_rate = 0
## Piplines use to manipulate message and meta data. If this item configured right then
## the current input procedure will run the scripts wrote in pipline config file against the data
## present in span message.
## The string on the left side of the equal sign must be identical to the service name that
## you try to handle.
# [inputs.ddtrace.pipelines]
# service1 = "service1.p"
# service2 = "service2.p"
# ...
# [inputs.ddtrace.tags]
# key1 = "value1"
# key2 = "value2"
# ...
准备 Shell
java -javaagent:D:/ddtrace/dd-java-agent-0.97.0.jar \
-Ddd.service=ddtrace-server \
-Ddd.agent.port=9529 \
-jar springboot-ddtrace-server.jar
开启 Query 参数
开启 query 参数,可以更直观方便的让用户看到当前请求携带了哪些参数,更直观的还原客户真实操作流程,但 query 开启参数只能采集到 url 上的参数,request Body 里面的参数目前尚不支持。默认为 false,表示为默认不开启。
-Ddd.http.server.tag.query-string=TRUE
配置远程采集链接
dd.agent.host
默认值是localhost
,所以默认推送的是本地的 DataKit ,如果想推送远程 DataKit ,则需要配置 dd.agent.host
。
-Ddd.agent.host=192.168.91.11
类或方法注入 Trace
ddtrace 支持给方法注入 Trace ,默认情况下,ddtrace 会对所有的 API 接口动态注入 Trace,如果想对非 API 类(方法)——一些重要的类和方法需要重点标记,可以通过 dd.trace.methods
参数配置。
Environment Variable: DD_TRACE_METHODS
Default: null
Example: package.ClassName[method1,method2,…];AnonymousClass$1[call];package.ClassName[]
List of class/interface and methods to trace. Similar to adding @Trace, but without changing code. Note: The wildcard method support ([]) does not accommodate constructors, getters, setters, synthetic, toString, equals, hashcode, or finalizer method calls
如对com.zy.observable.ddtrace.service.TestService
类的getDemo
方法需要添加 Trace。
-Ddd.trace.methods=”com.zy.observable.ddtrace.service.TestService[getDemo]”
部分代码所示:
@Autowired
private TestService testService;
@GetMapping("/gateway")
@ResponseBody
public String gateway(String tag) {
String userId = "user-" + System.currentTimeMillis();
MDC.put(ConstantsUtils.MDC_USER_ID, userId);
logger.info("this is tag");
sleep();
testService.getDemo();
httpTemplate.getForEntity(apiUrl + "/resource", String.class).getBody();
httpTemplate.getForEntity(apiUrl + "/auth", String.class).getBody();
if (client) {
httpTemplate.getForEntity("http://"+extraHost+":8081/client", String.class).getBody();
}
return httpTemplate.getForEntity(apiUrl + "/billing?tag=" + tag, String.class).getBody();
}
未添加dd.trace.methods
参数,上报 11 个 span ,效果如下:
添加dd.trace.methods
参数,上报 12 个 span ,效果如下:
忽略 Trace(实测无效)
通过配置dd.trace.classes.exclude
可以忽略我们不愿意上报的 trace 数据,在实际生产环境中,比如注册中心的心跳。
Environment Variable: DD_TRACE_CLASSES_EXCLUDE
Default: null
Example: package.ClassName,package.ClassName$Nested,package.Foo,package.other.
A list of fully qualified classes (that may end with a wildcard to denote a prefix) which will be ignored (not modified) by the tracer. Must use the jvm internal representation for names (eg package.ClassName$Nested and not package.ClassName.Nested)
忽略 IndexController
产生的 trace,配置入下:
-Ddd.trace.classes.exclude=”com.zy.observable.ddtrace.controller.IndexController”
然而并没有达到既定的效果。
两种添加 Tag 方式
ddtrace 提供两种添加 tag 方式,效果一样。但还是推荐使用 dd.tags 方式
dd.trace.span.tags
将 projectName:observable-demo
添加到每个 span 的示例:
-Ddd.trace.span.tags=projectName:observable-demo
dd.tags
-Ddd.tags=user_name:joy
两种方式都能生成 tag,效果一样,都会在meta
里面展示数据。
如果确实想要把 dd.tags
标记的 tag 作为观测云的 标签 ,则需要在 ddtrace.conf 配置 customer_tags
[[inputs.ddtrace]]
endpoints = ["/v0.3/traces", "/v0.4/traces", "/v0.5/traces"]
customer_tags = ["projectName","user_name"]
效果如图
注意:如果自定义tag 包含了 tag 关键字,则会当成标签显示。
显示数据库实例名称
显示数据库的名称,默认显示数据库的类型,如需要显示数据库名称,将值设置成TRUE
-Ddd.trace.db.client.split-by-instance=TRUE
以上 demo 并没有加载数据库,所以想要达到这个效果,可以选择一个引入数据库的应用添加参数 dd.trace.db.client.split-by-instance=TRUE
效果图:
采样
ddtrace.conf 采样部分配置
## Sampler config uses to set global sampling strategy.
## priority uses to set tracing data propagation level, the valid values are -1, 0, 1
## -1: always reject any tracing data send to datakit
## 0: accept tracing data and calculate with sampling_rate
## 1: always send to data center and do not consider sampling_rate
## sampling_rate used to set global sampling rate
[inputs.ddtrace.sampler]
priority = 0
sampling_rate = 0.1
测试脚本
for ((i=1;i<=100;i++));
do
curl http://localhost:8080/counter
done
priority = 0
sampling_rate
范围 (0,1),不在范围内,trace丢弃。如果需要全采,则不需要配置[inputs.ddtrace.sampler]
应用不配置dd.trace.sample.rate
sampling_rate = 0.1
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 100 | 7 |
2 | 100 | 12 |
3 | 1000 | 101 |
-Ddd.trace.sample.rate=0.10
& sampling_rate = 0.1
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 1000 | 103 |
2 | 1000 | 104 |
3 | 2000 | 204 |
4 | 2000 | 203 |
-Ddd.trace.sample.rate=0.10
对结果不影响,可以删除。
sampling_rate = 0.5
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 1000 | 505 |
2 | 1000 | 505 |
3 | 2000 | 998 |
4 | 2000 | 1006 |
datakit sampling_rate = 1
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 500 | 0 |
2 | 500 | 0 |
priority =1
sampling_rate
参数失效,应用层面如果不配置概率采样,则datakit会过滤所有的trace
应用配置概率 -Ddd.trace.sample.rate=0.10
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 100 | 100 |
2 | 500 | 500 |
应用配置概率 -Ddd.trace.sample.rate=1
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 100 | 100 |
2 | 500 | 500 |
应用配置概率 -Ddd.trace.sample.rate=10
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 100 | 100 |
2 | 500 | 500 |
应用配置概率 -Ddd.trace.sample.rate=0.5
请求序号 | 次数 | 观测云traceId数 |
---|---|---|
1 | 100 | 100 |
2 | 100 | 100 |
priority =-1
无论其他参数怎么配置,都不会上报trace信息。
采样结论
数据采样是基于datakit 配置ddtrace来进行采样的
- 当priority=0时,基于sampling_rate 概率进行采样,取值范围为(0,1)。
- 当priority=1时,则表示不进行概率采样,即上报所有的trace信息。
- 当priority=-1时,则拒绝上报trace信息,即全部trace信息过滤掉。
同时也证明了 ddtrace-javagent 采样参数无效。
过滤 Resource
即过滤一些不需要的请求,减少trace的上报。比如过滤 nacos 数据中心相关的资源上报、eureka 数据中心相关的资源上报。
主要是通过datakit 的 inputs.ddtrace.close_resource
标签配置进行过滤
## Ignore tracing resources map like service:[resources...].
## The service name is the full service name in current application.
## The resource list is regular expressions uses to block resource names.
## If you want to block some resources universally under all services, you can set the
## service name as "*". Note: double quotes "" cannot be omitted.
# [inputs.ddtrace.close_resource]
# service1 = ["resource1", "resource2", ...]
# service2 = ["resource1", "resource2", ...]
# "*" = ["close_resource_under_all_services"]
# ...
比如过滤 resource 为 counter
的请求。
[inputs.ddtrace.close_resource]
ddtrace-server = ["counter"]
如果要过滤所有的 service counter
请求,可以用通配符"*"
代替 servicename,如:
[inputs.ddtrace.close_resource]
"*" = ["counter"]
如果要过滤的 resource 存在多层级span链路,不会对当前链路上报有影响,只会影响被过滤的resource不能上报上来。如,过滤 gateway
[inputs.ddtrace.close_resource]
"*" = ["gateway"]
通过请求curl http://localhost:8080/gateway
,在观测云上查看链路情况,发现 resource为 gateway
的请求已经过滤掉了。
入侵式埋点
与前面件Tag有差异,前面是通过 javaagent方式配置埋点,相对来说方便管理和动态埋点,不会因为想要对特殊的请求做一些埋点而调整代码。但同时配置会比较繁琐,入侵式埋点能够很好的跟业务结合,对业务进行埋点,则需要用 dd-trace-api
的方式。
以java为例
<dependency>
<groupId>com.datadoghq</groupId>
<artifactId>dd-trace-api</artifactId>
<version>0.102.0</version>
</dependency>
在对应需要埋点的方法添加注解 @Trace
@Trace
public String apiTrace(){
return "apiTrace";
}
然后在 gateway
方法调用这个
...
testService.apiTrace();
...
重启,访问 gateway
注意:入侵式埋点不代表应用启动的时候不需要 agent ,如果没有agent, @Trace
也将失效。@Trace
注释具有默认操作名称 trace.annotation
,而跟踪的方法默认具有资源。
可以修改对应的名称
@Trace(resourceName = "apiTrace",operationName = "apiTrace")
public String apiTrace(){
return "apiTrace";
}
修改后,效果如下:
Log
trace与 log 关联,都是通过 MDC 方式进行埋点。以logback-spring.xml为例
logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration scan="true" scanPeriod="30 seconds">
<!-- 部分参数需要来源于properties文件 -->
<springProperty scope="context" name="logName" source="spring.application.name" defaultValue="localhost.log"/>
<!-- 配置后可以动态修改日志级别-->
<jmxConfigurator />
<property name="log.pattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{20} - [%method,%line] %X{dd.service} %X{dd.trace_id} %X{dd.span_id} - %msg%n" />
<!-- %m输出的信息,%p日志级别,%t线程名,%d日期,%c类的全名,,,, -->
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>${log.pattern}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/${logName}/${logName}.log</file> <!-- 使用方法 -->
<append>true</append>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<fileNamePattern>logs/${logName}/${logName}-%d{yyyy-MM-dd}.log.%i</fileNamePattern>
<maxFileSize>64MB</maxFileSize>
<maxHistory>30</maxHistory>
<totalSizeCap>1GB</totalSizeCap>
</rollingPolicy>
<encoder>
<pattern>${log.pattern}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<!-- 只打印error级别的内容 -->
<logger name="com.netflix" level="ERROR" />
<logger name="net.sf.json" level="ERROR" />
<logger name="org.springframework" level="ERROR" />
<logger name="springfox" level="ERROR" />
<!-- sql 打印 配置-->
<logger name="com.github.pagehelper.mapper" level="DEBUG" />
<logger name="org.apache.ibatis" level="DEBUG" />
<root level="info">
<appender-ref ref="STDOUT" />
<appender-ref ref="FILE" />
</root>
</configuration>
主要是通过 pattern 配置日志格式,%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{20} - [%method,%line] %X{dd.service} %X{dd.trace_id} %X{dd.span_id} - %msg%n
2022-06-10 17:07:45.257 [main] INFO o.a.c.c.StandardEngine - [log,173] ddtrace-server - Starting Servlet engine: [Apache Tomcat/9.0.56]
2022-06-10 17:07:45.369 [main] INFO o.a.c.c.C.[.[.[/] - [log,173] ddtrace-server - Initializing Spring embedded WebApplicationContext
2022-06-10 17:07:45.758 [main] INFO o.a.c.h.Http11NioProtocol - [log,173] ddtrace-server - Starting ProtocolHandler ["http-nio-8080"]
2022-06-10 17:07:45.786 [main] INFO c.z.o.d.DdtraceApplication - [logStarted,61] ddtrace-server - Started DdtraceApplication in 2.268 seconds (JVM running for 5.472)
2022-06-10 17:09:01.493 [http-nio-8080-exec-1] INFO o.a.c.c.C.[.[.[/] - [log,173] ddtrace-server 5983174698688502665 5075189911231446778 - Initializing Spring DispatcherServlet 'dispatcherServlet'
2022-06-10 17:09:01.550 [http-nio-8080-exec-1] INFO c.z.o.d.c.IndexController - [gateway,48] ddtrace-server 5983174698688502665 7355870844984555943 - this is tag
2022-06-10 17:09:01.625 [http-nio-8080-exec-3] INFO c.z.o.d.c.IndexController - [auth,69] ddtrace-server 5983174698688502665 7209299453959523135 - this is auth
2022-06-10 17:09:01.631 [http-nio-8080-exec-4] INFO c.z.o.d.c.IndexController - [billing,77] ddtrace-server 5983174698688502665 9179949003735674110 - this is method3,null
datakit 采集日志
上述日志输出到文本后,datakit 可以从文本文件里面读取日志信息并上报到观测云。
开启日志采集器
# {"version": "1.2.18", "desc": "do NOT edit this line"}
[[inputs.logging]]
## required
logfiles = [
"D:/code_zy/observable-demo/logs/ddtrace-server/*.log",
]
# only two protocols are supported:TCP and UDP
# sockets = [
# "tcp://0.0.0.0:9530",
# "udp://0.0.0.0:9531",
# ]
## glob filteer
ignore = [""]
## your logging source, if it's empty, use 'default'
source = "ddtrace-server"
## add service tag, if it's empty, use $source.
service = "ddtrace-server"
## grok pipeline script name
pipeline = "log-ddtrace.p"
## optional status:
## "emerg","alert","critical","error","warning","info","debug","OK"
ignore_status = []
## optional encodings:
## "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
character_encoding = ""
## datakit read text from Files or Socket , default max_textline is 32k
## If your log text line exceeds 32Kb, please configure the length of your text,
## but the maximum length cannot exceed 32Mb
# maximum_length = 32766
## The pattern should be a regexp. Note the use of '''this regexp'''
## regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
# multiline_match = '''^\S'''
## removes ANSI escape codes from text strings
remove_ansi_escape_codes = false
## if file is inactive, it is ignored
## time units are "ms", "s", "m", "h"
ignore_dead_log = "10m"
[inputs.logging.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
配置 pipeline
目的是为了将日志进行切割,将一些关键的字段作为tag,用于过滤、筛选和数据分析。
#日志样式
#2022-06-10 17:09:01.625 [http-nio-8080-exec-3] INFO c.z.o.d.c.IndexController - [auth,69] ddtrace-server 5983174698688502665 7209299453959523135 - this is auth
grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}")
default_time(time,"Asia/Shanghai")
切割后的日志,已经产生了很多tag
观测云也支持其他的日志方式采集,比如socket,更多日志采集参考:https://www.yuque.com/dataflux/bp/logging
当我们从日志里面把 traceId 和 spanId 切出来后,观测云上可以直接从日志关联到对应的链路信息,实现了日志链路的互通行为。
demo 源码地址 https://github.com/lrwh/observable-demo/tree/main/springboot-ddtrace-server