DDTrace Java Documentation

Prerequisites

Current datakit version: 1.4.0

This walkthrough was tested with ddtrace (dd-java-agent) version 0.97.0, the latest at the time of writing.

Enabling DDTrace Collection

  1. # {"version": "1.4.1", "desc": "do NOT edit this line"}
  2. [[inputs.ddtrace]]
  3. ## DDTrace Agent endpoints register by version respectively.
  4. ## Endpoints can be skipped listen by remove them from the list.
  5. ## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.
  6. endpoints = ["/v0.3/traces", "/v0.4/traces", "/v0.5/traces"]
  7. ## customer_tags is a list of keys contains keys set by client code like span.SetTag(key, value)
  8. ## that want to send to data center. Those keys set by client code will take precedence over
  9. ## keys in [inputs.ddtrace.tags]. DOT(.) IN KEY WILL BE REPLACED BY DASH(_) WHEN SENDING.
  10. # customer_tags = ["key1", "key2", ...]
  11. ## Keep rare tracing resources list switch.
  12. ## If some resources are rare enough(not presend in 1 hour), those resource will always send
  13. ## to data center and do not consider samplers and filters.
  14. # keep_rare_resource = false
  15. ## By default every error presents in span will be send to data center and omit any filters or
  16. ## sampler. If you want to get rid of some error status, you can set the error status list here.
  17. # omit_err_status = ["404"]
  18. ## Ignore tracing resources map like service:[resources...].
  19. ## The service name is the full service name in current application.
  20. ## The resource list is regular expressions uses to block resource names.
  21. ## If you want to block some resources universally under all services, you can set the
  22. ## service name as "*". Note: double quotes "" cannot be omitted.
  23. # [inputs.ddtrace.close_resource]
  24. # service1 = ["resource1", "resource2", ...]
  25. # service2 = ["resource1", "resource2", ...]
  26. # "*" = ["close_resource_under_all_services"]
  27. # ...
  28. ## Sampler config uses to set global sampling strategy.
  29. ## priority uses to set tracing data propagation level, the valid values are -1, 0, 1
  30. ## -1: always reject any tracing data send to datakit
  31. ## 0: accept tracing data and calculate with sampling_rate
  32. ## 1: always send to data center and do not consider sampling_rate
  33. ## sampling_rate used to set global sampling rate
  34. #[inputs.ddtrace.sampler]
  35. # priority = -1
  36. # sampling_rate = 0
  37. ## Piplines use to manipulate message and meta data. If this item configured right then
  38. ## the current input procedure will run the scripts wrote in pipline config file against the data
  39. ## present in span message.
  40. ## The string on the left side of the equal sign must be identical to the service name that
  41. ## you try to handle.
  42. # [inputs.ddtrace.pipelines]
  43. # service1 = "service1.p"
  44. # service2 = "service2.p"
  45. # ...
  46. # [inputs.ddtrace.tags]
  47. # key1 = "value1"
  48. # key2 = "value2"
  49. # ...

Preparing the Shell Command

    java -javaagent:D:/ddtrace/dd-java-agent-0.97.0.jar \
         -Ddd.service=ddtrace-server \
         -Ddd.agent.port=9529 \
         -jar springboot-ddtrace-server.jar

Enabling Query Parameters

Enabling query-parameter collection makes it easy to see which parameters a request carried, helping reconstruct the user's actual steps. Note that only parameters on the URL are captured; parameters in the request body are not yet supported. The default is false, i.e. disabled.

-Ddd.http.server.tag.query-string=TRUE

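As a sketch of what this captures (a hypothetical HelloController, not part of the demo project): with the flag enabled, a request such as GET /hello?tag=demo should surface the query string as a tag on the server span, while anything in the request body stays uncollected.

    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    public class HelloController {

        // With -Ddd.http.server.tag.query-string=true, the agent attaches the raw
        // query string (e.g. tag=demo) to the server span for this request.
        @GetMapping("/hello")
        @ResponseBody
        public String hello(@RequestParam(required = false) String tag) {
            return "hello," + tag;
        }
    }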

Configuring a Remote Collection Endpoint

dd.agent.host defaults to localhost, so traces are pushed to the local DataKit by default. To push to a remote DataKit, set dd.agent.host:

-Ddd.agent.host=192.168.91.11

Injecting Trace into Classes or Methods

ddtrace supports injecting Trace into methods. By default, it dynamically instruments all API endpoints. To trace non-API classes or methods — important ones you want to highlight — use the dd.trace.methods parameter.

Environment Variable: DD_TRACE_METHODS
Default: null
Example: package.ClassName[method1,method2,...];AnonymousClass$1[call];package.ClassName[*]
List of classes/interfaces and methods to trace. Similar to adding @Trace, but without changing code. Note: the wildcard method support ([*]) does not accommodate constructors, getters, setters, synthetic, toString, equals, hashCode, or finalizer method calls.

For example, to add Trace to the getDemo method of com.zy.observable.ddtrace.service.TestService:

-Ddd.trace.methods="com.zy.observable.ddtrace.service.TestService[getDemo]"

Part of the code is shown below:

    @Autowired
    private TestService testService;

    @GetMapping("/gateway")
    @ResponseBody
    public String gateway(String tag) {
        String userId = "user-" + System.currentTimeMillis();
        MDC.put(ConstantsUtils.MDC_USER_ID, userId);
        logger.info("this is tag");
        sleep();
        testService.getDemo();
        httpTemplate.getForEntity(apiUrl + "/resource", String.class).getBody();
        httpTemplate.getForEntity(apiUrl + "/auth", String.class).getBody();
        if (client) {
            httpTemplate.getForEntity("http://" + extraHost + ":8081/client", String.class).getBody();
        }
        return httpTemplate.getForEntity(apiUrl + "/billing?tag=" + tag, String.class).getBody();
    }
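For reference, a minimal sketch of what the TestService referenced above might look like (hypothetical; the real demo class may differ). With dd.trace.methods set, the getDemo() call gets its own span without any code change:

    import org.springframework.stereotype.Service;

    @Service
    public class TestService {

        // Targeted by -Ddd.trace.methods="com.zy.observable.ddtrace.service.TestService[getDemo]":
        // the agent wraps this method in its own span.
        public String getDemo() {
            return "demo";
        }
    }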

Without the dd.trace.methods parameter, 11 spans are reported.
With the dd.trace.methods parameter added, 12 spans are reported — the extra span comes from the instrumented getDemo method.

Ignoring Traces (did not work in testing)

With dd.trace.classes.exclude you can skip trace data you don't want reported — in production, for example, registry heartbeats.

Environment Variable: DD_TRACE_CLASSES_EXCLUDE
Default: null
Example: package.ClassName,package.ClassName$Nested,package.Foo*,package.other.*
A list of fully qualified classes (which may end with a wildcard to denote a prefix) that the tracer will ignore (not modify). Must use the JVM internal representation for names (e.g. package.ClassName$Nested, not package.ClassName.Nested).

To ignore traces produced by IndexController, configure:

-Ddd.trace.classes.exclude="com.zy.observable.ddtrace.controller.IndexController"

However, this did not achieve the expected effect in testing.

Two Ways to Add Tags

ddtrace provides two ways to add tags, with identical results; dd.tags is still the recommended one.

dd.trace.span.tags

Example: adding projectName:observable-demo to every span:

-Ddd.trace.span.tags=projectName:observable-demo


dd.tags

-Ddd.tags=user_name:joy

Both approaches generate the tag and behave the same: the data shows up in the span's meta.
If you want a tag marked via dd.tags to become a Guance label (标签), configure customer_tags in ddtrace.conf:

    [[inputs.ddtrace]]
      endpoints = ["/v0.3/traces", "/v0.4/traces", "/v0.5/traces"]
      customer_tags = ["projectName", "user_name"]


Note: if a custom tag contains a tag keyword, it is displayed as a label.
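As the customer_tags comment in ddtrace.conf suggests, tags can also be set from client code via span.SetTag(key, value). A minimal sketch, assuming the opentracing-util dependency is on the classpath and the dd-java-agent has registered an OpenTracing-compatible global tracer (illustrative only, not part of the demo):

    import io.opentracing.Span;
    import io.opentracing.util.GlobalTracer;

    public final class SpanTagging {

        // Attach a custom tag to the currently active span, if one exists.
        public static void tagActiveSpan(String key, String value) {
            Span span = GlobalTracer.get().activeSpan();
            if (span != null) {
                span.setTag(key, value);
            }
        }
    }

Keys set this way take precedence over keys configured under [inputs.ddtrace.tags] on the datakit side.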

Showing the Database Instance Name

By default the database type is displayed. To display the database instance name instead, set the value to TRUE:

-Ddd.trace.db.client.split-by-instance=TRUE

The demo above does not use a database, so to see this effect, add dd.trace.db.client.split-by-instance=TRUE to an application that does.

Sampling

The sampler section of ddtrace.conf:

    ## Sampler config sets the global sampling strategy.
    ## priority sets the tracing-data propagation level; valid values are -1, 0, 1:
    ##   -1: always reject any tracing data sent to datakit
    ##    0: accept tracing data and apply sampling_rate
    ##    1: always send to the data center, ignoring sampling_rate
    ## sampling_rate sets the global sampling rate.
    [inputs.ddtrace.sampler]
      priority = 0
      sampling_rate = 0.1

Test script

    for ((i = 1; i <= 100; i++)); do
        curl http://localhost:8080/counter
    done

priority = 0

sampling_rate must fall within the range (0, 1); values outside this range cause traces to be dropped. To collect everything, simply do not configure [inputs.ddtrace.sampler] at all.

Application without dd.trace.sample.rate configured

sampling_rate = 0.1

    Run   Requests   Trace IDs in Guance
    1     100        7
    2     100        12
    3     1000       101

-Ddd.trace.sample.rate=0.10 combined with sampling_rate = 0.1

    Run   Requests   Trace IDs in Guance
    1     1000       103
    2     1000       104
    3     2000       204
    4     2000       203

-Ddd.trace.sample.rate=0.10 has no effect on the result and can be removed.

sampling_rate = 0.5

    Run   Requests   Trace IDs in Guance
    1     1000       505
    2     1000       505
    3     2000       998
    4     2000       1006

datakit sampling_rate = 1

    Run   Requests   Trace IDs in Guance
    1     500        0
    2     500        0

This matches the exclusive (0, 1) range noted above: sampling_rate = 1 falls outside it, so every trace is dropped.

priority = 1

The sampling_rate parameter no longer applies. However, if the application does not configure probabilistic sampling, datakit filters out all traces.

Application sampling configured: -Ddd.trace.sample.rate=0.10

    Run   Requests   Trace IDs in Guance
    1     100        100
    2     500        500

Application sampling configured: -Ddd.trace.sample.rate=1

    Run   Requests   Trace IDs in Guance
    1     100        100
    2     500        500

Application sampling configured: -Ddd.trace.sample.rate=10

    Run   Requests   Trace IDs in Guance
    1     100        100
    2     500        500

Application sampling configured: -Ddd.trace.sample.rate=0.5

    Run   Requests   Trace IDs in Guance
    1     100        100
    2     100        100

priority = -1

No trace data is reported, regardless of how the other parameters are configured.

Sampling Conclusions

Sampling is driven by the ddtrace configuration on the datakit side:

  • When priority = 0, sampling is probabilistic, based on sampling_rate, whose valid range is (0, 1).
  • When priority = 1, no probabilistic sampling is applied: all trace data is reported.
  • When priority = -1, all trace data is rejected, i.e. every trace is filtered out.

The tests also show that the ddtrace Java agent's sampling parameter has no effect.

Filtering Resources

That is, filtering out unneeded requests to reduce the number of traces reported — for example, resources reported by the Nacos registry or the Eureka registry.

Filtering is configured through datakit's inputs.ddtrace.close_resource setting:

    ## Ignore tracing resources, mapped as service:[resources...].
    ## The service name is the full service name of the current application.
    ## The resource list contains regular expressions used to block resource names.
    ## To block some resources universally across all services, set the service
    ## name to "*". Note: the double quotes "" cannot be omitted.
    # [inputs.ddtrace.close_resource]
    #   service1 = ["resource1", "resource2", ...]
    #   service2 = ["resource1", "resource2", ...]
    #   "*" = ["close_resource_under_all_services"]
    #   ...

For example, to filter requests whose resource is counter:

    [inputs.ddtrace.close_resource]
      ddtrace-server = ["counter"]

To filter counter requests under every service, replace the service name with the wildcard "*":

    [inputs.ddtrace.close_resource]
      "*" = ["counter"]

If the filtered resource sits in a multi-level span chain, the rest of the chain is still reported; only the filtered resource itself is suppressed. For example, filtering gateway:

    [inputs.ddtrace.close_resource]
      "*" = ["gateway"]

After requesting curl http://localhost:8080/gateway and inspecting the trace in Guance, the request whose resource is gateway has indeed been filtered out.

Intrusive Instrumentation

This differs from the tagging approach above, which configures instrumentation through the javaagent: that approach is easier to manage and supports dynamic instrumentation, so you never have to touch code just to instrument a particular request, but its configuration can become cumbersome. Intrusive instrumentation, by contrast, integrates closely with business logic; to instrument business code this way, use dd-trace-api.

Take Java as an example:

    <dependency>
        <groupId>com.datadoghq</groupId>
        <artifactId>dd-trace-api</artifactId>
        <version>0.102.0</version>
    </dependency>

Add the @Trace annotation to the method that needs instrumenting:

    import datadog.trace.api.Trace;

    @Trace
    public String apiTrace() {
        return "apiTrace";
    }

Then call it from the gateway method:

    ...
    testService.apiTrace();
    ...

Restart the application and access gateway.


Note: intrusive instrumentation does not mean the application can start without the agent; without the agent, @Trace has no effect. The @Trace annotation has the default operation name trace.annotation, while the traced method provides the default resource name.

Both names can be changed:

    @Trace(resourceName = "apiTrace", operationName = "apiTrace")
    public String apiTrace() {
        return "apiTrace";
    }

After the change, the span is reported with the new resource and operation names.

Log

Traces and logs are correlated through MDC. Take logback-spring.xml as an example:

logback-spring.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration scan="true" scanPeriod="30 seconds">
        <!-- Some parameters come from the properties file -->
        <springProperty scope="context" name="logName" source="spring.application.name" defaultValue="localhost.log"/>
        <!-- Allows changing the log level dynamically -->
        <jmxConfigurator/>
        <property name="log.pattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{20} - [%method,%line] %X{dd.service} %X{dd.trace_id} %X{dd.span_id} - %msg%n"/>
        <!-- %m message, %p log level, %t thread name, %d date, %c fully qualified class name -->
        <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>${log.pattern}</pattern>
                <charset>UTF-8</charset>
            </encoder>
        </appender>
        <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
            <file>logs/${logName}/${logName}.log</file>
            <append>true</append>
            <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
                <fileNamePattern>logs/${logName}/${logName}-%d{yyyy-MM-dd}.log.%i</fileNamePattern>
                <maxFileSize>64MB</maxFileSize>
                <maxHistory>30</maxHistory>
                <totalSizeCap>1GB</totalSizeCap>
            </rollingPolicy>
            <encoder>
                <pattern>${log.pattern}</pattern>
                <charset>UTF-8</charset>
            </encoder>
        </appender>
        <!-- Only print ERROR-level output for these loggers -->
        <logger name="com.netflix" level="ERROR"/>
        <logger name="net.sf.json" level="ERROR"/>
        <logger name="org.springframework" level="ERROR"/>
        <logger name="springfox" level="ERROR"/>
        <!-- SQL logging -->
        <logger name="com.github.pagehelper.mapper" level="DEBUG"/>
        <logger name="org.apache.ibatis" level="DEBUG"/>
        <root level="info">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="FILE"/>
        </root>
    </configuration>

The log format is controlled by the pattern property: %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{20} - [%method,%line] %X{dd.service} %X{dd.trace_id} %X{dd.span_id} - %msg%n. Sample output:

    2022-06-10 17:07:45.257 [main] INFO o.a.c.c.StandardEngine - [log,173] ddtrace-server   - Starting Servlet engine: [Apache Tomcat/9.0.56]
    2022-06-10 17:07:45.369 [main] INFO o.a.c.c.C.[.[.[/] - [log,173] ddtrace-server   - Initializing Spring embedded WebApplicationContext
    2022-06-10 17:07:45.758 [main] INFO o.a.c.h.Http11NioProtocol - [log,173] ddtrace-server   - Starting ProtocolHandler ["http-nio-8080"]
    2022-06-10 17:07:45.786 [main] INFO c.z.o.d.DdtraceApplication - [logStarted,61] ddtrace-server   - Started DdtraceApplication in 2.268 seconds (JVM running for 5.472)
    2022-06-10 17:09:01.493 [http-nio-8080-exec-1] INFO o.a.c.c.C.[.[.[/] - [log,173] ddtrace-server 5983174698688502665 5075189911231446778 - Initializing Spring DispatcherServlet 'dispatcherServlet'
    2022-06-10 17:09:01.550 [http-nio-8080-exec-1] INFO c.z.o.d.c.IndexController - [gateway,48] ddtrace-server 5983174698688502665 7355870844984555943 - this is tag
    2022-06-10 17:09:01.625 [http-nio-8080-exec-3] INFO c.z.o.d.c.IndexController - [auth,69] ddtrace-server 5983174698688502665 7209299453959523135 - this is auth
    2022-06-10 17:09:01.631 [http-nio-8080-exec-4] INFO c.z.o.d.c.IndexController - [billing,77] ddtrace-server 5983174698688502665 9179949003735674110 - this is method3,null
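The %X{dd.trace_id} and %X{dd.span_id} values above are injected into the MDC by the agent. If you ever need the identifiers in code — for example to stamp them onto a custom log context yourself — dd-trace-api exposes them through CorrelationIdentifier. A minimal sketch (illustrative, not part of the demo):

    import datadog.trace.api.CorrelationIdentifier;
    import org.slf4j.MDC;

    public final class TraceLogCorrelation {

        // Inside an active trace: copy the current trace/span IDs into the MDC
        // manually, mirroring what the agent's automatic log injection does.
        public static void putTraceIdsIntoMdc() {
            MDC.put("dd.trace_id", CorrelationIdentifier.getTraceId());
            MDC.put("dd.span_id", CorrelationIdentifier.getSpanId());
        }
    }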

Collecting Logs with DataKit

Once the logs above are written to file, datakit can read them from the text files and report them to Guance.

Enabling the log collector

  1. # {"version": "1.2.18", "desc": "do NOT edit this line"}
  2. [[inputs.logging]]
  3. ## required
  4. logfiles = [
  5. "D:/code_zy/observable-demo/logs/ddtrace-server/*.log",
  6. ]
  7. # only two protocols are supported:TCP and UDP
  8. # sockets = [
  9. # "tcp://0.0.0.0:9530",
  10. # "udp://0.0.0.0:9531",
  11. # ]
  12. ## glob filteer
  13. ignore = [""]
  14. ## your logging source, if it's empty, use 'default'
  15. source = "ddtrace-server"
  16. ## add service tag, if it's empty, use $source.
  17. service = "ddtrace-server"
  18. ## grok pipeline script name
  19. pipeline = "log-ddtrace.p"
  20. ## optional status:
  21. ## "emerg","alert","critical","error","warning","info","debug","OK"
  22. ignore_status = []
  23. ## optional encodings:
  24. ## "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
  25. character_encoding = ""
  26. ## datakit read text from Files or Socket , default max_textline is 32k
  27. ## If your log text line exceeds 32Kb, please configure the length of your text,
  28. ## but the maximum length cannot exceed 32Mb
  29. # maximum_length = 32766
  30. ## The pattern should be a regexp. Note the use of '''this regexp'''
  31. ## regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
  32. # multiline_match = '''^\S'''
  33. ## removes ANSI escape codes from text strings
  34. remove_ansi_escape_codes = false
  35. ## if file is inactive, it is ignored
  36. ## time units are "ms", "s", "m", "h"
  37. ignore_dead_log = "10m"
  38. [inputs.logging.tags]
  39. # some_tag = "some_value"
  40. # more_tag = "some_other_value"

Configuring the pipeline

The goal is to split each log line and turn key fields into tags for filtering, searching, and data analysis.

    # Sample log line:
    # 2022-06-10 17:09:01.625 [http-nio-8080-exec-3] INFO c.z.o.d.c.IndexController - [auth,69] ddtrace-server 5983174698688502665 7209299453959523135 - this is auth
    grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}")
    default_time(time, "Asia/Shanghai")

After splitting, the log entries already carry a number of tags.
Guance also supports other collection methods, such as sockets. For more on log collection, see https://www.yuque.com/dataflux/bp/logging

Once the traceId and spanId have been extracted from the logs, Guance can link a log entry directly to its corresponding trace, connecting logs and traces both ways.

Demo source code: https://github.com/lrwh/observable-demo/tree/main/springboot-ddtrace-server