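Background

Intelligent Inspection v2 supports inspection scenarios that cover APM anomaly detection, root cause analysis, and the scope of affected users.
Based on these scenarios, the Guance (观测云) UI needs to support the following functions:

  • Create an intelligent inspection (with APM anomaly detection enabled)
  • View intelligent inspection results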

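    Flow design

    As shown in the figure, the design of APM anomaly root cause analysis consists of roughly four parts: collecting alert information, extracting key features from the alerts, detecting anomalous points, and presenting an alert summary.
    APM Intelligent Inspection Design v2 - Figure 1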
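The four parts can be pictured as a small pipeline. The sketch below is only an illustration of that shape: the function names are hypothetical, and the z-score check stands in for whatever detector the real system uses, which this document does not specify.

```python
from statistics import mean, pstdev


def collect(series_by_service):
    """Stage 1: gather the per-minute series to inspect (hypothetical input shape)."""
    return [{"service": s, "values": v} for s, v in series_by_service.items()]


def extract_features(alert):
    """Stage 2: reduce a series to a few summary features."""
    values = alert["values"]
    return {**alert, "mean": mean(values), "std": pstdev(values)}


def find_anomalies(features, threshold=3.0):
    """Stage 3: flag points far from the mean (z-score; an assumed stand-in detector)."""
    std = features["std"]
    if std == 0:
        return []  # a flat series has no outliers
    return [
        i
        for i, v in enumerate(features["values"])
        if abs(v - features["mean"]) / std > threshold
    ]


def summarize(features, anomalies):
    """Stage 4: build a short human-readable summary."""
    return f"{features['service']}: {len(anomalies)} anomalous point(s) at indexes {anomalies}"


def run_pipeline(series_by_service):
    """Run the four stages in order for every service."""
    summaries = []
    for alert in collect(series_by_service):
        features = extract_features(alert)
        summaries.append(summarize(features, find_anomalies(features)))
    return summaries
```

Keeping each stage as a separate function makes it possible to swap the detector without touching collection or presentation.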

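    Requirements

    APM latency

    Detection granularity: service + resource + project + env
    Data point granularity (frequency): 1 minute (pre-aggregation period, fixed)
    Detection data range: past 24 hours (passed as a parameter)
    Inspection interval: 1 hour (passed as a parameter)

    APM error rate

    Detection granularity: service + resource + project + env
    Data point granularity (frequency): 1 minute
    Detection data range: past 24 hours (passed as a parameter)
    Inspection interval: 1 hour (passed as a parameter)

    APM P90

    Detection granularity: service + resource + project + env
    Data point granularity (frequency): 1 minute
    Detection data range: past 24 hours (passed as a parameter)
    Inspection interval: 1 hour (passed as a parameter)

    APM QPS

    Detection granularity: service + resource + project + env
    Data point granularity (frequency): 1 minute
    Detection data range: past 24 hours (passed as a parameter)
    Inspection interval: 1 hour (passed as a parameter)

    Detection granularity

    Specify project, env, and version. Detection can cover all services, or only selected services.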
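The parameters above could be captured in a small configuration object. This is a minimal sketch, not the actual implementation: the class name, field names, and metric identifiers are assumptions; only the defaults (24 hours, 1 hour, 1 minute) come from the requirements.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical identifiers for the four metrics listed in the requirements.
METRICS = ("latency", "error_rate", "p90", "qps")


@dataclass(frozen=True)
class InspectionConfig:
    metric: str
    project: str
    env: str
    version: str
    services: tuple[str, ...] = ()                    # empty tuple means "all services"
    data_range: timedelta = timedelta(hours=24)       # passed as a parameter
    interval: timedelta = timedelta(hours=1)          # passed as a parameter
    point_period: timedelta = timedelta(minutes=1)    # pre-aggregation period, fixed

    def __post_init__(self):
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")

    def expected_points(self) -> int:
        """Number of data points one inspection run reads per series."""
        return self.data_range // self.point_period
```

With the defaults, each run reads 24 × 60 = 1440 one-minute points per series.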

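    Data retrieval

    DQL

```bash
Request count (line) --- Sums all requests to a service within one minute; adding a time-range query gives the trend of the service's per-minute request count.
T::re(`.*`):(sum(r_request_count)) {source = 'service_list_1m'} [::1m] by r_service

Error count (line) --- Sums all failed requests to a service within one minute; adding a time-range query gives the trend of the service's per-minute error count.
T::re(`.*`):(sum(r_error_count)) {source = 'service_list_1m'} [::1m] by r_service

Error rate (line) --- Computes the request error rate of a service within one minute; adding a time-range query gives the trend of the service's per-minute error rate.
eval(A/B*100, A="T::re(`.*`):(sum(r_error_count)) {source = 'service_list_1m'} [::1m] by r_service", B="T::re(`.*`):(sum(r_request_count)) {source = 'service_list_1m'} [::1m] by r_service")

Average response time (line) --- Computes the average response time of a service within one minute; adding a time-range query gives the trend of the service's per-minute average response time.
eval(A/B, A="T::re(`.*`):(sum(r_resp_time)) {source = 'service_list_1m'} [::1m] by r_service", B="T::re(`.*`):(sum(r_request_count)) {source = 'service_list_1m'} [::1m] by r_service")

QPS (line) --- Sums all requests to a service within one minute and divides by 60 seconds; adding a time-range query gives the trend of the service's QPS (requests per second).
eval(A/60, A="T::re(`.*`):(sum(r_request_count)) {source = 'service_list_1m'} [::1m] by r_service")
```

P90 (point): see the API documentation: https://confluence.jiagouyun.com/pages/viewpage.action?pageId=139763636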
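Because the five line queries share one template, they can be assembled from two small helpers. This sketch only builds the query strings; `sum_query` and `eval_query` are hypothetical helper names, and the error-rate expression is assumed to be `A/B*100` (a percentage).

```python
SOURCE = "service_list_1m"  # pre-aggregated one-minute source named in the queries


def sum_query(field, source=SOURCE, interval="1m", group_by="r_service"):
    """Build a per-interval sum query over all tracing sources, grouped by service."""
    return (
        f"T::re(`.*`):(sum({field})) "
        f"{{source = '{source}'}} [::{interval}] by {group_by}"
    )


def eval_query(expression, **operands):
    """Wrap named sub-queries (A, B, ...) in an eval() expression."""
    parts = ", ".join(f'{name}="{query}"' for name, query in operands.items())
    return f"eval({expression}, {parts})"


QUERIES = {
    "request_count": sum_query("r_request_count"),
    "error_count": sum_query("r_error_count"),
    "error_rate": eval_query(
        "A/B*100",
        A=sum_query("r_error_count"),
        B=sum_query("r_request_count"),
    ),
    "avg_resp_time": eval_query(
        "A/B",
        A=sum_query("r_resp_time"),
        B=sum_query("r_request_count"),
    ),
    "qps": eval_query("A/60", A=sum_query("r_request_count")),
}

if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"{name}: {query}")
```

Generating the strings from one template keeps the source name, interval, and grouping consistent across all metrics.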

Data format

Request count:
  5. (200, {'tags': {'r_service': 'ruoyi-08-gateway'}, 'columns': ['time', 'sum(r_request_count)'], 'values': [[1657255260000, 0], [1657255320000, 0], [1657255380000, 0], [1657255440000, 0], [1657255500000, 0], [1657255560000, 0], [1657255620000, 0], [1657255680000, 0], [1657255740000, 0], [1657255800000, 0], [1657255860000, 0], [1657255920000, 0], [1657255980000, 0], [1657256040000, 0], [1657256100000, 0], [1657256160000, 0], [1657256220000, 0], [1657256280000, 0], [1657256340000, 0], [1657256400000, 0], [1657256460000, 0], [1657256520000, 0], [1657256580000, 0], [1657256640000, 0], [1657256700000, 0], [1657256760000, 0], [1657256820000, 0], [1657256880000, 0], [1657256940000, 0], [1657257000000, 0], [1657257060000, 0], [1657257120000, 12], [1657257180000, 0], [1657257240000, 12], [1657257300000, 0], [1657257360000, 0], [1657257420000, 0], [1657257480000, 0], [1657257540000, 0], [1657257600000, 0], [1657257660000, 0], [1657257720000, 0], [1657257780000, 0], [1657257840000, 0], [1657257900000, 0], [1657257960000, 0], [1657258020000, 0], [1657258080000, 0], [1657258140000, 0], [1657258200000, 0], [1657258260000, 0], [1657258320000, 0], [1657258380000, 0], [1657258440000, 0], [1657258500000, 0], [1657258560000, 0], [1657258620000, 0], [1657258680000, 0], [1657258740000, 0], [1657258800000, 0], [1657258860000, 0], [1657258920000, 0], [1657258980000, 0], [1657259040000, 0], [1657259100000, 0], [1657259160000, 0], [1657259220000, 0], [1657259280000, 0], [1657259340000, 0], [1657259400000, 0], [1657259460000, 0], [1657259520000, 0], [1657259580000, 0], [1657259640000, 0], [1657259700000, 0], [1657259760000, 0], [1657259820000, 0], [1657259880000, 0], [1657259940000, 0], [1657260000000, 0], [1657260060000, 0], [1657260120000, 0], [1657260180000, 0], [1657260240000, 0], [1657260300000, 0], [1657260360000, 0], [1657260420000, 0], [1657260480000, 0], [1657260540000, 0], [1657260600000, 0], [1657260660000, 0], [1657260720000, 0], [1657260780000, 0], [1657260840000, 0], 
[1657260900000, 0], [1657260960000, 0], [1657261020000, 0], [1657261080000, 0], [1657261140000, 0], [1657261200000, 0], [1657261260000, 0], [1657261320000, 0], [1657261380000, 0], [1657261440000, 0], [1657261500000, 0], [1657261560000, 0], [1657261620000, 0], [1657261680000, 0], [1657261740000, 0], [1657261800000, 0], [1657261860000, 0], [1657261920000, 0], [1657261980000, 0], [1657262040000, 0], [1657262100000, 0], [1657262160000, 0], [1657262220000, 0], [1657262280000, 0], [1657262340000, 0], [1657262400000, 0], [1657262460000, 0], [1657262520000, 0], [1657262580000, 0], [1657262640000, 0], [1657262700000, 0], [1657262760000, 0], [1657262820000, 0], [1657262880000, 0], [1657262940000, 0], [1657263000000, 0], [1657263060000, 0], [1657263120000, 0], [1657263180000, 0], [1657263240000, 0], [1657263300000, 0], [1657263360000, 0], [1657263420000, 0], [1657263480000, 0], [1657263540000, 0], [1657263600000, 0], [1657263660000, 0], [1657263720000, 0], [1657263780000, 0], [1657263840000, 0], [1657263900000, 0], [1657263960000, 0], [1657264020000, 0], [1657264080000, 0], [1657264140000, 0], [1657264200000, 0], [1657264260000, 0], [1657264320000, 0], [1657264380000, 0], [1657264440000, 0], [1657264500000, 0], [1657264560000, 0], [1657264620000, 0], [1657264680000, 0], [1657264740000, 0], [1657264800000, 0], [1657264860000, 0], [1657264920000, 0], [1657264980000, 0], [1657265040000, 0], [1657265100000, 0], [1657265160000, 0], [1657265220000, 0], [1657265280000, 0], [1657265340000, 0], [1657265400000, 0], [1657265460000, 0], [1657265520000, 0], [1657265580000, 0], [1657265640000, 0], [1657265700000, 0], [1657265760000, 0], [1657265820000, 0], [1657265880000, 0], [1657265940000, 0], [1657266000000, 0], [1657266060000, 0], [1657266120000, 0], [1657266180000, 0], [1657266240000, 0], [1657266300000, 0], [1657266360000, 0], [1657266420000, 0], [1657266480000, 0], [1657266540000, 0], [1657266600000, 0], [1657266660000, 0], [1657266720000, 0], [1657266780000, 0], [1657266840000, 0], 
[1657266900000, 0], [1657266960000, 0], [1657267020000, 0], [1657267080000, 0], [1657267140000, 0], [1657267200000, 0], [1657267260000, 0], [1657267320000, 0], [1657267380000, 0], [1657267440000, 0], [1657267500000, 0], [1657267560000, 0], [1657267620000, 0], [1657267680000, 0], [1657267740000, 0], [1657267800000, 0], [1657267860000, 0], [1657267920000, 0], [1657267980000, 0], [1657268040000, 0], [1657268100000, 0], [1657268160000, 0], [1657268220000, 0], [1657268280000, 0], [1657268340000, 0], [1657268400000, 0], [1657268460000, 0], [1657268520000, 0], [1657268580000, 0], [1657268640000, 0], [1657268700000, 0], [1657268760000, 0], [1657268820000, 0], [1657268880000, 0], [1657268940000, 0], [1657269000000, 0], [1657269060000, 0], [1657269120000, 0], [1657269180000, 0], [1657269240000, 0], [1657269300000, 0], [1657269360000, 0], [1657269420000, 0], [1657269480000, 0], [1657269540000, 0], [1657269600000, 0], [1657269660000, 0], [1657269720000, 0], [1657269780000, 0], [1657269840000, 0], [1657269900000, 0], [1657269960000, 0], [1657270020000, 0], [1657270080000, 0], [1657270140000, 0], [1657270200000, 0], [1657270260000, 0], [1657270320000, 0], [1657270380000, 0], [1657270440000, 0], [1657270500000, 0], [1657270560000, 0], [1657270620000, 0], [1657270680000, 0], [1657270740000, 0], [1657270800000, 0], [1657270860000, 0], [1657270920000, 0], [1657270980000, 0], [1657271040000, 0], [1657271100000, 0], [1657271160000, 0], [1657271220000, 0], [1657271280000, 0], [1657271340000, 0], [1657271400000, 0], [1657271460000, 0], [1657271520000, 0], [1657271580000, 0], [1657271640000, 0], [1657271700000, 0], [1657271760000, 0], [1657271820000, 0], [1657271880000, 0], [1657271940000, 0], [1657272000000, 0], [1657272060000, 0], [1657272120000, 0], [1657272180000, 0], [1657272240000, 0], [1657272300000, 0], [1657272360000, 0], [1657272420000, 0], [1657272480000, 0], [1657272540000, 0], [1657272600000, 0], [1657272660000, 0], [1657272720000, 0], [1657272780000, 0], [1657272840000, 0], 
[1657272900000, 0], [1657272960000, 0], [1657273020000, 0], [1657273080000, 0], [1657273140000, 0], [1657273200000, 0], [1657273260000, 0], [1657273320000, 0], [1657273380000, 0], [1657273440000, 0], [1657273500000, 0], [1657273560000, 0], [1657273620000, 0], [1657273680000, 0], [1657273740000, 0], [1657273800000, 0], [1657273860000, 0], [1657273920000, 0], [1657273980000, 0], [1657274040000, 0], [1657274100000, 0], [1657274160000, 0], [1657274220000, 0], [1657274280000, 0], [1657274340000, 0], [1657274400000, 0], [1657274460000, 0], [1657274520000, 0], [1657274580000, 0], [1657274640000, 0], [1657274700000, 0], [1657274760000, 0], [1657274820000, 0], [1657274880000, 0], [1657274940000, 0], [1657275000000, 0], [1657275060000, 0], [1657275120000, 0], [1657275180000, 0], [1657275240000, 0], [1657275300000, 0], [1657275360000, 0], [1657275420000, 0], [1657275480000, 0], [1657275540000, 0], [1657275600000, 0], [1657275660000, 0], [1657275720000, 0], [1657275780000, 0], [1657275840000, 0], [1657275900000, 0], [1657275960000, 0], [1657276020000, 0], [1657276080000, 0], [1657276140000, 0], [1657276200000, 0], [1657276260000, 0], [1657276320000, 0], [1657276380000, 0], [1657276440000, 0], [1657276500000, 0], [1657276560000, 0], [1657276620000, 0], [1657276680000, 0], [1657276740000, 0], [1657276800000, 0]]}]})

Error count:

(200, {'tags': {'r_service': 'ruoyi-08-gateway'}, 'columns': ['time', 'sum(r_request_count)'], 'values': [[1657255260000, 0], [1657255320000, 0], [1657255380000, 0], [1657255440000, 0], [1657255500000, 0], [1657255560000, 0], [1657255620000, 0], [1657255680000, 0], [1657255740000, 0], [1657255800000, 0], [1657255860000, 0], [1657255920000, 0], [1657255980000, 0], [1657256040000, 0], [1657256100000, 0], [1657256160000, 0], [1657256220000, 0], [1657256280000, 0], [1657256340000, 0], [1657256400000, 0], [1657256460000, 0], [1657256520000, 0], [1657256580000, 0], [1657256640000, 0], [1657256700000, 0], [1657256760000, 0], [1657256820000, 0], [1657256880000, 0], [1657256940000, 0], [1657257000000, 0], [1657257060000, 0], [1657257120000, 12], [1657257180000, 0], [1657257240000, 12], [1657257300000, 0], [1657257360000, 0], [1657257420000, 0], [1657257480000, 0], [1657257540000, 0], [1657257600000, 0], [1657257660000, 0], [1657257720000, 0], [1657257780000, 0], [1657257840000, 0], [1657257900000, 0], [1657257960000, 0], [1657258020000, 0], [1657258080000, 0], [1657258140000, 0], [1657258200000, 0], [1657258260000, 0], [1657258320000, 0], [1657258380000, 0], [1657258440000, 0], [1657258500000, 0], [1657258560000, 0], [1657258620000, 0], [1657258680000, 0], [1657258740000, 0], [1657258800000, 0], [1657258860000, 0], [1657258920000, 0], [1657258980000, 0], [1657259040000, 0], [1657259100000, 0], [1657259160000, 0], [1657259220000, 0], [1657259280000, 0], [1657259340000, 0], [1657259400000, 0], [1657259460000, 0], [1657259520000, 0], [1657259580000, 0], [1657259640000, 0], [1657259700000, 0], [1657259760000, 0], [1657259820000, 0], [1657259880000, 0], [1657259940000, 0], [1657260000000, 0], [1657260060000, 0], [1657260120000, 0], [1657260180000, 0], [1657260240000, 0], [1657260300000, 0], [1657260360000, 0], [1657260420000, 0], [1657260480000, 0], [1657260540000, 0], [1657260600000, 0], [1657260660000, 0], [1657260720000, 0], [1657260780000, 0], [1657260840000, 0], 
[1657260900000, 0], [1657260960000, 0], [1657261020000, 0], [1657261080000, 0], [1657261140000, 0], [1657261200000, 0], [1657261260000, 0], [1657261320000, 0], [1657261380000, 0], [1657261440000, 0], [1657261500000, 0], [1657261560000, 0], [1657261620000, 0], [1657261680000, 0], [1657261740000, 0], [1657261800000, 0], [1657261860000, 0], [1657261920000, 0], [1657261980000, 0], [1657262040000, 0], [1657262100000, 0], [1657262160000, 0], [1657262220000, 0], [1657262280000, 0], [1657262340000, 0], [1657262400000, 0], [1657262460000, 0], [1657262520000, 0], [1657262580000, 0], [1657262640000, 0], [1657262700000, 0], [1657262760000, 0], [1657262820000, 0], [1657262880000, 0], [1657262940000, 0], [1657263000000, 0], [1657263060000, 0], [1657263120000, 0], [1657263180000, 0], [1657263240000, 0], [1657263300000, 0], [1657263360000, 0], [1657263420000, 0], [1657263480000, 0], [1657263540000, 0], [1657263600000, 0], [1657263660000, 0], [1657263720000, 0], [1657263780000, 0], [1657263840000, 0], [1657263900000, 0], [1657263960000, 0], [1657264020000, 0], [1657264080000, 0], [1657264140000, 0], [1657264200000, 0], [1657264260000, 0], [1657264320000, 0], [1657264380000, 0], [1657264440000, 0], [1657264500000, 0], [1657264560000, 0], [1657264620000, 0], [1657264680000, 0], [1657264740000, 0], [1657264800000, 0], [1657264860000, 0], [1657264920000, 0], [1657264980000, 0], [1657265040000, 0], [1657265100000, 0], [1657265160000, 0], [1657265220000, 0], [1657265280000, 0], [1657265340000, 0], [1657265400000, 0], [1657265460000, 0], [1657265520000, 0], [1657265580000, 0], [1657265640000, 0], [1657265700000, 0], [1657265760000, 0], [1657265820000, 0], [1657265880000, 0], [1657265940000, 0], [1657266000000, 0], [1657266060000, 0], [1657266120000, 0], [1657266180000, 0], [1657266240000, 0], [1657266300000, 0], [1657266360000, 0], [1657266420000, 0], [1657266480000, 0], [1657266540000, 0], [1657266600000, 0], [1657266660000, 0], [1657266720000, 0], [1657266780000, 0], [1657266840000, 0], 
[1657266900000, 0], [1657266960000, 0], [1657267020000, 0], [1657267080000, 0], [1657267140000, 0], [1657267200000, 0], [1657267260000, 0], [1657267320000, 0], [1657267380000, 0], [1657267440000, 0], [1657267500000, 0], [1657267560000, 0], [1657267620000, 0], [1657267680000, 0], [1657267740000, 0], [1657267800000, 0], [1657267860000, 0], [1657267920000, 0], [1657267980000, 0], [1657268040000, 0], [1657268100000, 0], [1657268160000, 0], [1657268220000, 0], [1657268280000, 0], [1657268340000, 0], [1657268400000, 0], [1657268460000, 0], [1657268520000, 0], [1657268580000, 0], [1657268640000, 0], [1657268700000, 0], [1657268760000, 0], [1657268820000, 0], [1657268880000, 0], [1657268940000, 0], [1657269000000, 0], [1657269060000, 0], [1657269120000, 0], [1657269180000, 0], [1657269240000, 0], [1657269300000, 0], [1657269360000, 0], [1657269420000, 0], [1657269480000, 0], [1657269540000, 0], [1657269600000, 0], [1657269660000, 0], [1657269720000, 0], [1657269780000, 0], [1657269840000, 0], [1657269900000, 0], [1657269960000, 0], [1657270020000, 0], [1657270080000, 0], [1657270140000, 0], [1657270200000, 0], [1657270260000, 0], [1657270320000, 0], [1657270380000, 0], [1657270440000, 0], [1657270500000, 0], [1657270560000, 0], [1657270620000, 0], [1657270680000, 0], [1657270740000, 0], [1657270800000, 0], [1657270860000, 0], [1657270920000, 0], [1657270980000, 0], [1657271040000, 0], [1657271100000, 0], [1657271160000, 0], [1657271220000, 0], [1657271280000, 0], [1657271340000, 0], [1657271400000, 0], [1657271460000, 0], [1657271520000, 0], [1657271580000, 0], [1657271640000, 0], [1657271700000, 0], [1657271760000, 0], [1657271820000, 0], [1657271880000, 0], [1657271940000, 0], [1657272000000, 0], [1657272060000, 0], [1657272120000, 0], [1657272180000, 0], [1657272240000, 0], [1657272300000, 0], [1657272360000, 0], [1657272420000, 0], [1657272480000, 0], [1657272540000, 0], [1657272600000, 0], [1657272660000, 0], [1657272720000, 0], [1657272780000, 0], [1657272840000, 0], 
[1657272900000, 0], [1657272960000, 0], [1657273020000, 0], [1657273080000, 0], [1657273140000, 0], [1657273200000, 0], [1657273260000, 0], [1657273320000, 0], [1657273380000, 0], [1657273440000, 0], [1657273500000, 0], [1657273560000, 0], [1657273620000, 0], [1657273680000, 0], [1657273740000, 0], [1657273800000, 0], [1657273860000, 0], [1657273920000, 0], [1657273980000, 0], [1657274040000, 0], [1657274100000, 0], [1657274160000, 0], [1657274220000, 0], [1657274280000, 0], [1657274340000, 0], [1657274400000, 0], [1657274460000, 0], [1657274520000, 0], [1657274580000, 0], [1657274640000, 0], [1657274700000, 0], [1657274760000, 0], [1657274820000, 0], [1657274880000, 0], [1657274940000, 0], [1657275000000, 0], [1657275060000, 0], [1657275120000, 0], [1657275180000, 0], [1657275240000, 0], [1657275300000, 0], [1657275360000, 0], [1657275420000, 0], [1657275480000, 0], [1657275540000, 0], [1657275600000, 0], [1657275660000, 0], [1657275720000, 0], [1657275780000, 0], [1657275840000, 0], [1657275900000, 0], [1657275960000, 0], [1657276020000, 0], [1657276080000, 0], [1657276140000, 0], [1657276200000, 0], [1657276260000, 0], [1657276320000, 0], [1657276380000, 0], [1657276440000, 0], [1657276500000, 0], [1657276560000, 0], [1657276620000, 0], [1657276680000, 0], [1657276740000, 0], [1657276800000, 0]]}]})

Error rate:

(200, {'tags': {'r_service': 'ruoyi-08-gateway'}, 'columns': ['time', 'sum(r_request_count)'], 'values': [[1657255260000, 0], [1657255320000, 0], [1657255380000, 0], [1657255440000, 0], [1657255500000, 0], [1657255560000, 0], [1657255620000, 0], [1657255680000, 0], [1657255740000, 0], [1657255800000, 0], [1657255860000, 0], [1657255920000, 0], [1657255980000, 0], [1657256040000, 0], [1657256100000, 0], [1657256160000, 0], [1657256220000, 0], [1657256280000, 0], [1657256340000, 0], [1657256400000, 0], [1657256460000, 0], [1657256520000, 0], [1657256580000, 0], [1657256640000, 0], [1657256700000, 0], [1657256760000, 0], [1657256820000, 0], [1657256880000, 0], [1657256940000, 0], [1657257000000, 0], [1657257060000, 0], [1657257120000, 12], [1657257180000, 0], [1657257240000, 12], [1657257300000, 0], [1657257360000, 0], [1657257420000, 0], [1657257480000, 0], [1657257540000, 0], [1657257600000, 0], [1657257660000, 0], [1657257720000, 0], [1657257780000, 0], [1657257840000, 0], [1657257900000, 0], [1657257960000, 0], [1657258020000, 0], [1657258080000, 0], [1657258140000, 0], [1657258200000, 0], [1657258260000, 0], [1657258320000, 0], [1657258380000, 0], [1657258440000, 0], [1657258500000, 0], [1657258560000, 0], [1657258620000, 0], [1657258680000, 0], [1657258740000, 0], [1657258800000, 0], [1657258860000, 0], [1657258920000, 0], [1657258980000, 0], [1657259040000, 0], [1657259100000, 0], [1657259160000, 0], [1657259220000, 0], [1657259280000, 0], [1657259340000, 0], [1657259400000, 0], [1657259460000, 0], [1657259520000, 0], [1657259580000, 0], [1657259640000, 0], [1657259700000, 0], [1657259760000, 0], [1657259820000, 0], [1657259880000, 0], [1657259940000, 0], [1657260000000, 0], [1657260060000, 0], [1657260120000, 0], [1657260180000, 0], [1657260240000, 0], [1657260300000, 0], [1657260360000, 0], [1657260420000, 0], [1657260480000, 0], [1657260540000, 0], [1657260600000, 0], [1657260660000, 0], [1657260720000, 0], [1657260780000, 0], [1657260840000, 0], 
[1657260900000, 0], [1657260960000, 0], [1657261020000, 0], [1657261080000, 0], [1657261140000, 0], [1657261200000, 0], [1657261260000, 0], [1657261320000, 0], [1657261380000, 0], [1657261440000, 0], [1657261500000, 0], [1657261560000, 0], [1657261620000, 0], [1657261680000, 0], [1657261740000, 0], [1657261800000, 0], [1657261860000, 0], [1657261920000, 0], [1657261980000, 0], [1657262040000, 0], [1657262100000, 0], [1657262160000, 0], [1657262220000, 0], [1657262280000, 0], [1657262340000, 0], [1657262400000, 0], [1657262460000, 0], [1657262520000, 0], [1657262580000, 0], [1657262640000, 0], [1657262700000, 0], [1657262760000, 0], [1657262820000, 0], [1657262880000, 0], [1657262940000, 0], [1657263000000, 0], [1657263060000, 0], [1657263120000, 0], [1657263180000, 0], [1657263240000, 0], [1657263300000, 0], [1657263360000, 0], [1657263420000, 0], [1657263480000, 0], [1657263540000, 0], [1657263600000, 0], [1657263660000, 0], [1657263720000, 0], [1657263780000, 0], [1657263840000, 0], [1657263900000, 0], [1657263960000, 0], [1657264020000, 0], [1657264080000, 0], [1657264140000, 0], [1657264200000, 0], [1657264260000, 0], [1657264320000, 0], [1657264380000, 0], [1657264440000, 0], [1657264500000, 0], [1657264560000, 0], [1657264620000, 0], [1657264680000, 0], [1657264740000, 0], [1657264800000, 0], [1657264860000, 0], [1657264920000, 0], [1657264980000, 0], [1657265040000, 0], [1657265100000, 0], [1657265160000, 0], [1657265220000, 0], [1657265280000, 0], [1657265340000, 0], [1657265400000, 0], [1657265460000, 0], [1657265520000, 0], [1657265580000, 0], [1657265640000, 0], [1657265700000, 0], [1657265760000, 0], [1657265820000, 0], [1657265880000, 0], [1657265940000, 0], [1657266000000, 0], [1657266060000, 0], [1657266120000, 0], [1657266180000, 0], [1657266240000, 0], [1657266300000, 0], [1657266360000, 0], [1657266420000, 0], [1657266480000, 0], [1657266540000, 0], [1657266600000, 0], [1657266660000, 0], [1657266720000, 0], [1657266780000, 0], [1657266840000, 0], 
[1657266900000, 0], [1657266960000, 0], [1657267020000, 0], [1657267080000, 0], [1657267140000, 0], [1657267200000, 0], [1657267260000, 0], [1657267320000, 0], [1657267380000, 0], [1657267440000, 0], [1657267500000, 0], [1657267560000, 0], [1657267620000, 0], [1657267680000, 0], [1657267740000, 0], [1657267800000, 0], [1657267860000, 0], [1657267920000, 0], [1657267980000, 0], [1657268040000, 0], [1657268100000, 0], [1657268160000, 0], [1657268220000, 0], [1657268280000, 0], [1657268340000, 0], [1657268400000, 0], [1657268460000, 0], [1657268520000, 0], [1657268580000, 0], [1657268640000, 0], [1657268700000, 0], [1657268760000, 0], [1657268820000, 0], [1657268880000, 0], [1657268940000, 0], [1657269000000, 0], [1657269060000, 0], [1657269120000, 0], [1657269180000, 0], [1657269240000, 0], [1657269300000, 0], [1657269360000, 0], [1657269420000, 0], [1657269480000, 0], [1657269540000, 0], [1657269600000, 0], [1657269660000, 0], [1657269720000, 0], [1657269780000, 0], [1657269840000, 0], [1657269900000, 0], [1657269960000, 0], [1657270020000, 0], [1657270080000, 0], [1657270140000, 0], [1657270200000, 0], [1657270260000, 0], [1657270320000, 0], [1657270380000, 0], [1657270440000, 0], [1657270500000, 0], [1657270560000, 0], [1657270620000, 0], [1657270680000, 0], [1657270740000, 0], [1657270800000, 0], [1657270860000, 0], [1657270920000, 0], [1657270980000, 0], [1657271040000, 0], [1657271100000, 0], [1657271160000, 0], [1657271220000, 0], [1657271280000, 0], [1657271340000, 0], [1657271400000, 0], [1657271460000, 0], [1657271520000, 0], [1657271580000, 0], [1657271640000, 0], [1657271700000, 0], [1657271760000, 0], [1657271820000, 0], [1657271880000, 0], [1657271940000, 0], [1657272000000, 0], [1657272060000, 0], [1657272120000, 0], [1657272180000, 0], [1657272240000, 0], [1657272300000, 0], [1657272360000, 0], [1657272420000, 0], [1657272480000, 0], [1657272540000, 0], [1657272600000, 0], [1657272660000, 0], [1657272720000, 0], [1657272780000, 0], [1657272840000, 0], 
[1657272900000, 0], [1657272960000, 0], [1657273020000, 0], [1657273080000, 0], [1657273140000, 0], [1657273200000, 0], [1657273260000, 0], [1657273320000, 0], [1657273380000, 0], [1657273440000, 0], [1657273500000, 0], [1657273560000, 0], [1657273620000, 0], [1657273680000, 0], [1657273740000, 0], [1657273800000, 0], [1657273860000, 0], [1657273920000, 0], [1657273980000, 0], [1657274040000, 0], [1657274100000, 0], [1657274160000, 0], [1657274220000, 0], [1657274280000, 0], [1657274340000, 0], [1657274400000, 0], [1657274460000, 0], [1657274520000, 0], [1657274580000, 0], [1657274640000, 0], [1657274700000, 0], [1657274760000, 0], [1657274820000, 0], [1657274880000, 0], [1657274940000, 0], [1657275000000, 0], [1657275060000, 0], [1657275120000, 0], [1657275180000, 0], [1657275240000, 0], [1657275300000, 0], [1657275360000, 0], [1657275420000, 0], [1657275480000, 0], [1657275540000, 0], [1657275600000, 0], [1657275660000, 0], [1657275720000, 0], [1657275780000, 0], [1657275840000, 0], [1657275900000, 0], [1657275960000, 0], [1657276020000, 0], [1657276080000, 0], [1657276140000, 0], [1657276200000, 0], [1657276260000, 0], [1657276320000, 0], [1657276380000, 0], [1657276440000, 0], [1657276500000, 0], [1657276560000, 0], [1657276620000, 0], [1657276680000, 0], [1657276740000, 0], [1657276800000, 0]]}]})

QPS:

(200, {'tags': {'r_service': 'ruoyi-08-gateway'}, 'columns': ['time', 'sum(r_request_count)'], 'values': [[1657255260000, 0], [1657255320000, 0], [1657255380000, 0], [1657255440000, 0], [1657255500000, 0], [1657255560000, 0], [1657255620000, 0], [1657255680000, 0], [1657255740000, 0], [1657255800000, 0], [1657255860000, 0], [1657255920000, 0], [1657255980000, 0], [1657256040000, 0], [1657256100000, 0], [1657256160000, 0], [1657256220000, 0], [1657256280000, 0], [1657256340000, 0], [1657256400000, 0], [1657256460000, 0], [1657256520000, 0], [1657256580000, 0], [1657256640000, 0], [1657256700000, 0], [1657256760000, 0], [1657256820000, 0], [1657256880000, 0], [1657256940000, 0], [1657257000000, 0], [1657257060000, 0], [1657257120000, 12], [1657257180000, 0], [1657257240000, 12], [1657257300000, 0], [1657257360000, 0], [1657257420000, 0], [1657257480000, 0], [1657257540000, 0], [1657257600000, 0], [1657257660000, 0], [1657257720000, 0], [1657257780000, 0], [1657257840000, 0], [1657257900000, 0], [1657257960000, 0], [1657258020000, 0], [1657258080000, 0], [1657258140000, 0], [1657258200000, 0], [1657258260000, 0], [1657258320000, 0], [1657258380000, 0], [1657258440000, 0], [1657258500000, 0], [1657258560000, 0], [1657258620000, 0], [1657258680000, 0], [1657258740000, 0], [1657258800000, 0], [1657258860000, 0], [1657258920000, 0], [1657258980000, 0], [1657259040000, 0], [1657259100000, 0], [1657259160000, 0], [1657259220000, 0], [1657259280000, 0], [1657259340000, 0], [1657259400000, 0], [1657259460000, 0], [1657259520000, 0], [1657259580000, 0], [1657259640000, 0], [1657259700000, 0], [1657259760000, 0], [1657259820000, 0], [1657259880000, 0], [1657259940000, 0], [1657260000000, 0], [1657260060000, 0], [1657260120000, 0], [1657260180000, 0], [1657260240000, 0], [1657260300000, 0], [1657260360000, 0], [1657260420000, 0], [1657260480000, 0], [1657260540000, 0], [1657260600000, 0], [1657260660000, 0], [1657260720000, 0], [1657260780000, 0], [1657260840000, 0], 
[1657260900000, 0], [1657260960000, 0], [1657261020000, 0], [1657261080000, 0], [1657261140000, 0], [1657261200000, 0], [1657261260000, 0], [1657261320000, 0], [1657261380000, 0], [1657261440000, 0], [1657261500000, 0], [1657261560000, 0], [1657261620000, 0], [1657261680000, 0], [1657261740000, 0], [1657261800000, 0], [1657261860000, 0], [1657261920000, 0], [1657261980000, 0], [1657262040000, 0], [1657262100000, 0], [1657262160000, 0], [1657262220000, 0], [1657262280000, 0], [1657262340000, 0], [1657262400000, 0], [1657262460000, 0], [1657262520000, 0], [1657262580000, 0], [1657262640000, 0], [1657262700000, 0], [1657262760000, 0], [1657262820000, 0], [1657262880000, 0], [1657262940000, 0], [1657263000000, 0], [1657263060000, 0], [1657263120000, 0], [1657263180000, 0], [1657263240000, 0], [1657263300000, 0], [1657263360000, 0], [1657263420000, 0], [1657263480000, 0], [1657263540000, 0], [1657263600000, 0], [1657263660000, 0], [1657263720000, 0], [1657263780000, 0], [1657263840000, 0], [1657263900000, 0], [1657263960000, 0], [1657264020000, 0], [1657264080000, 0], [1657264140000, 0], [1657264200000, 0], [1657264260000, 0], [1657264320000, 0], [1657264380000, 0], [1657264440000, 0], [1657264500000, 0], [1657264560000, 0], [1657264620000, 0], [1657264680000, 0], [1657264740000, 0], [1657264800000, 0], [1657264860000, 0], [1657264920000, 0], [1657264980000, 0], [1657265040000, 0], [1657265100000, 0], [1657265160000, 0], [1657265220000, 0], [1657265280000, 0], [1657265340000, 0], [1657265400000, 0], [1657265460000, 0], [1657265520000, 0], [1657265580000, 0], [1657265640000, 0], [1657265700000, 0], [1657265760000, 0], [1657265820000, 0], [1657265880000, 0], [1657265940000, 0], [1657266000000, 0], [1657266060000, 0], [1657266120000, 0], [1657266180000, 0], [1657266240000, 0], [1657266300000, 0], [1657266360000, 0], [1657266420000, 0], [1657266480000, 0], [1657266540000, 0], [1657266600000, 0], [1657266660000, 0], [1657266720000, 0], [1657266780000, 0], [1657266840000, 0], 
[1657266900000, 0], [1657266960000, 0], [1657267020000, 0], [1657267080000, 0], [1657267140000, 0], [1657267200000, 0], [1657267260000, 0], [1657267320000, 0], [1657267380000, 0], [1657267440000, 0], [1657267500000, 0], [1657267560000, 0], [1657267620000, 0], [1657267680000, 0], [1657267740000, 0], [1657267800000, 0], [1657267860000, 0], [1657267920000, 0], [1657267980000, 0], [1657268040000, 0], [1657268100000, 0], [1657268160000, 0], [1657268220000, 0], [1657268280000, 0], [1657268340000, 0], [1657268400000, 0], [1657268460000, 0], [1657268520000, 0], [1657268580000, 0], [1657268640000, 0], [1657268700000, 0], [1657268760000, 0], [1657268820000, 0], [1657268880000, 0], [1657268940000, 0], [1657269000000, 0], [1657269060000, 0], [1657269120000, 0], [1657269180000, 0], [1657269240000, 0], [1657269300000, 0], [1657269360000, 0], [1657269420000, 0], [1657269480000, 0], [1657269540000, 0], [1657269600000, 0], [1657269660000, 0], [1657269720000, 0], [1657269780000, 0], [1657269840000, 0], [1657269900000, 0], [1657269960000, 0], [1657270020000, 0], [1657270080000, 0], [1657270140000, 0], [1657270200000, 0], [1657270260000, 0], [1657270320000, 0], [1657270380000, 0], [1657270440000, 0], [1657270500000, 0], [1657270560000, 0], [1657270620000, 0], [1657270680000, 0], [1657270740000, 0], [1657270800000, 0], [1657270860000, 0], [1657270920000, 0], [1657270980000, 0], [1657271040000, 0], [1657271100000, 0], [1657271160000, 0], [1657271220000, 0], [1657271280000, 0], [1657271340000, 0], [1657271400000, 0], [1657271460000, 0], [1657271520000, 0], [1657271580000, 0], [1657271640000, 0], [1657271700000, 0], [1657271760000, 0], [1657271820000, 0], [1657271880000, 0], [1657271940000, 0], [1657272000000, 0], [1657272060000, 0], [1657272120000, 0], [1657272180000, 0], [1657272240000, 0], [1657272300000, 0], [1657272360000, 0], [1657272420000, 0], [1657272480000, 0], [1657272540000, 0], [1657272600000, 0], [1657272660000, 0], [1657272720000, 0], [1657272780000, 0], [1657272840000, 0], 
[1657272900000, 0], [1657272960000, 0], [1657273020000, 0], [1657273080000, 0], [1657273140000, 0], [1657273200000, 0], [1657273260000, 0], [1657273320000, 0], [1657273380000, 0], [1657273440000, 0], [1657273500000, 0], [1657273560000, 0], [1657273620000, 0], [1657273680000, 0], [1657273740000, 0], [1657273800000, 0], [1657273860000, 0], [1657273920000, 0], [1657273980000, 0], [1657274040000, 0], [1657274100000, 0], [1657274160000, 0], [1657274220000, 0], [1657274280000, 0], [1657274340000, 0], [1657274400000, 0], [1657274460000, 0], [1657274520000, 0], [1657274580000, 0], [1657274640000, 0], [1657274700000, 0], [1657274760000, 0], [1657274820000, 0], [1657274880000, 0], [1657274940000, 0], [1657275000000, 0], [1657275060000, 0], [1657275120000, 0], [1657275180000, 0], [1657275240000, 0], [1657275300000, 0], [1657275360000, 0], [1657275420000, 0], [1657275480000, 0], [1657275540000, 0], [1657275600000, 0], [1657275660000, 0], [1657275720000, 0], [1657275780000, 0], [1657275840000, 0], [1657275900000, 0], [1657275960000, 0], [1657276020000, 0], [1657276080000, 0], [1657276140000, 0], [1657276200000, 0], [1657276260000, 0], [1657276320000, 0], [1657276380000, 0], [1657276440000, 0], [1657276500000, 0], [1657276560000, 0], [1657276620000, 0], [1657276680000, 0], [1657276740000, 0], [1657276800000, 0]]}]})
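Each sample above is a series with `tags`, `columns`, and `values`, where every row pairs a millisecond timestamp with a number. A minimal sketch of turning one such series into timestamped points follows; the function names are hypothetical, the three sample rows are copied from the request count sample, and a two-column series is assumed.

```python
from datetime import datetime, timezone


def series_to_points(series):
    """Convert a two-column series into (UTC datetime, value) pairs."""
    time_idx = series["columns"].index("time")
    value_idx = 1 - time_idx  # assumes exactly two columns: time and one value
    return [
        (
            datetime.fromtimestamp(row[time_idx] / 1000, tz=timezone.utc),
            row[value_idx],
        )
        for row in series["values"]
    ]


def per_minute_to_qps(points):
    """Divide per-minute counts by 60 to get requests per second."""
    return [(ts, count / 60) for ts, count in points]


sample = {
    "tags": {"r_service": "ruoyi-08-gateway"},
    "columns": ["time", "sum(r_request_count)"],
    "values": [[1657257060000, 0], [1657257120000, 12], [1657257180000, 0]],
}

if __name__ == "__main__":
    for ts, qps in per_minute_to_qps(series_to_points(sample)):
        print(ts.isoformat(), qps)
```

Looking up the `time` column by name instead of by position keeps the parser working if the column order changes.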

service map:

-----------------[ r1..s1 ]-----------------
      __docid 'T_c8n9nmhaahlf101n1q1g'
  call_counts 6
  create_time 1647221722778
      date_ns 0
          env 'dev'
          host 'k8s-node1'
        source 'service_map'
source_service 'demo-k8s-system'
target_service 'demo-k8s-system'
          time 2022-03-14 09:35:00 +0800 CST
          type 'web'
---------
1 rows, 1 series, cost 72ms


{'statement_id':0,'series':[{'columns':['__docid','time','env','source_service','target_service','type','call_counts','create_time','date_ns','host','source'],'values':[['T_c8n9o9k5jjqsb5dg4pe0',1647221760000,'dev','demo-k8s-auth','demo-k8s-auth','web',5,1647221798024,0,'k8s-node1','service_map'],['T_c8n9o9k5jjqsb5dg4peg',1647221760000,'dev','demo-k8s-gateway','demo-k8s-gateway','web',7,1647221798024,0,'k8s-node1','service_map']]}]}
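A service map result like the one above can be folded into a call graph keyed by source service. The sketch below is one possible reading, not the actual implementation: `build_call_graph` is a hypothetical name, and summing `call_counts` across rows that share an edge is an assumption. The sample rows are copied from the result above.

```python
from collections import defaultdict


def build_call_graph(result):
    """Map source_service -> target_service -> total call_counts."""
    graph = defaultdict(dict)
    for series in result["series"]:
        columns = series["columns"]
        src_i = columns.index("source_service")
        dst_i = columns.index("target_service")
        cnt_i = columns.index("call_counts")
        for row in series["values"]:
            targets = graph[row[src_i]]
            # Assumption: repeated edges are merged by adding their counts.
            targets[row[dst_i]] = targets.get(row[dst_i], 0) + row[cnt_i]
    return dict(graph)


sample_result = {
    "statement_id": 0,
    "series": [
        {
            "columns": [
                "__docid", "time", "env", "source_service", "target_service",
                "type", "call_counts", "create_time", "date_ns", "host", "source",
            ],
            "values": [
                ["T_c8n9o9k5jjqsb5dg4pe0", 1647221760000, "dev", "demo-k8s-auth",
                 "demo-k8s-auth", "web", 5, 1647221798024, 0, "k8s-node1", "service_map"],
                ["T_c8n9o9k5jjqsb5dg4peg", 1647221760000, "dev", "demo-k8s-gateway",
                 "demo-k8s-gateway", "web", 7, 1647221798024, 0, "k8s-node1", "service_map"],
            ],
        }
    ],
}

if __name__ == "__main__":
    print(build_call_graph(sample_result))
```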

span:

-----------------[ r1..s1 ]-----------------
        __docid 'T_c8n9ovs5jjql3a1t6ou0'
    cluster_name 'k8s-prod'
    create_time 1647221887505
        date_ns 4459
        duration 847
            env 'dev'
            host 'k8s-node1'
        host_ip '172.16.0.230'
    http_method 'PUT'
http_status_code '200'
        message '{"service":"demo-k8s-system","name":"http.request","resource":"PUT /nacos/v1/ns/instance/beat","trace_id":7379685012312364829,"span_id":4567185458459732621,"parent_id":0,"start":1647221885724004459,"duration":847984,"error":0,"meta":{"component":"http-url-connection","env":"dev","http.method":"PUT","http.status_code":"200","http.url":"http://172.16.0.229:8848/nacos/v1/ns/instance/beat","language":"jvm","node_ip":"172.16.0.230","peer.hostname":"172.16.0.229","runtime-id":"df764944-6e79-4cee-9883-c4975a887e56","span.kind":"client","thread.name":"com.alibaba.nacos.naming.beat.sender"},"metrics":{"_dd.agent_psr":1,"_dd.top_level":1,"_sampling_priority_v1":1,"peer.port":8848,"thread.id":83},"type":"http"}'
        node_ip '172.16.0.230'
      operation 'http.request'
      parent_id '0'
        resource 'PUT /nacos/v1/ns/instance/beat'
        service 'demo-k8s-system'
          source 'ddtrace'
        span_id '4567185458459732621'
      span_type 'entry'
          start 1647221885724004
          status 'ok'
            time 2022-03-14 09:38:05 +0800 CST
        trace_id '7379685012312364829'
            type 'web'
---------
1 rows, 1 series, cost 1.301s




{'statement_id':0,'series':[{'columns':['host','http_status_code','node_ip','operation','status','__docid','cluster_name','date_ns','trace_id','message','resource','service','span_type','type','create_time','duration','source','http_method','parent_id','span_id','start','time','env','host_ip'],'values':[['k8s-node1','200','172.16.0.230','http.request','ok','T_c8n9p9s5jjqsb5dkviug','k8s-prod',4258,'7237931248846529090','{"service":"demo-k8s-system","name":"http.request","resource":"PUT /nacos/v1/ns/instance/beat","trace_id":7237931248846529090,"span_id":3553429628118875994,"parent_id":0,"start":1647221925732004258,"duration":794942,"error":0,"meta":{"component":"http-url-connection","env":"dev","http.method":"PUT","http.status_code":"200","http.url":"http://172.16.0.229:8848/nacos/v1/ns/instance/beat","language":"jvm","node_ip":"172.16.0.230","peer.hostname":"172.16.0.229","runtime-id":"df764944-6e79-4cee-9883-c4975a887e56","span.kind":"client","thread.name":"com.alibaba.nacos.naming.beat.sender"},"metrics":{"_dd.agent_psr":1,"_dd.top_level":1,"_sampling_priority_v1":1,"peer.port":8848,"thread.id":83},"type":"http"}','PUT /nacos/v1/ns/instance/beat','demo-k8s-system','entry','web',1647221927516,794,'ddtrace','PUT','0','3553429628118875994',1647221925732004,1647221925732,'dev','172.16.0.230'],['k8s-node1','200','172.16.0.230','http.request','ok','T_c8n9p9s5jjqsb5dkviu0','k8s-prod',5238,'7812574577044643721','{"service":"demo-k8s-auth","name":"http.request","resource":"PUT 
/nacos/v1/ns/instance/beat","trace_id":7812574577044643721,"span_id":822861332187929136,"parent_id":0,"start":1647221925081005238,"duration":821220,"error":0,"meta":{"component":"http-url-connection","env":"dev","http.method":"PUT","http.status_code":"200","http.url":"http://172.16.0.229:8848/nacos/v1/ns/instance/beat","language":"jvm","node_ip":"172.16.0.230","peer.hostname":"172.16.0.229","runtime-id":"c6244f19-6ddf-4ff3-afbd-a49d7751816b","span.kind":"client","thread.name":"com.alibaba.nacos.naming.beat.sender"},"metrics":{"_dd.agent_psr":1,"_dd.top_level":1,"_sampling_priority_v1":1,"peer.port":8848,"thread.id":56},"type":"http"}','PUT /nacos/v1/ns/instance/beat','demo-k8s-auth','entry','web',1647221927516,821,'ddtrace','PUT','0','822861332187929136',1647221925081005,1647221925081,'dev','172.16.0.230']]}]}

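Field definitions

| Field | Type | Description |
| --- | --- | --- |
| date | integer | Time the intelligent inspection event was generated. Unix timestamp in ms. |
| df_event_id | string | Event ID. Note: the same event exists in two states, ongoing and resolved. |
| df_status | string | Status of the intelligent inspection. Values: ongoing, resolved |
| df_watchdog_category | string | Intelligent inspection category. Values: apm, infrastructure |
| df_watchdog_type | string | Intelligent inspection type. Values: disk_usage, mem_leak, apm_request_rate, apm_latency, apm_error_rate |
| df_watchdog_object | string | Intelligent inspection object. Values: host, device, resource |
| df_watchdog_tags | string | Tags of the intelligent inspection object. |
| df_title | string | Title. |
| df_message | string | Detailed content. |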
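As an illustration of how a consumer might check an event against the fields above, here is a minimal sketch. The function name `validate_event` and the sample event are assumptions made for this example; only the field names and allowed values come from the table.

```python
from typing import Any, Dict, List

# Allowed values copied from the field definitions table.
ALLOWED_VALUES = {
    "df_status": {"ongoing", "resolved"},
    "df_watchdog_category": {"apm", "infrastructure"},
    "df_watchdog_type": {
        "disk_usage", "mem_leak",
        "apm_request_rate", "apm_latency", "apm_error_rate",
    },
    "df_watchdog_object": {"host", "device", "resource"},
}


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    # `date` is documented as an integer Unix timestamp in milliseconds.
    if not isinstance(event.get("date"), int):
        problems.append("date must be an integer Unix timestamp in ms")
    # Enumerated fields must carry one of the documented values.
    for field, allowed in ALLOWED_VALUES.items():
        if event.get(field) not in allowed:
            problems.append(f"{field} has unexpected value {event.get(field)!r}")
    return problems


# Hypothetical sample event, for illustration only.
sample = {
    "date": 1657255260000,
    "df_status": "ongoing",
    "df_watchdog_category": "apm",
    "df_watchdog_type": "apm_latency",
    "df_watchdog_object": "resource",
}
print(validate_event(sample))
```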

Intelligent inspection prototype design

Creating an intelligent inspection

image.png

Viewing intelligent inspection events

image.png

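Event structure definition

This field carries the detailed results of an intelligent inspection and may contain one or more of the following fields:

df_bot_obs_detail definitions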

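| Field | Type | Description |
| --- | --- | --- |
| df_bot_obs_detail.main | dict | Main content: information about the primary detection object identified by df_dimension_tags |
| df_bot_obs_detail.main.tags | dict | Tags of the main content |
| df_bot_obs_detail.main.fields | dict | Fields of the main content |
| df_bot_obs_detail.main.charts[#] | CHARTS | Chart data of the primary detection object; the CHARTS structure is described below |
| df_bot_obs_detail.{related resource} | list | List of related objects involved |
| df_bot_obs_detail.{related resource}[#].tags | dict | Tags of the related object |
| df_bot_obs_detail.{related resource}[#].fields | fields | Fields of the related object |
| df_bot_obs_detail.{related resource}[#].charts[#] | CHARTS | Chart data of the related object; the CHARTS structure is described below |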

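CHARTS data structure

| Field | Type | Description |
| --- | --- | --- |
| CHART.name | str | Chart name |
| CHART.units | list | Units of the chart data; see the two entries below |
| CHART.units[0] | str(Enum) | Unit category, e.g. time |
| CHART.units[1] | str(Enum) | Unit, e.g. ms (milliseconds) |
| CHART.dps | DPS | Chart data points |
| CHART.outliers | dict | Outliers |
| CHART.outliers.name | str | Outlier name |
| CHART.outliers.dps | DPS | Outlier data points |
| CHART.rum_impact | dict | RUM impact information |
| CHART.rum_impact.app_id | str | ID of the affected application |
| CHART.rum_impact.user | dict | Affected users |
| CHART.rum_impact.user.count | int | Number of affected users (at most 10 samples are returned) |
| CHART.rum_impact.user.items[#] | str | Element of the affected user list |
| CHART.rum_impact.page | dict | Affected pages |
| CHART.rum_impact.page.count | int | Number of affected pages |
| CHART.rum_impact.page.items[#] | str | Element of the affected page list |
| CHART.root_cause | dict | Root cause |
| CHART.root_cause.trace_ids[#] | str | Element of the list of Trace IDs associated with the root cause |
| CHART.root_cause.span_ids[#] | str | Element of the list of Span IDs associated with the root cause |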

Example

A concrete example follows:

{
  "<... other event fields>": "omitted",

  "df_dimension_tags" : { "project": "ruoyi-auth", "env": "testing","version": "1.0" },
  "df_monitor_checker": "bot_obs_apm",
  "df_bot_obs_detail": {
    "main": [
      {
        "charts": [
          {
            "tags": { 
              "project": "ruoyi-auth",
              "env": "testing", 
              "version": "1.0",
              "service": "ruoyi-auth", 
              "resource": "GET /" 
            },
            "name": "Web request count",
            "unit": [ "custom", "req" ],
            "dps" : [ [ 1640966460000, 200 ], [ 1640966520000, 300 ] ],
            "outliers": [
              {
                "name": "Web request count anomaly",
                "dps" : [ [ 1640966460000, 200 ], [ 1640966520000, 300 ] ]
              },
              {
                "name": "Web request count anomaly caused by error rate",
                "dps" : [ [ 1640966460000, 200 ], [ 1640966520000, 300 ] ]
              },
              {
                "name": "Web request count anomaly caused by latency",
                "dps" : [ [ 1640966460000, 200 ], [ 1640966520000, 300 ] ]
              }
            ],
            "rum_impact":{
                "app_id":"appid_9c7fd257fd824300ba70f7e6d3f5083e",
                "user":{
                    "count":xx,
                    "items":[xxx,xxx,xxx]
                },
                "page":{
                    "count":xx,
                    "items":[xx,xx,xx]
                }
            },
            "root_cause":{
                "trace_ids":[xxx,xxxx],
                "span_ids":[xxxx,xxxx]
            }
          },
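          {
            "name": "Web error rate",
            "unit": [ "percent", "percent" ],
            "other content is similar to the above": "omitted"
          },
          {
            "name": "Web latency",
            "unit": [ "time", "ms" ],
            "other content is similar to the above": "omitted"
          },
          {
            "name": "Non-Web request count",
            "unit": [ "custom", "req" ],
            "other content is similar to the above": "omitted"
          },
          {
            "name": "Non-Web error rate",
            "unit": [ "percent", "percent" ],
            "other content is similar to the above": "omitted"
          },
          {
            "name": "Non-Web latency",
            "unit": [ "time", "ms" ],
            "other content is similar to the above": "omitted"
          }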
        ]
      }
    ]
  }
}
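A minimal sketch of how a consumer could flatten the outliers in such a payload into rows for display. It follows the layout of the example above, where `main` is a list of items that each hold `charts`; the function name `collect_outliers` is an assumption, and the code would need adjusting if `main` is delivered as a single dict.

```python
from typing import Any, Dict, List, Tuple

OutlierRow = Tuple[str, str, int, float]


def collect_outliers(detail: Dict[str, Any]) -> List[OutlierRow]:
    """Flatten outliers into (chart name, outlier name, timestamp ms, value) rows."""
    rows: List[OutlierRow] = []
    for item in detail.get("main", []):
        for chart in item.get("charts", []):
            # Charts without an "outliers" key simply contribute nothing.
            for outlier in chart.get("outliers", []):
                for timestamp_ms, value in outlier.get("dps", []):
                    rows.append((chart["name"], outlier["name"], timestamp_ms, value))
    return rows


# Trimmed-down version of the example payload, for illustration only.
detail = {
    "main": [
        {
            "charts": [
                {
                    "name": "Web request count",
                    "outliers": [
                        {
                            "name": "Web request count anomaly",
                            "dps": [[1640966460000, 200], [1640966520000, 300]],
                        }
                    ],
                }
            ]
        }
    ]
}
for row in collect_outliers(detail):
    print(row)
```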

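Implementation approach

Task list

Enable the APM anomaly root cause analysis detector and select the service, resource, project, and env to inspect.
After the collector is enabled, the job runs every hour over a 6-hour data window and checks whether latency or error rate became anomalous in that interval. Root cause analysis is performed on the resources that show anomalies, and the resource type determines which user groups may be affected by the anomaly.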
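The schedule above can be sketched as a small helper that derives the query window for one run. This is only an illustration: the function name `query_window` and the alignment to whole minutes are assumptions, chosen to match the 1-minute data points.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(run_at: datetime, lookback_hours: int = 6) -> Tuple[int, int]:
    """Return (start_ms, end_ms) for one inspection run.

    The end of the window is the run time truncated to the whole minute,
    and the start lies `lookback_hours` before it.
    """
    end = run_at.replace(second=0, microsecond=0)
    start = end - timedelta(hours=lookback_hours)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


if __name__ == "__main__":
    # A scheduler would call this once per hour.
    print(query_window(datetime.now(timezone.utc)))
```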

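Trigger approach

Latency, error rate, P99, and QPS serve as entry points: when any one of these metrics changes abnormally, alert information is collected and root cause analysis starts.
Tracked thresholds: p99 greater than 15 seconds, error rate > 10%
Track the trends of request count, latency, and error rate, and trigger an event when the data changes sharply.
Track the trends of request count, latency, and error rate, and trigger an event when a critical value is reached, for example p90 greater than 15 seconds or error_rate > 10%.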
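The threshold part of the trigger could look roughly like the sketch below. The 15-second and 10% limits come from the text; the function name `threshold_breaches` and the choice to pass the latency percentile in milliseconds are assumptions.

```python
from typing import List

TAIL_LATENCY_LIMIT_MS = 15_000  # 15 seconds, from the threshold in the text
ERROR_RATE_LIMIT_PCT = 10.0     # 10 percent, from the threshold in the text


def threshold_breaches(tail_latency_ms: float, error_rate_pct: float) -> List[str]:
    """Return the names of the thresholds that the current values exceed."""
    breaches = []
    if tail_latency_ms > TAIL_LATENCY_LIMIT_MS:
        breaches.append("tail_latency")
    if error_rate_pct > ERROR_RATE_LIMIT_PCT:
        breaches.append("error_rate")
    return breaches


# An event would be triggered whenever the returned list is non-empty.
print(threshold_breaches(tail_latency_ms=16_200, error_rate_pct=3.5))
```

Both comparisons are strict, so a value exactly at the limit does not trigger an event.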

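Collecting alert information

All trace data within the time range needed for analysis is read directly from es, and service, resource, latency, request, P99, QPS, error_count, and create_time are derived from the spans. To make the best use of es performance, the lookup proceeds in stages. First, all services are queried and checked for anomalies. Next, for each service found, the anomalies in its resource list are queried. Finally, DQL queries are issued per service and resource, filtered by the project and env given in the where clause. If no project and env are specified, all services are queried.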

'''
    # Result set mapping each service to its list of resources
    service_resource = {}
    # Query all services
    data4 = datakit.query("T::re(`.*`):(distinct(service) as service){ `source` != 'service_map'}[2022-04-11 15:57:00:2022-04-20 21:31:00] ")
    services = [d[1] for d in data4[1]['series'][0]['values']]
    # Query the resources under each service
    for service in services:
        dql = "T::RE(`.*`):(DISTINCT(`resource`) as resource) { `source` != 'service_map'  and service =\'" + service + "\'}"
        print(dql)
        data = datakit.query(dql)
        # Skip services for which the query returned no series
        if len(data[1]['series']) > 0:
            resources = [d[1] for d in data[1]['series'][0]['values']]
            service_resource[service] = resources
    for service in service_resource:
        for resource in service_resource[service]:
            # Use DQL to query the web request count for the current service and resource
            # ...
'''

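Extracting key features from alert information

To analyze the raw trace data read from es, the data must first be aggregated at a specified time interval to produce the series needed for detection.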
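The aggregation step can be sketched as follows, bucketing spans into 1-minute data points. This is an illustration under assumptions: the function name `aggregate_spans` is hypothetical, `time` is taken to be a millisecond timestamp as in the span JSON shown earlier, and any `status` other than `ok` is counted as an error.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

BUCKET_MS = 60_000  # one-minute data points


def aggregate_spans(spans: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Aggregate raw spans into per-minute request, error, and latency figures."""
    buckets = defaultdict(lambda: {"requests": 0, "errors": 0, "duration_sum": 0})
    for span in spans:
        # Truncate the timestamp to the start of its one-minute bucket.
        bucket_start = span["time"] // BUCKET_MS * BUCKET_MS
        bucket = buckets[bucket_start]
        bucket["requests"] += 1
        bucket["duration_sum"] += span["duration"]
        if span["status"] != "ok":  # assumption: non-"ok" means error
            bucket["errors"] += 1

    points = []
    for bucket_start in sorted(buckets):
        bucket = buckets[bucket_start]
        points.append({
            "time": bucket_start,
            "request_count": bucket["requests"],
            "error_rate": 100.0 * bucket["errors"] / bucket["requests"],
            # Average duration keeps the unit of the input durations.
            "avg_duration": bucket["duration_sum"] / bucket["requests"],
            "qps": bucket["requests"] / 60.0,
        })
    return points
```

Minutes without any span produce no data point here; a caller that needs a gap-free series would have to fill those in with zeros.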

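Detecting anomalies

Charts are plotted from the time series of each resource's duration, request, and error_count, and outliers are identified with methods such as normal distribution, quantiles, nearest neighbors, and abrupt-change detection. For example:
image.png
The timestamps immediately before and after every anomaly are returned together with the corresponding resource and service.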
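Two of the methods named above, the normal-distribution test and the quantile test, can be sketched with the standard library alone. The function names, the 3-sigma threshold, and the 1.5 x IQR fence are conventional defaults chosen for this example, not values taken from the design.

```python
from statistics import mean, pstdev, quantiles
from typing import List, Sequence, Tuple

Point = Tuple[int, float]  # (timestamp in ms, value)


def zscore_outliers(dps: Sequence[Point], threshold: float = 3.0) -> List[Point]:
    """Flag points more than `threshold` standard deviations from the mean."""
    values = [value for _, value in dps]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # a flat series has no outliers
        return []
    return [(ts, v) for ts, v in dps if abs(v - mu) / sigma > threshold]


def iqr_outliers(dps: Sequence[Point], k: float = 1.5) -> List[Point]:
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [value for _, value in dps]
    if len(values) < 2:
        return []
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(ts, v) for ts, v in dps if v < low or v > high]


# 29 quiet minutes followed by one spike.
series = [(i * 60_000, 10.0) for i in range(29)] + [(29 * 60_000, 1000.0)]
print(zscore_outliers(series))
print(iqr_outliers(series))
```

The z-score test assumes roughly normal data and is itself distorted by large spikes, while the quantile test is more robust but flags any deviation on a nearly constant series, so in practice the two would likely be combined or tuned per metric.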

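Root cause analysis

The figure below shows the automatic analysis of all vertical and horizontal topology dependencies for a given problem. In the example, when the observed metrics of an application service become anomalous, the service's underlying horizontal stack shows no anomalous events. Intelligent inspection then automatically analyzes the service's upstream and downstream information to determine the root cause for the application. For instance, APP service is anomalous but nothing anomalous is found in its underlying layer. It depends on service 1, which also behaves anomalously, yet the underlying layer of service 1 shows no anomaly either. In the same way, service 1 is found to depend on service 3, and all dependencies of service 3 behave anomalously, which makes them part of the root cause of the whole problem.
This shows that problems are rarely one-off events and usually occur in a regular pattern. If any other entity that depends on the same components also runs into problems at about the same time, that entity becomes part of the root cause analysis as well.

APM Intelligent Inspection Design v2 - Figure 5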
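The dependency walk described above can be approximated by a graph traversal over the service map. The sketch below is a simplification under assumptions: edges are (source_service, target_service) pairs as in the service map records, the set of anomalous services is already known, and a root cause is taken to be an anomalous service with no anomalous dependency of its own. The function name `find_root_causes` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_root_causes(
    start: str,
    edges: Iterable[Tuple[str, str]],
    anomalous: Set[str],
) -> List[str]:
    """Follow anomalous dependencies from `start` and return the deepest ones."""
    dependencies: Dict[str, List[str]] = {}
    for source, target in edges:
        if source != target:  # self-calls carry no dependency information
            dependencies.setdefault(source, []).append(target)

    roots: List[str] = []
    seen: Set[str] = set()
    stack = [start]
    while stack:
        service = stack.pop()
        if service in seen or service not in anomalous:
            continue
        seen.add(service)  # guards against cycles in the service map
        bad_deps = [d for d in dependencies.get(service, []) if d in anomalous]
        if bad_deps:
            stack.extend(bad_deps)  # the cause lies further downstream
        else:
            roots.append(service)   # anomalous, but all dependencies look healthy
    return sorted(roots)


edges = [("app", "service1"), ("service1", "service3"), ("app", "service2")]
print(find_root_causes("app", edges, anomalous={"app", "service1", "service3"}))
```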

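User impact analysis

Check whether the anomalous resource is a web-type resource. If it is, query RUM data for it to obtain the affected users and the affected pages.

  • Identify exactly which users hit the error so that its scope can be assessed quickly
  • Prioritize remediation easily through automatic insight into the impact of the problem on frontend views and backend services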

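Fetching user impact data:

RUM correlation queries -- affected services, affected users, affected pages
Prerequisite: the user must configure service and username in the SDK. service is set when the SDK is integrated; username is supplied through a custom tag that carries the user name.

1) Number and names of affected services -- obtained by intersecting the APM anomalous service list with the RUM service list
R::view:(count_distinct(service),distinct(service)) {service = #{service}} [anomaly start time:end time]

2) Number and names of affected users
R::view:(count_distinct(userid),distinct(username)) {service = #{service}} [anomaly start time:end time]

3) Number and names of affected pages
R::view:(count_distinct(view_url),distinct(view_url)) {service = #{service}} [anomaly start time:end time]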
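A minimal sketch of how the three query templates could be filled in for one anomalous service. The helper names are hypothetical, and both the single-quoted service value and the `YYYY-MM-DD HH:MM:SS` time format are assumptions modelled on the DQL strings used in the collection code earlier in this document.

```python
from datetime import datetime
from typing import Dict, Iterable, List

TIME_FMT = "%Y-%m-%d %H:%M:%S"  # assumed time format for the DQL range


def rum_impact_queries(service: str, start: datetime, end: datetime) -> Dict[str, str]:
    """Build the services, users, and pages queries for one service."""
    where = f"{{service = '{service}'}}"  # service names containing quotes are not handled
    window = f"[{start.strftime(TIME_FMT)}:{end.strftime(TIME_FMT)}]"
    return {
        "services": f"R::view:(count_distinct(service),distinct(service)) {where} {window}",
        "users": f"R::view:(count_distinct(userid),distinct(username)) {where} {window}",
        "pages": f"R::view:(count_distinct(view_url),distinct(view_url)) {where} {window}",
    }


def affected_services(apm_services: Iterable[str], rum_services: Iterable[str]) -> List[str]:
    """Intersect the APM anomalous service list with the RUM service list."""
    return sorted(set(apm_services) & set(rum_services))


queries = rum_impact_queries(
    "demo-k8s-system",
    datetime(2022, 4, 11, 15, 57, 0),
    datetime(2022, 4, 20, 21, 31, 0),
)
for name, dql in queries.items():
    print(name, dql)
```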

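Displaying alerts

The frontend takes the returned anomaly timestamps together with the corresponding resource and service information, fetches the data, and displays it.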