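Every endpoint whose path starts with /api requires a token for authentication. Pass it in the request header under the key x-user-token. You can get the token from Personal Settings > Key Management (个人设置-密钥管理).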
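
As a minimal sketch of the header requirement above, the helper below builds (but does not send) a request that carries the x-user-token header. The server address `http://n9e.example.com`, the token value, and the helper name `build_request` are placeholders chosen for illustration, not part of the API.

```python
import json
import urllib.request

def build_request(base_url, path, token, body=None, method="GET"):
    """Build (without sending) a request that carries the x-user-token header."""
    headers = {"x-user-token": token}
    data = None
    if body is not None:
        # Send the body as JSON, as in the request samples of this document.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)

# Hypothetical server address and token, for illustration only.
req = build_request("http://n9e.example.com", "/api/rdb/nodes", "my-token")
# urllib.request.urlopen(req) would actually send the request.
```

Note that urllib normalizes the stored header name to `X-user-token`; HTTP header names are case-insensitive, so this should not matter to the server.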

Data Reporting and Query

Reporting Monitoring Data

Both the transfer and collector modules of Nightingale provide an endpoint for reporting data.
Local collector endpoint: POST http://127.0.0.1:2058/api/collector/push
Central transfer endpoint: POST /api/transfer/push
For the meaning of each field, see the data specification.
Notes:

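  1. The extra field lets you attach additional information, such as a traceId. When the monitoring data triggers an alert, the alert engine writes the content of extra into the alert event and passes it through to the alert history page, so you can use it to locate the problem faster.
  2. Fill in either endpoint or nid. Some monitoring scenarios are not tied to a machine; in that case, write only nid, which is the node ID shown when you hover over a service node on the ecmc page. If the monitoring data is tied to a device, write only endpoint.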

Example: reporting device-independent monitoring data

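QueryString: nid=10 # if nid is given in the QueryString, the nid field can be omitted from the Payload
Payload:
[
    {
        "metric":"monapi.qps", // required; metric name
        "nid":"10", // required; node ID
        "timestamp":1559733442, // required; timestamp of the data point
        "step":10, // required; reporting interval, in seconds
        "value":1, // required; value of the metric
        "tags":"", // optional; metric tags, in the form a=b,c=d
        "extra":"" // optional; extra information passed through to alert events
    }
]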

Example: reporting device-related monitoring data

Payload:
[
    {
        "metric":"cpu.util",
        "endpoint":"192.168.1.2",
        "timestamp":1559733442,
        "step":10,
        "value":1,
        "tags":"",
        "extra":""
    }
]
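
The two payload shapes can be produced by one small helper. This is only a sketch: `make_point` is a hypothetical name, the `traceId=abc` value in extra is an invented illustration, and the check that exactly one of endpoint or nid is set reflects the note above rather than documented server behavior. nid is sent as a string here because the query samples in this document write node IDs as strings.

```python
import json
import time

def make_point(metric, value, step, endpoint=None, nid=None,
               tags="", extra="", timestamp=None):
    """Build one data point for the push endpoints."""
    # Per the notes: device-related data uses endpoint, device-independent data uses nid.
    if (endpoint is None) == (nid is None):
        raise ValueError("set exactly one of endpoint or nid")
    point = {
        "metric": metric,
        "timestamp": int(time.time() if timestamp is None else timestamp),
        "step": step,      # reporting interval in seconds
        "value": value,
        "tags": tags,      # "a=b,c=d" form
        "extra": extra,    # passed through to alert events
    }
    if endpoint is not None:
        point["endpoint"] = endpoint
    else:
        point["nid"] = str(nid)
    return point

# The body to POST is a JSON array of points.
payload = json.dumps([
    make_point("cpu.util", 1, 10, endpoint="192.168.1.2", timestamp=1559733442),
    make_point("monapi.qps", 1, 10, nid=10, extra="traceId=abc"),
])
```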


Querying Monitoring Data

POST /api/transfer/data

  • Fill in either endpoints or nids: use nids to query device-independent monitoring data, and endpoints to query device-related monitoring data.

Request example

[
    {
        "start":1551861670, // required
        "end":1551863670, // required
        "aggrFunc":"sum", // optional
        "groupKey":["iface"], // optional
        "consolFuc":"AVERAGE", // optional
        "endpoints":["host1","host2"], // required unless nids is given
        "counters":["nightingale.test/service=graph"], // required; format is $metric/$tagkey=$tagvalue
        "step":10, // required
        "dstype":"GAUGE" // optional
    }
]

Response example

{
    "dat": [
        {
            "start": 1562925134,
            "end": 1562925234,
            "endpoint": "127.0.0.1",
            "counter": "proc.num/service=n9e-collector,target=collector",
            "step": 10,
            "values": [
                {
                    "timestamp": 1562925120,
                    "value": 1
                }
            ]
        },
        {
            "start": 1562925134,
            "end": 1562925234,
            "endpoint": "127.0.0.2",
            "counter": "proc.num/service=n9e-collector,target=collector",
            "step": 10,
            "values": [
                {
                    "timestamp": 1562925210,
                    "value": 0
                },
                {
                    "timestamp": 1562925220,
                    "value": 0
                },
                {
                    "timestamp": 1562925230,
                    "value": 0
                }
            ]
        }
    ],
    "err": ""
}
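
A response of this shape can be flattened into rows with a few lines of code. The sketch below assumes, based only on the samples in this document, that an empty err string means success; `iter_points` is a hypothetical helper name.

```python
def iter_points(resp):
    """Yield (endpoint, counter, timestamp, value) for every point in a query response."""
    # Assumption: a non-empty "err" signals a failed query.
    if resp.get("err"):
        raise RuntimeError(resp["err"])
    for series in resp.get("dat") or []:
        for point in series.get("values") or []:
            yield (series["endpoint"], series["counter"],
                   point["timestamp"], point["value"])

# Trimmed copy of the response sample above.
sample = {
    "dat": [
        {
            "endpoint": "127.0.0.1",
            "counter": "proc.num/service=n9e-collector,target=collector",
            "step": 10,
            "values": [{"timestamp": 1562925120, "value": 1}],
        },
        {
            "endpoint": "127.0.0.2",
            "counter": "proc.num/service=n9e-collector,target=collector",
            "step": 10,
            "values": [
                {"timestamp": 1562925210, "value": 0},
                {"timestamp": 1562925220, "value": 0},
            ],
        },
    ],
    "err": "",
}
rows = list(iter_points(sample))
```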

Querying Monitoring Data 2

POST /api/transfer/data/ui

  • Fill in either endpoints or nids: use nids to query device-independent monitoring data, and endpoints to query device-related monitoring data.

Request example

{
  "endpoints": [
    "172.18.32.40"
  ],
  "nids": null,
  "metric": "net.in.bits",
  "tags": [
    "iface=eth0"
  ],
  "step": 20,
  "dstype": "GAUGE",
  "start": 1630370876,
  "end": 1630374476,
  "consolFuc": "AVERAGE",
  "comparisons": [
    0
  ]
}

Response example

{
  "dat": [
    {
      "start": 1630370876,
      "end": 1630374476,
      "endpoint": "172.18.32.44",
      "nid": "",
      "counter": "net.in.bits/iface=eth0",
      "dstype": "GAUGE",
      "step": 20,
      "values": [
        {
          "timestamp": 1630374440,
          "value": 2024322.000000
        }
      ],
      "comparison": 0
    }
  ],
  "err": ""
}
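
The counter field in these responses combines the metric and its tags in the `$metric/$tagkey=$tagvalue` form. The following sketch splits such a string back into its parts. `parse_counter` is a hypothetical helper; it splits on the first `/` only, because tag values such as mount points can themselves contain `/`, and it assumes tag values contain no commas.

```python
def parse_counter(counter):
    """Split "metric/k1=v1,k2=v2" into (metric, {k1: v1, k2: v2})."""
    # Only the first "/" separates the metric from the tags.
    metric, _, tag_part = counter.partition("/")
    tags = {}
    if tag_part:
        for pair in tag_part.split(","):
            key, _, val = pair.partition("=")
            tags[key] = val
    return metric, tags

metric, tags = parse_counter("net.in.bits/iface=eth0")
```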

Querying Metrics

POST /api/index/metrics

  • Fill in either endpoints or nids: use nids to query device-independent monitoring data, and endpoints to query device-related monitoring data.

Request example

{
    "nids": ["10","11"],
    "endpoints": ["host1","host2"]
}

Response example

{
    "dat": [
        {
            "metrics": [
                "cpu.idle"
            ]
        }
    ],
    "err": ""
}

Querying Tags

POST /api/index/tagkv

  • Fill in either endpoints or nids: use nids to query device-independent monitoring data, and endpoints to query device-related monitoring data.

Request example

{
    "nids": ["10","11"],
    "endpoints": ["host1","host2"],
    "metrics": ["disk.used.percent"]
}

Response example

{
    "dat": [
        {
            "endpoints": ["host1","host2"],
            "metric": "disk.used.percent",
            "tagkv": [
                {
                    "tagk": "mount",
                    "tagv": ["/", "/home"]
                }
            ]
        }
    ],
    "err":""
}
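
One plausible use of the tagkv response is to build the counters list for POST /api/transfer/data. The sketch below produces one counter per tag value in the `$metric/$tagkey=$tagvalue` form; `counters_from_tagkv` is a hypothetical name, and the helper does not combine several tag keys into one counter, which real series may require.

```python
def counters_from_tagkv(entry):
    """Expand one element of "dat" from /api/index/tagkv into counter strings."""
    metric = entry["metric"]
    counters = []
    for kv in entry.get("tagkv") or []:
        for value in kv["tagv"]:
            # One counter per (tag key, tag value) pair.
            counters.append(f"{metric}/{kv['tagk']}={value}")
    return counters

# Element taken from the response sample above.
entry = {
    "endpoints": ["host1", "host2"],
    "metric": "disk.used.percent",
    "tagkv": [{"tagk": "mount", "tagv": ["/", "/home"]}],
}
counters = counters_from_tagkv(entry)
```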



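Monitoring Dashboards

Listing Dashboards

GET /api/mon/node/:id/screen

Returns the list of screens (dashboards). Because the list is already scoped to a single node, it is small, so the backend does not paginate it.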


Creating a Dashboard

POST /api/mon/node/:id/screen

{
    "name": ""
}

Modifying a Dashboard

PUT /api/mon/screen/:id

node_id can be changed at the same time, which in effect moves the screen to a different node.

{
    "name": "",
    "node_id": 0
}

Getting a Dashboard

GET /api/mon/screen/:id

Response example:
{
    "dat": {
        "id": 1,
        "node_id": 2,
        "name": "Inspection Dashboard",
        "last_updator": "root",
        "last_updated": "2020-07-28T11:17:02+08:00",
        "node_path": "cloudplatform.ecmc"
    },
    "err": ""
}

Deleting a Dashboard

DELETE /api/mon/screen/:id

Deletes a screen.


Listing the Subclasses of a Dashboard

GET /api/mon/screen/:id/subclass

Returns the subclasses under a screen, sorted by the weight field.


Creating a Dashboard Subclass

POST /api/mon/screen/:id/subclass

Creates a subclass.

{
    "name": "",
    "weight": 0
}

Modifying Dashboard Subclasses

PUT /api/mon/subclass

Modifies subclasses in batch.

[
    {
        "id": 1,
        "name": "a",
        "weight": 1
    },
    {
        "id": 2,
        "name": "b",
        "weight": 0
    }
]
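
Since subclasses are returned sorted by weight, the batch endpoint lends itself to reordering. The sketch below builds a request body that assigns weights by list position. It assumes that lower weights sort first, which this document does not state; `reorder_body` is a hypothetical helper name.

```python
def reorder_body(subclasses):
    """Build the batch body for PUT /api/mon/subclass from subclasses in the desired order."""
    # Assumption: ascending weight means earlier position.
    return [
        {"id": sub["id"], "name": sub["name"], "weight": position}
        for position, sub in enumerate(subclasses)
    ]

# Put subclass 2 ("b") before subclass 1 ("a").
body = reorder_body([{"id": 2, "name": "b"}, {"id": 1, "name": "a"}])
```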

Deleting a Dashboard Subclass

DELETE /api/mon/subclass/:id

Deletes a subclass.


Moving Dashboard Subclasses to Another Dashboard

PUT /api/mon/subclasses/loc

Changes the location of a subclass, that is, the screen it belongs to.

[
    {
        "id": 1,
        "screen_id": 1
    },
    {
        "id": 2,
        "screen_id": 1
    }
]

Listing the Charts of a Subclass

GET /api/mon/subclass/:id/chart

Returns the list of charts, sorted by chart weight, without pagination.


Creating a Chart

POST /api/mon/subclass/:id/chart

Creates a chart.

{
    "configs": "",
    "weight": 0
}

Modifying a Chart

PUT /api/mon/chart/:id

Modifies a chart.

{
    "subclass_id": 1,
    "configs": ""
}

Deleting a Chart

DELETE /api/mon/chart/:id

Deletes a chart.


Changing Chart Order

PUT /api/mon/charts/weights

Changes the sort weight of a chart.

{
    "id": 1,
    "weight": 9
}

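Alert Strategies

Field Descriptions

  • name: strategy name
  • category: strategy type; 1 for device-related, 2 for device-independent
  • nid: ID of the object tree node the strategy is attached to
  • excl_nid: IDs of child nodes under the attached node to exclude
  • tags: tags of the monitoring metric
  • priority: alert level; can be 1, 2, or 3
  • alert_dur: alert evaluation window, in seconds
  • enable_stime: start of the time range in which the strategy is in effect
  • enable_etime: end of the time range in which the strategy is in effect
  • enable_days_of_week: days of the week on which the strategy is in effect
  • exprs
    • eopt: comparison operator, one of [=, !=, >, >=, <, <=]
    • func: alert function; all, happen, max, min, avg, sum, diff, pdiff, and nodata are supported
    • metric: monitoring metric
    • params: parameters required by the alert function
    • threshold: threshold required by the alert function
  • recovery_dur: number of seconds the recovered state must last before a recovery event is generated; 0 means a recovery event is generated immediately
  • recovery_notify: 0 sends a recovery notification, 1 does not
  • converge: alert notification convergence; the first value is the convergence period in seconds, the second is the number of alert notifications allowed within that period
  • notify_group: groups that receive alert notifications
  • notify_user: users who receive alert notifications
  • work_groups: work groups of the ticketing system
  • runbook: link to the incident runbook
  • callback: callback URL invoked after an alert is triggered
  • need_upgrade: whether alert escalation is configured; 0 for no, 1 for yes
  • alert_upgrade
    • duration: how long an alert must last before escalation is triggered, in seconds
    • level: alert level after escalation
    • users: users who receive alert notifications after escalation
    • groups: groups that receive alert notifications after escalation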
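
To make the two values of converge concrete, here is one possible reading of them as a sliding-window rate limit. This is an illustration only: the `Converger` class is hypothetical, and the real alert engine may count periods differently (for example with fixed windows).

```python
from collections import deque

class Converger:
    """Allow at most `limit` notifications within any `period` seconds."""

    def __init__(self, period, limit):
        self.period = period
        self.limit = limit
        self.sent = deque()  # timestamps of notifications already sent

    def allow(self, now):
        # Forget notifications that are at least one full period old.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

# "converge": [3600, 1] read as: at most 1 notification per 3600 seconds.
conv = Converger(3600, 1)
decisions = [conv.allow(0), conv.allow(100), conv.allow(3600)]
```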

Creating an Alert Strategy

POST /api/mon/stra
Request example

{
  "category": 1,
  "name": "strategy name",
  "nid": 2,
  "excl_nid": [
    15
  ],
  "priority": 3,
  "alert_dur": 180,
  "exprs": [
    {
      "metric": "net.in.bits",
      "func": "all",
      "eopt": ">",
      "threshold": 0,
      "params": []
    }
  ],
  "tags": [
    {
      "topt": "=",
      "tkey": "iface",
      "tval": [
        "eth0"
      ]
    }
  ],
  "recovery_dur": 0,
  "recovery_notify": 0,
  "alert_upgrade": {
    "duration": 600,
    "level": 1,
    "users": [
      1
    ],
    "groups": []
  },
  "converge": [
    3600,
    1
  ],
  "notify_group": [],
  "notify_user": [
    6
  ],
  "callback": "n9e.org/callback",
  "enable_stime": "00:00",
  "enable_etime": "23:59",
  "enable_days_of_week": [
    0,
    1,
    2,
    3,
    4,
    5,
    6
  ],
  "need_upgrade": 1
}
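
Before sending a strategy, it can help to check it against the value ranges given in the field descriptions. The sketch below does that on the client side; `validate_strategy` is a hypothetical helper, and the server's own validation rules are not documented here. The response samples in this document use a func value of abs, which is not in the field list, so the lists below may be incomplete.

```python
# Allowed values as listed in the field descriptions of this document.
EOPTS = {"=", "!=", ">", ">=", "<", "<="}
FUNCS = {"all", "happen", "max", "min", "avg", "sum", "diff", "pdiff", "nodata"}

def validate_strategy(stra):
    """Return a list of problems found in a strategy payload (empty if none)."""
    problems = []
    if stra.get("priority") not in (1, 2, 3):
        problems.append("priority must be 1, 2, or 3")
    for i, expr in enumerate(stra.get("exprs") or []):
        if expr.get("eopt") not in EOPTS:
            problems.append(f"exprs[{i}].eopt is not a listed operator")
        if expr.get("func") not in FUNCS:
            problems.append(f"exprs[{i}].func is not a listed function")
    converge = stra.get("converge")
    if not (isinstance(converge, list) and len(converge) == 2):
        problems.append("converge must be [period_seconds, max_notifications]")
    return problems

ok = validate_strategy({
    "priority": 3,
    "exprs": [{"metric": "net.in.bits", "func": "all", "eopt": ">",
               "threshold": 0, "params": []}],
    "converge": [3600, 1],
})
```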

Updating an Alert Strategy

PUT /api/mon/stra

Request example

{
  "name": "all always triggers",
  "nid": 21,
  "excl_nid": null,    
  "priority": 3,
  "alert_dur": 60,    
  "exprs": [
    { 
      "eopt": "!=",
      "func": "all",
      "metric": "cpu.idle",
      "params": [],
      "threshold": 0
    }
  ],
  "tags": [],
  "recovery_dur": 0,
  "recovery_notify": 1,
  "alert_upgrade": {
    "duration": 60,
    "level": 1,
    "users": [],
    "groups": []
  },
  "converge": [3600,1],
  "notify_group": [],
  "notify_user": [5],
  "callback": "",
  "enable_stime": "00:00",
  "enable_etime": "23:59",
  "enable_days_of_week": [0,1,2,3,4,5,6],
  "need_upgrade": 0,
  "id": 13
}

Response example

{
  "err":"",
  "dat":"ok"
}

Deleting Alert Strategies

DELETE /api/mon/stra

Request example

{
    "ids":[4]
}

Response example

{
  "err":"",
  "dat":"ok"
}

Listing All Strategies

GET /api/mon/stra?nid=1
nid: service tree node ID; optional. If omitted, all strategies are returned.
Response example

{
  "dat": [
    {
      "id": 1,
      "name": "io.util greater than 90%",
      "category": 1,
      "nid": 100,
      "alert_dur": 600,
      "recovery_dur": 120,
      "enable_stime": "00:00",
      "enable_etime": "23:59",
      "priority": 3,
      "callback": "",
      "creator": "root",
      "created": "2019-03-06T16:47:16+08:00",
      "last_updator": "root",
      "last_updated": "2019-03-06T16:47:16+08:00",
      "excl_nid": [99],
      "exprs": [
        {
          "eopt": ">",
          "func": "abs",
          "metric": "qps",
          "params": [3],
          "threshold": 10
        }
      ],
      "tags": [
        {
          "tkey": "host",
          "topt": "=",
          "tval": ["nightingale.host1"]
        }
      ],
      "enable_days_of_week": [0,1,2,3,4,5,6],
      "converge": [60,3],
      "recovery_notify": 1,
      "notify_group": [1,3],
      "notify_user": [1,3],
      "leaf_nids": null,
      "need_upgrade":1,
      "alert_upgrade":{
        "users":[1,3],
        "groups":[1,3],
        "duration":1000,
        "level":1
      }
    }
  ],
  "err": ""
}

Getting a Single Strategy

GET /api/mon/stra/:sid

Response example

{
  "dat":
    {
      "id": 1,
      "name": "io.util greater than 90%",
      "category": 1,
      "nid": 100,
      "alert_dur": 600,
      "recovery_dur": 120,
      "enable_stime": "00:00",
      "enable_etime": "23:59",
      "priority": 3,
      "callback": "",
      "creator": "root",
      "created": "2019-03-06T16:47:16+08:00",
      "last_updator": "root",
      "last_updated": "2019-03-06T16:47:16+08:00",
      "excl_nid": [99],
      "exprs": [
        {
          "eopt": ">",
          "func": "abs",
          "metric": "qps",
          "params": [3],
          "threshold": 10
        }
      ],
      "tags": [
        {
          "tkey": "host",
          "topt": "=",
          "tval": ["nightingale.host1"]
        }
      ],
      "enable_days_of_week": [0,1,2,3,4,5,6],
      "converge": [60,3],
      "recovery_notify": 1,
      "notify_group": [1,3],
      "notify_user": [1,3],
      "leaf_nids": null,
      "need_upgrade":1,
      "alert_upgrade":{
        "users":[1,3],
        "groups":[1,3],
        "duration":1000,
        "level":1
      }
    },
  "err": ""
}

Listing All Effective Strategies

GET /api/mon/stras/effective?all=1
Response example

{
  "dat": [
    {
      "id": 1,
      "name": "io.util greater than 90%",
      "category": 1,
      "nid": 100,
      "alert_dur": 600,
      "recovery_dur": 120,
      "enable_stime": "00:00",
      "enable_etime": "23:59",
      "priority": 3,
      "callback": "",
      "creator": "root",
      "created": "2019-03-06T16:47:16+08:00",
      "last_updator": "root",
      "last_updated": "2019-03-06T16:47:16+08:00",
      "excl_nid": [99],
      "exprs": [
        {
          "eopt": ">",
          "func": "abs",
          "metric": "qps",
          "params": [3],
          "threshold": 10
        }
      ],
      "tags": [
        {
          "tkey": "host",
          "topt": "=",
          "tval": ["nightingale.host1"]
        }
      ],
      "enable_days_of_week": [0,1,2,3,4,5,6],
      "converge": [60,3],
      "recovery_notify": 1,
      "notify_group": [1,3],
      "notify_user": [1,3],
      "leaf_nids": null,
      "need_upgrade":1,
      "alert_upgrade":{
        "users":[1,3],
        "groups":[1,3],
        "duration":1000,
        "level":1
      }
    }
  ],
  "err": ""
}

Alert History

All Alert History

GET /api/mon/event/his?nodepath=a.b.c&limit=100
nodepath is the path of the node; limit is the maximum number of events to return and defaults to 20.

{
    "dat":{
        "list":[
            {
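                "id":1460, // alert event ID
                "sid":31,  // alert strategy ID
                "sname":"ngx_log_001",      // alert strategy name
                "node_path":"inner.a.d",    // path of the node the strategy belongs to
                "nid":5,                    // ID of the node the strategy belongs to
                "endpoint":"10.178.27.152", // monitored object that raised the alert
                "priority":1,               // alert level: 1, 2, or 3
                "event_type":"alert",       // event type: alert|recovery
                "hashid":640170854899939892,// hashid of the alert event
                "etime":1632373770,         // time the alert was triggered
                "value":"log.nyy_01: 28",   // metric value at the time of the alert
                "info":" log.nyy_01 (happen,10s) [1] \u003e 0", // strategy expression that triggered the alert
                "tags":"", // tags of the alerting metric
                "detail":[ // details of the alerting metric
                    {
                        "metric":"log.nyy_01",  // alerting metric
                        "tags":null,            // tags of the alerting metric
                        "points":[              // data points at the time of the alert
                            {
                                "timestamp":1632373770,
                                "value":0
                            },
                            {
                                "timestamp":1632373760,
                                "value":28
                            }
                        ]
                    }
                ],
                "status":[ // notification result
                    "无接收人" // "no recipient"
                ]
            }
        ],
        "total":1 // number of alert events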
    },
    "err":""
}
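
As a small sketch of working with this endpoint, the helpers below build the query path and turn a response into one line per alert event. The function names `history_path` and `summarize_alerts` and the output format are hypothetical; only the field names come from the sample above.

```python
from urllib.parse import urlencode

def history_path(nodepath, limit=20):
    """Build the path and query string for the alert history endpoint."""
    return "/api/mon/event/his?" + urlencode({"nodepath": nodepath, "limit": limit})

def summarize_alerts(resp):
    """Return one summary line per event whose event_type is "alert"."""
    lines = []
    for event in resp["dat"]["list"]:
        if event["event_type"] != "alert":
            continue  # skip recovery events
        lines.append(
            f"[P{event['priority']}] {event['sname']} on "
            f"{event['endpoint']}: {event['value']}"
        )
    return lines

# Trimmed copy of the response sample above.
sample = {
    "dat": {
        "list": [{
            "id": 1460, "sname": "ngx_log_001", "endpoint": "10.178.27.152",
            "priority": 1, "event_type": "alert", "value": "log.nyy_01: 28",
        }],
        "total": 1,
    },
    "err": "",
}
path = history_path("a.b.c", 100)
summary = summarize_alerts(sample)
```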

Node Information

Listing Nodes

GET /api/rdb/nodes

{
    "dat": [
        {
            "id": 9,
            "pid": 3,
            "ident": "mon", // unique identifier of the node
            "name": "Monitoring",  // name of the node
            "note": "Monitoring service",
            "path": "inner.mon", // full path of the node
            "leaf": 0,
            "cate": "project",
            "icon_color": "#de83cb",
            "icon_char": "P",
            "proxy": 0,
            "creator": "root",
            "last_updated": "2020-08-31T20:18:17+08:00",
            "admins": null
        }
    ],
    "err": ""
}
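
The flat node list can be turned into a parent-to-children index. The sketch below assumes that pid is the id of the parent node, which the sample suggests but this document does not state; `children_by_parent` is a hypothetical helper, and the node with id 3 is invented for the example.

```python
from collections import defaultdict

def children_by_parent(nodes):
    """Map each parent id to the ids of its direct children."""
    index = defaultdict(list)
    for node in nodes:
        # Assumption: "pid" holds the id of the parent node.
        index[node["pid"]].append(node["id"])
    return dict(index)

nodes = [
    {"id": 3, "pid": 0, "path": "inner"},      # invented parent node
    {"id": 9, "pid": 3, "path": "inner.mon"},  # node from the sample above
]
index = children_by_parent(nodes)
```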