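Every endpoint whose path starts with /api requires a token for authentication. Pass it in the request header with the key x-user-token. The token can be obtained from Personal Settings > Key Management.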
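A minimal sketch of attaching the token, assuming Python's standard `urllib`, a hypothetical base URL `http://n9e.example.com`, and a placeholder token `YOUR_TOKEN`. The helper `build_api_request` (a hypothetical name) only builds the request; passing it to `urllib.request.urlopen` would send it.

```python
import urllib.request

# Hypothetical address of a Nightingale deployment; replace with your own.
BASE_URL = "http://n9e.example.com"


def build_api_request(path, token, method="GET", body=None):
    """Build (without sending) a request to an /api endpoint carrying x-user-token."""
    req = urllib.request.Request(BASE_URL + path, data=body, method=method)
    # urllib stores the name as "X-user-token"; HTTP header names are case-insensitive.
    req.add_header("x-user-token", token)
    if body is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_api_request("/api/rdb/nodes", token="YOUR_TOKEN")
print(req.get_method(), req.full_url, req.get_header("X-user-token"))
```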
Data push and query
Push monitoring data
Both the transfer and collector modules of Nightingale provide an endpoint for pushing data:
- Local collector: POST http://127.0.0.1:2058/api/collector/push
- Central transfer: POST /api/transfer/push
For the meaning of each field, see the data specification.
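Notes:
- The extra field lets users attach additional information, such as a traceId. When the data triggers an alert, the alert engine writes the content of extra into the alert event and passes it through to the alert history page, so users can use it to locate the problem faster.
- Fill in one of endpoint and nid. Some monitoring scenarios are unrelated to any machine; in that case set only nid, which is the node ID shown when hovering the mouse over a service node on the ecmc page. If the data is device-related, set only endpoint.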
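The two rules above can be checked on the client before pushing. This is a sketch, not Nightingale's own validation: the required-field list is taken from the "required" comments in the sample payloads, and `push_item_problems` is a hypothetical helper name.

```python
# Fields marked "required" in the sample payloads (endpoint/nid are handled separately).
REQUIRED_FIELDS = ("metric", "timestamp", "step", "value")


def push_item_problems(item, nid_in_query=False):
    """Return a list of problems found in one push item; an empty list means it looks fine.

    nid_in_query=True means nid is already passed in the query string (e.g. ?nid=10).
    """
    problems = [f"missing required field: {k}" for k in REQUIRED_FIELDS if k not in item]
    has_endpoint = bool(item.get("endpoint"))
    has_nid = bool(item.get("nid")) or nid_in_query
    if has_endpoint and has_nid:
        problems.append("set only one of endpoint and nid")
    elif not (has_endpoint or has_nid):
        problems.append("set one of endpoint or nid")
    return problems


device_item = {"metric": "cpu.util", "endpoint": "192.168.1.2",
               "timestamp": 1559733442, "step": 10, "value": 1, "tags": "", "extra": ""}
print(push_item_problems(device_item))  # []
```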
Sample push of device-independent monitoring data
QueryString: nid=10 # if nid is set in the query string, the nid field can be omitted from the payload
Payload:
[
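{
"metric":"monapi.qps", // required: metric name
"nid":"10", // node ID; may be omitted when nid is set in the query string
"timestamp":1559733442, // required: timestamp of the data point, in Unix seconds
"step":10, // required: reporting interval, in seconds
"value":1, // required: metric value
"tags":"", // optional: metric tags, in the form a=b,c=d
"extra":"" // optional: extra information (see the notes above)
}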
]
Sample push of device-related monitoring data
Payload:
[
{
"metric":"cpu.util",
"endpoint":"192.168.1.2",
"timestamp":1559733442,
"step":10,
"value":1,
"tags":"",
"extra":""
}
]
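A sketch of building a device-related push to the central transfer endpoint, assuming Python's standard library, a hypothetical base URL, and tags passed as a dict. `make_point` and `build_push_request` are illustrative names, not part of Nightingale.

```python
import json
import time
import urllib.request


def make_point(metric, value, endpoint, step=10, tags=None, extra="", ts=None):
    """Build one device-related push item shaped like the sample payload above."""
    return {
        "metric": metric,
        "endpoint": endpoint,
        "timestamp": int(time.time()) if ts is None else ts,
        "step": step,
        "value": value,
        # tags in the "a=b,c=d" form; keys are sorted only to keep the string deterministic
        "tags": ",".join(f"{k}={v}" for k, v in sorted((tags or {}).items())),
        "extra": extra,
    }


def build_push_request(points, token, base_url="http://n9e.example.com"):
    """Build (without sending) a POST of a JSON array of points to the transfer push endpoint."""
    body = json.dumps(points).encode("utf-8")
    req = urllib.request.Request(base_url + "/api/transfer/push", data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("x-user-token", token)
    return req


point = make_point("cpu.util", 1, "192.168.1.2", tags={"iface": "eth0"}, ts=1559733442)
req = build_push_request([point], token="YOUR_TOKEN")
```

Sorting the tag keys is a design choice for reproducible payloads; the document does not say whether the server depends on tag order.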
Query monitoring data
POST /api/transfer/data
- Only one of endpoints and nids is needed: use nids to query device-independent data and endpoints to query device-related data.
Sample request
[
{
"start":1551861670, // required
"end":1551863670, // required
"aggrFunc":"sum", // optional
"groupKey":["iface"], // optional
"consolFuc":"AVERAGE", // optional
"endpoints":["host1","host2"], // required unless nids is given
"counters":["nightingale.test/service=graph"], // required, format: $metric/$tagkey=$tagvalue
"step":10, // required
"dstype":"GAUGE" // optional
}
]
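The counters field uses the `$metric/$tagkey=$tagvalue` format, and the sample response below joins several tags with commas. A sketch of building counters and the request body; `counter` and `data_query` are hypothetical helpers, and a metric without tags is assumed to be written as the bare metric name.

```python
def counter(metric, tags=None):
    """Build a counter string: $metric/$tagkey=$tagvalue, with multiple tags comma-joined."""
    if not tags:
        return metric  # assumption: a metric without tags is the bare metric name
    return metric + "/" + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))


def data_query(counters, start, end, step=10, endpoints=None, nids=None):
    """Build the JSON array body for POST /api/transfer/data.

    Use nids for device-independent data and endpoints for device-related data.
    """
    if not endpoints and not nids:
        raise ValueError("pass endpoints or nids")
    q = {"start": start, "end": end, "counters": counters, "step": step}
    if endpoints:
        q["endpoints"] = endpoints
    if nids:
        q["nids"] = nids
    return [q]  # the sample request is an array of query objects


body = data_query([counter("nightingale.test", {"service": "graph"})],
                  start=1551861670, end=1551863670, endpoints=["host1", "host2"])
```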
Sample response
{
"dat": [
{
"start": 1562925134,
"end": 1562925234,
"endpoint": "127.0.0.1",
"counter": "proc.num/service=n9e-collector,target=collector",
"step": 10,
"values": [
{
"timestamp": 1562925120,
"value": 1
}
]
},
{
"start": 1562925134,
"end": 1562925234,
"endpoint": "127.0.0.2",
"counter": "proc.num/service=n9e-collector,target=collector",
"step": 10,
"values": [
{
"timestamp": 1562925210,
"value": 0
},
{
"timestamp": 1562925220,
"value": 0
},
{
"timestamp": 1562925230,
"value": 0
}
]
}
],
"err": ""
}
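A sketch of turning such a response into per-series point lists keyed by `(endpoint, counter)`. It assumes a non-empty `err` signals failure, which matches the empty `err` in the successful sample but is not stated explicitly; `parse_series` is a hypothetical helper.

```python
import json


def parse_series(resp_text):
    """Map (endpoint, counter) -> [(timestamp, value), ...] from a /api/transfer/data response."""
    resp = json.loads(resp_text)
    if resp.get("err"):
        raise RuntimeError(resp["err"])
    return {
        (s["endpoint"], s["counter"]): [(p["timestamp"], p["value"]) for p in s["values"]]
        for s in resp["dat"]
    }


# Trimmed copy of the second series in the sample response above.
SAMPLE = """{"dat": [{"start": 1562925134, "end": 1562925234, "endpoint": "127.0.0.2",
  "counter": "proc.num/service=n9e-collector,target=collector", "step": 10,
  "values": [{"timestamp": 1562925210, "value": 0}, {"timestamp": 1562925220, "value": 0},
             {"timestamp": 1562925230, "value": 0}]}], "err": ""}"""

series = parse_series(SAMPLE)
key = ("127.0.0.2", "proc.num/service=n9e-collector,target=collector")
print(series[key][-1])  # (1562925230, 0)
```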
Query monitoring data 2
POST /api/transfer/data/ui
- Only one of endpoints and nids is needed: use nids to query device-independent data and endpoints to query device-related data.
Sample request
{
"endpoints": [
"172.18.32.40"
],
"nids": null,
"metric": "net.in.bits",
"tags": [
"iface=eth0"
],
"step": 20,
"dstype": "GAUGE",
"start": 1630370876,
"end": 1630374476,
"consolFuc": "AVERAGE",
"comparisons": [
0
]
}
Sample response
{
"dat": [
{
"start": 1630370876,
"end": 1630374476,
"endpoint": "172.18.32.44",
"nid": "",
"counter": "net.in.bits/iface=eth0",
"dstype": "GAUGE",
"step": 20,
"values": [
{
"timestamp": 1630374440,
"value": 2024322.000000
}
],
"comparison": 0
}
],
"err": ""
}
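A sketch of building the /api/transfer/data/ui request body. Field names are copied verbatim from the sample request above (including `consolFuc`); `ui_query` is a hypothetical helper, and `dstype` and the consolidation function are fixed to the sample's values for brevity.

```python
import json


def ui_query(metric, start, end, step, endpoints=None, nids=None, tags=None, comparisons=(0,)):
    """Build a body for POST /api/transfer/data/ui, mirroring the sample request."""
    if not endpoints and not nids:
        raise ValueError("pass endpoints or nids")
    return {
        "endpoints": endpoints,
        "nids": nids,              # left as None (JSON null) when unused, like the sample
        "metric": metric,
        "tags": list(tags or []),  # each tag written as "key=value"
        "step": step,
        "dstype": "GAUGE",
        "start": start,
        "end": end,
        "consolFuc": "AVERAGE",
        "comparisons": list(comparisons),
    }


body = ui_query("net.in.bits", 1630370876, 1630374476, 20,
                endpoints=["172.18.32.40"], tags=["iface=eth0"])
print(json.dumps(body, indent=2))
```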
Query metrics
POST /api/index/metrics
- Only one of endpoints and nids is needed: use nids to query device-independent data and endpoints to query device-related data.
Sample request
{
"nids":["10","11"],
"endpoints": ["host1","host2"]
}
Sample response
{
"dat": [
{
"metrics": [
"cpu.idle"
]
}
],
"err": ""
}
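A sketch of building the request and reading the metric names back, assuming the response shape shown above; `metrics_request` and `metric_names` are hypothetical helpers.

```python
import json


def metrics_request(endpoints=None, nids=None):
    """Body for POST /api/index/metrics; selectors that are not given are left out."""
    if not endpoints and not nids:
        raise ValueError("pass endpoints or nids")
    body = {}
    if nids:
        body["nids"] = nids
    if endpoints:
        body["endpoints"] = endpoints
    return body


def metric_names(resp_text):
    """Collect the distinct metric names from a /api/index/metrics response."""
    resp = json.loads(resp_text)
    if resp.get("err"):
        raise RuntimeError(resp["err"])
    names = set()
    for item in resp["dat"]:
        names.update(item.get("metrics") or [])
    return sorted(names)


print(metric_names('{"dat": [{"metrics": ["cpu.idle"]}], "err": ""}'))  # ['cpu.idle']
```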
Query tags
POST /api/index/tagkv
- Only one of endpoints and nids is needed: use nids to query device-independent data and endpoints to query device-related data.
Sample request
{
"nids":["10","11"],
"endpoints": ["host1","host2"],
"metrics": ["disk.used.percent"]
}
Sample response
{
"dat": [
{
"endpoints": ["host1","host2"],
"metric": "disk.used.percent",
"tagkv": [
{
"tagk": "mount",
"tagv": ["/", "/home"]
}
]
}
],
"err":""
}
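The tagkv response can be expanded into counters for /api/transfer/data. A sketch, assuming every tag-value combination is wanted; note that the Cartesian product may include combinations that were never actually reported. `counters_from_tagkv` is a hypothetical helper.

```python
import itertools


def counters_from_tagkv(metric, tagkv):
    """Expand a tagkv list into every $metric/$tagkey=$tagvalue[,...] combination."""
    keys = sorted(t["tagk"] for t in tagkv)
    values = {t["tagk"]: t["tagv"] for t in tagkv}
    if not keys:
        return [metric]
    counters = []
    for combo in itertools.product(*(values[k] for k in keys)):
        counters.append(metric + "/" + ",".join(f"{k}={v}" for k, v in zip(keys, combo)))
    return counters


tagkv = [{"tagk": "mount", "tagv": ["/", "/home"]}]  # from the sample response above
print(counters_from_tagkv("disk.used.percent", tagkv))
# ['disk.used.percent/mount=/', 'disk.used.percent/mount=/home']
```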
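Dashboards
List dashboards
GET /api/mon/node/:id/screen
Returns the list of screens (dashboards) under the node. Because the list is scoped to a single node and is therefore small, the backend does not paginate it.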
Create a dashboard
POST /api/mon/node/:id/screen
{
"name": ""
}
Update a dashboard
PUT /api/mon/screen/:id
node_id can be changed as well, which effectively moves the screen to a different node.
{
"name": "",
"node_id": 0
}
Get a dashboard
GET /api/mon/screen/:id
Sample response:
{
"dat": {
"id": 1,
"node_id": 2,
"name": "巡检大盘",
"last_updator": "root",
"last_updated": "2020-07-28T11:17:02+08:00",
"node_path": "cloudplatform.ecmc"
},
"err": ""
}
Delete a dashboard
DELETE /api/mon/screen/:id
Deletes the specified screen.
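The screen endpoints above differ only in method and path. A sketch that builds (without sending) each request, assuming Python's standard library and a hypothetical base URL; `screen_request` and its action names are illustrative. The usage line moves screen 1 under node 5 by updating `node_id`.

```python
import json
import urllib.request

# Hypothetical address of a Nightingale deployment.
BASE_URL = "http://n9e.example.com"


def screen_request(action, token, node_id=None, screen_id=None, payload=None):
    """Build (without sending) a request for one of the screen endpoints listed above."""
    routes = {
        "list": ("GET", f"/api/mon/node/{node_id}/screen"),
        "create": ("POST", f"/api/mon/node/{node_id}/screen"),
        "get": ("GET", f"/api/mon/screen/{screen_id}"),
        "update": ("PUT", f"/api/mon/screen/{screen_id}"),
        "delete": ("DELETE", f"/api/mon/screen/{screen_id}"),
    }
    method, path = routes[action]
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("x-user-token", token)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Move screen 1 under node 5; the name is sent too, as in the sample update body.
move = screen_request("update", "YOUR_TOKEN", screen_id=1, payload={"name": "demo", "node_id": 5})
```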
List the subclasses of a dashboard
GET /api/mon/screen/:id/subclass
Returns the subclasses under the screen, sorted by the weight field.
Create a dashboard subclass
POST /api/mon/screen/:id/subclass
Creates a subclass.
{
"name": "",
"weight": 0
}
Update dashboard subclasses
PUT /api/mon/subclass
Updates subclasses in batch.
[
{
"id": 1,
"name": "a",
"weight": 1
},
{
"id": 2,
"name": "b",
"weight": 0
}
]
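To reorder subclasses, one batch body can assign each subclass its position as its weight. A sketch only: the document says subclasses are sorted by weight but not in which direction, so it assumes lower weights come first; `reweight` is a hypothetical helper.

```python
def reweight(subclasses, ordered_ids):
    """Build the batch body for PUT /api/mon/subclass from a desired display order.

    subclasses: current items, each with "id" and "name".
    ordered_ids: every subclass id, in the desired order.
    Assumes lower weight sorts first.
    """
    by_id = {s["id"]: s for s in subclasses}
    if sorted(ordered_ids) != sorted(by_id):
        raise ValueError("ordered_ids must list every subclass exactly once")
    return [{"id": sid, "name": by_id[sid]["name"], "weight": pos}
            for pos, sid in enumerate(ordered_ids)]


current = [{"id": 1, "name": "a", "weight": 0}, {"id": 2, "name": "b", "weight": 1}]
print(reweight(current, [2, 1]))
# [{'id': 2, 'name': 'b', 'weight': 0}, {'id': 1, 'name': 'a', 'weight': 1}]
```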
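Delete a dashboard subclass
DELETE /api/mon/subclass/:id
Deletes the specified subclass.
Move dashboard subclasses to another dashboard
PUT /api/mon/subclasses/loc
Changes the location of each subclass, i.e. the screen it belongs to.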
[
{
"id": 1,
"screen_id": 1
},
{
"id": 2,
"screen_id": 1
}
]
List the charts of a subclass
GET /api/mon/subclass/:id/chart
Returns the chart list sorted by chart weight, without pagination.
Create a chart
POST /api/mon/subclass/:id/chart
Creates a chart.
{
"configs": "",
"weight": 0
}
Update a chart
PUT /api/mon/chart/:id
Updates the specified chart.
{
"subclass_id": 1,
"configs": ""
}
Delete a chart
DELETE /api/mon/chart/:id
Deletes the specified chart.
Reorder charts
PUT /api/mon/charts/weights
Updates the sort weight of a chart.
{
"id": 1,
"weight": 9
}
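Alert strategies
Field descriptions
- name: strategy name
- category: strategy type; 1 = device-related, 2 = device-independent
- nid: ID of the object tree node the strategy is bound to
- excl_nid: IDs of child nodes under the bound node to exclude from the strategy
- tags: tags of the monitored metric
- priority: alert level; 1, 2 or 3
- alert_dur: alert evaluation window, in seconds
- enable_stime: time at which the strategy starts taking effect
- enable_etime: time at which the strategy stops taking effect
- enable_days_of_week: days of the week on which the strategy is in effect
- exprs: alert expressions, each with:
  - eopt: comparison operator, one of [=, !=, >, >=, <, <=]
  - func: alert function; all, happen, max, min, avg, sum, diff, pdiff and nodata are supported
  - metric: monitored metric
  - params: parameters required by the alert function
  - threshold: threshold used by the alert function
- recovery_dur: how many seconds the recovered state must last before a recovery event is generated; 0 generates it immediately
- recovery_notify: 0 = send a recovery notification, 1 = do not send one
- converge: alert notification convergence; the first value is the convergence period in seconds, the second is the number of alerts allowed to be sent within that period
- notify_group: groups that receive alert notifications
- notify_user: users who receive alert notifications
- work_groups: work groups in the ticketing system
- runbook: URL of the runbook for handling the incident
- callback: callback URL invoked after an alert is triggered
- need_upgrade: whether alert escalation is configured; 0 = no, 1 = yes
- alert_upgrade: alert escalation settings; the sample request contains duration, level, users and groups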
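Some of these constraints can be checked before creating a strategy. A sketch with a hypothetical `strategy_problems` helper, based only on the field list above and intended for the create body (the update sample omits category). The sample responses use `func: "abs"`, which the list does not mention, so the function set may be incomplete and a flagged function is not necessarily rejected by the server.

```python
EOPTS = {"=", "!=", ">", ">=", "<", "<="}
FUNCS = {"all", "happen", "max", "min", "avg", "sum", "diff", "pdiff", "nodata"}


def strategy_problems(stra):
    """Client-side checks of a strategy body against the documented field descriptions."""
    problems = []
    if stra.get("category") not in (1, 2):
        problems.append("category must be 1 (device-related) or 2 (device-independent)")
    if stra.get("priority") not in (1, 2, 3):
        problems.append("priority must be 1, 2 or 3")
    for i, expr in enumerate(stra.get("exprs", [])):
        if expr.get("eopt") not in EOPTS:
            problems.append(f"exprs[{i}].eopt {expr.get('eopt')!r} is not a documented operator")
        if expr.get("func") not in FUNCS:
            problems.append(f"exprs[{i}].func {expr.get('func')!r} is not in the documented list")
    if len(stra.get("converge", [])) != 2:
        problems.append("converge must be [period_seconds, max_alerts]")
    if stra.get("need_upgrade") == 1 and not stra.get("alert_upgrade"):
        problems.append("need_upgrade is 1 but alert_upgrade is missing")
    return problems


stra = {"category": 1, "name": "demo", "nid": 2, "priority": 3, "alert_dur": 180,
        "exprs": [{"metric": "net.in.bits", "func": "all", "eopt": ">", "threshold": 0, "params": []}],
        "converge": [3600, 1], "need_upgrade": 1,
        "alert_upgrade": {"duration": 600, "level": 1, "users": [1], "groups": []}}
print(strategy_problems(stra))  # []
```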
Create an alert strategy
POST /api/mon/stra
Sample request
{
"category": 1,
"name": "策略名称",
"nid": 2,
"excl_nid": [
15
],
"priority": 3,
"alert_dur": 180,
"exprs": [
{
"metric": "net.in.bits",
"func": "all",
"eopt": ">",
"threshold": 0,
"params": []
}
],
"tags": [
{
"topt": "=",
"tkey": "iface",
"tval": [
"eth0"
]
}
],
"recovery_dur": 0,
"recovery_notify": 0,
"alert_upgrade": {
"duration": 600,
"level": 1,
"users": [
1
],
"groups": []
},
"converge": [
3600,
1
],
"notify_group": [],
"notify_user": [
6
],
"callback": "n9e.org/callback",
"enable_stime": "00:00",
"enable_etime": "23:59",
"enable_days_of_week": [
0,
1,
2,
3,
4,
5,
6
],
"need_upgrade": 1
}
Update an alert strategy
PUT /api/mon/stra
Sample request
{
"id":1,
"name": "all必触发",
"nid": 21,
"excl_nid": null,
"priority": 3,
"alert_dur": 60,
"exprs": [
{
"eopt": "!=",
"func": "all",
"metric": "cpu.idle",
"params": [],
"threshold": 0
}
],
"tags": [],
"recovery_dur": 0,
"recovery_notify": 1,
"alert_upgrade": {
"duration": 60,
"level": 1,
"users": [],
"groups": []
},
"converge": [3600,1],
"notify_group": [],
"notify_user": [5],
"callback": "",
"enable_stime": "00:00",
"enable_etime": "23:59",
"enable_days_of_week": [0,1,2,3,4,5,6],
"need_upgrade": 0
}
Sample response
{
"err":"",
"dat":"ok"
}
Delete alert strategies
DELETE /api/mon/stra
Sample request
{
"ids":[4]
}
Sample response
{
"err":"",
"dat":"ok"
}
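Deletion sends a JSON body with a DELETE request. HTTP does not define semantics for a DELETE body, so some clients and proxies may drop it; Python's `urllib` does send it. A sketch with a hypothetical base URL and helper name:

```python
import json
import urllib.request


def delete_strategies_request(ids, token, base_url="http://n9e.example.com"):
    """Build (without sending) DELETE /api/mon/stra with a {"ids": [...]} body."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    req = urllib.request.Request(base_url + "/api/mon/stra", data=body, method="DELETE")
    req.add_header("Content-Type", "application/json")
    req.add_header("x-user-token", token)
    return req


req = delete_strategies_request([4], token="YOUR_TOKEN")
```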
List strategies
GET /api/mon/stra?nid=1
nid: service tree node ID; optional. If omitted, all strategies are returned.
Sample response
{
"dat": [
{
"id": 1,
"name": "io.util大于90%",
"category": 1,
"nid": 100,
"alert_dur": 600,
"recovery_dur": 120,
"enable_stime": "00:00",
"enable_etime": "23:59",
"priority": 3,
"callback": "",
"creator": "root",
"created": "2019-03-06T16:47:16+08:00",
"last_updator": "root",
"last_updated": "2019-03-06T16:47:16+08:00",
"excl_nid": [99],
"exprs": [
{
"eopt": ">",
"func": "abs",
"metric": "qps",
"params": [3],
"threshold": 10
}
],
"tags": [
{
"tkey": "host",
"topt": "=",
"tval": ["nightingale.host1"]
}
],
"enable_days_of_week": [0,1,2,3,4,5,6],
"converge": [60,3],
"recovery_notify": 1,
"notify_group": [1,3],
"notify_user": [1,3],
"leaf_nids": null,
"need_upgrade":1,
"alert_upgrade":{
"users":[1,3],
"groups":[1,3],
"duration":1000,
"level":1
}
}
],
"err": ""
}
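The enable_stime, enable_etime and enable_days_of_week fields define when a strategy is in effect. A sketch of that check under assumptions the document does not spell out: days are numbered 0 = Sunday through 6 = Saturday (as in Go's `time.Weekday`), times are local `HH:MM`, and a window whose end is earlier than its start crosses midnight. `is_active` is a hypothetical helper.

```python
from datetime import datetime


def is_active(stra, when):
    """Return True if `when` (a datetime) falls inside the strategy's enable window.

    Assumptions (not stated in the document): enable_days_of_week uses
    0 = Sunday ... 6 = Saturday, enable_stime/enable_etime are local "HH:MM",
    and an end earlier than the start means the window crosses midnight.
    """
    day = when.isoweekday() % 7  # isoweekday: Monday=1 ... Sunday=7, so Sunday becomes 0
    if day not in stra["enable_days_of_week"]:
        return False
    now = when.strftime("%H:%M")  # zero-padded, so string comparison orders correctly
    start, end = stra["enable_stime"], stra["enable_etime"]
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


always = {"enable_days_of_week": [0, 1, 2, 3, 4, 5, 6],
          "enable_stime": "00:00", "enable_etime": "23:59"}
print(is_active(always, datetime(2019, 3, 6, 16, 47)))  # True
```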
Get a single strategy
GET /api/mon/stra/:sid
Sample response
{
"dat":
{
"id": 1,
"name": "io.util大于90%",
"category": 1,
"nid": 100,
"alert_dur": 600,
"recovery_dur": 120,
"enable_stime": "00:00",
"enable_etime": "23:59",
"priority": 3,
"callback": "",
"creator": "root",
"created": "2019-03-06T16:47:16+08:00",
"last_updator": "root",
"last_updated": "2019-03-06T16:47:16+08:00",
"excl_nid": [99],
"exprs": [
{
"eopt": ">",
"func": "abs",
"metric": "qps",
"params": [3],
"threshold": 10
}
],
"tags": [
{
"tkey": "host",
"topt": "=",
"tval": ["nightingale.host1"]
}
],
"enable_days_of_week": [0,1,2,3,4,5,6],
"converge": [60,3],
"recovery_notify": 1,
"notify_group": [1,3],
"notify_user": [1,3],
"leaf_nids": null,
"need_upgrade":1,
"alert_upgrade":{
"users":[1,3],
"groups":[1,3],
"duration":1000,
"level":1
}
},
"err": ""
}
List all effective strategies
GET /api/mon/stras/effective?all=1
Sample response
{
"dat": [
{
"id": 1,
"name": "io.util大于90%",
"category": 1,
"nid": 100,
"alert_dur": 600,
"recovery_dur": 120,
"enable_stime": "00:00",
"enable_etime": "23:59",
"priority": 3,
"callback": "",
"creator": "root",
"created": "2019-03-06T16:47:16+08:00",
"last_updator": "root",
"last_updated": "2019-03-06T16:47:16+08:00",
"excl_nid": [99],
"exprs": [
{
"eopt": ">",
"func": "abs",
"metric": "qps",
"params": [3],
"threshold": 10
}
],
"tags": [
{
"tkey": "host",
"topt": "=",
"tval": ["nightingale.host1"]
}
],
"enable_days_of_week": [0,1,2,3,4,5,6],
"converge": [60,3],
"recovery_notify": 1,
"notify_group": [1,3],
"notify_user": [1,3],
"leaf_nids": null,
"need_upgrade":1,
"alert_upgrade":{
"users":[1,3],
"groups":[1,3],
"duration":1000,
"level":1
}
}
],
"err": ""
}
Alert history
All alert history
GET /api/mon/event/his?nodepath=a.b.c&limit=100
nodepath is the path of the node; limit is the maximum number of events returned (default 20).
Sample response
{
"dat":{
"list":[
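{
"id":1460, // alert event ID
"sid":31, // alert strategy ID
"sname":"ngx_log_001", // alert strategy name
"node_path":"inner.a.d", // path of the node the strategy belongs to
"nid":5, // ID of the node the strategy belongs to
"endpoint":"10.178.27.152", // monitored object that raised the alert
"priority":1, // alert level: 1, 2 or 3
"event_type":"alert", // event type: alert|recovery
"hashid":640170854899939892, // hash ID of the alert event
"etime":1632373770, // time the alert was triggered
"value":"log.nyy_01: 28", // metric value at the time of the alert
"info":" log.nyy_01 (happen,10s) [1] \u003e 0", // strategy expression that triggered the alert
"tags":"", // tags of the alerting metric
"detail":[ // details of the alerting metric
{
"metric":"log.nyy_01", // alerting metric
"tags":null, // tags of the alerting metric
"points":[ // data points when the alert was triggered
{
"timestamp":1632373770,
"value":0
},
{
"timestamp":1632373760,
"value":28
}
]
}
],
"status":[ // notification results
"无接收人" // "no recipient"
]
}
],
"total":1 // number of alert events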
},
"err":""
}
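A sketch of building the history query and tallying events by type and priority, assuming the response shape above; `event_history_path` and `summarize_events` are hypothetical helpers.

```python
import json
from collections import Counter
from urllib.parse import urlencode


def event_history_path(nodepath, limit=20):
    """Path for GET /api/mon/event/his; 20 mirrors the documented server default."""
    return "/api/mon/event/his?" + urlencode({"nodepath": nodepath, "limit": limit})


def summarize_events(resp_text):
    """Count events per (event_type, priority) in an alert history response."""
    resp = json.loads(resp_text)
    if resp.get("err"):
        raise RuntimeError(resp["err"])
    return Counter((e["event_type"], e["priority"]) for e in resp["dat"]["list"])


print(event_history_path("a.b.c", limit=100))  # /api/mon/event/his?nodepath=a.b.c&limit=100
```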
Nodes
List nodes
GET /api/rdb/nodes
{
"dat": [
{
"id": 9,
"pid": 3,
"ident": "mon", // unique identifier of the node
"name": "监控", // display name of the node
"note": "监控服务",
"path": "inner.mon", // full path of the node
"leaf": 0,
"cate": "project",
"icon_color": "#de83cb",
"icon_char": "P",
"proxy": 0,
"creator": "root",
"last_updated": "2020-08-31T20:18:17+08:00",
"admins": null
}
],
"err": ""
}
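Other endpoints take either a node ID (nid) or a node path (nodepath). A sketch that indexes the node list by path and by parent ID so one can be looked up from the other; `index_nodes` is a hypothetical helper and the sample is a trimmed copy of the response above.

```python
import json


def index_nodes(resp_text):
    """Return ({path: node}, {parent_id: [child ids]}) from a /api/rdb/nodes response."""
    resp = json.loads(resp_text)
    if resp.get("err"):
        raise RuntimeError(resp["err"])
    by_path = {node["path"]: node for node in resp["dat"]}
    children = {}
    for node in resp["dat"]:
        children.setdefault(node["pid"], []).append(node["id"])
    return by_path, children


SAMPLE = '{"dat": [{"id": 9, "pid": 3, "ident": "mon", "path": "inner.mon", "leaf": 0}], "err": ""}'
by_path, children = index_nodes(SAMPLE)
print(by_path["inner.mon"]["id"])  # 9
```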