cat health

cat health

原文链接 : https://www.elastic.co/guide/en/elasticsearch/reference/5.0/cat-health.html

译文链接 : http://www.apache.wiki/display/Elasticsearch/cat+health

health 是一个简洁的，一行表示了来自 /_cluster/health 的相同的信息。

curl -XGET 'localhost:9200/_cat/health?v&pretty'

响应如下 :

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475871424 16:17:04  elasticsearch green           1         1      5   5    0    0        0             0                  -                100.0%

它有一个 ts 选项，以禁用 timestamping（时间戳）:

curl -XGET 'localhost:9200/_cat/health?v&ts=0&pretty'

响应如下 :

cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
elasticsearch green           1         1      5   5    0    0        0             0                  -                100.0%

此命令的一个常见用途是验证所有节点的健康是不是一致的 :

% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health
[1] 20:20:52 [SUCCESS] es3.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[2] 20:20:52 [SUCCESS] es1.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[3] 20:20:52 [SUCCESS] es2.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0

一个不太明显的用途是来跟踪一段时间内大型集群的恢复情况。有许多分片，启动集群，或者在丢失一节点后恢复，可能需要一段时间（取决于网络 & 硬盘）。其中的一种方法是在一个延迟的循环中使用跟踪它的进程，命令如下 :

% while true; do curl localhost:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C

在这种情况下，我们可以告诉到家，集群恢复大概需要4分钟。如果恢复时间是4个小时的话，我们将会看到 **UNASSIGNED**（未分配的）分片数量减少。如果数量保持不变，集群可能出现问题了。

在集群故障时您通常会使用 health 命令。在这期间，它在整个日志文件，报警系统等相关联的活动是非常重要。

有两种输出。一种是 HH:MM:SS 输出，对于人们来说是友好的。另一种是 epoch time（Epoch 是指定为1970年1月1日凌晨零点零分零秒，格林威治时间），这样保留了更多的信息，其中包括日期，如果恢复是跨天的话还可以排序。