1. Handling unassigned replica shards

# List shards in an abnormal state
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
# Reroute a single shard
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true' -d '{
     "commands": [{
            "allocate_replica": {
                "index": "index-name",
                "shard": shard-id,
                "node": "node-name"
           }
        }
    ]
 }'
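
The `shard-id` placeholder in the request above has to become a bare integer for the body to be valid JSON. A minimal sketch of a helper that builds the body and rejects a non-numeric shard id follows; the function name `allocate_replica_body` is hypothetical and not part of any Elasticsearch tooling.

```bash
# Hypothetical helper: print the reroute body for one replica shard.
# Arguments: index name, shard number, target node name.
allocate_replica_body() {
  local index="$1" shard="$2" node="$3"
  # The shard id is emitted unquoted, so it must be a plain integer.
  case "$shard" in
    ''|*[!0-9]*) echo "shard must be a non-negative integer: '$shard'" >&2; return 1 ;;
  esac
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}\n' \
    "$index" "$shard" "$node"
}

# Usage (not executed here; the Content-Type header is required from
# Elasticsearch 6.0 onwards and harmless on earlier versions):
#   curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true' \
#        -H 'Content-Type: application/json' \
#        -d "$(allocate_replica_body index-name 0 node-name)"
```

Building the body in a function keeps the quoting in one place, which is where hand-written reroute commands usually go wrong.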

Batch-processing script, for reference:


#!/bin/bash
# Handles unassigned shards: replica shards whose allocation failed
# (ALLOCATION_FAILED) because the maximum number of retries was exceeded.
#
#
# Node that should receive the replicas
NODE="10.116.106.2:9301"
IFS=$'\n'
for line in $(curl -s '10.116.106.1:9200/_cat/shards' | fgrep UNASSIGNED); do
  echo "$line"
  INDEX=$(echo "$line" | awk '{print $1}')
  SHARD=$(echo "$line" | awk '{print $2}')
  # The whole URL is quoted so the shell never treats "?" as a glob character.
  curl -XPOST '10.116.106.1:9200/_cluster/reroute?retry_failed' -d '{
     "commands": [
        {
            "allocate_replica": {
                "index": "'"$INDEX"'",
                "shard": '"$SHARD"',
                "node": "'"$NODE"'"
           }
        }
    ]
  }'
done
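
`allocate_replica` only applies to replica copies, while the loop above passes every UNASSIGNED row to it, primaries included. One possible refinement, assuming the columns are requested explicitly as `index,shard,prirep,state`, is to keep only the replica rows before rerouting. The function name `unassigned_replicas` is hypothetical.

```bash
# Hypothetical filter: reads `_cat/shards?h=index,shard,prirep,state` output on
# stdin and prints "index shard" for every unassigned replica ("r") row.
unassigned_replicas() {
  awk '$3 == "r" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Usage (not executed here):
#   curl -s '10.116.106.1:9200/_cat/shards?h=index,shard,prirep,state' \
#     | unassigned_replicas \
#     | while read -r INDEX SHARD; do echo "would reroute $INDEX/$SHARD"; done
```

Matching on the column values rather than grepping the whole line also avoids false hits when an index name happens to contain the string UNASSIGNED.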

2. reindex API

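Used to rebuild an index or to migrate data between clusters.
Reindexing across clusters requires an extra setting in elasticsearch.yml on the cluster that runs the reindex:
reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"
Code example: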

# "size" inside "source" is the batch size
curl -XPOST 'http://localhost:9200/_reindex?pretty' -d '
{
    "source": {
        "size": 10000,
        "index": "index-name"
     }, 
     "dest": {
         "index": "index-name-bak"
     }
}' &
# Reindex from a remote cluster
curl -XPOST 'http://localhost:9200/_reindex?pretty' -d '
{
    "source": {
        "size": 5000,
        "remote": {
            "host": "http://remote-cluster-ip:9200"
        }, 
        "index": "index-name"
     }, 
     "dest": {
         "index": "index-name-bak"
     }
}' &
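
Instead of backgrounding curl with `&`, the reindex API can be asked to return immediately with `wait_for_completion=false`; the response then carries a task id that can be polled through the tasks API. A minimal sketch of extracting that id with sed follows. The function name `reindex_task_id` is hypothetical, and the request body in the usage comment simply reuses the index names from the examples above.

```bash
# Hypothetical helper: read a `_reindex?wait_for_completion=false` response on
# stdin and print the value of its "task" field (format "nodeId:taskNumber").
reindex_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (not executed here):
#   TASK=$(curl -s -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false' \
#            -H 'Content-Type: application/json' \
#            -d '{"source":{"index":"index-name"},"dest":{"index":"index-name-bak"}}' \
#          | reindex_task_id)
#   curl -s "http://localhost:9200/_tasks/$TASK?pretty"
```

Polling the task keeps the progress visible even after the shell that started the reindex has exited.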

Settings that improve reindex performance:

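Temporarily disable refresh (refresh_interval is an index-level setting, so it is changed through the index settings API)

curl -XPUT 'localhost:9200/index-name/_settings?pretty' -d '
{
"index" : {
  "refresh_interval" : "-1"
}
}'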

Disable index replicas

curl -XPUT 'localhost:9200/index-name/_settings?pretty' -d '
{
"index" : {
  "number_of_replicas" : 0
}
}'
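
Both settings need to be put back once the reindex finishes. The sketch below builds the index settings body for either phase; the function name `index_settings_body` is hypothetical, and the restore values `1s` and `1` are assumptions (they are the Elasticsearch defaults, but the index may have been created with other values).

```bash
# Hypothetical helper: print an index settings body.
# Arguments: refresh_interval (e.g. -1 or 1s), number_of_replicas (integer).
index_settings_body() {
  local refresh="$1" replicas="$2"
  printf '{"index":{"refresh_interval":"%s","number_of_replicas":%s}}\n' \
    "$refresh" "$replicas"
}

# Usage (not executed here):
#   # before the reindex: no refresh, no replicas
#   curl -XPUT 'localhost:9200/index-name/_settings' \
#        -H 'Content-Type: application/json' -d "$(index_settings_body -1 0)"
#   # after the reindex: assumed original values
#   curl -XPUT 'localhost:9200/index-name/_settings' \
#        -H 'Content-Type: application/json' -d "$(index_settings_body 1s 1)"
```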

3. ES node maintenance guide

Option 2 is the approach generally used.

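- Option 1
```bash
# 1. Change the configuration or carry out the maintenance work
# 2. Disable shard allocation
curl -XPUT "http://10.116.106.35:9200/_cluster/settings" -d '
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'
#    Optionally run a synced flush to speed up recovery
curl -XPOST "http://10.116.106.35:9200/_flush/synced"
# 3. Restart the node
# 4. Re-enable shard allocation
curl -XPUT "http://10.116.106.35:9200/_cluster/settings" -d '
{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
# 5. Once the cluster is confirmed to be back in the green state, repeat steps 2-4 for the next node
```

- Option 2
```bash
# 1. Apply a dynamic exclude setting to move data off the node under maintenance
curl -X PUT "http://localhost:9200/_cluster/settings" -d'
{ 
  "transient" : { 
    "cluster.routing.allocation.exclude._ip" : "node-ip" 
  } 
}'
# Note: exclude supports _ip and _name, and accepts wildcards
# 2. Wait for the data migration to finish
# 3. Restart the node once maintenance is complete
# 4. Rebalance the data by clearing the exclusion
curl -X PUT "http://localhost:9200/_cluster/settings" -d'
{ 
  "transient" : { 
    "cluster.routing.allocation.exclude._ip" : "" 
  } 
}'
```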
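
Step 2 of Option 2 (waiting for the data migration to finish) can be checked by counting the shards still located on the excluded IP. A minimal sketch follows, assuming the columns are requested as `index,shard,prirep,state,ip`; the function name `shards_on_ip` is hypothetical.

```bash
# Hypothetical helper: read `_cat/shards?h=index,shard,prirep,state,ip` output
# on stdin and print how many shard rows are located on the given IP.
shards_on_ip() {
  awk -v ip="$1" '$5 == ip { n++ } END { print n + 0 }'
}

# Usage (not executed here): poll until the node holds no shards.
#   while [ "$(curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,ip' \
#              | shards_on_ip node-ip)" -gt 0 ]; do
#     sleep 10
#   done
```

Unassigned rows have an empty ip column, so they never match and do not inflate the count.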

4. forcemerge

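forcemerge plays an important role in optimizing the cluster and keeping it stable over the long term.
Usage:

forcemerge
Parameters
max_num_segments        number of segments to merge down to; 1 performs a full merge
only_expunge_deletes    only merge segments that contain deleted documents
flush                   flush once the merge completes
Example:
curl -XPOST 'localhost:9200/index/_forcemerge?only_expunge_deletes=true'
Note: use with caution on hot indices (indices that are still being written to)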
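
As an illustration of the `max_num_segments` parameter, the sketch below builds the request URL for a full merge of an index that is no longer written to. The function name `forcemerge_url` and the index name `logs-2017.01` are hypothetical.

```bash
# Hypothetical helper: print the force merge URL for an index.
# Arguments: host:port, index name, target segment count (defaults to 1).
forcemerge_url() {
  local host="$1" index="$2" segments="${3:-1}"
  printf '%s/%s/_forcemerge?max_num_segments=%s\n' "$host" "$index" "$segments"
}

# Usage (not executed here):
#   curl -XPOST "$(forcemerge_url localhost:9200 logs-2017.01)"
```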

5. Closing and opening an index

curl -XPOST localhost:9200/index-name/_close
curl -XPOST localhost:9200/index-name/_open

6. Moving shards

Moving shards manually helps in situations such as unbalanced load across nodes.
Example:

curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '
{ 
  "commands":[{ 
    "move":{ 
      "index":"logsfim", 
      "shard":0, 
      "from_node":"page-node1", 
      "to_node":"page-node2" 
    } 
  }] 
}'