1. Handling unassigned replica shards
# List the shards that are unassigned, with the reason
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
# Reroute a single shard
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true' -d '{
"commands": [{
"allocate_replica": {
"index": "index-name",
"shard": shard-id,
"node": "node-name"
}
}
]
}'
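Before forcing a replica onto a node, it can help to ask the cluster why the shard is unassigned. The following is a minimal sketch, assuming the cluster exposes the `_cluster/allocation/explain` API (Elasticsearch 5.0 or later). `explain_body` is a hypothetical helper name, and the host, index, and shard values are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: build the request body for _cluster/allocation/explain
# for a replica ("primary": false) of the given index and shard number.
explain_body() {
  printf '{"index": "%s", "shard": %s, "primary": false}\n' "$1" "$2"
}

# Usage against a live cluster (placeholder host, index and shard):
#   curl -s -XGET 'localhost:9200/_cluster/allocation/explain?pretty' \
#     -H 'Content-Type: application/json' -d "$(explain_body index-name 0)"
```

Building the body in a function keeps the shell quoting in one place, which matters once the values come from variables.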
Batch-processing script, for reference:
#!/bin/bash
# Handles unassigned shards: replica shards whose allocation failed with
# ALLOCATION_FAILED, i.e. after exceeding the maximum number of retries
#
#
NODE="10.116.106.2:9301"
IFS=$'\n'
for line in $(curl -s '10.116.106.1:9200/_cat/shards' | fgrep UNASSIGNED); do
  echo "$line"
  INDEX=$(echo "$line" | awk '{print $1}')
  SHARD=$(echo "$line" | awk '{print $2}')
  # Keep the whole URL inside the quotes so the shell never globs on "?"
  curl -XPOST '10.116.106.1:9200/_cluster/reroute?retry_failed=true' -d '{
    "commands": [
      {
        "allocate_replica": {
          "index": "'"$INDEX"'",
          "shard": '"$SHARD"',
          "node": "'"$NODE"'"
        }
      }
    ]
  }'
done
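The loop above greps every UNASSIGNED row, so unassigned primaries are also handed to `allocate_replica`, which only applies to replicas. A hedged sketch of a stricter filter follows, assuming the default `_cat/shards` column order (index, shard, prirep, state). `unassigned_replicas` is a hypothetical name, not part of the original script.

```bash
#!/bin/bash
# Read _cat/shards output on stdin and print "index shard" for every row
# that is a replica (prirep == "r") in state UNASSIGNED.
# Assumes the default column order: index shard prirep state ...
unassigned_replicas() {
  awk '$3 == "r" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Usage against a live cluster (placeholder host):
#   curl -s 'localhost:9200/_cat/shards' | unassigned_replicas
```

Each output line can then be split into the index name and shard number that the reroute command needs.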
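2. reindex API
Used to rebuild an index or to migrate data between clusters.
Reindexing from a remote cluster requires an extra setting in elasticsearch.yml:
reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"
Example: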
# "size" under "source" is the batch size
curl -XPOST 'http://localhost:9200/_reindex?pretty' -d '
{
  "source": {
    "size": 10000,
    "index": "index-name"
  },
  "dest": {
    "index": "index-name-bak"
  }
}' &
# Reindex from a remote cluster
curl -XPOST 'http://localhost:9200/_reindex?pretty' -d '
{
"source": {
"size": 5000,
"remote": {
"host": "http://remote-cluster-ip:9200"
},
"index": "index-name"
},
"dest": {
"index": "index-name-bak"
}
}' &
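Running the request in the background with `&` gives no handle on its progress. An alternative sketch is shown here, assuming a version where `_reindex` accepts `wait_for_completion=false` and the `_tasks` API is available. `reindex_body` is a hypothetical helper, and the host and index names are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: build a _reindex request body.
#   $1 = source index, $2 = destination index, $3 = batch size
reindex_body() {
  printf '{"source": {"index": "%s", "size": %s}, "dest": {"index": "%s"}}\n' "$1" "$3" "$2"
}

# Usage against a live cluster (placeholder host and index names):
#   curl -s -XPOST 'localhost:9200/_reindex?wait_for_completion=false' \
#     -H 'Content-Type: application/json' \
#     -d "$(reindex_body index-name index-name-bak 5000)"
# The response carries a task id, which can then be polled:
#   curl -s 'localhost:9200/_tasks/<task-id>?pretty'
```

With this form the request returns immediately and the cluster itself tracks the job, so progress survives the shell session that started it.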
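Settings that improve reindex performance:
Temporarily disable refresh on the index being written to (refresh_interval is an index-level setting, so it is set on the index rather than through _cluster/settings)
curl -XPUT 'localhost:9200/index-name/_settings?pretty' -d '
{
  "index" : {
    "refresh_interval" : "-1"
  }
}'
Disable index replicas
curl -XPUT 'localhost:9200/index-name/_settings?pretty' -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'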
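Both settings need to be put back once the reindex has finished. A small sketch that builds the settings body for either direction follows. `settings_body` is a hypothetical helper, and the restore values (`1s`, one replica) are assumptions standing in for whatever the index used before.

```bash
#!/bin/bash
# Hypothetical helper: build an index _settings body.
#   $1 = refresh_interval (e.g. -1 to disable, 1s to restore)
#   $2 = number_of_replicas
settings_body() {
  printf '{"index": {"refresh_interval": "%s", "number_of_replicas": %s}}\n' "$1" "$2"
}

# Usage against a live cluster (placeholder host and index name):
#   before the reindex:
#     curl -XPUT 'localhost:9200/index-name/_settings' \
#       -H 'Content-Type: application/json' -d "$(settings_body -1 0)"
#   after the reindex (assumed previous values):
#     curl -XPUT 'localhost:9200/index-name/_settings' \
#       -H 'Content-Type: application/json' -d "$(settings_body 1s 1)"
```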
3. ES node maintenance guide
Option 2 is the one normally used.
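- Option 1
```bash
# 1. Change the configuration or carry out the maintenance
# 2. Disable shard allocation so the cluster does not rebalance
curl -XPUT "http://10.116.106.35:9200/_cluster/settings" -d'
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'
# Optionally run a synced flush to speed up recovery
curl -XPOST "http://10.116.106.35:9200/_flush/synced"
# 3. Restart the node
# 4. Re-enable shard allocation
curl -XPUT "http://10.116.106.35:9200/_cluster/settings" -d'
{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
# 5. Once the cluster is back to green, repeat steps 2-4
```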
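- Option 2
```bash
# 1. Set a dynamic exclude rule to move data off the node to be maintained
curl -X PUT "http://localhost:9200/_cluster/settings" -d'
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "node-ip"
  }
}'
# Note: exclude supports _ip and _name, and accepts wildcards
# 2. Wait for the data migration to finish
# 3. Restart the node once the maintenance is complete
# 4. Rebalance the data by clearing the exclude rule
curl -X PUT "http://localhost:9200/_cluster/settings" -d'
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : ""
  }
}'
```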
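4. forcemerge
forcemerge plays an important role in optimizing the cluster and keeping it stable over the long term.
Usage:
POST index-name/_forcemerge
Parameters:
max_num_segments: number of segments to merge down to; 1 means a full merge
only_expunge_deletes: only merge segments that contain deleted documents
flush: flush once the merge has finished
Example:
curl -XPOST 'localhost:9200/index/_forcemerge?only_expunge_deletes=true'
Note: use with caution on hot indices.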
5. Closing and opening an index
POST index-name/_close
POST index-name/_open
6. Moving shards
Moving shards by hand helps in situations such as unbalanced load across nodes.
Example:
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '
{
"commands":[{
"move":{
"index":"logsfim",
"shard":0,
"from_node":"page-node1",
"to_node":"page-node2"
}
}]
}'
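A move command only starts the relocation, and the copy itself takes time. One way to check whether it is still running is to count the RELOCATING rows in `_cat/shards`. This is a sketch assuming the default column order, where the state is the fourth column. `count_relocating` is a hypothetical name.

```bash
#!/bin/bash
# Read _cat/shards output on stdin and print how many shards are
# currently in state RELOCATING (0 when there are none).
# Assumes the default column order: index shard prirep state ...
count_relocating() {
  awk '$4 == "RELOCATING" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (placeholder host):
#   curl -s 'localhost:9200/_cat/shards' | count_relocating
```

A result of 0 suggests the move has finished. The same check applies while waiting for an excluded node to drain during maintenance.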