1. Handling unassigned replica shards

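```bash
# List unassigned shards and the reason each one is unassigned
curl -s -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
# Reroute a single replica shard to a given node
# (replace index-name, shard-id and node-name; shard-id is an integer)
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true' \
  -H 'Content-Type: application/json' -d '{
  "commands": [
    {
      "allocate_replica": {
        "index": "index-name",
        "shard": shard-id,
        "node": "node-name"
      }
    }
  ]
}'
```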
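Before you force an allocation with the reroute command above, it helps to ask the cluster why the shard is unassigned. The sketch below uses the `_cluster/allocation/explain` API. It is not part of the original notes. `explain_body`, `explain_unassigned` and the `ES_HOST` variable (default `localhost:9200`) are hypothetical names chosen for illustration.

```bash
#!/bin/bash
# Hypothetical helpers (not from the original notes) for diagnosing an
# unassigned replica with the cluster allocation explain API.
ES_HOST="${ES_HOST:-localhost:9200}"   # assumed cluster address

# Print the explain request body for a replica shard: explain_body INDEX SHARD
explain_body() {
  local index="$1" shard="$2"
  printf '{"index": "%s", "shard": %d, "primary": false}\n' "$index" "$shard"
}

# Ask the cluster why that replica is unassigned: explain_unassigned INDEX SHARD
explain_unassigned() {
  curl -s -XGET "$ES_HOST/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2")"
}
```

For example, run `explain_unassigned index-name 0`. The per-node decisions in the response usually name the allocation decider that blocked the shard. That tells you whether a reroute with `retry_failed=true` is likely to help.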

Batch script (reference code):

```bash
#!/bin/bash
# Reallocate unassigned replica shards whose allocation failed
# (reason ALLOCATION_FAILED, e.g. after exceeding the allocation retry limit).
# Only replicas are selected: allocate_replica cannot be used for primaries.
ES="10.116.106.1:9200"
NODE="10.116.106.2:9301"   # target node for the replicas
curl -s "$ES/_cat/shards?h=index,shard,prirep,state" \
  | awk '$3 == "r" && $4 == "UNASSIGNED" {print $1, $2}' \
  | while read -r INDEX SHARD; do
      echo "reallocating $INDEX shard $SHARD"
      curl -s -XPOST "$ES/_cluster/reroute?retry_failed=true" \
        -H 'Content-Type: application/json' -d '{
        "commands": [
          {
            "allocate_replica": {
              "index": "'"$INDEX"'",
              "shard": '"$SHARD"',
              "node": "'"$NODE"'"
            }
          }
        ]
      }'
    done
```
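Before you run the batch script against a live cluster, you can preview the reroute commands it would send. This dry-run sketch is an assumption, not the original author's tool. The hypothetical `plan_replica_reroutes` function reads `_cat/shards?h=index,shard,prirep,state` output on stdin. For every unassigned replica it prints the `allocate_replica` body, and it sends nothing to the cluster.

```bash
#!/bin/bash
# Hypothetical dry-run helper: read `_cat/shards?h=index,shard,prirep,state`
# output on stdin and print one allocate_replica reroute body per unassigned
# replica. Primaries and assigned shards are skipped; nothing is sent.
# plan_replica_reroutes NODE
plan_replica_reroutes() {
  local node="$1"
  awk '$3 == "r" && $4 == "UNASSIGNED" {print $1, $2}' |
    while read -r index shard; do
      printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
        "$index" "$shard" "$node"
    done
}
```

Usage: `curl -s "$ES/_cat/shards?h=index,shard,prirep,state" | plan_replica_reroutes "$NODE"`. Check the printed bodies, then run the real loop.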

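2. Reindex API

Use the reindex API to rebuild an index or to migrate data between clusters.
Reindexing from a remote cluster also requires the remote host to be whitelisted in elasticsearch.yml on the cluster that runs the reindex:
`reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"`
Examples: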

```bash
# Reindex within the same cluster; source.size is the batch size per scroll request.
# The trailing & runs the request in the background.
curl -XPOST 'http://localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "size": 10000,
    "index": "index-name"
  },
  "dest": {
    "index": "index-name-bak"
  }
}' &
# Reindex from a remote cluster (the remote host must be in reindex.remote.whitelist)
curl -XPOST 'http://localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "size": 5000,
    "remote": {
      "host": "http://remote-cluster-ip:9200"
    },
    "index": "index-name"
  },
  "dest": {
    "index": "index-name-bak"
  }
}' &
```
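The two requests above differ only in whether `source.remote` is present. The sketch below builds either body from a single function. `reindex_body` is a hypothetical helper, and its argument order is an assumption made for illustration.

```bash
#!/bin/bash
# Hypothetical helper: build a _reindex request body.
# reindex_body SRC_INDEX DEST_INDEX BATCH_SIZE [REMOTE_HOST]
# With REMOTE_HOST the source is read from that cluster, which must be listed
# in reindex.remote.whitelist on the cluster running the reindex.
reindex_body() {
  local src="$1" dest="$2" size="$3" remote="${4:-}"
  local remote_part=""
  if [ -n "$remote" ]; then
    remote_part=$(printf '"remote": {"host": "%s"}, ' "$remote")
  fi
  printf '{"source": {"size": %d, %s"index": "%s"}, "dest": {"index": "%s"}}\n' \
    "$size" "$remote_part" "$src" "$dest"
}
```

Instead of backgrounding curl with `&`, you can send the body with `?wait_for_completion=false`. The response then contains a task ID, and you can follow progress with `GET _tasks?detailed=true&actions=*reindex`. Example: `curl -XPOST 'localhost:9200/_reindex?wait_for_completion=false' -H 'Content-Type: application/json' -d "$(reindex_body index-name index-name-bak 10000)"`.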

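Settings that speed up a reindex. Apply them to the destination index before the reindex and restore them once it finishes:

  • Temporarily disable refresh (refresh_interval is an index-level setting, not a cluster setting)

```bash
curl -XPUT 'http://localhost:9200/index-name-bak/_settings' -H 'Content-Type: application/json' -d '
{
  "index": {
    "refresh_interval": "-1"
  }
}'
```

  • Disable replicas on the index

```bash
curl -XPUT 'localhost:9200/index-name-bak/_settings?pretty' -H 'Content-Type: application/json' -d '
{
  "index": {
    "number_of_replicas": 0
  }
}'
```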
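Both settings have to be reverted after the reindex, or the index stays unreplicated and unrefreshed. The sketch below switches them in one request. `index_settings_body` and `apply_index_settings` are hypothetical names. The "normal" values in the usage line are assumptions: `1s` is the Elasticsearch default refresh interval, and the replica count should be whatever the index originally had.

```bash
#!/bin/bash
# Hypothetical helpers: set refresh_interval and number_of_replicas together,
# e.g. into bulk-load mode before a reindex and back to normal afterwards.
# index_settings_body REFRESH_INTERVAL REPLICAS
index_settings_body() {
  printf '{"index": {"refresh_interval": "%s", "number_of_replicas": %d}}\n' "$1" "$2"
}

# apply_index_settings HOST INDEX REFRESH_INTERVAL REPLICAS
apply_index_settings() {
  curl -s -XPUT "$1/$2/_settings" -H 'Content-Type: application/json' \
    -d "$(index_settings_body "$3" "$4")"
}
```

Usage: run `apply_index_settings localhost:9200 index-name-bak -1 0` before the reindex. Afterwards run `apply_index_settings localhost:9200 index-name-bak 1s 1`, where `1` is an assumed replica count.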

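3. ES node maintenance guide

Option 2 is generally the preferred approach.

  • Option 1: disable allocation, restart the node, re-enable allocation

```bash
# 1. Change the configuration or perform the maintenance.
# 2. Disable shard allocation
curl -XPUT 'http://10.116.106.35:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'
# Optional: run a synced flush to speed up recovery
# (synced flush is deprecated since 7.6 and removed in 8.0)
curl -XPOST 'http://10.116.106.35:9200/_flush/synced'
# 3. Restart the node.
# 4. Re-enable shard allocation
curl -XPUT 'http://10.116.106.35:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
# 5. After confirming the cluster is back to green, repeat steps 2-4 for the next node.
```

  • Option 2: migrate data off the node with an allocation exclude

```bash
# 1. Dynamically exclude the node so its shards migrate to other nodes
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "node-ip"
  }
}'
# Note: exclude supports _ip and _name, and accepts wildcards.
# 2. Wait for the data migration to finish.
# 3. Once maintenance is complete, restart the node.
# 4. Clear the exclusion so data is rebalanced back onto the node
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": ""
  }
}'
```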
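Step 2 of option 2 ("wait for the data migration to finish") can be automated by polling until no shard is left on the excluded IP. This is a sketch under assumptions. `shards_on_ip`, `wait_for_drain` and `ES_HOST` are hypothetical names. The check counts rows of `_cat/shards?h=ip` that match the node's IP, so shards still relocating away from the node are counted too.

```bash
#!/bin/bash
# Hypothetical helpers for option 2: count shards still on an IP and poll
# until the excluded node holds none.
ES_HOST="${ES_HOST:-localhost:9200}"   # assumed cluster address

# Read `_cat/shards?h=ip` output on stdin; print how many rows match IP.
shards_on_ip() {
  awk -v ip="$1" '$1 == ip {n++} END {print n + 0}'
}

# wait_for_drain IP [INTERVAL_SECONDS]: block until no shard is on IP.
wait_for_drain() {
  local ip="$1" interval="${2:-30}" out count
  while true; do
    # -f makes curl fail on HTTP errors so a failed request is not read as "0 shards"
    if ! out=$(curl -sf "$ES_HOST/_cat/shards?h=ip"); then
      echo "request failed, retrying" >&2
      sleep "$interval"
      continue
    fi
    count=$(printf '%s\n' "$out" | shards_on_ip "$ip")
    echo "$(date '+%F %T') shards remaining on $ip: $count"
    [ "$count" -eq 0 ] && break
    sleep "$interval"
  done
}
```

Usage: after applying the exclude, run `wait_for_drain node-ip`. When it returns, it is reasonably safe to take the node down.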

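4. forcemerge

Force merge reduces the number of Lucene segments in each shard and purges deleted documents. This speeds up searches, reduces per-segment overhead and helps keep the cluster stable over the long term.
Usage: POST /<index>/_forcemerge
Parameters:
  • max_num_segments: number of segments to merge down to; 1 means a full merge
  • only_expunge_deletes: only merge segments that contain deleted documents
  • flush: flush after the merge completes
Example:

```bash
curl -XPOST 'localhost:9200/index/_forcemerge?only_expunge_deletes=true'
```

Note: use with caution on hot indices (indices that are still being written to).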
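The two merge modes above map to different query parameters. The hypothetical `forcemerge_url` helper below picks one and rejects anything else. The mode names (`deletes` or a segment count) are an assumption made for illustration.

```bash
#!/bin/bash
# Hypothetical helper: build a _forcemerge URL for one of the two modes above.
# forcemerge_url HOST INDEX MODE
#   MODE "deletes"     -> only_expunge_deletes=true
#   MODE N (integer)   -> max_num_segments=N (1 = full merge)
forcemerge_url() {
  local host="$1" index="$2" mode="$3"
  if [ "$mode" = "deletes" ]; then
    echo "$host/$index/_forcemerge?only_expunge_deletes=true"
  elif [[ "$mode" =~ ^[1-9][0-9]*$ ]]; then
    echo "$host/$index/_forcemerge?max_num_segments=$mode"
  else
    echo "unknown mode: $mode" >&2
    return 1
  fi
}
```

Usage: `curl -XPOST "$(forcemerge_url localhost:9200 index 1)"`. Afterwards, `curl 'localhost:9200/_cat/segments/index?v'` shows how many segments remain.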

5. Close and open an index

```bash
curl -XPOST 'localhost:9200/index-name/_close'
curl -XPOST 'localhost:9200/index-name/_open'
```
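When closing and opening indices from scripts, a small guard stops a typo from sending the request to an unintended endpoint. `index_state_path` and `set_index_state` are hypothetical helpers, not part of the original notes.

```bash
#!/bin/bash
# Hypothetical helper: map "open"/"close" to the matching endpoint and
# reject any other action.
# index_state_path INDEX ACTION
index_state_path() {
  case "$2" in
    open|close) echo "/$1/_$2" ;;
    *) echo "action must be open or close, got: $2" >&2; return 1 ;;
  esac
}

# set_index_state HOST INDEX ACTION
set_index_state() {
  local path
  path=$(index_state_path "$2" "$3") || return 1
  curl -s -XPOST "$1$path"
}
```

Usage: run `set_index_state localhost:9200 index-name close`. Afterwards, `curl -s 'localhost:9200/_cat/indices/index-name?h=status'` should report the index as closed.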

6. Move shards

Moving a shard manually helps in situations such as uneven load across nodes.
Example:

```bash
curl -XPOST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '
{
  "commands": [{
    "move": {
      "index": "logsfim",
      "shard": 0,
      "from_node": "page-node1",
      "to_node": "page-node2"
    }
  }]
}'
```
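The reroute API accepts a `dry_run` query parameter, which simulates the command and returns the resulting state without applying it. That makes it a safe first step before a real move. The sketch below is an assumption. `move_body` and `move_shard` are hypothetical helpers, and the dry run is on by default.

```bash
#!/bin/bash
# Hypothetical helper: build the reroute body for a single "move" command.
# move_body INDEX SHARD FROM_NODE TO_NODE
move_body() {
  printf '{"commands": [{"move": {"index": "%s", "shard": %d, "from_node": "%s", "to_node": "%s"}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# move_shard HOST INDEX SHARD FROM_NODE TO_NODE [DRY_RUN]
# DRY_RUN defaults to true (simulate only); pass false to apply the move.
move_shard() {
  local dry_run="${6:-true}"
  curl -s -XPOST "$1/_cluster/reroute?dry_run=$dry_run&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(move_body "$2" "$3" "$4" "$5")"
}
```

Usage: run `move_shard localhost:9200 logsfim 0 page-node1 page-node2` to simulate the move. If it succeeds, repeat the call with `false` as the sixth argument.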