Summary - Sharing Outline - Elasticsearch

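Translog study
Indexing speed optimization
Adding cache-related configuration
Other tuning parameters
Summary
Optimization layers
Options still to be tested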

Elasticsearch fundamentals
Elasticsearch queries
Elasticsearch aggregations
Elasticsearch text analysis and mappings
Elasticsearch nodes and shards
Elasticsearch monitoring and diagnostics

https://juejin.im/post/5ba06f96f265da0ac84929e9

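An open-source distributed search engine
Features:
Near-real-time indexing
Advanced analytics and aggregation queries
Distributed
Zero configuration
Automatic discovery
Automatic index sharding
Index replica mechanism
RESTful API
Multi-tenancy
Schema-free (not yet understood)
Automatic search load balancing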

Nodes and shards
There are four main node types:
master
ingest
data
coordinating

Translog study

Indexing speed optimization

index.refresh_interval: -1
index.number_of_shards: X
index.number_of_replicas: 0
index.translog.sync_interval: 30s
index.translog.durability: "async"
index.translog.flush_threshold_size: 4g
index.translog.flush_threshold_ops: 50000
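
As a rough sketch of how some of these settings might be applied around a bulk load, the Python below builds a request body for the index update-settings endpoint (typically PUT /<index>/_settings) and a second body to restore normal values afterwards. Only settings that are normally dynamic are included. The restore values (a 1s refresh, one replica, "request" durability) are assumptions for illustration, not values from these notes.

```python
import json

# Values from the notes above. number_of_shards is omitted because the shard
# count is fixed when the index is created.
BULK_LOAD_SETTINGS = {
    "index.refresh_interval": "-1",        # disable periodic refresh
    "index.number_of_replicas": 0,         # no replicas while loading
    "index.translog.durability": "async",  # fsync the translog in the background
}

# Assumed "normal" values to restore after the load (illustrative only).
RESTORE_SETTINGS = {
    "index.refresh_interval": "1s",
    "index.number_of_replicas": 1,
    "index.translog.durability": "request",
}


def settings_body(settings, overrides=None):
    """Return a JSON string suitable as the body of an update-settings request."""
    merged = dict(settings)
    merged.update(overrides or {})
    return json.dumps(merged, sort_keys=True)


if __name__ == "__main__":
    print("before load:", settings_body(BULK_LOAD_SETTINGS))
    print("after load: ", settings_body(RESTORE_SETTINGS))
```

Keeping the two bodies side by side makes it harder to forget to re-enable refresh and replicas once the load finishes.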

Adding cache-related configuration

indices.cache.filter.size: 30%
indices.fielddata.cache.size: 60%
index.cache.field.type: soft
indices.breaker.fielddata.limit: 70%
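
One relationship worth checking in this configuration is that the fielddata cache size stays below the fielddata circuit-breaker limit; otherwise the breaker would trip before the cache ever evicts anything. The small Python check below is only a sketch: the helper names are hypothetical and it handles percentage values only.

```python
def percent(value):
    """Parse a percentage string such as '60%' into a float (60.0)."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError("expected a percentage, got %r" % value)
    return float(text[:-1])


def fielddata_cache_fits_breaker(settings):
    """True if the fielddata cache size is strictly below the breaker limit."""
    cache = percent(settings["indices.fielddata.cache.size"])
    limit = percent(settings["indices.breaker.fielddata.limit"])
    return cache < limit


if __name__ == "__main__":
    notes_config = {
        "indices.fielddata.cache.size": "60%",
        "indices.breaker.fielddata.limit": "70%",
    }
    print(fielddata_cache_fits_breaker(notes_config))
```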

Other tuning parameters

Global ordinals
Index warmer
Consider adjusting the aggregation collect_mode: breadth_first or depth_first
File cache preloading: index.store.preload: ["*"]
Configure index.sort
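
To make the collect_mode item concrete, here is a minimal Python sketch that builds a nested terms aggregation request body with an explicit collect_mode. The field names ("actors", "costars") and the function name are hypothetical. breadth_first tends to help when the outer terms field has many more distinct values than the requested size, since sub-aggregations are then only computed for the surviving top buckets.

```python
import json

VALID_COLLECT_MODES = ("breadth_first", "depth_first")


def nested_terms_agg(field, sub_field, collect_mode="breadth_first", size=10):
    """Build a search body with a terms aggregation nested in another one."""
    if collect_mode not in VALID_COLLECT_MODES:
        raise ValueError("unknown collect_mode: %r" % collect_mode)
    return {
        "size": 0,  # aggregation results only, no hits
        "aggs": {
            field: {
                "terms": {
                    "field": field,
                    "size": size,
                    "collect_mode": collect_mode,
                },
                "aggs": {
                    sub_field: {"terms": {"field": sub_field, "size": size}},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(nested_terms_agg("actors", "costars"), indent=2))
```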

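Summary

Use a filter instead of a query whenever possible
Add redundant fields so that some range aggregations become terms aggregations
For frequently used fields, set fielddata loading to eager so that as much as possible is loaded into memory
Add cache resources to the cluster and use as much of the memory as possible
Global ordinals
Index warmer
Adjust the aggregation collect_mode
Move to SSDs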
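
The first two points can be sketched together in Python. A redundant field is computed at index time so that a range aggregation can become a terms aggregation, and the query restriction is placed in filter context. The field names ("price", "price_bucket", "status") and the bucket boundaries are hypothetical, chosen only for illustration.

```python
def price_bucket(price):
    """Map a numeric price to a precomputed bucket label (assumed boundaries)."""
    if price < 0:
        raise ValueError("price must be non-negative")
    if price < 100:
        return "0-100"
    if price < 500:
        return "100-500"
    return "500+"


def enrich(doc):
    """Return a copy of the document with the redundant bucket field added."""
    enriched = dict(doc)
    enriched["price_bucket"] = price_bucket(doc["price"])
    return enriched


def bucket_search_body(status):
    """Filter-context query plus a terms aggregation on the redundant field."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"status": status}}]}},
        "aggs": {"by_price_bucket": {"terms": {"field": "price_bucket"}}},
    }


if __name__ == "__main__":
    print(enrich({"price": 250, "status": "active"}))
    print(bucket_search_body("active"))
```

The trade-off is extra storage and a fixed set of bucket boundaries: changing the boundaries later means reindexing the affected documents.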

Optimization layers

OS tuning

Raise the OS file descriptor limit as appropriate; 64k is recommended
Disable swap
Run ulimit -l unlimited to allow memory locking
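
A quick way to verify the file descriptor and memory-lock limits from inside a process is Python's standard resource module (Unix only). The sketch below is one possible check, not part of the original notes. The 65536 threshold is simply the "64k" recommendation above, and the function names are hypothetical.

```python
import resource

REQUIRED_NOFILE = 65536  # the "64k" recommendation from the notes


def limit_ok(soft_limit, required):
    """True if a soft limit is unlimited or at least the required value."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


def check_os_limits():
    """Report whether the current process meets the two limits in the notes."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return {
        "nofile_ok": limit_ok(nofile_soft, REQUIRED_NOFILE),
        "memlock_unlimited": memlock_soft == resource.RLIM_INFINITY,
    }


if __name__ == "__main__":
    print(check_os_limits())
```

The limits reported are those of the process running the script, so the check is only meaningful when run as the same user and in the same environment as the Elasticsearch process.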

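Indexing speed tuning

Removing the _all field can save half the space and speed up indexing
Do not configure a field as analyzed if it does not need to be tokenized
If 100% reliability of the indexed data is not required, consider running without replicas
Prefer SSDs
Set a reasonable refresh interval
Set a reasonable flush interval
Raise the index store throttle limit as appropriate
Increase the bulk queue size as appropriate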

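Stability tuning

Tune the ping parameters as appropriate
Adjust the JVM young generation size on all client and data nodes (-Xmn 3g) to reduce young GC frequency
fielddata (probably an issue that only exists in older versions)

Adjust the shard count: index.routing.allocation.total_shards_per_node: 2
Before removing a node, exclude it from shard allocation first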
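
Excluding a node before removing it is usually done through the cluster settings endpoint (PUT /_cluster/settings) with an allocation exclude filter. The Python sketch below only builds the request bodies. The IP address is a made-up example and the helper names are hypothetical. Setting the value to null afterwards is assumed to reset the filter, which holds for recent Elasticsearch versions.

```python
import json

EXCLUDE_KEY = "cluster.routing.allocation.exclude._ip"


def exclude_node_body(ip):
    """Body that asks the cluster to move shards off the node with this IP."""
    return {"transient": {EXCLUDE_KEY: ip}}


def clear_exclusion_body():
    """Body that resets the exclusion once the node has been removed."""
    return {"transient": {EXCLUDE_KEY: None}}


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder address, not taken from the notes.
    print(json.dumps(exclude_node_body("10.0.0.1")))
    print(json.dumps(clear_exclusion_body()))
```

After sending the first body, one would typically wait until the node holds no shards before shutting it down.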

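Set a reasonable refresh interval: index.refresh_interval: 300s
Set a reasonable flush interval: index.translog.flush_threshold_size: 4g, index.translog.flush_threshold_ops: 50000
Raise the index store throttle limit as appropriate: indices.store.throttle.max_bytes_per_sec: 60mb
Increase the bulk queue size as appropriate: threadpool.bulk.queue_size: 1000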

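Options still to be tested

Searches over large data sets time out, yet ES load keeps climbing
https://elasticsearch.cn/slides/169#page=17
A cost-effective option: adjust the index storage strategy
index.store.type

mmapfs (default), suited to small indices
niofs, suited to large indices and historical indices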
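
The rule of thumb above could be encoded roughly as follows. This Python sketch picks a store type and builds an index-creation body, on the understanding that index.store.type is a static setting supplied when the index is created. The 50 GB threshold and the function names are purely hypothetical; the notes do not say where "small" ends and "large" begins.

```python
VALID_STORE_TYPES = ("mmapfs", "niofs")


def choose_store_type(size_gb, historical, large_threshold_gb=50.0):
    """Pick niofs for large or historical indices, mmapfs otherwise."""
    if historical or size_gb >= large_threshold_gb:
        return "niofs"
    return "mmapfs"


def create_index_body(store_type):
    """Body for an index-creation request with the chosen store type."""
    if store_type not in VALID_STORE_TYPES:
        raise ValueError("unsupported store type: %r" % store_type)
    return {"settings": {"index.store.type": store_type}}


if __name__ == "__main__":
    store = choose_store_type(size_gb=200.0, historical=False)
    print(create_index_body(store))
```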

Elasticsearch cluster monitoring: elasticsearch_exporter + Prometheus + Grafana

Tools
dejavu