一、概念初识

1、索引的背景

数据库的痛点

数据库只适合存储海量数据,不适合搜索海量数据。数据库模糊匹配查询速度慢且容易造成全盘扫描,即一个字符一个字符比对,显然性能上无法满足;其次,即便全盘扫描,也有可能匹配不到,如用户输入错误。
数据库也有索引,索引的查询是需要有条件的,符合最左前缀原则,如果没有查询到,还是会触发全盘扫描。

索引的局限性

拓展:什么是最左前缀原则?
数据库中的索引会先排序最左边的第一个字段「name」,基于第一个字段「name」基础上对第二个字段「cid」进行排序。因此直接查找「cid」找不到最左边的「name」,也就无法命中的数据库索引,从而进行索引树的全盘扫描。

  1. CREATE TABLE `stu` (
  2. `id` int(11) NOT NULL AUTO_INCREMENT,
  3. `name` varchar(255) DEFAULT NULL,
  4. `cid` int(11) DEFAULT NULL,
  5. PRIMARY KEY (`id`),
  6. KEY `idx_name_cid` (`name`,`cid`)
  7. );

image.png

参考文档 :https://cloud.tencent.com/developer/article/1449114

sql执行过程

关注连接类型和扫描行
1、交车状态没有建立索引
explain select from table a where delivery_status=’23’;
image.png
2、交易单号值作为唯一索引:
explain select
from table a where order_no=’110487433035’
image.png
3、同样的交易单号,没有符合最左前缀原则:
explain select * from table a where order_no like ‘%110487433035’
image.png

官方解释

ref

  • All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.

ref can be used for indexed columns that are compared using the = or <=> operator. In the following examples, MySQL can use a ref join to process ref_table.

  • 只有符合最左前缀才走索引,能够用最少的行匹配数据,才是一个高效的连接类型

index

  • The index join type is the same as ALL, except that the index tree is scanned. This occurs two ways:

If the index is a covering index for the queries and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan usually is faster than ALL because the size of the index usually is smaller than the table data.
A full table scan is performed using reads from the index to look up data rows in index order. Uses index does not appear in the Extra column.

  • 未命中最左前缀,全盘扫描索引树

all

  • A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.
  • all连接类型则是全盘扫描,设计时尽量避免

ES应运而生解决全表扫描和内容匹配,区别在于搜索速度的快慢和搜索内容的好坏。

2、ES索引的优势

1.1 避免全盘扫描
基于新的数据结构排序,通过倒排索引算法,先查分词,符合分词,按照匹配度排序,即便用户输入错误也能够搜索到匹配度相关的内容
比方说数据写入es之前,苹果手机会先进行分词,苹果手机分词为苹果和手机。搜索时会匹配分词,根据匹配度做排序。

正排索引

  • 例如我们常常会通过一个 id ,来查到一条或者多条记录 | id | content | | —- | —- | | 1001 | i like apple fruit | | 1002 | apple phone | | 1003 | android phone |

倒排索引

  • 当我们往ES写入数据:i like apple fruit ,以分词作为关键字索引,然后记录它们所属的 id | keyword | id | | —- | —- | | apple | 1001,1002 | | fruit | 1001 | | i | 1001 | | like | 1001 | | phone | 1002,1003 | | android | 1003 |

1.2 匹配度的内容优化
按照匹配度排序,即便用户输入错误也能够搜索到匹配度相关的内容

3、ES索引结构

ES索引 映射mysql数据库 备注
mapping 表结构
index
type
document 行:row docID类似数据表的主键
field 列: colomn

二、简单实践

0、下载es服务

官网:https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0
解压包后进入bin目录运行
bin ./elasticsearch
image.png
默认端口9200
运行结果
image.png

1、检测ES索引是否安装成功

curl -X GET “localhost:9200/?pretty”
image.png

通过浏览器localhost:9200或者http://127.0.0.1:9200/查看是否运行成功
image.png

2、创建一个商品索引

  • 相当于MySQL中create table,表结构为id和title
  • 使用 PUT 方法创建一个 INDEX,INDEX 的名称是“sku”,直接写在请求的 URL 中。请求的 BODY 是一个 JSON 对象,内容就是我们上面定义的 MAPPING,也就是数据结构。
    1. curl -X PUT "localhost:9200/sku" -H 'Content-Type: application/json' -d '{
    2. "mappings": {
    3. "properties": {
    4. "sku_id": {
    5. "type": "long"
    6. },
    7. "title": {
    8. "type": "text"
    9. }
    10. }
    11. }
    12. }'
    image.png

3、查看索引

  1. curl -X GET "localhost:9200/sku"

image.png

4、删除一个索引

  1. curl -X DELETE "127.0.0.1:9200/sku?"

5、插入索引数据

  • 往 INDEX 中写入商品数据使用HTTP POST 方法
  • 不指定id系统默认分配docID
  • 表结构中插入一条苹果手机、苹果水果、香蕉
  1. curl -X POST "localhost:9200/sku/_doc/" -H 'Content-Type: application/json' -d '{
  2. "sku_id": 1001,
  3. "title": "apple iphone"
  4. }'
  1. curl -X POST "localhost:9200/sku/_doc/" -H 'Content-Type: application/json' -d '{
  2. "sku_id": 1002,
  3. "title": "apple fruit"
  4. }'
  5. {"_index":"sku","_type":"_doc","_id":"N0CHaYMBjmsXC9Zk--4S","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":3}
  1. curl -X POST "localhost:9200/sku/_doc/" -H 'Content-Type: application/json' -d '{
  2. "sku_id": 1003,
  3. "title": "apple"
  4. }'

image.png

6、查询索引数据

  • 搜索apple
  • 命中3条记录,并将匹配度排序,匹配度高的排在最前。
  • 请求路径加上pretty,返回的数据展示更好看
    1. curl -X GET "localhost:9200/sku/_search?pretty" -H 'Content-Type: application/json' -d '{
    2. "query": {"match":{"title":"apple"}}
    3. }'
    image.png
    image.png

三、业务系统接入

服务商如何快速搜索工单?
通过输入vin码后四位,匹配该服务商下所属vin码的所有工单

ES索引探索学习 - 图14 小结:业务系统接入方式较老,接入公司自研中间件不需要新开一个应用专门做同步的事情。

四、常见错误

未分配分片的原因

  1. curl -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"| grep UNASSIGNED

创建索引成功插入数据失败

报错:
curl -X POST “localhost:9200/sku/_doc/“ -H ‘Content-Type: application/json’ -d ‘{
“sku_id”: 1001,
“title”: “apple iphone”
}’
{“error”:{“root_cause”:[{“type”:”cluster_block_exception”,”reason”:”index [sku] blocked by: [TOO_MANY_REQUESTS/12/index read-only / allow delete (api)];”}],”type”:”cluster_block_exception”,”reason”:”index [sku] blocked by: [TOO_MANY_REQUESTS/12/index read-only / allow delete (api)];”},”status”:429}%

解决:
原因为磁盘空间低于5%
image.png

释放空间后重新启动:
[MacBook-Pro-9.local] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[sku][0]]]).
[2022-09-23T16:42:20,278][WARN ][o.e.c.r.a.DiskThresholdMonitor] [MacBook-Pro-9.local] high disk watermark [90%] exceeded on [nTPFtpMZT6-e_fovh_Pm5g][MacBook-Pro-9.local][/Users/aisuzhen/Downloads/elasticsearch-7.8.0/data/nodes/0] free: 5.9gb[5.2%], shards will be relocated away from this node; currently relocating away shards totalling [0] bytes; the node is expected to continue to exceed the high disk watermark when these relocations are complete

引申
测试索引关注点:
解决数据库和索引数据不一致的问题
同步失败的问题检测、告警
同步ES失败,落表,定时任务轮询检查、更新
随着日益增长的数据,是否磁盘已满,是否需要扩容