Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/mapping.html

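1. Introduction to Mapping

A mapping defines how a document and the fields it contains are stored and indexed.
For example, a mapping is used to define:

  • which string fields should be treated as full-text fields;
  • which fields contain numbers, dates, or geolocations;
  • whether the values of all fields in the document should be indexed into the catch-all _all field (this setting applies to older versions; the _all field is no longer available in 7.x);
  • the format of date values;
  • custom rules that control the mapping of dynamically added fields.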
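
As a rough illustration of the points in the list above, the following Python sketch builds a mapping body as a plain dictionary and serializes it to JSON. The field names (`title`, `views`, `published`, `location`), the template name `strings_as_keywords`, and the date format are made up for this example; only the parameter names (`type`, `format`, `dynamic_templates`, `match_mapping_type`) are Elasticsearch mapping parameters.

```python
import json

# Hypothetical mapping body; the field names are illustrative only.
mapping_body = {
    "mappings": {
        # Custom rule for dynamically added fields:
        # any new string field is mapped as keyword.
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {
            "title": {"type": "text"},          # full-text field
            "views": {"type": "long"},          # numeric field
            "published": {                      # date with an explicit format
                "type": "date",
                "format": "yyyy-MM-dd",
            },
            "location": {"type": "geo_point"},  # geolocation field
        },
    }
}


def to_request_body(body):
    """Serialize a mapping dictionary to the JSON text sent to Elasticsearch."""
    return json.dumps(body, indent=2)


print(to_request_body(mapping_body))
```

The printed JSON could be used as the body of a `PUT /<index>` request in the Kibana console.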

Viewing mapping information

  GET bank/_mapping
  {
    "bank" : {
      "mappings" : {
        "properties" : {
          "account_number" : {
            "type" : "long"
          },
          "address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "long"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "email" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "employer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
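
A response like the one above is easier to scan once it is flattened into a field-to-type table. The following Python sketch does that for a parsed `GET <index>/_mapping` response; the function name `flatten_mapping` is hypothetical, and the sample dictionary is a shortened copy of the bank mapping.

```python
def flatten_mapping(response, index):
    """Return {field_path: type} for one index of a parsed _mapping response."""
    result = {}

    def walk(properties, prefix):
        for name, spec in properties.items():
            path = prefix + name
            if "type" in spec:
                result[path] = spec["type"]
            # Multi-fields such as address.keyword are listed under "fields".
            for sub_name, sub_spec in spec.get("fields", {}).items():
                result[path + "." + sub_name] = sub_spec["type"]
            # Object fields nest their children under another "properties".
            if "properties" in spec:
                walk(spec["properties"], path + ".")

    walk(response[index]["mappings"].get("properties", {}), "")
    return result


# Shortened copy of the bank mapping, for illustration.
sample = {
    "bank": {
        "mappings": {
            "properties": {
                "account_number": {"type": "long"},
                "address": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
            }
        }
    }
}

for field, field_type in flatten_mapping(sample, "bank").items():
    print(field, field_type)
```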

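2. Removal of Types in Newer Versions

Elasticsearch 7 removes the concept of types.

  1. In a relational database, two tables are independent of each other: columns with the same name in different tables do not interfere. This is not the case in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one index are ultimately handled the same way in Lucene.
    • Two user_name fields under two different types of the same index are in fact treated as a single field, so both types must define the same mapping for it. Otherwise the identically named fields conflict during processing, which reduces Lucene's efficiency.
    • Types were removed to make ES process data more efficiently.
  2. In Elasticsearch 7.x the type in the URL is optional. For example, indexing a document no longer requires a document type.
  3. Elasticsearch 8.x no longer supports a type in the URL.
  4. Solution:
    Migrate the index from multiple types to a single type, with a separate index for each document type.
    Migrate all typed data under the existing index to the designated location. See "Data Migration" for details.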
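
The user_name conflict described in point 1 can be made concrete with a small Python sketch. It is purely illustrative and is not how Elasticsearch checks mappings internally; the function name `find_field_conflicts`, the type names `user` and `tweet`, and the field `content` are assumptions for this example.

```python
def find_field_conflicts(type_mappings):
    """Find fields that share a name across types but differ in mapping.

    type_mappings: {type_name: {field_name: field_mapping}}
    Returns {field_name: {type_name: field_mapping}} for conflicting fields.
    """
    seen = {}
    for type_name, fields in type_mappings.items():
        for field, spec in fields.items():
            seen.setdefault(field, {})[type_name] = spec

    conflicts = {}
    for field, by_type in seen.items():
        specs = list(by_type.values())
        # A field conflicts when any type maps it differently from the first.
        if any(spec != specs[0] for spec in specs[1:]):
            conflicts[field] = by_type
    return conflicts


# Hypothetical multi-type index: both types define user_name.
legacy = {
    "user": {"user_name": {"type": "keyword"}},
    "tweet": {"user_name": {"type": "text"}, "content": {"type": "text"}},
}

print(find_field_conflicts(legacy))
```

Because both types live in one index, the two `user_name` definitions would have to agree; giving each document type its own index removes that constraint.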

Elasticsearch 7.x

  • Specifying types in requests is deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids. Note that in 7.0, _doc is a permanent part of the path, and represents the endpoint name rather than the document type.
  • The include_type_name parameter in the index creation, index template, and mapping APIs will default to false. Setting the parameter at all will result in a deprecation warning.
  • The _default_ mapping type is removed.
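
The two typeless index APIs mentioned above can be sketched as a small Python helper that picks the HTTP method and path. The helper name `document_path` is hypothetical; the paths follow the `PUT {index}/_doc/{id}` and `POST {index}/_doc` forms quoted in the list.

```python
from urllib.parse import quote


def document_path(index, doc_id=None):
    """Return (method, path) for indexing a document without a type."""
    base = "/" + quote(index, safe="") + "/_doc"
    if doc_id is None:
        # No id given: Elasticsearch generates one, so POST is used.
        return "POST", base
    # Explicit id: PUT to the document's own path.
    return "PUT", base + "/" + quote(str(doc_id), safe="")


print(document_path("bank", 1))
print(document_path("bank"))
```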

Elasticsearch 8.x

  • Specifying types in requests is no longer supported.
  • The include_type_name parameter is removed.

3. Field Types

See: official field types

Mapping Operations

See: create mapping operations

1. Creating an Index Mapping

Create an index and specify the mapping rules for its fields (comparable to creating a table and specifying its columns and column types).

  PUT /my_index
  {
    "mappings": {
      "properties": {
        "age": {
          "type": "integer"
        },
        "email": {
          "type": "keyword"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }

Result:

  {
    "acknowledged" : true,
    "shards_acknowledged" : true,
    "index" : "my_index"
  }
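
Outside the Kibana console, the same request can be prepared from Python with the standard library. This is a minimal sketch: the helper name `build_create_index_request` is hypothetical, the address `http://localhost:9200` is an assumed local cluster, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, properties):
    """Build (but do not send) a PUT request that creates an index with a mapping."""
    body = json.dumps({"mappings": {"properties": properties}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + index,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request(
    "http://localhost:9200",  # assumed address of a local cluster
    "my_index",
    {
        "age": {"type": "integer"},
        "email": {"type": "keyword"},
        "name": {"type": "text"},
    },
)
print(request.get_method(), request.full_url)

# Sending is left out on purpose. With a running cluster it could look like:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```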

2. Adding a Field to an Existing Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/explicit-mapping.html#add-field-mapping

  PUT /my_index/_mapping
  {
    "properties": {
      "employee-id": {
        "type": "keyword",
        "index": false
      }
    }
  }
  # "index": false means the new field is not indexed and therefore cannot be searched. The default is true.
  # https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-index.html

Result:

  {
    "acknowledged" : true
  }

3. Viewing a Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/explicit-mapping.html#view-mapping

  GET /my_index/_mapping
  # View the mapping of a single field
  GET /my_index/_mapping/field/employee-id

Result:

  {
    "my_index" : {
      "mappings" : {
        "properties" : {
          "age" : {
            "type" : "integer"
          },
          "email" : {
            "type" : "keyword"
          },
          "employee-id" : {
            "type" : "keyword",
            "index" : false
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
  # "index" : false means the field cannot be found through search

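4. Updating a Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/explicit-mapping.html#update-mapping

The mapping of an existing field cannot be changed, apart from a few mapping parameters that can be updated in place. To change it, create a new index with the desired mapping and migrate the data into it.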

Changing an existing field could invalidate data that’s already indexed.
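
To decide whether a planned change can be applied with `PUT /<index>/_mapping` or needs a new index, the current and desired mappings can be compared. The Python sketch below is a simplification that only looks at the `type` of each field; the function name `plan_mapping_change` and the field `nickname` are hypothetical.

```python
def plan_mapping_change(current, desired):
    """Compare two {field: mapping} dictionaries.

    Returns (addable, needs_reindex):
      addable       - new fields that can be added to the existing index
      needs_reindex - {field: (old_type, new_type)} for changed field types
    """
    addable = {}
    needs_reindex = {}
    for field, spec in desired.items():
        if field not in current:
            addable[field] = spec
        elif current[field].get("type") != spec.get("type"):
            needs_reindex[field] = (current[field].get("type"), spec.get("type"))
    return addable, needs_reindex


current = {"age": {"type": "long"}, "city": {"type": "text"}}
desired = {
    "age": {"type": "integer"},       # type change: requires a new index
    "city": {"type": "text"},         # unchanged
    "nickname": {"type": "keyword"},  # hypothetical new field: can be added
}

addable, needs_reindex = plan_mapping_change(current, desired)
print("can be added:", addable)
print("needs reindex:", needs_reindex)
```

If `needs_reindex` is not empty, the data has to be moved into a new index as described in the next section.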

5. Data Migration

There are two ways to migrate: one for indices from 7.x onward, where types have been removed, and one for indices that still contain a type.

Migrating data without a type

  # _reindex is a fixed endpoint name
  POST _reindex
  {
    "source": {
      "index": "twitter"
    },
    "dest": {
      "index": "new_twitters"
    }
  }

Migrating data with a type

  # _reindex is a fixed endpoint name
  POST _reindex
  {
    "source": {
      "index": "twitter",
      "type": "twitter"
    },
    "dest": {
      "index": "new_twitters"
    }
  }
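
Both request bodies differ only in the optional type of the source, so they can be generated by one small helper. This Python sketch uses a hypothetical function name, `build_reindex_body`, and only builds the JSON body for `POST _reindex`; it does not send anything.

```python
import json


def build_reindex_body(source_index, dest_index, source_type=None):
    """Build the body of a _reindex request, with an optional source type."""
    source = {"index": source_index}
    if source_type is not None:
        # Only for indices that still carry a custom type;
        # specifying a type is deprecated in 7.x.
        source["type"] = source_type
    return {"source": source, "dest": {"index": dest_index}}


print(json.dumps(build_reindex_body("twitter", "new_twitters"), indent=2))
print(json.dumps(build_reindex_body("twitter", "new_twitters", "twitter"), indent=2))
```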

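6. Data Migration Example

Our test data lives in the index bank, which contains a type.
We now create a new index, newbank, and change the types of some fields to demonstrate the data migration needed when a mapping has to be updated.

① View the current field mappings of the index bank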

  GET /bank/_mapping
  # Result
  {
    "bank" : {
      "mappings" : {
        "properties" : {
          "account_number" : {
            "type" : "long"
          },
          "address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "long"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "email" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "employer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }

② Create the new index newbank with the changed field types

  PUT /newbank
  {
    "mappings": {
      "properties": {
        "account_number": {
          "type": "long"
        },
        "address": {
          "type": "text"
        },
        "age": {
          "type": "integer"
        },
        "balance": {
          "type": "long"
        },
        "city": {
          "type": "keyword"
        },
        "email": {
          "type": "keyword"
        },
        "employer": {
          "type": "keyword"
        },
        "firstname": {
          "type": "text"
        },
        "gender": {
          "type": "keyword"
        },
        "lastname": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "state": {
          "type": "keyword"
        }
      }
    }
  }

③ Migrate the data

  POST _reindex
  {
    "source": {
      "index": "bank",
      "type": "account"
    },
    "dest": {
      "index": "newbank"
    }
  }

Result:

  #! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
  {
    "took" : 269,
    "timed_out" : false,
    "total" : 1000,
    "updated" : 0,
    "created" : 1000,
    "deleted" : 0,
    "batches" : 1,
    "version_conflicts" : 0,
    "noops" : 0,
    "retries" : {
      "bulk" : 0,
      "search" : 0
    },
    "throttled_millis" : 0,
    "requests_per_second" : -1.0,
    "throttled_until_millis" : 0,
    "failures" : [ ]
  }
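
When the migration is scripted, the response above can be checked programmatically before the old index is retired. The Python sketch below is a simple heuristic rather than an official success criterion; the function name `reindex_looks_clean` is hypothetical, and the sample dictionary repeats the relevant fields of the response shown above.

```python
def reindex_looks_clean(response):
    """Heuristic check of a parsed _reindex response.

    Treats the run as clean when it did not time out, reported no
    failures, and hit no version conflicts.
    """
    return (
        not response["timed_out"]
        and not response["failures"]
        and response["version_conflicts"] == 0
    )


# Relevant fields copied from the response above.
sample_response = {
    "took": 269,
    "timed_out": False,
    "total": 1000,
    "updated": 0,
    "created": 1000,
    "deleted": 0,
    "version_conflicts": 0,
    "noops": 0,
    "failures": [],
}

print("clean:", reindex_looks_clean(sample_response))
print("documents created:", sample_response["created"], "of", sample_response["total"])
```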

④ View the migrated data

  GET /newbank/_search
  # Result: after the migration every document has the type _doc, so the custom type is gone
  {
    "took" : 367,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1000,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "newbank",
          "_type" : "_doc",
          "_id" : "1",
          "_score" : 1.0,
          "_source" : {
            "account_number" : 1,
            "balance" : 39225,
            "firstname" : "Amber",
            "lastname" : "Duke",
            "age" : 32,
            "gender" : "M",
            "address" : "880 Holmes Lane",
            "employer" : "Pyrami",
            "email" : "amberduke@pyrami.com",
            "city" : "Brogan",
            "state" : "IL"
          }
        },
        ...