Quick Reference 02 - Mapping - Field Parameters - "ElasticSearch Knowledge Review"

Overview

Parameter	Purpose	Notes	Default	When to use	Example
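_source (metadata field)	Stores the original JSON body of the document as it was posted to ES	When disabled, the data lives only in the inverted index and cannot be read back from _source; reindex, update, and update_by_query also stop working	enabled	When the original data does not need to be kept, is already stored elsewhere, or is read from fields that have store enabled. If disk space is the concern, raise the compression level instead of disabling _source
index	Whether the field is added to the inverted index	When disabled, the field cannot be searched; it is still written to _source and doc_values, so it can still be sorted and aggregated	enabled	The field only needs aggregation, not search
doc_values	Supports sorting and aggregations	Takes extra disk space and is independent of _source: with both doc_values and _source enabled, the field's original content is kept twice. The data is stored on disk in a columnar layout. When disabled, the field cannot be sorted or aggregated	enabled	Can be disabled if the field never needs aggregation
enabled	Whether ES parses and indexes the field at all	When disabled, the content is kept only in _source; it acts like a master switch for index and doc_values. It can only be set on the top-level mapping and on object fields	enabled	Data that only needs to be stored, never searched or aggregated
store	Whether the field is stored separately	Takes extra disk space and is independent of _source: with both store and _source enabled, the field's original content is kept twice. Each stored field is saved on its own and is not contiguous on disk, so reading several fields takes several seeks; when several fields are needed, weigh _source against store	disabled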
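analyzer	Specifies the analyzer		standard	For custom analyzers see: https://www.yuque.com/timeasving/xovt29/ud3e8c
boost	Sets the field's weight at index time (in the mapping)			Not recommended at index time; set the boost dynamically at query time instead
coerce	Coerces values to the mapped data type	When false, values must match the mapped type exactly; for an integer field, indexing "10" is rejected. When true, 10, 10.1, and "10" are all accepted. Note that when "10.1" is indexed, the inverted index holds 10 while _source still holds 10.1	true	Strict checking is rarely needed; the default true is usually fine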
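copy_to	At index time, copies the values of fields A and B into field C; queries against C then match the content of both A and B.	1: Not supported for object fields. 2: The copied value does not appear in _source. To inspect it, add "stored_fields": ["name"] at the same level as query (the target field needs store enabled). 3: The combined data is an array, so the field cannot be used for collapse, which only supports single-valued fields	None	Works like multi_match with type cross_fields, but it also addresses the long-tail problem and the skewed ranking that appears when data is unevenly distributed across fields. Under the hood it simply adds one more field whose inverted index holds both source fields, which is equivalent to adding the field to the mapping and filling it in application code at ingest time. It also takes effect for fields added to the mapping later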
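dynamic	Whether new fields may be added dynamically	true: new fields are added to the mapping (default). runtime: new fields are added to the mapping as runtime fields; they are not indexed and are loaded from _source at query time. false: new fields are ignored; they are not added to the mapping and must be added explicitly, though they still appear in _source. strict: new fields must be added to the mapping explicitly; indexing a document with an undefined field throws an error	true	Typically combined with dynamic templates, to keep ad hoc data from creating junk fields.
eager_global_ordinals	Builds global ordinals eagerly to speed up aggregations	1: Never use it on a frozen index: there, global ordinals are discarded after every search and rebuilt on the next request. 2: Use with care on very large data sets: aggregations on high-cardinality fields can use a lot of memory and trip the field data circuit breaker	false	Bucket aggregations on keyword, ip, and flattened fields, such as terms, diversified_sampler, and significant_terms. text fields need fielddata enabled before they can be used in bucket aggregations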
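format	Sets the date format	A value must match one of the configured formats to be indexed	strict_date_optional_time||epoch_millis (ISO-style dates or epoch milliseconds)	Usually just list the formats you need, for example: "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
ignore_above	Caps the length of keyword values; longer strings are neither indexed nor stored	1: A value over the limit is skipped as a whole: it is absent from stored fields and doc_values, but _source keeps the original data. 2: ES counts characters while Lucene counts bytes; a UTF-8 character takes up to 4 bytes, so the largest safe setting is 32766/4 = 8191	Unlimited by default (2147483647; dynamically mapped keyword sub-fields get 256). Lucene's hard limit is 32766 bytes per term, beyond which indexing fails	keyword fields whose values may be very large
ignore_malformed	Ignores type errors at insert time	1: Can be changed on an existing field through the update mapping API. 2: Supported types: numeric types, date, date_nanos, geo_point, geo_shape, ip. 3: Cannot handle malformed data in nested, object, or range fields	false	When type errors may occur and should not block the rest of the document from being indexed	1: Index level: "settings": { "index.mapping.ignore_malformed": true } 2: Field level: "ignore_malformed": false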
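index_options	Controls which information is recorded in the inverted index. 1: Supported on text fields only. 2: Values: docs (document numbers), freqs (document numbers + term frequencies), positions (default; document numbers + term frequencies + term positions), offsets (also the start and end character offsets of each term)	None	positions	When match_phrase queries are not needed, consider freqs, which does not record position information
index_phrases	Optimizes match_phrase queries by indexing pairs of adjacent terms as a single phrase term	1: A text property. 2: Applies to match_phrase	false	Wherever phrase queries are needed; it can make them considerably faster (up to about 2x), at the cost of a larger index.
index_prefixes	Speeds up prefix queries; min_chars and max_chars limit which prefix lengths are indexed, which reduces over-matching on very short prefixes	1: Intended for prefix query optimization	None (when enabled, min_chars defaults to 2 and max_chars to 5)	1: Business scenarios that need prefix queries. 2: Also useful in some suggest scenarios, to keep too many results from being recalled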
meta	In testing on ES 7.x, this parameter was reported as unsupported
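fields	Sub-fields (multi-fields)	None	None	Very common; defines several sub-fields under one field so the same value can be indexed in different ways, for example cat and dog under an animal field
normalizer	A keyword property that post-processes the single term a keyword value produces, for example lowercase to convert it to lower case	keyword fields only	None	1: When keyword values need further processing
norms	Enables or disables field-length normalization in scoring for text fields	1: A text property. 2: Usually does not need to be set	true	Mainly an optimization; no concrete use case found yet
null_value	Value indexed in place of an explicit null	None	None	1: Business scenarios where empty data needs a default value.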
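position_increment_gap	Sets the position gap inserted between the elements of an array; default 100.	Relevant to match_phrase queries	100	1: When the value is an array and phrase queries are used, set the gap as the business requires.
properties	Defines the sub-fields of a nested or object field	Applies to nested and object fields; the sub-fields may be of any type.	None	Very common; used whenever a field is of type nested or object
search_analyzer	Specifies the analyzer used at search time	None	None	For cases where indexing and searching use different analyzers, which is more flexible. For example, with ngram at index time, the search side can also use ngram for fuzzy-like matching, or use ik or another analyzer
similarity	Specifies the similarity (scoring) model	1: Set when the index is created	BM25	1: The default BM25 covers most scenarios. 2: BM25's parameters can be tuned. 3: A custom scoring model is possible (requires plugin development; uncommon)
term_vector	no: term vectors are not stored (default). yes: only the terms in the field are stored. with_positions: terms and positions. with_offsets: terms and character offsets. with_positions_offsets: terms, positions, and character offsets. with_positions_payloads: terms, positions, and payloads. with_positions_offsets_payloads: terms, positions, offsets, and payloads.	1: Setting with_positions_offsets doubles the size of the field's index. 2: A document's term vector information can be retrieved	no	No use case found yet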
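
Several parameters in the table above have no example in this handbook. The following is a minimal sketch of coerce, ignore_malformed, ignore_above, and format in one mapping. The index name params-demo and all field names are made up for illustration, and the behavior noted in the comments follows the 7.x reference rather than a run on a specific cluster.

```console
PUT params-demo
{
  "mappings": {
    "properties": {
      "strict_count":  { "type": "integer", "coerce": false },
      "lenient_count": { "type": "integer", "ignore_malformed": true },
      "tag":           { "type": "keyword", "ignore_above": 10 },
      "created":       { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" }
    }
  }
}

# Expected to be rejected: coerce is false, so the string "10" is not accepted for an integer field
PUT params-demo/_doc/1
{
  "strict_count": "10"
}

# Expected to be accepted: the malformed lenient_count and the over-long tag are skipped,
# the remaining fields are indexed, and _source keeps everything as sent
PUT params-demo/_doc/2
{
  "lenient_count": "abc",
  "tag": "longer than ten characters",
  "created": "2024-01-15 08:30:00"
}

# Finds documents in which at least one field was ignored, such as the malformed lenient_count above
GET params-demo/_search
{
  "query": {
    "exists": { "field": "_ignored" }
  }
}
```

Keeping the strict and the lenient field side by side makes the difference visible: coerce decides whether a convertible value is accepted, while ignore_malformed decides what happens to a value that cannot be converted at all.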

Examples

dynamic example

When ES detects the data type automatically at insert time, fields are mapped as follows with dynamic set to true and to runtime:

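JSON data type inserted	With "dynamic": "true"	With "dynamic": "runtime"
null	No field added	No field added
true or false	boolean	boolean
double	float	double
integer	long	long
object	object	No field added
array	Depends on the first non-null value in the array	Depends on the first non-null value in the array
string that ES detects as a date	date	date
string that ES detects as a number	float or long	double or long
string that is neither a date nor a number	text with a keyword sub-field	keyword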
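
To check the rules in the table above, one option is to index a document into a fresh index and read back the generated mapping. This is only a sketch: the index name dynamic-demo and the field names are hypothetical, and the default "dynamic": "true" is assumed.

```console
# No mapping is defined up front, so every field is mapped dynamically
PUT dynamic-demo/_doc/1
{
  "active": true,
  "count": 5,
  "price": 1.5,
  "title": "hello world",
  "created": "2015/09/02"
}

# Inspect the field types ES derived
GET dynamic-demo/_mapping
```

Going by the table, active should come back as boolean, count as long, price as float, title as text with a keyword sub-field, and created as date, since date detection is enabled by default.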

PUT my-index-000001
{
  "mappings": {
    "dynamic": false, 
    "properties": {
      "user": { 
        "properties": {
          "name": {
            "type": "text"
          },
          "social_networks": {
            "dynamic": true, 
            "properties": {}
          }
        }
      }
    }
  }
}
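
A quick way to see the effect of this mapping is to index a document that carries unmapped fields at both levels and then read the mapping back. The document below is an illustrative assumption, not part of the original example.

```console
# nickname and twitter are not defined in the mapping
PUT my-index-000001/_doc/1
{
  "user": {
    "name": "alice",
    "nickname": "al",
    "social_networks": {
      "twitter": "@alice"
    }
  }
}

GET my-index-000001/_mapping
```

The user object inherits "dynamic": false from the top level, so user.nickname should stay out of the mapping while remaining in _source. social_networks overrides the setting with "dynamic": true, so user.social_networks.twitter should be added.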

_source example

PUT my-index-000001
{
  "mappings": {
    "_source": {
      "excludes": [
        "*_not_source"
      ]
    },
    "properties": {
      "user_id": {
        "type":  "keyword"
      },
      "session_data_not_source": {
        "type": "object"
      }
    }
  }
}
PUT my-index-000001/_doc/session_2
{
  "user_id": "111",
  "session_data_not_source": {
    "some_array": [
      {
        "foo": "bar"
      },
      {
        "baz": 2
      }
    ]
  }
}
GET my-index-000001/_doc/session_2
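
Excluding a field from _source only changes what is returned, not what is indexed. As a sketch (assuming the foo sub-field was mapped dynamically with the defaults), the document can still be found through the excluded field even though the GET above no longer shows it:

```console
GET my-index-000001/_search
{
  "query": {
    "match": {
      "session_data_not_source.some_array.foo": "bar"
    }
  }
}
```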

copy_to example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "full_name": {
        "type": "text"
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}
GET my-index-000001/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
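
For comparison, the overview table notes that copy_to behaves much like multi_match with type cross_fields. A sketch of that alternative, run against the two source fields of the same index and document as above:

```console
GET my-index-000001/_search
{
  "query": {
    "multi_match": {
      "query": "John Smith",
      "type": "cross_fields",
      "fields": ["first_name", "last_name"],
      "operator": "and"
    }
  }
}
```

Both forms should match document 1. The copy_to variant pays at index time by adding a field, while the multi_match variant leaves the mapping untouched.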

match_phrase example

POST /my-index-000001/_mapping
{
  "properties": {
    "text_index_phrase6_true": {
      "type": "text",
      "index_phrases": "true"
    }
  }
}
POST /my-index-000001/_mapping
{
  "properties": {
    "text_index_phrase6_false": {
      "type": "text",
      "index_phrases": "false"
    }
  }
}
PUT my-index-000001/_doc/1
{
    "text_index_phrase6_true" : "我爱广州小蛮腰",
    "text_index_phrase6_false" : "我爱广州小蛮腰"
}
GET my-index-000001/_search
{
  "explain": true, 
  "query": {
    "match": {
      "text_index_phrase6_true": "广州"
    }
  }
}
GET my-index-000001/_search
{
   "explain": true, 
  "query": {
    "match_phrase": {
      "text_index_phrase6_false": "广州"
    }
  }
}
GET /my-index-000001/_termvectors/1?fields=text_index_phrase6_true._index_phrase
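
The requests above run match against the field with index_phrases and match_phrase against the field without it. To exercise the optimization itself, a phrase query can also be sent to the field that has index_phrases enabled. This sketch reuses the fields defined above:

```console
GET my-index-000001/_search
{
  "explain": true,
  "query": {
    "match_phrase": {
      "text_index_phrase6_true": "广州"
    }
  }
}
```

With the standard analyzer, 广州 is split into two single-character terms, so this is a two-term phrase. Comparing its explain output with the query on text_index_phrase6_false should show whether the indexed term pairs were used.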

normalizer example

PUT test_normalizer
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "type": {
        "type": "keyword"
      },
      "type_normalizer": {
        "type": "keyword",
        "normalizer": "lowercase"
      }
    }
  }
}
PUT test_normalizer/_doc/1
{
  "type": "apple",
  "type_normalizer": "apple"
}
PUT test_normalizer/_doc/2
{
  "type": "Apple",
  "type_normalizer": "Apple"
}
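# Query 3: term query on the plain keyword field; "aPple" matches neither "apple" nor "Apple", so there are no hits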
GET test_normalizer/_search
{
  "query": {
    "term":{
      "type":"aPple"
    }
  }
}
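# Query 4: the normalizer is also applied to the term query input, so "aPple" becomes "apple" and matches both documents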
GET test_normalizer/_search
{
  "query": {
    "term":{
      "type_normalizer":"aPple"
    }
  }
}
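
The normalizer also affects the values seen by aggregations, because the normalized term is what gets indexed. A sketch, with by_type as an arbitrary aggregation name:

```console
GET test_normalizer/_search
{
  "size": 0,
  "aggs": {
    "by_type": {
      "terms": { "field": "type_normalizer" }
    }
  }
}
```

Both documents should fall into a single apple bucket, whereas the same aggregation on the type field would keep apple and Apple apart.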

null_value example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        "null_value": "_es_default_null" 
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "status_code": "null"
}
PUT my-index-000001/_doc/2
{
  "status_code": null
}
PUT my-index-000001/_doc/3
{
  "status_code": [] 
}
GET my-index-000001/_search
{
  "query": {
    "term": {
      "status_code": "_es_default_null" 
    }
  }
}
# Result: document 2 only; the string "null" and the empty array are not replaced by null_value
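
null_value only changes what is indexed; it does not rewrite the document. Fetching document 2 from the example above should still show the original null in _source:

```console
GET my-index-000001/_doc/2
```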

position_increment_gap example

# 1: with the gap set to 0, the phrase query finds the document
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text",
        "position_increment_gap": 0 
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith"]
}
GET my-index-000001/_search
{
  "explain": true, 
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln" 
    }
  }
}
DELETE my-index-000001
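# 2: with the dynamically created default mapping, the phrase query without slop finds nothing, while a slop of 100 or more does find the document, which shows that the default gap is 100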
PUT my-index-000001/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith"]
}
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln" 
      }
    }
  }
}
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln",
        "slop": 100 
      }
    }
  }
}
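
To see the gap directly instead of inferring it from slop, the token positions can be inspected with the _termvectors API. This is a sketch against the document indexed above:

```console
GET my-index-000001/_termvectors/1?fields=names
```

The positions reported for abraham and lincoln show how far apart the two array elements were placed.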

Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-params.html