Quick Reference 03 - Mapping - Field Data Types - "ElasticSearch Knowledge Notes"

Overview
Examples

Overview

Tip: types that are provided by X-Pack are not analyzed in detail here.

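Type	Purpose	Notes	Defaults	Use cases
Aggregate metric (X-Pack)	Stores pre-aggregated metric values such as min and max
Alias	An alias that points to another field; usable in queries and aggregations	1. Cannot be the target of copy_to and cannot be used in multi-fields 2. The alias field does not appear in _source 3. The target cannot be an object type; inside a nested object, the alias must have the same nested scope as its target	None	No code change needed: when searching, you can specify the alias in order to search the field it points to.
Arrays	Array type for multi-valued fields 1. Array of strings: [ "one", "two" ] 2. Array of numbers: [ 1, 2 ] 3. Array of arrays: [ 1, [ 2, 3 ]], which is equivalent to [ 1, 2, 3 ] 4. Array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]	1. There is no dedicated array type in the mapping 2. All values in an array must have the same data type 3. For arrays of objects, use nested if queries need to correlate fields within each object	None
Binary	Stores a binary value as a Base64-encoded string	1. Not stored by default and not searchable 2. The Base64 string must not contain newline characters \n	doc_values: false store: false	Binary data
Boolean	Stores true and false values	1. Accepted true values: true, "true" 2. Accepted false values: false, "false" 3. Behavior differs between versions: the 7.x documentation states that the empty string "" means false, but in a test on 7.1.1 a document stored with "" was not returned by a query for false; in practice, store strictly true or false	1. boost: 1.0 2. doc_values: true 3. index: true	Fields that can only be true or false
Date	Date type; Elasticsearch stores the value internally in milliseconds	1. For numeric input with a decimal point, the fractional part is discarded, so avoid it 2. Epoch values must be non-negative, i.e. milliseconds since 1970	format: strict_date_optional_time||epoch_millis
Date nanoseconds	Date type; Elasticsearch stores the value internally in nanoseconds	An extension of the date type that stores nanosecond values	format: strict_date_optional_time||epoch_millis	Fields that need nanosecond resolution
Dense vector (X-Pack)	Vector storage
Flattened (7.3)	A newer object type. By default, every subfield of an object field is mapped and indexed separately, which can lead to a mapping explosion. The flattened type is a compromise: (1) it maps the entire object as a single field; (2) in exchange, it offers only a subset of the query functionality	Values are indexed as keywords, so numeric range comparison and analyzed (tokenized) match are not available; highlighting is not supported; the store parameter is not supported	None	Useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which helps prevent a mapping explosion caused by too many distinct field mappings.
Geo-point, Geo-shape (X-Pack)	1. Location queries, covered in detail in [Quick Reference 06 - QUERY DSL]. Supported queries: 1.1) Geo-bounding-box (points that fall inside a rectangle) 1.2) Geo-distance (points within a given distance of a specified location) 1.3) Geo-polygon (points that fall inside a polygon; this filter is expensive, so if you think you need it, look at geo-shapes first) 1.4) Geo-shape (geo-shapes take a completely different approach from geo-points: a circle seen on a computer screen is not actually made of perfectly continuous lines) [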

Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-types.html
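
Two rows of the table above can be illustrated without a cluster. The following is a minimal Python sketch, not Elasticsearch code: `flatten` and `to_binary_field` are hypothetical helper names introduced only for illustration. It shows that an array of arrays such as [ 1, [ 2, 3 ]] carries the same values as [ 1, 2, 3 ], and how to produce a Base64 string without embedded newlines for a binary field.

```python
import base64


def flatten(values):
    """Recursively flatten nested lists, e.g. [1, [2, 3]] -> [1, 2, 3]."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten(value))
        else:
            flat.append(value)
    return flat


def to_binary_field(raw: bytes) -> str:
    """Encode bytes as a single-line Base64 string (b64encode adds no newlines)."""
    return base64.b64encode(raw).decode("ascii")


nested = [1, [2, 3]]
payload = bytes(range(100))

encoded = to_binary_field(payload)
# MIME-style encoding wraps long output with "\n"; avoid it for binary fields.
wrapped = base64.encodebytes(payload).decode("ascii")

print(flatten(nested))
print("\n" in encoded, "\n" in wrapped)
```

If the client library in use wraps Base64 output in MIME style, stripping the line breaks before indexing should avoid the newline restriction noted in the Binary row.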

Examples

ip example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "ip_addr": {
        "type": "ip"
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "ip_addr": "192.168.1.1"
}
GET my-index-000001/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
GET my-index-000001/_search
{
  "query": {
    "query_string" : {
      "query": "ip_addr:\"2001:db8::/48\""
    }
  }
}
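
To see why the first search above returns document 1 while the IPv6 search does not, the CIDR matching can be reproduced with Python's standard `ipaddress` module. This is only a sketch of the containment check; `in_cidr` is a hypothetical helper name, and it does not call Elasticsearch.

```python
import ipaddress


def in_cidr(address: str, cidr: str) -> bool:
    """Return True if the address belongs to the CIDR block."""
    ip = ipaddress.ip_address(address)
    network = ipaddress.ip_network(cidr)
    # An IPv4 address can never be a member of an IPv6 block, and vice versa.
    return ip.version == network.version and ip in network


stored = "192.168.1.1"  # value indexed in ip_addr above
print(in_cidr(stored, "192.168.0.0/16"))
print(in_cidr(stored, "2001:db8::/48"))
```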

join type example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_id": {
        "type": "keyword"
      },
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}
PUT my-index-000001/_doc/1?refresh
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

PUT my-index-000001/_doc/2?refresh
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}
PUT my-index-000001/_doc/3?routing=1&refresh
{
  "my_id": "3",
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

PUT my-index-000001/_doc/4?routing=1&refresh
{
  "my_id": "4",
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

# Search: has_parent
GET my-index-000001/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match_all": {}
      }
    }
  }
}
# Search: has_child
GET my-index-000001/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "term": {
          "my_id": {
            "value": "4"
          }
        }
      }
    }
  }
}

# Aggregation
GET my-index-000001/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question", 
        "size": 10
      }
    }
  }
}
# Multiple levels of parent/child relations
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],  
          "answer": "vote" 
        }
      }
    }
  }
}
The resulting hierarchy is shown below ↓
  question
    /    \
   /      \
comment  answer
           |
           |
          vote
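
The expected hits of the has_parent and has_child searches in this section can be reasoned about with a small in-memory model of the four documents indexed earlier. This is a sketch in plain Python, assuming the simple question/answer relation; `has_parent` and `has_child` here are hypothetical functions that mimic the semantics of the queries and are not Elasticsearch client calls.

```python
# The four documents from the join example: id -> relation name and parent id.
docs = {
    "1": {"name": "question", "parent": None},
    "2": {"name": "question", "parent": None},
    "3": {"name": "answer", "parent": "1"},
    "4": {"name": "answer", "parent": "1"},
}


def has_parent(parent_type):
    """Ids of child documents whose parent has the given relation name."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if doc["parent"] is not None and docs[doc["parent"]]["name"] == parent_type
    )


def has_child(child_type, child_my_id):
    """Ids of parents that own a child of child_type with the given my_id."""
    return sorted(
        {
            doc["parent"]
            for doc_id, doc in docs.items()
            if doc["name"] == child_type
            and doc_id == child_my_id  # my_id equals the document id in the example
            and doc["parent"] is not None
        }
    )


print(has_parent("question"))   # children of any question
print(has_child("answer", "4")) # parents of the answer with my_id 4
```

In this model, has_parent returns the two answers and has_child returns question 1, which is the behavior the two searches are meant to demonstrate.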

range field example

PUT range_index
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : { 
    "gte" : 10,
    "lt" : 20
  },
  "time_frame" : {
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

GET range_index/_search
{
  "query": {
    "term": {
      "time_frame": {
        "value": "2015-10-31 13:00:00"
      }
    }
  }
}
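
The term query above matches because the queried instant lies inside the stored time_frame range. The containment check can be sketched in Python as follows. This is a simplification under stated assumptions: only the two textual formats from the mapping are handled (not epoch_millis), a date-only bound is read as midnight, and `parse` and `in_range` are hypothetical helper names.

```python
import operator
from datetime import datetime

# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def parse(value: str) -> datetime:
    """Try each supported format in order, like the || separated format list."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported date: {value!r}")


def in_range(value: str, bounds: dict) -> bool:
    """Check a single instant against every bound of a stored range."""
    point = parse(value)
    return all(OPS[name](point, parse(bound)) for name, bound in bounds.items())


time_frame = {"gte": "2015-10-31 12:00:00", "lte": "2015-11-01"}
print(in_range("2015-10-31 13:00:00", time_frame))
print(in_range("2015-10-31 11:00:00", time_frame))
```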

rank_feature example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature" 
      },
      "url_length": {
        "type": "rank_feature"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "pagerank": 2,
  "url_length": 23
}
PUT my-index-000001/_doc/2
{
  "pagerank": 5,
  "url_length": 5
}
PUT my-index-000001/_doc/3
{
  "pagerank": 105,
  "url_length": 22
}

GET my-index-000001/_search
{
  "explain": true, 
  "query": {
    "bool": {
      "should": [
        {
          "rank_feature": {
            "field": "pagerank",
            "boost":0.1
          }
        },
         {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.2
          }
        }
      ]
    }
  }
}
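# Result order: 3, 1, 2
Brief analysis:
Taking the score of document 3 as an example:
1: Field pagerank
w = custom boost = 0.1
k = pivot = 6.359375 (no pivot is set in the query, so Elasticsearch computes a default from the indexed values)
S = field value in the document = 105
Score = w * S / (S + k) = 10.5 / (105 + 6.359375) = 0.094289325
2: In the same way, the score for field url_length is 0.12137931
Total score: 0.094289325 + 0.12137931 = 0.21566863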
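
The arithmetic above can be checked with a short Python sketch of the formula w * S / (S + k). The pivot 6.359375 for pagerank comes from the analysis above; the pivot 14.25 for url_length is an assumption, back-derived from the reported score 0.12137931 (0.2 * 22 / (22 + k)), and `saturation` is a hypothetical helper name.

```python
def saturation(value: float, pivot: float, boost: float = 1.0) -> float:
    """Score contribution of one rank_feature clause: boost * S / (S + k)."""
    return boost * value / (value + pivot)


PAGERANK_PIVOT = 6.359375  # k reported in the analysis above
URL_LENGTH_PIVOT = 14.25   # assumption: derived from the reported 0.12137931

# Document id -> (pagerank, url_length), as indexed in the example.
docs = {1: (2, 23), 2: (5, 5), 3: (105, 22)}

scores = {
    doc_id: saturation(pagerank, PAGERANK_PIVOT, boost=0.1)
    + saturation(url_length, URL_LENGTH_PIVOT, boost=0.2)
    for doc_id, (pagerank, url_length) in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking)
print(round(scores[3], 6))
```

Under these assumed pivots the sketch reproduces both the total for document 3 and the result order 3, 1, 2.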

token_count usage example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": { 
        "type": "text",
        "fields": {
          "length": { 
            "type":     "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{ "name": "John Smith" }

PUT my-index-000001/_doc/2
{ "name": "Rachel Alice Williams good" }

GET my-index-000001/_search
{
  "query": {
    "term": {
      "name.length": 4 
    }
  }
}
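
The search above should return only document 2, because its name produces four tokens. A rough Python sketch of the counting follows. It assumes that splitting on whitespace is a good enough stand-in for the standard analyzer on these simple names, which will not hold for punctuation-heavy text; `token_count` is a hypothetical helper name.

```python
def token_count(text: str) -> int:
    """Approximate the number of tokens by splitting on whitespace."""
    return len(text.split())


names = {"1": "John Smith", "2": "Rachel Alice Williams good"}

# Equivalent of the term query on name.length with value 4.
matches = [doc_id for doc_id, name in names.items() if token_count(name) == 4]
print(matches)
```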