Data Types - Percolator Type - Elasticsearch

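The percolator field type parses a JSON structure into a native query and stores that query, so that the percolate query can use it to match provided documents.
Any field that contains a JSON object can be configured as a percolator field. The percolator field type has no settings. Configuring the percolator field type alone is enough to instruct Elasticsearch to treat the field as a query.
The following mapping configures the query field as a percolator field: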

curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "_doc": {
            "properties": {
                "query": {
                    "type": "percolator"
                },
                "field": {
                    "type": "text"
                }
            }
        }
    }
}
'

With that mapping in place, a query can be indexed as a document:

curl -X PUT "localhost:9200/my_index/_doc/match_value?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : {
            "field" : "value"
        }
    }
}
'
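
To check that the stored query matches a document, a percolate search can be sent to the same index. The following is a minimal Python sketch, assuming the local node at localhost:9200 used by the curl examples; the helper names `percolate_body` and `send` are made up for illustration and are not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: same local node as the curl examples


def percolate_body(field, document):
    """Build a percolate search body for a single candidate document."""
    return {"query": {"percolate": {"field": field, "document": document}}}


def send(method, path, body):
    """Send a JSON request to Elasticsearch and return the decoded response."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "query" is the percolator field from the mapping; the candidate document
# contains "value" in "field", which the stored match query looks for.
body = percolate_body("query", {"field": "some value"})
print(json.dumps(body, indent=2))
```

With a running node, `send("POST", "/my_index/_search", body)` would submit the request. Once the index has refreshed, the document with id `match_value` should appear among the hits.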

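Fields referenced in a percolator query must already exist in the mapping of the index used for percolation. To make sure these fields exist, add or update the mapping through the create index or put mapping APIs. Fields referenced in a percolator query may exist in any type of the index that contains the percolator field type.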
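
One way to catch a missing field before indexing a query is a client-side check against the mapping's properties. The sketch below is a simplification and not something Elasticsearch provides: `referenced_fields` and `missing_fields` are hypothetical helpers, and only `bool` clauses plus a few single-field leaf queries are handled.

```python
LEAF_QUERIES = {"match", "match_phrase", "term", "range"}
BOOL_CLAUSES = ("must", "should", "filter", "must_not")


def referenced_fields(query):
    """Collect field names used by a query (simplified: bool + a few leaf types)."""
    fields = set()
    for name, body in query.items():
        if name in LEAF_QUERIES:
            # Leaf queries of these types are keyed by the field name.
            fields.update(body.keys())
        elif name == "bool":
            for clause in BOOL_CLAUSES:
                sub = body.get(clause, [])
                # A clause may hold one query object or a list of them.
                for item in sub if isinstance(sub, list) else [sub]:
                    fields |= referenced_fields(item)
    return fields


def missing_fields(query, mapping_properties):
    """Return the referenced fields that the mapping does not define."""
    return referenced_fields(query) - set(mapping_properties)


# Properties from the my_index mapping shown earlier.
properties = {"query": {"type": "percolator"}, "field": {"type": "text"}}

ok_query = {"match": {"field": "value"}}
bad_query = {"bool": {"must": [{"match": {"field": "value"}},
                               {"term": {"tag": "x"}}]}}

print(missing_fields(ok_query, properties))   # set()
print(missing_fields(bad_query, properties))  # {'tag'}
```

A non-empty result means the mapping needs to be updated before the query is indexed.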

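Reindexing your percolator queries
Reindexing percolator queries is sometimes required to benefit from improvements made to the percolator field type in new releases. Percolator queries can be reindexed with the reindex API. Consider the following index with a percolator field: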

PUT index
{
  "mappings": {
    "_doc" : {
      "properties": {
        "query" : {
          "type" : "percolator"
        },
        "body" : {
          "type": "text"
        }
      }
    }
  }
}
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index",
        "alias": "queries" 
      }
    }
  ]
}
PUT queries/_doc/1?refresh
{
  "query" : {
    "match" : {
      "body" : "quick brown fox"
    }
  }
}

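It is always recommended to define an alias for your index, so that after a reindex, systems and applications do not need to be changed to know that the percolator queries now live in a different index. Suppose you are about to upgrade to a new major version. For the new Elasticsearch version to still be able to read your queries, you need to reindex them into a new index on the current Elasticsearch version: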

PUT new_index
{
  "mappings": {
    "_doc" : {
      "properties": {
        "query" : {
          "type" : "percolator"
        },
        "body" : {
          "type": "text"
        }
      }
    }
  }
}
POST /_reindex?refresh
{
  "source": {
    "index": "index"
  },
  "dest": {
    "index": "new_index"
  }
}
POST _aliases
{
  "actions": [ 
    {
      "remove": {
        "index" : "index",
        "alias": "queries"
      }
    },
    {
      "add": {
        "index": "new_index",
        "alias": "queries"
      }
    }
  ]
}
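
The alias switch puts the `remove` and `add` actions in a single `_aliases` request, so the alias moves from the old index to the new one in one atomic step. A small Python sketch that builds that body for arbitrary names; `swap_alias_actions` is an illustrative helper, not an Elasticsearch API.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Same names as in the request above.
body = swap_alias_actions("queries", "index", "new_index")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be posted to `_aliases`.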

If you have an alias, don't forget to point it to the new index. Executing the percolate query via the queries alias:

GET /queries/_search
{
  "query": {
    "percolate" : {
      "field" : "query",
      "document" : {
        "body" : "fox jumps over the lazy dog"
      }
    }
  }
}

now returns matches from the new index:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "new_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "query": {
                        "match": {
                            "body": "quick brown fox"
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

The percolator query hit is now returned from the new index.

Optimizing query time text analysis

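When the percolator verifies a candidate match, it parses the query, performs query time text analysis, and actually runs the percolator query on the document being percolated. This is done for each candidate match and every time the percolate query executes. If query time text analysis is a relatively expensive part of query parsing, text analysis can become the dominant cost of percolating. This parsing overhead becomes noticeable when the percolator ends up verifying many candidate query matches.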
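
The effect can be illustrated with a toy model in Python. This is not how Elasticsearch is implemented: `analyze` and `percolate` are stand-ins that only count how often query text is analyzed when every candidate is verified on every request.

```python
def analyze(text, counter):
    """Stand-in for query time text analysis; counts how often it runs."""
    counter["calls"] += 1
    return text.lower().split()


def percolate(candidate_queries, document_terms, counter):
    """Verify every candidate; each verification re-analyzes the query text."""
    matches = []
    for query_id, query_text in candidate_queries.items():
        terms = analyze(query_text, counter)
        # Mimics a match query with the default OR operator.
        if any(term in document_terms for term in terms):
            matches.append(query_id)
    return matches


counter = {"calls": 0}
candidates = {"1": "quick brown fox", "2": "lazy cat", "3": "red panda"}
document_terms = set("fox jumps over the lazy dog".split())

for _ in range(100):  # 100 percolate requests against the same candidates
    matches = percolate(candidates, document_terms, counter)

print(matches)           # ['1', '2']
print(counter["calls"])  # 300
```

In this model, three candidates and 100 requests give 300 analysis calls: the work grows with the number of candidates multiplied by the number of requests.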