Multi value fields - Elasticsearch 权威指南中文版


Multi value fields

发布于 2019-07-04 字数 2927 浏览 715 评论 0

=== Multivalue Fields

A curious thing can happen when you try to use phrase matching on multivalue
fields. (((“proximity matching”, “on multivalue fields”)))(((“match_phrase query”, “on multivalue fields”))) Imagine that you index this document:


PUT /my_index/groups/1 { “names”: [ “John Abraham”, “Lincoln Smith”] }

// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json

Then run a phrase query for Abraham Lincoln:


GET /my_index/groups/_search { “query”: { “match_phrase”: { “names”: “Abraham Lincoln” } } }

// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json

Surprisingly, our document matches, even though Abraham and Lincoln
belong to two different people in the names array. The reason for this comes
down to the way arrays are indexed in Elasticsearch.

When John Abraham is analyzed, it produces this:

  • Position 1: john
  • Position 2: abraham

Then when Lincoln Smith is analyzed, it produces this:

  • Position 3: lincoln
  • Position 4: smith

In other words, Elasticsearch produces exactly the same list of tokens as it would have
for the single string John Abraham Lincoln Smith. Our example query
looks for abraham directly followed by lincoln, and these two terms do
indeed exist, and they are right next to each other, so the query matches.

Fortunately, there is a simple workaround for cases like these, called the
position_offset_gap, which(((“mapping (types)”, “position_offset_gap”)))(((“position_offset_gap”))) we need to configure in the field mapping:


DELETE /my_index/groups/

PUT /my_index/_mapping/groups { “properties”: { “names”: { “type”: “string”, “position_offset_gap”: 100 } } }

// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json

First delete the `groups` mapping and all documents of that type.
Then create a new `groups` mapping with the correct values.

The position_offset_gap setting tells Elasticsearch that it should increase
the current term position by the specified value for every new array
element. So now, when we index the array of names, the terms are emitted with
the following positions:

  • Position 1: john
  • Position 2: abraham
  • Position 103: lincoln
  • Position 104: smith

Our phrase query would no longer match a document like this because abraham
and lincoln are now 100 positions apart. You would have to add a slop
value of 100 in order for this document to match.




需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。