组合查询 - Elasticsearch 权威指南中文版

返回介绍

组合查询

发布于 2019-07-04 字数 7104 浏览 848 评论 0

组合查询

在《组合过滤》中我们讨论了怎样用布尔过滤器组合多个用and, or, and not逻辑组成的过滤子句,在查询中, 布尔查询充当着相似的作用,但是有一个重要的区别。

过滤器会做一个判断: 是否应该将文档添加到结果集? 然而查询会做更精细的判断. 他们不仅决定一个文档是否要添加到结果集,而且还要计算文档的相关性(relevant).

像过滤器一样, 布尔查询接受多个用must, must_not, and should的查询子句. 例:

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}

在前面的查询中,凡是满足title字段中包含quick,但是不包含lazy的文档都会在查询结果中。到目前为止,布尔查询的作用非常类似于布尔过滤的作用。

should过滤器中有两个子句时不同的地方就体现出来了,下面例子就可以体现:一个文档不需要同时包含browndog,但如果同时有这两个词,这个文档的相关性就更高:

{
  "hits": [
     {
        "_id":      "3",
        "_score":   0.70134366, <1>
        "_source": {
           "title": "The quick brown fox jumps over the quick dog"
        }
     },
     {
        "_id":      "1",
        "_score":   0.3312608,
        "_source": {
           "title": "The quick brown fox"
        }
     }
  ]
}

文档3的得分更高,是因为它同时包含了`brown` 和 `dog`。

得分计算

布尔查询通过把所有符合mustshould的子句得分加起来,然后除以mustshould子句的总数为每个文档计算相关性得分。

must_not子句并不影响得分;他们存在的意义是排除已经被包含的文档。

精度控制

所有的 must 子句必须匹配, 并且所有的 must_not 子句必须不匹配, 但是多少 should 子句应该匹配呢? 默认的,不需要匹配任何 should 子句,一种情况例外:如果没有must子句,就必须至少匹配一个should子句。

像我们控制match查询的精度一样,我们也可以通过minimum_should_match参数控制多少should子句需要被匹配,这个参数可以是正整数,也可以是百分比。

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 <1>
    }
  }
}

这也可以用百分比表示

结果集仅包含title字段中有"brown" 和 "fox", "brown" 和 "dog", 或 "fox" 和 "dog"的文档。如果一个文档包含上述三个条件,那么它的相关性就会比其他仅包含三者中的两个条件的文档要高。

<!–
[[bool-query]]
=== Combining Queries
In <> we discussed how to(((“full text search”, “combining queries”))), use the `bool` filter to combine
multiple filter clauses with `and`, `or`, and `not` logic. In query land, the
`bool` query does a similar job but with one important difference.
Filters make a binary decision: should this document be included in the
results list or not? Queries, however, are more subtle. They decide not only
whether to include a document, but also how _relevant_ that document is.
Like the filter equivalent, the `bool` query accepts(((“bool query”))) multiple query clauses
under the `must`, `must_not`, and `should` parameters. For instance:
[source,js]
————————————————–
GET /my_index/my_type/_search
{
“query”: {
“bool”: {
“must”: { “match”: { “title”: “quick” }},
“must_not”: { “match”: { “title”: “lazy” }},
“should”: [
{ “match”: { “title”: “brown” }},
{ “match”: { “title”: “dog” }}
]
}
}
}
————————————————–
// SENSE: 100_Full_Text_Search/15_Bool_query.json
The results from the preceding query include any document whose `title` field
contains the term `quick`, except for those that also contain `lazy`. So
far, this is pretty similar to how the `bool` filter works.
The difference comes in with the two `should` clauses, which say that: a document
is _not required_ to contain (((“should clause”, “in bool queries”)))either `brown` or `dog`, but if it does, then
it should be considered _more relevant_:
[source,js]
————————————————–
{
“hits”: [
{
“_id”: “3”,
“_score”: 0.70134366,
“_source”: {
“title”: “The quick brown fox jumps over the quick dog”
}
},
{
“_id”: “1”,
“_score”: 0.3312608,
“_source”: {
“title”: “The quick brown fox”
}
}
]
}
————————————————–
Document 3 scores higher because it contains both `brown` and `dog`.
==== Score Calculation
The `bool` query calculates(((“relevance scores”, “calculation in bool queries”)))(((“bool query”, “score calculation”))) the relevance `_score` for each document by adding
together the `_score` from all of the matching `must` and `should` clauses,
and then dividing by the total number of `must` and `should` clauses.
The `must_not` clauses do not affect (((“must_not clause”, “in bool queries”)))the score; their only purpose is to
exclude documents that might otherwise have been included.
==== Controlling Precision
All the `must` clauses must match, and all the `must_not` clauses must not
match, but how many `should` clauses(((“bool query”, “controlling precision”)))(((“full text search”, “combining queries”, “controlling precision”)))(((“precision”, “controlling for bool query”))) should match? By default, none of the `should` clauses are required to match, with one
exception: if there are no `must` clauses, then at least one `should` clause
must match.
Just as we can control the <>,
we can control how many `should` clauses need to match by using the
`minimum_should_match` parameter,(((“minimum_should_match parameter”, “in bool queries”))) either as an absolute number or as a
percentage:
[source,js]
————————————————–
GET /my_index/my_type/_search
{
“query”: {
“bool”: {
“should”: [
{ “match”: { “title”: “brown” }},
{ “match”: { “title”: “fox” }},
{ “match”: { “title”: “dog” }}
],
“minimum_should_match”: 2
}
}
}
————————————————–
// SENSE: 100_Full_Text_Search/15_Bool_query.json
This could also be expressed as a percentage.
The results would include only documents whose `title` field contains `”brown”
AND “fox”`, `”brown” AND “dog”`, or `”fox” AND “dog”`. If a document contains
all three, it would be considered more relevant than those that contain
just two of the three.
–>

上一篇:多词查询

下一篇:布尔匹配

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

目前还没有任何评论,快来抢沙发吧!