Wildcard Regexp - Elasticsearch: The Definitive Guide (Chinese Edition)

Published 2019-07-04, 3530 words

=== wildcard and regexp Queries

The `wildcard` query is a low-level, term-based query ((("wildcard query")))((("partial matching", "wildcard and regexp queries")))similar in nature to the
`prefix` query, but it allows you to specify a pattern instead of just a prefix.
It uses the standard shell wildcards: `?` matches any single character, and `*`
matches zero or more characters.((("postcodes (UK), partial matching with", "wildcard queries")))

This query would match the documents containing `W1F 7HW` and `W2F 8HW`:

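[source,js]
--------------------------------------------------
GET /my_index/address/_search
{
    "query": {
        "wildcard": {
            "postcode": "W?F*HW"
        }
    }
}
--------------------------------------------------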

// SENSE: 130_Partial_Matching/15_Wildcard_regexp.json

The `?` matches the `1` and the `2`, while the `*` matches the space
followed by the `7` or the `8`.
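To make the wildcard semantics concrete, here is a minimal JavaScript sketch (not Elasticsearch code) that translates a shell-style pattern into an anchored regular expression and tests it against a few sample terms. It assumes each postcode is indexed as one untouched term, for example in a `not_analyzed` field; the helper name `wildcardToRegExp` and the sample postcodes are hypothetical.

```javascript
// Hypothetical helper: translate a shell-style wildcard pattern into a RegExp.
// Assumes each postcode is stored as a single, unmodified term.
function wildcardToRegExp(pattern) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "?") {
      source += ".";   // exactly one character
    } else if (ch === "*") {
      source += ".*";  // zero or more characters
    } else {
      // Escape every other character so it is matched literally.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor both ends so the pattern must describe the whole term.
  // The 's' (dotAll) flag lets '.' match any character, line breaks included.
  return new RegExp("^" + source + "$", "s");
}

const terms = ["W1F 7HW", "W2F 8HW", "WC1N 1LZ", "W1A 0AX"];
const re = wildcardToRegExp("W?F*HW");
const matches = terms.filter((t) => re.test(t));
console.log(matches); // → ["W1F 7HW", "W2F 8HW"]
```

`WC1N 1LZ` and `W1A 0AX` are rejected because the third character is not `F`.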

Imagine now that you want to match all postcodes just in the `W` area. A
`prefix` match would also include postcodes starting with `WC`, and you would
have a similar problem with a wildcard match. You want to match only postcodes
that begin with a `W`, followed by a number.((("postcodes (UK), partial matching with", "regexp query")))((("regexp query"))) The `regexp` query allows you to
write these more complicated patterns:

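[source,js]
--------------------------------------------------
GET /my_index/address/_search
{
    "query": {
        "regexp": {
            "postcode": "W[0-9].+"
        }
    }
}
--------------------------------------------------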

// SENSE: 130_Partial_Matching/15_Wildcard_regexp.json

The regular expression says that the term must begin with a `W`, followed
by a single digit from 0 to 9, followed by one or more characters of any kind.
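One detail matters when experimenting with these patterns outside Elasticsearch. Lucene regular expressions are anchored: the pattern must match the entire term. JavaScript's `RegExp` is not, so the sketch below wraps the pattern in `^(?:...)$` to approximate Lucene's behavior. This only holds for syntax both engines share, and the helper name `luceneStyleRegExp` and the sample postcodes are hypothetical.

```javascript
// Hypothetical helper: emulate Lucene's whole-term regexp matching.
// Lucene patterns must match the entire term; JavaScript's do not, so anchor.
function luceneStyleRegExp(pattern) {
  return new RegExp("^(?:" + pattern + ")$");
}

const postcodes = ["W1F 7HW", "W2F 8HW", "WC1N 1LZ", "SW1A 1AA", "W1"];

const anchored = luceneStyleRegExp("W[0-9].+");
const unanchored = new RegExp("W[0-9].+"); // plain JavaScript search semantics

const anchoredHits = postcodes.filter((p) => anchored.test(p));
const unanchoredHits = postcodes.filter((p) => unanchored.test(p));

console.log(anchoredHits);   // → ["W1F 7HW", "W2F 8HW"]
console.log(unanchoredHits); // → ["W1F 7HW", "W2F 8HW", "SW1A 1AA"]
```

The unanchored version wrongly accepts `SW1A 1AA` because the substring `W1A 1AA` matches. `W1` fails both versions because `.+` needs at least one more character.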

The `wildcard` and `regexp` queries work in exactly the same way as the
`prefix` query. They also have to scan the list of terms in the inverted
index to find all matching terms, and gather document IDs term by term. The
only difference between them and the `prefix` query is that they support more complex patterns.
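The scan-and-gather process can be modeled with a toy inverted index. In this sketch a `Map` stands in for the term dictionary and each term's array holds its document IDs (postings). Real Lucene term dictionaries are far more compact, and the function name `scanTermsAndCollectDocs` and the sample data are hypothetical.

```javascript
// Toy inverted index: unique term -> IDs of the documents containing it.
const invertedIndex = new Map([
  ["W1F 7HW", [1]],
  ["W1F 9QB", [4, 7]],
  ["W2F 8HW", [2]],
  ["WC1N 1LZ", [3]],
]);

// Hypothetical function: visit every term, test it against the pattern,
// and union the postings of every term that matches.
function scanTermsAndCollectDocs(index, matchesTerm) {
  const docIds = new Set();
  let termsVisited = 0;
  for (const [term, postings] of index) {
    termsVisited++;
    if (matchesTerm(term)) {
      for (const id of postings) docIds.add(id);
    }
  }
  return { docIds: [...docIds].sort((a, b) => a - b), termsVisited };
}

// Anchored pattern, mirroring Lucene's whole-term matching of W[0-9].+
const result = scanTermsAndCollectDocs(invertedIndex, (t) => /^W[0-9].+$/.test(t));
console.log(result); // → { docIds: [1, 2, 4, 7], termsVisited: 4 }
```

Note that `termsVisited` grows with the number of unique terms, not with the number of hits, which is the cost the next paragraph warns about.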

This means that the same caveats apply. Running these queries on a field with
many unique terms can be very resource intensive. Avoid patterns that start
with a wildcard (for example, `*foo` or, as a regexp, `.*foo`): with no literal
prefix to narrow the search, every term in the index has to be checked.
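The effect of a leading wildcard can be shown with a deliberately simplified model. Lucene actually intersects an automaton with its term dictionary, but the principle is similar: because terms are sorted, a pattern with a literal prefix can jump to the first candidate term and stop once terms no longer share that prefix, while a pattern starting with `*` has no prefix and must examine everything. The helper names and the 10,000 synthetic terms are hypothetical.

```javascript
// Literal characters before the first ? or * (hypothetical helper).
function literalPrefix(wildcardPattern) {
  const i = wildcardPattern.search(/[?*]/);
  return i === -1 ? wildcardPattern : wildcardPattern.slice(0, i);
}

// Binary search: index of the first term >= target in a sorted array.
function lowerBound(sorted, target) {
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Count how many terms would need to be tested against the full pattern.
function termsExamined(sortedTerms, wildcardPattern) {
  const prefix = literalPrefix(wildcardPattern);
  let count = 0;
  for (let i = lowerBound(sortedTerms, prefix); i < sortedTerms.length; i++) {
    if (!sortedTerms[i].startsWith(prefix)) break; // left the prefix range
    count++;
  }
  return count;
}

// 10,000 synthetic terms "term0000" ... "term9999" (zero-padded, so sorted).
const sortedTerms = Array.from({ length: 10000 }, (_, i) =>
  "term" + String(i).padStart(4, "0"));

console.log(termsExamined(sortedTerms, "term12*")); // → 100
console.log(termsExamined(sortedTerms, "*12"));     // → 10000
```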

Whereas prefix matching can be made more efficient by preparing your data at
index time, wildcard and regular expression matching can be done only
at query time. These queries have their place but should be used sparingly.
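As a rough illustration of what "preparing your data at index time" can mean for prefixes, this sketch indexes every leading fragment of each term (an edge n-gram style approach), so a prefix search becomes a single lookup rather than a term scan. The function names and sample documents are hypothetical. No comparable fixed set of fragments can be precomputed for arbitrary wildcard or regexp patterns, where `?`, `*`, or character classes may appear anywhere.

```javascript
// Hypothetical helper: all leading fragments of a term, e.g. "abc" -> a, ab, abc.
function edgeNgrams(term, minGram = 1, maxGram = term.length) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, term.length); n++) {
    grams.push(term.slice(0, n));
  }
  return grams;
}

// Index time: map every fragment to the set of document IDs containing it.
function buildPrefixIndex(docs) {
  const index = new Map();
  for (const [id, term] of docs) {
    for (const gram of edgeNgrams(term)) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(id);
    }
  }
  return index;
}

const docs = [[1, "W1F 7HW"], [2, "W2F 8HW"], [3, "WC1N 1LZ"]];
const prefixIndex = buildPrefixIndex(docs);

// Query time: a prefix query is now one map lookup.
console.log([...(prefixIndex.get("W1") ?? [])]); // → [1]
console.log([...(prefixIndex.get("W") ?? [])]);  // → [1, 2, 3]
```

The trade-off is a larger index in exchange for cheaper queries.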

The `prefix`, `wildcard`, and `regexp` queries operate on terms. If you use
them to query an analyzed field, they will examine each term in the
field, not the field as a whole.((("prefix query", "on analyzed fields")))((("wildcard query", "on analyzed fields")))((("regexp query", "on analyzed fields")))((("analyzed fields", "prefix, wildcard, and regexp queries on")))

For instance, let's say that our `title` field contains "Quick brown fox", which produces the terms `quick`, `brown`, and `fox`.

This query would match:

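[source,json]
--------------------------------------------------
{ "regexp": { "title": "br.*" }}
--------------------------------------------------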

But neither of these queries would match:

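[source,json]
--------------------------------------------------
{ "regexp": { "title": "Qu.*" }} <1>
{ "regexp": { "title": "quick br.*" }} <2>
--------------------------------------------------
<1> The term in the index is `quick`, not `Quick`.
<2> `quick` and `brown` are separate terms.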
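The behavior described above can be modeled in a few lines. This sketch assumes a simplified analyzer that lowercases the text and splits it on whitespace, which for "Quick brown fox" gives the same terms as the standard analyzer, and it applies each regexp to every term separately with whole-term anchoring. The function names are hypothetical.

```javascript
// Hypothetical simplified analyzer: lowercase, then split on whitespace.
function simpleAnalyze(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// A term-level regexp matches the field if it matches any single whole term.
function regexpMatchesField(pattern, fieldText) {
  const re = new RegExp("^(?:" + pattern + ")$"); // Lucene-style anchoring
  return simpleAnalyze(fieldText).some((term) => re.test(term));
}

const title = "Quick brown fox";
console.log(simpleAnalyze(title));                    // → ["quick", "brown", "fox"]
console.log(regexpMatchesField("br.*", title));       // → true
console.log(regexpMatchesField("Qu.*", title));       // → false (terms are lowercase)
console.log(regexpMatchesField("quick br.*", title)); // → false (no term contains a space)
```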

