Using - Elasticsearch 权威指南中文版

返回介绍

Using

发布于 2019-07-04 字数 2672 浏览 823 评论 0

[[using-language-analyzers]]
=== Using Language Analyzers

The built-in language analyzers are available globally and don’t need to be
configured before being used.(((“language analyzers”, “using”))) They can be specified directly in the field
mapping:

[source,js]

PUT /my_index { “mappings”: { “blog”: { “properties”: { “title”: { “type”: “string”, “analyzer”: “english” } } } } }

The `title` field will use the `english` analyzer instead of the default
`standard` analyzer.

Of course, by passing (((“english analyzer”, “information lost with”)))text through the english analyzer, we lose
information:

[source,js]

GET /my_index/_analyze?field=title I’m not happy about the foxes

Emits token: `i’m`, `happi`, `about`, `fox`

We can’t tell if the document mentions one fox or many foxes; the word
not is a stopword and is removed, so we can’t tell whether the document is
happy about foxes or not. By using the english analyzer, we have increased
recall as we can match more loosely, but we have reduced our ability to rank
documents accurately.

To get the best of both worlds, we can use <<multi-fields,multifields>> to
index the title field twice: once(((“multifields”, “using to index a field with two different analyzers”))) with the english analyzer and once with
the standard analyzer:

[source,js]

PUT /my_index { “mappings”: { “blog”: { “properties”: { “title”: { “type”: “string”, “fields”: { “english”: { “type”: “string”, “analyzer”: “english” } } } } } } }

The main `title` field uses the `standard` analyzer.
The `title.english` subfield uses the `english` analyzer.

With this mapping in place, we can index some test documents to demonstrate
how to use both fields at query time:

[source,js]

PUT /my_index/blog/1
{ “title”: “I’m happy for this fox” }

PUT /my_index/blog/2
{ “title”: “I’m not happy about my fox problem” }

GET /_search { “query”: { “multi_match”: { “type”: “most_fields”, “query”: “not happy foxes”, “fields”: [ “title”, “title.english” ] } } }

Use the <> query type to match the
same text in as many fields as possible.

Even (((“most fields queries”)))though neither of our documents contain the word foxes, both documents
are returned as results thanks to the word stemming on the title.english
field. The second document is ranked as more relevant, because the word not
matches on the title field.

上一篇:Intro

下一篇:Configuring

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

目前还没有任何评论,快来抢沙发吧!