Intro - Elasticsearch: The Definitive Guide (Chinese edition)


Published 2019-07-04

[[language-intro]]
== Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide
good, basic, out-of-the-box ((("language analyzers")))((("languages", "getting started with")))support for many of the world's most common
languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese,
Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek,
Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish,
Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish,
Turkish, and Thai.

These analyzers typically((("language analyzers", "roles performed by"))) perform four roles:

  • Tokenize text into individual words:
      The quick brown foxes -> [The, quick, brown, foxes]
  • Lowercase tokens:
      The -> the
  • Remove common stopwords:
      [The, quick, brown, foxes] -> [quick, brown, foxes]
  • Stem tokens to their root form:
      foxes -> fox
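
The four roles above can be sketched as a small pipeline. This is a toy illustration in Python, not how Elasticsearch implements its analyzers: the function names (`tokenize`, `naive_stem`, `analyze`), the short stopword list, and the plural-stripping rule are all assumptions made for the example, and a real stemmer handles far more cases.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# real analyzers ship longer, language-specific lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to"}


def tokenize(text):
    """Split text into runs of letters, dropping punctuation and digits."""
    return re.findall(r"[^\W\d_]+", text)


def naive_stem(token):
    """Toy suffix stripper for English plurals only; not a real stemmer."""
    if token.endswith("es") and token[:-2].endswith(("x", "s", "z", "ch", "sh")):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Apply the four roles in order: tokenize, lowercase, stop, stem."""
    tokens = tokenize(text)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

Lowercasing runs before stopword removal in this sketch so that `The` matches the lowercase entry `the` in the stopword list.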

Each analyzer may also apply other transformations specific to its language in
order to make words from that((("language analyzers", "other transformations specific to the language"))) language more searchable:

  • The english analyzer ((("english analyzer")))removes the possessive 's:
      John's -> john
  • The french analyzer ((("french analyzer")))removes elisions like l' and qu' and
    diacritics like ¨ or ^:
      l'église -> eglis
  • The german analyzer normalizes((("german analyzer"))) terms, replacing ä and ae with a, or
    ß with ss, among others:
      äußerst -> ausserst
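
The language-specific transformations listed above might look roughly like the following. Again this is a hedged sketch in Python with hypothetical helper names, not the filters Elasticsearch actually uses; it covers only the rules mentioned in the text and leaves out stemming, which is why the French example yields `eglise` here rather than the fully stemmed `eglis`.

```python
import re
import unicodedata


def english_possessive(token):
    """Drop a trailing possessive 's (straight or curly apostrophe)."""
    return re.sub(r"['’]s$", "", token)


def french_elision(token):
    """Drop a leading elision such as l' or qu' (only these two, for brevity)."""
    return re.sub(r"^(?:l|qu)['’]", "", token)


def fold_diacritics(token):
    """Decompose accented letters and discard the combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_normalize(token):
    """Apply only the replacements named in the text: ß -> ss, ä -> a, ae -> a."""
    return token.replace("ß", "ss").replace("ä", "a").replace("ae", "a")


print(english_possessive("John's".lower()))         # john
print(fold_diacritics(french_elision("l'église")))  # eglise
print(german_normalize("äußerst"))                  # ausserst
```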
