Intro - Elasticsearch: The Definitive Guide (Chinese Edition)



Published 2019-07-04

== Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide
good, basic, out-of-the-box ((("language analyzers")))((("languages", "getting started with")))support for many of the world's most common languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese,
Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek,
Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish,
Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish,
Turkish, and Thai.

These analyzers typically((("language analyzers", "roles performed by"))) perform four roles:

  • Tokenize text into individual words:
    ◦ The quick brown foxes -> [The, quick, brown, foxes]
  • Lowercase tokens:
    ◦ The -> the
  • Remove common stopwords:
    ◦ [The, quick, brown, foxes] -> [quick, brown, foxes]
  • Stem tokens to their root form:
    ◦ foxes -> fox
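The four roles above can be sketched as a toy pipeline in Python. This is a minimal illustration, not how Elasticsearch implements its analyzers. The stopword set and the suffix-stripping `stem` function are hypothetical simplifications, chosen only so the sketch reproduces the examples above.

```python
import re

# Hypothetical, tiny stopword set; real language analyzers ship much longer lists.
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to"})

def tokenize(text: str) -> list[str]:
    """Split text into runs of word characters."""
    return re.findall(r"\w+", text)

def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]

def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOPWORDS]

def stem(token: str) -> str:
    """Toy suffix stripper (hypothetical), not a real stemming algorithm."""
    # "-es" after a sibilant ending: foxes -> fox, glasses -> glass
    if token.endswith("es") and token[:-2].endswith(("s", "x", "z", "ch", "sh")):
        return token[:-2]
    # Plain plural "-s", but leave words ending in "ss" (e.g. "glass") alone.
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text: str) -> list[str]:
    tokens = tokenize(text)            # ['The', 'quick', 'brown', 'foxes']
    tokens = lowercase(tokens)         # ['the', 'quick', 'brown', 'foxes']
    tokens = remove_stopwords(tokens)  # ['quick', 'brown', 'foxes']
    return [stem(t) for t in tokens]   # ['quick', 'brown', 'fox']

print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

In this sketch lowercasing runs before stopword removal, so `The` matches the lowercase entry `the` in the stopword set.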

Each analyzer may also apply other transformations specific to its language in
order to make words from that((("language analyzers", "other transformations specific to the language"))) language more searchable:

  • The english analyzer ((("english analyzer")))removes the possessive 's:
    ◦ John's -> john
  • The french analyzer ((("french analyzer")))removes elisions like l' and qu' and
    diacritics like ¨ or ^:
    ◦ l'église -> eglis
  • The german analyzer normalizes((("german analyzer"))) terms, replacing ä and ae with a, or
    ß with ss, among others:
    ◦ äußerst -> ausserst
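The language-specific transformations above can be approximated per token. This Python sketch is a rough stand-in, not the Elasticsearch implementation. Its assumptions are as follows. The elision list is an abbreviated, hypothetical subset. `strip_diacritics` folds accents with Unicode NFD decomposition. Dropping a final `e` is a crude placeholder for a real French stemmer. The German rules cover only ä/ae, ö/oe, ü/ue, and ß, in simplified form.

```python
import re
import unicodedata

# Hypothetical, abbreviated list of elided French prefixes.
FRENCH_ELISIONS = ("l'", "m'", "t'", "qu'", "n'", "s'", "j'", "d'", "c'")

def english_token(token: str) -> str:
    """Lowercase and strip a trailing possessive 's."""
    token = token.lower()
    return token[:-2] if token.endswith("'s") else token

def strip_diacritics(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def french_token(token: str) -> str:
    """Lowercase, remove one elided prefix, fold diacritics, crude stem."""
    token = token.lower()
    for prefix in FRENCH_ELISIONS:
        if token.startswith(prefix):
            token = token[len(prefix):]
            break
    token = strip_diacritics(token)
    # Placeholder for a real French stemmer (assumption): drop a final "e".
    if len(token) > 4 and token.endswith("e"):
        token = token[:-1]
    return token

def german_token(token: str) -> str:
    """Lowercase and apply simplified German normalization rules."""
    token = token.lower().replace("ß", "ss")
    for umlaut, plain in (("ä", "a"), ("ö", "o"), ("ü", "u")):
        token = token.replace(umlaut, plain)
    token = token.replace("ae", "a").replace("oe", "o")
    # "ue" -> "u", except after a vowel or "q" (so "quelle" is left alone).
    return re.sub(r"(?<![aeiouq])ue", "u", token)

for word, fn in (("John's", english_token), ("l'église", french_token),
                 ("äußerst", german_token)):
    print(word, "->", fn(word))
# John's -> john
# l'église -> eglis
# äußerst -> ausserst
```

Each function lowercases first, so its output matches the lowercase forms in the examples above. The German function only normalizes characters and does no stemming.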
