
[[time-based]]
=== Time-Based Data

One of the most common use cases for Elasticsearch is logging,((("logging", "using Elasticsearch for")))((("time-based data")))((("scaling", "time-based data and"))) so common
in fact that Elasticsearch provides an integrated((("ELK stack"))) logging platform called the
ELK stack (Elasticsearch, Logstash, and Kibana) to make the process easy.

http://www.elasticsearch.org/overview/logstash[Logstash] collects, parses, and
enriches logs before indexing them into Elasticsearch.((("Logstash"))) Elasticsearch acts as
a centralized logging server, and
http://www.elasticsearch.org/overview/kibana[Kibana] is a((("Kibana"))) graphical frontend
that makes it easy to query and visualize what is happening across your
network in near real time.

Most traditional use cases for search engines involve a relatively static
collection of documents that grows slowly. Searches look for the most relevant
documents, regardless of when they were created.

Logging (and other time-based data streams such as social-network activity) is very different in nature.((("social-network activity"))) The number of documents in the index grows
rapidly, often accelerating with time. Documents are almost never updated,
and searches mostly target the most recent documents. As documents age, they
lose value.

We need to adapt our index design to function with the flow of time-based
data.

[[index-per-timeframe]]
==== Index per Time Frame

If we were to have one big index for documents of this type, we would soon run
out of space. Logging events just keep on coming, without pause or
interruption. We could delete old events with a delete-by-query:

[source,json]
----
DELETE /logs/event/_query
{
    "query": {
        "range": {
            "@timestamp": {
                "lt": "now-90d"
            }
        }
    }
}
----

Deletes all documents where Logstash's `@timestamp` field is
older than 90 days.

But this approach is very inefficient. Remember that when you delete a
document, it is only marked as deleted (see <<deletes-and-updates>>). It won’t
be physically deleted until the segment containing it is merged away.

Instead, use an index per time frame.((("indices", "index per-timeframe"))) You could start out with an index per
year (`logs_2014`) or per month (`logs_2014-10`). Perhaps, when your
website gets really busy, you need to switch to an index per day
(`logs_2014-10-24`). Purging old data is easy: just delete old indices.
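
Dropping a whole month of data, for example, is a single request that removes the entire index, segments and all, in one step (the index name here is illustrative):

[source,json]
----
DELETE /logs_2014-09
----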

This approach has the advantage of allowing you to scale as and when you need
to. You don’t have to make any difficult decisions up front. Every day is a
new opportunity to change your indexing time frames to suit the current demand.
Apply the same logic to how big you make each index. Perhaps all you need is
one primary shard per week initially. Later, maybe you need five primary shards
per day. It doesn’t matter; you can adjust to new circumstances at any
time.
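
As a sketch of what such an adjustment looks like, each new time frame can be created with whatever settings the current load demands (the index name and shard counts below are illustrative):

[source,json]
----
PUT /logs_2014-10-25
{
    "settings": {
        "number_of_shards":   5,
        "number_of_replicas": 1
    }
}
----

In practice, you would automate this with an index template rather than creating each index by hand.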

Aliases can help make switching indices more transparent.((("aliases, index"))) For indexing,
you can point `logs_current` to the index currently accepting new log events,
and for searching, update `last_3_months` to point to all indices for the
previous three months:

[source,json]
----
POST /_aliases
{
    "actions": [
        { "add":    { "alias": "logs_current",  "index": "logs_2014-10" }},
        { "remove": { "alias": "logs_current",  "index": "logs_2014-09" }},
        { "add":    { "alias": "last_3_months", "index": "logs_2014-10" }},
        { "remove": { "alias": "last_3_months", "index": "logs_2014-07" }}
    ]
}
----

The first pair of actions switches `logs_current` from September to October;
the second pair adds October to `last_3_months` and removes July.
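
Once the aliases are in place, applications can read and write without knowing the real index names. A minimal usage sketch, assuming an `event` type and an illustrative document:

[source,json]
----
PUT /logs_current/event/1
{
    "@timestamp": "2014-10-24T12:00:00Z",
    "message":    "user logged in"
}

GET /last_3_months/_search
{
    "query": {
        "match": { "message": "logged" }
    }
}
----

Note that indexing through `logs_current` works only while that alias points at a single index, whereas a search alias like `last_3_months` can span as many indices as it covers.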
