Shard - Elasticsearch: The Definitive Guide (Chinese edition)



Published 2019-07-04

=== The Unit of Scale

In <>, we explained that a shard is a Lucene index and that
an Elasticsearch index is a collection of shards.((("scaling", "shard as unit of scale"))) Your application talks to an
index, and Elasticsearch routes your requests to the appropriate shards.

A shard is the unit of scale.((("shards", "as unit of scale"))) The smallest index you can have is one with a
single shard. This may be more than sufficient for your needs (a single
shard can hold a lot of data), but it limits your ability to scale.

Imagine that our cluster consists of one node, and in our cluster we have one
index, which has only one shard:


    PUT /my_index
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }

Create an index with one primary shard and zero replica shards.

This setup may be small, but it serves our current needs and is cheap to run.

At the moment we are talking about only primary shards.((("primary shards"))) We discuss
replica shards in <>.


One glorious day, the Internet discovers us, and a single node just can’t keep up with
the traffic. We decide to add a second node, as per <>. What happens?

.An index with one shard has no scale factor
image::images/elas_4401.png["An index with one shard has no scale factor"]

The answer is: nothing. Because we have only one shard, there is nothing to
put on the second node. We can’t increase the number of primary shards in the
index, because that number is an important element in the algorithm used to
<<routing-value,route documents to shards>>:

    shard = hash(routing) % number_of_primary_shards
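
To make the consequence of this formula concrete, here is a minimal Python sketch. It assumes `zlib.crc32` as a stand-in hash (it is not the hash function Elasticsearch uses) and hypothetical document IDs `doc-0` to `doc-999` as routing values. The helpers `shard_for` and `moved_fraction` are illustrative names, not part of any Elasticsearch API.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing) % number_of_primary_shards."""
    # crc32 is only a stable stand-in; it is not Elasticsearch's hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_fraction(routing_values, old_shards: int, new_shards: int) -> float:
    """Fraction of routing values that map to a different shard number."""
    moved = sum(
        1
        for routing in routing_values
        if shard_for(routing, old_shards) != shard_for(routing, new_shards)
    )
    return moved / len(routing_values)


doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document IDs

# With one primary shard, every document is stored on shard 0.
assert all(shard_for(doc_id, 1) == 0 for doc_id in doc_ids)

# If the divisor changed to 2, a share of the documents would be looked up
# on shard 1, although they were all stored on shard 0.
print(f"looked up on the wrong shard: {moved_fraction(doc_ids, 1, 2):.0%}")
```

Under these assumptions, any change to the divisor makes some existing documents resolve to a shard that does not hold them, which is why the shard count is treated as fixed here.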

Our only option now is to reindex our data into a new, bigger index that has
more shards, but that will take time that we can ill afford. By planning
ahead, we could have avoided this problem completely by overallocating.
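
The difference between one shard and several can be sketched in a few lines of Python. This is a simplification under stated assumptions: shards are placed round-robin, which is not how Elasticsearch actually allocates them. The function `spread_shards`, the node names, and the choice of four shards are all hypothetical.

```python
from typing import Dict, List


def spread_shards(number_of_primary_shards: int, nodes: List[str]) -> Dict[str, List[int]]:
    """Place shard numbers on nodes round-robin (a simplified allocation model)."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for shard in range(number_of_primary_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return placement


# One shard: the second node has nothing to hold.
print(spread_shards(1, ["node-1", "node-2"]))
# {'node-1': [0], 'node-2': []}

# More shards than nodes at the start: a new node can take its share.
print(spread_shards(4, ["node-1", "node-2"]))
# {'node-1': [0, 2], 'node-2': [1, 3]}
```

In this model, an index created with more primary shards than it first needs can spread onto a second node without being reindexed.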



