Tuesday, June 16, 2020

Elasticsearch - What is a shard?

In Elasticsearch (ES), an index is roughly comparable to a table in a relational database (RDBMS), and the set of indices in a cluster is comparable to a database/catalog. Data in Elasticsearch is stored in one or more indices. The data in each index is partitioned across shards to make storage more manageable.

Sharding means splitting your index data into a number of chunks so that searches can operate on multiple parts in parallel.
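
To make the idea of partitioning concrete, here is a minimal sketch of how documents could be assigned to shards by hashing a routing key. Elasticsearch uses its own hash function and routing formula internally; the CRC32 hash, the function names pick_shard and partition, and the document IDs below are all stand-ins chosen for illustration.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number.

    CRC32 is only a stand-in hash for this sketch; it is not the
    hash function Elasticsearch uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def partition(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Hypothetical document IDs, spread over 3 shards.
doc_ids = [f"doc-{i}" for i in range(10)]
shards = partition(doc_ids, 3)
for shard_number, ids in shards.items():
    print(shard_number, ids)
```

Because the shard number depends on the total number of shards, the same document would generally land on a different shard if that total changed.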

Each shard has a state that needs to be kept in memory for fast access. The more shards you use, the more overhead can build up and affect resource usage and performance.

Each shard is replicated based on the number_of_replicas setting for the index.

A shard can have one or more replicas, but it is important not to have too many. The primary shard is the main shard: it handles the indexing of documents and can also process queries. Replica shards process queries but do not index documents directly.
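
Since number_of_replicas applies to every primary shard, the total number of shard copies in an index grows quickly. The arithmetic can be sketched as follows; the function name total_shard_copies and the example values are made up for illustration.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard exists once, plus one copy per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Example: 5 primary shards, each with 1 replica.
print(total_shard_copies(5, 1))  # 10
```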

Replica shards must reside on a different host than their primary.

By default, shards are automatically spread across the hosts in the cluster, but multiple primary shards can be placed on the same physical host.

Shards cannot be further divided. Each individual shard must reside on exactly one host.
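
The placement rules above can be expressed as a small check. This is only a sketch of the rule "a replica must not share a host with its primary", not how Elasticsearch's allocator is implemented; the ShardCopy type, the placement_violations function, and the host names are hypothetical.

```python
from collections import namedtuple

# One copy of a shard: its number, its role, and the single host it lives on.
ShardCopy = namedtuple("ShardCopy", ["shard", "role", "host"])


def placement_violations(copies):
    """Return the shard numbers whose replica shares a host with the primary."""
    primary_host = {c.shard: c.host for c in copies if c.role == "primary"}
    return sorted(
        {
            c.shard
            for c in copies
            if c.role == "replica" and primary_host.get(c.shard) == c.host
        }
    )


layout = [
    ShardCopy(0, "primary", "host-a"),
    ShardCopy(1, "primary", "host-a"),  # two primaries on one host is allowed
    ShardCopy(0, "replica", "host-b"),  # fine: different host than primary 0
    ShardCopy(1, "replica", "host-a"),  # breaks the rule: same host as primary 1
]
print(placement_violations(layout))  # [1]
```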

The number of shards for an index can be set during index creation; otherwise a global default is used. Once the index is created, the number of shards cannot be changed without reindexing.

The number of replicas for an index can also be set during index creation; otherwise a global default is used. Unlike the number of shards, this can be changed after the index is created.
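
As a rough illustration, the two settings appear in request bodies like the ones built below. The sketch only constructs the JSON and sends nothing; the helper names and the values 3, 1 and 2 are made up. The first body would go with an index-creation request (PUT on the index name), the second with a later settings update (PUT on the index's _settings endpoint).

```python
import json


def create_index_body(number_of_shards: int, number_of_replicas: int) -> dict:
    """Settings supplied once, when the index is created."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def update_replicas_body(number_of_replicas: int) -> dict:
    """Settings update for an existing index: only the replica count changes."""
    return {"index": {"number_of_replicas": number_of_replicas}}


# Body for creating an index with 3 shards and 1 replica per shard.
print(json.dumps(create_index_body(3, 1), indent=2))

# Body for raising the replica count to 2 later on.
print(json.dumps(update_replicas_body(2), indent=2))
```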

