Friday, June 19, 2020

Elasticsearch - how many indices and shards?

Elastic.co provides many best practices regarding Elasticsearch configurations. A lot of the decisions around how to best distribute your data across indices and shards will however depend on the use-case specifics, and it can sometimes be hard to determine how to best apply the advice available.

Use multiple indexes.
The Elastic Stack usually creates daily indices by default, which is a good practice. You can then use aliases to limit the scope of searches to specific date ranges, use Curator to remove old indices as they age, and modify index settings as your data grows without having to reindex the old data.

Data with a longer retention period, especially if the daily volumes do not warrant daily indices, often uses weekly or monthly indices in order to keep the shard size up.

It is now possible to switch to a new index at a specific size, which makes it possible to more easily achieve an even shard size for all indices.

Avoid big index and big shard.
If a shard is larger than 40% of the size of a data node, that shard is probably too big. Shards should be no larger than 50GB. If a shard grows beyond that, reindex into an index with more shards.
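As a rough sketch of the sizing rule above, the number of primary shards for an index can be estimated from its expected size and the 50GB ceiling. The function name and the example size are hypothetical, and the estimate ignores growth and replicas.

```python
import math

MAX_SHARD_GB = 50  # upper bound on shard size quoted above


def primary_shards_needed(index_size_gb, max_shard_gb=MAX_SHARD_GB):
    """Smallest number of primary shards that keeps each shard at or below max_shard_gb."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return max(1, math.ceil(index_size_gb / max_shard_gb))


# A 180GB index needs ceil(180 / 50) primary shards.
print(primary_shards_needed(180))  # 4
```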

Avoid too many indexes and shards.
Having a large number of indices and shards in a cluster can result in a large cluster state, especially if mappings are large. The cluster state can then become slow to update, because all updates go through a single thread to guarantee consistency before the changes are distributed across the cluster.

In order to reduce the number of indices and avoid large and sprawling mappings, consider storing data with similar structure in the same index.

Indices and shards are not free from a cluster perspective: each index and shard carries some level of resource overhead. The more heap space a node has, the more data and shards it can handle.

Small shards result in small segments, which increases overhead. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.

The number of shards you can hold on a node is proportional to the amount of heap you have available. Keep the number of shards per node at or below 20 per GB of heap: with a 10GB heap, a data node should not hold more than 200 shards.
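The heap rule of thumb is simple arithmetic. A minimal sketch, with hypothetical function names, that checks a node against the 20-shards-per-GB budget:

```python
SHARDS_PER_GB_HEAP = 20  # rule of thumb quoted above


def max_shards_for_node(heap_gb):
    """Upper bound on shards for a data node with the given heap size in GB."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def node_is_over_budget(shard_count, heap_gb):
    """True when the node holds more shards than the rule of thumb allows."""
    return shard_count > max_shards_for_node(heap_gb)


print(max_shards_for_node(10))       # 200
print(node_is_over_budget(250, 10))  # True
```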

Manage the index lifecycle.
  • Use the rollover API to avoid shards that are too large or too small when volumes are unpredictable. It rolls an alias over to a new index when the existing index meets one of the rollover conditions, such as size, age, or document count.
  • Use the shrink API to shrink an existing index into a new index with fewer primary shards.
  • Force merge: Reduce the number of index segments and purge deleted documents. Makes the index read-only.
  • Freeze the index to minimize its memory footprint.
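To make the rollover conditions concrete, here is a sketch that only builds the request for the rollover API and does not send it. The alias name `logs-write` and the threshold values are assumptions for illustration; `max_size`, `max_age`, and `max_docs` are the condition names for size, age, and document count.

```python
import json


def rollover_request(alias, max_size=None, max_age=None, max_docs=None):
    """Describe a POST /<alias>/_rollover call with the given conditions."""
    conditions = {}
    if max_size is not None:
        conditions["max_size"] = max_size
    if max_age is not None:
        conditions["max_age"] = max_age
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    if not conditions:
        raise ValueError("at least one rollover condition is required")
    return {
        "method": "POST",
        "path": f"/{alias}/_rollover",
        "body": {"conditions": conditions},
    }


# Hypothetical alias and thresholds: roll over at 50GB or after 7 days.
request = rollover_request("logs-write", max_size="50gb", max_age="7d")
print(request["method"], request["path"])
print(json.dumps(request["body"], indent=2))
```

The index rolls over as soon as any one of the listed conditions is met, so a size condition paired with an age condition caps both shard size and index age.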


Tuesday, June 16, 2020

Elasticsearch - What is a shard?

In Elasticsearch (ES), an index is comparable to a table in an RDBMS, and the set of indices in a cluster is comparable to a database or catalog. Data in Elasticsearch is stored in one or more indices. Data in an index is partitioned across shards to make storage more manageable.

Sharding splits your index data into a number of chunks so that searches can operate on multiple parts in parallel.

Each shard has a state that needs to be kept in memory for fast access. The more shards you use, the more overhead can build up and affect resource usage and performance.

Each shard is replicated based on the number_of_replicas setting for the index.

A shard can have one or more replicas, but it is important not to have too many. The primary shard is the main shard: it handles the indexing of documents and can also process queries. Replica shards process queries but do not index documents directly.

Replica shards must reside on a different host than their primary.

By default, shards are automatically spread across the hosts in the cluster, but multiple primary shards can be placed on the same physical host.

Shards cannot be further divided. Each individual shard must reside on only one host.

The number of shards in an index can be set when the index is created; otherwise a global default is used. Once the index is created, the number of shards cannot be changed without reindexing.

The number of replicas for an index can likewise be set when the index is created; otherwise a global default is used. Unlike the number of shards, it can be changed after the index is created.
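The difference between the two settings can be sketched as request bodies. The helpers below only build the requests and do not send them; the index name `my-index` and the shard counts are hypothetical, while `number_of_shards` and `number_of_replicas` are the index settings involved.

```python
def create_index_request(index, shards, replicas):
    """Describe a create-index call; the shard count is fixed from this point on."""
    return {
        "method": "PUT",
        "path": f"/{index}",
        "body": {
            "settings": {
                "index": {
                    "number_of_shards": shards,
                    "number_of_replicas": replicas,
                }
            }
        },
    }


def update_replicas_request(index, replicas):
    """Describe a settings update; replicas can change on an existing index."""
    return {
        "method": "PUT",
        "path": f"/{index}/_settings",
        "body": {"index": {"number_of_replicas": replicas}},
    }


def total_shard_copies(shards, replicas):
    """Primaries plus all replica copies that the cluster must host."""
    return shards * (1 + replicas)


# Hypothetical index with 3 primary shards and 1 replica of each.
print(create_index_request("my-index", 3, 1)["body"])
print(update_replicas_request("my-index", 2)["path"])
print(total_shard_copies(3, 1))  # 6
```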


From intern to CEO

The messages and experiences from Enrique Lores, CEO of HP Inc. are so inspiring.

Enrique started his career as an intern at HP and rose all the way to CEO over the past 30+ years. He shared that passion, consistent learning, and thinking are the key factors in his success. This lines up with the habits of delivering happiness, reading books, and thinking for 10 minutes every day.

When talking about the impact of the coronavirus pandemic, he mentioned that it will change the mindset of many CEOs about the WFH opportunity, and that manufacturers will need to globalize their supply chains, among other changes.

He used a garage background in the video call, and that garage was the place where Hewlett and Packard began their company in Palo Alto. This shows his love for HP. If we want to succeed, we need to love the work and be passionate about it.

He also talked about caring for the community and the company, the DNA of the company, diversity, innovation, work and life being equally important, and many other insights.

When talking about his leadership style, Enrique shared the following points:
  • imagination and belief
  • risk management
  • strategy and execution balance
  • team work
Soft with people, but tough with problems. This is my key takeaway from his sharing. This is impressive.

Tuesday, June 9, 2020

RSU 101

RSUs (Restricted stock units) are a form of stock-based employee compensation. RSUs give an employee an incentive to stay with a company long term and help it perform well so that their shares increase in value.

An RSU has a grant date and a vest date. In the Bay Area, an employer usually grants an amount of RSUs with a vesting period of four years. Once vested, the RSUs are just like any other shares of company stock.

Unlike ESPP or stock options, there is no tax advantage to holding vested RSUs.

There is likewise no tax reason to hold RSU shares after the vesting date, because RSUs are taxed as they vest. The employer will withhold federal and state income tax on RSU income at the mandatory “supplemental” withholding rates, which are different from regular income tax withholding rates. For tax purposes the entire value of vested RSUs must be included as ordinary income in the year of vesting.
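The arithmetic behind this can be sketched in a few lines. The share count, share price, and withholding rate below are made-up illustration values, not figures from this post, and the function names are hypothetical; the actual rates depend on your situation.

```python
from decimal import Decimal


def rsu_vest_income(shares, price_at_vest):
    """Ordinary income recognized at vesting: shares times the price on the vest date."""
    return Decimal(shares) * Decimal(price_at_vest)


def withholding(income, rate):
    """Amount withheld at a flat rate, rounded to cents."""
    return (income * Decimal(rate)).quantize(Decimal("0.01"))


# Assumed example: 100 shares vest at $50.00, withheld at an assumed 22% flat rate.
income = rsu_vest_income(100, "50.00")
withheld = withholding(income, "0.22")
print(income)    # 5000.00
print(withheld)  # 1100.00
```

Because the full vest-date value is already counted as income, selling right at vesting leaves little or no further gain to tax, which is the point made above.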

RSUs aren't eligible for the Internal Revenue Code (IRC) 83(b) Election, which allows an employee to pay tax before vesting, as the Internal Revenue Service (IRS) doesn't consider them tangible property.

With that, it’s best to sell your vested RSU shares as soon as they vest, and add the proceeds to your well-diversified investment portfolio.

However, if you believe your company stock price will go up, you can choose to hold it to save your keyboard typing time. And, if you are considered a company insider or possess material non-public information about the company, you may need to hold your RSU shares until you are no longer in danger of violating insider-trading laws.