Aliased index rebuilds in Elasticsearch

Aliases are really useful tools in Elasticsearch. They allow you to create something that looks and acts like an index, but is really a pointer to another index (or multiple indices).

One really handy use of aliases is in doing full-index rebuilds. There are a handful of reasons you may need to do a rebuild:

  1. You only rebuild an index periodically – either on a schedule or in response to an event (such as a data import).
  2. You do online updates to your index, but want to periodically rebuild it to keep it in sync with the system of record in cases where you might miss new/updated data for various reasons.
  3. You do online updates to your index, but want to do a full rebuild in background in response to certain events, because handling them online would be inefficient or cause delays to real-time events.
  4. You want to change your index in a way that is not backwards compatible (a mapping change, for example), and do so without any downtime.

The general idea behind an aliased index rebuild is that you never query an index directly – instead, you query an alias that points to the real index. When you need to rebuild the index, you create a new index, populate it, and then point your alias at it.

There are several advantages to doing this:

  1. Your index can be rebuilt as often as you’d like, with no downtime – even if it takes a long time to rebuild.
  2. You can keep a history (space allowing) of past index instances – if there is a problem with new data, you can easily roll back to a previous index.

There are two main strategies for how you structure your backing indexes:

  1. Rolling index. You choose to have 2 or more backing indices. The alias starts out on index 0. Each subsequent rebuild increments the index counter until it reaches the number of indices you want to keep, then rolls over and starts again at 0. So, for example, if you chose to only have 2 indices (one hot, one cold), the first time you built your index, the alias would point to index 0. When you rebuilt the index, index 1 would be created, and once it was populated, the alias would point to it. When you rebuilt it a second time, index 0 would be deleted and rebuilt with new data, then the alias would switch from 1 to 0.
  2. Time-based. You start with an index that has some element of time in its name (for example, 2016-04-09) and then create new ones also based on time. The alias always points at the latest-built one. Pruning old indices when they are no longer needed is done by a separate process.
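
For the time-based strategy, the naming and pruning rules fit in a few lines. The sketch below is only an illustration: the class `TimeBasedIndices`, its method names, and the `keep` parameter are made up for this example, and it assumes every backing index name ends in an ISO date like the 2016-04-09 example above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the sample project.
final class TimeBasedIndices {
    private TimeBasedIndices() {}

    // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), e.g. cars-2016-04-09.
    static String nameFor(String alias, LocalDate buildDate) {
        return alias + "-" + buildDate;
    }

    // Returns the backing indices a pruning job could delete, keeping the newest `keep`.
    // Assumes all names share one alias prefix, so ISO dates sort chronologically as strings.
    static List<String> toPrune(List<String> backingIndices, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be at least 1");
        }
        List<String> sorted = new ArrayList<>(backingIndices);
        Collections.sort(sorted);
        int cut = Math.max(0, sorted.size() - keep);
        return new ArrayList<>(sorted.subList(0, cut));
    }
}
```

Keeping at least one index in `toPrune` is a deliberate guard: the newest index is the one the alias should be pointing at.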

We’re going to look at how to do the first one – using a rolling index – in this example. Sample code that you can run can be found here. The steps are roughly as follows:

  1. Get a lock to make sure no other process can also attempt to rebuild the index. Two concurrent rebuilds could leave the alias/index in a bad state and possibly result in downtime.
  2. Figure out which index (i), if any, the alias currently points to. If the alias doesn’t exist or doesn’t point to any index yet, we’ll start at 0. Otherwise, we use i + 1, rolling back over to 0 once we reach the number of indices we want to keep. The underlying index name will be <alias name>-<index number>. So, if our alias was called ‘cars’ and we were building it for the first time, the index would be named ‘cars-0’.
  3. If an index with this name already exists, delete it.
  4. Create the index, with the given properties (index name, shards, replica count).
  5. Put the type mapping, if needed.
  6. Rebuild the index, using a user-supplied function.
  7. Create the alias (if it doesn’t exist), or switch an existing alias to point to our newly-created and populated index.
  8. Release the lock (if any errors occur, this probably should happen as well).
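
The eight steps above can be sketched as one method. This is not the sample project's code: the `Steps` interface and all of its method names are hypothetical stand-ins for the Elasticsearch calls, used here only to show the ordering and the try/finally around the lock.

```java
import java.util.function.Function;

// Hypothetical skeleton; the real work lives behind the Steps interface.
final class RebuildSkeleton {
    private RebuildSkeleton() {}

    // Assumed seam over the Elasticsearch calls (not part of any client library).
    interface Steps {
        boolean acquireLock(String alias);
        String chooseIndexName(String alias);
        void deleteIfExists(String index);
        void createIndex(String index);
        void putMapping(String index);
        void pointAlias(String alias, String index);
        void releaseLock(String alias);
    }

    static <R> R rebuild(Steps steps, String alias, Function<String, R> populate) {
        // Step 1: bail out if another rebuild holds the lock.
        if (!steps.acquireLock(alias)) {
            throw new IllegalStateException("A rebuild is already running for " + alias);
        }
        try {
            String index = steps.chooseIndexName(alias); // step 2
            steps.deleteIfExists(index);                 // step 3
            steps.createIndex(index);                    // step 4
            steps.putMapping(index);                     // step 5
            R result = populate.apply(index);            // step 6
            steps.pointAlias(alias, index);              // step 7, only after a successful populate
            return result;
        } finally {
            steps.releaseLock(alias);                    // step 8, also on failure
        }
    }
}
```

Because the alias is only moved after `populate` returns, a failed rebuild leaves search traffic on the old index.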

The sample project will go through all of these steps using the Java API. It isn’t production-ready code, but it will run out of the box against Elasticsearch 2.3 (run net.uresk.samples.elastic.aliasing.sample.AliasRebuildApp) and should get you pointed in the right direction. All of the interesting code is in net.uresk.samples.elastic.aliasing.AliasingRebuildService, which I’ll walk through now.

Step 1 – Get a lock

We want to create a distributed locking mechanism to ensure that only one thing is attempting to rebuild any given index at a time. If two processes were to rebuild an index at the same time, they would overwrite each other in various ways, and the result would almost certainly be temporary availability issues with the index.

We can do this by creating an empty document in a special locks index in Elasticsearch. If we try to index a document and set create = true, it will fail if the document already exists. This guarantees that if we successfully create a document, we now hold the lock and can proceed.

The lock method in the sample attempts to create this document. It returns true if the document was created, and false if it could not be. If the result is false, we assume something else is rebuilding the index and terminate.

One thing we don’t address here is a timeout on the lock – Elasticsearch does support TTLs on documents, and we could use this so that a lock would expire after a certain amount of time. This is useful to handle cases where a rebuild job crashes for whatever reason and you don’t want future jobs to get blocked. However, figuring out a good TTL value takes a bit of work, and I’ve left it out of this example.
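
Here is a rough sketch of the same create-only idea over the REST API, using the JDK 11 `HttpClient` instead of the Elasticsearch Java API the sample project uses. The index name `locks`, the type name `lock`, and the class itself are assumptions for illustration. It relies on the `_create` endpoint answering 201 when the document is new and 409 when it already exists.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: the "locks" index, "lock" type, and this class are assumptions.
final class RebuildLock {
    private RebuildLock() {}

    // PUT /locks/lock/<alias>/_create only succeeds if the document does not exist yet.
    static HttpRequest acquireRequest(String baseUrl, String alias) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/locks/lock/" + alias + "/_create"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
    }

    // Deleting the document releases the lock (step 8).
    static HttpRequest releaseRequest(String baseUrl, String alias) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/locks/lock/" + alias))
                .DELETE()
                .build();
    }

    // 201 Created: we hold the lock. 409 Conflict: someone else does.
    static boolean acquired(int status) {
        if (status == 201) return true;
        if (status == 409) return false;
        throw new IllegalStateException("Unexpected status from lock request: " + status);
    }

    static boolean tryAcquire(HttpClient http, String baseUrl, String alias)
            throws IOException, InterruptedException {
        HttpResponse<Void> response =
                http.send(acquireRequest(baseUrl, alias), HttpResponse.BodyHandlers.discarding());
        return acquired(response.statusCode());
    }
}
```

Any status other than 201 or 409 is treated as an error rather than as "lock not acquired", so a cluster problem is not mistaken for a competing rebuild.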

Step 2 – Decide which index to use

Here we want to figure out which index we should be writing to, based on where the alias is currently pointing. If the alias doesn’t exist, we’ll start with <alias name>-0. Otherwise, we’ll use <alias name>-(i+1), where i is the current index. If we’ve reached the maximum number of indices to create, we’ll start back at 0.

The sample uses three helper methods for this. If we find an index whose name matches the alias name we are trying to use, or if the alias is pointing at more than one index, we throw an exception and stop processing because we are in an unknown state and don’t want to break the current setup.
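
The decision logic can be condensed into a single pure function. This is a sketch, not the sample's helper methods: the class name and parameters are hypothetical, and it assumes the caller has already looked up which indices the alias points to and whether a concrete index shares the alias name.

```java
import java.util.Set;

// Hypothetical helper; the lookups against the cluster happen elsewhere.
final class NextIndexChooser {
    private NextIndexChooser() {}

    static String next(String alias, Set<String> aliasTargets,
                       boolean indexNamedLikeAlias, int maxIndices) {
        if (maxIndices < 2) {
            throw new IllegalArgumentException("maxIndices must be at least 2");
        }
        if (indexNamedLikeAlias) {
            throw new IllegalStateException("A concrete index is already named " + alias);
        }
        if (aliasTargets.size() > 1) {
            throw new IllegalStateException("Alias " + alias + " points at " + aliasTargets);
        }
        if (aliasTargets.isEmpty()) {
            return alias + "-0"; // first build
        }
        String current = aliasTargets.iterator().next();
        String prefix = alias + "-";
        if (!current.startsWith(prefix)) {
            throw new IllegalStateException("Unexpected backing index name: " + current);
        }
        int i;
        try {
            i = Integer.parseInt(current.substring(prefix.length()));
        } catch (NumberFormatException e) {
            throw new IllegalStateException("Unexpected backing index name: " + current, e);
        }
        // Roll over to 0 once the counter reaches maxIndices.
        return prefix + ((i + 1) % maxIndices);
    }
}
```

With `maxIndices` set to 2 and an alias named cars, successive rebuilds produce cars-0, cars-1, cars-0, and so on.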

Step 3 – Delete the index if it exists

This is pretty straightforward. If we are writing to a previously used index, we want to delete it first.

Step 4 – Create the index

Again, pretty straightforward – we create the index with our name (derived from the alias name) and the shard/replica counts specified.
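
For reference, steps 3 and 4 map onto three REST requests: an existence check, a delete, and a create with shard/replica settings. The sketch below only builds those requests with the JDK 11 `HttpRequest` type. The class and method names are hypothetical, and the sample project itself goes through the Java API rather than HTTP.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical request builders for steps 3 and 4.
final class BackingIndexRequests {
    private BackingIndexRequests() {}

    // HEAD /<index> answers 200 if the index exists and 404 if it does not.
    static HttpRequest exists(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // DELETE /<index> removes a previously used backing index.
    static HttpRequest delete(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .DELETE()
                .build();
    }

    static String settingsJson(int shards, int replicas) {
        return "{\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "}}";
    }

    // PUT /<index> with a settings body creates the index.
    static HttpRequest create(String baseUrl, String index, int shards, int replicas) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsJson(shards, replicas)))
                .build();
    }
}
```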

Step 5 – Put the type mapping, if needed

In the sample, I’m not doing this, but it is often necessary to specify type mappings (e.g., to define how certain fields should be stored or analyzed, or to tell Elasticsearch not to analyze certain fields). The code for this step loads JSON from src/main/mappings/<aliasName>.json and uses it to create a type mapping.
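
As a small illustration of the "if needed" part, the loader below returns the mapping JSON when a file for the alias exists and an empty `Optional` otherwise. The class name, the `SAMPLE_MAPPING` constant, and the `vin` field are invented for this example. The mapping uses the 2.x-style `"index": "not_analyzed"` setting on a string field.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical loader; the directory would be src/main/mappings in the sample layout.
final class MappingLoader {
    private MappingLoader() {}

    // Example file contents: keep a made-up "vin" field as an exact, unanalyzed value.
    static final String SAMPLE_MAPPING =
            "{\"properties\":{\"vin\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}";

    // Reads <mappingsDir>/<aliasName>.json, or returns empty if there is no such file.
    static Optional<String> load(Path mappingsDir, String aliasName) {
        Path file = mappingsDir.resolve(aliasName + ".json");
        if (!Files.isRegularFile(file)) {
            return Optional.empty(); // no mapping for this alias, so skip step 5
        }
        try {
            return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```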

Step 6 – Rebuild the index

Here we just call a user-defined function for rebuilding the index. We can pass it the actual index name so it knows where to index documents. It can also return data about its results (a count, or a list of ids) so we can report it back.
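
To show why the function needs the actual index name, here is a hedged sketch of a populate function that builds a `_bulk` request body addressed to the new backing index and returns a document count. Everything named here (`BulkPopulator`, `bulkSender`, the `car` type in the usage) is hypothetical. It assumes ids need no JSON escaping and each document is single-line JSON.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical populate function; bulkSender stands in for whatever posts to /_bulk.
final class BulkPopulator {
    private BulkPopulator() {}

    // One action line plus one source line per document, each newline-terminated.
    static String bulkBody(String index, String type, Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : jsonById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type)
                .append("\",\"_id\":\"").append(doc.getKey()).append("\"}}\n")
                .append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    // The returned function receives the real index name and reports how many docs it sent.
    static Function<String, Integer> populator(String type, Map<String, String> jsonById,
                                               Consumer<String> bulkSender) {
        return index -> {
            bulkSender.accept(bulkBody(index, type, jsonById));
            return jsonById.size();
        };
    }
}
```

The documents target the backing index (cars-1, say) rather than the alias, since the alias still points at the old index while the rebuild runs.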

Step 7 – Create/Move the alias

Now it is time to either create the alias, or move an existing alias to a new index. It is important that moving the alias be done atomically (the remove and the add in a single request); otherwise, you could end up in an inconsistent state (an alias pointing at 2 indices) or without an alias at all.
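
Over REST, the atomic switch is a single POST to `_aliases` whose `actions` array holds both the remove and the add. The sketch below builds that body. The class and method names are hypothetical, and index and alias names are assumed not to need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;

// Hypothetical builder for the alias create/move request.
final class AliasSwap {
    private AliasSwap() {}

    // With no current index the body only adds; otherwise remove and add travel together.
    static String actionsJson(String alias, Optional<String> currentIndex, String newIndex) {
        StringBuilder json = new StringBuilder("{\"actions\":[");
        currentIndex.ifPresent(old -> json.append("{\"remove\":{\"index\":\"").append(old)
                .append("\",\"alias\":\"").append(alias).append("\"}},"));
        json.append("{\"add\":{\"index\":\"").append(newIndex)
            .append("\",\"alias\":\"").append(alias).append("\"}}]}");
        return json.toString();
    }

    static HttpRequest request(String baseUrl, String alias,
                               Optional<String> currentIndex, String newIndex) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_aliases"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(actionsJson(alias, currentIndex, newIndex)))
                .build();
    }
}
```

Sending the remove and the add as two separate requests would open exactly the window described above, which is why they share one body.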

Step 8 – Release the lock

Now that we’re done, we can release the lock by deleting the document we created in step 1.

That’s it! Now our index is rebuilt and search traffic is hitting it, without any downtime.

Note: If you are doing online updates in addition to rebuilds, you’ll probably want to pre-create all of your indices ahead of time and do your online updates to all of them – this will minimize the amount of stale data that sits in your index due to data that changed or was inserted while the rebuild job was running.

If you have any problems, questions, or ideas for making this better, please let me know!

Also – If you are working with Elasticsearch, I’ve found Elasticsearch: The Definitive Guide to be a really useful resource.

