Migrating Elasticsearch 2.x → 5.x
I upgraded Hacker News Books from Elasticsearch 2.x to 5.x to get onto a newer supported baseline, clean up mappings, and make search behavior more explicit. Below is the exact playbook I followed and the code-level changes that mattered most.
What changed in our codebase
Reworked mappings around text and keyword
This was the biggest change in the move to 5.x.
In 2.x, a lot of fields still used the old string type with analyzed or not_analyzed. In 5.x, that split became text for full-text search and keyword for exact matches, sorting, and aggregations. That forced me to be more deliberate about what each field was actually for.
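The string-to-text/keyword split can be sketched as a small conversion helper. This is an illustration rather than the script I actually ran: the function names `upgrade_field` and `upgrade_mapping` are hypothetical, and it only handles the two `index` values discussed here (`analyzed`, which was also the 2.x default, and `not_analyzed`). Anything else is rejected so it gets reviewed by hand.

```python
def upgrade_field(field):
    """Return a 5.x-style copy of a single 2.x field definition."""
    field = dict(field)  # shallow copy so the caller's dict is not mutated
    if field.get("type") == "string":
        # In 2.x a string field without an explicit "index" was analyzed.
        index = field.pop("index", "analyzed")
        if index == "analyzed":
            field["type"] = "text"
        elif index == "not_analyzed":
            field["type"] = "keyword"
        else:
            raise ValueError("review by hand: index=%r" % index)
    # Recurse into multi-fields and nested object properties.
    for key in ("fields", "properties"):
        if key in field:
            field[key] = {n: upgrade_field(f) for n, f in field[key].items()}
    return field


def upgrade_mapping(mapping):
    """Convert every field under "properties" of a 2.x mapping."""
    props = mapping.get("properties", {})
    return {"properties": {n: upgrade_field(f) for n, f in props.items()}}


old = {
    "properties": {
        "title": {"type": "string", "index": "analyzed"},
        "slug": {"type": "string", "index": "not_analyzed"},
    }
}
new = upgrade_mapping(old)
```

A helper like this is only a starting point. The useful part of the migration was deciding per field whether it was meant for search or for exact matching, which no script can decide for you.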
Kept full-text search and exact-match behavior separate
Book titles, comment text, and other searchable fields were mapped as text.
Slugs, tags, URLs, and other exact-match values were mapped as keyword.
That made the query layer easier to reason about too. Queries of type match stayed focused on full-text fields, while sorting, filters, and aggregations pointed at keyword fields instead of relying on fielddata loaded from analyzed strings, which 5.x disables by default on text fields.
Rebuilt indices instead of trying to patch everything in place
I trusted rebuilding more than patching.
Because the field model changed in a meaningful way, this was a good point to recreate indices with cleaner mappings and make sure the crawler and app were still producing the data I thought they were.
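A rebuild along these lines could look like the sketch below. It assumes the documents can be copied with the reindex API instead of being re-crawled, which is one option and not necessarily the route to take for every index. The function name `rebuild_index`, the target index name, and the `book` document type are hypothetical; `es` is expected to be an `elasticsearch-py` client.

```python
def rebuild_index(es, old_index, new_index, mapping, doc_type="book"):
    """Create new_index with a 5.x mapping and copy documents into it.

    Returns (old_count, new_count) so the caller can compare them.
    doc_type is a placeholder; 5.x mappings are still keyed by type name.
    """
    es.indices.create(
        index=new_index,
        body={"mappings": {doc_type: mapping}},
    )
    # Server-side copy; documents are re-analyzed under the new mapping.
    es.reindex(body={
        "source": {"index": old_index},
        "dest": {"index": new_index},
    })
    # Make the copied documents visible to count/search right away.
    es.indices.refresh(index=new_index)
    old_count = es.count(index=old_index)["count"]
    new_count = es.count(index=new_index)["count"]
    return old_count, new_count
```

Comparing the two counts is a cheap first check that nothing was dropped during the copy. It does not replace checking real queries against the new index.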
Used the migration to tighten search assumptions
Like the 1.x → 2.x move, this upgrade ended up being about more than just version numbers. It was a chance to make the search layer more intentional and remove some of the ambiguity that had accumulated over time.
Mapping changes (2.x → 5.x)
Old 2.x style
{
  "properties": {
    "title": { "type": "string", "index": "analyzed" },
    "slug": { "type": "string", "index": "not_analyzed" }
  }
}
New 5.x style
{
  "properties": {
    "title": { "type": "text" },
    "slug": { "type": "keyword" }
  }
}
Multi-field pattern where both behaviors were useful
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}
That let me keep full-text search on title while still allowing exact sorting and aggregations on title.raw.
Query DSL cleanup for 5.x
Sorting on analyzed fields → sort on exact-match subfields
Before
{
  "sort": [
    { "title": "asc" }
  ]
}
After
{
  "sort": [
    { "title.raw": "asc" }
  ]
}
Exact-match filters stayed on keyword fields, while full-text search stayed on text fields. That split made queries less surprising and made the mapping intent much clearer.
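To make that split concrete, here is a minimal sketch of a query builder that keeps each clause on the right kind of field. The function name `build_search` and the `tags` field name are assumptions for illustration; `title` and `title.raw` follow the multi-field mapping shown earlier.

```python
def build_search(text, tag=None):
    """Build a search body: full-text match on a text field,
    optional exact filter on a keyword field, sort on a keyword subfield."""
    bool_query = {
        # Analyzed full-text search belongs on the text field.
        "must": [{"match": {"title": text}}],
    }
    if tag is not None:
        # Exact, unscored filtering belongs on a keyword field.
        # "tags" is a hypothetical field name.
        bool_query["filter"] = [{"term": {"tags": tag}}]
    return {
        "query": {"bool": bool_query},
        # Sorting uses the keyword subfield, not the analyzed text.
        "sort": [{"title.raw": "asc"}],
    }


body = build_search("functional programming", tag="haskell")
```

The resulting dict can be passed as `body=` to `es.search(...)` in the same way as the sorting examples in this post.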
Mapping & index gotchas I checked
- string fields had to become text or keyword; there was no direct carry-forward.
- Sorting and aggregations could not rely on analyzed text fields anymore.
- Multi-fields were useful when a field needed both search and exact-match behavior.
- Rebuilding indices was cleaner than trying to preserve every old mapping assumption.
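The checks above can be partly automated by walking a mapping dict. The sketch below is a hypothetical helper (`find_fields` is not part of any client library) that lists every field of a given type, including multi-field subfields, so leftover string fields and valid sort targets are easy to spot. The `body` field in the sample mapping is made up for the example.

```python
def find_fields(mapping, wanted_type, prefix=""):
    """Return dotted paths of all fields (and multi-fields) of wanted_type."""
    found = []
    for name, field in mapping.get("properties", {}).items():
        path = prefix + name
        if field.get("type") == wanted_type:
            found.append(path)
        # Multi-fields live under "fields" and are addressed as parent.sub.
        for sub, sub_field in field.get("fields", {}).items():
            if sub_field.get("type") == wanted_type:
                found.append(path + "." + sub)
        # Object fields carry their own nested "properties".
        if "properties" in field:
            found.extend(find_fields(field, wanted_type, path + "."))
    return found


mapping = {
    "properties": {
        "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "slug": {"type": "keyword"},
        "body": {"type": "string"},  # leftover 2.x type, should be flagged
    }
}
leftover = find_fields(mapping, "string")
sortable = sorted(find_fields(mapping, "keyword"))
```

Any path in `leftover` still needs a text/keyword decision, and any sort or aggregation in the app should point at a path that appears in `sortable`.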
Ops playbook I used
- Snapshot first and test on a separate 5.x cluster.
- Update mappings to replace string with text/keyword.
- Rebuild indices with the new mapping model.
- Verify the crawler, indexing flow, filters, sorting, and search queries against real data.
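The verification step in the playbook above can be sketched as a small smoke test that runs a handful of known-good queries against the rebuilt index. The function name `smoke_test` and the check names are hypothetical, and `es` is assumed to be an `elasticsearch-py` client pointed at the 5.x test cluster.

```python
def smoke_test(es, index, checks):
    """Run each named query body and return the names that found nothing.

    checks maps a human-readable name to a search body that is expected
    to match at least one document in real data.
    """
    failures = []
    for name, body in checks.items():
        res = es.search(index=index, body=body)
        # In 5.x, hits.total is a plain integer.
        if res["hits"]["total"] == 0:
            failures.append(name)
    return failures
```

An empty return value means every check found at least one document. It says nothing about ranking quality, so a few searches still need to be checked by eye.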
Before/After code we deployed
Mapping intent (2.x style → 5.x style)
Before
"title": {"type": "string", "index": "analyzed"},
"slug": {"type": "string", "index": "not_analyzed"}
After
"title": {"type": "text"},
"slug": {"type": "keyword"}
Sorting behavior
Before
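from elasticsearch import Elasticsearch
es = Elasticsearch()  # assumption: a node on the default localhost:9200
# 2.x: sort directly on the analyzed title field
res = es.search(index="hnbooks", body={
    "sort": [{"title": "asc"}]
})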
After
# 5.x: sort on the keyword subfield instead of the analyzed text field
res = es.search(index="hnbooks", body={
    "sort": [{"title.raw": "asc"}]
})
Results, not just process
- No visible feature changes for readers, but the search layer became easier to understand.
- Field intent became clearer: full-text fields were no longer mixed with exact-match fields.
- The upgrade path improved again: 5.x forced cleaner mappings and made later search changes easier to reason about.