HackerNewsBooks Blog - Margin Notes

Migrating Elasticsearch 2.x → 5.x

I upgraded Hacker News Books from Elasticsearch 2.x to 5.x to get onto a newer supported baseline, clean up mappings, and make search behavior more explicit. Below is the exact playbook I followed and the code-level changes that mattered most.

What changed in our codebase

Reworked mappings around text and keyword

This was the biggest change in the move to 5.x.

In 2.x, many fields still used the old string type with "index": "analyzed" or "index": "not_analyzed". In 5.x, that split became two separate types: text for full-text search and keyword for exact matches, sorting, and aggregations. That forced me to be more deliberate about what each field was actually for.
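To make the string to text / keyword split concrete, here is a minimal sketch of how a single 2.x field definition could be translated. The helper name convert_string_field is hypothetical, and it deliberately handles only the two index settings discussed here; anything else raises so it gets reviewed by hand.

```python
def convert_string_field(field):
    """Translate one 2.x `string` field definition into a 5.x definition.

    Hypothetical helper for illustration only. Other settings are copied
    as-is and should still be reviewed by hand.
    """
    if field.get("type") != "string":
        return dict(field)  # non-string fields pass through unchanged

    # Keep everything except the two keys whose meaning changed.
    converted = {k: v for k, v in field.items() if k not in ("type", "index")}

    index = field.get("index", "analyzed")  # 2.x strings were analyzed by default
    if index == "analyzed":
        converted["type"] = "text"
    elif index == "not_analyzed":
        converted["type"] = "keyword"
    else:
        raise ValueError("unhandled index setting: %r" % (index,))
    return converted


if __name__ == "__main__":
    print(convert_string_field({"type": "string", "index": "not_analyzed"}))
```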

Kept full-text search and exact-match behavior separate

Book titles, comment text, and other searchable fields were mapped as text.

Slugs, tags, URLs, and other exact-match values were mapped as keyword.

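That made the query layer easier to reason about too. match queries stayed focused on full-text fields, while sorting, filters, and aggregations pointed at exact-match fields. That matters in 5.x because fielddata is disabled on text fields by default, so sorting or aggregating directly on a text field fails unless fielddata is explicitly enabled.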

Rebuilt indices instead of trying to patch everything in place

I trusted rebuilding more than patching.

Because the field model changed in a meaningful way, this was a good point to recreate indices with cleaner mappings and make sure the crawler and app were still producing the data I thought they were.
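One way to do such a rebuild is sketched below, assuming an elasticsearch-py client and an old index that is still readable from the 5.x cluster. The function name rebuild_index, the target index name, and the "book" document type are placeholders, not names from the real setup. Re-running the crawler against an empty index with the new mapping is an equally valid route.

```python
def rebuild_index(es, old_index, new_index, mapping, doc_type="book"):
    """Create `new_index` with a 5.x mapping and copy documents into it.

    `es` is expected to behave like an elasticsearch-py client.
    `doc_type` is a placeholder mapping type name.
    Returns the document count of the new index.
    """
    # Create the target index with the cleaned-up mapping first, so the
    # copied documents are indexed as text / keyword rather than guessed.
    es.indices.create(
        index=new_index,
        body={"mappings": {doc_type: mapping}},
    )

    # The Reindex API copies documents from the old index into the new one.
    es.reindex(
        body={"source": {"index": old_index}, "dest": {"index": new_index}},
    )

    # Refresh so the count below sees every copied document.
    es.indices.refresh(index=new_index)
    return es.count(index=new_index)["count"]
```

Comparing the returned count with the old index's count is a cheap first sanity check before pointing the app at the new index.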

Used the migration to tighten search assumptions

Like the 1.x → 2.x move, this upgrade ended up being about more than just version numbers. It was a chance to make the search layer more intentional and remove some of the ambiguity that had accumulated over time.


Mapping changes (2.x → 5.x)

Old 2.x style

{
  "properties": {
    "title": { "type": "string", "index": "analyzed" },
    "slug":  { "type": "string", "index": "not_analyzed" }
  }
}

New 5.x style

{
  "properties": {
    "title": { "type": "text" },
    "slug":  { "type": "keyword" }
  }
}

Multi-field pattern where both behaviors were useful

{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}

That let me keep full-text search on title while still allowing exact sorting and aggregations on title.raw.
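As a small illustration of the aggregation side, the sketch below builds a terms aggregation body against the title.raw subfield. The function name and the aggregation name "titles" are made up for the example.

```python
def title_terms_agg(size=10):
    """Build a search body that buckets documents by exact title.

    Targets the keyword subfield `title.raw`, since aggregating on the
    analyzed `title` text field is not allowed by default in 5.x.
    """
    return {
        "size": 0,  # only the buckets are wanted, not the hits
        "aggs": {
            "titles": {  # hypothetical aggregation name
                "terms": {"field": "title.raw", "size": size}
            }
        },
    }
```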


Query DSL cleanup for 5.x

Sorting on analyzed fields → sort on exact-match subfields

Before

{
  "sort": [
    { "title": "asc" }
  ]
}

After

{
  "sort": [
    { "title.raw": "asc" }
  ]
}

Exact-match filters stayed on keyword fields, while full-text search stayed on text fields. That split made queries less surprising and made the mapping intent much clearer.
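The split between full-text and exact-match fields can be sketched as a single search body: a match query on the text field, an optional term filter on a keyword field, and a sort on the keyword subfield. The function name build_search_body and the field name tags are assumptions for illustration; only title and title.raw come from the mappings shown earlier.

```python
def build_search_body(text, tag=None):
    """Build a 5.x search body that keeps each field type in its lane.

    - `match` runs against the analyzed `title` text field.
    - `term` filters on `tags`, assumed here to be mapped as keyword.
    - sorting uses the `title.raw` keyword subfield.
    """
    bool_query = {"must": [{"match": {"title": text}}]}
    if tag is not None:
        # Filters do not affect scoring and compare the exact stored value.
        bool_query["filter"] = [{"term": {"tags": tag}}]
    return {
        "query": {"bool": bool_query},
        "sort": [{"title.raw": "asc"}],
    }


if __name__ == "__main__":
    print(build_search_body("pragmatic programmer", tag="programming"))
```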


Mapping & index gotchas I checked


Ops playbook I used

  1. Snapshot first and test on a separate 5.x cluster.
  2. Update mappings to replace string with text / keyword.
  3. Rebuild indices with the new mapping model.
  4. Verify the crawler, indexing flow, filters, sorting, and search queries against real data.
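For steps 2 and 4, a quick check that no field was left on the old type can help. The sketch below walks a mapping's properties and lists any field still declared as string, including multi-fields and nested objects. The function name find_string_fields is hypothetical.

```python
def find_string_fields(properties, prefix=""):
    """Return dotted paths of fields still declared as the 2.x `string` type.

    `properties` is the dict found under a mapping's "properties" key.
    """
    leftovers = []
    for name, field in properties.items():
        path = prefix + name
        if field.get("type") == "string":
            leftovers.append(path)
        # Recurse into object sub-properties and multi-fields alike.
        for key in ("properties", "fields"):
            if key in field:
                leftovers.extend(find_string_fields(field[key], path + "."))
    return leftovers


if __name__ == "__main__":
    mapping = {
        "title": {"type": "text", "fields": {"raw": {"type": "string"}}},
        "slug": {"type": "keyword"},
    }
    print(find_string_fields(mapping))
```

An empty list means every field has been moved to text or keyword. The input can be taken from a mapping file or from whatever the cluster returns for the index's mapping.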

Before/After code we deployed

Mapping intent (2.x style → 5.x style)

Before

"title": {"type": "string", "index": "analyzed"},
"slug":  {"type": "string", "index": "not_analyzed"}

After

"title": {"type": "text"},
"slug":  {"type": "keyword"}

Sorting behavior

Before

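from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node reachable on localhost:9200

# 2.x: sorted directly on the analyzed title field
res = es.search(index="hnbooks", body={
  "sort": [{"title": "asc"}]
})

After

# 5.x: sort on the keyword subfield instead of the text field
res = es.search(index="hnbooks", body={
  "sort": [{"title.raw": "asc"}]
})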

Results, not just process

#elasticsearch #engineering #migration