Thread by @michaelvillar, Search is hard, even with the best tools out there. Here's a [...]

Michaël Villar

Search is hard, even with the best tools out there.
Here are a few things we've done over the past few weeks to improve our search.
↓

1/ Context: we use Elasticsearch, so those changes are specific to it. We also search through different models/indices (lists, tasks, messages, users) which makes it more difficult for results to be ranked correctly.

2/ To get coherent results across models, each index now has one common field to search in ("Summary") → the ES ranking algorithm can then account for term frequency and field length.
e.g. typing "emoji" returns the list before tasks because the list is a better match.
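
A minimal sketch of what searching one shared field across several indices could look like. The index names and the lowercase field name `summary` are assumptions; the thread only names the models and the "Summary" field.

```python
import json

# Assumed index names, one per model mentioned in the thread.
INDICES = ["lists", "tasks", "messages", "users"]


def build_summary_search(text, size=10):
    """Return (path, body) for one search request spanning every index."""
    # Elasticsearch accepts a comma-separated list of indices in the URL.
    path = "/" + ",".join(INDICES) + "/_search"
    body = {
        "size": size,
        # Every index exposes the same field, so one match query covers all of them.
        "query": {"match": {"summary": {"query": text}}},
    }
    return path, body


path, body = build_summary_search("emoji")
print("GET", path)
print(json.dumps(body, indent=2))
```

Because every model is matched on the same field, a short list title and a longer task summary are scored by the same query rather than by per-model queries with different shapes.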

3/ We improved English understanding by searching with both a versatile analyzer and an English analyzer and taking the best result.
e.g. typing "emojis" or "emoji" should return the same results.
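
One way this could be set up, as a sketch only: index the field twice through a multi-field and let a `dis_max` query keep the better of the two scores. The thread does not say which "versatile" analyzer is used, so the built-in `standard` analyzer stands in for it here, and the field names are assumptions.

```python
# Assumed mapping: "summary" is indexed with the built-in standard analyzer,
# plus a "summary.english" sub-field using the built-in english analyzer,
# which stems words so that plural and singular forms can match.
MAPPING = {
    "properties": {
        "summary": {
            "type": "text",
            "analyzer": "standard",
            "fields": {"english": {"type": "text", "analyzer": "english"}},
        }
    }
}


def best_of_analyzers(text):
    """Build a dis_max query: a document is scored by its best sub-query."""
    return {
        "dis_max": {
            "queries": [
                {"match": {"summary": {"query": text}}},
                {"match": {"summary.english": {"query": text}}},
            ],
            # 0.0 means the weaker sub-query contributes nothing to the score.
            "tie_breaker": 0.0,
        }
    }


query = best_of_analyzers("emojis")
```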

4/ Match both exact and fuzzy searches and add the two scores together. Exact results then rank before fuzzy ones, while fuzzy results rank lower but can still be found when the query was misspelled.
e.g. "emojy" only appears later in the results.
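
A sketch of the exact-plus-fuzzy idea, assuming a `bool` query with two `should` clauses on a hypothetical `summary` field. A `bool` query adds up the scores of its matching clauses, so a document that matches exactly is counted by both clauses, while a near-miss is counted by the fuzzy clause only.

```python
def exact_plus_fuzzy(text, field="summary"):
    """Build a bool query whose score is exact score + fuzzy score."""
    return {
        "bool": {
            "should": [
                # Exact clause: only matches the analyzed terms as typed.
                {"match": {field: {"query": text}}},
                # Fuzzy clause: also matches terms within a small edit distance.
                {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
            ],
            # A document needs at least one of the two clauses to be returned.
            "minimum_should_match": 1,
        }
    }


query = exact_plus_fuzzy("emoji")
```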

5/ Use simple boosts to have predictable results across indices.

Summary (exact) → 1
Description (exact) → 0.5
Messages (exact) → 0.2
Summary (fuzzy) → 0.1
Description (fuzzy) → 0.05
Messages (fuzzy) → 0.02
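
The boost table above could be turned into a query along these lines. This is a sketch under assumptions: the lowercase field names are hypothetical, and each boost multiplies that clause's relevance score rather than setting an absolute value.

```python
# Boosts copied from the table above; field names are assumed.
BOOSTS = {
    ("summary", "exact"): 1.0,
    ("description", "exact"): 0.5,
    ("messages", "exact"): 0.2,
    ("summary", "fuzzy"): 0.1,
    ("description", "fuzzy"): 0.05,
    ("messages", "fuzzy"): 0.02,
}


def build_boosted_query(text):
    """Build one should clause per (field, mode) pair, weighted by its boost."""
    clauses = []
    for (field, mode), boost in BOOSTS.items():
        params = {"query": text, "boost": boost}
        if mode == "fuzzy":
            params["fuzziness"] = "AUTO"
        clauses.append({"match": {field: params}})
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


query = build_boosted_query("emoji")
```

Keeping the weights in one table makes the ranking easier to reason about: each fuzzy boost is a tenth of its exact counterpart, so the table alone shows which kinds of match are meant to dominate.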

6/ We removed nested types to improve performance. Metadata is now appended directly to the fields, but we don't analyze or match on it. We then use highlights to recover metadata from messages without loading all the messages of a task.
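
The thread does not describe how the metadata is encoded, so the following is only an illustration of the idea. It assumes a hypothetical `[[id:...]]` marker appended to each message, a hypothetical `messages` field, and highlight fragments tuned so that the marker survives in the returned text. The sample hit is hand-written, not real Elasticsearch output.

```python
import re

# Hypothetical encoding: each message is stored as "<text> [[id:<message id>]]".
MARKER = re.compile(r"\[\[id:([^\]]+)\]\]")


def build_highlight_search(text):
    """Ask for highlights on the messages field instead of the full document."""
    return {
        "_source": False,  # do not return the stored documents
        "query": {"match": {"messages": {"query": text}}},
        "highlight": {"fields": {"messages": {}}},
    }


def message_ids(hit):
    """Extract the assumed id markers from a hit's highlight fragments."""
    fragments = hit.get("highlight", {}).get("messages", [])
    return [m.group(1) for frag in fragments for m in MARKER.finditer(frag)]


# Hand-written example of the shape of a search hit with highlights.
sample_hit = {
    "highlight": {"messages": ["love the new <em>emoji</em> picker [[id:m42]]"]}
}
print(message_ids(sample_hit))
```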

More changes from this week → https://twitter.com/height_app/status/1347252047545049092
