Why do we wait for search results?

Searching digital data is a daily task in our lives, allowing us to navigate vast amounts of information.

It is both common and complex, involving a wide range of search engines, indexing methods, search algorithms, and implementations.

Typically, searches are performed remotely, on a server, in a database, or in a cloud service. We are accustomed to waiting for search results.

/posts/20240229-client-side-search/robo2.png

What if we could get search results instantly?

A typical search within a website

Consider an e-commerce website with a search box. It might use technologies like ElasticSearch, SOLR, PostgreSQL, or a SaaS offering like Algolia. These engines index product data, making it searchable.

A separate process maintains the index, adding and updating products, pages, questions and answers.

Users send a query, wait a few hundred milliseconds, and see the results.

We can do that faster

We can speed up this process by searching on the client side, in browsers or within smartphone apps. This approach offers a clear benefit: instantaneous results. With no request to send to a server and no response to wait for, results appear with imperceptible latency, creating a surprising and positive user experience.

/posts/20240229-client-side-search/minisearch-demo.gif

Shifting the search to the client side also means that we won’t have to maintain any services ‘in the backend’, and we won’t have to worry whether the backend can handle increased traffic. On the other hand, we need to make sure that user devices can cope with our dataset.

There are disadvantages. We are trading per-query latency for a one-time index download on the first visit. Index updates are not instant: users will need to redownload the index. Also, users store the index on their devices, therefore it can’t contain any confidential data.

Feature parity

What is the feature parity of such a local index? We have made successful deployments using the MiniSearch library. It is indeed mini, but it can do a lot: autocomplete, fuzzy search, BM25, field boosting, ranking, tokenization, stop words and much more. You won’t find semantic search, but I’d venture to say it performs more than adequately in a large number of use cases.

The autocomplete functionality ends up looking funny: it seems completely superfluous when the actual results appear as quickly as the suggestions for them.

An important aspect of a client-side search engine is the size of the index, since the user’s browser has to download it. I ran a test on an internal product database containing over 106,000 items, each with a name, description and URL. In plain text it weighed 20MB, but took 5.1MB as compressed JSON. This dataset was several times larger than the one I originally evaluated client-side search for, but I wanted to see how it looks and works with a deliberately oversized dataset. You can see the effect in the animation above.

Does it scale?

When proposing this solution, we have met with concerns that it cannot scale to more products: a single product description or article can amount to as much as X MB of data, and multiplied by the number of items, the usual gigabytes come out. This may or may not be true.

To get a sense of scale, the following fact may be useful: the first Harry Potter book has 77 thousand words. That’s roughly 350kB in plain text. A whole book. One product or article should not contain more than that, unless you’re including entire books in descriptions. If you do, then yes, local search may not scale, but neither will ElasticSearch (32kB limit in Lucene) or Meilisearch (65k token limit).

In our deployments, this solution works superbly. It has its limitations and is not suitable for every use case, but it is certainly a technology worth considering, and one we hope to deploy more often.

See more

  • MiniSearch, the search engine described above. Check out their demo.
  • Lunr, a simpler alternative.

/posts/20240229-client-side-search/robo1.png