Autocomplete with 5 million+ proper names

By Erik J. Groeneveld

The Library of the Technical University Delft (TUDelft) has added an autocomplete function to its Discover site using Meresco. It suggests search terms drawn from the full corpus of all the library's databases. It also suggests more specific terms when users restrict their query to a field.

Proper names

TUDelft's databases contain many technical terms and other proper names, such as structured chemical names. Discover suggests search terms from these names rather than from a history of users' queries. The example below shows a user typing tri and the autocomplete suggesting chemical names such as tri-O-acetyl and triacylglycerol from over 5,000,000 available terms.
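Suggesting proper names comes down to a prefix query over an alphabetically sorted dictionary of terms. The following is a minimal sketch of that idea, assuming the terms fit in memory as a sorted Python list; the function name `complete` and the sample terms are hypothetical, and this is not Meresco's code (the Implementation section describes a burst trie instead).

```python
from bisect import bisect_left
from typing import List


def complete(sorted_terms: List[str], prefix: str, limit: int = 10) -> List[str]:
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`."""
    # All terms sharing a prefix form one contiguous run in a sorted list;
    # binary search finds the first term >= prefix, where that run begins.
    i = bisect_left(sorted_terms, prefix)
    matches: List[str] = []
    while i < len(sorted_terms) and len(matches) < limit and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


# Hypothetical sample dictionary.
terms = sorted(["triacylglycerol", "tri-O-acetyl", "toluene", "trichloroethylene", "tungsten"])
print(complete(terms, "tri"))  # ['tri-O-acetyl', 'triacylglycerol', 'trichloroethylene']
```

Because the matches for a prefix sit in one contiguous run, the lookup costs one binary search plus a scan that stops at the first non-matching term or at `limit`.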

Faceted Search integration

The autocomplete is fully integrated with the facets in Discover. As a result, it shows exactly how many results a user can expect for each suggested term given the current facet selection, and it never suggests terms that would yield no results.
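One way to produce per-term counts that respect the facet selection is to intersect each candidate term's document set with the set of documents matching the selected facet values. The sketch below assumes plain Python sets of document ids for both postings and facet values; `selected_documents`, `suggest_with_counts`, the data shapes, and the sample data are hypothetical and only illustrate the counting and the zero-result filter, not how Meresco stores its facet index.

```python
from functools import reduce
from typing import Dict, Iterable, List, Set, Tuple

FacetKey = Tuple[str, str]  # (facet name, facet value), e.g. ("type", "article")


def selected_documents(all_docs: Set[int],
                       facet_index: Dict[FacetKey, Set[int]],
                       selection: Iterable[FacetKey]) -> Set[int]:
    """Documents matching every selected facet value; no selection means all documents."""
    return reduce(lambda docs, key: docs & facet_index.get(key, set()), selection, set(all_docs))


def suggest_with_counts(candidates: Iterable[str],
                        postings: Dict[str, Set[int]],
                        selected: Set[int]) -> List[Tuple[str, int]]:
    """Attach the expected hit count to each candidate and drop zero-hit terms."""
    suggestions = []
    for term in candidates:
        hits = len(postings.get(term, set()) & selected)
        if hits > 0:  # never suggest a term that would return nothing
            suggestions.append((term, hits))
    return suggestions


# Hypothetical postings (term -> document ids) and facet index.
postings = {"tri-O-acetyl": {4}, "triacylglycerol": {1, 2, 3}, "trichloroethylene": {2, 5}}
facet_index = {("type", "article"): {1, 2, 5}, ("year", "2010"): {2, 3, 5}}
selected = selected_documents({1, 2, 3, 4, 5}, facet_index, [("type", "article")])
print(suggest_with_counts(["tri-O-acetyl", "triacylglycerol", "trichloroethylene"], postings, selected))
# [('triacylglycerol', 2), ('trichloroethylene', 2)]
```

This naive version intersects full sets for every candidate; it shows what is computed, not how to compute it efficiently over millions of terms.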

Suggestions for Fields

Optionally, users can restrict a keyword to a specific field. For example, the query author=johnson searches for johnson only in the author field. The search box automatically detects fields and offers suggestions appropriate to that field. The example below shows a user searching for author=nahu, with the search box suggesting 3 different spelling variants together with the number of results to expect for each.
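Field detection can be sketched as parsing the token being typed: if it has the form field=prefix and the field is known, suggestions come from that field's dictionary; otherwise they come from a dictionary covering all fields. The names `parse_last_token` and `suggest`, the `""` key for the all-fields dictionary, and the sample terms are hypothetical; only the `=` syntax comes from the author=johnson example above.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Matches a token such as "author=nahu": a field name, "=", then the typed prefix.
FIELD_TOKEN = re.compile(r"^(?P<field>\w+)=(?P<prefix>.*)$")


def parse_last_token(query: str, known_fields: Set[str]) -> Tuple[Optional[str], str]:
    """Return (field, prefix) for the token being typed; field is None if not a known field."""
    if not query or query[-1].isspace():
        return None, ""  # the user has just started a new, empty token
    last = query.split()[-1]
    match = FIELD_TOKEN.match(last)
    if match and match.group("field") in known_fields:
        return match.group("field"), match.group("prefix")
    return None, last


def suggest(query: str, dictionaries: Dict[str, List[str]], limit: int = 10) -> List[str]:
    """`dictionaries` maps a field name to its sorted terms; key "" covers all fields."""
    field, prefix = parse_last_token(query, set(dictionaries) - {""})
    terms = dictionaries[field] if field is not None else dictionaries[""]
    # Linear scan for brevity; a real index would use a prefix-capable structure.
    return [t for t in terms if t.startswith(prefix)][:limit]


# Hypothetical per-field dictionaries.
dictionaries = {"": ["johnson", "nahum", "triacylglycerol"],
                "author": ["johnson", "nahuis", "nahum", "nahuys"]}
print(suggest("author=nahu", dictionaries))  # ['nahuis', 'nahum', 'nahuys']
```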

Implementation

Discover's autocomplete has been implemented with Meresco's autocomplete and faceting. The autocomplete can handle millions of terms under heavy user load thanks to its implementation of a burst trie, which is integrated with the facet index. On the client side, it uses jQuery and CSS to present the autocomplete search box.
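A burst trie keeps the upper levels of the dictionary as trie nodes that branch on one character, while its leaves are small sorted containers of term suffixes; when a container grows beyond a threshold, it "bursts" into a new trie node. The sketch below is a generic, in-memory illustration of that structure and of a prefix query on it. It is not Meresco's implementation: the class names, the list-based containers, and the `burst_limit` parameter are assumptions, and the integration with the facet index is not shown.

```python
from bisect import bisect_left
from typing import Dict, List, Union


class _Node:
    """Trie node: branches on one character; `end` marks a term ending here."""

    def __init__(self) -> None:
        self.end = False
        # Each child is either another _Node or a sorted list (container) of suffixes.
        self.children: Dict[str, Union["_Node", List[str]]] = {}


class BurstTrie:
    """Minimal burst trie for prefix completion (illustrative sketch)."""

    def __init__(self, burst_limit: int = 32) -> None:
        self.burst_limit = burst_limit  # hypothetical container capacity
        self.root = _Node()

    def insert(self, term: str) -> None:
        node, i = self.root, 0
        while i < len(term):
            ch = term[i]
            child = node.children.get(ch)
            if child is None:
                node.children[ch] = [term[i + 1:]]  # start a new container
                return
            if isinstance(child, _Node):
                node, i = child, i + 1
                continue
            suffix = term[i + 1:]
            pos = bisect_left(child, suffix)
            if pos < len(child) and child[pos] == suffix:
                return  # already present
            child.insert(pos, suffix)
            if len(child) > self.burst_limit:
                node.children[ch] = self._burst(child)
            return
        node.end = True  # term ends exactly at a trie node

    def _burst(self, container: List[str]) -> _Node:
        # Split a full container on its next character. The container is sorted,
        # so each new child list is also built in sorted order.
        node = _Node()
        for suffix in container:
            if suffix == "":
                node.end = True
            else:
                node.children.setdefault(suffix[0], []).append(suffix[1:])
        for ch, child in list(node.children.items()):
            if len(child) > self.burst_limit:
                node.children[ch] = self._burst(child)  # burst again if still too big
        return node

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` stored terms starting with `prefix`, in sorted order."""
        out: List[str] = []
        node, i = self.root, 0
        while i < len(prefix):
            child = node.children.get(prefix[i])
            if child is None:
                return out
            if isinstance(child, list):
                # Reached a container: finish matching inside its sorted suffixes.
                head, rest = prefix[: i + 1], prefix[i + 1:]
                for s in child[bisect_left(child, rest):]:
                    if len(out) >= limit or not s.startswith(rest):
                        break
                    out.append(head + s)
                return out
            node, i = child, i + 1
        self._collect(node, prefix, out, limit)
        return out

    def _collect(self, node: _Node, path: str, out: List[str], limit: int) -> None:
        # Depth-first walk in character order, so terms come out alphabetically.
        if node.end and len(out) < limit:
            out.append(path)
        for ch in sorted(node.children):
            if len(out) >= limit:
                return
            child = node.children[ch]
            if isinstance(child, list):
                out.extend(path + ch + s for s in child[: limit - len(out)])
            else:
                self._collect(child, path + ch, out, limit)


trie = BurstTrie(burst_limit=2)  # tiny limit so the example actually bursts
for term in ["tri-O-acetyl", "triacylglycerol", "trichloroethylene", "tungsten", "toluene", "tri"]:
    trie.insert(term)
print(trie.complete("tri"))  # ['tri', 'tri-O-acetyl', 'triacylglycerol', 'trichloroethylene']
```

On its way down, a prefix query touches at most one leaf container, and results come out in alphabetical order because children are visited in character order.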
