Defining relevance engineering part 4: tools

Charlie Hull — Thu, 15 Nov 2018 14:30:51 +0000

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

In my previous installment of this guide I promised to write next about how to deliver the results of a relevance assessment, but I’ve since decided that this blog should instead cover the tools a relevance engineer can use to measure and tune search performance. Of course, some of these might be used to show results to a client as well, so it’s not an entirely different direction!

It’s also important to note that this is a rapidly evolving field and therefore cannot be a definitive list – and I welcome comments with further suggestions.

1. Gathering judgements

There are various ways to measure relevance, and one is to gather judgement data – either explicit (literally asking users to manually rate how relevant a result is) and implicit (using click data as a proxy, assuming that clicking on a result means it is relevant – which isn’t always true, unfortunately). One can build a user interface that lets users rate results (e.g. from Agnes Van Belle’s talk at Haystack Europe, see page 7) which may be available to everyone or just a select group, or one can use a specialised tool like Quepid that provides an alternative UI on top of your search engine. Even Excel or another spreadsheet can be used to record judgements (although this can become unwieldly at scale). For implicit ratings, there are Javascript libraries such as SearchHub’s search-collector or more complete analytics platforms such as Snowplow which will let you record the events happening on your search pages.

2. Understanding the query landscape

To find out what users are actually searching for and how successful their search journeys are, you will need to look at the log files of the search engine and the hosting platform it runs within. Open source engines such as Solr can provide detailed logs of every query, which will need to be processed into an overall picture. Google Analytics will tell you which Google queries brought users to your site. Some sophisticated analytics & query dashboards are also available – Luigi’s Box is a particularly powerful example for site search. Even a spreadsheets can be useful to graph the distribution of queries by volume, so you can see both the popular queries and those rare queries in the ‘long tail’. On Elasticsearch it’s even possible to submit this log data back into a search index and to display it using a Kibana visualisation.

3. Measurement and metrics

Once you have your data it’s usually necessary to calculate some metrics – overall measurements of how ‘good’ or ‘bad’ relevance is. There’s a long list of metrics commonly used by the Information Retrieval community such as NCDG which show the usefulness, or gain of a search result based on its position in a list. Tools such as Rated Ranking Evaluator (RRE) can calculate these metrics from supplied judgement lists (RRE can also run a whole test environment, spinning up Solr or Elasticsearch, performing a list of queries and recording and displaying the results).

4. Tuning the engine

Next you’ll need a way to adjust the configuration of the engine and/or figure out just why particular results are appearing (or not). These tools are usually specific to the search engine being used: Quepid, for example works with Solr and Elasticsearch and allows you to change query parameters and observe the effect on relevance scores; with RRE you can control the whole configuration of the Solr or Elasticsearch engine that it can then spin up for you. Commercial search engines will have their own tools for adjusting configuration or you may have to work within an overall content management (e.g Drupal) or e-commerce system (e.g. Hybris). Some of these latter systems may only give you limited control of the search engine, but could also let you adjust how content is processed and ingested or how synonyms are generated.

For Solr, tools such as the Google Chrome extension Solr Query Debugger can be used and the Solr Admin UI itself allows full control of Solr’s configuration. Solr’s debug query shows hugely detailed information as to why a query returned a result, but tools such as Splainer and Solr Explain are useful to make sense of this.

For Elasticsearch, the Kopf plugin was a useful tool, but has now been replaced by Cerebro. Elastic, the commercial company behind Elasticsearch offer their own tool Marvel on a 30-day free trial, after which you’ll need an Elastic subscription to use it. Marvel is built on the open source Kibana which also includes various developer tools.

If you need to dig (much) deeper into the Lucene indexes underneath Solr and Elasticsearch, the Lucene Index Toolbox (Luke) is available, or Flax’s own Marple index inspector.

As I said at the beginning this is by no means a definitive list – what are your favourite relevance tuning tools? Let me know in the comments!

In the next post I’ll cover how a relevance engineer can develop more powerful and ‘intelligent’ ways to tune search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

The post Defining relevance engineering part 4: tools appeared first on Flax.

Release 1.0 of Marple, a Lucene index detective

Tom — Fri, 24 Feb 2017 14:34:05 +0000

Back in October at our London Lucene Hackday Flax’s Alan Woodward started to write Marple, a new open source tool for inspecting Lucene indexes. Since then we have made nearly 240 commits to the Marple GitHub repository, and are now happy to announce its first release.

Marple was envisaged as an alternative to Luke, a GUI tool for introspecting Lucene indexes. Luke is a powerful tool but its Java GUI has not aged well, and development is not as active as it once was. Whereas Luke uses Java widgets, Marple achieves platform independence by using the browser as the UI platform. It has been developed as two loosely-coupled components: a Java and Dropwizard web service with a REST/JSON API, and a UI implemented in React.js. This approach should make development simpler and faster, especially as there are (arguably) many more React experts around these days than native Java UI developers, and will also allow Marple’s index inspection functionality to be easily added to other applications.

Marple is, of course, named in honour of the famous fictional detective created by Agatha Christie.

What is Marple for? We have two broad use cases in mind: the first is as an aid for solving problems with Lucene indexes. With Marple, you can quickly examine fields, terms, doc values, etc. and check whether the index is being created as you expect, and that your search signals are valid. The other main area of use we imagine is as an educational tool. We have made an effort to make the API and UI designs reflect the underlying Lucene APIs and data structures as far as is practical. I have certainly learned a lot more about Lucene from developing Marple, and we hope that other people will benefit similarly.

The current release of Marple is not complete. It omits points entirely, and has only a simple UI for viewing documents (stored fields). However, there is a reasonably complete handling of terms and doc values. We’ll continue to develop Marple but of course any contributions are welcome.

You can download this first release of Marple here together with a small Lucene index of Project Gutenberg to inspect. Details of how to run Marple (you’ll need Java) are available in the README. Do let us know what you think – bug reports or feature requests can be submitted via Github. We’ll also be demonstrating Marple in London on March 23rd 2017 at the next London Lucene/Solr Meetup.

The post Release 1.0 of Marple, a Lucene index detective appeared first on Flax.