Haystack, the search relevance conference

Two weeks ago I attended the Haystack relevance conference – I’ve already written about my overall impressions and about the first day’s talks, but here are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Some of the presentations I haven’t linked to directly have now appeared on the conference website.

The second day of the event started for me with the enjoyable job of hosting a ‘fishbowl’ style panel session titled “No, You Don’t Want to Do It Like That! Stories from the search trenches”. The idea was that a rotating panel of speakers would tell us tales of their worst and hopefully most instructive search tuning experiences and we heard some great stories – this was by its nature an informal session and I don’t think anyone kept any notes (probably a good idea in the case of commercial sensitivity!).

The next talk was my favourite of the conference, given by René Kriegler on relevance scoring using product data and image recognition. René is an expert on e-commerce search (he also runs the MICES event in Berlin which I’m looking forward to) and described how this domain is unlike many others: the interests of the consumer (e.g. price or availability) become part of the relevance criteria. One of the interesting questions for e-commerce applications is how ranking can affect profit. Standard TF/IDF models don’t always work well for e-commerce data with short fields, leading to a score that can be almost binary: as he said ‘a laptop can’t be more laptop-ish than another’. Image recognition is a potentially useful technique and he demonstrated a way to take the output of Google’s Inception machine learning model and use it to enrich documents within a search index. However, there can be over 1000 vectors output from this model and he described how a technique called random projection trees can be used to partition the vector space and thus produce simpler data for adding to the index (I think this is basically like slicing up a fruitcake and recording whether a currant was one side of the knife or the other, but that may not be quite how it works!). René has built a Solr plugin to implement this technique.
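To make the fruitcake analogy a little more concrete, here is a minimal sketch of the simplest form of the idea: a flat set of random hyperplanes (the "knife cuts"), with one bit recorded per cut. This is my own illustration and not René's Solr plugin; a real random projection tree applies the cuts recursively, and the function names, the token format and the tiny 4-dimensional vectors are all assumptions made for the example.

```python
import random


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the 'knife' the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def signature_tokens(vector, hyperplanes):
    """Turn the bit string into tokens that could go into an ordinary text field
    (the 'p<i>_<bit>' token format is hypothetical)."""
    sig = signature(vector, hyperplanes)
    return [f"p{i}_{bit}" for i, bit in enumerate(sig)]


# Toy image vectors; real model output would have far more dimensions.
planes = make_hyperplanes(dim=4, n_planes=8, seed=42)
laptop_a = [0.9, 0.1, 0.0, 0.3]
laptop_b = [0.8, 0.2, 0.1, 0.3]

print(signature(laptop_a, planes))
print(signature(laptop_b, planes))
print(signature_tokens(laptop_a, planes))
```

Vectors that point in similar directions tend to share most of their bits, so matching on these tokens gives a rough, index-friendly proxy for image similarity.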

Next I went to Matt Overstreet’s talk on Vespa, a recently open sourced search and Big Data library from Oath (a part of Yahoo! Inc.). Matt described how Vespa could be used to build highly scalable personalised recommendation, search or realtime data display applications and took us through how Vespa is configured through a series of APIs and XML files. Interestingly (and perhaps unsurprisingly) Vespa has very little support for languages other than English at present. Queries are carried out through its own SQL-like language, YQL, and grouping and data aggregation functions are available. He also described how Vespa can use multidimensional arrays of values – tensors, for example from a neural network. Matt recommended we all try out Vespa – but on a cloud service not a low-powered laptop!

Ryan Pedala was up next to talk about named entity recognition (NER) and how it can be used to annotate or label data. He showed his experiments with tools including Prodigy and a custom GUI he had built, compared various NER libraries such as Stanford NLP and OpenNLP, and referenced an interesting paper on NER for travel-related queries. I didn’t learn a whole lot of new information from this talk but it may have been useful to those who haven’t considered using NER before.

Scott Stultz talked next on how to integrate business rules into a search application. He started with examples of key performance indicators (KPIs) that can be used for search – e.g. conversion ratios or average purchase values and how these should be tied to search metrics. They can then be measured both before and after changes are made to the search application: automated unit tests and more complex integration tests should also be used to check that search performance is actually improving. Interestingly for me he included within the umbrella of integration tests such techniques as testing the search with recent queries extracted from logs. He made some good practical points such as ‘think twice before adding complexity’ and that good autocomplete will often ‘cannibalize’ existing search as users simply choose the suggested completion rather than finishing typing the entire query. There were some great tips here for practical business-focused search improvements.

I then went to hear John Kane’s talk about interleaving for relevancy tuning which covered a method for updating a machine learning model in real-time using feedback from the current ranking powered by this model – simply by interleaving the results from two versions of this model. This isn’t a particularly new technique and the talk was somewhat of a product pitch for 904Labs, but the technique does apparently work and some customers have seen a 30% increase in conversion rate.
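For anyone who hasn't met interleaving before, here is a rough sketch of one well-known variant, team-draft interleaving. I don't know which scheme 904Labs actually uses, so treat this purely as an illustration: the two rankers take turns picking their best remaining result, each shown result remembers which "team" picked it, and clicks are credited to that team. The function names and document IDs are made up for the example.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two rankings; returns the merged list and a doc -> team mapping."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved = []
    team_of = {}
    while len(interleaved) < length:
        # Documents each ranker could still contribute.
        remaining = {
            team: [doc for doc in docs if doc not in team_of]
            for team, docs in rankings.items()
        }
        if not remaining["A"] and not remaining["B"]:
            break
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] < picks["B"]:
            team = "A"
        elif picks["B"] < picks["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        # If that team has nothing left, the other team picks instead.
        if not remaining[team]:
            team = "B" if team == "A" else "A"
        doc = remaining[team][0]
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


def credit_clicks(clicked_docs, team_of):
    """Count clicks per team; the team with more clicks 'wins' this query."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


current_model = ["d1", "d2", "d3", "d4"]
candidate_model = ["d3", "d1", "d5", "d6"]
shown, teams = team_draft_interleave(
    current_model, candidate_model, length=4, rng=random.Random(7)
)
print(shown)
print(credit_clicks(["d3"], teams))
```

Aggregated over many queries, the click counts indicate which version of the model users prefer, without having to split traffic into separate A/B groups.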

The last talk of the day came from Tim Allison on an evaluation platform for Apache Tika, a well-known library for text extraction from a variety of file formats. Interspersed with tales of ‘amusing’ and sometimes catastrophic ways for text extraction to fail, Tim described how tika-eval can be used to test how good Tika is at extracting data and output a set of metrics e.g. how many different MIME file types were found. The tool is now used to run regular regression tests for Tika on a dataset of 3 million files from the CommonCrawl project. We’re regular users of Tika at Flax and it was great to hear about how the project is moving forward.

Doug Turnbull finished the conference with a brief summing up and thanks. There was a general feeling in the room that this conference was the start of something big and people were already asking when the next event would be! One of my takeaways from the event was that even though many of the talks used open source tools (perhaps unsurprisingly as it is so much easier to talk about these publicly) the relevance tuning techniques and methods described can be applied to any search engine. The attendees were from a huge variety of companies, large and small, open and closed source based. This was an event about relevance engineering, not technology choices.

Thanks to all at OSC who made the event possible and for inviting us all to your home town – I think most if not all of us would happily visit again.