search relevance tuning – Flax

Haystack Europe 2018, a brief retrospective

It’s been a couple of weeks now since the first Haystack search relevance conference in Europe, which we ran with our partners Open Source Connections (OSC). Just under a hundred people came to the Friends’ House in Euston for a day of talks covering both the business and technical aspects of relevance engineering. Doug Turnbull of OSC started the day by introducing what would be a major theme of the conference, Learning to Rank (LTR), and how Bloomberg had used and benefited from open sourcing their LTR plugin for Solr. Karen Renshaw of Zoro (a division of Grainger Global Online) talked about how to tune relevance from a business perspective. Sebastian Russ of Tudock showed how even something as simple as an Excel spreadsheet can be a useful visualisation tool for relevance, while Alessandro Benedetti and Andrea Gazzarini of Sease demonstrated Rated Ranking Evaluator, a complete platform for relevance measurement. After lunch, Torsten Köster & Fabian Klenk of Shopping 24 and consultant René Kriegler described their journey with LTR for an ecommerce site, and Agnes Van Belle of Textkernel showed how similar techniques can be applied to recruitment search. Tony Russell-Rose closed the day with a talk on strategies and tools for managing complex Boolean queries.

My only regret was how little time I had personally to catch up with the attendees, many of whom were from Flax clients past and present – I must have had 20 or 30 very brief chats during the day! Luckily a few of us went on for a drink afterwards, and eventually a curry nearby. It was a very long day but, judging from the feedback we’ve received so far, a very successful one. We hope to make this a regular event on the calendar.

Thanks to all who made the event possible, our speakers and everyone who came – the slides are now available on the event website.

Defining relevance engineering part 3: technical assessment

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do?

In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

When Flax is working with clients on relevance tuning engagements we aim to gain an overview of the various technologies the client uses and how they are obtained, deployed, managed and maintained. This will include not just the search engine but the various systems that supply data to it, host it, monitor it and interface with it to pass results to users. In addition we must understand who is responsible for each of these areas, be it in-house staff, consultants, outsourcers or third-party suppliers.

We try to answer the following questions in detail, including who supplies, modifies, maintains and supports the various systems concerned, what versions are used and where and how they are hosted and configured. We hope for full access to inspect the systems but this is not always possible – at the least, we need copies of configuration files and settings.

  • What systems supply the source data for search?
  • What is the current search technology?
  • Is the search engine part of another system (such as a content management system or product information system)?
  • What interface is there between the systems that supply source data and the search engine?
  • What systems monitor and manage the search engine?
  • What systems are used to submit queries to the search engine?
  • What query logging is performed and at what level?
  • How are development, test, staging and production systems arranged and what access is available to these?
  • What are the processes used to deploy new software and configuration?
  • What testing is performed?

It’s common to find flaws in the overall technical landscape – as an example, we’ll often find that there is no effective source control of search engine configuration files, with these having been originally derived from an example setup not intended for production use and since modified ad hoc as issues arose. In this case it’s quite common that no-one knows why a particular setting has been used!

Without a good overall idea of the technology landscape it will be hard if not impossible to improve relevance. External processes (such as how hard it is to obtain a recent and complete log file from a production system) will also impact how effective these improvements will be.

Finally, as search is often owned by the IT department (and by the time we arrive, search is usually viewed as ‘broken’) we sometimes find a ‘bunker mentality’ – those responsible for the implementation are hunkered down and used to being harried and complained at by others who are unhappy with how search is (not) working. It’s important to communicate that only by being open and honest about the current situation can we all work together to improve things and build better search.

In the next post I’ll cover the tools a relevance engineer can use. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

Lucene Solr London: Search Quality Testing and Search Procurement

Mimecast were our kind hosts for the latest London Lucene/Solr Meetup (and even provided goodie bags). It’s worth repeating that we couldn’t run these events without the help of sponsors and hosts and we’re always very grateful (and keep those offers coming!).

First up was Andrea Gazzarini, presenting a brand new framework for search quality testing. Designed for offline measurement, Rated Ranking Evaluator is an open source Java library (although it can be used from other languages). It uses a hierarchical model to arrange queries into query groups (all queries in a query group are expected to produce the same results). Each test can run across a number of search engine configuration versions and outputs results in JSON format – but these can also be translated into Excel spreadsheets or PDFs, or sent to a server that provides a live console showing how search quality is affected by a search engine configuration change. Although aimed at Elasticsearch and Solr, the platform is extensible to any underlying search engine. This is a very useful tool for search developers and joins Quepid and Searchhub’s recently released search analytics acquisition library in the ‘toolbox’ for relevance engineers. You can see Andrea’s slides here.
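
To make the offline-measurement idea concrete, here’s a minimal sketch in Python of evaluating rated judgements against a live search function. The data layout, metric and names are purely illustrative – this shows the general principle only, not RRE’s actual API or configuration schema.

```python
# A toy, judgement-based offline evaluation in the spirit of Rated Ranking
# Evaluator. All names and the data layout are illustrative, not RRE's API.

# Each query group holds queries expected to behave alike, plus graded
# judgements mapping document ids to relevance (0 = not relevant).
RATINGS = {
    "laptop queries": {
        "queries": ["laptop", "notebook computer"],
        "judgements": {"doc1": 3, "doc7": 2, "doc9": 1},
    },
}

def precision_at_k(results, judgements, k=10):
    """Fraction of the top-k results judged relevant (grade > 0)."""
    top = results[:k]
    return sum(1 for doc_id in top if judgements.get(doc_id, 0) > 0) / max(len(top), 1)

def evaluate(search_fn):
    """Run every query in every group against search_fn (which should return
    a ranked list of document ids) and report the metric per query."""
    return {
        (group, query): precision_at_k(search_fn(query), spec["judgements"])
        for group, spec in RATINGS.items()
        for query in spec["queries"]
    }
```

Run something like this against each configuration version and diff the reports, and you have the core of what RRE automates (plus its JSON output, spreadsheets and live console).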

Martin White spoke next on how open source search solutions fare in corporate procurements for enterprise search. This was an engaging talk from Martin, showing the scale of the opportunities for open source platforms, with budgets of several million pounds being common for enterprise search projects. However, as he mentioned, it can be very difficult for procurement departments to get information from vendors, and ‘the last thing you’ll know about a piece of enterprise software is how much it will cost’. He detailed how open source solutions often compare badly against closed source commercial offerings because it is hard to see their ‘edges’ – e.g. what custom development will be necessary to fulfil enterprise requirements. Although the opportunities are clear, it seems open source based solutions still have a way to go to compete. You can read more from Martin on this subject in the recent free Search Insights report.

Thanks to Mimecast and both speakers – we’ll be back after the summer with another Meetup!

London Lucene/Solr Meetup – Relevance tuning for Elsevier’s Datasearch & harvesting data from PDFs

Elsevier were our kind hosts for the latest London Lucene/Solr Meetup and also provided the first speaker, Peter Cotroneo. Peter spoke about their DataSearch project, a search engine for scientific data. After describing how most other data search engines only index and rank results using metadata, Peter showed how Elsevier’s product indexes the data itself and also provides detailed previews. DataSearch uses Apache NiFi to connect to the source repositories, Amazon S3 for asset storage, Apache Spark to pre-process the data and Apache Solr for search. This is a huge project with many millions of items indexed.

Relevance is a major concern for this kind of system and Elsevier have developed many strategies for relevance tuning. Features such as highlighting and auto-suggest are used, along with lemmatisation rather than stemming (with scientific data, stemming can cause issues such as turning ‘Age’ into ‘Ag’ – the chemical symbol for silver) and a custom rescoring algorithm that can promote up to 3 data results to the top of the list if deemed particularly relevant. Elsevier use both search logs and test queries generated by subject matter experts to feed into a custom-built judgement tool – which they are hoping to open source at some point (this would be a great complement to Quepid for test-based relevance tuning).
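
As a rough illustration of the promotion idea, here’s what that rescoring might look like if done as a simple post-processing step. Elsevier’s version is a custom rescoring algorithm inside the engine; this sketch, with illustrative names, just shows the effect:

```python
def promote_curated(results, curated_ids, max_promoted=3):
    """Move up to three results judged particularly relevant to the top,
    preserving the engine's ranking for everything else. A sketch of the
    effect only -- not Elsevier's actual rescoring implementation."""
    promoted = [doc for doc in results if doc in curated_ids][:max_promoted]
    rest = [doc for doc in results if doc not in promoted]
    return promoted + rest

# e.g. promote_curated(["d4", "d2", "d9", "d1"], {"d9", "d1"})
# -> ["d9", "d1", "d4", "d2"]
```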

Peter also described a strategy for automatic optimisation of the many query parameters available in Solr, using machine learning, based on some ideas first proposed by Simon Hughes of dice.com. Elsevier have also developed a Phrase Service API, which improves on the standard un-ordered ‘bag of words’ model for phrase-based search by recognising acronyms, chemical formulae, species, geolocations and more, expanding the original phrase based on these terms and then boosting them using Solr’s query parameters. He also mentioned a ‘push API’ available for data providers to push data directly into DataSearch. This was a necessarily brief dive into what is obviously a highly complex and powerful search engine built by Elsevier using many cutting-edge ideas.
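
To give a flavour of that kind of automatic tuning, here’s a minimal sketch of black-box optimisation over Solr’s edismax parameters. The field names, weight ranges and scoring function are assumptions for illustration – the approach Peter described uses machine learning rather than this naive random search:

```python
import random

def tune_edismax(score_fn, n_trials=50, seed=42):
    """Search for good edismax settings by sampling random candidates and
    keeping the best. score_fn takes a dict of Solr query parameters and
    returns a relevance metric (e.g. mean precision@10 over judged queries)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "defType": "edismax",
            # per-field boosts for term matches and phrase matches
            "qf": f"title^{rng.uniform(1, 10):.1f} body^{rng.uniform(0.1, 2):.1f}",
            "pf": f"title^{rng.uniform(1, 20):.1f}",
            "tie": round(rng.uniform(0.0, 1.0), 2),
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```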

Our next speaker, Michael Hardwick of Elite Software, talked about how textual data is stored in PDF files and the implications for extracting this data for search applications. In an engaging (and at times slightly horrifying) talk he showed how PDFs effectively contain instructions for ‘painting’ characters onto the page and how certain essential text items such as spaces may not be stored at all. He demonstrated how fonts are stored within the PDF itself, how character encodings may be deliberately incorrect to prevent copy-and-paste operations and in general how very little if any semantic information is available. Using newspaper content as an example he showed how reading order is often difficult to extract, as the PDF layout is a combination of the text from the original author and how it has been laid out on the page by an editor – so the headline may have been added after the article text, which itself may have been split up into sections.

Tables in PDFs were described as a particular issue when attempting to extract numerical data for re-use – the stored order of the values may not match the order in which they appear on the page, for example when only part of a table is updated for each weekly issue of a publication. With PDF files sometimes compressed and encrypted, the task of data extraction can become even more difficult. Michael laid out the choices available to those wanting to extract data: optical character recognition, a potentially very expensive Adobe API (that only gives the same quality of output as copy-and-paste), custom code as developed by his company, and finally manual retyping – the latter being surprisingly common.
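
If you want to see these problems first-hand, a few lines of Python with an open source extractor will do it (pdfminer.six here; the file name is hypothetical, and any layout-heavy PDF will demonstrate the caveats):

```python
# pip install pdfminer.six
from pdfminer.high_level import extract_text

# The library interprets the PDF 'painting' instructions and guesses at
# reading order, word boundaries and line breaks -- semantics the format
# itself never stored.
text = extract_text("weekly_report.pdf")  # hypothetical input file
print(text[:500])  # expect surprises: merged words, shuffled columns, etc.
```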

Thanks to both our speakers and our hosts Elsevier – we’re planning another Meetup soon, hopefully in mid to late June.

Haystack, the search relevance conference – day 2

Two weeks ago I attended the Haystack relevance conference – I’ve already written about my overall impressions and on the first day’s talks but the following are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Some of the presentations I haven’t linked to directly have now appeared on the conference website.

The second day of the event started for me with the enjoyable job of hosting a ‘fishbowl’ style panel session titled “No, You Don’t Want to Do It Like That! Stories from the search trenches”. The idea was that a rotating panel of speakers would tell us tales of their worst and hopefully most instructive search tuning experiences and we heard some great stories – this was by its nature an informal session and I don’t think anyone kept any notes (probably a good idea in the case of commercial sensitivity!).

The next talk was my favourite of the conference, given by René Kriegler on relevance scoring using product data and image recognition. René is an expert on e-commerce search (he also runs the MICES event in Berlin, which I’m looking forward to) and described how this domain is unlike many others, with the interests of the consumer (e.g. price or availability) becoming part of the relevance criteria. One of the interesting questions for e-commerce applications is how ranking can affect profit. Standard TF/IDF models don’t always work well for e-commerce data with short fields, leading to a score that can be almost binary: as he said, ‘a laptop can’t be more laptop-ish than another’. Image recognition is a potentially useful technique and he demonstrated a way to take the output of Google’s Inception machine learning model and use it to enrich documents within a search index. However, this output can have over 1,000 dimensions, and he described how a technique called random projection trees can be used to partition the vector space and thus produce simpler data for adding to the index (I think this is basically like slicing up a fruitcake and recording whether a currant was one side of the knife or the other, but that may not be quite how it works!). René has built a Solr plugin to implement this technique.
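
Here’s a tiny numpy sketch of the underlying trick – the ‘which side of the knife’ test. This is the simplest hyperplane-signature version of the idea rather than full random projection trees (which apply the same test recursively), and it is not René’s Solr plugin; all sizes are illustrative:

```python
import numpy as np

def projection_signature(vec, hyperplanes):
    """For each random hyperplane, record which side the vector falls on.
    Similar vectors tend to share signature bits, so the short binary code
    can stand in for the full embedding in an inverted index."""
    return (hyperplanes @ vec > 0).astype(int)

rng = np.random.default_rng(0)
hyperplanes = rng.normal(size=(16, 1000))  # 16 planes over 1,000-dim vectors
embedding = rng.normal(size=1000)          # stand-in for an Inception output
bits = projection_signature(embedding, hyperplanes)
# The 16 bits can be stored as a single token per document in Solr, turning
# expensive vector similarity into a cheap term match.
```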

Next I went to Matt Overstreet’s talk on Vespa, a recently open sourced search and Big Data library from Oath (a part of Yahoo! Inc.). Matt described how Vespa could be used to build highly scalable personalised recommendation, search or realtime data display applications and took us through how Vespa is configured through a series of APIs and XML files. Interestingly (and perhaps unsurprisingly) Vespa has very little support for languages other than English at present. Queries are carried out through its own SQL-like language, YQL, and grouping and data aggregation functions are available. He also described how Vespa can use multidimensional arrays of values – tensors, for example from a neural network. Matt recommended we all try out Vespa – but on a cloud service not a low-powered laptop!

Ryan Pedala was up next to talk about named entity recognition (NER) and how it can be used to annotate or label data. He showed his experiments with tools including Prodigy and a custom GUI he had built, compared various NER libraries such as Stanford NLP and OpenNLP, and referenced an interesting paper on NER for travel-related queries. I didn’t learn a whole lot of new information from this talk but it may have been useful to those who haven’t considered using NER before.
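
If you haven’t tried NER, it takes only a few lines to experiment. Here’s a sketch using spaCy (a Python library, standing in for the Java libraries Ryan compared; it needs the small English model installed first):

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("cheap flights from London to New York in late October")

# Each entity gets a span and a label that a query parser could act on.
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "London" GPE, "late October" DATE
```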

Scott Stultz talked next on how to integrate business rules into a search application. He started with examples of key performance indicators (KPIs) that can be used for search – e.g. conversion ratios or average purchase values – and how these should be tied to search metrics. They can then be measured both before and after changes are made to the search application; automated unit tests and more complex integration tests should also be used to check that search performance is actually improving. Interestingly for me, he included within the umbrella of integration tests such techniques as replaying recent queries extracted from the logs. He made some good practical points, such as ‘think twice before adding complexity’, and noted that good autocomplete will often ‘cannibalize’ existing search as users simply choose the suggested completion rather than finishing typing the entire query. There were some great tips here for practical business-focused search improvements.
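
That log-replay idea is easy to sketch as a test. Something like the following (the names and pass criterion are illustrative) could run in CI against a staging search instance:

```python
def replay_logged_queries(search_fn, logged_queries, min_results=1):
    """Replay queries recently seen in production logs and flag any that now
    return too few results -- a cheap smoke test that a configuration change
    hasn't silently broken common searches. A real suite would also assert
    that known-good documents still rank well for key queries."""
    failures = []
    for query in logged_queries:
        hits = search_fn(query)
        if len(hits) < min_results:
            failures.append(query)
    return failures

# e.g. assert not replay_logged_queries(my_search, top_500_queries)
```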

I then went to hear John Kane’s talk about interleaving for relevancy tuning, which covered a method for updating a machine learning model in real time using feedback from the current ranking powered by this model – simply by interleaving the results from two versions of the model. This isn’t a particularly new technique, and the talk was something of a product pitch for 904Labs, but the technique does apparently work and some customers have seen a 30% increase in conversion rate.
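
The classic formulation is team-draft interleaving, sketched below. This is the textbook version of the technique, not 904Labs’ implementation: two rankings take turns picking their best not-yet-used result, and later clicks are credited to whichever ‘team’ supplied the clicked document.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, k=10, seed=None):
    """Merge two rankings so each contributes alternately, recording which
    model supplied each slot so clicks can be credited to a 'team'."""
    rng = random.Random(seed)
    interleaved, credit = [], {}
    picks_a = picks_b = 0
    while len(interleaved) < k:
        # The team with fewer picks goes next; ties are broken by coin flip.
        pick_a = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        source = ranking_a if pick_a else ranking_b
        doc = next((d for d in source if d not in credit), None)
        if doc is None:
            break  # a real implementation would fall back to the other team
        credit[doc] = "A" if pick_a else "B"
        interleaved.append(doc)
        if pick_a:
            picks_a += 1
        else:
            picks_b += 1
    return interleaved, credit
```

Count clicks per team over many sessions and the better model wins statistically, without ever exposing users to a pure B variant.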

The last talk of the day came from Tim Allison, on an evaluation platform for Apache Tika, a well-known library for text extraction from a variety of file formats. Interspersed with tales of ‘amusing’ and sometimes catastrophic ways for text extraction to fail, Tim described how tika-eval can be used to test how good Tika is at extracting data, outputting a set of metrics – e.g. how many different MIME file types were found. The tool is now used to run regular regression tests for Tika on a dataset of 3 million files from the CommonCrawl project. We’re regular users of Tika at Flax and it was great to hear how the project is moving forward.
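
For anyone who hasn’t used Tika, the quickest way in from Python is the tika package, which talks to a local Tika server. tika-eval itself is a Java command-line tool; this just shows the kind of extraction it measures, and the file name is hypothetical:

```python
# pip install tika -- downloads and starts a local Tika server on first use
from tika import parser

parsed = parser.from_file("report.pdf")  # hypothetical input file
print(parsed["metadata"].get("Content-Type"))
print((parsed["content"] or "")[:200])   # extracted text; can be None on failure
```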

Doug Turnbull finished the conference with a brief summing up and thanks. There was a general feeling in the room that this conference was the start of something big, and people were already asking when the next event would be! One of my takeaways from the event was that even though many of the talks used open source tools (perhaps unsurprisingly, as it is so much easier to talk about these publicly) the relevance tuning techniques and methods described can be applied to any search engine. The attendees were from a huge variety of companies, large and small, open and closed source based. This was an event about relevance engineering, not technology choices.

Thanks to all at OSC who made the event possible and for inviting us all to your home town – I think most if not all of us would happily visit again.

Haystack, the search relevance conference – day 1

Last week I attended the Haystack relevance conference – I’ve already written about my overall impressions but the following are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Those presentations I haven’t linked to directly should appear soon on the conference website.

Doug Turnbull of Open Source Connections gave the keynote presentation, which led with the idea that we need more open source tools and methods for tuning relevance, including tools to gather search analytics. He noted how the Learning to Rank plugins recently developed for both Solr and Elasticsearch have commoditized capabilities previously only described in academic literature, and how we also need to build a cohesive community around search relevance. As it turned out, this conference did in my view signal the birth of that community.

Next up was Peter Fries, who talked about a business-friendly approach to search quality, a subject close to my heart as I regularly have to discuss relevance tuning with non-technical staff. Peter described how search quality is often presented to business teams as mysterious and ‘not for them’ – without convincing these people of the value of search tuning we will fail to take account of business-related factors (and we’re also unlikely to get full buy-in for a relevance tuning project). He went on to say how important it is to include the marketing and management mindsets in this process, and proposed a method for search tuning involving feedback loops and an ‘iron triangle’ of measurement, data and optimisation. This was a very useful talk.

I then went to hear Chao Han of Lucidworks demonstrate how their product Fusion App Studio allows one to capture various signals and use these for ‘head and tail analysis’ – looking not just at the ‘head’ of popular, often-clicked results but those in the ‘tail’ that attract few clicks, possibly due to problems such as mis-spellings. Interestingly this approach allows automatic tail query rewriting – an example might be spotting a colour word such as ‘red’ in the query and rewriting this into a field query of colour:red. This was a popular talk although the presenter was a little mysterious about the exact methodology used, perhaps unsurprisingly as Fusion is a commercial product.
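
The rewriting step itself is simple to illustrate. Here’s a toy version of the colour example – the vocabulary, field name and approach are illustrative guesses, since the methodology behind Fusion’s version wasn’t disclosed:

```python
# Spot known colour words in a tail query and turn them into field filters.
COLOURS = {"red", "blue", "green", "black", "white"}

def rewrite_colour_terms(query):
    """Split a free-text query into remaining keywords plus structured
    filter queries, e.g. 'red running shoes' -> ('running shoes',
    ['colour:red'])."""
    terms = query.split()
    filters = [f"colour:{t.lower()}" for t in terms if t.lower() in COLOURS]
    keywords = [t for t in terms if t.lower() not in COLOURS]
    return " ".join(keywords), filters
```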

After a tasty Mexican-themed lunch I took a short break for some meetings, so missed the next set of talks. I then went to Elizabeth Haubert’s talk on Click Analytics. She began with a description of the venerable TREC conference (now in its 27th year!), how it has evaluated relevance judgements, and how these methods might be applied to real-world situations. For example, the TREC evaluations have shown that how relevance tests are assessed is as important as the tests themselves – the assessors are effectively also users of the system under test. She recommended calibrating both the rankings to a tester and the tester to the rankings, and creating a story around each test to put it in context and to help with disambiguation.

We finished the day with some lightning talks; sadly I didn’t take notes on these, but check out Sujit’s aforementioned blog for more information. I do remember Tom Burgmans’ visualisation tool for Solr’s Explain debug feature, which I’m very much looking forward to seeing as open source. The evening continued with a conference dinner nearby and some excellent local craft beer.

I’ll be covering the second day next.

How to build a search relevance team

We’ve spent a lot of time working with clients who recognise that their search engine isn’t delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration – but technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements.

It’s thus important to consider relevance as a business-wide issue, with multiple stakeholders providing input to the tuning process. We recommend the creation of a search relevance team – in a perfect world this should consist of dedicated staff, but even in the largest organisations this can be difficult to resource. It’s possible however to create a team to share the responsibility of improving relevance, contributing as they can.

The team should be drawn from the following business areas. Note that in some organisations some of these roles will be shared.

  • Content – the content team create and manage the source data for the search engine, and are responsible for keeping this data clean and consistent, with reliable metadata. They may process external data into a database or other repository as well as creating it from scratch. The best search engine in the world can’t give good results if the underlying data is unreliable, inconsistent or badly formatted.
  • Vendor – if the search engine is a commercial product, the vendor must provide sufficient documentation, training and support to the client to allow the engine to be tuned. If the engine is an open source project this information should be openly available and backed up by specialist consultancies who can provide training and technical support (such as Flax).
  • Development – the development team are responsible for integrating the search engine into the client’s systems, indexing the source data, maintaining the configuration, writing the search queries and adding new features. They will make any changes needed to improve relevance.
  • Testing – the test team should create a process for test-driven relevance tuning, using tools such as Quepid to gather relevance judgements from the business. The test cases themselves can be built up from a combination of query logs, known important query terms (e.g. new products, common industry terms, SEO terms) and those queries deemed most valuable to the business.
  • Operations – this team is responsible for keeping the search engine running at best performance with appropriate server provision and monitoring, plus providing a failover capacity as required.
  • Sales & marketing, product owners – these teams should know why a particular result is more relevant than another to a customer or other user, by gathering online feedback, talking to users and knowing the current business goals. This team can thus help create the test cases discussed above.
  • Management – management support of the relevance tuning process is essential, to commit whatever resources are required to the technical implementation and test process and to lead the search relevance team.

The search relevance team should meet on a regular basis to discuss how to build test cases for important search queries, examine the current position in terms of search relevance and set out objectives for improving relevance. The metrics chosen to measure progress should be available to all of the team.

Search relevance tuning should be seen as a shared responsibility, rather than simply a technical issue or something that can be easily resolved by building or buying a new search engine (a new, un-tuned search engine is unlikely to be as good as the current one). A well structured and resourced search relevance team can make huge strides towards improving search across the business – reducing the time users take to find information and improving responsiveness. For businesses that trade online, relevant search results are simply essential for retaining customers and maintaining a high level of conversion.

Flax regularly visit clients to discuss how to build an effective search team – do get in touch if we can help your business in this way.
