Events – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

Little Mermaids, Haystacks and moving on
http://www.flax.co.uk/blog/2019/02/15/little-mermaids-haystacks-and-moving-on/
Fri, 15 Feb 2019 09:47:25 +0000

The post Little Mermaids, Haystacks and moving on appeared first on Flax.

As I announced recently Flax is joining OpenSource Connections, and I recently spent a very pleasant week in Virginia with my new colleagues discussing our plans for the year to come. Without giving too much away I can say that this is a very exciting time to be joining OSC: one thing I will be doing soon is starting to write more about OSC’s proven process for supporting our clients as they move up the search relevance curve.

However, before then I’ll be speaking at a few events. At the end of this month I’ll be in Copenhagen to speak on Keeping Search Relevant in a Digital Workplace at the Intrateam conference. This is a fantastic conference on intranets and I’m looking forward to speaking for the second time and joining a very august gathering of speakers. I’m also glad to be returning to both City University and the University of Essex during February and March to talk to students about working in search and information retrieval.

In April I’ll be returning to the US for OSC’s Haystack search relevance conference, which was my favourite event of last year – I liked it so much I brought it to London that October. This year we have a fantastic lineup of talks from speakers representing organisations including LexisNexis, Wikimedia Foundation, Eventbrite and Yelp, a new and more capacious venue in downtown Charlottesville, three training options before the main conference (Think Like A Relevance Engineer for Elasticsearch and Solr, and Learning to Rank) and of course the chance to meet, chat with and get to know some of the best search people in the business. Earlybird tickets are available until the end of February and are already selling well, so make your plans to join us soon!

It’s already shaping up to be a busy year – so do keep an eye on this blog and my new home at www.opensourceconnections.com/blog for further news, and if you’d like to know how OSC can help you empower your search team, get in touch.

More needles, more Haystacks, more relevance!
http://www.flax.co.uk/blog/2018/12/05/more-needles-more-haystacks-more-relevance/
Wed, 05 Dec 2018 11:28:31 +0000

The post More needles, more Haystacks, more relevance! appeared first on Flax.

Those of us who have been working in the search sector for a while know that search tuning isn’t just a matter of installing the default configuration, pointing the engine at some content and starting it up – in fact, if you do just that you’ll probably end up with a search user experience that’s even worse than whatever you’re replacing and certainly a lot worse than your competitors’ solution. It’s also no longer about just knowing how one engine behaves and the magic tweaks to improve it – you need to understand the fundamentals of search and how a range of different products and projects implement them. You also need to understand user requirements and their often entirely subjective views of what is a ‘good’ and ‘bad’ search result, plus how different types of businesses can use search technology for site search, enterprise search, media monitoring, process improvement and a myriad of other uses.

Over the last year or so we’ve seen the emergence of a new profession dedicated to improving how search systems present information to users – Relevance Engineering. Importantly this covers not just the technical aspects of search, but the business aspects – understanding the why as much as the how. Relevance engineers understand that search tuning is a multifaceted problem and there are no magic bullets (or magic AI robots) that will do all the work for you. I’ve started to write about relevance engineering recently to try and define what it means.

One of my favourite events last year was the first Haystack conference run by our partners Open Source Connections, which brought together both experienced relevance engineers and those new to the profession. It was friendly, informal, focused and informative. In fact, I enjoyed it so much that by the second day I was already thinking about how to bring the event to Europe – which we did successfully in October.

I’m very happy to say that Haystack is back in April 2019 and the Call for Papers is open until January 9th. If you’ve got an exciting relevance project or idea to talk about please do submit it. See you there!

Activate 2018 day 2 – AI and Search in Montreal
http://www.flax.co.uk/blog/2018/11/07/activate-2018-day-2-ai-and-search-in-montreal/
Wed, 07 Nov 2018 12:09:38 +0000

The post Activate 2018 day 2 – AI and Search in Montreal appeared first on Flax.

I’ve already written about Day 1 of Lucidworks’ Activate conference; the second day started with a keynote on ‘moral code’, ethics & AI which unfortunately I missed, but a colleague reported that it was very encouraging to see topics such as diversity and inclusion raised in a keynote talk. Note that videos of some of the talks are starting to appear on Lucidworks’ YouTube channel.

Steve Rowe of Lucidworks gave a talk on what’s coming in Lucene/Solr 8, starting with a long list of improvements and new features from the 7.x releases: autoscaling of SolrCloud clusters, better cross-datacentre replication (CDCR), time routed index aliases for time-series data, new replica types, streaming expressions, a JSON query DSL and better segment merge policies. It’s clear that a huge amount of work continues to go into Solr. In 8.x releases we’ll hopefully see HTTP/2 capability for faster throughput and perhaps Luke, the Lucene Index Toolbox, becoming part of the main project.

Cassandra Targett, also of Lucidworks, spoke about the Lucene/Solr Reference Guide which is now actually part of Solr’s source code in Asciidoc format. She had attempted to build this into a searchable, fully-hyperlinked documentation source using Solr itself but this quickly ran into issues with HTML tags and maintaining correct links. Lucidworks’ own Site Search did a lot better but the result still wasn’t perfect. Work remains to be done here but encouragingly in the last few weeks there’s also been some thinking about how to better document Solr’s huge and complex test suite on SOLR-12930. As Cassandra mentioned, effective documentation isn’t always the focus of Solr committers, but it’s essential for Solr users.

The next talk I caught came from Andrzej Bialecki on Solr’s autoscaling functionality and some impressive testing he’s done. Autoscaling analyzes your Solr cluster and makes suggestions about how to restructure it – which you can then do manually or automatically using other Solr features. These features are generally tested on collections of 1 billion documents – but Andrzej has manually tested them on 1 trillion simulated documents (yes, you read that right). Now that’s some scale!

The final talk I caught before the closing keynote was Chris ‘Hossman’ Hostetter on How to be a Solr Contributor, amusingly peppered with profanity as is his usual style. There were a number of us in the room with some small concerns about Solr patches that have not been committed, and in general about how Solr might need more committers and how this might happen, but the talk mainly focused on how to generate new patches. He also mentioned how new features can have an unexpected cost, as they must then be maintained and might have totally unexpected consequences for other parts of the platform. Some of the audience raised questions about Solr tests (some of which regularly fail) – however, since the conference Mark Miller has taken the lead on this under SOLR-12801, which is encouraging.

The closing keynote by Trey Grainger brought together the threads of search and AI – and also mentioned that if anyone had some spare server capacity, it would be fun to properly test Solr at trillion-document scale…

So in conclusion, how did Activate compare to its previous incarnation as Lucene/Solr Revolution? Is search really the foundation of AI? Well, the talks I attended mainly focused on Solr features, but various colleagues heard about machine learning, learning-to-rank and self-aware machines, all of which are becoming easier to implement using Lucene/Solr. However, as Doug Turnbull writes, if you’re thinking of AI for search, you should be wary of the potential cost and complexity. There are no magic robots (Kevin Watters’ robot, however, is rather wonderful!).

Huge thanks must go to all at Lucidworks for putting on such a well-organised and thought-provoking event and bringing together so many Lucene/Solr enthusiasts.

Activate 2018 day 1 – AI and Search in Montreal
http://www.flax.co.uk/blog/2018/10/30/activate-2018-day-1-ai-and-search-in-montreal/
Tue, 30 Oct 2018 13:34:53 +0000

The post Activate 2018 day 1 – AI and Search in Montreal appeared first on Flax.

Activate is the successor to the Lucene/Solr Revolution conference that our partner Lucidworks runs every Autumn and was held this year in Montreal, Canada. After running a successful Lucene Hackday on the Monday before the conference, we joined hundreds of others to hear Will Hayes, the CEO of Lucidworks, explain the new name and direction of the event – it was nice to hear he agrees with me that search is the key to AI. Yoshua Bengio of local AI laboratory MILA followed Will and described some recent breakthroughs in AI, including speech recognition and image recognition, before going on to talk about Creative AI which can ‘imagine’ new faces after sufficient training. He listed five necessary ingredients for successful machine learning: lots of data, flexible models, enough compute power, computationally efficient inference and powerful prior assumptions to deflect the ‘curse of dimensionality’. These are hard to get right – he told us how even cutting-edge AI is still far from human-level intelligence but can be used to extend human cognitive power. MILA is the greatest concentration of academics working in deep learning in the world and is heavily funded by the Canadian government.

I was also pleased to notice our Luwak stored search library mentioned in the handout Bloomberg had placed on every seat!

The talks I attended after the keynote were generally focused on open source, Solr or search topics, but the theme of AI was everywhere. The first talk I went to was about Accenture’s Content Analytics Studio – which looks like a useful tool for building search and analytics applications using a library of widgets and a Python code editor. Unfortunately it wasn’t very clear how one might use this platform, with the presenter eventually admitting that it was a proprietary product but not giving any idea of the price or business model. I would much prefer if presenters were up-front about commercial products, especially as many attendees were from an open source background.

David Smiley’s talk on Querying Hundreds of Fields at Scale was a lot more interesting: he described how Salesforce run millions of Solr cores and index extremely diverse customer data (as each customer can customise their field structure). Using the usual Solr qf parameter across possibly 150 fields can lead to thousands of subqueries being generated, each of which also needs to be run across every segment. His approach to optimising performance included analysing the input data per field type rather than per field, building a custom segment merge policy and encoding the field type as a term suffix in the term dictionary. Although this uses more CPU time, it improves performance by at least a factor of 10. David hopes to contribute some of this work back to Solr as open source, although much is specific to Salesforce’s use case. This was a fascinating talk about some very clever low-level Lucene techniques.
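To get a feel for why querying so many fields is expensive, here is a rough back-of-the-envelope sketch in Python. It is not Salesforce's or Solr's code: the field names, the three-term query and the figure of 20 segments are all invented for illustration, and it only counts simple term clauses (phrase or boosted variants would push the numbers higher).

```python
def expand_qf(terms, fields):
    """Expand each query term into one field:term clause per field,
    roughly what a qf-style query over many fields has to do."""
    return [[f"{field}:{term}" for field in fields] for term in terms]


def estimated_term_lookups(num_fields, num_terms, num_segments):
    """Rough upper bound: one clause per (field, term) pair,
    each evaluated against every index segment."""
    return num_fields * num_terms * num_segments


# Hypothetical setup: 150 customer-defined fields, a three-word query.
fields = [f"field_{i}" for i in range(150)]
terms = ["red", "running", "shoes"]

clauses = expand_qf(terms, fields)
total_clauses = sum(len(group) for group in clauses)
print(total_clauses)  # 450 clauses for a three-word query

# Assuming (purely for illustration) an index with 20 segments.
print(estimated_term_lookups(len(fields), len(terms), num_segments=20))  # 9000
```

Seen this way, grouping by field type rather than by field is attractive simply because there are far fewer field types than fields, so the multiplier shrinks.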


Next was my favourite talk of the conference – Kevin Watters on the Intersection of Robotics, Search & AI, featuring a completely 3D-printed humanoid robot based on the open source InMoov platform and MyRobotLab software. Kevin has used hundreds of open source projects to add capabilities such as speech recognition, question answering (based on Wikipedia), computer vision, deep learning etc. using a pub/sub architecture. The robot’s ‘memory’ – everything it does, sees, hears and how the various modules interact – is stored in a Solr index. Kevin’s engaging talk showed us examples of how the robot’s search-engine-powered memory can be used for deep learning, for example for image recognition – in his demo it could be trained to recognise pictures of some Solr committers. This really was the crossover between search and AI!

Joel Bernstein then took us through Applied Mathematical Modelling with Apache Solr – describing the ongoing work to integrate the Apache Commons Math library. In particular he showed how these new features can be used for anomaly detection (e.g. an unusually slow network connection) using a simple linear regression model. Solr’s Streaming API can be used to run a constant prediction of the likely response times for sending files of a certain size, with any statistically significant differences noted. This is just one example of the powerful features now available for Solr-based analytics – there was more to come in Amrit Sarkar’s talk afterwards on Building Analytics Applications with Streaming Expressions. Amrit showed a demo (code available here) using Apache Zeppelin where Solr’s various SQL-style operations can be run in parallel for better performance, splitting the job up over a number of worker collections. As the demo imported data directly from a database using a JDBC connector, some of us in the room wondered whether this might be a higher-performing alternative to the venerable (and slow) Data Import Handler…
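The regression-based anomaly detection idea can be sketched outside Solr in a few lines of plain Python. This is only an illustration of the general technique, not Joel's demo or Solr's streaming expression syntax: the sample data, the function names and the three-standard-deviation threshold are my own assumptions.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def make_detector(sizes, times, threshold=3.0):
    """Fit on historical (size, response time) pairs and return a function
    that flags observations far from the predicted response time."""
    slope, intercept = fit_line(sizes, times)
    residuals = [t - (slope * s + intercept) for s, t in zip(sizes, times)]
    sigma = stdev(residuals)  # spread of the 'normal' prediction errors

    def is_anomaly(size, time):
        predicted = slope * size + intercept
        return abs(time - predicted) > threshold * sigma

    return is_anomaly


# Invented history: file size in MB against transfer time in seconds.
history_sizes = [10, 20, 30, 40, 50, 60, 70, 80]
history_times = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0]

is_anomaly = make_detector(history_sizes, history_times)
print(is_anomaly(50, 5.1))  # False: close to the predicted time
print(is_anomaly(50, 9.5))  # True: far slower than predicted
```

The model is fitted on historical data only, so that a single slow transfer can't drag the regression line towards itself before being tested.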

That was the last talk I saw on Wednesday: that evening was the conference party in a nearby bar, which was a lot of fun (although the massive TV screen showing that night’s hockey game was a little distracting!). I’ll write about day 2 soon: videos of the talks are likely to be available soon on Lucidworks’ YouTube channel and I’ll update this post when they appear.

Lucene Hackdays in London & Montreal
http://www.flax.co.uk/blog/2018/10/23/lucene-hackdays-in-london-montreal/
Tue, 23 Oct 2018 09:35:13 +0000

The post Lucene Hackdays in London & Montreal appeared first on Flax.

We ran a couple of Lucene Hackdays over the last couple of weeks: a chance to get together with other people working on open source search, learn from each other and to try and improve both Lucene and associated software.

Our first Hackday was in London, hosted by Mimecast at their offices near Moorgate. Despite a fire alarm practice (during which we ended up under some flats at the Barbican, whose residents may have been a little surprised at quite how many people ended up milling around under their balconies) we had a busy day – we split into three groups to look at tools for inspecting Lucene indexes, to work on various outstanding bugs and issues with Lucene and Solr, and to review a well-known issue where different Solr replicas can provide slightly different result ordering. By 5.30 p.m. when we were scheduled to finish we were still frantically hacking on some last-minute JavaScript to add a feature to our Marple index inspector – luckily a few minutes later, to a collective sigh of relief, we had it working and we repaired to a local pub for food and drink (kindly sponsored by Elastic).

The next week a number of us were in Montreal for the Activate conference (previously known as Lucene/Solr Revolution but now sprinkled with cutting-edge AI fairy dust!). Our second Hackday was hosted by Netgovern and we worked on various Lucene/Solr issues, made some improvements to our Harahachibu proxy (which attempts to block Solr updates when disk space is low) and discussed in depth how to improve the Solr onboarding experience. Pizza (sponsored by OneMoreCloud) and coffee fueled the hacking and we also added some new features including a Query Parser for MinHash queries. Many Lucene/Solr committers attended and afterwards we met up for a drink & food nearby (thanks to SearchStax for sponsoring this!) where we were joined by a few others – including Yonik Seeley, creator of Solr.

Next it was time for Activate – of which more later! Thanks to everyone who attended – you can see some notes and links about what we worked on here. Work will be continuing on these issues I’m sure.

Haystack Europe 2018, a brief retrospective
http://www.flax.co.uk/blog/2018/10/15/haystack-europe-2018-a-brief-retrospective/
Mon, 15 Oct 2018 15:15:49 +0000

The post Haystack Europe 2018, a brief retrospective appeared first on Flax.

It’s been a couple of weeks now since the first Haystack search relevance conference in Europe, which we ran with our partners Open Source Connections (OSC). Just under a hundred people came to the Friends’ House in Euston for a day of talks covering both the business and technical aspects of relevance engineering. Doug Turnbull of OSC started the day by introducing what would be a major theme of the conference, Learning to Rank, and how Bloomberg had used and benefited from open sourcing their LTR plugin for Solr. Karen Renshaw of Zoro (a division of Grainger Global Online) talked about how to tune relevance from a business perspective. Sebastian Russ of Tudock showed how even something as simple as an Excel spreadsheet can be a useful visualisation tool for relevance, while Alessandro Benedetti and Andrea Gazzarini of Sease demonstrated Rated Ranking Evaluator, a complete platform for relevance measurement. After lunch, Torsten Köster & Fabian Klenk of Shopping 24 and consultant René Kriegler described their journey with LTR for an ecommerce site and Agnes Van Belle of Textkernel showed how similar techniques can be applied to recruitment search. Tony Russell-Rose was our last speaker on strategies and tools for managing complex Boolean queries.

My only regret was how little time I had personally to catch up with the attendees, many of whom were from Flax clients past and present – I must have had 20 or 30 very brief chats during the day! Luckily a few of us went on for a drink afterwards and eventually a curry nearby. It was a very long day but, from the feedback we’ve received so far, a very successful one. We hope to make this a regular event on the calendar.

Thanks to all who made the event possible, our speakers and everyone who came – the slides are now available on the event website.

Lifting the hood of AI – to find a search engine?
http://www.flax.co.uk/blog/2018/09/14/lifting-the-hood-of-ai-to-find-a-search-engine/
Fri, 14 Sep 2018 09:56:49 +0000

The post Lifting the hood of AI – to find a search engine? appeared first on Flax.

A few years ago much marketing noise was made about Big Data. Every software vendor suddenly had a Big Data suite; you could suddenly buy Big Data capable hardware; consultants and experts would release thought pieces, blogs and books all about Big Data and how it would change the world. The reality of course was slightly different: Big Data meant…well, it meant whatever you wanted it to mean for your commercial purpose. For some people, what didn’t fit in an Excel spreadsheet was Big Data, for others with actually large collections of data to process it was often hard to sort the wheat from the PR chaff and find a solution that worked.

Those of us in the search engine sector would occasionally mention that we’d been dealing with not inconsequential amounts of data for many years (for example, the founders of Flax met while building a half-billion-page web search engine back in 1999). We already knew something about distributed computing, clusters of servers and how to scale for performance and reliability. There’s even some shared history: Hadoop, the foundation of so many Big Data architectures, was created by the same person who created the search library Lucene and the web crawler Nutch – so he could build a big search engine. As a result we ended up with suites of Big Data-capable software where the clever bit was… search technology.

We’re at a similar point now with AI. No matter how many pictures of humanoid robots they use, what people are calling AI is not the Terminator or a robot companion built by a reclusive billionaire. It’s generally a combination of techniques such as machine learning (ML) and natural language processing (NLP), some of which have been around for decades, which can (if you get them right) spot patterns in data, recognise graphical shapes, analyze human speech etc. Getting them right is the hard bit – you need good, reliable signals; models that work and most importantly clever people to put it together (and few of these people are available).

Again, some of the most interesting (and more likely to be real, rather than just a dodgy prototype thrown together in the hope that Google will buy your startup) work is happening in the world of search, where the underlying and necessary fundamentals of large-scale data processing, text processing, user interaction and matching are well understood through decades of experience. Here, AI techniques can be applied with practical results – for example, Learning to Rank which cleverly re-orders search results based on signals important to the business or user. So again, underneath the current trend we find a dependence on search technology. It’s unfortunate that some commentators have assumed that this means that everything in search is powered by magic AI – rather the reverse in some cases.

Activate, a conference previously known as Lucene/Solr Revolution and run by our partners Lucidworks, has brought together AI and search deliberately to explore these connections. We’re looking forward to attending next month – come and find us if you want to discuss your project!

Three weeks of search events this October from Flax
http://www.flax.co.uk/blog/2018/09/04/three-weeks-of-search-events-this-october-from-flax/
Tue, 04 Sep 2018 10:11:56 +0000

The post Three weeks of search events this October from Flax appeared first on Flax.

Flax has always been very active at conferences and events – we enjoy meeting people to talk about search! With much of our consultancy work being carried out remotely these days, attending events is a great way to catch up in person with our clients, colleagues and peers and to learn from others about what works (and what doesn’t) when building cutting-edge search solutions. I’m thus very glad to announce that we’re running three search events this coming October.

Earlier in the year I attended Haystack in Charlottesville, one of my favourite search conferences ever – and almost immediately began to think about whether we could run a similar event here in Europe. Although we’ve only had a few months, I’m very happy to say we’ve managed to pull together a high-quality programme of talks for our first Haystack Europe event, to be held in London on October 2nd. The event is focused on search relevance from both a business and a technical perspective and we have speakers from global retailers as well as specialist consultants and authors. Tickets are already selling well and we have limited space, so I would encourage you to register as soon as you can (Haystack USA sold out even after the capacity was increased). We’re running the event in partnership with Open Source Connections.

The next week we’re running a Lucene Hackday on October 9th as part of our London Lucene/Solr Meetup programme. Building on previous successful events, this is a day of hacking on the Apache Lucene search engine and associated software such as Apache Solr and Elasticsearch. You can read up on what we achieved at our last event a couple of years ago – again, space is limited, so sign up soon to this free event (huge thanks to Mimecast for providing the venue and to Elastic for sponsoring drinks and food for an evening get-together afterwards). Bring a laptop and your ideas (and do comment on the event page if you have any suggestions for what we should work on).

We’ll be flying to Montreal soon afterwards to attend the Activate conference (run by our partners Lucidworks) and while we’re there we’ll host another free Lucene Hackday on October 15th. Again, this would not be possible without sponsorship and so thanks must go to Netgovern, SearchStax and One More Cloud. Remember to tell us your ideas in the comments.

So that’s three weeks of excellent search events – see you there!

Lucene Solr London: Search Quality Testing and Search Procurement
http://www.flax.co.uk/blog/2018/06/29/lucene-solr-london-search-quality-testing-and-search-procurement/
Fri, 29 Jun 2018 11:09:34 +0000

The post Lucene Solr London: Search Quality Testing and Search Procurement appeared first on Flax.

Mimecast were our kind hosts for the latest London Lucene/Solr Meetup (and even provided goodie bags). It’s worth repeating that we couldn’t run these events without the help of sponsors and hosts and we’re always very grateful (and keep those offers coming!).

First up was Andrea Gazzarini presenting a brand new framework for search quality testing. Designed for offline measurement, Rated Ranking Evaluator is an open source Java library (although it can be used from other languages). It uses a hierarchical model to arrange queries into query groups (all queries in a query group should produce the same results). Each test can run across a number of search engine configuration versions and outputs results in JSON format – but these can also be translated into Excel spreadsheets, PDFs or sent to a server that provides a live console showing how search quality is affected by a search engine configuration change. Although aimed at Elasticsearch and Solr, the platform is extensible to any underlying search engine. This is a very useful tool for search developers and joins Quepid and Searchhub’s recently released search analytics acquisition library in the ‘toolbox’ for relevance engineers. You can see Andrea’s slides here.
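To make the idea of offline measurement across configuration versions a little more concrete, here is a minimal Python sketch. It does not use Rated Ranking Evaluator or its API: the queries, document IDs, version names and the choice of precision-at-3 as the metric are all hypothetical, and a real tool would run the queries against the search engine rather than use canned result lists.

```python
import json


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that have been rated as relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def evaluate(results_by_version, ratings, k=3):
    """Score every rated query against every configuration version."""
    report = {}
    for version, results in results_by_version.items():
        scores = {q: precision_at_k(results[q], ratings[q], k) for q in ratings}
        report[version] = {
            "queries": scores,
            "mean": sum(scores.values()) / len(scores),
        }
    return report


# Hypothetical ratings: for each query, the documents judged relevant.
ratings = {
    "red shoes": {"d1", "d2", "d5"},
    "blue shirt": {"d7", "d9"},
}

# Hypothetical result lists returned under two configuration versions.
results_by_version = {
    "v1.0": {
        "red shoes": ["d1", "d3", "d2", "d4"],
        "blue shirt": ["d8", "d7", "d6", "d9"],
    },
    "v1.1": {
        "red shoes": ["d1", "d2", "d5", "d4"],
        "blue shirt": ["d7", "d9", "d6", "d8"],
    },
}

report = evaluate(results_by_version, ratings)
print(json.dumps(report, indent=2))
```

With this invented data the mean score rises between the two versions, which is exactly the kind of before-and-after comparison that makes a configuration change easier to justify.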

Martin White spoke next on how open source search solutions fare in corporate procurements for enterprise search. This was an engaging talk from Martin, showing the scale of the opportunities for open source platforms, with budgets of several million pounds being common for enterprise search projects. However, as he mentioned, it can be very difficult for procurement departments to get information from vendors and ‘the last thing you’ll know about a piece of enterprise software is how much it will cost’. He detailed how open source solutions often compare badly against closed source commercial offerings because it is hard to see the ‘edges’ – e.g. what custom development will be necessary to fulfil enterprise requirements. Although the opportunities are clear, it seems open source based solutions still have a way to go to compete. You can read more from Martin on this subject in the recent free Search Insights report.

Thanks to Mimecast and both speakers – we’ll be back after the summer with another Meetup!

Catching MICES – a focus on e-commerce search http://www.flax.co.uk/blog/2018/06/19/catching-mices-a-focus-on-e-commerce-search/ http://www.flax.co.uk/blog/2018/06/19/catching-mices-a-focus-on-e-commerce-search/#respond Tue, 19 Jun 2018 14:15:55 +0000 http://www.flax.co.uk/?p=3831 The second event I attended in Berlin last week was the Mix Camp on e-commerce search (MICES), a small and focused event now in its second year and kindly hosted by Mytoys at their offices. Slides for the talks are … More

The post Catching MICES – a focus on e-commerce search appeared first on Flax.

]]>
The second event I attended in Berlin last week was the Mix Camp on e-commerce search (MICES), a small and focused event now in its second year and kindly hosted by Mytoys at their offices. Slides for the talks are available here and I hope videos will appear soon.

The first talk was given by Karen Renshaw of Grainger, with whom Flax worked at RS Components (she also wrote a great series of blog posts for us on improving relevancy). Karen’s talk drew on her long experience of managing search teams from a business standpoint – this wasn’t about technology but about combining processes, targets and objectives to improve search quality. She showed how to get started by examining customer feedback, known issues, competitors and benchmarks; how to understand and categorise query types; how to create a test plan within a cross-functional team; and how to plan for incremental change. She also covered testing, including how to score search quality and how to examine the impact of search changes, with the message that “all aspects of search should work together to help customers through their journey”. She concluded with the clear point that there are no silver bullets, and that expectations must be managed during an ongoing, iterative process of improvement. This talk set the scene for the day and contained lessons for every search manager (and a good few search technologists who often ignore the business factors!).

Next up were Christine Bellstedt & Jens Kürsten from Otto, Germany’s second biggest online retailer with over 850,000 search queries a day. Their talk focused on bringing together the user and business perspectives to create a search quality testing cycle. They quoted Peter Freis’ graphic from his excellent talk at Haystack to illustrate how they created an offline system for experimentation with new ranking methods based on linear combinations of relevance scores from Solr, business performance indicators and product availability. They described how they learnt how hard it can be to select ranking features, create test query sets with suitable coverage and choose appropriate metrics. They also talked about how the experimentation cycle can be used to select ‘challengers’ to the current ‘champion’ ranking method, which can then be A/B tested online.
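
A ranking method built from a linear combination of signals can be sketched in a few lines of Python. This is only an illustration of the general technique, not Otto's system: the feature names, weights and sample products are hypothetical, and the sketch assumes every feature has already been normalised to the range 0 to 1.

```python
def combined_score(product, weights):
    """Weighted sum of the product's (already normalised) feature values."""
    return sum(weights[name] * product[name] for name in weights)


def rank(products, weights):
    """Return product ids ordered by descending combined score."""
    ordered = sorted(products, key=lambda p: combined_score(p, weights), reverse=True)
    return [p["id"] for p in ordered]


# Hypothetical products: engine relevance score, a business indicator, availability.
products = [
    {"id": "A", "relevance": 0.9, "business": 0.2, "availability": 0.0},
    {"id": "B", "relevance": 0.7, "business": 0.6, "availability": 1.0},
    {"id": "C", "relevance": 0.5, "business": 0.9, "availability": 1.0},
]

# The 'champion' ranks on relevance alone; the 'challenger' mixes in other signals.
champion = {"relevance": 1.0, "business": 0.0, "availability": 0.0}
challenger = {"relevance": 0.6, "business": 0.2, "availability": 0.2}

champion_order = rank(products, champion)
challenger_order = rank(products, challenger)
print("champion:", champion_order)
print("challenger:", challenger_order)
```

Here the challenger demotes the unavailable product A despite its high relevance score. Offline metrics over a test query set would decide whether such a challenger is worth an online A/B test.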

Pavel Penchev of SearchHub was next and presented their new search event collector library – a JavaScript SDK which can be used to collect all kinds of metrics around user behaviour and submit them directly to a storage or analytics system (which could even be a search engine itself – e.g. Elasticsearch/Kibana). This is a very welcome development – only a couple of months ago at Haystack I heard several people bemoaning the lack of open source tools for collecting search analytics. We’ll certainly be trying out this open source library.

Andreas Brückner of e-commerce search vendor Fredhopper talked about the best way to optimise search quality in a business context. His ten headings included “build a dedicated search team” (although 14% of Fredhopper’s own customers have no dedicated search staff), “build a measurement framework” (how else can you see how revenue might be improved?) and “start with user needs, not features”. There was much to agree with in this talk from someone with long experience of the sector from a vendor viewpoint.

Johannes Peter of MediaMarktSaturn described an implementation of a ‘semantic’ search platform which attempts to understand queries such as ‘MyMobile 7 without contract’, recognising this is a combination of a product name, a Boolean operator and an attribute. He described how an ontology (perhaps showing a family of available products and their variants) can be used in combination with various rules to create a more focused query, e.g. ‘title:("MyMobile7") AND NOT (flag:contract)’. He also mentioned machine learning and term co-occurrence as useful methods but stressed that these experimental techniques should be treated with caution and one should ‘fail early’ if they are not producing useful results.
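
The rewriting step described above can be sketched with a toy ontology and a single rule. This Python example is an assumption-laden illustration, not MediaMarktSaturn's implementation: the lookup tables, the function name and the rule itself are hypothetical, and the product match is a naive substring test.

```python
# Hypothetical ontology: lower-cased surface form -> canonical product name.
PRODUCTS = {"mymobile 7": "MyMobile7"}

# Hypothetical rule tables: natural-language operators and known attributes.
OPERATORS = {"without": "AND NOT", "with": "AND"}
ATTRIBUTES = {"contract": "flag:contract"}


def rewrite(query):
    """Turn 'product + operator + attribute' queries into a fielded query.

    Queries that mention no known product are returned unchanged.
    """
    text = " ".join(query.lower().split())
    product = None
    for surface, canonical in PRODUCTS.items():
        if surface in text:
            product = canonical
            text = text.replace(surface, " ", 1)
            break
    if product is None:
        return query

    parts = ['title:("%s")' % product]
    tokens = text.split()
    # Rule: an operator word immediately followed by a known attribute.
    for token, following in zip(tokens, tokens[1:]):
        if token in OPERATORS and following in ATTRIBUTES:
            parts.append("%s (%s)" % (OPERATORS[token], ATTRIBUTES[following]))
    return " ".join(parts)


print(rewrite("MyMobile 7 without contract"))
```

Falling back to the untouched query when nothing is recognised keeps the rewriter safe to put in front of an existing search, in the spirit of the ‘fail early’ advice.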

Ashraf Aaref & Felipe Besson described their journey using Learning to Rank to improve search at GetYourGuide, a marketplace for activities (e.g. tours and holidays). Using Elasticsearch and the LtR plugin recently released by our partners OpenSourceConnections, they tried to improve the results for their ‘location pages’ (e.g. for Paris), but their first iteration actually gave worse results than the current system and was thus rejected by their QA process. They hope to repeat the process using what they have learned about how difficult it is to create good judgement data. This isn’t the first talk I’ve seen that honestly admits that ML approaches to improving search aren’t a silver bullet and that the work itself is difficult and requires significant investment.

Duncan Blythe of Zalando gave what was the most forward-looking talk of the event, showing a pure Deep Learning approach to matching search queries to results – no query parsing, language analysis, ranking or anything, just a system that tries to learn which queries match which results for a product search. This reminded me of Doug & Tommaso’s talk at Buzzwords a couple of days before, using neural networks to learn the journey between query and document. Duncan did admit that this technique is computationally expensive and in no way ready for production, but it was exciting to hear about such cutting-edge (and well funded) research.

Doug Turnbull was the last speaker with a call to arms for more open source tooling, datasets and relevance judgements to be made available so we can all build better search technology. He gave a similar talk as the keynote at the Haystack event two months ago and you won’t be surprised to hear that I completely agree with his viewpoint – we all benefit from sharing information.

Unfortunately I had to leave MICES at this point and missed the more informal ‘bar camp’ event that followed, but I would like to thank all the hosts and organisers, especially René Kriegler, for such an interesting day. There seems to be a great community forming around e-commerce search, which is highly encouraging – after all, this is one of the few sectors where one can draw a clear line between improving relevance and delivering more revenue.

]]>
http://www.flax.co.uk/blog/2018/06/19/catching-mices-a-focus-on-e-commerce-search/feed/ 0