Search Solutions 2017 review

Charlie Hull — Thu, 14 Dec 2017 15:33:19 +0000

Search Solutions is one of my favourite search events of the year – small, focused and varied, with presentations from both the largest and smallest players in the world of search, drawn from both industry and academia.

This year’s event started with Edgar Meij of Bloomberg, who Flax have helped in the past with their large-scale search and alerting systems. I’d seen most of the details in this talk before so I won’t dwell on them but will thank Bloomberg again for their commitment and contributions to the open source community, particularly to Solr and our Luwak stored search library. Mark Fea of LexisNexis was up next with a talk about taxonomies and how they have built a semi-automated classification system combining supervised machine learning and Boolean rules-based systems: a pragmatic way to combine the strengths of both, as machine learning isn’t always as clever as one might want, and Boolean rules can be hard to build and maintain. Like Bloomberg they are working at large scale: Mark mentioned taxonomies of 21,000 terms and 9 levels, applied to over 1 billion documents.
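To make the idea of combining Boolean rules with a trained model a little more concrete, here is a minimal sketch of one way such a hybrid could be wired together. It is not the LexisNexis system: the `BooleanRule` class, the `classify` function, the confidence threshold and the "send to review" fallback are all my own assumptions, and `toy_model` is a hypothetical stand-in for a real supervised classifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BooleanRule:
    """A hand-written rule: every term in all_of must appear, no term in none_of may."""
    label: str
    all_of: frozenset = frozenset()
    none_of: frozenset = frozenset()

    def matches(self, tokens: set) -> bool:
        return self.all_of <= tokens and not (self.none_of & tokens)


def classify(
    text: str,
    rules: list,
    model: Callable[[str], Tuple[str, float]],
    threshold: float = 0.7,  # assumed cut-off, not from the talk
) -> Tuple[Optional[str], str]:
    """Return (label, source): rules win, then a confident model, else human review."""
    tokens = set(text.lower().split())
    for rule in rules:
        if rule.matches(tokens):
            return rule.label, "rule"
    label, confidence = model(text)
    if confidence >= threshold:
        return label, "model"
    return None, "review"


def toy_model(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained classifier returning (label, confidence)."""
    return ("litigation", 0.9) if "court" in text.lower() else ("litigation", 0.2)


rules = [BooleanRule("mergers", all_of=frozenset({"merger", "acquisition"}))]
print(classify("Merger and acquisition news", rules, toy_model))
print(classify("The court ruled today", rules, toy_model))
print(classify("Weather update", rules, toy_model))
```

Letting rules take precedence is only one possible design. A real system might instead use rules to veto or confirm the model's output.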

Mark Harwood of Elastic was up next with one of his always fascinating talks on discovering unknown patterns in data with Elasticsearch. He showed how he had explored ‘toxic’ content (far-right music and those who like it) and fake reviews on Amazon with some great visual demonstrations. An interesting conclusion was how ‘bad actors’ make strange, recognisable shapes in visualised data. [Mark later won the Best Presentation award, richly deserved!]. Anna Kolliakou of King’s College London spoke next on ‘veracity intelligence’ tools to help monitor terms connected to mental health across news media and social networks: an interesting example was ‘mephedrone’ around the time of reclassification of this particular recreational drug. Next up was independent consultant Phil Bradley with a detailed, well-researched and passionate talk on fake news and how one cannot trust any web search engine to present the full picture. Phil is obviously extremely concerned about this issue and his talk spurred discussion amongst the audience about how user education is essential to counter the usual viewpoint of ‘it’s on Google, it must be true’.

Coincidentally, Filip Radlinski of Google started the next session, describing a model for conversational information retrieval. He spoke about how the user and IR system reveal information about themselves as the conversation progresses, how the system may need a memory of past interactions and how it may present a set of potential answers. This is a useful model for the future, although most current ‘conversational’ systems are simplistic. Fabrizio Silvestri then spoke on the various types of search Facebook provides, mostly related to finding people but also images, video and news. He explained how every search operation needs to consider privacy and how Facebook use query rewriting to expand and enhance the terms provided by the user. Nicola Cancedda of Microsoft was next with a talk on automated query extraction from emails, to help the user find and attach relevant documents in response (for example, after a colleague asks ‘can you send me the cost projections for 2017’). This work involves training machine learning models after extracting candidate terms with high TF/IDF values from the email. [Interestingly this reminded me of work I carried out nearly 20 years ago on an email signature that when clicked would search for content relevant to the email – although this relied on JavaScript working in an email client which is rather a security problem!].
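As a rough illustration of the candidate extraction step only (not the machine learning that follows it, and not Microsoft's actual implementation), the sketch below scores each term in an email by TF-IDF against a small background collection and keeps the highest-scoring ones. The function names, the smoothed IDF formula and the tiny example corpus are all assumptions of mine.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_terms(email: str, corpus: list, top_n: int = 3) -> list:
    """Return the top_n terms of the email ranked by TF-IDF against the corpus."""
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    term_counts = Counter(tokenize(email))
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for tokens in doc_tokens if term in tokens)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = count * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


corpus = [
    "can you send me the agenda",
    "can you send me the minutes please",
    "thanks for the update",
]
email = "can you send me the cost projections for 2017"
print(candidate_terms(email, corpus))
```

With this toy corpus the common request wording ("can you send me the") scores low because it appears in other messages, leaving the rarer terms as candidates for a document search.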

Last of our scheduled talks was from Mark Stanger of Search Technologies (recently acquired by Accenture) about their work on Elsevier’s DataSearch platform. He described how they developed a Phrase Service that identifies phrases in the user’s query using various methods including acronym detection, dictionary lookup and natural language processing, then expands these phrases as necessary to provide enhanced search. Once identified, these key terms can be boosted appropriately for search (DataSearch itself is based on Solr).
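To give a flavour of what phrase identification followed by boosting might look like, here is a small sketch that uses dictionary lookup and acronym expansion to build a query string in standard Lucene/Solr syntax (quoted phrases and `^` boosts). It is not the DataSearch Phrase Service: the `PHRASES` and `ACRONYMS` tables, the `build_query` function and the boost value are hypothetical, and the natural language processing step is left out entirely.

```python
# Hypothetical lookup tables; a real service would load these from curated sources.
PHRASES = {"climate change", "nuclear magnetic resonance"}
ACRONYMS = {"nmr": "nuclear magnetic resonance"}


def build_query(user_query: str, boost: float = 2.0, max_len: int = 3) -> str:
    """Greedily match the longest known phrase at each position and boost it."""
    tokens = user_query.lower().split()
    clauses = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest multi-word phrase first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in PHRASES:
                clauses.append(f'"{candidate}"^{boost}')
                i += n
                matched = True
                break
        if matched:
            continue
        token = tokens[i]
        if token in ACRONYMS:
            # Expand the acronym so either form can match, and boost the group.
            clauses.append(f'({token} OR "{ACRONYMS[token]}")^{boost}')
        else:
            clauses.append(token)
        i += 1
    return " ".join(clauses)


print(build_query("climate change data"))
print(build_query("NMR spectra"))
```

Note that this sketch does not escape query syntax characters in the user's input, which anything sent to a real Solr instance would need to do.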

The DataSearch project is impressive, and later on it won the Best Search Project award (I am proud to say I served as part of the judging panel for these awards this year). The other winner, for most promising search startup, was Search|hub by CXP Commerce Experts GmbH.

We finished with some lightning talks and a brief Fishbowl session, dominated this time by discussions on Fake News and how it affects the world of search technology. Thanks to the BCS IRSG again for a fascinating and enlightening day.

The post Search Solutions 2017 review appeared first on Flax.

Search Solutions 2016 review

Charlie Hull — Wed, 07 Dec 2016 10:31:08 +0000

Last week I attended Search Solutions, one of my favourite annual events where all aspects of search are covered from web to intranet to enterprise. The first speaker, Sebastian Blohm from Microsoft, spoke about a new personalised Clutter folder for email and how his team had first developed a global model and then developed a way for this to be tuned for each user. The system can then filter email that isn’t actually spam, but might not be important (e.g. a company-wide announcement about car parking) into the Clutter folder. Context and interaction patterns were just a few of the signals used in a probabilistic programming model. Marc Bron of Yahoo then described his work developing quality metrics for online adverts, including measurements of mobile friendliness and aesthetic quality. Filtering out the 5% of ‘bad’ adverts can be shown to increase conversion by 10% and showing only the 20% most ‘beautiful’ adverts can increase it by up to 60%. Qin Yin of Google was next to tell us about the development of an on-device search capability for Android devices, to find content that is locally stored or cached – useful when connection quality is bad, for example. Although light on technical detail, Qin Yin did describe the Firebase API that app developers can use to submit content to be indexed as memory-mapped files on Flash storage.

The next session was started by Graham Digby of LexisNexis on building a legal question answering system. This was a great pragmatic approach where XML content was marked up as answers and a simple query matching algorithm allowed auto-suggestion of likely questions based on what the user was typing. Although simple, this approach works well and his team are now looking at ways to further analyse the query using part-of-speech tagging. Jon Brassey of Trip Database was next, describing the evolution of his system from a simple list of hyperlinks to searchable resources to a comprehensive searchable database of medical information. Interestingly his index is enhanced with a quality metric so data that has undergone systematic review is ranked higher in results.
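The talk didn't go into the matching algorithm itself, so the following is only my guess at what a simple approach to question auto-suggestion could look like: each word the user has typed so far is treated as a prefix, and a stored question is suggested if every prefix matches one of its words. The `suggest` function and the example questions are hypothetical.

```python
def suggest(typed: str, questions: list, limit: int = 5) -> list:
    """Suggest stored questions in which every typed word is a prefix of some word."""
    prefixes = typed.lower().split()
    if not prefixes:
        return []
    hits = []
    for question in questions:
        words = question.lower().rstrip("?").split()
        if all(any(word.startswith(p) for word in words) for p in prefixes):
            hits.append(question)
    return hits[:limit]


questions = [
    "What is a tort?",
    "What is the limitation period for a contract claim?",
    "How do I register a trademark?",
]
print(suggest("limit per", questions))
print(suggest("what is", questions))
```

Part-of-speech tagging of the query, which the team were said to be looking at, would be a separate step on top of something like this.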

Vassilis Plachouras of Thomson Reuters started the next session, talking about a system to automatically generate textual descriptions of data, including some fascinating ordered lists of verbs used to describe trends – “plunge”, “rocketing” and “surged” all indicating a strong change. Next was Paul Cleverley who has carried out some fascinating research into the complaints made about enterprise search, identifying likely causal factors – it was interesting to see how technology was only responsible for 38% of the complaints. Paul also showed a fascinating model of the modality of search which with his permission I reproduce below:

I was up next to briefly describe the Lucene4IR workshop held in Glasgow recently to try to encourage the academic and industry communities to work closer on teaching and improving Lucene-based search engines. My slides are available here.

The next session started with an amusing talk by Matthew Karas, a veteran of speech recognition who has worked with Autonomy, the BBC and others on many applications. Apparently Welsh is far easier to automatically convert into text than English! Frederic Fol Leymarie finished up with a description of how shape-based search can be performed using lessons from research into human visual perception. A panel session followed which mainly covered the skills shortage mentioned in my talk, with a wider discussion of how non-technical ‘search managers’ are also in short supply.

The day finished with the inaugural IRSG Search Industry Awards – I was honoured to have been asked to help judge these. The results have been announced by event chair Tony Russell-Rose; congratulations to all the winners. As ever this was a fascinating day covering many aspects of search and thanks must go to the BCS IRSG and all who spoke.

The post Search Solutions 2016 review appeared first on Flax.
