I visited Aberdeen before Easter to speak at Industry Day, part of the European Conference on Information Retrieval. Following a reception at Aberdeen’s Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we’re co-authoring (draft title ‘Searching the Enterprise’, designed as a review of the subject for those considering a PhD or those in business wanting to know the current state of the art – it should be out later this year) and I also caught up with our associate Tony Russell-Rose of UXLabs.
Industry Day started with a talk by Peter Mika of Norwegian media group Schibsted on modelling user behaviour for delivering personalised news. It was interesting to hear his views on Facebook and the recent controversy about their removal of a photo posted by a Schibsted group newspaper, and how this might be a reason Schibsted carry out their own internal developments rather than relying on the algorithms used by much larger companies. Edgar Meij was up next, talking about search at Bloomberg (which we’ve been involved in), and it was interesting to hear that they might be contributing some of their alerting infrastructure back to Apache Lucene/Solr. James McMinn of startup Scoop Analytics followed, talking about real-time news monitoring. They have built a prototype system based on PostgreSQL rather than a search engine, indexing around half a billion tweets, that allows one to spot breaking news much earlier than the main news outlets might report it.
The next session started with Michaela Regneri of OTTO on Newsleak.io, a project in collaboration with Der Spiegel “producing a piece of software that allows to quickly and intuitively explore large amounts of textual data”. She stressed how important it is to have a common view of what is ‘good’ performance in collaborative projects like this. Richard Boulton (who worked at Flax many years ago) was next in his role as Head of Software Engineering at the Government Digital Service, talking about the ambitious project to create a taxonomy for all UK government content. So far, his team have managed to create an alpha version of this for educational content – but they don’t have the time or resources in-house to tag content, so must work with the relevant government departments to do so. They have created various software tools to help, including an automatic topic tagger using Latent Dirichlet Allocation – which, given this is the GDS, is of course open source and available.
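The GDS tagger itself is a separate open-source project; purely as a rough illustration of the underlying technique, here is a minimal LDA topic model using scikit-learn. The toy documents and the choice of two topics are invented for the example and bear no relation to the real GDS corpus or code:

```python
# Minimal Latent Dirichlet Allocation sketch using scikit-learn.
# The documents and topic count are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "school curriculum teachers exams pupils",
    "tax returns income self assessment hmrc",
    "school admissions pupils term dates",
    "vat tax business returns registration",
]

# Build bag-of-words counts, then fit a two-topic LDA model
vec = CountVectorizer()
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Each row is a per-document topic distribution summing to ~1;
# a tagger would assign the highest-weighted topic(s) to the document.
for doc, dist in zip(docs, doc_topics):
    print(doc[:30], dist.round(2))
```

In practice a production tagger would train on far more text and map the learned topics onto a curated taxonomy, but the shape of the pipeline (vectorise, fit, read off per-document topic weights) is the same.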
Unfortunately I missed a session after this due to a phone call, but managed to catch some of Elizabeth Daly of IBM talking about automatic claim detection using the Watson framework. Using Wikipedia as a source, this can identify statements that support a particular claim for an argument and tag them as ‘pro’ or ‘con’. This topic led neatly on to Will Moy of Full Fact, who we have been working with recently, in a ‘sandwich’ session with me. Will talked about how Full Fact has been working for many years to develop neutral, unbiased factchecking tools and services, and I then spoke about the hackday we ran recently for Full Fact, in particular about our Luwak library and how it can be used to spot known claims by politicians in streaming news. Will then surprised me and impressed the audience by showing a prototype service that watches several UK television channels in real time, extracts the subtitles and checks them against a list of previously factchecked claims – using the Luwak backend we built at the hackday. Yes, that’s live factchecking of television news, very exciting!
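Luwak inverts the usual search model: instead of running ad-hoc queries against an index of documents, you store the queries (here, factchecked claims) and match every incoming document (here, each subtitle line) against all of them. The toy Python sketch below illustrates only that “reverse search” idea – it is not Luwak’s actual Java API, and the claim texts and subtitle line are invented:

```python
# Toy "reverse search" sketch: stored queries are matched against each
# incoming document, the inverse of a normal search engine. This only
# illustrates the idea behind Luwak, not its real (Java) API.

# Previously factchecked claims, stored here as sets of required terms.
# Claim ids and texts are invented for illustration.
stored_claims = {
    "claim-001": {"unemployment", "lowest", "decade"},
    "claim-002": {"immigration", "doubled"},
}

def match_document(text):
    """Return ids of stored claims whose terms all appear in the text."""
    tokens = set(text.lower().split())
    return [cid for cid, terms in stored_claims.items()
            if terms <= tokens]

# A simulated subtitle line arriving from a live TV stream
subtitle = "the minister said unemployment is at its lowest for a decade"
print(match_document(subtitle))  # -> ['claim-001']
```

The real library is far more sophisticated – it supports full Lucene query syntax and uses a “presearcher” to cheaply rule out most stored queries before matching – but the document-against-stored-queries flow is the same.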
Thanks to Professor Kruschwitz and Tony Russell-Rose for putting together the agenda and inviting both me and Will to speak – it was great to be able to talk about the exciting work we’re doing with Full Fact and to hear about the other projects.