factchecking – Flax

ECIR 2017 Industry Day, our book & a demo of live TV factchecking

Charlie Hull — Mon, 24 Apr 2017 13:45:44 +0000

I visited Aberdeen before Easter to speak at Industry Day, a part of the European Conference on Information Retrieval. Following a reception at Aberdeen’s Town House (a wonderful building) hosted by the Lord Provost I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we’re co-authoring (draft title ‘Searching the Enterprise’, designed as a review of the subject for those considering a PhD or those in business wanting to know the current state of the art – it should be out later this year) and I also caught up with our associate Tony Russell-Rose of UXLabs.

Industry Day started with a talk by Peter Mika of Norwegian media group Schibsted on modelling user behaviour for delivering personalised news. It was interesting to hear his views on Facebook and the recent controversy about their removal of a photo posted by a Schibsted group newspaper, and how this might be a reason Schibsted carry out their own internal developments rather than relying on the algorithms used by much larger companies. Edgar Meij was up next talking about search at Bloomberg (which we’ve been involved in) and it was interesting to hear that they might be contributing some of their alerting infrastructure back to Apache Lucene/Solr. James McMinn of startup Scoop Analytics followed, talking about real-time news monitoring. They have built a prototype system based on PostgreSQL rather than a search engine, indexing around half a billion tweets, that allows one to spot breaking news much earlier than the main news outlets might report it.

The next session started with Michaela Regneri of OTTO on Newsleak.io, a project in collaboration with Der Spiegel “producing a piece of software that allows to quickly and intuitively explore large amounts of textual data”. She stressed how important it is to have a common view of what is ‘good’ performance in collaborative projects like this. Richard Boulton (who worked at Flax many years ago) was next in his role as Head of Software Engineering at the Government Digital Service, talking about the ambitious project to create a taxonomy for all UK government content. So far, his team have managed to create an alpha version of this for educational content – note that they don’t have the time or resources in-house to tag content, so must therefore work with the relevant departments to do so. They have created various software tools to help, including an automatic topic tagger using Latent Dirichlet Allocation – which, given this is the GDS, is of course open source and available.

Unfortunately I missed a session after this due to a phone call, but managed to catch some of Elizabeth Daly of IBM talking about automatic claim detection using the Watson framework. Using Wikipedia as a source, this can identify statements that support a particular claim for an argument and tag them as ‘pro’ or ‘con’. This topic led neatly on to Will Moy of Full Fact, who we have been working with recently, in a ‘sandwich’ session with myself. Will talked about how Full Fact has been working for many years to develop neutral, unbiased factchecking tools and services. I then spoke about the hackday we ran recently for Full Fact, and particularly about our Luwak library and how it can be used to spot known claims by politicians in streaming news. Will then surprised me and impressed the audience by showing a prototype service that watches several UK television channels in real time, extracts the subtitles and checks them against a list of previously factchecked claims – using the Luwak backend we built at the hackday. Yes, that’s live factchecking of television news, very exciting!
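For readers unfamiliar with stored queries, the idea of checking incoming subtitles against known claims can be sketched in a few lines of Python. This is only an illustration of the general 'reverse search' technique, not Luwak's API (Luwak is a Java library built on Lucene) and not the code behind the prototype. The class name, the method names and the rule that a claim matches when all of its words appear in a sentence are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class StoredQueryMatcher:
    """Hypothetical stored-query matcher: claims are registered once,
    then every incoming sentence is checked against all of them."""

    def __init__(self):
        self.queries = {}                   # claim id -> set of required terms
        self.term_index = defaultdict(set)  # term -> ids of claims using it

    def register(self, claim_id, claim_text):
        terms = set(tokenize(claim_text))
        self.queries[claim_id] = terms
        for term in terms:
            self.term_index[term].add(claim_id)

    def match(self, sentence):
        tokens = set(tokenize(sentence))
        # Cheap pre-filter: only claims sharing at least one term with the
        # sentence can possibly match, so the rest are never examined.
        candidates = set()
        for token in tokens:
            candidates |= self.term_index.get(token, set())
        # A claim matches when every one of its terms is in the sentence.
        return sorted(c for c in candidates if self.queries[c] <= tokens)


matcher = StoredQueryMatcher()
matcher.register("claim-1", "crime is rising")
matcher.register("claim-2", "unemployment is falling")

subtitle = "The minister said that crime is rising across the country"
matches = matcher.match(subtitle)
print(matches)
```

The point of the pre-filter is that the set of stored claims can grow large while each subtitle line is short, so most claims can be ruled out without being evaluated. A real system would use proper queries (phrases, ranges, synonyms) rather than this all-words rule.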

Thanks to Professor Kruschwitz and Tony Russell-Rose for putting together the agenda and inviting both me and Will to speak – it was great to be able to talk about the exciting work we’re doing with Full Fact and to hear about the other projects.


A fabulous FactHack for Full Fact

Charlie Hull — Fri, 27 Jan 2017 10:49:20 +0000

Last week we ran a hackday for Full Fact, hosted by Facebook in their London office. We had planned to gather a room full of search experts from our London Lucene/Solr Meetup, and around twenty people attended from a range of companies including Bloomberg, Alfresco and the European Bioinformatics Institute, among them a number of Lucene/Solr committers.

Mevan Babakar of Full Fact has already written a detailed review of the day, but to summarise we worked on three areas:

Building a web service around our Luwak stored query engine, to give it an easy-to-use API. We now have an early version of this which allows Full Fact to check claims they have previously fact checked against a stream of incoming data (e.g. subtitles or transcripts of political events).
Creating a way to extract numbers from text and turn them into a consistent form (e.g. ‘eleven percent’, ‘11%’, ‘0.11’) so that we can use range queries more easily – Derek Jones’ team researched existing solutions and he has blogged about what they achieved.
Investigating how to use natural language processing to identify parts of speech and tag them in a Lucene index using synonyms and token stacking, to allow for queries such as ‘[noun] is rising’ to match text like ‘crime is rising’ – the team forked Lucene/Solr to experiment with this.
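To make the second item concrete, here is a minimal Python sketch of turning percentages written in different ways into one numeric form, so that a range check becomes a simple comparison. It is not what Derek Jones' team built: the function name, the regular expression and the restriction to percentages below one hundred written as digits or as one or two number words are all assumptions for illustration. Bare decimals such as '0.11' are deliberately ignored, because the text alone does not say whether they are fractions.

```python
import re

UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# Map number words to values: 'zero'..'nineteen', then 'twenty'..'ninety'.
NUMBER_WORDS = {word: value for value, word in enumerate(UNITS)}
NUMBER_WORDS.update({word: 20 + 10 * i for i, word in enumerate(TENS)})

# Longest words first so that 'sixteen' is tried before 'six'.
_WORD = "|".join(sorted(NUMBER_WORDS, key=len, reverse=True))

PERCENT_RE = re.compile(
    r"(?P<digits>\d+(?:\.\d+)?)\s*(?:%|per\s?cent)"
    r"|(?P<words>\b(?:" + _WORD + r")(?:[\s-](?:" + _WORD + r"))?)\s+per\s?cent",
    re.IGNORECASE,
)


def extract_fractions(text):
    """Return every percentage found in `text` as a fraction (11% -> 0.11)."""
    fractions = []
    for m in PERCENT_RE.finditer(text):
        if m.group("digits"):
            value = float(m.group("digits"))
        else:
            # Naive: sums the words, so 'twenty-one' gives 20 + 1.
            parts = re.split(r"[\s-]", m.group("words").lower())
            value = sum(NUMBER_WORDS[p] for p in parts)
        fractions.append(round(value / 100, 6))
    return fractions


found = extract_fractions("Crime rose by eleven percent, or 11%, last year")
in_range = [f for f in found if 0.10 <= f <= 0.15]  # a simple range check
print(found, in_range)
```

Once every mention is reduced to the same numeric form, a stored claim can be expressed as a numeric range rather than as a list of every possible spelling.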

We’re hoping to build on these achievements to continue to support Full Fact as they develop open source automated fact checking tools for both their own operations and for other fact checking organisations across the world (there were fact checkers from Argentina and Africa attending to give us an international perspective). Our thanks to all of those who contributed.

I’ve also introduced Full Fact to many others within the search and text analytics community and we would welcome further contributions from anyone who can lend their expertise and time – get in touch if you can help. This is only the beginning!


Just the facts with Solr & Luwak

Charlie Hull — Wed, 04 Jan 2017 15:58:19 +0000

It won’t have escaped your notice that factchecking is very much in the news recently due to last year’s political upheavals in both the US and UK and the suspected influence of fake news on voters. Both traditional and social media organisations are making efforts in this area; examples include Channel 4 and Facebook.

At our recent London Lucene/Solr Meetup, UK charity Full Fact spoke eloquently on the need for automated factchecking tools to help identify and correct stories that are demonstrably false. They’ve also published a great report on The State of Automated Factchecking which mentions both Apache Solr and our powerful stored query library Luwak as components of their platform. We’ve been helping Full Fact with their prototype factchecking tools for a while now, but during the Meetup I suggested we might run a hackday to develop these further.

Thus I’m very pleased to announce that Facebook have offered us a venue in London for the hackday on January 20th (register here). Many Solr developers, including several committers and PMC members, are signed up to attend already. We’ll use Full Fact’s report and their experiences of factchecking newspapers, TV’s Question Time and Hansard to design and build practical, useful tools and identify a future roadmap. We’ll aim to publish what we build as open source software which should also benefit factchecking organisations across the world.

If you’re concerned about the impact of fake news on the political process and want to help, join the Meetup and/or donate to Full Fact.


Meetup at Big Data London – One-click Solr & Factchecking with Solr

Charlie Hull — Thu, 10 Nov 2016 11:22:26 +0000

Last week I spoke at the Big Data London conference, a very busy event with several thousand people attending. My session was on using open source search to make sense of Big Data – you can get slides here.

In the evening we ran another Lucene/Solr London Usergroup event with speakers Upayavira and Full Fact. After a brief but friendly fight with the Datastax team over pizza we settled down to see Upayavira show us his method for creating a fully functional SolrCloud stack and search application with a single command line using tools such as Docker, Rancher and Exhibitor. Upayavira’s system only needs to be given details of an Amazon Web Services cloud hosting account and it will create host instances, install and start ZooKeeper, wait for a quorum to be established, install and start Solr, create a SolrCloud cluster and finally install and start a search application. The whole thing is managed by his own script Uberstack and is undeniably impressive.

Our second talk (and I think my favourite talk from all our Solr Meetups) was from Will Moy and Mevan Babakar of Full Fact, a charity who monitor the news for accuracy (something we increasingly require in these ‘post-truth’ days). Will told us how false and misleading claims can be amplified by the media and may end up directly influencing government policy, even though the underlying facts are wrong. Full Fact are attempting to build open source, freely available systems for automating the factchecking process using Apache Lucene/Solr and our own stored query library Luwak, and Flax have been donating some time to help them with this process. Their Hawk system currently indexes over 70 million sentences. This project is a wonderful example of how free, open source software can be used to create tools that benefit us all, and at the end of this inspiring talk many of the audience offered ideas and even direct assistance with the project. I urge you to read Full Fact’s recent report on automated factchecking and get involved if you can. One idea was to run a Hackday for Full Fact – more details when we have them.

Thanks to Big Data London for inviting me to speak and hosting the Meetup and to Elsevier for sponsoring pizza and drinks. We’ll be back with another Meetup soon!
