Highly scalable stored search and media monitoring with open source software
Flax builds high-performance media monitoring systems using our own open source library, Luwak. Simply put, Luwak lets you define a set of search queries and then monitor a stream of documents for any that match them: a function also known as ‘reverse search’ or ‘document routing’. Based on the powerful Apache Lucene library, it can be used to build monitoring and classification systems that apply millions of highly complex queries to millions of documents a day. It can also replace legacy systems such as Verity and dtSearch, and we have custom parsers available so you don’t have to translate your stored queries to a new format.
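To make the idea of reverse search concrete, here is a minimal Python sketch. It is not Luwak's API or implementation: the ToyMonitor class, its register and match methods, the tokenize helper and the sample queries are all hypothetical names chosen for illustration, and it assumes every stored query is a plain "all of these terms must appear" query rather than a full Lucene query.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyMonitor:
    """Hypothetical stored-query monitor for illustration; not Luwak's API."""

    def __init__(self):
        self.queries = {}                   # query id -> set of required terms
        self.term_index = defaultdict(set)  # term -> ids of queries that use it

    def register(self, query_id, text):
        """Store a query; every term in `text` must appear for it to match."""
        terms = set(tokenize(text))
        self.queries[query_id] = terms
        for term in terms:
            self.term_index[term].add(query_id)

    def match(self, document):
        """Return the sorted ids of all stored queries matching the document."""
        doc_terms = set(tokenize(document))
        # Only queries sharing at least one term with the document can match.
        candidates = set()
        for term in doc_terms:
            candidates |= self.term_index.get(term, set())
        # Confirm each candidate: all of its terms must be in the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


monitor = ToyMonitor()
monitor.register("q-merger", "bank merger")
monitor.register("q-oil", "oil price")

stream = [
    "Two regional banks announce merger talks",
    "Bank merger approved as oil price falls",
]
for doc in stream:
    print(doc, "->", monitor.match(doc))
```

The point of the sketch is the inversion: the queries are stored and indexed, and each incoming document is used to look them up. There is no stemming here, so "banks" in the first document does not satisfy a query for "bank".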
Luwak is used by companies including Bloomberg and Infomedia to power their large-scale media monitoring systems. We were invited to speak about our work with Infomedia at the 2015 FIBEP World Media Intelligence Congress.
Luwak is available as open source software from our GitHub repository. In our tests, it ran 6 to 40 times faster than the Elasticsearch Percolator.
You can browse the Flax blog for posts about Luwak and read Scott Stults’ introductory post, ‘How to use Luwak to run preset queries against incoming documents’.
You can find out more about how Flax uses Luwak for media monitoring applications in this video from Lucene Revolution 2013, this video from Berlin Buzzwords 2014, and this post on how we combined it with Apache Samza (which includes a great illustration of how Luwak’s internals work).
Contact us for details of how to use Luwak in your application.