A few weeks ago I sat in on a workshop in London at the Taxonomy Boot Camp conference, run by Jeff Fried of BA Insight. I've known Jeff for many years from various events, and we share some views on how search systems should be built and managed: using best-of-breed technology and effective management processes. He was kind enough to ask me to join a recent podcast, during which we had a great conversation about open source search, enterprise...
Worth the wait – Apache Kafka hits 1.0 release
We've known about Apache Kafka for several years now - we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts - everything it deals with is a stream ...
Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch
I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack bas...
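For readers unfamiliar with percolation: it inverts the usual search flow, so instead of running one query against many stored documents, each new document (here, a property listing) is checked against many stored queries. The sketch below illustrates only that idea, in plain Python and under simplifying assumptions; it is not the Elasticsearch Percolator API, and `StoredSearchIndex`, its methods and the AND-only query model are hypothetical. To avoid testing every stored search against every listing, each search is indexed under just one of its required terms, since a listing can only match if it contains all of them.

```python
from collections import defaultdict


class StoredSearchIndex:
    """Hypothetical in-memory stored-search index (Boolean AND queries only)."""

    def __init__(self):
        self._queries = {}                # search_id -> frozenset of required terms
        self._by_term = defaultdict(set)  # term -> ids of searches indexed under it

    def add(self, search_id, required_terms):
        terms = frozenset(t.lower() for t in required_terms)
        if not terms:
            raise ValueError("a stored search needs at least one term")
        self._queries[search_id] = terms
        # A match needs every term, so indexing under any single one is enough
        # to find the search as a candidate; min() keeps the choice deterministic.
        self._by_term[min(terms)].add(search_id)

    def percolate(self, document_text):
        """Return ids of stored searches whose terms all occur in the document."""
        doc_terms = set(document_text.lower().split())
        matches = []
        for term in doc_terms:
            # Only searches indexed under a term present in the document are checked.
            for search_id in self._by_term.get(term, ()):
                if self._queries[search_id] <= doc_terms:
                    matches.append(search_id)
        return sorted(matches)


idx = StoredSearchIndex()
idx.add("s1", ["garden", "London"])
idx.add("s2", ["flat", "Manchester"])
print(idx.percolate("Two bed flat with garden in London"))  # ['s1']
```

Indexing each search under a single term is the simplest possible candidate pre-filter; a production system would likely choose a rare term per query and support far richer query types.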
How to build a search relevance team
We've spent a lot of time working with clients who recognise that their search engine isn't delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration - but technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements. It's thus important to consider relevance as...
Better performance with the Logstash DNS filter
We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads and the following configuration:
dns { resolve => [ ...
Elasticsearch, Kibana and duplicate keys in JSON
JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:
{ "foo": "spam", "foo": "eggs", ... }
Implementations are free to interpret this however they like. When different systems have different interpretations, this can cause problems. We recently encounter...
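To make the ambiguity concrete, here is a small sketch using Python's standard `json` module, chosen purely for illustration; it says nothing about how Elasticsearch or Kibana themselves behave. By default Python silently keeps the last value for a repeated key, while the hypothetical `strict_loads` helper uses the module's real `object_pairs_hook` parameter to reject duplicates instead. Two parsers, two reasonable readings of the same input.

```python
import json


def strict_loads(text):
    """Parse JSON, raising ValueError if any object repeats a key (hypothetical helper)."""
    def reject_duplicates(pairs):
        # json calls this once per object with its (key, value) pairs in order.
        obj = {}
        for key, value in pairs:
            if key in obj:
                raise ValueError(f"duplicate key in JSON object: {key!r}")
            obj[key] = value
        return obj

    return json.loads(text, object_pairs_hook=reject_duplicates)


raw = '{"foo": "spam", "foo": "eggs"}'

# Python's default behaviour: the last occurrence silently wins.
print(json.loads(raw))  # {'foo': 'eggs'}

# A stricter reading of the same input: treat the duplicate as an error.
try:
    strict_loads(raw)
except ValueError as err:
    print(err)  # duplicate key in JSON object: 'foo'
```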
Announcing our new book, Searching the Enterprise
For the last year or so I've been working with Professor Udo Kruschwitz of the University of Essex on a long-form journal article on enterprise search, although at 156 pages this is more of a book than a journal article. Released as part of the Foundations and Trends® in Information Retrieval series by Now Publishing, the b...
A lack of cognition and some fresh FUD from Forrester
Last night the estimable Martin White, intranet and enterprise search expert and author of many books on the subject, flagged up two surprising articles from Forrester, who have declared that Cognitive Search (we'll define this using their own terms in...
London Lucene/Solr Meetup: Query Pre-processing & SQL with Solr
Bloomberg kindly hosted the London Lucene/Solr Meetup last night, and we were lucky enough to have two excellent speakers for the thirty or so attendees. Kriegler kicked off with a talk about the...
ECIR 2017 Industry Day, our book & a demo of live TV factchecking
I visited Aberdeen before Easter to speak at Industry Day, part of the European Conference on Information Retrieval. Following a reception at Aberdeen's Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries, including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we're co-authoring (draft title 'Searching the Enterprise', designed as a review of t...