It’s not just about technology – training for search managers is vital

Posted on November 7, 2017 by Charlie Hull

A few weeks ago I sat in on a workshop in London at the Taxonomy Boot Camp conference, run by Jeff Fried of BA Insight. I've known Jeff for many years from various events and we share some views on how search systems should be built and managed - using best-of-breed technology and effective management processes. He was kind enough to ask me to join a recent podcast. During the podcast, we had a great conversation about open source search, enterprise...Continue reading

Worth the wait – Apache Kafka hits 1.0 release

Posted on November 2, 2017 by Charlie Hull

We've known about Apache Kafka for several years now - we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts - everything it deals with is a stream ...Continue reading

Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch

Posted on September 28, 2017 by Charlie Hull

I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack bas...Continue reading

How to build a search relevance team

Posted on September 11, 2017 by Charlie Hull

We've spent a lot of time working with clients who recognise that their search engine isn't delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration - but technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements. It's thus important to consider relevance as...Continue reading

Better performance with the Logstash DNS filter

Posted on August 17, 2017 by Tom

We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:

dns { 
  resolve => [ ...Continue reading

Elasticsearch, Kibana and duplicate keys in JSON

Posted on August 3, 2017 by Tom

JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:

{
  "foo": "spam",
  "foo": "eggs",
  ...
}

Implementations are free to interpret this how they like. When different systems have different interpretations this can cause problems. We recently encounter...Continue reading
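As a rough illustration of the point above, here is a minimal Python sketch (not taken from the original post) showing how one particular parser, the standard library's json module, handles the duplicate "foo" key: by default it keeps the last value. The reject_duplicates helper is a hypothetical name introduced here; it uses the module's object_pairs_hook parameter to raise an error instead of silently picking a value. Other parsers may behave differently.

```python
import json

# Same shape as the example above: the key "foo" appears twice.
RAW = '{"foo": "spam", "foo": "eggs"}'


def reject_duplicates(pairs):
    """Hypothetical helper: build a dict but fail on any repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Default behaviour of Python's json module: the last value wins.
last_wins = json.loads(RAW)
print(last_wins)

# Stricter parsing: surface the duplicate rather than hiding it.
try:
    json.loads(RAW, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)
```

Checking for duplicates at the point of ingestion is one way to avoid two systems disagreeing about which value a document actually contains.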

Announcing our new book, Searching the Enterprise

Posted on July 26, 2017 by Charlie Hull

For the last year or so I've been working with Professor Udo Kruschwitz of the University of Essex on a long-form journal article on enterprise search - although at 156 pages this is more of a book than a journal article. Released as part of the Foundations and Trends® in Information Retrieval series by Now Publishers, the b...Continue reading

A lack of cognition and some fresh FUD from Forrester

Posted on June 14, 2017 by Charlie Hull

Last night the estimable Martin White, intranet and enterprise search expert and author of many books on the subject, flagged up two surprising articles from Forrester who have declared that Cognitive Search (we'll define this using their own terms in...Continue reading

London Lucene/Solr Meetup: Query Pre-processing & SQL with Solr

Posted on June 2, 2017 by Charlie Hull

Bloomberg kindly hosted the London Lucene/Solr Meetup last night and we were lucky enough to have two excellent speakers for the thirty or so attendees. René Kriegler kicked off with a talk about the ...Continue reading

ECIR 2017 Industry Day, our book & a demo of live TV factchecking

Posted on April 24, 2017 by Charlie Hull

I visited Aberdeen before Easter to speak at Industry Day, part of the European Conference on Information Retrieval. Following a reception at Aberdeen's Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we're co-authoring (draft title 'Searching the Enterprise', designed as a review of t...Continue reading