Searching for IP addresses in text with Elasticsearch

We recently implemented a search solution for a customer using Elasticsearch. Most of their requirements were fairly standard, however they also wanted to be able to search for IP addresses embedded in the document text, using a flexible and precise search syntax, e.g. given the following document fragment:

    ... the API can be accessed at 167.87.3.201 on port 8700 ...

the following searches should all find the document:

...Continue reading

Updating individual fields in Lucene with a Redis-backed codec

A customer of ours has a potential search application which requires (largely for reasons of performance) the ability to update specific individual fields of Apache Lucene documents. This is not the first time that someone has asked for this functionality. However, until now, it has been impossible to change field values in a Lucene document without re-indexing the...Continue reading

Better search for e-petitions – handling misspelled content with a Solr phonetic filter

We recently overhauled the search functionality for the UK government's e-petitions site, run by the Government Digital Service, a new team within the Cabinet Office. Search has an important function on the site; users are forced to search for existing petitions which cover their area of concern before creating a new one. This cuts down on the number of near-duplicate petitions, and makes petitions ...Continue reading

Flax Search Service alpha release

The Flax team are pleased to announce the alpha release of Flax Search Service (FSS). FSS combines powerful, high-level indexing and search features with a well-designed Web Services interface. FSS is Open Source software (under the MIT licence) and is available as a free download from Google Code. Web Services and Service Oriented Architectures (SOA) have become increasingly popular in recent years due to their many advantages. FSS provides...Continue reading

Distributed search and partition functions

For most applications, Xapian/Flax's search performance will be excellent to acceptable on a single machine of reasonable spec (see here for a discussion of CPU and RAM requirements). However, if the document corpus is unusually large - more than about 20 million items - then one server may not be enough for acceptable speed. Xapian provides a mechanism called remote backends which lets the load be shared ov...Continue reading

Xapian Search Architecture

This is not strictly a Flax post, but is intended to clarify the Xapian search architecture for people using Xapian directly. It's not intended for experienced Xapian hackers, neither is it a general introduction to using Xapian (see here instead). The Xapian API is fairly complex, and there is often confusion about the role of the QueryParser, terms, document values, document data etc. in indexing and searching. It is probably worth pointi...Continue reading