Xapian compared

Vik Singh has been comparing various open source solutions for search. He only spent a weekend performing the comparison, which is probably not enough time to get any search software performing at its best, and his results reflect this. Xapian was marked down for being slow at indexing (he says 5x slower than SQLite - but then again, SQLite isn't a search engine, it's an RDBMS, and...Continue reading

Flax Search Service alpha release

The Flax team are pleased to announce the alpha release of Flax Search Service (FSS). FSS combines powerful, high-level indexing and search features with a well-designed Web Services interface. FSS is Open Source software (under the MIT licence) and is available as a free download from Google Code. Web Services and Service Oriented Architectures (SOA) have become increasingly popular in recent years due to their many advantages. FSS provides...Continue reading

Distributed search and partition functions

For most applications, Xapian/Flax's search performance will be excellent to acceptable on a single machine of reasonable spec (see here for a discussion of CPU and RAM requirements). However, if the document corpus is unusually large - more than about 20 million items - then one server may not be enough for acceptable speed. Xapian provides a mechanism called remote backends which lets the load be shared ov...Continue reading
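As a rough illustration of the partition function idea in the title above, here is a minimal Python sketch that assigns each document to one of several index partitions by hashing its ID. The function names, the use of MD5 and the string document IDs are all assumptions made for this example; this is not Xapian's or Flax's own code, and it does not touch the remote backend API.

```python
import hashlib


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Map a document ID to a partition index in [0, num_partitions).

    Hypothetical helper: a stable hash is used (rather than Python's
    built-in hash()) so the mapping is the same on every run and machine.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_documents(doc_ids, num_partitions):
    """Group document IDs into num_partitions lists, one per server."""
    partitions = [[] for _ in range(num_partitions)]
    for doc_id in doc_ids:
        partitions[partition_for(doc_id, num_partitions)].append(doc_id)
    return partitions


if __name__ == "__main__":
    # Example: spread 10 made-up document IDs over 3 partitions.
    ids = ["doc-%d" % i for i in range(10)]
    for index, group in enumerate(partition_documents(ids, 3)):
        print(index, group)
```

Each partition would then be indexed on its own server, and a search would be sent to all of them and the results merged. Note that a simple modulo scheme like this reshuffles most documents if the number of partitions changes.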

Flax stack and pre-built binaries

We've updated the Flax website with a page showing the Flax software stack - hopefully this will go some way towards explaining how Xapian, Xappy and parts of Flax all fit together. There's still lots in development so expect some more news later this month. As part of this, we've created a new page bringing together all the Win32 files for Xapian that we maintain - including some pre-built binaries for those of you who don't want to compile Xapian yourself. We're working on creating one-clic...Continue reading

Xapian Search Architecture

This is not strictly a Flax post, but is intended to clarify the Xapian search architecture for people using Xapian directly. It's not intended for experienced Xapian hackers, neither is it a general introduction to using Xapian (see here instead). The Xapian API is fairly complex, and there is often confusion about the role of the QueryParser, terms, document values, document data etc. in indexing and searching. It is probably worth pointi...Continue reading

More on performance metrics

Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. Some interesting results here: it seems Lucene is faster at building indexes, but Xapian is faster and possibly more accurate at searching. We can expect some further speed improvements over the next few months as a new, more compact backend to Xapian is released. By the way, the article mentions Xappy: this is a Python interface to Xapian that is a major part of our Flax enterprise search platform. You can ge...Continue reading

Image searching

Searching images is a difficult problem, and it's not a feature offered by many commercial search engines. Some will cheat slightly, by indexing the title or filename of the image, or the text surrounding an image embedded on a page, and call this 'image search' - but this method doesn't work very well, especially when you have a standalone image called 'IMG0000064.jpg' which is actually a picture of an apple. We've seen some good demos of actual image search - I...Continue reading

Open source data integration and file format translation

One of the challenges we often come up against is indexing data held in other proprietary or open source systems, such as databases or content management systems. Talend is an open source data integration platform that lets you connect to a huge variety of these systems, from Salesforce to Oracle to SugarCRM. Talend is an offshoot of the...Continue reading