stored search – Flax http://www.flax.co.uk The Open Source Search Specialists Thu, 10 Oct 2019 09:03:26 +0000 en-GB hourly 1 https://wordpress.org/?v=4.9.8 Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch http://www.flax.co.uk/blog/2017/09/28/elastic-london-meetup-rightmove-signal-media-new-free-security-plugin-elasticsearch/ http://www.flax.co.uk/blog/2017/09/28/elastic-london-meetup-rightmove-signal-media-new-free-security-plugin-elasticsearch/#respond Thu, 28 Sep 2017 08:44:26 +0000 http://www.flax.co.uk/?p=3613 I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million … More

The post Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch appeared first on Flax.

]]>
I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack based on Elasticsearch, Apache Kafka and CouchDB. After creating a proof-of-concept system on Amazon’s cloud they discovered that simply running all 3.5m Percolator queries every time a new property appeared would be too slow and thus implemented a series of filters to cut down the number of queries applied, including filtering out rental properties and those in the wrong location. They are now running around 40m saved searches per day and also plan to upgrade from their current Elasticsearch 2.4 system to the newer version 5, as well as carry out further performance improvements. After the talk I chatted to the presenter George Theofanous about our work for Bloomberg using our own library Luwak, which could be an way for Rightmove to run stored searches much more efficiently.

Next up was Signal Media, describing how they built an automated system for upgrading Elasticsearch after their cluster grew to over 60 nodes (they ingest a million articles a day and up to May 2016 were running on Elasticsearch 1.5 which had a number of issues with stability and performance). To avoid having to competely shut down and upgrade their cluster, Joachim Draeger described how they carried out major version upgrades by creating a new, parallel cluster (he named this the ‘blue/green’ method), with their indexing pipeline supplying both clusters and their UI code being gradually switched over to the new cluster once stability and performance were verified. This process has cut their cluster to only 23 nodes with a 50% cost saving and many performance and stability benefits. For ongoing minor version changes they have built an automated rolling upgrade system using two Amazon EBS volumes for each node (one is for the system, and is simply switched off as a node is disabled, the other is data and is re-attached to a new node once it is created with the upgraded Elasticsearch machine image). With careful monitoring of cluster stability and (of course) testing, this system enables them to upgrade their entire production cluster in a safe and reliable way without affecting their customers.

After the talks I announced the Search Industry Awards I’ll be helping to judge in November (please apply if you have a suitable search project or innovation!) and then spoke to Simone Scarduzio about his free Elasticsearch and Kibana security plugin, a great alternative to the Elastic X-Pack (only available to Elastic subscription customers). We’ll certainly be taking a deeper look at this plugin for our own clients.

Thanks again to Yann Cluchey for organising the event and all the speakers and hosts.

The post Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2017/09/28/elastic-london-meetup-rightmove-signal-media-new-free-security-plugin-elasticsearch/feed/ 0
Out with the old – and in with the new Lucene query parser? http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/ http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/#respond Fri, 13 May 2016 12:41:52 +0000 http://www.flax.co.uk/?p=3273 Over the years we’ve dealt with quite a few migration projects where the query syntax of the client’s existing search engine must be preserved. This might be because other systems (or users) depend on it, or a large number of … More

The post Out with the old – and in with the new Lucene query parser? appeared first on Flax.

]]>
Over the years we’ve dealt with quite a few migration projects where the query syntax of the client’s existing search engine must be preserved. This might be because other systems (or users) depend on it, or a large number of stored expressions exist and it is difficult or uneconomic to translate them all by hand. Our usual approach is to write a query parser, which understands the current syntax but creates a query suitable for a modern open source search engine based on Apache Lucene. We’ve done this for legacy engines including dtSearch and Verity and also for in-house query languages developed by clients themselves. This allows you to keep the existing syntax but improve performance, scalability and accuracy of your search engine.

There are a few points to note during this process:

  • What appears to be a simple query in your current language may not translate to a simple Lucene query, which may lead to performance issues if you are not careful. Wildcards for example can be very expensive to process.
  • You cannot guarantee that the new search system will return exactly the same results, in the same order, as the old one, no matter how carefully the query parser is designed. After all, the underlying search engine algorithms are different.
  • Some element of manual translation may be necessary for particularly large, complex or unusual queries, especially if the original intention of the person who wrote the query is unclear.
  • You may want to create a vendor-neutral query language as an intermediate step – so you can migrate more easily next time. We’ve done this for Danish media monitors Infomedia.
  • If you have particularly large and/or complex queries that may have been added to incrementally over time, they may contain errors or logistical inconsistencies – which your current engine may not be telling you about! If you find these you have two choices: fix the query expression (which may then give you slightly different results) or make the new system give the same (incorrect) results as before.

To mitigate these issues it is important to decide on a test set of queries and expected results, and what level of ‘correctness’ is required – bearing in mind 100% is going to be difficult if not impossible. If you are dealing with languages outside the experience of the team you should also make sure you have access to a native speaker – so you can be sure that results really are relevant!

Do let us know if you’re planning this kind of migration and how we can help – building Lucene query parsers is not a simple task and some past experience can be invaluable.

The post Out with the old – and in with the new Lucene query parser? appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/feed/ 0
Luwak 1.3.0 released http://www.flax.co.uk/blog/2015/11/17/luwak-1-3-0-released/ http://www.flax.co.uk/blog/2015/11/17/luwak-1-3-0-released/#respond Tue, 17 Nov 2015 14:13:42 +0000 http://www.flax.co.uk/?p=2802 The latest version of Luwak, our open-source streaming query engine, has been released on the Sonatype Nexus repository and will be making its way to Maven Central in the next few hours.  Here’s a summary of the new features and … More

The post Luwak 1.3.0 released appeared first on Flax.

]]>
The latest version of Luwak, our open-source streaming query engine, has been released on the Sonatype Nexus repository and will be making its way to Maven Central in the next few hours.  Here’s a summary of the new features and improvements we’ve made:

Batch processing

Inspired by a question raised during our talk at FOSDEM last February, you can now stream documents through the Luwak Monitor in batches, as well as one-at-a-time. This will generally improve your throughput, at the cost of a drop in latency. For example, local benchmarking against a set of 10,000 queries showed an improvement from 10 documents/second to 30 documents/second when the batch size was increased from 1 document to 30 documents; however, processing latency went from ~100ms for the single document to 10 seconds for the larger batch. You’ll need to experiment with batch sizes to find the right balance for your own use.

Presearcher performance improvements

Luwak speeds up document matching by filtering out queries that we can detect won’t match a given document or batch, a process we call presearching. Profiling revealed that creating the presearcher query was a serious performance bottleneck, particularly for presearchers using the WildcardNGramPresearcherComponent, so this has been largely rewritten in 1.3.0. We’ve seen improvements of up to 400% in query build times after this rewrite.

Concurrent query loading

Luwak now ships with a ConcurrentQueryLoader helper class to help speed up Monitor startup. The loader uses multiple threads to add queries to the index, allowing you to make use of all your CPUs when parsing and analyzing queries. Note that this requires your MonitorQueryParser implementations to be thread-safe!

Easier configuration and state monitoring

In 1.2.0 and earlier, clients had to extend the Monitor itself in order to configure the internal query caches or get state update information. Configuration has now been extracted into a QueryIndexConfiguration class, passed to the Monitor at construction, and you can get notified about updates to the query index by registering QueryIndexUpdateListeners.

For more information, see the CHANGES for 1.3.0. We’ll also be re-running the comparison with Elasticsearch Percolator soon, as this has also been improved as part of Elasticsearch’s recent 2.0 release.

The post Luwak 1.3.0 released appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2015/11/17/luwak-1-3-0-released/feed/ 0