elastic – Flax: The Open Source Search Specialists (http://www.flax.co.uk)

When even the commercial vendors are using it, has open source search won?
Thu, 15 Mar 2018

The post When even the commercial vendors are using it, has open source search won? appeared first on Flax.

There have been some interesting announcements recently which may point to an increasing realisation amongst commercial search firms that an open source model is an essential advantage in today’s search market. Coveo have announced that their enterprise search engine can run on an Elasticsearch core, an interesting move for a company that has previously been firmly closed source. BA Insight, who have previously provided extensions and enhancements for Microsoft’s decidedly closed-source SharePoint search facility, have been offering Elasticsearch as a core search engine for quite a while. It is also an open secret that some other commercial search firms (such as Attivio) use Apache Lucene as a core technology.

The commercial search firms will also have noticed that Lucidworks (who employ a large proportion of the Lucene/Solr committers) have announced Lucidworks Fusion 4, which can be used for both site and enterprise search. Elastic, the company behind Elasticsearch, recently acquired Swiftype and have repositioned it as a packaged site search engine (with an enterprise search version in beta, rumoured to appear later this year). Both Lucidworks and Elastic are thus attempting to capture a larger segment of the search market by using their dominance and expertise in the open source world. Note however that all these products are ‘open core’ rather than ‘open source’ (despite Elastic’s attempts to pretend otherwise) – which is not very different from Coveo’s or BA Insight’s approach – so the gap between the traditionally separate ‘open source’ and ‘closed source’ search vendors is now closing.

The question for any search vendor should be whether there is any point developing and maintaining a closed source search engine core, when Lucene derivatives such as Solr and Elasticsearch are so well established. The race between closed and open source is perhaps over.

Here at Flax we’ve been building open source search engines since 2001 and we’re independent of any vendor – so if you need help with your search project, do let us know.

Note: Enterprise Search is usually defined as a search engine working behind a corporate firewall, indexing different content sources such as flat files, databases and intranets. Site Search is usually visible to non-employees and only indexes websites. However, when site search includes an intranet the boundary becomes a little fuzzy – is this lightweight enterprise search? In most cases this doesn’t hugely matter – the underlying search engine core will be the same, it’s simply a difference in where source data comes from and how it is presented to users. However, these two options are often presented as different products by vendors.

UPDATE: A few days after I posted this blog, commercial vendor Attivio released SUIT, an open source user interface library that can run on their own engine, Elasticsearch or Solr. It seems the trend continues.

No, Elastic X-Pack is not going to be open source – according to Elastic themselves
Fri, 02 Mar 2018

Elastic are the company founded by the creator of Elasticsearch, Shay Banon. At this time of year they have their annual Elasticon conference in San Francisco and as you might expect a lot of announcements are made during the week of the conference. The major ones to appear this time are that Swiftype, which Elastic acquired last year, has reappeared as Elastic Site Search and that Elastic are opening the code for their commercial X-Pack features.

Shay Banon is always keen to relate how Elasticsearch started as open source and will remain true to that heritage, which is always encouraging to hear. However, it’s unfortunate that many have reported the announcement as ‘X-Pack is now open source’ – and the truth is a little more complicated than that.

Firstly, let’s look at the Elasticsearch core code itself. Yes, this is open source under the Apache 2 license, so you can download it, modify it, fork it, even incorporate it into your own products if you like. However most people would like to keep up with the latest and greatest developments so they’ll want to stick with the ‘official’ stream of updates, and what goes into this is entirely up to Elastic employees as they are the only ones allowed to commit to the codebase. Some measure of control of an open source project is essential of course, but this is certainly not ‘open development’ even though it is ‘open source’. Compare this to Apache Lucene/Solr, where those that are allowed to commit code to the official releases are from a wide variety of organisations (and elected as committers by merit, by a group of other longstanding committers). This distinction is important but makes little difference to most adopters.

Elastic have also for some years produced commercial, closed-source software in addition to Elasticsearch – which they call the X-Pack. To use this code you have to license it, although for some of the features the license is free. The announcement this week is that the source code for the X-Pack will be open and available to read under an Elastic license (which hasn’t yet been made available). As Doug Turnbull of our partner company Open Source Connections writes, “Be careful: The ‘open source’ Elastic XPack is very different than what most think of as ‘open source’”. To use some of these features in production you will still need to pay Elastic for a license, even though you have the source code. If you spot a problem in the source code and submit a patch, you may still end up paying Elastic for the privilege of running it. This is an ‘open core’ model, where the further you move away from the core, the less open and free things become – and as Shay writes, this is a key part of their business model.

The final word on this comes from Elastic’s own FAQ on the X-Pack: “Open Source licensing maintains a strict definition from the Open Source Initiative (OSI). As of 6.3, the X-Pack code will be opened under an Elastic EULA. However, it will not be ‘Open Source’ as it will not be covered by an OSI approved license.” It’s a shame that this hasn’t been accurately reported.

If you are considering open source search software for your project, contact us for independent and honest advice. We’ve been building open source search applications since 2001.

Elastic acquires Swiftype and broadens its offering to include enterprise search
Thu, 09 Nov 2017

The news today that Elastic (the company behind the open source Elasticsearch software) has acquired Swiftype will have surprised a few people, even though Elastic has already acquired a good number of other companies. Swiftype have a couple of products that deliver cloud-based site and enterprise search, and under the hood all of this is built on Elasticsearch. Swiftype are part of a new breed of enterprise search companies (such as Lucidworks and Attivio): often based on open source cores, able to index cloud applications and data, and with modern, clean, responsive user interfaces.

The same problems remain, however, with making enterprise search work in practice: data locked in hard-to-access legacy systems, low-quality content and metadata, unrealistic expectations driven by over-optimistic marketing and, most importantly, the various people factors that affect all cross-departmental, large-scale IT systems. No matter how clever the software, without the right people with the right training it’s very hard to deliver effective search.

It remains to be seen how the acquisition will change the enterprise search market – Elastic certainly have significant funding and admirable ambition – which in itself is probably enough to worry a few of Swiftype’s competitors.

 

Better performance with the Logstash DNS filter
Thu, 17 Aug 2017

We’ve been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse).

While performance testing I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:

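dns {
  resolve => [ "Source_IP" ]   # forward-resolve the value of this field
  action => "replace"          # overwrite the field with the lookup result
  hit_cache_size => 8000       # cache up to 8000 successful lookups...
  hit_cache_ttl => 300         # ...for 300 seconds each
  failed_cache_size => 1000    # cache up to 1000 failed lookups...
  failed_cache_ttl => 10       # ...for 10 seconds each
}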

the maximum throughput was only 600 messages/s, as opposed to 1000 messages/s with no caching (4000/s with no DNS lookup at all).

This was very odd, so I looked at the source code. Here is the DNS lookup when a cache is configured:

address = @hitcache.getset(raw) { retriable_getaddress(raw) }

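This executes retriable_getaddress(raw) inside the getset() cache method, which is synchronised. The DNS lookup therefore runs while the cache’s lock is held, so only one worker thread can be resolving an address at any time, and every other thread that calls getset() has to wait, even if its own lookup would be a cache hit. In other words, concurrent DNS lookups are impossible when a cache is used.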
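
To make the effect concrete, here is a minimal Ruby sketch (Ruby being the language the plugin is written in). It is not the plugin’s code: LockedCache, slow_lookup and time_threads are hypothetical names, and a 100 ms sleep stands in for DNS latency. The cache’s getset runs the lookup block while holding a Monitor, as described above, so four threads resolving four different names are forced to take turns.

```ruby
require "monitor"

# Hypothetical cache whose getset runs the compute block while holding
# the cache's lock, mimicking the synchronised getset described above.
class LockedCache
  def initialize
    @data = {}
    @lock = Monitor.new
  end

  def getset(key)
    @lock.synchronize do
      @data[key] = yield unless @data.key?(key) # slow block runs with the lock held
      @data[key]
    end
  end
end

# Hypothetical stand-in for retriable_getaddress: each "DNS lookup" takes 100 ms.
def slow_lookup(name)
  sleep 0.1
  "address-of-#{name}"
end

# Runs the given block once per name, each in its own thread, and
# returns the elapsed wall-clock time in seconds.
def time_threads(names)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  names.map { |n| Thread.new { yield n } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

names = %w[a.example b.example c.example d.example]
cache = LockedCache.new

locked = time_threads(names) { |n| cache.getset(n) { slow_lookup(n) } }
unlocked = time_threads(names) { |n| slow_lookup(n) }

puts format("lookup inside lock: %.2fs, outside lock: %.2fs", locked, unlocked)
```

On an otherwise idle machine the locked run should take roughly four times as long as the unlocked one (around 0.4s against 0.1s), because only one simulated lookup can be in flight at a time.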

To see if this was the problem, I created a fork of the dns filter which does not synchronise the retriable_getaddress() call.

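# Check the cache first
address = @hit_cache[raw]
if address.nil?
  # Cache miss: do the slow DNS lookup outside any synchronised cache
  # method, so other worker threads are not blocked while it runs
  address = retriable_getaddress(raw)
  # Only successful lookups are cached
  @hit_cache[raw] = address unless address.nil?
end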

Tests on the same data revealed a throughput of nearly 2000 messages/s with four worker threads (and 2600 with eight threads), which is a significant improvement.

This filter has the disadvantage that it might redundantly look up the same address several times if the same domain name or IP address turns up in several worker threads simultaneously. The risk of this is probably low, depending on the input data, and in any case it’s harmless.
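
A second sketch illustrates that trade-off. Again this is illustrative Ruby, not the fork itself: CheckThenSetResolver and its counting slow_lookup are hypothetical, and a Mutex guards each individual hash read and write as a precaution (the fork’s snippet does not show how its cache handles locking). Four threads ask for the same name at once; each will usually find the cache empty before any lookup has finished, so each does its own lookup, but they all get the same answer and later calls are served from the cache.

```ruby
# Hypothetical resolver mirroring the fork's check-then-set pattern:
# the slow lookup runs with no lock held.
class CheckThenSetResolver
  attr_reader :lookups

  def initialize
    @cache = {}
    @cache_lock = Mutex.new # guards only the brief hash reads and writes
    @count_lock = Mutex.new
    @lookups = 0
  end

  def resolve(name)
    address = @cache_lock.synchronize { @cache[name] }
    if address.nil?
      address = slow_lookup(name) # several threads may reach this at once
      @cache_lock.synchronize { @cache[name] = address } unless address.nil?
    end
    address
  end

  private

  # Stand-in for a DNS lookup: counts calls, takes 100 ms,
  # and always returns the same answer for the same name.
  def slow_lookup(name)
    @count_lock.synchronize { @lookups += 1 }
    sleep 0.1
    "address-of-#{name}"
  end
end

resolver = CheckThenSetResolver.new
threads = Array.new(4) { Thread.new { resolver.resolve("busy.example") } }
results = threads.map(&:value)
first_round = resolver.lookups

resolver.resolve("busy.example") # a later call is a cache hit
puts "lookups: #{first_round}, distinct answers: #{results.uniq.size}, " \
     "lookups after a later call: #{resolver.lookups}"
```

The redundant work is bounded by the number of worker threads that collide on the same name, and since every thread gets the same answer the only cost is a few extra DNS queries.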

I have released a gem of the plugin if you want to try it. Comments appreciated.

Elasticsearch London Meetup – Exploring the Graph API & SearchKit UI components
Thu, 24 Mar 2016

This month’s Elasticsearch Meetup was hosted by Argos at their Victoria Digital Hub with a relatively small crowd this time – I suspect quite a few who registered didn’t actually turn up or release their tickets, which is a shame as there was a waiting list.

Mark Harwood of Elastic was first with a talk about the new Graph API and visualisation components, which will shortly be available to Elastic subscription customers. Mark’s talks are always fascinating and entertaining and this one was no exception, covering how to derive network graphs from data in Elasticsearch and discover how indexed items are connected. Using publicly available data he showed us how a Swedish metal band had proportionally more listeners in Finland than in Sweden (and how many bands of this genre seem to be named after unpleasant medical conditions), how clickthrough data can reveal who is buying food mixers and who is buying audio mixers and, amusingly, how a mysterious person called ‘Ravi’ has registered for hundreds of different Meetup events without attending a single one (as far as we know). Building on the significant terms aggregation, these graph features are a powerful tool for discovering real and unexpected connections within your data, especially in a forensics context.

Siavash and Joseph from TenEleven then showed us SearchKit, their component library for building Elasticsearch user interfaces. It is based on React and allows one to “rapidly create beautiful search applications using declarative components, and without being an ElasticSearch expert.” They showed us a range of impressive demos of search interfaces created with only a few lines of configuration. SearchKit is open source under the Apache 2 license and they have seen huge interest: as of today the project has attracted over 1500 stars on GitHub! We’ll certainly be considering SearchKit for future Elasticsearch projects and we think the project has a bright future.

The evening ended with a Q&A session – thanks to our hosts Argos and both speakers, see you next time!

Elastic London User Group Meetup – scaling with Kafka and Cassandra
Thu, 26 Mar 2015

The Elastic London User Group Meetup this week was slightly unusual in that the talks focussed not so much on Elasticsearch but rather on how to scale the systems around it using other technologies. First up was Paul Stack with an amusing description of how he had worked on scaling the logging infrastructure for a major restaurant booking website, to cope with hundreds of millions of messages a day across up to 6 datacentres. Moving from an original architecture based on SQL and ASP.NET, they started by using Redis as a queue and Logstash to feed the logs to Elasticsearch. Further instances of Logstash were added to glue other parts of the system together but Redis proved unable to handle this volume of data reliably and a new architecture was developed based on Apache Kafka, a highly scalable message passing platform originally built at LinkedIn. Kafka proved very good at retaining data even under fault conditions. He continued with a description of how the Kafka architecture was further modified (not entirely successfully) and how monitoring systems based on Nagios and Graphite were developed for both the Kafka and Elasticsearch nodes (with the infamous split brain problem being one condition to be watched for). Although the project had its problems, the system did manage to cope with 840 million messages one Valentine’s day, which is impressive. Paul concluded that although scaling to this level is undeniably hard, Kafka was a good technology choice. Some of his software is available as open source.

Next, Jamie Turner of PostcodeAnywhere described in general terms how they had used Apache Cassandra and Apache Spark to build a scalable architecture for logging interactions with their service, so they could learn about and improve customer experiences. They explored many different options for their database, including MySQL and MongoDB (regarding Mongo, Jamie raised a laugh with ‘bless them, they do try’) before settling on Cassandra, which does seem to be a popular choice for a rock-solid distributed database. As PostcodeAnywhere are a Windows house, the availability and performance of .NET-compatible clients was key, and luckily they have had a good experience with the NEST client for Elasticsearch. Although the talk was light on technical detail, Jamie did mention how they use Markov chains to model customer experiences.

After a short break for snacks and beer we returned for a Q&A with Elastic team members: one interesting announcement was that there will be an Elastic{ON} in Europe some time this year (if anyone from the Elastic team is reading this, please try to avoid a clash with Enterprise Search Europe on October 20th/21st!). Thanks as ever to Yann Cluchey for organising the event and to open source recruiters eSynergySolutions for sponsoring the venue and refreshments.
