NoSQL – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

Elasticsearch Meetup – Spark, postcodes and Couchbase
http://www.flax.co.uk/blog/2015/09/25/elasticsearch-meetup-spark-postcodes-and-couchbase/
Fri, 25 Sep 2015 13:30:52 +0000

The post Elasticsearch Meetup – Spark, postcodes and Couchbase appeared first on Flax.

Three speakers for this month’s Elasticsearch Meetup (slides now up), kindly hosted by JustEat’s technical department. Neil Andrassy kicked us off with a talk about how TheFilter (which you may know counts Peter Gabriel as an investor) use Apache Spark to load data into their Elasticsearch cluster. Neil described how Spark and Elasticsearch have superseded both Microsoft SQL Server and MongoDB – Spark in particular being described as ‘speedy, flexible and componentized’, with Spark’s RDDs (Resilient Distributed Datasets) mapping cleanly to Elasticsearch shards. He then showed a demo of UK road accident data being imported into Spark as CSV files, indexed automatically in Elasticsearch and then queried both through Elasticsearch and through Spark’s SQL-like facility. This allows a powerful combination of free text search and relational JOINs to be applied to data in a highly scalable fashion. Spark also features machine learning and streaming data components.
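To make the CSV-to-index step concrete, here is a minimal sketch in plain Python rather than Spark (the demo's own code isn't reproduced here, so this is a stand-in, not Neil's method). It turns CSV rows into the newline-delimited body that Elasticsearch's bulk API expects. The column names, the sample rows and the index name `accidents` are all made up for illustration.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index_name):
    """Turn CSV text into an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: names the index the following document goes into.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, one JSON object per CSV row.
        lines.append(json.dumps(row))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data; real accident files have many more columns.
SAMPLE = (
    "accident_id,severity,road_type\n"
    "A1,Slight,Single carriageway\n"
    "A2,Serious,Roundabout\n"
)

body = csv_to_bulk(SAMPLE, "accidents")
print(body)
```

Sending `body` to a cluster's `_bulk` endpoint is left out so the sketch runs without a server. In the setup described in the talk, Spark would do this work in parallel across partitions.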

After a quick plug for ElastiCON in London in November, Matt Jones of JustEat described how they have used Elasticsearch’s geolocation search function to improve their handling of restaurant delivery areas. Their previous system only handled the first part of postcodes (e.g. ‘SE1’) and they needed finer-grained control of the areas that restaurants were able to deliver to. By indexing polygons representing UK postcode areas and combining these with custom shapes (e.g. a circle representing a maximum delivery distance) they have created a powerful and extendable way to restrict search results. Matt has blogged about this in more detail.
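The idea of combining a postcode polygon with a maximum-distance circle can be sketched outside Elasticsearch. The snippet below is only an illustration of the geometry, assuming a simple polygon and made-up coordinates. It is not JustEat's implementation, which relies on Elasticsearch's own shape handling.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test: is (lon, lat) inside a polygon given as a vertex list?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count the edges crossed by a horizontal ray heading right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def can_deliver(restaurant, customer, area, max_km):
    """Customer must be inside the delivery polygon and within max_km."""
    return point_in_polygon(customer, area) and haversine_km(restaurant, customer) <= max_km


# Hypothetical delivery area (a small rectangle) and locations, as (lon, lat).
area = [(-0.12, 51.49), (-0.08, 51.49), (-0.08, 51.52), (-0.12, 51.52)]
restaurant = (-0.10, 51.50)
nearby = (-0.095, 51.505)          # inside the area and close
far_corner = (-0.081, 51.519)      # inside the area but beyond the radius
outside = (-0.05, 51.50)           # outside the area altogether

print(can_deliver(restaurant, nearby, area, max_km=1.5))
print(can_deliver(restaurant, far_corner, area, max_km=1.5))
print(can_deliver(restaurant, outside, area, max_km=1.5))
```

In a search engine the same two tests would be expressed as filters on indexed shapes, so that only restaurants passing both appear in the results.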

The last talk was by Tom Green of Couchbase, who described how this powerful NoSQL platform is architected and how it can be connected directly to Elasticsearch using its own Cross Data Centre Replication (XDCR) feature. We finished with the usual Q&A during which Mark Harwood responded to my own question on exact facet counts in Elasticsearch with a plea to the industry to be more honest about the limitations of distributed systems – much like the CAP theorem, perhaps we need a similar triangle with vertices of Big Data, Speed and Accuracy – pick two!
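Why exact counts are hard in a distributed index can be shown with a toy model. The sketch below assumes each shard reports only its own top k terms, which are then merged. It is a deliberately simplified picture with invented numbers, not Elasticsearch's actual algorithm.

```python
from collections import Counter


def exact_counts(shards):
    """Ground truth: add up every term on every shard."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total


def merged_top_k(shards, k):
    """Approximate counts: each shard reports only its own top k terms."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(k):
            merged[term] += count
    return merged


# Hypothetical per-shard term counts for one facet field.
shards = [
    {"red": 10, "blue": 9, "green": 8},
    {"green": 10, "blue": 9, "red": 1},
    {"blue": 9, "green": 7, "red": 2},
]

exact = exact_counts(shards)
approx = merged_top_k(shards, k=2)

for term in ("blue", "green", "red"):
    print(term, "exact:", exact[term], "merged top-2:", approx[term])
```

Here ‘blue’ is counted correctly because it makes every shard's top two, while ‘green’ and ‘red’ are undercounted because some shards never report them. Asking each shard for more terms narrows the gap at the cost of speed, which is the trade-off Mark was describing.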

Thanks as ever to all the speakers and the hosts, and to Yann Cluchey for organising the Meetup.

Cambridge Search Meetup – Cassandra & Solr
http://www.flax.co.uk/blog/2014/05/15/cambridge-search-meetup-cassandra-solr/
Thu, 15 May 2014 07:32:23 +0000

The post Cambridge Search Meetup – Cassandra & Solr appeared first on Flax.

A sunny evening last night for the latest Cambridge Search Meetup, which featured a couple of talks from Datastax on the highly scalable NoSQL database Apache Cassandra and how it is integrated with Apache Lucene/Solr. Jeremy Hanna started us off with a brief history of the Facebook-incubated Cassandra, which is a fully distributed, highly reliable system used by many organisations, including Netflix and Spotify, with some customers running thousands of nodes in multiple data centres. Cassandra has its own SQL-like language, CQL3, and some basic collections such as Lists and Maps, but due to its fully distributed nature does lack some traditional features such as JOINs. Datastax themselves are now responsible for most of the ongoing work on Cassandra and offer the usual array of training, support, management services and tools. One common application mentioned was high-speed, reliable recording of sensor data, increasingly important now with the rise of the Internet of Things.

After a short break for drinks and snacks (which this time were kindly sponsored by Datastax) Sergio Bossa told us how Solr is integrated with Cassandra, also running in a distributed fashion. Interestingly, this integration doesn’t use the same Zookeeper system as SolrCloud (the standard way to run clusters of Solr servers) but relies instead on Cassandra’s own internal scaling systems, passing data about using ‘gossip’ between nodes. Zookeeper is not always the easiest thing to get running so an alternative is very interesting! Data can be added to the system over HTTP or the aforementioned CQL3 and after being entered into Cassandra’s tables is subsequently indexed by Solr. Queries can then be made over HTTP as usual. Some work is still necessary to prevent duplication of effort (at present one needs to create data structures in Cassandra and subsequently in Solr).

It was pleasing to see that so much care has been taken with this integration process and also that Datastax offer their Datastax Enterprise Search stack not only free for non-production use, but free to startups. Thanks to Jeremy, Sergio and all who came along and we’ll be back with another Search Meetup soon.

Cambridge Search Meetup – Search for publication success and low-cost apps
http://www.flax.co.uk/blog/2012/10/18/cambridge-search-meetup-search-for-publication-success-and-low-cost-apps/
Thu, 18 Oct 2012 09:45:45 +0000

The post Cambridge Search Meetup – Search for publication success and low-cost apps appeared first on Flax.

After a short break the Cambridge Search Meetup returned last night with our usual mix of presentations, questions, networking, beer and snacks. We had a few issues with the projector and cables (one of these is on the shopping list for next time) so thanks to both presenters and audience for their patience!

First up was Liang Shen with a description of Journal Selector, a system for helping those publishing academic papers to find the correct journals to approach. The system allows one to copy and paste a chunk of a paper to a website and find which journals best match the subject matter, based on what they have published in the past. Running on the Amazon EC2 cloud, the service gathers journal content from feeds, HTML webpages and other sources, processes and stores this data in Amazon’s Hadoop-compatible database, indexes it with Apache Solr and then presents the results via the Drupal CMS. The results are impressive, allowing users to see exactly on what basis the system has recommended a journal to approach. You can see the presentation slides here.
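The matching step, comparing a pasted excerpt with what each journal has published, can be illustrated with a plain bag-of-words cosine similarity. This is a stand-in for whatever scoring Journal Selector gets from Solr, and the journal names and texts below are invented.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag of words as a term -> count mapping."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_journals(excerpt, journals):
    """Return (journal, score) pairs, best match first."""
    query = term_vector(excerpt)
    scores = {name: cosine(query, term_vector(text)) for name, text in journals.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical journals, each represented by text it has published before.
journals = {
    "Journal of Marine Biology": "coral reef fish populations ocean ecology",
    "Search Engineering Letters": "inverted index ranking query relevance search",
}

ranking = rank_journals("improving query ranking with a better inverted index", journals)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The shared terms that drive each score are easy to list, which is one way a system like this could show users the basis for a recommendation.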

Next was Rich Marr, who bravely offered to live-code a demonstration of his low-cost prototyping methodology for startups needing both NoSQL data storage and search across this data. In only 20 lines or so of code he showed us how to use Node.js to build a simple server that could accept messages (over Telnet, although HTTP or even IMAP would be as easy), store them in a CouchDB database and index them with Elasticsearch so that they could be searched (by sending a different message). Rich’s demo prompted a lively discussion of how commoditized and componentized search technology is becoming, with open source components that allow one to build a prototype search engine in minutes.
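The shape of that prototype (accept a message, store it, index it, answer a search message) can be sketched in a few lines. What follows is not Rich's code: it is Python rather than Node.js, the `store` and `search` commands are invented, and a dictionary and a tiny inverted index stand in for CouchDB and Elasticsearch so that it runs on its own.

```python
from collections import defaultdict


class MessageService:
    """In-memory stand-in for the store-then-index flow."""

    def __init__(self):
        self.documents = {}            # plays the role of the document store
        self.index = defaultdict(set)  # plays the role of the search index

    def handle(self, line):
        """Process one line: 'store <text>' or 'search <term>'."""
        command, _, payload = line.strip().partition(" ")
        if command == "store":
            doc_id = len(self.documents) + 1
            self.documents[doc_id] = payload
            # Index every word so the message can be found later.
            for term in payload.lower().split():
                self.index[term].add(doc_id)
            return f"stored {doc_id}"
        if command == "search":
            hits = sorted(self.index.get(payload.lower(), set()))
            return [self.documents[doc_id] for doc_id in hits]
        return "unknown command"


service = MessageService()
print(service.handle("store meetup tonight in Cambridge"))
print(service.handle("store search meetup was great"))
print(service.handle("search meetup"))
```

Swapping the two in-memory structures for real clients is where the off-the-shelf components come in. The handler itself would barely change.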

Thanks to both our speakers – and the Meetups continue, with Rich Marr’s own London Open Source Search Social meeting on Tuesday 23rd October, and in Cambridge the Data Insights Meetup where I’ll be talking on November 1st.

London Enterprise Search Meetup – Databases vs. Search and Taxonomies
http://www.flax.co.uk/blog/2011/04/14/london-enterprise-search-meetup-databases-vs-search-and-taxonomies/
Thu, 14 Apr 2011 08:45:47 +0000

The post London Enterprise Search Meetup – Databases vs. Search and Taxonomies appeared first on Flax.

Back to London for the next Enterprise Search Meetup, this time featuring Stefan Olafsson of TwigKit and Jeremy Bentley of Smartlogic.

Stefan started off with a brief look at relational databases and search engines, and whether the latter can ever supersede the former. He talked about how modern search technologies such as Apache Solr share many of the same features as the new generation of NoSQL databases, but how in practice one often seems to end up with a combination of search engine and relational database – an experience we share, although we have a small number of customers who have entirely moved away from databases in favour of a search engine.

Jeremy’s talk was an in-depth look at Smartlogic’s products, which include taxonomy creation and management tools, and are designed to complement search engines such as Solr or the GSA. There were some interesting points here, including the assertion that ‘we trust our content to systems that know nothing about our content’ – i.e. word processors, content storage and management systems – and that we rely on users to add consistent metadata. Smartlogic’s products promise to automate this metadata creation and he had some interesting examples such as the NHS Choices website.

Some interesting discussions followed on the value of taxonomies. Our view is that open taxonomy resources such as Freebase are better than those developed and kept private within organisations, as this can prevent duplication and promote cooperation and the sharing of information. Also, taxonomies often seem to be introduced as a way to fix a broken search experience – maybe fixing the search should be a higher priority.

Thanks to Tyler Tate for organising the event – the tenth in this series of Meetups, and now a regular and much anticipated event in the calendar.
