Worth the wait – Apache Kafka hits 1.0 release

Charlie Hull — Thu, 02 Nov 2017 09:50:20 +0000

We’ve known about Apache Kafka for several years now – we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts – everything it deals with is a stream of data (like a messaging system), streams can be combined for processing and stored reliably in a highly fault-tolerant way. It’s also massively scalable.

For search applications, Kafka is a great choice for the ‘wiring’ between source data (databases, crawlers, flat files, feeds) and the search index and other parts of the system. We’ve used other message passing systems (like RabbitMQ) in projects before, but none have the simplicity and power of Kafka. Combine the search index with analysis and visualisation tools such as Kibana and you can build scalable, real-time systems for ingesting, storing, searching and analysing huge volumes of data – for example, we’ve already done this for clients in the financial sector wanting to monitor log data using open-source technology, rather than commercial tools such as Splunk.

The development of Kafka has been masterminded by our partners Confluent, and it’s a testament to this careful management that the milestone 1.0 version has only just appeared. This doesn’t mean that previous versions weren’t production ready – far from it – but it’s a sign that Kafka has now matured to be a truly enterprise-scale project. Congratulations to all the Kafka team for this great achievement.

We look forward to working more with this great software – and if you need help with your Kafka project do get in touch!

The post Worth the wait – Apache Kafka hits 1.0 release appeared first on Flax.

Apache Kafka London Meetup – Real time search and insights

Charlie Hull — Thu, 14 Apr 2016 09:50:05 +0000

The rise of Apache Kafka as a streaming data solution is something we’ve been watching for a while – as part of a collection of Big Data tools, it provides a ‘TiVo for data‘ feature. We’ve begun to use it in client projects covering both search and log analysis and we’ve recently partnered with Confluent, founded by the creators of Kafka.

Last night we spoke at the Apache Kafka London Meetup – hosted by British Gas Connected Homes, it was well supplied with drinks, pizza and snacks and also very well attended – there was a great buzz of conversation before the talks had even started! Alan Woodward of Flax started with an updated talk about our proof-of-concept integration of Kafka, Apache Samza and our own Luwak streaming search library (slides are available here). This allows full-text search within a Kafka stream, with the search queries supplied as another stream, for a truly real-time solution – as opposed to the more usual (and much higher latency) approach of indexing the endpoint of a stream. Alan has also tried the very new Kafka Streams feature which can be used as an alternative to Apache Samza – there is some very early code available, although note that this still needs some work! (We’ll update this blog when it’s finished).

The second talk was by one of our hosts, Josep Casals, on how British Gas have used Kafka, Spark Streaming and Apache Cassandra to build a platform for analyzing data from smart meters, boilers and thermostats. Over 2 million smart meters are installed across the UK and there are also over 300,000 connected thermostats, plus many other data sources, and these devices can report every 30 minutes and 2 minutes respectively, so their system has to cope with around 30,000 messages/second. One interesting feature for me was how machine learning is used to disaggregrate power consumption data, so the consumption for say, a fridge can be split out from the overall figure. Apache Samza is also used in this system to provide estimates of consumption and interpolate between readings, allowing data to be fed back to an app on the customer’s mobile device. Further use cases include spotting outlier events, which might indicate failing heating devices or even unusual patterns in an elderly person’s home to alert relatives or carers.

Both talks were live streamed and you can watch them here.

We concluded with some informal discussion and a chance to meet some of Confluent’s UK-based team. Thanks to the organisers and hosts and we look forward to returning! If you have a Kafka project and you’d like any help or advice, do let us know.

The post Apache Kafka London Meetup – Real time search and insights appeared first on Flax.

streaming data – Flax

Worth the wait – Apache Kafka hits 1.0 release

Apache Kafka London Meetup – Real time search and insights