sphinx – Flax
http://www.flax.co.uk
The Open Source Search Specialists

Better search for life sciences at the BioSolr Workshop, day 2 – Elasticsearch & others
http://www.flax.co.uk/blog/2016/02/15/better-search-life-sciences-biosolr-workshop-day-2-elasticsearch-others/
Mon, 15 Feb 2016 11:32:13 +0000

The post Better search for life sciences at the BioSolr Workshop, day 2 – Elasticsearch & others appeared first on Flax.

Over the last 18 months we’ve been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Elasticsearch. Last week we held a two-day workshop on the Wellcome Genome Campus near Cambridge to showcase our achievements and hear from others working in the same field, focused on Solr on the first day and Elasticsearch and other solutions on the second. Attendees included both bioinformaticians and search experts, as the project has very much been about collaboration and learning from each other. Read about the first day here.

The second day started with Eric Pugh’s second talk on The (Unofficial) State of Elasticsearch, bringing us all up to date on the meteoric rise of this technology and the opportunities it opens up, especially in analytics and visualisation. Eric foresees Elasticsearch continuing to specialise in this area, with Solr sticking closer to its roots in information retrieval. Giovanni Tumarello followed with a fast-paced demonstration of Kibi, a platform built on Elasticsearch and Kibana. Kibi allows one to very quickly join, visualise and explore different data sets, and I was impressed with the range of potential applications, including in the life sciences.

Evan Bolton of the US-based NCBI was next, talking about the massive PubChem dataset (80 million unique chemical structures, 200 million chemical substance descriptions, and 230 million biological activities, all heavily crosslinked). Both Solr and CLucene had been considered, but his team eventually settled on the Sphinx engine with its great support for SQL queries and JOINs, although Evan admitted this was not a cloud-friendly solution. The team are now considering knowledge graphs and how to present up to 100 billion RDF triples. Andrea Pierleoni of the Centre for Therapeutic Target Validation then talked about an Elasticsearch cluster he has developed to index ‘evidence strings’ (which relate targets to diseases using evidence). This is a relatively small collection of 2.1 million association objects, pre-processed using Python and stored in Redis before indexing.

Next up was Nikos Marinos from the EBI Literature Services team, talking about their recent migration from Lucene to Solr. As he explained, most of this was a straightforward task, with one wrinkle being the use of DIH (Data Import Handler) Transformers where array data was used. Rafael Jimenez then talked about projects he has worked on using both Elasticsearch and Solr, and stressed the importance of adhering to open standards and re-using software where possible – key strengths of open source, of course. Michal Nowotka then talked about a proposed system to replace the current ChEMBL search using Solr and django-haystack (the latter allows one to use a variety of underlying search engines from Django). Finally, Nicola Buso talked about EBISearch, based on Lucene.

We then concluded with another hands-on session, aimed more at Elasticsearch this time. As you can probably tell, we had been shown a huge variety of different search needs and solutions using a range of technologies over the two days. It was clear to me that the BioSolr project is only a small first step towards improving the software available. We have applied for further funding and we hope to have good news soon! Working with life science data, often at significant scale, has been fascinating.

Most of the presentations are now available for download. Thanks to all the presenters (especially those who travelled from abroad), the EBI for kindly hosting the event and in particular to Dr Sameer Velankar who has been the driving force behind this project.

Cambridge Search Meetup: free postcodes from old maps & visualising fish
http://www.flax.co.uk/blog/2013/07/18/cambridge-search-meetup-free-postcodes-from-old-maps-visualising-fish/
Thu, 18 Jul 2013 08:44:54 +0000

The post Cambridge Search Meetup: free postcodes from old maps & visualising fish appeared first on Flax.

Last night was a particularly hot Cambridge Search Meetup: someone suggested that next time we lash four punts together and float down the river – would certainly be a little cooler though I’m still not sure how to rig a projector!

Our first speaker was Nick Burch, who told us about a fascinating past project of his to source freely available postcode data for the UK. His team collected out-of-copyright maps, scanned them and created a website to crowd-source knowledge of the postcode of individual features (say a childhood home or a church). In 6-9 months they had a database of the first four characters of all UK postcodes (e.g. CB1 1xx) and their locations – good enough for many location-based services to take advantage of their free feeds. Shortly afterwards the UK’s Ordnance Survey released their own data for free – partly as a result of projects like Nick’s and pressure from the burgeoning Open Data movement. Nick suggested that the best way to approach projects such as this is to look for data similar to what you require, find a way to interest people on the Internet in it, provide an API for corrections and feedback, and release all your data using a permissive licence.
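To make the ‘CB1 1xx’ granularity concrete, here is a minimal Python sketch of truncating a full postcode to that level. It is not taken from Nick’s project: the function name and the sample postcodes are made up for illustration, and it assumes well-formed postcodes, relying only on the fact that the inward part of a UK postcode is always its last three characters.

```python
def postcode_sector(postcode):
    """Truncate a full UK postcode to 'outward code + first inward digit'.

    Hypothetical helper for illustration only; it does no validation.
    """
    # Drop any spaces and normalise case, e.g. "cb1 1aa" -> "CB11AA".
    compact = "".join(postcode.split()).upper()
    # The inward part is always the last three characters (digit + two letters).
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"


# Illustrative inputs, not real project data.
for example in ["CB1 1AA", "cb12ab", "SW1A 1AA"]:
    print(example, "->", postcode_sector(example))
```

Note that for longer outward codes such as SW1A the truncated form is five characters rather than four, so ‘first four characters’ is best read as a rule of thumb.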

Next up was Craig Mills, who provided some background on his past projects monitoring cod stocks (apparently it’s good to offer fishermen a £500 bounty for returning your tagging hardware!) and more recently on tools for monitoring and visualising ecology. He mentioned the open source Sphinx search engine and visualisation tool CartoDB as two key technologies, and talked about how a clickable map interface is often preferred to the traditional search box. An interesting technique was to crowd-source photos from around the world and use an algorithm to spot the relative amount of ‘nature’ and ‘man-made’ textures in them – a potentially powerful way to measure how humans are changing the planet.

We finished as ever with beers, snacks and chat in the thankfully cooler downstairs bar. Thanks to both our speakers and all who came – next week we have a fantastic opportunity to join Grant Ingersoll on a free Apache Lucene/Solr hack day – do let us know if you’re coming as space is limited.

An open day on open source search from Sirius & Flax
http://www.flax.co.uk/blog/2012/07/23/an-open-day-on-open-source-search-from-sirius-flax/
Mon, 23 Jul 2012 15:26:39 +0000

The post An open day on open source search from Sirius & Flax appeared first on Flax.

We spent Friday at the riverside offices of Sirius Corporation, our support partners, for the first and hopefully not the last of their Open Days on open source enterprise search. We were lucky to have Mike Davis, a very well-known and highly experienced analyst, to open the talks. Despite suffering from flu, he gave an engaging talk on why open source enterprise search software should be your first port of call, and how you should only consider closed source options when you need particular features they provide.

We then gave a quick Introduction to Open Source Search, detailing the various packages available (from Apache Lucene/Solr to Xapian and Sphinx) and showing a quick Solr-powered demo we’d built to search some pages from the BBC Music website. Using the programmer’s first choice for an example query (the ever reliable ‘foo*’) we discovered the wonderfully named Original Rabbit Foot Spasm Band – which interestingly you can’t find via the BBC’s own site search engine due to lack of wildcard support.
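As a rough illustration of why wildcard support matters for a query like ‘foo*’, here is a minimal Python sketch. It is not how Solr implements wildcard queries and is not the demo we built: the function name is hypothetical and the second title is made up, and it simply matches each lowercased word of a title against the pattern using the standard library’s fnmatch module.

```python
import fnmatch
import re


def wildcard_search(pattern, titles):
    """Return the titles that contain at least one word matching the pattern."""
    pattern = pattern.lower()
    hits = []
    for title in titles:
        # Split the title into lowercase word tokens.
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        # A title matches if any single token matches the wildcard pattern.
        if any(fnmatch.fnmatchcase(token, pattern) for token in tokens):
            hits.append(title)
    return hits


titles = [
    "Original Rabbit Foot Spasm Band",
    "Some Other Band",  # made-up entry for contrast
]

print(wildcard_search("foo*", titles))  # the trailing * lets "foo" match "foot"
print(wildcard_search("foo", titles))   # no wildcard: whole words only
```

With the trailing wildcard the word ‘Foot’ matches; without it, a search for the exact word ‘foo’ finds nothing in these titles.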

Andrew Savory, Sirius’ CTO and Apache Foundation member, then gave a presentation on what an Apache project actually is and how best to engage with an open source community – very useful for those considering open source for the first time. The morning finished with a delicious barbecue on the riverbank provided by Sirius. We thought the event went very well and we’d love to confirm the rumour that this will become a regular event. Thanks to all at Sirius for organising and hosting the day; we look forward to returning.

Chalk and cheese – the difficulty of analysing open source options
http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/
Thu, 09 Dec 2010 14:55:53 +0000

The post Chalk and cheese – the difficulty of analysing open source options appeared first on Flax.

David Fishman of Lucid Imagination has blogged on how open source search is treated by the analyst community (you can even use his links to get hold of some of the reports mentioned for the usual price of your contact details). We can add to his list a report from the Real Story Group – and I hear Ovum will shortly release an updated report.

What I find most interesting about these analyst reports is how various vendors are subdivided – either by target market, or by size, or by how ‘complex’ their platform is. Open source solutions don’t always fit the categories – for example Real Story Group list ‘Apache Project’ as a ‘specialised vendor’ – which it really isn’t. Perhaps it’s time for some new categories in these analyst reports – maybe a list of specialist open source integrators, linked with the available technologies such as Lucene, Xapian or Sphinx, combined with some data about likely costs.
