Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Continue reading
Tag Archives: python
XJoin for Solr, part 1: filtering using price discount data
In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.
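To make the difference between filtering and boosting concrete, here is a minimal sketch in plain Python. It does not use XJoin or Solr at all: the result list, the `discounts` mapping (standing in for whatever an external web API might return) and both function names are hypothetical, and the boost formula is just one arbitrary choice.

```python
# Hypothetical search results: (product id, relevance score) pairs.
results = [("p1", 2.0), ("p2", 1.5), ("p3", 1.0)]

# Hypothetical discount data, standing in for an external API response:
# product id -> fractional discount (0.5 means 50% off).
discounts = {"p2": 0.5, "p3": 0.25}


def filter_discounted(results, discounts):
    """Keep only the results that have a discount entry."""
    return [(pid, score) for pid, score in results if pid in discounts]


def boost_discounted(results, discounts):
    """Scale each score by (1 + discount), then sort by score, highest first."""
    boosted = [
        (pid, score * (1 + discounts.get(pid, 0.0))) for pid, score in results
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)


filtered = filter_discounted(results, discounts)
boosted = boost_discounted(results, discounts)
```

Filtering drops the undiscounted product entirely, whereas boosting keeps every product and only changes the ordering.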
What is XJoin?
...Continue reading
How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP
Matt Pearce writes: We recently released UKMP, a search application built on work done on last year's Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations a...Continue reading
Cambridge Search Meetup – a night of crawling and scraping
Last night was the busiest ever Cambridge Search Meetup, with two excellent talks and a lot of discussion and networking. First was Harry Waye of Arachnys, who provide access to data on emerging markets that no-one else has, using a variety of custom crawling technology and heavy use of tools such as Google Translate. If you want to trawl the Greek corporate registry or find out financial news...Continue reading
Open source search engines and programming languages
So you're writing a search-related application in your favourite language, and you've decided to choose an open source search engine to power it. So far, so good - but how are the two going to communicate? Let's look at two engines, Xapian and Lucene, and compare how this might be done. Lucene is written in Java, Xapian in C/C++ - so if you're using those languages respectively, everything should be relatively simple - j...Continue reading
flax.crawler arrives
We've recently uploaded a new crawler framework to the Flax code repository. This is designed for use from Python to build a web crawler for your project. It's multithreaded and simple to use; here's a minimal example:
import crawler
crawler.dump = MyContentDumperImplementati...Continue reading
flax.core 0.1 available
Charlie wrote previously that we try and work with flexible, lightweight frameworks: flax.core is a Python library for conveniently adding functionality to Xapian projects. The current (and first!) version is 0.1, which can be checked out from the flaxcode repository. This version supports named fields for inde...Continue reading
Packaged solutions and customisability, the Python way
With any large scale software installation, there is going to be some customisation and tweaking necessary, and enterprise search systems are no exception. Whatever features are packaged with a system, some of those you need will be missing and some won't be used at all. It's rare to see a situation where the search engine can just be installed straight out of the box. Our Flax system is based on the Xapian core, which has a set of bindings to various differe...Continue reading
Python and Flax presentation
My colleague Richard Boulton will be presenting at EuroPython in Birmingham, U.K. next week, specifically at 15.30 on Tuesday 30th June - an abstract is available. He'll be talking about Xapian, Xappy and Flax, and showing examples of these in action including one using a Django integration layer....Continue reading