Enterprise Search Europe 2015 review – day 1 (28 October 2015)
http://www.flax.co.uk/blog/2015/10/28/enterprise-search-europe-2015-review-day-1/

This year’s Enterprise Search Europe started early for me – I had been invited to give the opening keynote, so I arrived early enough to check that my laptop would play nicely with the projector, always a worry! The keynote was well received and I’m very grateful for the opportunity to talk about Big Data Analytics and streaming search.

Next up were Hans-Josef Jeanrond of Sinequa and Steve Woodward of AstraZeneca, a return visit after an excellent presentation by Steve’s colleague Nick Brown last year. AstraZeneca have a committed approach to search, led directly by their CTO office, running hackathons, pilots and larger projects to rapidly deliver a raft of applications built on their core search platform. One key feature was delivering Google-style ‘cards’ for certain queries – a calculator card when a maths query is entered, or a calendar when someone asks about booking meetings. AstraZeneca are also building mobile apps, including a ‘people search’ that allows one to call or email a colleague with a single click. It’s great to see a large company putting significant resources into enterprise search and reaping the benefits this can bring.

Dayle Collins of PwC and Vince McNamara of Dahu were next with a talk about PwC’s Exalead-powered enterprise search across a range of business-critical content. Dayle talked about how analysis and interviews were carried out to find recurring search patterns in the business and to establish a strategic focus, and Vince then explained some of the technical features developed, including custom relevance ranking. Interestingly, entity extraction is also used at query time to classify which type of query a user has entered – are they asking about a company, a product or an employee, for example? They mentioned that a ‘gold standard’ for search relevance is being developed – currently recorded in spreadsheets: perhaps they should consider a more interactive tool.
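
Query-time classification of this kind can be as simple as matching the incoming query against gazetteers of known entities before choosing a result layout. A minimal sketch of the idea – the entity lists and categories here are invented for illustration, not PwC’s actual implementation:

```python
# Classify a query by matching it against gazetteers of known entities.
# The example entity lists below are invented for illustration.
GAZETTEERS = {
    "company": {"acme ltd", "globex", "initech"},
    "product": {"widget pro", "gizmo 3000"},
    "employee": {"jane smith", "john doe"},
}

def classify_query(query: str) -> str:
    """Return the entity type the query most likely refers to."""
    normalised = query.strip().lower()
    for entity_type, names in GAZETTEERS.items():
        if normalised in names:
            return entity_type
        # Also catch queries that merely contain a known entity name
        if any(name in normalised for name in names):
            return entity_type
    return "generic"  # fall back to ordinary keyword search

print(classify_query("latest news on Globex"))  # -> company
```

A real system would use proper NER rather than substring matching, but the principle – decide what kind of thing is being asked about, then pick the right result presentation – is the same.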

The next talk came from Ian Williams of NHS Wales Informatics Service, who are building a large-scale patient record service using Apache Solr. Ian explained the pressures facing the NHS (austerity, staffing difficulties, ageing populations) and how patient records are currently distributed across a number of locations and are sometimes still paper-based. This exciting project (which should be an example both to the rest of the NHS and to other healthcare providers) uses Solr to create a single Welsh Clinical Portal, where healthcare providers can find information on 3 million patients in 135 hospitals and 400 GP practices across Wales. We’ve been lucky enough to work with Ian’s team on this project in a small way and it was very exciting to find out more details and hear about their future plans.

After lunch, Lesley Holmes of Nottinghamshire County Council told us how they have attempted to improve search by focussing on metadata quality – using tools from ConceptSearching to automatically apply tags. Their content is spread across many servers and often duplicated, but improving search can have huge value to their users, who often provide services for vulnerable people where accurate and up-to-date information is essential. Cedric Ulmer of FranceLabs was next, describing with Alban Ferignac of the IFCE a project to replace Exalead with Apache Solr. Interestingly, this talk contained some concrete numbers – Exalead was costing €75,000 plus a €15,000 support fee for a maximum of 6 million documents (their target is 50m), with updates costing even more, and IFCE were finding it difficult to obtain responsive support. The open source Solr (supported by a worldwide community and under constant development) gave them far more flexibility, no effective limit on the number of documents indexed, and a migration process costing only €15,000 – as clear an indication of the benefits of open source search as I have seen.

Next I ran a roundtable discussion on implementing open source search which was well-attended and interactive – we discussed search engine pipelines for indexing thousands of sources amongst other subjects, and the discussions continued well after we had to vacate the room! I had to rush off soon afterwards to run the evening Meetup at a local pub, where I demonstrated the Quepid search relevance tool we’ve been using for client projects recently.

A review of Stephen Arnold’s CyberOSINT & Next Generation Information Access (17 February 2015)
http://www.flax.co.uk/blog/2015/02/17/a-review-of-stephen-arnolds-cyberosint-next-generation-information-access/

Stephen Arnold, whose blog I enjoy for its unabashed cynicism about overenthusiastic marketing of search technology, was kind enough to send me a copy of his recent report on CyberOSINT & Next Generation Information Access (NGIA), the latter being a term he has recently coined. OSINT itself refers to intelligence gathered from open, publicly available sources – nothing to do with software licenses – so yes, this is all about the NSA, CIA and others, who, as you might expect, are keen on anything that can filter the interesting from the noise. Let’s leave the definition (and the moral questionability) of ‘publicly available’ aside for now – even if you disagree with its motives, this is a use case which can inform anyone with search requirements about the state of the art and what the future holds.

The report starts with a foreword by Robert David Steele, who has had a varied and interesting career and has lately become a cheerleader for the other kind of open source – software – as a foundation for intelligence gathering. His view is that the tools used by the intelligence agencies ‘are also not good enough’ and ‘We have a very long way to go’. Although he writes that ‘the systems described in this volume have something to offer’, he later concludes that ‘This monograph is a starting point for those who might wish to demand a “full spectrum” solution, one that is 100% open source, and thus affordable, interoperable, and scalable.’ Those of us in the open source sector could therefore treat Arnold’s report as a good indicator of what to shoot for: a snapshot of the state of the art in search.

Arnold then opens the report with an explanation of the NGIA concept. This is largely a list of the common failings of traditional search platforms (basic keyword search, oft-confusing syntax, separate silos of information, lack of multimedia features and personalization) and how they might be addressed (natural language search, automatic querying, federated search, analytics). I am unconvinced this is as big a step as Arnold suggests, though: it rather implies that all past search systems were badly set up and configured, and that a NGIA system will somehow magically pull everything together for you and answer questions you hadn’t even asked yet.

Disappointingly, the exemplar chosen in the next chapter is Autonomy IDOL: regular readers will not be surprised by my feelings about this technology. Arnold suggests the creation of the Autonomy software was influenced by cracking World War II codes, rock music and artificial intelligence, which is to my mind adding egg to an already very eggy pudding, and not in step with what I know about the background of Cambridge Neurodynamics (Autonomy’s progenitor, created very soon after – and across the corridor from – Muscat, another Cambridge Bayesian search technology firm where Flax’s founders cut their teeth on search). In particular, Autonomy’s Kenjin tool – which automatically suggested related documents – is identified as a NGIA feature, although at the time I remember it being reminiscent of features we had built a year earlier at Muscat – we even applied for a patent. Arnold does note that ‘[Autonomy founder, Mike] Lynch and his colleagues clamped down on information about the inner workings of its smart software’ and ‘The Autonomy approach locks down the IDOL components’ – this was a magic black box, of course, with a magically increasing price tag as well. The price tag rose to ridiculous dimensions (even after an equally ridiculous writedown) when Hewlett Packard bought the company.

The report continues with analysis of various other potential NGIA contenders, including Google-funded timeline analysis specialists Recorded Future and BAE Detica – interestingly one of the search specialists from this British company has now gone on to work at Elasticsearch.

The report concludes with a look at the future, correctly identifying advanced analytics as one key trend. This conclusion also echoes the foreword: ‘The cost of proprietary licensing, maintenance, and training is now killing the marketplace. Open source alternatives will emerge, and among these may be a 900 pound gorilla that is free, interoperable and scalable.’ Although I have my issues with some of the examples chosen, I’m sure the report will be very useful to those in the intelligence sector, who like many are still looking for search that works.

Search Solutions 2015 – Is semantic search finally here? (4 December 2014)
http://www.flax.co.uk/blog/2014/12/04/search-solutions-2015-is-semantic-search-finally-here/

Last week I attended one of my favourite annual search events, Search Solutions, held at the British Computer Society’s base in Covent Garden. As usual it was a great chance to see what’s new in the linked worlds of web, intranet and enterprise search, and this year several of the presenters focused on semantic search.

Peter Mika of Yahoo! started us off with a brief history of semantic search, including how misplaced expectations have led to a general lack of adoption. However, the large web search companies have made significant progress over the years, leading to shared standards for semantic markup of web content and some large collections of knowledge, which allow them to display rich content for certain queries – an actor’s biography shown to the right of the usual search results, for example. He suggested the next step is to better understand queries, as most of the work to date has been on understanding documents. Christopher Semturs of Google followed with a description of their efforts in this space: Google’s Knowledge Graph contains 40 billion facts about 530 million entities, built in part by converting web pages directly (and some badly structured websites can contain the most interesting and rare knowledge). He reminded us of the importance of context and showed some great examples of queries that are still hard to answer correctly. Katja Hofmann of Microsoft then described some ways in which search engines might learn directly from user interactions, including some wonderfully named methodologies such as Counterfactual Reasoning and the Contextual Bandit. She also mentioned their continuing work on Learning to Rank with the open source Lerot software.

Next up was our own Tom Mortimer, presenting our study comparing the performance of Apache Solr and Elasticsearch – you can see his slides here. While there are few functional differences between the two, Tom found that Solr can support three times the query rate. Iadh Ounis of the University of Glasgow followed, describing another open source engine, Terrier, which although mainly focused on academic research now contains some cutting-edge features, including the aforementioned Learning to Rank and near real-time search.

The next session featured Dan Jackson of UCL describing the challenges of building website search across a complex set of websites and data, a similar talk to one he gave at an earlier event this year. Next was our ex-colleague Richard Boulton, describing how the GOV.UK team use metrics to tune their search capability (based on Elasticsearch). Interestingly, most of their metric data is drawn from Google Analytics, as heavy use of caching means they have few useful query logs.

Jussi Karlgren of Gavagai then described how they have built a ‘living lexicon’ of text in several languages, allowing them to represent the huge volume of new terms that appear on social media every week. They have also worked on multi-dimensional sentiment analysis and visualisations: I’ll be following these developments with interest as they echo some of the work we have done in media monitoring. Richard Ranft of the British Library then showed us some of the ways search is used to access the BL’s collection of 6 million audio tracks, including very early wax cylinder recordings – they have so much content it would take you 115 years to listen to it all! The last presentation of the day was by Jochen Leidner of Thomson Reuters, who showed some of the R&D projects he has worked on, including legal content search and mining Twitter for trading signals.

After a quick fishbowl discussion and a glass of wine the event ended for me, but I’d like to thank the BCS IRSG for a fascinating day and for inviting us to speak – see you next year!

Why GCloud search is badly broken & how to fix it (26 June 2014)
http://www.flax.co.uk/blog/2014/06/26/why-gcloud-search-is-badly-broken-how-to-fix-it/

The GCloud initiative and the associated CloudStore are a great idea – aiming to level the playing field of UK government IT supply, take advantage of flexible and agile delivery of software and services, and help SMEs like ourselves compete against the large System Integrators (SIs) that dominate this market. GCloud sales have now reached £154m, although this is still a fraction of what the UK government spends on IT. We’re on GCloud 5 ourselves, by the way, so I have a vested interest in helping potential customers find us – and we’ve helped with government systems before.

Unfortunately the CloudStore itself has a search facility that is badly broken. There are several obvious issues: many of the entries created by the larger suppliers have been keyword stuffed – here’s a particularly egregious example from Atos, which seems to include most of the terms used in software in the last few years. I found this using the search terms ‘enterprise search’, which produced very few relevant-looking results. The online guidance for CloudStore search suggests putting double quotes around the terms (sadly, I think few users will think of this), which improves things a little, but there are still a lot of irrelevant results – an online conferencing system is fifth, for example.

Fortunately all is not lost and in the next iteration of GCloud we are promised major improvements to the search engine. I’m hoping this will include phrase boosting. However, if the big SIs and others are allowed to create the sort of bad-quality content I have shown above, no search engine in the world will be able to sort the wheat from the chaff. It is essential that CloudStore entries are subject to some kind of curation and that keyword stuffing is banned and/or heavily penalised, otherwise SMEs like ourselves will still find it very hard to compete with the big SIs.
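
For illustration, phrase boosting – rewarding documents where the query terms appear together as a phrase rather than scattered across a keyword-stuffed entry – is straightforward with an open source engine such as Apache Solr via the eDismax query parser. A minimal sketch; the core name and field names are invented, not CloudStore’s actual schema:

```python
# Query Solr with eDismax, boosting documents that contain the query
# terms as a phrase (pf) rather than as scattered keywords - this pushes
# genuinely relevant entries above keyword-stuffed ones.
# Core name and field names here are hypothetical.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "q": "enterprise search",
    "defType": "edismax",
    "qf": "title^2 description",     # fields to match against
    "pf": "title^10 description^5",  # boost phrase matches heavily
    "wt": "json",
})
url = f"http://localhost:8983/solr/cloudstore/select?{params}"
with urllib.request.urlopen(url) as resp:
    results = json.load(resp)
print(results["response"]["numFound"])
```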

Update: it seems there is a new system under construction, and the search works a lot better. Let’s hope it comes out of alpha soon and can be used by purchasers!

How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP (6 February 2014)
http://www.flax.co.uk/blog/2014/02/06/how-we-built-a-search-engine-for-uk-mp-tweets-with-solr-python-stanfordnlp/

Matt Pearce writes:

We recently released UKMP, a search application built on work done on last year’s Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations and organisations) extracted from the tweet text. This is obviously its first incarnation, so there are still a number of features in development, but I thought I would comment on some of the decisions taken while developing the site.

I started off by deciding which bits of the hack day code would be most useful, from both the Solr set-up side and the web application we were hoping to build. During the hack day, the group had split into a number of smaller teams, with two of them working on a set of data downloaded from Twitter, containing the original set of UK MP tweets. I took the basic Solr setup and indexing code from one group, and the initial web application from the other.

Obviously we couldn’t work with a completely static data set, so I set about putting together a Python script to grab the tweets. This was where I met the first hurdle: I was trying to grab tweets from individual MPs’ feeds, but kept getting blocked by the Twitter API, even though I didn’t think I was over-stepping the limits set on the calls. With 200-plus MPs to track, a different approach was required to avoid being blocked, so eventually I switched to using the lists compiled by Tweetminster, who track politicians’ tweets themselves. This worked much better, and I could soon start building a useful data set.
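
Fetching a list timeline means one API call covers every member of the list, which keeps the request count well within the rate limits. A rough sketch of the idea against the Twitter REST API as it stood at the time (v1.1, since retired); the list owner, slug and credentials are placeholders:

```python
# Fetch recent tweets from a Twitter list instead of polling each MP's
# timeline individually - a single call returns tweets from all list
# members, so rate limits are far easier to stay within.
# Uses the (since-retired) v1.1 REST API; slug and keys are placeholders.
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
resp = requests.get(
    "https://api.twitter.com/1.1/lists/statuses.json",
    params={"owner_screen_name": "tweetminster", "slug": "ukmps", "count": 200},
    auth=auth,
)
resp.raise_for_status()
for tweet in resp.json():
    print(tweet["user"]["screen_name"], tweet["text"][:80])
```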

I chose the second group’s web application because it already used the Stanford NLP software to extract entities from the tweet text. The indexer script, also written in Python, calls the web app to extract the entities before indexing the tweets. We spent some time trying to incorporate the Stanford sentiment analysis as well, but found it wasn’t practical – the response time was too slow, and we didn’t have time to train a model that would provide a more useful analysis of the content (almost all tweets were rated as either “negative” or “neutral”, which didn’t accurately reflect the sentiments in the data).
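
The shape of that pipeline is roughly as follows – a hedged sketch, since the endpoint paths, response shape and Solr field names here are invented rather than taken from the UKMP code:

```python
# Indexing pipeline sketch: call the entity-extraction web app for each
# tweet, then send the enriched document to Solr. Endpoint paths and
# field names are invented for illustration.
import requests

SOLR_UPDATE = "http://localhost:8983/solr/ukmp/update?commit=true"
NER_ENDPOINT = "http://localhost:8080/extract"  # hypothetical web app route

def index_tweet(tweet: dict) -> None:
    entities = requests.post(NER_ENDPOINT, json={"text": tweet["text"]}).json()
    doc = {
        "id": tweet["id_str"],
        "text": tweet["text"],
        "party": tweet.get("party"),
        "people": entities.get("PERSON", []),
        "locations": entities.get("LOCATION", []),
        "organisations": entities.get("ORGANIZATION", []),
    }
    # Solr's update handler accepts a JSON array of documents
    requests.post(SOLR_UPDATE, json=[doc]).raise_for_status()
```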

Since this was an entirely new project, and because it was being done outside the main client workflow, I took the opportunity to try out AngularJS, an MVC-oriented JavaScript front-end framework. This runs on top of, and calls back to, the Dropwizard web application, which provides the Model part of the Model-View-Controller system. AngularJS itself provides the Controller, while the Views are all written in fairly standard HTML, with some AngularJS frosting to fill in the content.

AngularJS generally made development very easy and fast, and I was pleased by how little JavaScript I had to write to build a working application (there is also a Bootstrap crossover module, providing AngularJS directives to work with the UI layout tools Bootstrap provides). As a small site, there are only two controllers in play: one for each page. AngularJS also makes it very easy to plug in other script modules, such as the one used to generate the word cloud on the About page. However, I did come across a few sticking points as I built the app, as one might expect from a first-time user. The principal one was handling the search box at the top of the page, which had to be independent of the view while needing to modify it to display the search results. I am still not sure that I ended up with the best approach – the search form fires an event when submitted, which then percolates up the AngularJS control hierarchy until caught and dealt with: within the search page, the search is handled normally; from other pages, we redirect to the search page and pass in the term. It doesn’t feel as smooth as it should, which is why I remain unconvinced this is the best solution.

All in all, this was an interesting sideline project, and provided a good excuse to try out some new technology. The code itself, along with some notes on how to get the system up and running, is in our GitHub repository – feel free to try it out, and make suggestions for improvements or better ways to use the code.

G-Cloud and open file formats, a cautionary tale (1 November 2013)
http://www.flax.co.uk/blog/2013/11/01/g-cloud-and-open-file-formats-a-cautionary-tale/

We’re lucky enough to have our services available on the G-Cloud, a new initiative by the UK Government’s Cabinet Office with the aim of breaking the sometimes monopolistic practices of ‘big IT’ when supplying government clients. We’ve recently had a couple of contracts procured via the G-Cloud iii framework and one of the requirements is to report whenever a client is invoiced. This is done via a website called Management Information Systems Online (MISO).

Part of the process is to input various mysterious Product Codes, and to find out what these were I downloaded a file from the MISO website. I use the Firefox browser and OpenOffice, so I had assumed that opening this file would be a relatively simple process… perhaps unwisely.

Firstly, due to some quirk of the website and/or browser, the file arrives with no file extension. I assume it’s some kind of Microsoft Office document, so I try renaming it to .xls as an Excel spreadsheet and opening it in OpenOffice Calc. This doesn’t work: I end up with a load of XML in the spreadsheet cells. As it’s XML, I wonder if it’s one of the newer XML-based Office formats, so I rename it to .xlsx – but no, that doesn’t work either. Opening the file in a text editor shows it’s some kind of XML with Microsoft schemas abounding. At this point I tried contacting the MISO technical support department, but they weren’t able to help.

A quick Google and I’ve discovered that the file is probably SpreadsheetML, a file format used before 2007, when Microsoft finally went the whole hog and embraced (well, forced everyone else to embrace) their own XML-based standard for Office documents. SpreadsheetML is a format OpenOffice can read, so I try renaming the file to .xml and importing it. OpenOffice now tells me "OpenOffice.org requires a Java runtime environment (JRE) to perform this task. The selected JRE is defective."
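
With hindsight, sniffing the first few bytes of the file would have identified it immediately: Excel 2003 XML documents carry an mso-application processing instruction near the top, while the zip-based and legacy binary Office containers have distinctive magic numbers. A rough sketch (the filename stands in for the MISO download):

```python
# Sniff the first kilobyte of a mystery download to work out what it is,
# rather than guessing at file extensions.
def identify(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(1024)
    if head.startswith(b"PK"):                 # zip container: OOXML
        return "Office Open XML (rename to .xlsx/.docx)"
    if head.startswith(b"\xd0\xcf\x11\xe0"):   # OLE2 container
        return "legacy binary Office (rename to .xls/.doc)"
    if b"mso-application" in head and b"Excel.Sheet" in head:
        return "SpreadsheetML (Excel 2003 XML - rename to .xml)"
    if head.lstrip().startswith(b"<?xml"):
        return "some other XML"
    return "unknown"

print(identify("miso-download"))  # hypothetical downloaded file
```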

This is now taking far too long. After some more research I discover that what this actually means is that OpenOffice needs a version of Java 6 (now discouraged by Oracle), and I have to register for an Oracle account just to download it. Finally, OpenOffice is able to read the file and I can fill in the original form.

If anything, this process proves that central government has a long way to go towards adopting open standards and using plain, widely adopted file formats. The G-Cloud framework is a great step forward – but some of the details still need work.

An open approach to tuning search for gov.uk (12 June 2013)
http://www.flax.co.uk/blog/2013/06/12/an-open-approach-to-tuning-search-for-gov-uk/

Roo Reynolds from the GDS team has written a great blog post about the ongoing process of tuning the search for gov.uk which I can highly recommend.

We regularly see situations where a search project has been set up as ‘fire and forget’ – which is never a good idea: not only does content grow, but user needs change and search requirements evolve, whatever the application. Search should be a living project: monitoring user behaviour should reveal not just which searches ‘work’ (i.e. the user gets some results which they then click on) but, more importantly, which ones don’t. For example, common misspellings or acronyms might be a useful addition to a synonym list; if average search response times are lengthening, it might be time to consider performance tuning or even scaling out; constant use of the ‘Next 10 Results’ button might indicate a problem with relevance ranking.
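
Even a very simple log analysis will surface the first of these. A minimal sketch, assuming query logs exported as CSV with ‘query’ and ‘num_results’ columns (the log format is an assumption, not GOV.UK’s actual logging):

```python
# Minimal query-log analysis: find frequent searches that return no
# results - prime candidates for the synonym list or spelling correction.
# Assumes a CSV log with 'query' and 'num_results' columns.
import csv
from collections import Counter

failed = Counter()
with open("search_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        if int(row["num_results"]) == 0:
            failed[row["query"].strip().lower()] += 1

for query, count in failed.most_common(20):
    print(f"{count:5d}  {query}")
```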

Luckily, any improvements to gov.uk made by the GDS team should appear in their GitHub repository at some point – as I mentioned before, the GDS team are (very sensibly) committed to an open source approach.

A revolution in open standards in government (2 November 2012)
http://www.flax.co.uk/blog/2012/11/02/a-revolution-in-open-standards-in-government/

Something revolutionary has been happening recently in the UK government with regard to open source software, standards and data. Change has been promised before, and some commentators have been (entirely correctly) cynical about the eventual result, but it seems that finally we have some concrete results. Not content with a public policy and procurement toolkit for open source software, the Cabinet Office today released a policy on open standards – and, contrary to what many had feared, they have got it right.

Why do open standards matter? Anyone who has attempted to open a Word document of recent vintage in an older version of the same software will know how painful it can be. In the world of search we often have to be creative in how we extract data from proprietary, badly documented and inconsistent formats (get thee behind me, PDF!) – at Flax we came up with a novel method involving a combination of Microsoft’s IFilters and running OpenOffice as a server (you can download our Flax Filters as open source if you’d like to see how this works). If all else fails it is sometimes possible to extract strings from the raw binary data. However, we generally don’t have to preserve paragraphs, formatting and other specifics – and that is the kind of fine detail that often matters, especially in the government or legal arena. Certain companies have been downright obstructive in how they define their ‘standards’ (and I use that word extremely loosely in this case). The same companies have been accused by many of trying to influence the Cabinet Office consultation process, introducing the badly defined FRAND concept. However, the consultation process has been carefully and correctly run and the eventual policy is clear and well written.
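
For the curious, that last-resort approach amounts to the Unix ‘strings’ trick: scan the raw bytes for runs of printable characters and hope they contain the text. A minimal sketch (the filename is hypothetical) – which also shows exactly why formatting and structure are lost:

```python
# Last-resort text extraction: pull runs of printable characters out of
# an undocumented binary format, much like the Unix 'strings' tool.
# Paragraphs, formatting and structure are all lost in the process.
import re

def extract_strings(path: str, min_len: int = 4) -> list[str]:
    with open(path, "rb") as f:
        data = f.read()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len  # printable ASCII runs
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

for s in extract_strings("mystery.doc")[:10]:  # hypothetical file
    print(s)
```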

It will be very interesting to see how commercial closed source companies react to this policy – but in the meantime those of us in the open source camp should be cheered by the news that finally, after many false starts and setbacks, ‘open’ really does mean, well, ‘open’.

Tuning and improving elasticsearch for the Government Digital Service (1 October 2012)
http://www.flax.co.uk/blog/2012/10/01/tuning-and-improving-elasticsearch-for-the-government-digital-service/

The exciting GOV.UK project is getting close to its first release date of October 17th, and we were asked to help with some search tuning as the team migrates from Apache Solr to elasticsearch. Although elasticsearch has some great features, there are still areas where it lags behind Solr, such as the lack of spelling suggestion and proximity boost features. Alan from Flax spent a couple of days working with the GDS team and has blogged about how proximity boosting in particular can be implemented – at least for terms that are relatively close to each other, rather than separated by a page or so.
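
A common pattern for a proximity boost in elasticsearch is to pair the ordinary keyword query with an optional match_phrase clause carrying a slop value, so that documents where the terms occur near each other score higher. A sketch of that pattern (index and field names are placeholders, and this is not necessarily the exact approach Alan took – see his post for the details):

```python
# Proximity-boost pattern for elasticsearch: documents must match the
# ordinary keyword query; those where the terms also occur within 'slop'
# positions of each other get an extra score boost from the optional
# match_phrase clause. Index/field names are placeholders.
import json
import urllib.request

query = {
    "query": {
        "bool": {
            "must": {"match": {"body": "passport renewal"}},
            "should": {
                "match_phrase": {
                    "body": {"query": "passport renewal", "slop": 50, "boost": 2}
                }
            },
        }
    }
}
req = urllib.request.Request(
    "http://localhost:9200/govuk/_search",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["hits"]["total"])
```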

If you’re interested in more details of how we fixed this and a few other elasticsearch issues, you may want to take a look at the code we worked on – one of the best things about working with the GOV.UK team is that it was already up as open source software within a day (yes, you read that right – code paid for by the taxpayer is open source, as it should be!). We’re looking forward to launch day!

Update: changed ‘proximity search’ to ‘proximity boost’ – thanks Alan!

Eleven years of open source search (30 July 2012)
http://www.flax.co.uk/blog/2012/07/30/eleven-years-of-open-source-search/

It’s now eleven years since we started Flax (initially as Lemur Consulting Ltd) in late July 2001, deciding to specialise in search application development with a focus on open source software. At the time the fallout from the dotcom crash was still evident and, like today, the economic picture was far from rosy. Since few people even knew what a search engine was (Google was relatively new and had only started selling advertising a year before), it wasn’t always easy for us to find a market for our services.

When we visited clients they would list their requirements and we would then tell them how we believed open source search could help (often having to explain the open source movement first). Things are different these days: most of our enquiries come from those who have already chosen open source search software such as Apache Lucene/Solr but need our help in installing, integrating or supporting it. There’s also a rise in clients considering applications and techniques beyond traditional site or intranet search – web scraping and crawling for data aggregation, taxonomies and automatic classification, automatic media monitoring and, of course, massive scalability, distributed processing and Big Data. Even the UK government are using open source search.

So after all this time I’m tending to agree with Roger Magoulas of O’Reilly: open source won, and we made the right choice all those years ago.
