user interface – Flax
The Open Source Search Specialists – http://www.flax.co.uk

Enterprise Search Europe 2014 day 1 – Decisions, research and a Meetup quiz
http://www.flax.co.uk/blog/2014/05/01/enterprise-search-europe-2014-day-1-decisions-research-and-a-meetup-quiz/
Thu, 01 May 2014

This year’s Enterprise Search Europe was held near Victoria train station in London and unfortunately coincided with a two-day strike on the London Underground – worrying for the organisers, but apart from a few notable absences it didn’t seem to affect the attendance too much. We started with a keynote from Dale Roberts, whose book on Decision Sourcing inspired a talk about a ‘rational decision making model’. When examining traditional relational database applications, Dale said ‘if you peer at it long enough you can see the rows and columns’, and his point was that modern consumer social networking applications don’t exhibit this old pattern – so this is where search application designers should look for inspiration. His co-presenter Rooven Pakkiri said that Enterprise Search should attempt to ‘release the information from inside our heads’, which of course social networking might help with, connecting you with colleagues. I’m not sure that one can easily take lessons learnt from consumer applications and apply them to business use, and some later speakers agreed with me, but this was a high-energy and thought-provoking start.

Next I chaired the Open Source track, where we started with Cedric Ulmer of France Labs, who talked about a search application they built for a consultancy business with around 40 employees. Using Apache Solr, Apache ManifoldCF and their own Datafari open source framework they turned this project around very quickly – interestingly, the end clients needed no training to use the new system, which implies a very well designed UI. Our second talk, from Ronald Hobbs of Reed Business International, described a project on a much larger scale: 100 million documents, 72 business units and up to 190 queries per second – this was originally served by the FAST ESP engine but they moved to an Apache Solr system, replacing the FAST processing pipeline with Search Technologies’ Aspire project. I can only agree with his five steps for an effective migration (Prepare, Get the right tools, Get the right team, Migrate in chunks, Clean up), based on our own experience of such projects, including one from FAST ESP to Solr. I was amused by his description of the Apache Zookeeper project as ‘a bipolar manic depressive’, although it seemed this was eventually overcome with a successful deployment on Amazon EC2. Next was Galina Hinova of Intrafind on an aftersales search application for MAN Truck and Bus – again at serious scale (MAN have around 1 billion vehicles in existence, with 100-150 documents related to each). Interestingly, the Euro6 regulations for emissions and standardized EU terms for automobile parts were direct drivers of the project, with Apache Lucene as the base technology. No longer is open source search just for small-scale projects, it seems!

After a short break during which I chatted to John Newton, founder of Documentum and Alfresco, and his team, we returned to hear Dan Jackson give a description of how UCL had improved their website search – with a chaotic mix of low-quality content and an ‘awful’ content management system, the challenges were myriad, but with the help of experts such as our associate Tony Russell-Rose they have made significant improvements. Next came what proved to be a very popular talk from Nick Brown of AstraZeneca on a huge, well-funded project to build applications to support research and development – again, this was at large scale, with 75 million documents (including ‘all the patents and all the research papers’). The key here was their creation of many well-targeted ‘apps’ to enable particular uses of the Sinequa search engine they chose for the back end, including mobile apps to help find others in the company (or external to it) who are also working on a particular drug or disease. This presentation showed just what can be achieved if companies really understand the potential of search technology – knowledge sharing and discovery of previously unknown information.

After a short drinks reception we retired to a nearby pub for the combined Cambridge and London Search Meetup – I’d prepared a short quiz (feel free to have a go!) which was won by Tony Russell-Rose’s team. Networking and chatting continued long into the evening, with some people from the wider UK search community also attending.

To be continued! You can see most of the slides here.

Convergence and collisions in Enterprise Search
http://www.flax.co.uk/blog/2014/03/03/convergence-and-collisions-in-enterprise-search/
Mon, 03 Mar 2014

At the end of next month I’ll be at Enterprise Search Europe (I’m on the programme committee and help with the open source track) and the opening keynote this year is from Dale Roberts, author of the book Decision Sourcing. Dale will be talking about how Social, Big Data, Analytics and Enterprise Search are on a collision course, and how business leaders ignore these four themes at their peril.

So I wondered how, in practical terms, one might build systems based on these four themes. There are technical and logistical challenges of course (not least convincing someone to pay for the effort) but it’s worth exploring nonetheless.

Social in a business context can mean many things: social media is inherently noisy (and as far as I can see mostly cats) but when social tools are used within a business they can be a great way to encourage collaboration. We ourselves have added social features to search applications – user tagging of search results for example, to improve relevance for future searches and to help with de-duplication. Much has been made of the idea of finding not just relevant documents, but the subject matter experts that may have written them, or just other people in your organisation who are interested in the same subject. From a technical point of view none of this is particularly hard – you just have to add these social signals to your index and surface them in some intuitive way – but getting a high enough percentage of users to contribute to shared discussions and participate in tagging can be difficult.
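
To make the ‘add social signals to your index’ point a little more concrete, here is a minimal sketch of what that might look like against Solr, using its JSON update endpoint and edismax boosting. The core name, URL and field names are assumptions for illustration only, not taken from any project described above.

```python
import requests

SOLR = "http://localhost:8983/solr/intranet"  # assumed local Solr core

# Index a document carrying social signals alongside its content:
# the tags users have applied to it, and a count of how often it was tagged.
doc = {
    "id": "doc-42",
    "title": "Quarterly sales report",
    "body": "Full text of the document...",
    "user_tags": ["sales", "q3", "emea"],
    "tag_count": 3,
}
requests.post(f"{SOLR}/update?commit=true", json=[doc]).raise_for_status()

# At query time, weight matches on user tags strongly and let heavily-tagged
# documents float up, via the edismax parser's field and function boosts.
params = {
    "q": "emea sales",
    "defType": "edismax",
    "qf": "title^2 body user_tags^3",   # a tag match counts more than a body match
    "boost": "log(sum(tag_count,1))",   # popularity signal from tagging activity
    "wt": "json",
}
results = requests.get(f"{SOLR}/select", params=params).json()
for hit in results["response"]["docs"]:
    print(hit["id"], hit["title"])
```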

Big Data is an overused term – but in a business context people usually apply it to very large collections of log files or other data showing how your customers are interacting with your business. A lot of search engine experts will tell you that Big Data isn’t always that ‘big’ – we’ve been dealing with collections of hundreds of millions or even billions of indexed items for many years now; the trick is scaling your solution appropriately (not just in technical terms, but in an economic way, as linearly as possible). If you’ve got a few million items, I’m sorry but you haven’t got Big Data, you’ve just got some data.

I’ve always been unsure of the benefits of search Analytics, but I’m beginning to change my mind, having seen some very impressive demos recently. Search engines have always counted things; the clever bit is allowing for queries that can surface unusual or interesting information, and using modern visualisation techniques to show this. Knowing the most popular search term may not be as important as spotting an unexpected one.

So we’ve indexed our data including tags, personnel records and internal chatrooms; put them all onto an elastically scalable platform and built some intuitive and useful interfaces to search and analyze our data. I’m pretty sure you could do all this with the open source technologies we have today (including Scrapy, Apache Lucene/Solr, Elasticsearch, Apache Hadoop, Redis, Logstash, Kibana, jQuery, Python and Java). This isn’t the whole story though: you’d need a cross-disciplinary team within your organisation with the ability to gather user requirements and drive adoption, a suitable budget for prototyping, development and ongoing support and refinement of the system, and a vision encompassing the benefits it would bring your business. Not an inconsiderable challenge!
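
As a rough sketch of the ‘everything in one searchable place’ part of this stack, the snippet below pushes a handful of heterogeneous items (a person, a chat message, a tagged document) into a single Elasticsearch index using the bulk API. The node URL, index name and schema are all assumptions for illustration.

```python
import json
import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch node

# Items from different internal sources, normalised into one index so that
# people, chat messages and tagged documents can be searched side by side.
items = [
    {"type": "person",   "name": "A. Example",     "skills": ["solr", "python"]},
    {"type": "chat",     "room": "search-team",    "text": "Has anyone tried Kibana yet?"},
    {"type": "document", "title": "Search tender", "user_tags": ["procurement"]},
]

# The bulk API takes newline-delimited JSON: an action line, then a source line.
lines = []
for i, item in enumerate(items):
    lines.append(json.dumps({"index": {"_index": "enterprise", "_id": str(i)}}))
    lines.append(json.dumps(item))
body = "\n".join(lines) + "\n"

resp = requests.post(
    f"{ES}/_bulk",
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
)
resp.raise_for_status()
print("indexing errors:", resp.json()["errors"])  # False if every item went in
```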

What questions should we be able to ask the system? I’ll leave that as an exercise for the reader.

See you in April! If you’d like a 20% discount on registration use the code HULL20. We’ll also be running an evening Meetup on Tuesday 29th April open to both conference attendees and others.

How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP
http://www.flax.co.uk/blog/2014/02/06/how-we-built-a-search-engine-for-uk-mp-tweets-with-solr-python-stanfordnlp/
Thu, 06 Feb 2014

Matt Pearce writes:

We recently released UKMP, a search application built on work done on last year’s Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations and organisations) extracted from the tweet text. This is obviously its first incarnation, so there are still a number of features in development, but I thought I would comment on some of the decisions taken while developing the site.

I started off by deciding which bits of the hack day code would be most useful, from both the Solr set-up side and the web application we were hoping to build. During the hack day, the group had split into a number of smaller teams, with two of them working on a set of data downloaded from Twitter, containing the original set of UK MP tweets. I took the basic Solr setup and indexing code from one group, and the initial web application from the other.

Obviously we couldn’t work with a completely static data set, so I set about putting together a Python script to grab the tweets. This was where I met the first hurdle: I was trying to grab tweets from individual MPs’ feeds, but kept getting blocked by the Twitter API, even though I didn’t think I was overstepping the limits set on the calls. With 200-plus MPs to track, a different approach was needed to avoid being blocked, so I eventually switched to using the lists compiled by Tweetminster, who track politicians’ tweets themselves. This worked much better, and I could soon start building a useful data set.
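
For anyone curious what the list-based approach looks like, here is a minimal sketch using the Tweepy library. The credentials are placeholders, and the list owner and slug shown here are assumptions for illustration – Tweetminster’s actual list names may differ.

```python
import tweepy

# Placeholder credentials - register an application with Twitter to obtain these.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch recent tweets from one curated list rather than 200+ individual feeds:
# a single paged call instead of one call per MP is far kinder to the rate limits.
tweets = api.list_timeline(
    owner_screen_name="tweetminster",  # assumed list owner
    slug="ukmps",                      # assumed list slug
    count=200,
    include_rts=True,
)
for tweet in tweets:
    print(tweet.id_str, tweet.user.screen_name, tweet.text[:80])
```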

I chose the second group’s web application because it already used the Stanford NLP software to extract entities from the tweet text. The indexer script, also written in Python, calls the web app to extract the entities before indexing the tweets. We spent some time trying to incorporate the Stanford sentiment analysis as well, but found it wasn’t practical – the response time was too slow, and we didn’t have time to train a model on our own data to provide a more useful analysis of the content (almost all tweets were rated as either “negative” or “neutral”, which didn’t accurately reflect the sentiments in the data).
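
The indexing step itself is conceptually simple; the sketch below shows one way it might be wired together, calling a hypothetical entity-extraction endpoint (standing in for the web app wrapping Stanford NER) and then posting the enriched tweet to Solr. The URLs, field names and response format are assumptions, not the actual UKMP code.

```python
import requests

NER_URL = "http://localhost:8080/ner"          # hypothetical entity-extraction endpoint
SOLR_URL = "http://localhost:8983/solr/ukmp"   # assumed Solr core name

def index_tweet(tweet):
    """Extract entities from the tweet text, then send the enriched doc to Solr."""
    # Assume the endpoint returns e.g. {"PERSON": [...], "LOCATION": [...], "ORGANIZATION": [...]}.
    entities = requests.post(NER_URL, data={"text": tweet["text"]}).json()

    doc = {
        "id": tweet["id_str"],
        "text": tweet["text"],
        "party": tweet.get("party"),                     # looked up per MP elsewhere
        "retweet_count": tweet.get("retweet_count", 0),
        "favourite_count": tweet.get("favorite_count", 0),
        "people": entities.get("PERSON", []),
        "locations": entities.get("LOCATION", []),
        "organisations": entities.get("ORGANIZATION", []),
    }
    requests.post(f"{SOLR_URL}/update?commit=true", json=[doc]).raise_for_status()
```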

Since this was an entirely new project, and because it was being done outside the main client workflow, I took the opportunity to try out AngularJS, an MVC-oriented JavaScript front-end framework. This runs on top of, and calls back to, the DropWizard web application, which provides the Model part of the Model-View-Controller system. AngularJS itself provides the Controller, while the Views are all written in fairly standard HTML, with some AngularJS frosting to fill in the content.

AngularJS itself generally made development very easy and fast, and I was pleased by how little JavaScript I had to write to build a working application (there is also a Bootstrap crossover module, providing AngularJS directives to work with the UI layout tools Bootstrap provides). As this is a small site, there are only two controllers in play: one for each page. AngularJS also makes it very easy to plug in other script modules, such as that used to generate the word cloud on the About page. However, I did come across a few sticking points as I built the app, as one might expect from a first-time user. The principal one was handling the search box at the top of the page, which had to be independent of the view while needing to modify it to display the search results. I am still not sure that I ended up with the best approach – the search form fires an event when submitted, which then percolates up the AngularJS control hierarchy until caught and dealt with: within the search page, the search is handled normally; from other pages, we redirect to the search page and pass in the term. It doesn’t feel as smooth as it should do, which is why I remain unconvinced this is the best solution.

All in all, this was an interesting sideline project, and provided a good excuse to try out some new technology. The code itself, along with some notes on how to get the system up and running, is in our GitHub repository – feel free to try it out, and make suggestions for improvements or better ways to use the code.

Search Solutions 2013, a review
http://www.flax.co.uk/blog/2013/11/28/search-solutions-2013-a-review/
Thu, 28 Nov 2013

Yesterday was the always interesting Search Solutions, a one-day conference held by the BCS IRSG in London with a mix of talks on different aspects of search. The first presentation was by Behshad Behzadi of Google on Conversational Search, where he showed a speech-capable search interface that allowed a ‘conversation’ with the search engine – context being preserved – so the query “where are Italian restaurants in Chelsea” followed by “no I prefer Chinese” would correctly return results about Chinese restaurants. The demo was impressive and we can expect to see more of this kind of technology as smartphone adoption rises. Wim Nijmeijer of Coveo followed with details of how their own custom connectors to a multitude of repositories could enable Complex enterprise search delivered in a day. This of course assumes that no complex mapping of fields or schemas from the source to the search engine index is necessary, which I suspect it often is – I’m not alone in being slightly suspicious of the supposed timescale. Nikolaos Nanas from Thessaly in Greece then presented on Adaptive Information Filtering: from theory to practise, which I found particularly interesting as it described filtering documents against a user’s interest, with the latter modelled by an adaptive, weighted network – he showed the Noowit personalised magazine application as an example. With over 1000 features per user and no language-specific requirements this is a powerful idea.

After a short break we continued with a talk by Henning Rode on CV Search at TextKernel. He described a simple yet powerful UI for searching CVs (resumes), with autosuggest and automatic field recognition (type in “Jav” and the system suggests “Java” and knows this is a programming language or skill). He is also working on systems to autogenerate queries from job vacancies using heuristics. We’ve worked in the recruitment space ourselves so it was interesting to hear about their approach, although the technical detail was light. Following Henning was Dermot Frost, talking about Information Preservation and Access at the Digital Repository of Ireland and their use of open source technology, including Solr and Blacklight, to build a search engine covering the huge variety of content types, file formats and metadata standards across the items they are trying to digitally preserve. Currently this is a relatively small collection of data but they are planning to scale up over the next few years: this talk reminded me a little of last year’s by Emma Bayne of the UK’s National Archives.

After lunch we began a session named Understanding the User, beginning with Filip Radlinski of Microsoft Research. He discussed Sensitive Online Search Evaluation (with arXiv.org as a test collection) and how interleaving results is a powerful technique for avoiding bias. Next was Mounia Lalmas of Yahoo! Labs on what makes An Engaging Click (although unfortunately I had to pop out for a short while so I missed most of what I am sure was a fascinating talk!). Mags Hanley was next on Understanding users’ search intent, with examples drawn from her work at TimeOut – the three main lessons being to know the content in context, the time of year and the users’ mental model in context. Interestingly, she showed how the most popular facets used differed across TimeOut’s various international sites – in Paris the top facet was, perhaps unsurprisingly, ‘cuisine’; in London it was ‘date’.

After another short break we continued with Helen Lippell’s talk on Enterprise Search – how to triage problems quickly and prescribe the right medicine – her five main points being: analyze user needs, fix broken content, focus on quick wins in the search UI, make sure you are able to tweak the search engine itself in a documentable fashion, and remember the importance of people and process. Her last point, ‘if search is a political football, get an outsider perspective’, is of course something we would agree with! Next was Peter Wallqvist of Ravn Systems on Universal Search and Social Networking, where he focussed on how to allow users to interact directly with enterprise content items by tagging, sharing and commenting – so as to derive a ‘knowledge graph’ showing how people are connected by their relationships to content. We’ve built systems in the past that have allowed users to tag items in the search result screen itself, so we can agree on the value of this approach. Our last presenter was Kristian Norling of Findwise on Reflections on the 2013 Enterprise Search Survey – some more positive news this year, with budgets for search increasing and 79% of respondents indicating that finding information is of high importance for their organisation. Although most respondents still have less than one full-time staff member working on search, Kristian made the very good point that recruiting just one extra person would thus give them a competitive advantage. Perhaps, as he says, we’ve now reached a tipping point for the adoption of properly funded enterprise search, regarded as an ongoing journey rather than a ‘fire and forget’ project.

The day finished with a ‘fishbowl’ session, during which there was a lot of discussion of how to foster links between the academic IR community and industry, then the BCS IRSG AGM and finally a drinks reception – thanks to all the organisers for a very interesting and enlightening day and we look forward to next year!

The trouble with tabbing: editing rich text on the Web
http://www.flax.co.uk/blog/2013/08/08/the-trouble-with-tabbing-editing-rich-text-on-the-web/
Thu, 08 Aug 2013

Matt Pearce, who joined the Flax team earlier this year, writes:

A recent client wished to convert documents to and from Microsoft Office formats, using a web form as an intermediate step for editing the content. The documents were read in, imported to a Solr search engine, and could then be searched over, cloned, edited and transformed in batches, before being exported to Office once more.

The content itself was broken down into fields, some of which were simple text or date entry boxes, while others were more complex rich text fields. We opted to use TinyMCE as our rich text editor of choice – it’s small, open source, and easy to extend (we already knew we wanted to write at least one plugin).

The problem arose when the client explained to us that they wanted to use the tab key in rich text fields to create consistent spacing in the text. These needed to display as closely as possible to the original document format, and convert to actual tabs in the Office documents. This presented a number of problems:
- By default, the tab key moves the user to the next field on a web page, and needs special handling to prevent this behaviour, especially when it only needs to be applied to certain fields on the page.
- The spacing had to be consistent, like a word processor’s tab stop. This is tricky when working with proportional fonts, especially in a web form.
- The client didn’t want to use an indent feature. The tab only came at the start of the paragraph – beyond that point the text could wrap around to the start of the line.
- The tab needed to be recognisable in our processing code, so it could be converted to a real tab when it was exported to MS Office.

The preferred solution would have been a document editor like that used for Google Docs. Unfortunately, we didn’t have the time to write the whole input and presentation layer in Javascript as Google have! We also wanted to keep the editing function inside the web application if possible, rather than forcing the user to edit the documents in Microsoft Office and then re-import them every time they needed to make changes.

I started with TinyMCE’s “nonbreaking” plugin, which captures the tab key and converts it to a number of non-breaking spaces. This wasn’t directly suitable for our needs – I discovered that the number of spaces is not always consistent, and they are sometimes converted to regular (rather than non-breaking) spaces. In addition, it doesn’t act like a tab stop – it inserts four spaces wherever you are on the line, which didn’t match the client’s requirement.

I adapted the plugin to insert a <span> into the text, using variable padding to ensure it was the right width. This worked reasonably well, after a not insignificant amount of head scratching trying to work around issues with spacing and space handling. Unfortunately, we struck usability problems when trying to backspace over the tab. The ideal situation would be that a single backspace would remove the entire tab, leaving the user at the start of the line (or the point before they hit the tab key). In fact, a single backspace would leave the user inside the span – two backspaces were required to visibly remove the tab from the editor, and the user could not tell that they were inside the span either. You couldn’t reliably select the “tab” with the mouse either. In addition, Firefox started to behave oddly at this point, putting the cursor in unexpected positions.

My final solution was ugly but workable. We switched to using a monospace font in the rich text editor and, after discussion with the client, started using a variable number of arrow characters to represent the tabs (we actually used ›, which is a closing single quote if you are reading and writing in German). This made life immediately simpler – dropping the proportional font meant that we didn’t have to worry about getting the width right, just the number of characters to insert. It does mean that in order to remove the tab, the user has to backspace over up to four characters, but the characters are clearly visible: you don’t find yourself inside a span that can’t be seen without viewing the underlying HTML.
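
The export side of this trick is then just a string transformation. As a hedged sketch (the exact marker character above is a reconstruction, and the four-character tab width is taken from the description of the editor; none of this is the client’s actual code), converting a leading run of markers back into a real tab might look like this:

```python
import re

MARKER = "\u203a"   # the arrow-like marker character used in the editor (assumed)
TAB_WIDTH = 4       # up to four markers stand in for one tab stop

def markers_to_tabs(paragraph: str) -> str:
    """Replace a leading run of marker characters with a single real tab.

    Only the start of the paragraph is treated as a tab, matching the client's
    requirement; any markers elsewhere in the text are left untouched.
    """
    pattern = r"^%s{1,%d}" % (re.escape(MARKER), TAB_WIDTH)
    return re.sub(pattern, "\t", paragraph)

print(repr(markers_to_tabs("\u203a\u203a\u203a\u203aIndented opening line")))
# -> '\tIndented opening line'
```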

While I’m sure this isn’t a unique problem, I couldn’t find anyone else that had been trying to do something similar. I am also not sure whether our choice of rich text editor affected how tricky this problem turned out to be. If anybody reading has suggestions of better approaches to this, we’d be interested to hear from them.

Search Meetups return with news of two search books
http://www.flax.co.uk/blog/2013/02/12/search-meetups-return-with-news-of-two-search-books/
Tue, 12 Feb 2013

Last night the London Search Meetup returned after a year’s absence: it’s great to see it back. The venue was at St Pancras with the room overlooking Eurostar trains and statues, inside the beautifully restored station building.

The speakers were both there to talk about their recent books: prolific author Martin White of Intranet Focus has written a book on Enterprise Search with the strapline ‘Enhancing Business Performance’. Martin has decades of experience in the sector and an enviable collection of war stories from inside the enterprise, and was as ever an engaging speaker.

Next up were Tony Russell-Rose and Tyler Tate to talk about their new book which focuses on the user experience of search. ‘Designing the Search Experience’ promises to be a rich resource on how, why and where people use search and how this impacts the design of user interfaces.

The evening ended with some lively discussion and a promise that after its long absence this Meetup will now be happening on a more regular basis. We’re also running our own Cambridge search meetup – the next event is on February 21st where we’ll be hearing about web crawling and scraping. Another date for your diary is the Enterprise Search Europe conference on May 15th & 16th this year – the programme has just been published and features speakers from Ernst & Young and Oracle. I’ll also be running a workshop the day before the conference on Getting the Best from Open Source Search.

Search Solutions 2012 – a review
http://www.flax.co.uk/blog/2012/12/04/search-solutions-2012-a-review/
Tue, 04 Dec 2012

Last Thursday I spent the day at the British Computer Society’s Search Solutions event, run by their Information Retrieval Specialist Group. Unlike some events I could mention, this isn’t a forum for sales pitches, over-inflated claims or business speak – just some great presentations on all aspects of search and some lively networking or discussion. It’s one of my favourite events of the year.

Milad Shokouhi of Microsoft Research started us off by showing how he’s worked on query trend analysis for Bing: some queries are regular, some spike and go, and some spike and remain – and these trends can be modelled in various ways. Alex Jaimes of Yahoo! Barcelona talked about a human-centred approach to search – I agree with his assertion that “we’re great at adapting to bad technology” – still sadly true for many search interfaces! Some of the demographic approaches have led to projects such as Yahoo! Clues, which is worth a look.

Martin White of Intranet Focus was up next with some analysis of recent surveys and research, leading to some rather doom-laden conclusions about just how few companies are investing sufficiently in search. Again some great quotes: “Information Architects think they’ve failed if users still need a search engine” and a plea for search vendors (and open source exponents) to come clean about what search can and can’t do. Emma Bayne of the National Archives was next with a description of their new Discovery catalogue, a similar presentation to the one she gave earlier in the year at Enterprise Search Europe. Kristian Norling of Findwise finished with a laconic and amusing treatment of the results from Findwise’s survey on enterprise search – indicating that those who produce systems with which users are “very satisfied” usually do the same things, such as regular user testing and employing a specialist internal search team.

Stella Dextre Clark talked next about a new ISO standard for thesauri, taxonomies and their interoperability with other vocabularies – some great points on the need for thesauri to break down language barriers, help retrieval in enterprise situations where techniques such as PageRank aren’t so useful, and to access data from decades past. Leo Sauermann was next with what was my personal favourite presentation of the day, about a project to develop a truly semantic search engine, originally for KDE Linux and currently for the Cloud. This system, if more widely adopted, promises a true revolution in search, as relationships between data objects are stored directly by the underlying operating system. I spoke next about our Clade taxonomy/classification system and our Flax Media Monitor, which I hope was interesting.

Nicholas Kemp of DSTL was up next exploring how they research new technologies and approaches which might be of interest to the defence sector, followed by Richard Morgan of Funnelback on how to empower intranet searchers with ways to improve relevance. He showed how Funnelback’s own intranet allows users to adjust multiple factors that affect relevance – of course it’s debatable how these may be best applied to customer situations.

The day ended with a ‘fishbowl’ discussion during which a major topic was of course the Autonomy/HP debacle – there seemed to be a collective sense of relief that perhaps now marketing and hype wouldn’t dominate the search market as much as it had previously…but perhaps also that’s just my wishful thinking! All in all this was as ever an interesting and fun day and my thanks to the IRSG organisers for inviting me to speak. Most of the presentations should be available online soon.

Cambridge Search Meetup – Flow in Search UX and TrueKnowledge
http://www.flax.co.uk/blog/2011/06/23/cambridge-search-meetup-flow-in-search-ux-and-trueknowledge/
Thu, 23 Jun 2011

The Cambridge Enterprise Search Meetup last night featured Francis Rowland of the European Bioinformatics Institute and Rob Stacey of TrueKnowledge, in a newly refurbished venue. Thanks to all those who came and it was good to meet some new faces.

Francis talked about how search user interfaces should try not to restrict the user’s ‘flow’ of activity, as search is after all only a means to an end. Among the wealth of material he mentioned were the Endeca User Interface Design Pattern Library and what is sure to be a very useful upcoming book, Search Analytics for Your Site.

Rob told us about how TrueKnowledge provides a semantic question answering system – trying to understand the goal(s) of someone asking the system a question such as “is Madonna single?”. He also mentioned how this kind of technology might be applied to an enterprise environment, for example to answer questions like “has the invoice for last Thursday’s job been paid?”. Rob’s talk sparked off a very active Q&A session, with the audience raising issues such as how TrueKnowledge’s method might be applied to languages other than English and how to model the trustworthiness of their sources, which include Wikipedia.

Francis’ slides are now online – with some great sketchnotes of Rob’s talk as well! Thanks to both our speakers.

ECIR 2011 Industry day – part 2 of 2
http://www.flax.co.uk/blog/2011/04/28/ecir-2011-industry-day-part-2-of-2/
Thu, 28 Apr 2011

Here’s the second writeup.

We started after lunch with a talk from Flavio Junqueira of Yahoo! on web search engine caching. He talked both about the various things that can be cached (query results, term lists and document data) and the pros and cons of dynamic versus static caching. His work has focused on the former, with a decoupled approach – i.e. the cache doesn’t automatically know what’s changed in the index. The approach is to give data in the cache a ‘time to live’ (TTL), after which it is refreshed – an acceptable approach, as search engines don’t have a ‘perfect’ view of the web at any one point in time. As he mentioned, this method is less useful for ‘real-time’ data such as news.
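
To make the TTL idea concrete, here is a toy sketch of a decoupled results cache (purely illustrative, and nothing to do with Yahoo!’s actual implementation): entries simply expire after a fixed time, so the cache never needs to be told when the underlying index has changed.

```python
import time

class TTLCache:
    """A tiny decoupled cache: entries expire after ttl_seconds.

    The index never invalidates the cache; slightly stale results are simply
    tolerated until their time-to-live runs out, which is acceptable when the
    engine's view of the web is never perfectly up to date anyway.
    """

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # query -> (timestamp, results)

    def get(self, query):
        entry = self.store.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if time.time() - stored_at > self.ttl:
            del self.store[query]   # expired: force a refresh on the next miss
            return None
        return results

    def put(self, query, results):
        self.store[query] = (time.time(), results)

cache = TTLCache(ttl_seconds=60)
if cache.get("open source search") is None:
    results = ["doc1", "doc7"]      # stand-in for actually running the query
    cache.put("open source search", results)
```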

Francesco Calabrese followed, talking about his work in the IBM Smarter Cities Technology Centre in Dublin itself. Using data from mobile devices his group has looked at ‘digital footprints’ and how they might be used to better understand such things as public transport provision. An interesting effect they have noticed is that they can predict the type of an event (say a football match) from the points of origin of the attendees. This talk wasn’t really about search, although the data gathered would be useful in search applications with geolocation features.

Gery Ducatel from BT was next, with a description of a search application for their mobile workforce, allowing searches over a job database as well as reference and health & safety information. This had some interesting aspects, not least with the user interface – you can’t type long strings wearing heavy gloves while halfway up a telegraph pole! The system uses various NLP features such as a part-of-speech tagger to break down a query and provide easy-to-use dropdown options for potential results. The user interface, while not the prettiest I’ve seen, also made good use of geolocation to show where other engineers had carried out nearby jobs.

I followed with my talk on Unexpected Search, which I’ll detail in a future blog post. We then moved onto a panel discussion on the IBM Watson project – suffice it to say that although I’ve been asked about this a lot in the last few months, it seems to me that this was a great PR coup for IBM rather than a huge leap forward in the technology (which by the way includes the open source Lucene search engine).

Thanks again to Udo and Tony for organising the day, and for inviting me to speak – there was a fascinating range of speakers and topics, and it was great to catch up with others working in the industry.

Enterprise Search London – Financial applications, SBA book and Solr searching 120m documents
http://www.flax.co.uk/blog/2011/02/10/enterprise-search-london-financial-applications-sba-book-and-solr-searching-120m-documents/
Thu, 10 Feb 2011

Another excellent evening as part of the Enterprise Search London Meetup series; very busy as usual.

Amir Dotan started us off with details of his work in designing user interfaces for the financial services sector, describing some of the challenges involved in designing for a high-pressure and highly regulated environment. Although he didn’t talk about search specifically we heard a lot about how to design useful interfaces. Two quotes stood out: “The right user interface can help make billions”, and as a way to get feedback “find someone nice in the business and never let them go”.

Gregory Grefenstette of Exalead was next, talking about his new book on Search Based Applications. He explained how SBAs have advantages over traditional databases in the three areas of agility, usability and performance and went on to show some examples, before an unfortunate combination of a broken slide deck and a failing laptop battery brought him to a halt: in retrospect a great advertisement for a physical book over a computer!

Upayavira of Sourcesense was next with details of a new search built for online news aggregator Moreover. This dealt with scaling Lucene/Solr to cope with indexing 2 million new documents a day, for a rolling two-month index. He showed how some initial memory and performance problems had been solved with a combination of pre-warming caches, tweaks to the JVM and Java garbage collector, and eventually profiling of their custom code. Particularly interesting was how they had developed a system for spinning up a complete copy of the searchable database (for load balancing purposes) on the Amazon EC2 cloud – from a standing start they can allocate servers, install software and copy across searchable indexes in around 40 minutes. This was a great demonstration of the power of the open source model – no more licenses to buy! Search performance over this large collection is pretty good as well, with faceted queries returning in a second or two and unfaceted in half a second.

We also heard from Martin White about an exciting new search-related conference to be held in October this year in London in association with Information Today, Inc., and I managed a quick plug for our inaugural Cambridge Enterprise Search Meetup on Wednesday 16th February.
