yahoo – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

Search Solutions 2015 – Is semantic search finally here?
http://www.flax.co.uk/blog/2014/12/04/search-solutions-2015-is-semantic-search-finally-here/
Thu, 04 Dec 2014 14:07:20 +0000

The post Search Solutions 2015 – Is semantic search finally here? appeared first on Flax.

Last week I attended one of my favourite annual search events, Search Solutions, held at the British Computer Society’s base in Covent Garden. As usual this was a great chance to see what’s new in the linked worlds of web, intranet and enterprise search, and this year several of the presenters focused on semantic search.

Peter Mika of Yahoo! started us off with a brief history of semantic search, including how misplaced expectations have led to a general lack of adoption. However, the large web search companies have made significant progress over the years, leading to shared standards for semantic markup of web content and some large collections of knowledge, which allow them to display content directly for certain queries, e.g. actors’ biographies shown on the right of the usual search results. He suggested the next step is to better understand queries, as most of the work to date has been on understanding documents. Christopher Semturs of Google followed with a description of their efforts in this space: Google’s Knowledge Graph contains 40 billion facts about 530 million entities, built in part by converting web pages directly (he noted that some badly structured websites can contain the most interesting and rare knowledge). He reminded us of the importance of context and showed some great examples of queries that are still hard to answer correctly. Katja Hofmann of Microsoft then described some ways in which search engines might learn directly from user interactions, including some wonderfully named methodologies such as Counterfactual Reasoning and the Contextual Bandit. She also mentioned their continuing work on Learning to Rank with the open source Lerot software.

Next up was our own Tom Mortimer presenting our study comparing the performance of Apache Solr and Elasticsearch – you can see his slides here. While there are few differences between the two, Tom has found that Solr can support three times the query rate. Iadh Ounis of the University of Glasgow followed, describing another open source engine, Terrier, which, although mainly focused on academic research, does now contain some cutting-edge features including the aforementioned Learning to Rank and near real-time search.

The next session featured Dan Jackson of UCL describing the challenges of building website search across a complex set of websites and data, a similar talk to one he gave at an earlier event this year. Next was our ex-colleague Richard Boulton describing how the Gov.uk team use metrics to tune their search capability (based on Elasticsearch). Interestingly, most of their metric data is drawn from Google Analytics, as heavy use of caching means they have few useful query logs.

Jussi Karlgren of Gavagai then described how they have built a ‘living lexicon’ of text in several languages, allowing for the representation of the huge volume of new terms that appear on social media every week. They have also worked on multi-dimensional sentiment analysis and visualisations: I’ll be following these developments with interest as they echo some of the work we have done in media monitoring. Richard Ranft of the British Library then showed us some of the ways search is used to access the BL’s collection of 6 million audio tracks, including very early wax cylinder recordings – they have so much content it would take you 115 years to listen to it all! The last presentation of the day was by Jochen Leidner of Thomson Reuters, who showed some of the R&D projects he has worked on, covering legal content and mining Twitter for trading signals.

After a quick fishbowl discussion and a glass of wine the event ended for me, but I’d like to thank the BCS IRSG for a fascinating day and for inviting us to speak – see you next year!

Search Solutions 2013, a review
http://www.flax.co.uk/blog/2013/11/28/search-solutions-2013-a-review/
Thu, 28 Nov 2013 14:25:18 +0000

The post Search Solutions 2013, a review appeared first on Flax.

Yesterday was the always interesting Search Solutions one day conference held by the BCS IRSG in London, a mix of talks on different aspects of search. The first presentation was by Behshad Behzadi of Google on Conversational Search, where he showed a speech-capable search interface that allowed a ‘conversation’ with the search engine – context being preserved – so the query “where are Italian restaurants in Chelsea” followed by “no I prefer Chinese” would correctly return results about Chinese restaurants. The demo was impressive and we can expect to see more of this kind of technology as smartphone adoption rises. Wim Nijmeijer of Coveo followed with details of how their own custom connectors to a multitude of repositories could enable Complex enterprise search delivered in a day. This of course assumes that no complex mapping of fields or schemas from the source to the search engine index is necessary, which I suspect it often is – I’m not alone in being slightly suspicious of the supposed timescale. Nikolaos Nanas from Thessaly in Greece then presented on Adaptive Information Filtering: from theory to practice, which I found particularly interesting as it described filtering documents against a user’s interests, with the latter modelled by an adaptive, weighted network – he showed the Noowit personalised magazine application as an example. With over 1000 features per user and no language-specific requirements this is a powerful idea.

After a short break we continued with a talk by Henning Rode on CV Search at TextKernel. He described a simple yet powerful UI for searching CVs (resumes) with autosuggest and automatic field recognition (type in “Jav” and the system suggests “Java” and knows this is a programming language or skill). He is also working on systems to autogenerate queries from job vacancies using heuristics. We’ve worked in the recruitment space ourselves so it was interesting to hear about their approach, although the technical detail was light. Following Henning was Dermot Frost talking about Information Preservation and Access at the Digital Repository of Ireland and their use of open source technology including Solr and Blacklight to build a search engine with a huge variety of content types, file formats and metadata standards across the items they are trying to digitally preserve. Currently this is a relatively small collection of data but they are planning to scale up over the next few years: this talk reminded me a little of last year’s by Emma Bayne of the UK’s National Archives.
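
To make the “Jav” example concrete, here is a minimal sketch of prefix autosuggest where each suggestion carries its field type. This is not TextKernel’s implementation, which wasn’t described in detail; the `TypedSuggester` class, the vocabulary and the field names are all assumptions made up for illustration.

```python
import bisect


class TypedSuggester:
    """Prefix autosuggest over a vocabulary where every term has a field type."""

    def __init__(self, vocabulary):
        # vocabulary maps a display term to its field, e.g. {"Java": "skill"}
        self._entries = sorted(
            (term.lower(), term, field) for term, field in vocabulary.items()
        )
        self._keys = [key for key, _, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (term, field) pairs whose term starts with prefix."""
        prefix = prefix.lower()
        # Binary search for the first key that could start with the prefix
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key, term, field in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) == limit:
                break
            matches.append((term, field))
        return matches


# Hypothetical vocabulary, purely for demonstration
suggester = TypedSuggester(
    {"Java": "skill", "JavaScript": "skill", "Jakarta": "location", "Python": "skill"}
)
print(suggester.suggest("Jav"))
```

Because the field comes back with the term, the UI could turn a selected suggestion straight into a fielded query rather than a plain keyword match.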

After lunch we began a session named Understanding the User, beginning with Filip Radlinski of Microsoft Research. He discussed Sensitive Online Search Evaluation (with arXiv.org as a test collection) and how interleaving results is a powerful technique for avoiding bias. Next was Mounia Lalmas of Yahoo! Labs on what makes An Engaging Click (although unfortunately I had to pop out for a short while so I missed most of what I am sure was a fascinating talk!). Mags Hanley was next on Understanding users search intent, with examples drawn from her work at TimeOut – the three main lessons being to know the content in context, the time of year and the users’ mental model in context. Interestingly, she showed how the most popular facets used differed across TimeOut’s various international sites – in Paris the top facet was perhaps unsurprisingly ‘cuisine’, in London it was ‘date’.
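
For readers unfamiliar with interleaving, here is a small sketch of one well-known variant, team-draft interleaving. I’m not claiming this is the exact method shown in the talk. Two rankers take turns ‘drafting’ their best remaining result into a single list, and clicks are then credited to whichever ranker contributed the clicked result. The function names and document IDs are my own.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings into one list, remembering which ranker picked each doc."""
    interleaved = []
    team_of = {}  # doc -> "A" or "B"
    picks = {"A": 0, "B": 0}

    def next_unused(ranking):
        # Highest-ranked document not yet placed in the interleaved list
        return next((doc for doc in ranking if doc not in team_of), None)

    while True:
        doc_a, doc_b = next_unused(ranking_a), next_unused(ranking_b)
        if doc_a is None or doc_b is None:
            break  # stop once either ranker has nothing new to offer
        # The team with fewer picks goes next; a coin flip breaks ties
        a_turn = picks["A"] < picks["B"] or (
            picks["A"] == picks["B"] and rng.random() < 0.5
        )
        team, doc = ("A", doc_a) if a_turn else ("B", doc_b)
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


def credit_clicks(team_of, clicked_docs):
    """Count how many clicked documents each ranker contributed."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


merged, teams = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(merged, credit_clicks(teams, ["d2"]))
```

Since each user sees results from both rankers in a single list, differences between users or sessions affect both rankers equally, which is where the resistance to bias comes from.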

After another short break we continued with Helen Lippell’s talk on Enterprise Search – how to triage problems quickly and prescribe the right medicine – her five main points being: analyse user needs, fix broken content, focus on quick wins in the search UI, make sure you are able to tweak the search engine itself in a documentable fashion, and remember the importance of people and process. Her last point, ‘if search is a political football, get an outsider perspective’, is of course something we would agree with! Next was Peter Wallqvist of Ravn Systems on Universal Search and Social Networking, where he focussed on how to allow users to interact directly with enterprise content items by tagging, sharing and commenting – so as to derive a ‘knowledge graph’ showing how people are connected by their relationships to content. We’ve built systems in the past that have allowed users to tag items in the search result screen itself, so we can agree on the value of this approach. Our last presenter was Kristian Norling of Findwise on Reflections on the 2013 Enterprise Search Survey – some more positive news this year, with budgets for search increasing and 79% of respondents indicating that finding information is of high importance for their organisation. Although most respondents still have less than one full time staff member working on search, Kristian made the very good point that recruiting just one extra person would thus give them a competitive advantage. Perhaps, as he says, we’ve now reached a tipping point for the adoption of properly funded enterprise search, regarded as an ongoing journey rather than a ‘fire and forget’ project.

The day finished with a ‘fishbowl’ session, during which there was a lot of discussion of how to foster links between the academic IR community and industry, then the BCS IRSG AGM and finally a drinks reception – thanks to all the organisers for a very interesting and enlightening day and we look forward to next year!

Search Solutions 2012 – a review
http://www.flax.co.uk/blog/2012/12/04/search-solutions-2012-a-review/
Tue, 04 Dec 2012 14:17:56 +0000

The post Search Solutions 2012 – a review appeared first on Flax.

Last Thursday I spent the day at the British Computer Society’s Search Solutions event, run by their Information Retrieval Specialist Group. Unlike some events I could mention, this isn’t a forum for sales pitches, over-inflated claims or business speak – just some great presentations on all aspects of search and some lively networking or discussion. It’s one of my favourite events of the year.

Milad Shokouhi of Microsoft Research started us off by showing how he has worked on query trend analysis for Bing: some queries are regular, some spike and go and some spike and remain – and these trends can be modelled in various ways. Alex Jaimes of Yahoo! Barcelona talked about a human-centred approach to search – I agree with his assertion that “we’re great at adapting to bad technology” – still sadly true for many search interfaces! Some of the demographic approaches have led to projects such as Yahoo! Clues, which is worth a look.

Martin White of Intranet Focus was up next with some analysis of recent surveys and research, leading to some rather doom-laden conclusions about just how few companies are investing sufficiently in search. Again some great quotes: “Information Architects think they’ve failed if users still need a search engine” and a plea for search vendors (and open source exponents) to come clean about what search can and can’t do. Emma Bayne of the National Archives was next with a description of their new Discovery catalogue, a similar presentation to the one she gave earlier in the year at Enterprise Search Europe. Kristian Norling of Findwise finished with a laconic and amusing treatment of the results from Findwise’s survey on enterprise search – indicating that those who produce systems with which users are “very satisfied” usually do the same things, such as regular user testing and employing a specialist internal search team.

Stella Dextre Clarke talked next about a new ISO standard for thesauri, taxonomies and their interoperability with other vocabularies – some great points on the need for thesauri to break down language barriers, help retrieval in enterprise situations where techniques such as PageRank aren’t so useful and to access data from decades past. Leo Sauermann was next with what was my personal favourite presentation of the day, about a project to develop a truly semantic search engine, both for KDE on Linux and, currently, for the Cloud. This system, if more widely adopted, promises a true revolution in search, as relationships between data objects are stored directly by the underlying operating system. I spoke next about our Clade taxonomy/classification system and our Flax Media Monitor, which I hope was interesting.

Nicholas Kemp of DSTL was up next exploring how they research new technologies and approaches which might be of interest to the defence sector, followed by Richard Morgan of Funnelback on how to empower intranet searchers with ways to improve relevance. He showed how Funnelback’s own intranet allows users to adjust multiple factors that affect relevance – of course it’s debatable how these may be best applied to customer situations.

The day ended with a ‘fishbowl’ discussion during which a major topic was of course the Autonomy/HP debacle – there seemed to be a collective sense of relief that perhaps now marketing and hype wouldn’t dominate the search market as much as they had previously…but perhaps that’s just my wishful thinking! All in all this was as ever an interesting and fun day and my thanks to the IRSG organisers for inviting me to speak. Most of the presentations should be available online soon.

Search Solutions 2011 review
http://www.flax.co.uk/blog/2011/11/17/search-solutions-2011-review/
Thu, 17 Nov 2011 16:29:38 +0000

The post Search Solutions 2011 review appeared first on Flax.

I spent yesterday at the British Computer Society Information Retrieval Specialist Group’s annual Search Solutions conference, which brings together theoreticians and practitioners to discuss the latest advances in search.

The day started with a talk by John Tait on the challenges of patent search where different units are concerned – where for example a search for a plastic with a melting point of 200°C wouldn’t find a patent that uses °F or Kelvin. John presented a solution from max.recall, a plugin for Apache Solr that promises to solve this issue. We then heard from Lewis Crawford of the UK Web Archive on their very large index of 240m archived webpages – some great features were shown including a postcode-based browser. The system is based on Apache Solr and they are also using ‘big data’ projects such as Apache Hadoop – which by the sound of it they’re going to need as they’re expecting to be indexing a lot more websites in the future, up to 4 or 5 million. The third talk in this segment came from Toby Mostyn of Polecat on their MeaningMine social media monitoring system, again built on Solr (a theme was beginning to emerge!). MeaningMine implements an iterative query method, using a form of relevance feedback to help users contribute more useful query information.
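
To illustrate the units problem from John’s talk, here is a rough sketch of one way to tackle it: convert every temperature to Kelvin at both index and query time, so that 200°C and 392°F land on the same value. This is not how the max.recall plugin works (I don’t know its internals); the regular expression and function names are assumptions of mine, and ranges such as “100-200°C” are not handled.

```python
import re

# Matches values such as "200°C", "392 °F" or "473.15 K"
_TEMPERATURE = re.compile(r"(-?\d+(?:\.\d+)?)\s*(°\s*C|°\s*F|K)\b")


def to_kelvin(value, unit):
    """Convert a temperature in Celsius, Fahrenheit or Kelvin to Kelvin."""
    unit = unit.replace("°", "").strip().upper()
    if unit == "C":
        return value + 273.15
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    if unit == "K":
        return value
    raise ValueError(f"unknown temperature unit: {unit!r}")


def extract_kelvin(text):
    """Find every temperature mentioned in text and return it in Kelvin."""
    return [
        round(to_kelvin(float(number), unit), 2)
        for number, unit in _TEMPERATURE.findall(text)
    ]


def temperatures_match(query, document, tolerance_k=0.5):
    """True if any temperature in the query is close to one in the document."""
    return any(
        abs(q - d) <= tolerance_k
        for q in extract_kelvin(query)
        for d in extract_kelvin(document)
    )


print(temperatures_match("melting point of 200°C", "the polymer melts at 392 °F"))
```

In a real engine the normalised value would more likely be stored in a numeric field and searched with a range query, rather than compared pairwise like this.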

Before lunch we heard from Ricardo Baeza-Yates of Yahoo! on moving beyond the ‘ten blue links’ model of web search, with some fascinating ideas around how we should consider a Web of objects rather than web pages. Gabriella Kazai of Microsoft Research followed, talking about how best to gather high-quality relevance judgements for testing search algorithms, using crowdsourcing systems such as Amazon’s Mechanical Turk. Some good insights here as to how a high-quality task description can attract high-quality workers.

After lunch we heard from Marianne Sweeney with a refreshingly candid treatment of how best to tune enterprise search products that very rarely live up to expectations – I liked one of her main points that “the product is never what was used in the demo”. Matt Taylor from Funnelback followed with a brief overview of his company’s technology and some case studies.

The last section of the day featured Iain Fletcher of Search Technologies on the value of metadata and on their interesting new pipeline framework, Aspire. (As an aside, Iain has also joined the Pipelines meetup group I set up recently.) Next up was Jared McGinnis of the Press Association on their work on Semantic News – it was good to see an openly available news ontology as a result. Ian Kegel of British Telecom came next with a talk about TV programme recommendation systems, and we finished with Kristian Norling’s talk on a healthcare information system that he worked on before joining Findwise. We ended with a brief Fishbowl discussion which asked, amongst other things, what the main themes of the day had been – my own contribution being “everyone’s using Solr!”.

It’s rare to find quite so many search experts in one room, and the quality of discussions outside the talks was as high as the quality of the talks themselves – congratulations are due to the organisers for putting together such an interesting programme.

ECIR 2011 Industry day – part 2 of 2
http://www.flax.co.uk/blog/2011/04/28/ecir-2011-industry-day-part-2-of-2/
Thu, 28 Apr 2011 12:14:39 +0000

The post ECIR 2011 Industry day – part 2 of 2 appeared first on Flax.

Here’s the second writeup.

We started after lunch with a talk from Flavio Junqueira of Yahoo! on web search engine caching. He talked both about the various things that can be cached (query results, term lists and document data) and the pros and cons of dynamic versus static caching. His work has focused on the former, with a decoupled approach – i.e. the cache doesn’t automatically know what’s changed in the index. The approach is to give data in the cache a ‘time to live’ (TTL), after which it is refreshed – an acceptable approach as search engines don’t have a ‘perfect’ view of the web at any one point in time. As he mentioned, this method is less useful for ‘real-time’ data such as news.
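
The TTL idea is simple enough to sketch in a few lines. The following is only an illustration of a decoupled result cache, not Yahoo!’s implementation: the `TTLCache` class and the `fetch` callback (standing in for a real index lookup) are hypothetical names of my own.

```python
import time


class TTLCache:
    """Query-result cache whose entries expire after a fixed time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # query -> (results, time stored)

    def get(self, query, fetch):
        """Return cached results, calling fetch(query) if missing or expired."""
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None:
            results, stored_at = entry
            if now - stored_at < self.ttl_seconds:
                return results  # still fresh: the index is not consulted
        results = fetch(query)  # miss or expired: go back to the index
        self._entries[query] = (results, now)
        return results


def search_index(query):
    # Stand-in for a real (and expensive) index lookup
    return [f"result for {query}"]


cache = TTLCache(ttl_seconds=60)
print(cache.get("open source search", search_index))
```

Note that the cache never learns about index updates: a stale entry is served until its TTL runs out, which is exactly the trade-off that makes this less suitable for news.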

Francesco Calabrese followed, talking about his work in the IBM Smarter Cities Technology Centre in Dublin itself. Using data from mobile devices his group has looked at ‘digital footprints’ and how they might be used to better understand such things as public transport provision. An interesting effect they have noticed is that they can predict the type of an event (say a football match) from the points of origin of the attendees. This talk wasn’t really about search, although the data gathered would be useful in search applications with geolocation features.

Gery Ducatel from BT was next, with a description of a search application for their mobile workforce, allowing searches over a job database as well as reference and health & safety information. This had some interesting aspects, not least with the user interface – you can’t type long strings wearing heavy gloves while halfway up a telegraph pole! The system uses various NLP features such as a part-of-speech tagger to break down a query and provide easy-to-use dropdown options for potential results. The user interface, while not the prettiest I’ve seen, also made good use of geolocation to show where other engineers had carried out nearby jobs.

I followed with my talk on Unexpected Search, which I’ll detail in a future blog post. We then moved onto a panel discussion on the IBM Watson project – suffice it to say that although I’ve been asked about this a lot in the last few months, it seems to me that this was a great PR coup for IBM rather than a huge leap forward in the technology (which by the way includes the open source Lucene search engine).

Thanks again to Udo and Tony for organising the day, and for inviting me to speak – there was a fascinating range of speakers and topics, and it was great to catch up with others working in the industry.

Speaking to DZone and at ECIR
http://www.flax.co.uk/blog/2011/03/10/speaking-to-dzone-and-at-ecir/
Thu, 10 Mar 2011 14:42:34 +0000

The post Speaking to DZone and at ECIR appeared first on Flax.

I was recently interviewed by Mitchell Pronschinske for the DZone website on the subject of open source search: you can download the podcast here. It’s part of a large resource they have on open source search, well worth a browse. We discussed how open source enterprise search has reached parity with closed source solutions, the various options available and what future developments might be.

You can also hear me talk at the European Conference on Information Retrieval (ECIR) in Dublin, as part of Industry Day on Thursday 21st April alongside speakers from Microsoft, Google, Yahoo and IBM amongst others. Do get in touch if you’re attending and would like to meet up for a chat about search over a pint of Guinness!
