Search Solutions 2010 – a brief review

I spent yesterday at Search Solutions 2010, hosted by the British Computer Society. They’d been kind enough to ask me to speak (Update: my slides are available here, the rest are available at the event website above), but there were plenty of other people to listen to as well. There’s a great blow-by-blow account from Tyler Tate already, but here are some personal highlights:

Google’s Behshad Behzadi spoke about freshness for web content and how Google’s usual ranking strategy favours older results over new ones – as the new ones don’t have as many links. Vishwa Vinay from Microsoft talked about what to do with click data in enterprise search. He listed lots of papers on the subject, and hopefully his slides will be published so we can follow them up. He made the point that any ‘adaptive’ ranking based on click data must still work well out of the box, before any clicks have happened. This section of the event finished with Vivian Lin Dufour of Yahoo!, talking about some ways of guiding searchers from within the UI, with auto-suggest and similar techniques. Apparently the research the Yahoo team are doing on trending has let them spot news stories 12-24 hours before they hit the papers. I wondered afterwards: is this current fad for ‘trendspotting’ turning search engines into just another media channel? I don’t care much about the X-Factor TV show myself, so why should this current trend influence my search results?
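Going back to the point about adaptive ranking having to work before any clicks exist, here is a minimal Python sketch of one way that could be done. It is not taken from the talk: the function names, the shrinkage formula and the `prior_strength` parameter are all my own assumptions, and it assumes the engine's base relevance score is already scaled to the range 0 to 1.

```python
from typing import List, Tuple


def blended_score(base_score: float, clicks: int, impressions: int,
                  prior_strength: float = 20.0) -> float:
    """Blend a static relevance score with an observed click-through rate.

    Hypothetical illustration: with zero impressions the result is exactly
    base_score, so ranking behaves sensibly out of the box. As impressions
    accumulate, the click-through rate gradually takes over.
    """
    if clicks < 0 or impressions < 0 or clicks > impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    ctr = clicks / impressions if impressions else 0.0
    # Weight given to click evidence: 0 with no data, approaching 1 with lots.
    weight = impressions / (impressions + prior_strength)
    return (1.0 - weight) * base_score + weight * ctr


def rerank(docs: List[Tuple[str, float, int, int]]) -> List[str]:
    """Order (doc_id, base_score, clicks, impressions) tuples, best first."""
    ranked = sorted(docs, key=lambda d: blended_score(d[1], d[2], d[3]),
                    reverse=True)
    return [doc_id for doc_id, _, _, _ in ranked]
```

With no click data at all, `rerank` simply returns the documents in base-score order. The `prior_strength` value controls how many impressions it takes before clicks start to outweigh the original score, and would need tuning for any real collection.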

Nick Patience started the next session talking about trends in the Enterprise Search market: he acknowledged the rapid rise of open source solutions and talked about how search-based applications will become increasingly important, with a huge market for ‘information governance’ solutions opening up. Chirag Ghandhi of Mphasis, a search integrator, talked about how customers are disillusioned with enterprise search, and how difficult it is to build solutions that cope with data from a range of different sources and in different languages. Dusan Rnic of Endeca stressed the importance of being able to handle the ‘long tail’ of search results (the ones that aren’t the most popular) and showed us his favourite website, which was, strangely enough, an Endeca customer.

Greg Lindahl talked about how Blekko have built an innovative web crawling/indexing framework, which has enabled them to build up a 3 billion page index very efficiently – I’m looking forward to seeing more of this. As he said, what they’re doing isn’t necessarily better than Google, but it’s certainly different. My talk on open source search for news content followed, and then Roberto Cornacchia showed us Spinque’s approach to building search platforms – encapsulating search expert knowledge into logical ‘blocks’ that can be combined by domain experts into the solutions they actually need.

The last session began with Till Kinstler of GBV Common Library Network, a self-described ‘library hacker’, on building a search system using the open source engine Solr over 25 million library records – they’re now aiming for 120 million, taken from 400 different libraries, in source formats going all the way back to tape and paper library cards! We then heard about the Information Retrieval Facility, an open IR research institution – I liked their three principles of ‘open science, open source, open market’. The talks finished with Rob Stacey on True Knowledge’s ways of checking the veracity of facts gathered from the internet.

We then moved on to an open panel – some great themes here including the rise of search as a platform for new applications, what exciting (or scary) things Facebook might bring to the world of search, and how we should all work harder to bring good information retrieval mechanisms to those who cannot currently access them due to poverty, language barriers or disability.

Thanks to the BCS IRSG and in particular to Udo Kruschwitz for a very interesting and enlightening day.

4 thoughts on “Search Solutions 2010 – a brief review”

  1. Charlie,
    I just wanted to say that I enjoyed your talk at the Search Solutions event yesterday. For context – I’m Vinay, and I was representing Microsoft, and gave my talk earlier in the day.

    As I mentioned in my talk yesterday, I’m a researcher building the relevance ranking model for Microsoft’s enterprise search products. By my own admission, my knowledge of the wider use of these products in the real world is somewhat limited. It is therefore great for me to come to these events, as it helps me gather more information that will no doubt be helpful in doing my job better.

    I have two hats on – as an employee of Microsoft, clearly all this ‘open source’ stuff is something I need to combat! At the same time, I’m a researcher, and my aim is that some of the methods we work on now could become part of the standard off-the-shelf toolkits that others can then build on.

    A couple comments:
    1) The spelling of my name in your report above: “Vishwa Vinay”
    2) Maybe pointing to my current web address: http://research.microsoft.com/en-us/people/vvinay/

    Once again, great talk and probably see you around in similar events (or in Cambridge!)

    Vinay

  2. Sorry, a couple other things:
    1) I left a comment here because I couldn’t get an email ID, not because I wanted to leave a comment on the blog!
    2) I could provide you a copy of the slides, if it is for personal reference. Please feel free to email me if you’d like them.

  3. Hi Vishwa, thanks for the feedback and corrections, I’ve updated the post. Hope you won’t spend too much time combatting open source – there’s room for everyone to innovate, no matter what the license! It would be great if you could let the BCS themselves use your slides (even edited) – I know most of last year’s presentations are up: mine is at
    http://slidesha.re/a7h7WL
    for now.

  4. Pingback: Ibuildings Blog
