BBC Search refresh: in-depth view
Hi. I'd like to follow up Matt's post about our new Search pages with a little more explanation of what's going on behind the scenes. This is my first post here - and I'll admit a tendency to witter on about stuff - so please bear with me!
The Search & Navigation team has been responsible for the systems that drive iPlayer's search results page since iPlayer launched, but for the new TV and Radio search feature we wanted to include data about all programmes, not just those that are currently on iPlayer.
Full details of all programmes have been available on the Programmes site for a while, but they've never been searchable before. To enable this new feature, we're supplementing the existing iPlayer data feeds with a new feed from the Programme Information database, PIPs.
PIPs is only available on the BBC's internal network, so we have to replicate all the information we need for Search to our servers in the public-facing "content network".
We poll the "changes" list on PIPs 24 hours a day using a chunky piece of XSLT 2.0 code and push the data into our "Media" Autonomy server farm.
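To make the polling idea concrete, here is a minimal Python sketch of a cursor-based poll. It is not the team's XSLT 2.0 code: the `Change` record, its fields and the `fetch_changes`/`push` callables are hypothetical stand-ins for the PIPs changes list and the Autonomy indexing step. Each call processes everything after the last-seen change and returns a new high-water mark. A scheduler would call it over and over, round the clock.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    """One entry in a hypothetical "changes" list."""
    change_id: int    # assumed to increase monotonically along the feed
    entity_pid: str   # identifier of the entity that changed
    entity_type: str  # e.g. "episode", "version", "broadcast"


def poll_once(fetch_changes: Callable[[int], List[Change]],
              push: Callable[[Change], None],
              since: int) -> int:
    """Fetch every change after `since`, push each one to the index,
    and return the high-water mark to pass in on the next poll."""
    cursor = since
    for change in fetch_changes(since):
        push(change)
        cursor = max(cursor, change.change_id)
    return cursor


# Usage with an in-memory stand-in for the real feed (hypothetical data).
FEED = [
    Change(1, "p0000001", "episode"),
    Change(2, "p0000002", "broadcast"),
    Change(3, "p0000001", "version"),
]


def fake_fetch(since: int) -> List[Change]:
    return [c for c in FEED if c.change_id > since]


pushed: List[str] = []
cursor = poll_once(fake_fetch, lambda c: pushed.append(c.entity_pid), since=0)
# A second poll from the saved cursor finds nothing new.
cursor_after_second_poll = poll_once(
    fake_fetch, lambda c: pushed.append(c.entity_pid), since=cursor)
```

Persisting the returned cursor between runs is what lets a poller like this resume after a restart without re-reading the whole feed.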
These data are in a hierarchical structure based on entities including masterbrands, services, brands, series, episodes, versions, broadcasts and on-demands. The episode data we're interested in are spread across all of them, so identifying the official "first broadcast" date and titles for an episode is harder than it first appears! (After ten years in the corporate IT world it's been rather refreshing to work with interesting data for the year I've been here!)
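A minimal Python sketch of why "first broadcast" needs a walk down the hierarchy rather than a single lookup. The classes model only a fragment of the structure (brand and series titles, episode, version, broadcast), and their field names are assumptions, not the actual PIPs schema; masterbrands, services and on-demands are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Broadcast:
    start: datetime


@dataclass
class Version:
    # Assumed here: a version is one edit of an episode, and each
    # version can be broadcast many times.
    broadcasts: List[Broadcast] = field(default_factory=list)


@dataclass
class Episode:
    title: str
    series_title: Optional[str] = None
    brand_title: Optional[str] = None
    versions: List[Version] = field(default_factory=list)


def first_broadcast(episode: Episode) -> Optional[datetime]:
    """Earliest broadcast across *all* versions, or None if never broadcast."""
    starts = [b.start for v in episode.versions for b in v.broadcasts]
    return min(starts) if starts else None


def display_title(episode: Episode) -> str:
    """Join whichever ancestor titles exist, e.g. 'Brand: Series: Episode'."""
    parts = [episode.brand_title, episode.series_title, episode.title]
    return ": ".join(p for p in parts if p)


# Hypothetical data: the earliest start belongs to the second version, so
# looking only at the first version would give the wrong date.
ep = Episode(
    title="Episode 1",
    series_title="Series 1",
    brand_title="Example Brand",
    versions=[
        Version([Broadcast(datetime(2008, 10, 14, 19, 0))]),
        Version([Broadcast(datetime(2008, 10, 7, 20, 0)),
                 Broadcast(datetime(2008, 10, 20, 1, 0))]),
    ],
)
```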
At query time, the search engine looks for both iPlayer and PIPs records in its database, giving precedence to the iPlayer record when it finds a pair corresponding to the same episode, and boosting results for more recent programmes.
This means that you should generally see the iPlayer items at the top of the list, followed by "coming up" programmes, followed by older episodes. We had to be careful with the boosting - we don't want to favour iPlayer and/or recent programmes too much, because they might not actually be relevant to your query, despite being more recent.
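The query-time behaviour in the two paragraphs above can be sketched in Python. This is an illustration, not how Autonomy IDOL implements it: the `Hit` fields, the shared `episode_id` key, the exponential half-life decay and the `max_boost` cap are all assumptions. The cap is the lever for the trade-off described above. The recency multiplier can never exceed `1 + max_boost`, so a strong text match on an old programme can still beat a weak match on a recent one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Hit:
    episode_id: str   # assumed key shared by an iPlayer record and its PIPs record
    source: str       # "iplayer" or "pips"
    relevance: float  # text-match score from the search engine
    date: datetime    # e.g. first broadcast date


def merge_and_rank(hits: List[Hit], now: datetime,
                   half_life_days: float = 30.0,
                   max_boost: float = 0.5) -> List[Hit]:
    # 1. Deduplicate: when both sources describe the same episode,
    #    keep the iPlayer record.
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.episode_id)
        if current is None or (hit.source == "iplayer" and current.source != "iplayer"):
            best[hit.episode_id] = hit

    # 2. Recency boost: the multiplier decays from (1 + max_boost) towards 1
    #    as content ages. Future ("coming up") dates are treated as age 0.
    def score(hit: Hit) -> float:
        age_days = max((now - hit.date).total_seconds() / 86400, 0.0)
        recency = 0.5 ** (age_days / half_life_days)
        return hit.relevance * (1.0 + max_boost * recency)

    return sorted(best.values(), key=score, reverse=True)


# Hypothetical hits: e1 appears in both sources, e2 is old but highly
# relevant, e3 is very recent but a weak match.
now = datetime(2008, 10, 14)
hits = [
    Hit("e1", "pips", 0.80, datetime(2008, 10, 12)),
    Hit("e1", "iplayer", 0.80, datetime(2008, 10, 12)),
    Hit("e2", "pips", 0.90, datetime(2005, 1, 1)),
    Hit("e3", "pips", 0.50, datetime(2008, 10, 13)),
]
ranked = merge_and_rank(hits, now)
```

With these numbers the old but relevant `e2` still ranks above the brand-new but weakly matching `e3`, which is the kind of balance the boosting is meant to preserve.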
We're planning to extend this system in the next few weeks to add in other searchable data such as actor and presenter names and also record details of "top level editorial objects" - brands and standalone series - so they can be featured more prominently on the results page.
This release is the culmination of many other individual projects that have been ongoing for the last several months. Here's a brief overview:
Most obviously, we've given the user interface a spring clean, bringing it into line with the new "Visual Language" used across other BBC websites.
We've upgraded the search system to the latest version of Autonomy IDOL on brand-new servers, consolidating a lot of separate indexes spread across various server pools into three pools: Core, Media and Collections. I've lost track of the number of servers used to host the various parts of the Search system - it's easily more than 40 - so this upgrade gives us a welcome opportunity to decommission some venerable old servers that have been giving our Operations team nightmares!
We've improved the text encoding support across the site and in the indexing systems - you should no longer see corrupted UTF-8 characters anywhere.
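The post doesn't say how the encoding fixes were made. One common cause of corrupted UTF-8 characters is text that was encoded as UTF-8 and then mistakenly decoded as Latin-1, which turns `é` into `Ã©`. A minimal Python sketch of detecting and undoing one round of that mistake, offered as an illustration of the failure mode rather than the team's fix:

```python
def repair_double_decoding(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-read as Latin-1, e.g. 'CafÃ©' -> 'Café'.

    Re-encoding as Latin-1 recovers the original bytes. If those bytes are
    not valid UTF-8, or the text contains characters outside Latin-1, the
    text was probably not mangled this way, so it is returned unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

This is only a heuristic: genuine text that happens to contain a sequence like `Ã©` would be "repaired" too, so the robust fix is to declare and honour UTF-8 consistently at every stage of the pipeline.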
We've implemented date biasing on the main results page - that means recent content should appear higher up in the results.
We've noticed that people often search for BBC programmes using slightly different spellings from our preferred style, for example "Dr Who" rather than the preferred form "Doctor Who". In these cases we search for the preferred style but give you the option to search again using your original spelling. Until now this system was only in place on iPlayer search, but we think it's a useful addition across the other scopes.
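A minimal Python sketch of the "search for the preferred style, offer the original" behaviour. The `PREFERRED` table and the `rewrite_query` function are hypothetical; the post doesn't describe how variants are stored or matched. Here a variant only matches as a whole phrase, case-insensitively, so "Dr Whoopi" is left alone.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical table of variant spellings -> preferred style.
PREFERRED: Dict[str, str] = {
    "dr who": "Doctor Who",
}


def rewrite_query(query: str,
                  table: Dict[str, str] = PREFERRED) -> Tuple[str, Optional[str]]:
    """Return (query_to_run, original_if_rewritten).

    When a known variant appears as a whole phrase, the preferred form is
    searched and the original query is handed back so the results page can
    offer to search again with it."""
    rewritten = query
    for variant, preferred in table.items():
        pattern = r"\b" + re.escape(variant) + r"\b"
        # A function as the replacement avoids any backslash interpretation.
        rewritten = re.sub(pattern, lambda _m: preferred, rewritten,
                           flags=re.IGNORECASE)
    if rewritten == query:
        return query, None
    return rewritten, query
```

For example, `rewrite_query("Dr Who christmas")` runs "Doctor Who christmas" and returns "Dr Who christmas" as the alternative to offer.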
We have a new Click Through Tracking system. You'll see evidence of that in the links on the first results page presented for each query. It's really useful for us to know which links are being used, so that we can understand how people use the site and improve it over time. Don't worry: we aren't storing any personally-identifiable data.
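A minimal Python sketch of a click-through link that records which result was used without identifying who used it. The `/search/click` endpoint and its `q`, `pos` and `url` parameters are hypothetical; the post only says that links on the first results page are tracked and that no personally-identifiable data is stored.

```python
from typing import Dict, Union
from urllib.parse import parse_qs, urlencode, urlparse

CLICK_ENDPOINT = "/search/click"  # hypothetical tracking endpoint


def tracked_link(destination: str, query: str, position: int, page: int) -> str:
    """Wrap a result link so a click can be logged.

    Only links on the first results page are wrapped, and the URL carries
    just the query, the result position and the destination: no cookie,
    session or user identifier."""
    if page != 1:
        return destination
    params = urlencode({"q": query, "pos": position, "url": destination})
    return CLICK_ENDPOINT + "?" + params


def record_click(tracked: str) -> Dict[str, Union[str, int]]:
    """What a hypothetical click handler might log before redirecting to `url`."""
    params = parse_qs(urlparse(tracked).query)
    return {
        "query": params["q"][0],
        "position": int(params["pos"][0]),
        "destination": params["url"][0],
    }
```

Aggregating these (query, position, destination) records is enough to see which results people actually choose for a given query.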
One last little thing I'd wanted to do for a long time was to change the search results URL from the dated, purely historical /cgi-bin/search/results.pl to simply /search. This proved to be one of those tasks that initially looks like an easy win, but because the old address is used internally for some specially-handled searches, including CBBC Find, it turned out to be gnarlier than we expected!
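A minimal Python sketch of why the change is more than a one-line redirect: ordinary legacy requests can be forwarded to /search with their query string intact, but the specially-handled searches must keep their old behaviour. The `scope` parameter and the `cbbc` value used to recognise them are hypothetical; the post doesn't say how those searches are identified.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

LEGACY_PATH = "/cgi-bin/search/results.pl"
NEW_PATH = "/search"
# Hypothetical: 'scope' values whose legacy handling must be preserved.
SPECIAL_SCOPES = {"cbbc"}


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the request unchanged."""
    parsed = urlparse(url)
    if parsed.path != LEGACY_PATH:
        return None
    scopes = parse_qs(parsed.query).get("scope", [])
    if any(s in SPECIAL_SCOPES for s in scopes):
        return None  # specially-handled search: keep the old behaviour
    # Keep the original query string so bookmarked searches still work.
    return NEW_PATH + ("?" + parsed.query if parsed.query else "")
```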
That's all for now. Please do let us know what you think of the search system - we're here to make it better!
Andy Webb is a Senior Software Engineer in Search & Navigation, BBC Future Media & Technology
Comment number 1.
At 14th Oct 2008, funnybrianS wrote: This comment was removed because the moderators found it broke the house rules.
Comment number 2.
At 14th Oct 2008, sjp4 wrote: It is still frustrating that when searching (from a news/sport page) I have to manually select "news and sport" at the top to get some proper results. It's been like this for years, and I was hoping that a search rework might change it!
I think that in every search I have ever done on the BBC website I have been looking for news/sport content and had to make this change.
I see that you have placed "results from news&sport" halfway down the results page - this is potentially a good idea, but the design of the page makes it impossible to pick these results out from the rest (the header for this section could be a LOT clearer, or the section could have some sort of divider).
Comment number 3.
At 14th Oct 2008, Frankie Roberto wrote: @sjp4 Ha! Someone else that agrees with me.
See
Comment number 4.
At 15th Oct 2008, Briantist wrote: Please please PLEASE can you make a search from within the News section return News results, at least first!
Comment number 5.
At 16th Oct 2008, DamnyoureyesF wrote: Where have all the user generated content results gone?
Comment number 6.
At 23rd Oct 2008, Ros wrote: You know, one of the things that most irritates me about the BBC website is the erratic nature of the search facility. It doesn't seem to me too much to ask that when I search by name for a particular programme, the first result should be a link to the page associated with that programme. Yet what always seems to come first will be a series of old news posts or press releases with perhaps one of the words I searched for somewhere in the text. Sometimes the programme page doesn't appear at all.
And as for searching for programmes on the iPlayer, I'm tempted to say don't get me started, but I already am. I loved the old version of the iPlayer when it was only radio. You could see, at a glance, within the iPlayer itself, all the available programmes on a particular channel, of a particular kind, or even of a particular kind and on a particular channel. You could see links to all available episodes for each programme. So that if, for example, you wanted to listen to all five episodes of Book at Bedtime in one go, you could do it easily.
Now what you get are a handful of 'More Like This' options and if that's not what you want you have to go back and trawl through the ridiculously complicated website. Huh.
Would it really be too much to return to the old system? Especially for radio. I wouldn't much care if the radio and TV players were wholly separate actually. You don't need a big blank screen for the radio, you know.