Rushes Sequences - Terry Winograd interview - USA (Video)

Dan Biddle | 13:47 UK time, Thursday, 26 November 2009

Terry Winograd is a Professor of Computer Science in the USA. He met with the programme three team to discuss the way in which search engines work, determine page rank and deliver results to our queries online.

These rushes sequences are part of our promise to release content from most of our interviews and some general footage, all under a permissive licence for you to embed, or download a non-branded version and re-edit.



-------------------------------------------------

Transcript:
(Please note that this transcript is the 'raw data' text we receive from a transcription company. It is a tool commonly used in production to facilitate editing and review the content. We publish it for users in that same spirit, rather than it standing as a 'perfect' representation of the content.)

Alexi    Terry, what was the idea behind the research, this notion of PageRank?

Terry    They started doing the research in an era when people had just begun to do search engines on the web. The web started off, erm, really the idea, there was a bunch of interesting stuff and you browsed, you surfed. You went from page to page, saw what was there and that was fun. Erm, and then people realised that there was enough interesting and serious stuff, they might want to actually go somewhere where they could find something they wanted. So a number of people at different places, erm, created what were called search engines. Erm, and the basic idea was that you create an index that lets you find where things are on the web.

So if you have here, and this is sort of a sketch of what it might be, web pages, each of these boxes is a page: a, b, c and d. Each one has certain words in it, television, computer, circuit, whatever it is. And each one can have links, where the links point to another page. So this page on computers and nets may point to this one for televisions and computers and so on. Now, what they realised, this is before Google, with the people doing the original 'spiders', they were called, on the web. What the spiders could do is, they could give them the address, give the computer the address of this page. The computer could make a list of all the words that were on that page and also find this page, 'cause there was a link. Then it would go to this page, make a list of all the words on that page and then it could follow the links there. And computers had gotten fast enough and powerful enough, and the web was small enough, that you could actually build a complete index.

So you'd end up with something, think of the index in the back of a book, so the word computer appears in pages a, b and d, the word television appears on this page and so on. So if I went to AltaVista, let's say, which was one of these early search engines, and I typed in computer, it would look in the index it had made and it would give me a search, a list of results that said a, b, d and so on. And this made it possible to go find something on the web, instead of just browsing around and seeing where you got to.
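The spider-and-index idea Terry describes can be sketched in a few lines of Python. This is only an illustration under assumptions: the `PAGES` table, its words and its links are invented here to resemble the a, b, c, d sketch, and `crawl_and_index` is a hypothetical helper, not how any real search engine was built.

```python
from collections import deque

# Hypothetical toy web: page -> (words on the page, outgoing links).
PAGES = {
    "a": ({"computer", "network"}, ["b"]),
    "b": ({"television", "computer"}, ["c", "d"]),
    "c": ({"circuit"}, ["a"]),
    "d": ({"computer", "circuit"}, ["b"]),
}

def crawl_and_index(start):
    """Follow links from `start` and build a word -> pages index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        words, links = PAGES[page]
        # Record every word on this page, like the index at the back of a book.
        for word in words:
            index.setdefault(word, set()).add(page)
        # Queue up linked pages the spider has not visited yet.
        for target in links:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index

index = crawl_and_index("a")
print(sorted(index["computer"]))  # ['a', 'b', 'd']
```

Looking up a word is then a single dictionary access, which is why the index answers queries quickly but says nothing about which of a, b or d should come first.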

Alexi    But the problem of course is that if somebody said computer a thousand times, because that was the key word that was being searched, it would push the result up and it wouldn't necessarily be the most

Terry    Exactly, so they have to decide: if there are three results, it's not a problem, but if there's a hundred results or a thousand results, which ones do you show? And how do you know that a is more interesting than d, or b is more interesting than d? So the question of what was interesting, what was relevant, wasn't addressed by having just a regular index like this. So the problem really, here's where Google, the founders of Google came in: Sergey and Larry decided that they could do a better job of finding the interestingness, the relevance, what makes a page something you want to see, other than just that it happens to have the words that you search for.
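The weakness Alexi raises can be shown with a small sketch. It assumes a deliberately naive ranking that orders pages by how often the query word appears; the interview does not say any particular engine ranked exactly this way, and the page names and texts in `DOCS` are invented for illustration.

```python
# Hypothetical page texts; "spam" repeats the keyword to game a count-based ranking.
DOCS = {
    "guide": "a computer guide explaining how a computer stores data",
    "review": "review of a new laptop computer",
    "spam": "computer " * 1000 + "buy now",
}

def rank_by_count(query, docs):
    """Order matching pages by how many times the query word appears (most first)."""
    counts = {name: text.split().count(query) for name, text in docs.items()}
    matching = [name for name, n in counts.items() if n > 0]
    return sorted(matching, key=lambda name: counts[name], reverse=True)

ranking = rank_by_count("computer", DOCS)
print(ranking)  # ['spam', 'guide', 'review']
```

The page that says computer a thousand times comes out on top, even though nothing about the count shows that it is the most relevant result.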

Alexi    And how did they go about identifying interestingness, because that's a very subjective idea, isn't it?

Terry    So interestingness is of course subjective, and there is no, what places like Yahoo did is, had human beings go through and say, here's an interesting page, here's an interesting page. That was the, the people, Yahoo is the most famous now, but there were a lot of people in that era who would go through and check out pages. And again, that worked when the web was very small.

Alexi    Exactly, that would not scale.

Terry    And as the web gets bigger you can't hire people to go out and look at all the pages. So the question is, how do you get people who you don't hire to, in some sense, give you judgements on which pages are interesting? And they had a very interesting sort of metaphor for this, which is, imagine a crowd of people all surfing the internet. So you take millions of people, start them out all over the internet, and they get to a page and they'll follow a link and from there maybe they'll follow another link. Now if you could actually get millions of people and all the paths they take, you would see that traffic would end up concentrating on certain places. A lot of people would end up here on this page and only a few people went on this page. Then when you got around to giving your search results, you would give the ones that got a lot of this virtual traffic. Now this is not actual people going, 'cause you don't have millions of people, you don't have data on that. But you can imagine where they would go.
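The crowd-of-surfers metaphor can be acted out directly as a simulation. The sketch below is an assumption-laden toy: the `LINKS` graph, the number of surfers and the number of clicks are all made up, and every page is given at least one outgoing link so that a surfer never gets stuck.

```python
import random

# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "a": ["b"],
    "b": ["c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
    "x": ["b"],
}

def simulate_surfers(links, surfers=10_000, clicks=20, seed=0):
    """Count where surfers end up after a fixed number of random clicks."""
    rng = random.Random(seed)
    pages = list(links)
    landed = {page: 0 for page in pages}
    for _ in range(surfers):
        page = rng.choice(pages)            # start anywhere on the web
        for _ in range(clicks):
            page = rng.choice(links[page])  # follow a random link on the page
        landed[page] += 1
    return landed

traffic = simulate_surfers(LINKS)
busiest = max(traffic, key=traffic.get)
print(busiest)  # b
```

In this toy graph the virtual traffic concentrates on page b, and nobody ends up on x because no page links to it.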

Alexi    So in, so if we kind of take this outside of the web, this would be like places in a city that have a lot of people driving through them, for example, it's a particular junction, it's an important building or something like that. That's what these websites, that's what the search algorithm identified?

Terry    That's what would decide what's the most relevant, what's the most interesting. So there's no, there is no simple way to actually get that data. Because the people who know where other people went on the web are only the service providers, and they don't give that information. But what they realised is, if they used links, they could get an approximation of how interesting pages were. So they built a second index, which not only kept track of what words were on each page, but where it was linked from. So you might here say that page b has a link coming in from a, and a link coming in from x.

So they actually had information that gave them the full link structure of the web, where does every link go from and to. Then they could take this and they applied a mathematical algorithm, it's called the PageRank algorithm, which was intended to basically simulate, in some sense, the result of what would happen if you had an infinite number of monkeys. If you put thousands, millions and millions of people on the web and let them just start browsing. And the result that they can get out of running this algorithm, which of course didn't require millions and billions of things going on, erm, was a good approximation that page b, let's say, is the one that would get the most traffic of a, b and d. So then when you search for computer, it brings b to the top of your listing.
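A minimal sketch of the two steps Terry describes, the incoming-link index and a calculation that stands in for the crowd of surfers, might look like this. The `LINKS` graph is invented, and the damping factor of 0.85 and the fixed 50 iterations are assumptions borrowed from common textbook presentations of PageRank; the interview gives no formula, so this should not be read as Google's actual implementation.

```python
# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "a": ["b"],
    "b": ["c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
    "x": ["b"],
}

def incoming_links(links):
    """Invert the link map: page -> pages that link to it."""
    incoming = {page: [] for page in links}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    return incoming

def pagerank(links, damping=0.85, iterations=50):
    """Estimate each page's share of traffic by iterating the rank update."""
    incoming = incoming_links(links)
    n = len(links)
    rank = {page: 1.0 / n for page in links}  # start with equal shares
    for _ in range(iterations):
        # Each page passes its current rank evenly along its outgoing links;
        # the (1 - damping) term models a surfer jumping to a random page.
        rank = {
            page: (1 - damping) / n
            + damping * sum(rank[src] / len(links[src]) for src in incoming[page])
            for page in links
        }
    return rank

incoming = incoming_links(LINKS)
ranks = pagerank(LINKS)
top = max(ranks, key=ranks.get)
print(sorted(incoming["b"]), top)  # ['a', 'c', 'd', 'x'] b
```

No surfers are simulated here: the loop only reuses the link structure, which is the point of running an algorithm instead of watching millions of people browse.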

Alexi    So if a page had a lot of people going to it or referencing it, then that would increase its interestingness, it would increase its reputation?

Terry    It's a little bit like in academics, where you have citations. So I write an academic paper and I say, see so-and-so's paper from such-and-such year. That indicates that that's an interesting paper. And it's sort of the same thing here: if you have lots of links pointing to you, that indicates that a lot of people have decided you're interesting enough to put in a link pointing to you. So that's really the basis of the algorithm.
