12 June 2007

More on neural networks (and how to win at 20 questions).

As I was thinking about interesting applications for artificial neural networks, I happened upon 20Q.net, a surprisingly addictive ANN-based 20 Questions game. I tried out the Classic, Music, and Harry Potter games; I was able to stump the Music game about half the time, but only fooled the others very occasionally. This confirms an intuition about neural nets -- they learn best when their universe is small, when they get lots of human input, or both. The Classic game is very general, but also almost 20 years old (it works well thanks to plenty of accumulated human feedback); the Harry Potter game is relatively new, played frequently, and only deals with objects within the HP universe (it works even better, thanks to the limited universe and the sheer number of HP enthusiasts out there). The Music game is fairly new and very general, and doesn't work well at all: it asks silly questions that don't follow from earlier answers, and fails to ask really obvious ones. You might expect as much from a computer, of course, except that the other two games felt a lot like playing 20 Questions with a real person who happens to be way smarter than you. Both ask questions that seem unexpected but make sense once you ponder them, and they're eerily good at guessing the answers.
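(For the curious: the guts of a 20Q-style learner can be sketched in a few lines. The real 20Q.net network is proprietary, so everything below -- the objects, the questions, the update rule -- is just my hypothetical toy version of the core idea: a weight matrix linking questions to objects, nudged toward players' answers every time the true object is revealed.)

```python
# Toy sketch of a 20Q-style guesser that learns from play.
# NOT the real 20Q.net algorithm -- just the basic idea of
# question/object association weights trained by human feedback.

LEARNING_RATE = 0.3

class TwentyQ:
    def __init__(self, objects, questions):
        self.objects = objects
        self.questions = questions
        # weights[obj][q] ranges over [-1, 1]: +1 leans "yes", -1 leans "no"
        self.weights = {o: {q: 0.0 for q in questions} for o in objects}

    def guess(self, answers):
        # answers: dict of question -> +1 (yes) or -1 (no).
        # Score each object by how well its stored weights agree.
        def score(o):
            return sum(self.weights[o][q] * a for q, a in answers.items())
        return max(self.objects, key=score)

    def learn(self, true_object, answers):
        # Nudge the revealed object's weights toward the player's answers.
        for q, a in answers.items():
            w = self.weights[true_object][q]
            self.weights[true_object][q] = w + LEARNING_RATE * (a - w)

game = TwentyQ(["owl", "snake"], ["Does it fly?", "Does it have legs?"])
owl_answers = {"Does it fly?": 1, "Does it have legs?": 1}
# A few players describe an owl; the network absorbs their answers.
for _ in range(3):
    game.learn("owl", owl_answers)
game.learn("snake", {"Does it fly?": -1, "Does it have legs?": -1})
print(game.guess(owl_answers))  # -> owl
```

This also makes the small-universe intuition concrete: with few objects and lots of plays per object, the weights converge quickly; with a huge universe and sparse feedback, most rows stay near zero.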

So what? Well, this is interesting to me because I've been thinking lately about search engines. The only commercial search engine I can find that uses neural net technology is MSN Search, introduced in 2005. It uses a centralized 'supervised learning' approach -- that is, somebody at MSN (probably lots of somebodies) is in charge of telling the engine which search results are the most on-target. Since the 'universe' here is big (the whole internet), it seems like a decentralized input approach would be in order: have users evaluate the search results themselves, which is presumably what Google's doing with its new facial search module (see recent post and comments). As I've mentioned before, this is what Google's good at -- finding ways to make use of the work web users are already doing anyway.
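A decentralized-feedback ranker could look something like this rough sketch (the names and numbers are all hypothetical, and real ranking systems are vastly more involved): each user click on a result for a given query nudges that result's score up, and the results they skipped drift down.

```python
from collections import defaultdict

# Hypothetical sketch: let user clicks, not a central editor, teach
# the engine which results are on-target for a query.

class ClickRanker:
    def __init__(self):
        self.scores = defaultdict(float)  # (query, url) -> learned score

    def record_click(self, query, clicked_url, shown_urls):
        # Clicked result gains score; results shown but skipped lose a little.
        for url in shown_urls:
            delta = 0.1 if url == clicked_url else -0.02
            self.scores[(query, url)] += delta

    def rank(self, query, urls):
        return sorted(urls, key=lambda u: self.scores[(query, u)], reverse=True)

ranker = ClickRanker()
results = ["a.com", "b.com", "c.com"]
# Users repeatedly pick b.com for the query "neural nets"...
for _ in range(5):
    ranker.record_click("neural nets", "b.com", results)
print(ranker.rank("neural nets", results))  # b.com rises to the top
```

The appeal is exactly the point made above: the evaluation work is something users are already doing every time they pick a result.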

Meanwhile, and probably next for websearch technology, there's semantic websearch: getting a computer to understand what words mean, so that it can find relevant results more creatively. The cheater's solution is the Semantic Web approach: tag everything on the internet to explain what it is and what it's about, in a way that makes sense to search algorithms. The real goal, though, is to teach a search algorithm what search terms mean, or at least what they might mean, so that it can find related content that doesn't use the same keywords/tags. Earlier this year Read/WriteWeb asked, 'Is Google a Semantic Search Engine?' and concluded that Google has at least introduced rudimentary semantic analysis in the related searches it suggests at the bottom of the page. That's based on statistical analysis of word context, though, not on true semantic analysis. Enter neural nets, which are (so far) the only way to give a computer the complexity necessary to 'get' language. More on this to come.
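To make the distinction concrete, here's a toy version of the statistical word-context approach (the query log is made up, and this is plain co-occurrence counting -- not anything Google has published): words that frequently show up in the same queries get flagged as 'related', with no understanding of what either word actually means.

```python
from collections import defaultdict
from itertools import combinations

# Statistical "relatedness" from word context alone: count how often
# two words appear in the same query. No semantics anywhere in sight.

def build_cooccurrence(queries):
    counts = defaultdict(int)
    for q in queries:
        words = set(q.lower().split())
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts

def related(word, counts, top=3):
    # Rank the words that co-occur with `word` most often.
    scored = defaultdict(int)
    for (a, b), n in counts.items():
        if a == word:
            scored[b] += n
        elif b == word:
            scored[a] += n
    return sorted(scored, key=scored.get, reverse=True)[:top]

log = ["neural network tutorial", "neural network training",
       "network security", "semantic web tutorial"]
counts = build_cooccurrence(log)
print(related("neural", counts))  # "network" comes out on top
```

Note what this can't do: it will never connect 'hungry' to 'restaurant' unless people happen to type them together. That's the gap true semantic analysis is supposed to close.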


mtliberty said...

Would that entail tagging each word on a page with a weight? How else could a tagging system know not only "the word," but the relative importance of each word?

For example, in the sentence "I'm so hungry," which is the most important word?