January 11, 2008

Word Sense Disambiguation !!!!

I was reading about this cool topic, which is very popular with dialogue writers of B-grade cinema but seems to be one of the major hurdles in the field of information retrieval (IR).

Simply put, Word Sense Disambiguation (WSD) is the process of identifying which sense of a word is used in a sentence.

Just to remind you, you must have enjoyed WSD in the double-meaning dialogues of Dada Kondke. They work because for us it is very easy to understand the context of a word, but developing an algorithm to replicate this human ability is a nightmare for programmers. For example, when we say “Sachin’s cricket bat”, it’s easy for us to understand that it refers to the bat Sachin Tendulkar uses for playing cricket, but for a search engine a bat can be a mammal, a cricket is also an insect, and who the hell is Sachin?
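To make the “cricket bat” problem concrete, here is a minimal sketch of one classic way to attack it: pick the sense whose dictionary definition shares the most words with the rest of the sentence (a simplified Lesk-style overlap). The sense inventory, the glosses, the stopword list and the function names are all made up for this illustration; a real system would use a proper dictionary, and I am not claiming any search engine works this way.

```python
import re

# Toy sense inventory: these glosses are invented for illustration only.
SENSES = {
    "bat": {
        "animal": "a nocturnal flying mammal that lives in caves",
        "sports": "a wooden club used to hit the ball in cricket or baseball",
    },
    "cricket": {
        "insect": "a jumping insect that chirps at night",
        "sports": "a game played with a bat and a ball between two teams",
    },
}

# Small hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "the", "in", "to", "of", "or", "and", "that", "with", "is", "s"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # on a tie, the first listed sense wins
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bat", "Sachin's cricket bat"))
print(disambiguate("bat", "a bat flying out of the caves"))
```

Notice that “Sachin” contributes nothing here: the toy dictionary has never heard of him, which is exactly the search engine’s problem.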

A few of the difficulties faced by IR are: queries written in some complex figure of speech rather than literal language, a word having different meanings in different languages (remember the “Monkey” controversy), the addition of new words, and the new spellings and acronyms frequently used in SMS, chat, etc.

Various techniques are used to overcome this hurdle, such as ranking results based on the origin (country) of the query, user clustering, and collaborative filtering (collective intelligence), but none of them has proved to be infallible.

To me this seems to be one of the major reasons behind the popularity and growth of vertical search engines, because a vertical search engine more or less knows the context of your query, and the ambiguity is reduced to a minimum.

I will try to give an example of this using my favorite keyword, “Fish”. Let’s say you are searching for the book called “Fish” by Harry Paul and John Christensen and type “fish” as the keyword for your search query. Here are the results you’ll get from different search engines.

Google Web Search

Google Product Search

Google News Search

Google Book Search

Finally, Google Book Search comes to your rescue. That is because Google Book Search knows that you are searching for a book somehow related to “fish”, and it manages to put the relevant result on page 1.

Many search engines are very wisely trying to integrate the intelligence of their vertical search engines into their generic search engine by combining the results. Google, Yahoo, etc. have also started including results from their vertical search engines (mainly news, images, and video), but this strategy has been implemented very effectively by ASK. So if you search for “Sachin” on ask.com, you will get web results, videos, encyclopaedia entries, blogs, related keywords, and even paid content in a much more organised manner.
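A rough sketch of what “combining the results” could look like: take the top few hits from each vertical and show them grouped under the vertical’s name. The function name, the `per_vertical` parameter, and the sample hits are all hypothetical; this is only my guess at the general idea, not how ASK or Google actually build their result pages.

```python
def blend(results_by_vertical, per_vertical=2):
    """Keep the top hits from each vertical, labelled with the vertical's name."""
    page = []
    for vertical, hits in results_by_vertical.items():
        for title in hits[:per_vertical]:
            page.append((vertical, title))
    return page


# Made-up hits for the query "Sachin", one ranked list per vertical.
sample = {
    "web": ["Sachin Tendulkar profile", "Sachin fan site", "Sachin name meaning"],
    "video": ["Sachin batting highlights"],
    "blogs": ["Why Sachin matters", "My favourite Sachin innings"],
}

for vertical, title in blend(sample):
    print(f"[{vertical}] {title}")
```

Because every hit carries its vertical as a label, the reader does part of the disambiguation for the engine simply by choosing which group to look at.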

Okay, I guess that’s enough for now; it seems I am deviating from the main topic of WSD. But I am sure you must be feeling much superior to a stupid computer that can still interpret WSD as Washington School for the Deaf :-)