IIT Bombay develops new multilingual search engine

Sources: SpiderChat, goatelecom.com, APNIC Mailing list, INDIC Computing project, KMIndia.


Move over Google! A team of researchers from the Indian Institute of Technology Bombay says it has developed an internet search engine that is both multilingual and meaning-specific, giving it broader applicability and greater accuracy than existing models.

"Our search engine eliminates the language barrier and its results are much more accurate than any other techniques used," says Dr Pushpak Bhattacharya, professor in the Computer Science and Engineering Department, IIT Bombay. Using Universal Networking Language (UNL), "the model has integrated the user's language requirement with the knowledge the user seeks," he points out.

In a paper to be presented at the ongoing International Conference on Universal Knowledge and Language here, Dr Bhattacharya and his team of students, Sarvjeet Singh, Tushar Chandra, Upmanyu Misra and Ushhan D Gundevia, argue that their search engine retrieves only the knowledge that is relevant and attempts to bridge the language gap by using an underlying, structured language as a back-end translator. "As far as we know, we are the first to employ this technique," they say.
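
The idea of searching through an underlying language can be illustrated with a toy sketch. It is not the IIT Bombay system and does not use real UNL: the lexicon, the concept identifiers (such as C_WATER) and the function names are all invented for illustration. Words from several languages are mapped to shared concept identifiers, documents are indexed by concept, and a query in one language then finds documents written in another.

```python
# Toy illustration of search through a language-neutral layer.
# Everything here is hypothetical: the lexicon, the concept IDs and the
# function names are invented and are NOT real UNL or the team's code.

# Surface word (English, romanised Hindi, French) -> shared concept ID.
LEXICON = {
    "water": "C_WATER", "pani": "C_WATER", "eau": "C_WATER",
    "river": "C_RIVER", "nadi": "C_RIVER", "fleuve": "C_RIVER",
}


def to_concepts(text):
    """Return the set of concept IDs for the known words in a text."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}


def build_index(docs):
    """Map each concept ID to the set of document IDs that mention it."""
    index = {}
    for doc_id, text in docs.items():
        for concept in to_concepts(text):
            index.setdefault(concept, set()).add(doc_id)
    return index


def search(query, index):
    """Return IDs of documents containing every concept in the query."""
    concepts = to_concepts(query)
    if not concepts:
        return []
    matches = [index.get(c, set()) for c in concepts]
    return sorted(set.intersection(*matches))


DOCS = {
    "en1": "the river water is clean",
    "hi1": "nadi ka pani saaf hai",
    "fr1": "eau potable",
}
INDEX = build_index(DOCS)

# A romanised Hindi query matches the English and Hindi documents.
print(search("pani nadi", INDEX))
```

A real system would need far more than a word list, including word-sense disambiguation and the relations between concepts, which is the part a structured language such as UNL is meant to supply.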

Google, widely believed to be the best search engine, is restricted to English. According to an estimate by the World Wide Web, English-language content makes up about 80 per cent of the trillions of bytes of textual information on the internet. Though content in other languages, especially Chinese and South Asian languages, is catching up rapidly, the digital divide between nations and people is still huge.

It was against this backdrop that the United Nations began the UNL project in 1996. Universal Networking Language is, simply put, an electronic language. EnConverter software automatically converts natural language text into UNL. Thirteen languages so far, namely Japanese, Chinese, Korean, Indonesian, English, Hindi, Marathi, Arabic, Italian, Russian, French, Spanish and Portuguese, have deconverters in place, which generate text in those languages from UNL and so allow automatic translation between them. With a lakh (100,000) concepts in place, English boasts the largest wordnet so far.

IIT Bombay, which is in the process of developing translation software for Hindi, Marathi and Konkani, has developed 15,000 concepts so far for Hindi, says Bhattacharya. He points to the immense extension of the reach of the internet once computer translations of languages become available at the click of a button.