Microsoft is developing a new technique that can process code-mixed speech and text. The aim is to make interaction between computers and humans natural and multilingual.
According to research, more than 50% of the world’s population speaks two or more languages. But technology has largely assumed single-language speech, which doesn’t reflect the way people naturally speak.
In multilingual countries like India, most people speak more than one language. Many primarily use Hindi or English and keep switching between the two.
“I speak Bengali, English, and Hindi, as do a lot of my friends and colleagues. When we talk, we move fluidly between these languages without much thought,” wrote Monojit Choudhury, Researcher at Microsoft Research Lab India, in a blog post.
When people mix words and phrases from two or more languages while speaking or writing, it is called code-mixing or code-switching. Hinglish (Hindi and English) and Spanglish (Spanish and English) are two examples of code-mixing.
Technology that can process code-mixing is important for building useful translation and speech recognition tools and engaging user interfaces.
The new technique developed by Microsoft (under Project Mélange) seeks to address this challenge. It is based on a linguistic model called the equivalence constraint theory of code-mixing, which imposes several syntactic constraints on where a speaker can switch between languages.
“In building the Spanglish corpus, for example, we used Bing Microsoft Translator to first translate an English sentence into Spanish. Then we aligned the words, identifying which English word corresponded to the Spanish word, and in a process called parsing identified in the sentences the phrases and how they’re related. Then using the equivalence constraint theory, we systematically generated all possible valid Spanglish versions of the input English sentence,” explained Choudhury.
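The generation step Choudhury describes can be illustrated with a small Python sketch. It is not Microsoft's implementation. It assumes a one-to-one word alignment between two sentences of equal length, and it replaces the parse-based equivalence constraint with a much simpler linear rule: a switch is allowed only at positions where the words before the switch point in one sentence align exactly to the words before it in the other. The function names and the example sentence pair are hypothetical.

```python
from itertools import product


def valid_boundaries(alignment, n):
    """Return positions k (0 < k < n) where a language switch is allowed.

    Simplifying assumption: a switch is allowed at k only if the first k
    English words align exactly to the first k Spanish words, so word
    order is preserved on both sides of the switch.
    """
    en_to_es = dict(alignment)
    return [
        k for k in range(1, n)
        if {en_to_es[i] for i in range(k)} == set(range(k))
    ]


def code_mixed_candidates(en, es, alignment):
    """Generate mixed sentences by picking a language for each segment."""
    n = len(en)
    cuts = [0] + valid_boundaries(alignment, n) + [n]
    segments = list(zip(cuts, cuts[1:]))
    results = []
    for choice in product(("en", "es"), repeat=len(segments)):
        if len(set(choice)) < 2:
            continue  # skip the two monolingual sentences
        words = []
        for (start, end), lang in zip(segments, choice):
            source = en if lang == "en" else es
            words.extend(source[start:end])
        results.append(" ".join(words))
    return results


# Hypothetical example pair; each alignment entry is (English index, Spanish index).
en = ["I", "want", "a", "red", "car"]
es = ["Yo", "quiero", "un", "carro", "rojo"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)]

candidates = code_mixed_candidates(en, es, alignment)
```

Because "red car" and "carro rojo" have opposite word order, the sketch treats that phrase as a single unit, so "I want a carro rojo" is generated while "I want a rojo car" is not. A fuller treatment would derive the allowed switch points from the parse of both sentences, as the quote describes.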
The technique requires word embeddings from both languages to process code-mixed language. For instance, Hinglish needs Hindi and English words embedded in the same vector space.
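To show what a shared embedding space means in practice, here is a minimal Python sketch. The vectors are hand-written toy values, not learned embeddings, and the words, dimensions, and function names are assumptions made for illustration. The point is only that when both languages live in one space, a word's nearest neighbor in the other language can be found with an ordinary similarity measure.

```python
import math

# Hypothetical toy vectors keyed by (language, word). Real embeddings
# would be learned from data and have far more dimensions.
EMBEDDINGS = {
    ("en", "water"): [0.90, 0.10, 0.00],
    ("hi", "paani"): [0.85, 0.15, 0.05],
    ("en", "book"): [0.10, 0.90, 0.10],
    ("hi", "kitaab"): [0.12, 0.88, 0.05],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_in_other_language(lang, word):
    """Return the most similar word that belongs to a different language."""
    query = EMBEDDINGS[(lang, word)]
    scored = [
        (cosine(query, vec), other_word)
        for (other_lang, other_word), vec in EMBEDDINGS.items()
        if other_lang != lang
    ]
    return max(scored)[1]
```

With these toy values, nearest_in_other_language("en", "water") returns "paani" and nearest_in_other_language("hi", "kitaab") returns "book", because each pair was placed close together in the shared space.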
Microsoft said that its new technique can be used to build a multilingual chatbot that can code-mix depending on the person, the context of the conversation, and the topic under discussion, switching languages in a natural and appropriate way.