I have been away from home for almost two years now. I talk and chat frequently with my mom and dad back home. My mom is more comfortable using my mother tounge Marathi to communicate with me and has to fight with the Roman Script keyboard that comes fitted to the computer back home. What is more annoying is despite of the fact that I am a computer science Masters Stuudent she still has to struggle, can’t I just write a small program that will encode the Roman key sequence and change it to the resective Devnagari Character? Well yes I can, but does she use the same key sequence every time to mean the same letter? Is she consistant with her spelling of certain phonemes? No she is not. Now should I make her mug up (Indan term for memorize) all the key sequences just coz she has to write to me in Devanagari script? Is it really worth the pain? (Well for those emotional types it might be, but my mom would say look kid, you want me to read this and mug it up? forget it! I will call you). To all those of you who are by now eager to point me to Google’s Indic Transliteration web interface, my point is my mom is not great with computer and although she would be keen on doing it, it would be a bigger pain for me to explain her how to really do it. So forget it.
What do I do? Leave her to call me and not pick the call coz I am either in school or playing Cricket or cooking or out with friends? No I chose to write the code which will adapt to her key sequences probabilistically and then suggest some output of the word to select. I know this is really a long term project and needs a lot of dedication. Below is my plan of action.
1. Identify the phonemes that are formed with the script consonants + vowels and other combinations.
2. Build a probabilistic model for phoneme preceedence using the two Marathi corpus available form CFILT-IIT-B.
3. draw association between Roman key strokes and Devenagari phonemes.
4. Write up a predction algorithm that that based on the previous key strokes and the probabilistic models predict the nextt phoneme in the word.
I am through with step one. It has a small python file as code. I would love to write a C++ code but I am trying to write like a small app that can be used in a web browser hence I am keeping it in something like Python. But again I don’t know if Python will be useful. So for now I am designing and developing the algo and a sample aap as my weekend project. Lets see where we go. Feedback welcome!