A Perso-Arabic to Indic Script Machine Transliteration Model
Prof. Gurpreet Singh Lehal, Department of Computer Science, Punjabi University, Patiala, India
The Indian subcontinent is one of the few parts of the world where a single language is written in different scripts. Punjabi, for example, is spoken by tens of millions of people, but in Indian East Punjab (20 million) it is written in Gurmukhi (a left-to-right Indic script like Devnagri), while in Pakistani West Punjab (80 million) it is written in Shahmukhi (a right-to-left script based on Perso-Arabic). Punjabi as spoken in the eastern and western parts is mutually comprehensible, but the written forms are not. The same holds for Urdu and Hindi: despite their different names, they are the same language but are written, as with Punjabi, in mutually incomprehensible forms. Hindi is written from left to right in the Devnagri script, while Urdu is written from right to left in a script derived from a Persian modification of the Arabic script. Sindhi faces a similar problem: it is written in a Perso-Arabic script in Pakistan and in both Perso-Arabic and Devnagri in India. The same is true of Kashmiri. Konkani is probably the only language in India written in five scripts: Roman, Devnagri, Kannada, Perso-Arabic and Malayalam. These multiple scripts create communication barriers: people who understand the spoken language often cannot read it in the other script. This creates a need for transliteration tools that convert text written in one script of a language into another script. A common feature of all these languages is that one of the scripts is Perso-Arabic (Urdu, Sindhi, Shahmukhi, etc.), while the other is Indic (Devnagri, Gurmukhi, Kannada, Malayalam). Perso-Arabic is written from right to left, whereas Indic scripts are written from left to right, and readers of one cannot read the other.
Thus there is a dire need for automatic machine transliteration tools for conversion between Perso-Arabic and Indic scripts. Machine transliteration is an automatic method for generating characters or words in one alphabetical system from the corresponding characters or words in another. The transformation of text from one script to another is usually based on phonetic equivalence. We present Sangam, a Perso-Arabic to Indic script machine transliteration system that converts, with high accuracy, text written in Perso-Arabic script into an Indic script used for the same language. The system has been successfully tested on Punjabi (Shahmukhi to Gurmukhi), Urdu (Urdu to Devnagri) and Sindhi (Sindhi Perso-Arabic to Sindhi Devnagri) and can easily be extended to other languages such as Kashmiri and Konkani.
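
To make the idea of transliteration by phonetic equivalence concrete, here is a minimal Python sketch of a character-by-character lookup from Perso-Arabic to Gurmukhi. It is not the Sangam system, whose internals are not described here. The table `CHAR_MAP` and the function `transliterate` are hypothetical names introduced for illustration, and the handful of mappings is a deliberately simplified assumption.

```python
# Hypothetical, heavily simplified table of phonetic equivalents.
# Keys are Perso-Arabic code points, values are Gurmukhi code points.
CHAR_MAP = {
    "\u0628": "\u0A2C",  # ARABIC LETTER BEH   -> GURMUKHI LETTER BA
    "\u067E": "\u0A2A",  # ARABIC LETTER PEH   -> GURMUKHI LETTER PA
    "\u062A": "\u0A24",  # ARABIC LETTER TEH   -> GURMUKHI LETTER TA
    "\u0631": "\u0A30",  # ARABIC LETTER REH   -> GURMUKHI LETTER RA
    "\u0644": "\u0A32",  # ARABIC LETTER LAM   -> GURMUKHI LETTER LA
    "\u0645": "\u0A2E",  # ARABIC LETTER MEEM  -> GURMUKHI LETTER MA
    "\u0646": "\u0A28",  # ARABIC LETTER NOON  -> GURMUKHI LETTER NA
    "\u06A9": "\u0A15",  # ARABIC LETTER KEHEH -> GURMUKHI LETTER KA
    # Assumption: ALEF follows a consonant, so it maps to the vowel sign AA.
    "\u0627": "\u0A3E",  # ARABIC LETTER ALEF  -> GURMUKHI VOWEL SIGN AA
}


def transliterate(text: str) -> str:
    """Replace each mapped character; pass unmapped characters through."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


# NOON + ALEF + MEEM ("naam") becomes NA + VOWEL SIGN AA + MA.
word = "\u0646\u0627\u0645"
print(transliterate(word))
```

Strings are stored in logical order, so the right-to-left versus left-to-right difference is a matter of display and needs no reversal in the code. A one-to-one table like this is only a starting point: a practical system has to deal with much more, for instance the short vowels that Perso-Arabic text usually leaves unwritten.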