Quick note on further encounters with machine translations of non-European languages. I discovered yesterday that the Wiktionary Android app can search for Persian words typed in Latin letters. Obviously, it’s also possible to switch to a Persian keyboard, but it’s easier to quickly type in a word that I’ve heard in my regular alphabet.
There’s also the benefit of never mistyping letters that have the same pronunciation. Persian has several letters that are pronounced exactly the same (for example, ﺱ ,ﺙ, and ﺹ are all pronounced as /s/). Typing a word in Latin letters as I’ve heard it allows me to find its meaning without knowing its spelling first.
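To make the idea concrete, here is a minimal sketch of a lookup keyed by Latin transcription. I have no idea how the Wiktionary app actually implements its search, so everything here is an assumption: the three-word dictionary, the rough transcriptions, and the function names `build_index` and `search` are all made up for illustration.

```python
# Hypothetical mini-dictionary: (Persian spelling, rough Latin transcription, gloss).
# The three words start with three different letters that all sound like /s/.
ENTRIES = [
    ("سال", "sal", "year"),
    ("ثابت", "sabet", "fixed"),
    ("صابون", "sabun", "soap"),
]


def build_index(entries):
    """Map each lowercase Latin transcription to the entries that share it."""
    index = {}
    for spelling, latin, gloss in entries:
        index.setdefault(latin.lower(), []).append((spelling, gloss))
    return index


def search(index, query):
    """Look up a word typed as heard; the Persian spelling is never needed."""
    return index.get(query.strip().lower(), [])


INDEX = build_index(ENTRIES)

for word in ("sal", "sabet", "sabun"):
    print(word, search(INDEX, word))
```

Because the key is the sound rather than the spelling, it doesn’t matter which of the three /s/ letters a word is written with. A real dictionary would also have to cope with several competing ways of writing the same word in Latin letters, which this sketch ignores.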
I tried this kind of Latin-alphabet search with other non-European languages and got mixed results. In Chinese, for example, it works, but it gives a list of search terms in pinyin, without the Chinese characters. This could be helpful for language learners, but it’s going to be annoying for educated native speakers, whose eyes will navigate actual Chinese characters more easily.
Via This Week in Google, I also learned that Google Translate is soliciting human input on its machine translations. Over at translate.google.com/community, they’re asking bilingual people to either translate phrases or verify other people’s translations.
It reminds me very much of the immersion lessons in Duolingo, but with two key differences:
- It’s not at all fun. It’s unclear why they call it a community, since there is no communication between users working on translations, and they give no reason for anyone to participate.
- It’s apparently random. I couldn’t be sure, but the difficulty level I was presented with didn’t seem to depend at all on whether I had just skipped or verified a translation. Duolingo, by contrast, does give users some information about the expected difficulty of a text it offers for translation.
I’m glad to see that they are asking for human input, but I’m surprised that their approach is so clumsy. Google used to have a pretty smart image labeling game, in which players would be matched up around the world and would get points for guessing the same words to describe images as their partner, without communicating. I liked it because it could help the image search algorithm find images based on what could be faulty, but common, human interpretation. For example, if everyone looks at a photo of an 80s hair band and thinks “They remind me of Van Halen,” then it makes sense to associate that photo with a search for “Van Halen,” even if the photo isn’t actually of Van Halen.
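The matching rule behind a game like that is simple enough to sketch. This is only my guess at the core idea, not Google’s actual implementation: the function names `normalize` and `agreed_labels` are invented, and a real game would also need scoring, timers, and lists of banned words.

```python
def normalize(label):
    """Lowercase a guess and collapse extra spaces so near-identical guesses match."""
    return " ".join(label.lower().split())


def agreed_labels(guesses_a, guesses_b):
    """Return the labels both players typed independently, in player A's order."""
    seen_b = {normalize(g) for g in guesses_b}
    matches = []
    for guess in guesses_a:
        key = normalize(guess)
        if key in seen_b and key not in matches:
            matches.append(key)
    return matches


player_a = ["Hair band", "Van  Halen", "guitar"]
player_b = ["van halen", "concert", "GUITAR"]
print(agreed_labels(player_a, player_b))
```

Only the guesses that two strangers arrive at independently get kept, which is why a label like “Van Halen” can stick to a photo of a different band.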
Luis von Ahn, the founder of Duolingo, developed the game that Google used as the basis for its image labeling game, which is one reason why I expected Duolingo to be much more adaptive. In the case of both Google Translate and Duolingo, I don’t see why we can’t have smarter, more adaptive approaches to translation. With the sheer number of users of Google Translate (or probably Duolingo, at this point), there’s no reason to present users with a translation that hasn’t already been typed by a native speaker of the target language, in a different context.