I have extended my xgoogle library with a Python module for Google Translate.
The new module is called "xgoogle.translate" and it implements two classes - "Translator" and "LanguageDetector".
The "Translator" class translates text. It provides a function called "translate" that takes three arguments - "message", "lang_from" and "lang_to" - and returns the translated text as a Unicode string. Don't forget to encode the result to the right encoding before outputting it, otherwise you'll get errors such as "UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)".
Here is an example usage of the "Translator" class:
>>> from xgoogle.translate import Translator
>>>
>>> translate = Translator().translate
>>>
>>> print translate("Mani sauc Pēteris", lang_to="en")
My name is Peter
>>>
>>> print translate("Mani sauc Pēteris", lang_to="ru").encode('utf-8')
Меня зовут Петр
>>>
>>> print translate("Меня зовут Петр")
My name is Peter
If "lang_from" is not given, Google's translation service auto-detects it. If "lang_to" is not given, it defaults to "en" (English).
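To make the encoding advice concrete, here is a minimal sketch of what goes wrong and how to avoid it. It uses only built-in string methods, not xgoogle. The text is the Russian output from the example above, written with escapes so the snippet does not depend on the source file's encoding.

```python
# "Меня зовут Петр" written with escapes, as a Unicode string.
text = u"\u041c\u0435\u043d\u044f \u0437\u043e\u0432\u0443\u0442 \u041f\u0435\u0442\u0440"

# Latin-1 has no Cyrillic letters, so encoding fails on the first word.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError as e:
    latin1_ok = False
    failed_range = (e.start, e.end)  # characters that could not be encoded

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")
```

Encoding to UTF-8 explicitly, as in the "ru" example above, sidesteps whatever default codec the interpreter would otherwise pick for the output.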
In case of an error, the "translate" function raises a "TranslationError" exception with a message explaining why the translation failed. It's best to wrap calls to "translate" in a try/except block:
Program:
>>> from xgoogle.translate import Translator, TranslationError
>>>
>>> try:
...     translate = Translator().translate
...     print translate("")
... except TranslationError, e:
...     print e
...
Output:
Failed translating: invalid text
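When translating many strings, the same try/except pattern can collect failures instead of stopping at the first one. The sketch below is an illustration only: "TranslationError" and "fake_translate" are local stand-ins defined so the snippet runs without xgoogle or network access, and "translate_all" is a hypothetical helper, not part of the library.

```python
class TranslationError(Exception):
    """Local stand-in for xgoogle.translate.TranslationError."""


def fake_translate(message, **kwargs):
    """Stand-in for Translator().translate: rejects empty text, upper-cases the rest."""
    if not message:
        raise TranslationError("Failed translating: invalid text")
    return message.upper()


def translate_all(translate, messages, **kwargs):
    """Translate each message, recording failures instead of aborting."""
    results, failures = [], []
    for message in messages:
        try:
            results.append(translate(message, **kwargs))
        except TranslationError as e:
            failures.append((message, str(e)))
    return results, failures


results, failures = translate_all(fake_translate, ["hello", "", "world"], lang_to="en")
```

With the real library, the idea would be to pass "Translator().translate" in place of "fake_translate" and import the real exception class.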
The "LanguageDetector" class can be used to detect the language of the text. It contains a function called "detect".
The "detect" function takes only one argument - message - the piece of text whose language you want to detect.
It returns a "Language" object that has four properties:
- lang_code - the two-letter language code for the detected language. For example, "ru" for Russian.
- lang - the name of the language. For example, "Russian".
- confidence - a value from 0.0 to 1.0 that describes how confident the detector was about the language of the given text.
- is_reliable - whether the detection was reliable.
Here is an example of "LanguageDetector":
>>> from xgoogle.translate import LanguageDetector, DetectionError
>>>
>>> detect = LanguageDetector().detect
>>> english = detect("This is a wonderful library.")
>>> english.lang_code
'en'
>>> english.lang
'English'
>>> english.confidence
0.28078437000000001
>>> english.is_reliable
True
In case of a failure, "detect" raises a "DetectionError" exception.
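One way to use "confidence" and "is_reliable" together is to trust the detected code only when both look good. The sketch below assumes a stand-in "Language" type carrying the four properties listed above. It is not xgoogle's own class, and both the "trusted_lang_code" helper and its 0.2 threshold are arbitrary choices for illustration.

```python
from collections import namedtuple

# Stand-in with the four documented properties; not the library's class.
Language = namedtuple("Language", "lang_code lang confidence is_reliable")


def trusted_lang_code(detected, min_confidence=0.2):
    """Return the detected code if it looks trustworthy, else None."""
    if detected.is_reliable and detected.confidence >= min_confidence:
        return detected.lang_code
    return None


english = Language("en", "English", 0.28078437, True)
shaky = Language("pt", "Portuguese", 0.05, False)

english_code = trusted_lang_code(english)
shaky_code = trusted_lang_code(shaky)
```

A caller could treat None as "leave lang_from unset and let the translation service auto-detect".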
These two classes interact with the Google Ajax Language API to do their job. Since this Ajax service returns a JSON string, you'll need to install the simplejson Python module. It should be as easy as typing "easy_install simplejson".
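To give a feel for the JSON handling, here is a rough sketch of parsing a detection response. The two sample payloads are made up. Their keys ("responseStatus", "responseDetails", "responseData", "language", "confidence", "isReliable") mirror the ones read by the "detect" code quoted in the comments. The standard "json" module is used here; "simplejson.loads" accepts the same input. "parse_detection" is a hypothetical helper, not the library's function.

```python
import json

# Made-up payloads for illustration; not captured from the real service.
SAMPLE_OK = (
    '{"responseStatus": 200, "responseDetails": null, '
    '"responseData": {"language": "en", "confidence": 0.28, "isReliable": true}}'
)
SAMPLE_FAIL = (
    '{"responseStatus": 400, "responseDetails": "invalid text", '
    '"responseData": null}'
)


def parse_detection(raw):
    """Return (language, confidence, is_reliable) or raise ValueError."""
    data = json.loads(raw)
    if data["responseStatus"] != 200:
        raise ValueError("detection failed: %s" % data["responseDetails"])
    rd = data["responseData"]
    return rd["language"], rd["confidence"], rd["isReliable"]


parsed = parse_detection(SAMPLE_OK)
```

Checking the status field before touching "responseData" matters because a failed response carries no data to read.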
Download "xgoogle" library:
Download: xgoogle library (.zip)
Downloaded: 14286 times.
Download url: http://www.catonmat.net/download/xgoogle.zip
I haven't posted this library to PyPI yet, but I will do so soon.


Comments
paw paw paw
Any plans on putting this up someplace (like GitHub or BitBucket) so we can fork off it and provide improvements?
Yuvi, yes, I have plans but do not know when I will do that. I'll let you know personally when I put it out.
Thanks :)
outstanding.
Cool. Your module is more Pythonic than mine. I'll throw mine away and use yours.
Surely this module will make our code here at work more pythonic, we were pulling directly from google and parsing raw XML. I second the proposal from Yuvi to publish the code in bitbucket (mercurial please!)
"Don’t forget to encode it to the right encoding before outputting, otherwise you’ll get errors such as “UnicodeEncodeError: ‘latin-1′ codec can’t encode characters in position 0-3: ordinal not in range(256)”"
That's rather weird. Your terminal is (apparently) configured to accept UTF-8, why does Python attempt to convert the string to Latin-1?
Roman, I haven't figured that out yet. I looked at this:
Seems like the default encoding is 'ascii', but when I do:
It says it tried to encode it to latin-1 but encountered a char that could not be represented with this encoding.
But this works:
>>> print u.encode('utf-8')
啔
Update:
ISO-8859-1 is Latin-1.
Another update:
That explains it.
nabucosound, okay! Will release it to several code repositories at once. :)
Great work, thanks!
Great work. We've already created unreleased patches to Virtaal, our desktop translation tool.
One issue is that the list of languages is hard coded. I realise that's because Google doesn't supply them. But it would be nice to just query Google and get that list at startup.
I'd love a way to query the translate.py classes to find out if a source or target language is supported. Currently I just get a TranslationError. It would be nicer if it raised a more fine-grained exception so that our code can then decide not to support that target language.
nice work! is there any way to use libs from google lib series in google app engine?
bug:
class DetectionError(Exception): pass

NameError: global name 'DetectError' is not defined

def detect(self, message):
    """ Given a 'message' detects its language. Returns Language object. """
    message = quote_plus(message)
    real_url = LanguageDetector.detect_url % { 'message': message }
    try:
        detection = self.browser.get_page(real_url)
        data = json.loads(detection)
        if data['responseStatus'] != 200:
            raise DetectError, "Failed detecting language: %s" % data['responseDetails']
        rd = data['responseData']
        return Language(rd['language'], rd['confidence'], rd['isReliable'])
    except BrowserError, e:
        raise DetectError, "Failed detecting language (getting %s failed): %s" % (e.url, e.error)
    except ValueError, e:
        raise DetectErrro, "Failed detecting language (json failed): %s" % e.message
    except KeyError, e:
        raise DetectError, "Failed detecting language, response didn't contain the necessary data"

yttrium, thanks! Just fixed this bug and pushed to GitHub: www.github.com/pkrumins/xgoogle.
This is exactly what I was looking for.
I have utilized some of this code in my project, PyTranslateList, available at https://sourceforge.net/projects/pytranslatelist/
Just thought you might be interested.
Thanks, Peteris Krumins, for sharing. These are very good scripts. Could you rewrite the Google translation function? The Google translation web page has changed, and there are now a lot of exceptions.
Thanks for the great tool.
Is there a way to get an ascii phonetic translation back instead of the native characters?
SO??? Is there a way to get the phonetic translation from google translate???
Sounds great. So... going on PyPI soon, like you said?
Has anyone used this to translate a whole website into another language? Is Google Translate reliable enough?
Hi,
Can you advise how to use this and get around a proxy - I figure urllib2's ProxyHandler is the answer but can't figure out how to implement it.
Very good tool, how can we get the phonetic translation from google translate? This solution would be very useful for my work.
This seems to have a hard time detecting Portuguese. Am I experiencing this problem all by myself? Also, awesome work here!
Well it wasn't parsing correctly with utf-8 encoding. However now I get a problem with urllib.py when I use detect() on unicode text with "o gato em suas calças com queijo"
"UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal"
I will mess around with it some more.
Any chance of this translation process being available in VBS ?
would be way cool !.
when did you plan to distribute it to pypi ???
Hey, I'm doing an assignment using Python on IVLE. I tried importing google.translate but it doesn't like it. Any way I can use this?
Z
From Google Code:
"Google Translate API v1 was officially deprecated on May 26, 2011; it will be shut off completely on December 1, 2011. For text translations, you can use the Google Translate API v2, which is now available as a paid service."
:(