I have extended my xgoogle library (Google Python Search Library) with a Python module for Google Translate.

The new module is called "xgoogle.translate" and it implements two classes - "Translator" and "LanguageDetector".

The "Translator" class can be used to translate text. It provides a function called "translate" that takes three arguments: "message", "lang_from" and "lang_to". It returns the translated text as a Unicode string. Don't forget to encode it to the right encoding before outputting, otherwise you'll get errors such as "UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)".
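The examples in this post are Python 2, where "translate" returns a `unicode` object. As a minimal Python 3 sketch of the same pitfall (assuming the translation arrives as a text string), this snippet tries to encode the Russian translation used in the example that follows. `can_encode` is a hypothetical helper, not part of xgoogle.

```python
# Python 3 sketch: a translation result is text (str); it has to be encoded
# into bytes the terminal or file understands before it is written out.
def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


russian = "Меня зовут Петр"  # the Russian translation from the example

latin1_error = None
try:
    russian.encode("latin-1")  # Latin-1 has no Cyrillic letters
except UnicodeEncodeError as err:
    latin1_error = str(err)

utf8_bytes = russian.encode("utf-8")  # UTF-8 can represent every Unicode character

print(latin1_error)
# 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
```

The first four characters ("Меня") are the ones Latin-1 cannot represent, which is where the "position 0-3" in the error comes from.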

Here is an example usage of the "Translator" class:

>>> from xgoogle.translate import Translator
>>>
>>> translate = Translator().translate
>>>
>>> print translate("Mani sauc Pēteris", lang_to="en")
My name is Peter
>>>
>>> print translate("Mani sauc Pēteris", lang_to="ru").encode('utf-8')
Меня зовут Петр
>>>
>>> print translate("Меня зовут Петр")
My name is Peter

If "lang_from" is not given, Google's translation service auto-detects it. If "lang_to" is not given, it defaults to "en" (English).
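To make these defaults concrete, here is a minimal sketch of how such a request URL could be assembled. It assumes the endpoint of the retired Google AJAX Language API v1, its `v`, `q` and `langpair` parameters, and that leaving the source side of `langpair` blank asks the service to auto-detect it; none of this is taken from the xgoogle source, the endpoint no longer works, and `build_translate_url` is a hypothetical name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# ASSUMPTION: endpoint and parameter names of the retired Google AJAX Language API v1.
TRANSLATE_URL = "http://ajax.googleapis.com/ajax/services/language/translate"


def build_translate_url(message, lang_from="", lang_to="en"):
    """Build a request URL; an empty lang_from leaves the source blank for auto-detection."""
    query = urlencode({
        "v": "1.0",
        "q": message,                                 # urlencode quotes it as UTF-8
        "langpair": "%s|%s" % (lang_from, lang_to),  # e.g. "|en" or "lv|ru"
    })
    return TRANSLATE_URL + "?" + query


url = build_translate_url("Mani sauc Pēteris", lang_to="ru")
params = parse_qs(urlsplit(url).query)
print(params["langpair"])  # ['|ru']
```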

If an error occurs, the "translate" function raises a "TranslationError" exception with a message explaining why the translation failed. It's best to wrap calls to "translate" in a try/except block:


Program:

>>> from xgoogle.translate import Translator, TranslationError
>>>
>>> try:
...     translate = Translator().translate
...     print translate("")
... except TranslationError, e:
...     print e
...

Output:

Failed translating: invalid text 
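One plausible place such an error originates is the check of the service's JSON reply. The sketch below is written in Python 3 (`except TranslationError as e` instead of `except TranslationError, e`). Its `responseStatus`/`responseDetails`/`responseData` field names follow the `detect` code quoted in the comments further down; the `translatedText` field and both sample replies are illustrative assumptions, and `parse_translation` is a hypothetical helper, not xgoogle's API.

```python
import json


class TranslationError(Exception):
    """Raised when a translation fails (same name as the exception in xgoogle.translate)."""


def parse_translation(json_text):
    """Return the translated text from a v1-style JSON reply or raise TranslationError."""
    data = json.loads(json_text)
    if data.get("responseStatus") != 200:
        raise TranslationError("Failed translating: %s" % data.get("responseDetails"))
    # ASSUMPTION: the translation lives under responseData.translatedText
    return data["responseData"]["translatedText"]


# Illustrative replies, not captured from the real service.
ok = ('{"responseData": {"translatedText": "My name is Peter"},'
      ' "responseDetails": null, "responseStatus": 200}')
bad = '{"responseData": null, "responseDetails": "invalid text", "responseStatus": 400}'

print(parse_translation(ok))  # My name is Peter

error_message = None
try:
    parse_translation(bad)
except TranslationError as e:  # Python 3 spelling of "except TranslationError, e"
    error_message = str(e)
print(error_message)  # Failed translating: invalid text
```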

The "LanguageDetector" class can be used to detect the language of the text. It contains a function called "detect".

The "detect" function takes only one argument, "message": the piece of text whose language you want to detect.

It returns a "Language" object that has four properties:

  • lang_code - the two-letter language code, for example "ru" for Russian.
  • lang - the name of the language, for example "Russian".
  • confidence - a value from 0.0 to 1.0 describing how confident the detector is about the language of the given text.
  • is_reliable - whether the detection is considered reliable.

Here is an example of "LanguageDetector":

>>> from xgoogle.translate import LanguageDetector, DetectionError
>>>
>>> detect = LanguageDetector().detect
>>> english = detect("This is a wonderful library.")
>>> english.lang_code
'en'
>>> english.lang
'English'
>>> english.confidence
0.28078437000000001
>>> english.is_reliable
True

If detection fails, "detect" raises a "DetectionError" exception.
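As a rough Python 3 sketch of where the "Language" object and "DetectionError" could come from, the function below mirrors the error handling of the `detect` code quoted in the comments further down (`responseStatus`, `responseData`, `language`, `confidence`, `isReliable`). The `Language` namedtuple is a stand-in with the same four attribute names, not xgoogle's class. The code-to-name table is a hypothetical subset, and `parse_detection` is a hypothetical helper.

```python
import json
from collections import namedtuple


class DetectionError(Exception):
    """Raised when language detection fails (same name as in xgoogle.translate)."""


# Stand-in for xgoogle's Language object: same four attribute names, different class.
Language = namedtuple("Language", "lang_code lang confidence is_reliable")

# Hypothetical subset of a code -> name table; the real module keeps its own list.
LANGUAGE_NAMES = {"en": "English", "ru": "Russian", "lv": "Latvian", "pt": "Portuguese"}


def parse_detection(json_text):
    """Turn a detect-style JSON reply into a Language, or raise DetectionError."""
    try:
        data = json.loads(json_text)
        if data["responseStatus"] != 200:
            raise DetectionError("Failed detecting language: %s" % data["responseDetails"])
        rd = data["responseData"]
        code = rd["language"]
        return Language(code, LANGUAGE_NAMES.get(code, "Unknown"),
                        rd["confidence"], rd["isReliable"])
    except ValueError as e:  # malformed JSON (json.JSONDecodeError is a ValueError)
        raise DetectionError("Failed detecting language (json failed): %s" % e)
    except (KeyError, TypeError):  # JSON without the expected fields
        raise DetectionError("Failed detecting language, response didn't contain the necessary data")


reply = ('{"responseData": {"language": "en", "isReliable": true, "confidence": 0.28},'
         ' "responseDetails": null, "responseStatus": 200}')
english = parse_detection(reply)
print(english.lang, english.is_reliable)  # English True

failure = None
try:
    parse_detection("not json")
except DetectionError as e:  # Python 3 spelling of "except DetectionError, e"
    failure = str(e)
print(failure)
```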

These two classes use the Google AJAX Language API to do their job. Since this AJAX service returns JSON strings, you'll need the simplejson Python module. Installing it should be as easy as typing "easy_install simplejson".
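Since Python 2.6 the standard library ships a compatible `json` module, so simplejson is only strictly required on older interpreters. A common import idiom for supporting both is shown below; whether xgoogle uses exactly this is an assumption (the `detect` code quoted in the comments just calls `json.loads`).

```python
# Prefer the standard-library json module (Python 2.6+), fall back to simplejson.
try:
    import json
except ImportError:  # only on old Pythons that lack the json module
    import simplejson as json

reply = json.loads('{"responseStatus": 200, "responseDetails": null}')
print(reply["responseStatus"], reply["responseDetails"])  # 200 None
```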

Download "xgoogle" library:

Download: xgoogle library (.zip)
Downloaded: 21961 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

I haven't posted this library to PyPI yet, but I will do so soon.

Comments

f00li5h
September 28, 2009, 12:34

paw paw paw

September 28, 2009, 17:18

Any plans on putting this up someplace (like GitHub or BitBucket) so we can fork off it and provide improvements?

September 28, 2009, 18:19

Yuvi, yes, I have plans but do not know when I will do that. I'll let you know personally when I put it out.

September 28, 2009, 18:59

Thanks :)

September 28, 2009, 21:49

outstanding.

September 29, 2009, 07:26

Cool. Your module is more pythonic than mine. I'll throw mine away and use yours.

September 29, 2009, 08:46

Surely this module will make our code here at work more pythonic; we were pulling directly from Google and parsing raw XML. I second Yuvi's proposal to publish the code on Bitbucket (Mercurial, please!)

Roman
September 29, 2009, 09:16

"Don’t forget to encode it to the right encoding before outputting, otherwise you’ll get errors such as “UnicodeEncodeError: ‘latin-1’ codec can’t encode characters in position 0-3: ordinal not in range(256)”"

That's rather weird. Your terminal is (apparently) configured to accept UTF-8, why does Python attempt to convert the string to Latin-1?

September 29, 2009, 10:42

Roman, I haven't figured that out yet. I looked at this:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

Seems like the default encoding is 'ascii', but when I do:

>>> u = u'\u5554'
>>> print u
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u5554' in position 0: ordinal not in range(256)

It says it tried to encode it to latin-1 but encountered a char that could not be represented with this encoding.

But this works:

>>> print u.encode('utf-8')
啔

Update:

>>> import locale
>>> locale.getpreferredencoding()
'ISO-8859-1'

ISO-8859-1 is Latin-1.

Another update:

$ echo $LANG
en_US
$ LANG=en_US.UTF-8
$ python
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> u = u'\u5555'
>>> print u
啕

That explains it.
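Python 3 behaves the same way: `print` encodes text with `sys.stdout.encoding`, which normally comes from the locale (`LANG`/`LC_ALL`) unless the `PYTHONIOENCODING` environment variable overrides it. Here is a small sketch for checking that encoding and degrading gracefully instead of crashing; `output_encoding` and `printable` are hypothetical helper names.

```python
import locale
import sys


def output_encoding():
    """Best guess at the encoding used for printing: stdout's, else the locale's."""
    return getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)


def printable(text, encoding=None):
    """Round-trip text through an encoding, replacing unrepresentable characters with '?'."""
    enc = encoding or output_encoding()
    return text.encode(enc, errors="replace").decode(enc)


print(output_encoding())                # e.g. UTF-8 under LANG=en_US.UTF-8
print(printable("\u5555", "latin-1"))  # ?
```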

September 29, 2009, 10:45

nabucosound, okay! Will release it to several code repositories at once. :)

Vinícius
September 30, 2009, 13:08

Great work, thanks!

October 12, 2009, 14:37

Great work. We've already created unreleased patches for Virtaal, our desktop translation tool.

One issue is that the list of languages is hard-coded. I realise that's because Google doesn't supply them, but it would be nice to just query Google and get that list at startup.

I'd love a way to query the translate.py classes to find out whether a source or target language is supported. Currently I just get a TranslationError; it would be nicer if it raised a more fine-grained exception so that our code could then decide not to support that target language.

October 19, 2009, 14:04

Nice work! Is there any way to use the libs from the Google lib series in Google App Engine?

yttrium
January 08, 2010, 14:54

bug:

class DetectionError(Exception):
    pass

NameError: global name 'DetectError' is not defined

    def detect(self, message):
        """
        Given a 'message' detects its language.
        Returns Language object.
        """

        message = quote_plus(message)
        real_url = LanguageDetector.detect_url % { 'message': message }

        try:
            detection = self.browser.get_page(real_url)
            data = json.loads(detection)

            if data['responseStatus'] != 200:
                raise DetectError, "Failed detecting language: %s" % data['responseDetails']

            rd = data['responseData']
            return Language(rd['language'], rd['confidence'], rd['isReliable'])

        except BrowserError, e:
            raise DetectError, "Failed detecting language (getting %s failed): %s" % (e.url, e.error)
        except ValueError, e:
            raise DetectErrro, "Failed detecting language (json failed): %s" % e.message
        except KeyError, e:
            raise DetectError, "Failed detecting language, response didn't contain the necessary data"

January 08, 2010, 16:03

yttrium, thanks! I just fixed this bug and pushed it to GitHub: www.github.com/pkrumins/xgoogle.

Erik
March 15, 2010, 17:56

This is exactly what I was looking for.
I have utilized some of this code in my project, PyTranslateList, available at https://sourceforge.net/projects/pytranslatelist/
Just thought you might be interested.

April 15, 2010, 23:25

Thanks, Peteris Krumins, for sharing; these are very good scripts. Could you rewrite the Google translation function? Since the Google Translate webpage has changed, there are lots of exceptions.

Pete
May 17, 2010, 15:26

Thanks for the great tool.

Is there a way to get an ascii phonetic translation back instead of the native characters?

Kris
June 09, 2010, 12:19

SO??? Is there a way to get the phonetic translation from google translate???

July 13, 2010, 05:57

Sounds great. So... going on PyPI soon, like you said?

September 05, 2010, 10:24

Has anyone used this to translate a whole website into another language? Is Google Translate reliable enough?

Andrew
September 15, 2010, 10:41

Hi,

Can you advise how to use this and get around a proxy - I figure urllib2's ProxyHandler is the answer but can't figure out how to implement it.
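One possible approach is sketched below in Python 3; in Python 2 the same pieces live in `urllib2` (`urllib2.ProxyHandler`, `urllib2.build_opener`, `urllib2.install_opener`). It assumes xgoogle's browser goes through the globally installed opener; if it builds its own opener, the `ProxyHandler` would have to be added there instead. Because urllib's default proxy handling reads the `http_proxy`/`https_proxy` environment variables, setting those may be enough on its own. `install_proxy` and the proxy address are hypothetical.

```python
import urllib.request


def install_proxy(proxy_url):
    """Route later urllib.request.urlopen() calls through proxy_url (hypothetical helper)."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    urllib.request.install_opener(opener)  # process-wide: affects every later urlopen()
    return opener


opener = install_proxy("http://proxy.example.com:3128")  # hypothetical proxy address
```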

September 27, 2010, 11:25

Very good tool, how can we get the phonetic translation from google translate? This solution would be very useful for my work.

Josh
February 18, 2011, 09:30

This seems to have a hard time detecting Portuguese. Am I experiencing this problem all by myself? Also, awesome work here!

Josh again
February 19, 2011, 07:57

Well, it wasn't parsing correctly with UTF-8 encoding. However, now I get a problem with urllib.py when I use detect() on the unicode text "o gato em suas calças com queijo":

"UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal"

I will mess around with it some more.

John
July 13, 2011, 20:04

Any chance of this translation process being available in VBS?

Would be way cool!

George
September 08, 2011, 18:01

When do you plan to distribute it on PyPI?

October 08, 2011, 00:20

Hey, I'm doing an assignment using Python on IVLE. I tried importing google.translate but it doesn't like it. Is there any way I can use this?
Z

Benoit
November 28, 2011, 17:48

From Google Code:
"Google Translate API v1 was officially deprecated on May 26, 2011; it will be shut off completely on December 1, 2011. For text translations, you can use the Google Translate API v2, which is now available as a paid service."
:(

Josep Valls
June 19, 2012, 13:30

Does anyone know of any alternatives?

Jude
August 28, 2012, 20:09

I want to develop an application that can translate text from one language to another using Python. I would be glad if you could help me.

Sara
April 21, 2013, 10:02

I used the translate code to convert a word from Arabic to English, but it gave me this error:

Traceback (most recent call last):
  File "C:\Python27\TR_TR.py", line 9, in <module>
    print translate(u"كتب", lang_to="en")
  File "C:\Python27\xgoogle\translate.py", line 39, in translate
    message = quote_plus(message)
  File "C:\Python27\lib\urllib.py", line 1275, in quote_plus
    return quote(s, safe)
  File "C:\Python27\lib\urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u0643'
------------------------------------------------------------
And this is my code:
# -*- coding: cp1256 -*-
import sys
import HTMLParser
from xgoogle.translate import Translator

translate = Translator().translate

print translate(u"كتب", lang_to="en", lang_from='ar').encode('utf-8')

---------------------------------------------

Does the code not support the Arabic language?

Thanks.
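The `KeyError: u'\u0643'` comes from Python 2's `urllib.quote`, which maps each character through a lookup table keyed by byte values and has no entry for non-ASCII `unicode` characters. The usual fix is to encode the text to UTF-8 bytes before quoting, i.e. `quote_plus(message.encode('utf-8'))`; whether the current xgoogle code does this is not verified here. In Python 3, `quote_plus` encodes `str` as UTF-8 by itself, as this sketch shows (the same applies to the Portuguese text in Josh's comment):

```python
from urllib.parse import quote_plus

arabic = "\u0643\u062a\u0628"  # "كتب", the word from the traceback above

quoted = quote_plus(arabic)  # Python 3: str is encoded as UTF-8 before quoting
print(quoted)  # %D9%83%D8%AA%D8%A8

# Quoting the UTF-8 bytes explicitly (the Python 2 style fix) gives the same result.
same = quote_plus(arabic.encode("utf-8"))

portuguese = quote_plus("o gato em suas cal\u00e7as")  # spaces become '+'
```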

January 14, 2014, 02:03

I'll put it on my bitbucket (http://bitbucket.org/hd1/xgoogle) shortly.
