You're viewing a comment by Joel Shapiro and its responses.

December 01, 2009, 22:57

Marhaban (a more informal "hi" or "greetings" in Arabic), Peter.

Firstly, excellent work!

I do very specialized transcription work for English-to-Arabic names and terms, verifying their integrity and veracity.

"Transcriptions" is the formal linguistic terminology for "spellings".

Rather than explain further, I'll point you to my unique, very easy-to-use web page:

http://enartrans.com/transcription

Your xgoogle Google parser is what I've had in mind for a long time. Once I (the user) verify which Arabic transcription variation(s) to use as search terms, your Google parser is a powerful adjunct.

Below is my slight variation on your original code, using a native Arabic search term: the Arabic for "Philadelphia".

For those of you versed in Arabic or other Semitic languages such as Hebrew: the Arabic for "Philadelphia" in the code incorrectly reads left to right, where it should read right to left with contiguous characters.
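For anyone curious why the direction is only a display issue, here is a minimal sketch, assuming Python 3 and the standard `unicodedata` module (the variable names are just mine for illustration). The string is stored in logical order, and each character carries a bidirectional class that tells the renderer it belongs to right-to-left Arabic text:

```python
# -*- coding: utf-8 -*-
import unicodedata

term = "فيلادلفيا"  # Arabic for "Philadelphia", stored in logical (typing) order

# Look up the Unicode bidirectional class of every character.
# Arabic letters are expected to report "AL" (right-to-left Arabic letter).
bidi_classes = [unicodedata.bidirectional(ch) for ch in term]

for ch, cls in zip(term, bidi_classes):
    print(unicodedata.name(ch), cls)
```

How the characters are drawn on screen is up to the editor or browser; the underlying sequence of characters sent to the search is the same either way.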

Hopefully, on submit, the Arabic characters will at least retain their human-readable form, even though they're in the wrong direction, and not revert to some %hex encoding.
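By "%hex encoding" I mean URL percent-encoding of the UTF-8 bytes. A minimal sketch of what that looks like, assuming Python 3 and its standard `urllib.parse` module (my own script runs on Python 2.5, so treat this as an illustration only):

```python
# -*- coding: utf-8 -*-
from urllib.parse import quote, unquote

term = "فيلادلفيا"  # Arabic for "Philadelphia"

# quote() encodes the string as UTF-8 and writes each byte as %XX.
encoded = quote(term)

# unquote() reverses the process and recovers the readable Arabic text.
decoded = unquote(encoded)

print(encoded)
print(decoded == term)
```

Each Arabic letter takes two bytes in UTF-8, so every letter becomes two `%XX` groups in the encoded form.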

But no worries! Functionally, it all correctly "comes out in the wash" (i.e., when you run the script).

My question: on the whole your script seems to work fine, but when I compare it against the corresponding "native" Google search in Firefox, I seem to be missing some URLs per page.
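One way to pin down exactly which URLs go missing is to compare the two result lists. A minimal sketch, assuming Python 3; the helper name `missing_urls` and the example.com addresses are hypothetical placeholders, not real search results:

```python
def missing_urls(browser_urls, script_urls):
    """Return URLs seen in the browser but not returned by the script,
    keeping the browser's original order."""
    seen = set(script_urls)
    return [url for url in browser_urls if url not in seen]


# Hypothetical placeholder data standing in for one page of results.
browser = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
]
script = [
    "http://example.com/a",
    "http://example.com/c",
]

print(missing_urls(browser, script))
```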

I was wondering whether you have any plans to add support for different languages? Maybe some encodings are not being recognized by your code ... or perhaps there is some setting or designation I can change on my side?

That said, the number of URLs I seem to miss comes nowhere near invalidating the use of your work as a wonderful adjunct to mine.

Job well done!

Regards,

Joel S.

# -*- coding: utf-8 -*-

import sys
sys.path.append(r'E:\MovableIdle-Python-2.5\xgoogle')

# http://www.catonmat.net/blog/python-library-for-google-search/

from xgoogle.search import GoogleSearch, SearchError
try:
    gs = GoogleSearch("فيلادلفيا")
    gs.results_per_page = 100
    gs.page = 2
    results = gs.get_results()

    counter = 0

    for res in results:
        print res.title.encode('utf8')
        print res.desc.encode('utf8')
        print res.url.encode('utf8')
        counter = counter + 1
        print counter
        print

except SearchError, e:
    print "Search failed: %s" % e
