
Here is a quick hack that I wrote. It’s a Python library to search Google without using their API. It’s quick and dirty, just the way I love it.
Why didn’t I use Google’s provided REST API? Because it says “you can only get up to 8 results in a single call and you can’t go beyond the first 32 results”. Seriously, what am I gonna do with just 32 results?
I wrote it because I want to do various Google hacks automatically, monitor the popularity of some keywords and sites, and use it for various other purposes.
One of my next posts is going to extend this library and build a tool that perfects your English. I have been using Google for a while to find the correct use of various English idioms, phrases, and grammar. For example, “i am programmer” vs. “i am a programmer”. The first one is missing the indefinite article “a”, while the second is correct. Googling for these phrases reveals that the first has 6,230 results and the second has 136,000, so I pretty much trust that the second is more correct than the first.
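The comparison above boils down to picking the phrase with the larger result count. Here is a minimal sketch in Python 3 (the code in this post is Python 2) that uses the two counts quoted above as hard-coded data. `more_common_phrase` is a hypothetical helper, not part of xgoogle; in a real tool the counts would presumably come from the num_results property described below.

```python
def more_common_phrase(counts):
    """Return the phrase with the highest result count.

    `counts` maps each candidate phrase to the number of search results
    reported for it. Illustrative helper only, not part of xgoogle.
    """
    if not counts:
        raise ValueError("need at least one phrase")
    return max(counts, key=counts.get)


# Result counts quoted in the post for the two candidate phrases.
counts = {
    "i am programmer": 6230,
    "i am a programmer": 136000,
}
best = more_common_phrase(counts)
print(best)
```

A raw count is only a rough signal, so treat the winner as a hint rather than a verdict.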
How to use the library?
First download the xgoogle library, and extract it somewhere.
Download: xgoogle library (.zip)
Downloaded: 3851 times.
Download url: http://www.catonmat.net/download/xgoogle.zip
At the moment it contains just the code for Google search, but in the future I will add other searches (google sets, google suggest, etc).
To use the search, import “GoogleSearch” and, optionally, “SearchError” from “xgoogle.search”.
GoogleSearch is the class you will use to do Google searches. SearchError is an exception class that GoogleSearch throws in case of various errors.
Pass the keyword you want to search as the first parameter to GoogleSearch’s constructor. The constructed object has several public methods and properties:
- method get_results() - gets a page of results, returning a list of SearchResult objects. It returns an empty list if there are no more results.
- property num_results - returns number of search results found.
- property results_per_page - sets/gets the number of results to get per page. Possible values are 10, 25, 50, 100.
- property page - sets/gets the search page.
As I said, the get_results() method returns a list of SearchResult objects. Each SearchResult has three attributes: “title”, “desc”, and “url”. They are Unicode strings, so encode them properly before outputting them.
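To make the encoding step concrete, here is a small Python 3 sketch that turns the three attributes into UTF-8 bytes. `FakeResult` is a stand-in I made up so the snippet runs without xgoogle or network access; it only mimics the three attributes described above.

```python
from collections import namedtuple

# Hypothetical stand-in for a SearchResult: same three attribute names.
FakeResult = namedtuple("FakeResult", ["title", "desc", "url"])


def encode_result(res, encoding="utf8"):
    """Encode title, desc and url to bytes, one field per line."""
    fields = (res.title, res.desc, res.url)
    return b"\n".join(field.encode(encoding) for field in fields)


res = FakeResult(u"Caf\u00e9 menu", u"A short description", u"http://example.com/")
encoded = encode_result(res)
print(encoded)
```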
Here is a screenshot that illustrates the “title”, “desc”, and “url” attributes:

Google search result for “catonmat”.
Here is an example program that does a Google search. It searches for “quick and dirty” and prints the results:
from xgoogle.search import GoogleSearch, SearchError

try:
    gs = GoogleSearch("quick and dirty")
    gs.results_per_page = 50
    results = gs.get_results()
    for res in results:
        print res.title.encode('utf8')
        print res.desc.encode('utf8')
        print res.url.encode('utf8')
        print
except SearchError, e:
    print "Search failed: %s" % e
This code fragment sets up a search for “quick and dirty” and specifies that a result page should have 50 results. Then it calls get_results() to get a page of results. Finally it prints the title, description and url of each search result.
Here is the output from running this program:
Quick-and-dirty - Wikipedia, the free encyclopedia
Quick-and-dirty is a term used in reference to anything that is an easy way to implement a kludge. Its usage is popular among programmers, ...
http://en.wikipedia.org/wiki/Quick-and-dirty

Grammar Girl's Quick and Dirty Tips for Better Writing - Wikipedia ...
"Grammar Girl's Quick and Dirty Tips for Better Writing" is an educational podcast that was launched in July 2006 and the title of a print book that was ...Writing - 39k -
http://en.wikipedia.org/wiki/Grammar_Girl%27s_Quick_and_Dirty_Tips_for_Better_Writing

Quick & Dirty Tips :: Grammar Girl
Quick & Dirty Tips(tm) and related trademarks appearing on this website are the property of Mignon Fogarty, Inc. and Holtzbrinck Publishers Holdings, LLC. ...
http://grammar.quickanddirtytips.com/

[...]

Compare these results to the output above.
You could also specify which search page to start from. For example, the following code gets 25 results per page and starts the search at the 2nd page.
gs = GoogleSearch("quick and dirty")
gs.results_per_page = 25
gs.page = 2
results = gs.get_results()
You can also quickly write a scraper to get all the results for a given search term:
from xgoogle.search import GoogleSearch, SearchError

try:
    gs = GoogleSearch("quantum mechanics")
    gs.results_per_page = 100
    results = []
    while True:
        tmp = gs.get_results()
        if not tmp: # no more results were found
            break
        results.extend(tmp)
    # ... do something with all the results ...
except SearchError, e:
    print "Search failed: %s" % e
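The same paging loop can be wrapped in a generator, so callers iterate over results without thinking about pages. This is a Python 3 sketch, not something xgoogle ships. `iter_results` and `FakeSearch` are hypothetical names, and the only assumption about the search object is the behaviour described above: get_results() returns a list and returns an empty list when there are no more results. The optional delay is there because scraping too fast can get requests refused.

```python
import time


def iter_results(search, delay=0.0, sleep=time.sleep):
    """Yield results one by one, fetching pages until one comes back empty.

    `search` is any object with a get_results() method that returns a list
    of results and an empty list once the results are exhausted.
    """
    first_page = True
    while True:
        if not first_page and delay > 0:
            sleep(delay)  # pause between page fetches
        first_page = False
        page = search.get_results()
        if not page:
            return
        for result in page:
            yield result


class FakeSearch(object):
    """Stand-in for GoogleSearch that serves canned pages (no network)."""

    def __init__(self, pages):
        self._pages = list(pages)

    def get_results(self):
        return self._pages.pop(0) if self._pages else []


all_results = list(iter_results(FakeSearch([["a", "b"], ["c"]])))
print(all_results)
```

With the real library the call would presumably look like `for res in iter_results(gs, delay=5): ...`, with the same SearchError handling around it as in the example above.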
You can use this library to constantly monitor how your website is ranking for a given search term. Suppose your website has a domain “catonmat.net” and the search term you want to find your position for is “python videos”.
Here is the code that outputs your ranking. It looks through the first 100 results; if you need more, put a loop there.
import re
from urlparse import urlparse
from xgoogle.search import GoogleSearch, SearchError

target_domain = "catonmat.net"
target_keyword = "python videos"

def mk_nice_domain(domain):
    """
    convert domain into a nicer one (eg. www3.google.com into google.com)
    """
    domain = re.sub(r"^www(\d+)?\.", "", domain)
    # add more here
    return domain

gs = GoogleSearch(target_keyword)
gs.results_per_page = 100
results = gs.get_results()
for idx, res in enumerate(results):
    parsed = urlparse(res.url)
    domain = mk_nice_domain(parsed.netloc)
    if domain == target_domain:
        print "Ranking position %d for keyword '%s' on domain %s" % (idx+1, target_keyword, target_domain)
Output of this program:
Ranking position 6 for keyword python videos on domain catonmat.net
Ranking position 7 for keyword python videos on domain catonmat.net
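The ranking logic itself does not need a live search, so it can be pulled into a function and tried on a plain list of URLs. Below is a Python 3 sketch of that idea (note that `urlparse` lives in `urllib.parse` there). `find_positions` is a hypothetical helper and the URLs are made up; they stand in for the res.url values of one results page.

```python
import re
from urllib.parse import urlparse


def mk_nice_domain(domain):
    """Strip a leading www or wwwN prefix, e.g. www3.google.com -> google.com."""
    return re.sub(r"^www(\d+)?\.", "", domain)


def find_positions(urls, target_domain):
    """Return the 1-based positions of URLs hosted on target_domain."""
    positions = []
    for idx, url in enumerate(urls, start=1):
        domain = mk_nice_domain(urlparse(url).netloc)
        if domain == target_domain:
            positions.append(idx)
    return positions


# Made-up URLs standing in for the URLs of one page of search results.
urls = [
    "http://www.example.org/videos",
    "http://www.catonmat.net/a",
    "http://catonmat.net/b",
    "http://www2.example.com/",
]
positions = find_positions(urls, "catonmat.net")
print(positions)
```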
Here is a much more wicked example. It uses the GeoIP Python module to find 10 websites for the keyword “wicked code” that are physically hosted in California or New York in the USA. Make sure you download the GeoLiteCity database from “http://www.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz” and extract it to “/usr/local/geo_ip”.
import GeoIP
from urlparse import urlparse
from xgoogle.search import GoogleSearch, SearchError

class Geo(object):
    GEO_PATH = "/usr/local/geo_ip/GeoLiteCity.dat"

    def __init__(self):
        self.geo = GeoIP.open(Geo.GEO_PATH, GeoIP.GEOIP_STANDARD)

    def detect_by_host(self, host):
        try:
            gir = self.geo.record_by_name(host)
            return {'country': gir['country_code'].lower(),
                    'region': gir['region'].lower()}
        except Exception, e:
            return {'country': 'none', 'region': 'none'}

dst_country = 'us'
dst_states = ['ca', 'ny']
dst_keyword = "wicked code"
num_results = 10
final_results = []

geo = Geo()

gs = GoogleSearch(dst_keyword)
gs.results_per_page = 100

seen_websites = []
while len(final_results) < num_results:
    results = gs.get_results()
    if not results: # no more results, stop instead of looping forever
        break
    domains = [urlparse(r.url).netloc for r in results]
    for d in domains:
        geo_loc = geo.detect_by_host(d)
        if (geo_loc['country'] == dst_country and
                geo_loc['region'] in dst_states and
                d not in seen_websites):
            final_results.append((d, geo_loc['region']))
            seen_websites.append(d)
            if len(final_results) == num_results:
                break

print "Found %d websites:" % len(final_results)
for w in final_results:
    print "%s (state: %s)" % w
Here is the output of running it:
Found 10 websites:
www.wickedcode.com (state: ca)
www.retailmenot.com (state: ca)
www.simplyhired.com (state: ca)
archdipesh.blogspot.com (state: ca)
wagnerblog.com (state: ca)
answers.yahoo.com (state: ca)
devsnippets.com (state: ca)
friendfeed.com (state: ca)
www.thedacs.com (state: ny)
www.tipsdotnet.com (state: ca)
You may modify these examples the way you wish. I’d love to hear some comments about what you can come up with!
And just for fun, here are some other simple uses:
You can make your own Google Fight:
import sys
from xgoogle.search import GoogleSearch, SearchError

args = sys.argv[1:]
if len(args) < 2:
    print 'Usage: google_fight.py "keyword 1" "keyword 2"'
    sys.exit(1)

try:
    n0 = GoogleSearch('"%s"' % args[0]).num_results
    n1 = GoogleSearch('"%s"' % args[1]).num_results
except SearchError, e:
    print "Google search failed: %s" % e
    sys.exit(1)

if n0 > n1:
    print "%s wins with %d results! (%s had %d)" % (args[0], n0, args[1], n1)
elif n1 > n0:
    print "%s wins with %d results! (%s had %d)" % (args[1], n1, args[0], n0)
else:
    print "It's a tie! Both keywords have %d results!" % n1
Download: google_fight.py
Downloaded: 1237 times.
Download url: http://www.catonmat.net/download/google_fight.py
Here is an example usage of google_fight.py:
$ ./google_fight.py google microsoft
google wins with 2680000000 results! (microsoft had 664000000)
$ ./google_fight.py "linux ubuntu" "linux gentoo"
linux ubuntu wins with 4300000 results! (linux gentoo had 863000)
After I wrote this, I generalized Google Fight to take N keywords, and made passing them to the program easier by allowing them to be separated by commas.
import sys
from operator import itemgetter
from xgoogle.search import GoogleSearch, SearchError

args = sys.argv[1:]
if not args:
    print "Usage: google_fight.py keyword one, keyword two, ..."
    sys.exit(1)

keywords = [k.strip() for k in ' '.join(args).split(',')]
try:
    results = [(k, GoogleSearch('"%s"' % k).num_results) for k in keywords]
except SearchError, e:
    print "Google search failed: %s" % e
    sys.exit(1)

results.sort(key=itemgetter(1), reverse=True)
for res in results:
    print "%s: %d" % res
Download: google_fight2.py
Downloaded: 1122 times.
Download url: http://www.catonmat.net/download/google_fight2.py
Here is an example usage of google_fight2.py:
$ ./google_fight2.py earth atmospehere, sun atmosphere, moon atmosphere, jupiter atmosphere
earth atmospehere: 685000
jupiter atmosphere: 31400
sun atmosphere: 24900
moon atmosphere: 8130
I am going to expand this library and add search for Google Sets, Google Sponsored Links, Google Suggest, and perhaps some other Google searches. Then I’m going to build various tools on them, like a sponsored links competitor finder, use Google Suggest together with Google Sets to find various phrases in English, and apply them to dozens of my other ideas.


March 12th, 2009 at 4:15 pm
Doesn’t google throttle you after a while, if you scrape their pages too often? Too many requests from one IP and Google stops responding…
March 12th, 2009 at 5:48 pm
I usually google words to check their spelling. Actually, I had an idea a while ago to make an editor based on phrase popularity :P
March 12th, 2009 at 6:30 pm
Looks like the scraper example you published pulls tons of dupes. Any way to fix?
March 12th, 2009 at 6:44 pm
Jorge, it does. got to be careful. put a sleep between calls if you are doing a lot of scraping.
Daniel, me too. I have had this idea for a while as well. :)
Steve, can you tell me the query you used? Google sometimes displays 2 results from the same site (2nd usually indented to the right), that’s an ok behavior. One way to escape that is to keep a list or dict of seen urls, then check if you have seen the url already.
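The “keep a list or dict of seen urls” idea can be sketched like this in Python 3. `unique_by_url` is a hypothetical helper and `FakeResult` is a made-up stand-in for a search result; the only assumption is that each result has a url attribute, as described in the post.

```python
from collections import namedtuple

# Hypothetical stand-in for a search result; only the url attribute matters here.
FakeResult = namedtuple("FakeResult", ["title", "url"])


def unique_by_url(results):
    """Return results in their original order, skipping URLs already seen."""
    seen = set()
    unique = []
    for res in results:
        if res.url in seen:
            continue
        seen.add(res.url)
        unique.append(res)
    return unique


results = [
    FakeResult("first", "http://example.com/a"),
    FakeResult("second", "http://example.com/b"),
    FakeResult("first again", "http://example.com/a"),
]
deduped = unique_by_url(results)
print([r.title for r in deduped])
```

A set is used instead of a list so that each membership check stays cheap even for thousands of results.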
March 12th, 2009 at 10:00 pm
Kid, use corpora for such checks, don’t burn energy on google’s servers. Jeez.
March 12th, 2009 at 10:42 pm
Just remember about http://www.google.com/support/webmasters/bin/answer.py?answer=66357&topic=15263 ;)
March 12th, 2009 at 10:52 pm
Kamil, are there publicly available corpora? I know Google’s one, but it’s on 6 DVDs and costs $150.
Gints, shhhhhh.
March 12th, 2009 at 11:25 pm
as I said to you in private discussion, I still miss iteration so much. that would make xgoogle more pythonic and useful.
March 13th, 2009 at 3:12 am
Gints beat me to commenting about that… which is why I’ve been using other search engines so far.
I’ve been looking into this kind of thing. I had found different python bindings to their ajax search.
http://dcortesi.com/2008/05/28/google-ajax-search-api-example-python-code/
and
http://anyall.org/blog/2008/11/python-bindings-to-googles-ajax-search-api/
This looks interesting though. Bookmarked.
March 13th, 2009 at 1:02 pm
Hi there,
you could also use the “pyajaxgoogle” binding [1] to search.
[1] http://daui.lophus.org/python/modules/pyajaxgoogle/
March 13th, 2009 at 2:34 pm
ME, not really. That is their api that gives 32 results.
March 13th, 2009 at 3:07 pm
Peteris: Yes, but at least it’s legal.
March 13th, 2009 at 3:54 pm
Everything is legal… Having a mindset that something is illegal is just wrong (in a sense that you will always hold back from creating something cool, because you think it’s illegal).
March 13th, 2009 at 6:55 pm
my college uses a proxy server(NTLM) which is with authentication.How do I use xgoogle?
March 13th, 2009 at 8:01 pm
Varun, you can set environment variable:
and then run the application that uses xgoogle.
Another way is to edit the xgoogle/browser.py file and add
urllib2.ProxyHandler({'http':'www.someproxy.com:3128'}) to the list of handlers.
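For reference, here is how a proxy handler is wired into an opener with Python 3’s urllib.request, which replaced urllib2. This is a standalone sketch: how xgoogle/browser.py builds its own handler list is not shown here, the proxy address is just the placeholder from the comment above, and a plain ProxyHandler does not by itself take care of NTLM authentication.

```python
import urllib.request

# Route plain-HTTP requests through the (placeholder) proxy.
proxy_handler = urllib.request.ProxyHandler({"http": "www.someproxy.com:3128"})

# build_opener() returns an opener that includes the given handler;
# nothing is sent over the network until opener.open(url) is called.
opener = urllib.request.build_opener(proxy_handler)
print(proxy_handler.proxies)
```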
March 15th, 2009 at 11:32 pm
[…] Python Library for Google Search […]
March 19th, 2009 at 8:21 am
Hi! This is a beautiful example of how to use Google and Python. I have my own Python projects. First I try to show people how simple and good Python is. I started with Romania.
April 6th, 2009 at 2:58 am
Nice idea i will explore the code as a tutorial
Thanks a lot men :-)
April 15th, 2009 at 1:28 pm
[…] is another quick hack that I wrote a while ago. It complements the xgoogle library that I published in my previous post with an API for Google Sponsored Links […]
May 5th, 2009 at 12:59 am
Hi Peter, I was looking at your code. I have done a much simpler utility for searching google. I used to use beautifulsoup also, but now I use lxml and xpath. It produces much quicker and cleaner code… here is an example that returns an array of the urls and text:
from lxml import etree as et
from urllib import quote_plus,urlopen

def gsearch(q='',num=10,datelimit=''):
    returninfo=[]
    searchurl='http://google.com/search?hl=en&as_q=%s&num=%s&as_qdr=%s'%(quote_plus(q),str(num),datelimit)
    results=urlopen(searchurl).read()
    tree=et.fromstring(results,et.HTMLParser())
    links=tree.xpath('/html/body[@id="gsr"]/div[@id="res"]/div[1]/ol/li/h3/a')
    for a in links:
        returninfo.append({'href':a.values()[0],'text':a.text})
    return returninfo

Let me know what you think!
May 5th, 2009 at 1:10 am
Chad, thanks for leaving a comment. There are a couple of points that I want to make:
1. you don’t check for errors - my code does very rigorous error checking so that my applications do not suddenly die because of unhandled exceptions.
2. i love the conciseness of your code - mine is 10x longer.
3. i did not know lxml supported xpath - very nice to learn that.
4. i know lxml is much faster than BeautifulSoup, but it’s also less tolerant of malformed HTML. but perhaps in the case when we parse Google it’s not that important.
That’s everything that I can think of at the moment.
May 5th, 2009 at 1:32 am
The code was just a snippet of some other code I have, but in regards to your points:
1. There are only 3 points where errors can creep in that I see:
1- if the urlopen fails or
2- during the htmlparser() if the html is super-malformed (same w/ BeautifulSoup).
3- if google changes their html format(but that will screw up almost any scraper)
The xpath and the rest of the code will work without problems, since xpath will return ‘[]’ if the xpath fails.
4. If you change the line:
tree=et.fromstring(results,et.HTMLParser())
to:
tree=et.fromstring(results,et.HTMLParser(recover=True))
then lxml handles malformed html almost as well as BeautifulSoup.
Anyways, I enjoy your blog, and just thought that I’d throw that out there.
June 8th, 2009 at 10:36 pm
Scraping is dangerous because all it takes is one change to destroy all the work.
June 17th, 2009 at 4:16 am
How would you recommend folding this into a script that uses a set google query and set parameters and writes the output to a file? This way it could be used regularly without feeding it all the variables over and over… (Sorry if this is obvious to others out there!)
June 28th, 2009 at 4:29 pm
Hello,
Thank you for this nice library. I think it is very useful. I have got 503 errors, even if I use sleep between search actions. I think this can be related to the agent setting. How can we set a custom browser agent in your code?
June 28th, 2009 at 4:30 pm
Volkan, it’s somewhere in the source. I did not make it explicitly changeable.
July 27th, 2009 at 9:16 am
Just cant stop my self to comment on your blog. Good post.
July 27th, 2009 at 7:16 pm
Dude, awesome! This works for my IRC bot!
July 28th, 2009 at 11:18 am
Any chance you’d be willing to put it up on Bitbucket or GitHub? :)
July 28th, 2009 at 12:20 pm
Oh - and licensing it as open source?
July 31st, 2009 at 3:30 am
Doug, about bitbucket or github: Sure. I will. I just have to automate my tools more, to push out changes from my repo to bitbucket or github. I don’t want to do anything manually. I haven’t yet done this, but I soon will. At the moment the latest version is always at http://www.catonmat.net/download/xgoogle.zip.
Doug, about licensing: All my work is open source. You may use it any way you wish.
August 9th, 2009 at 10:51 pm
Hello,
thanks for a very nice lib.
I have added two more parameters domain and hl.
However changing the language parameters will give no results. It seems to be the html that is slightly different using i.e. hl=sv (swedish). I have been flagging your code for some hours now as this is my first time using python. Would you have the solution for this? Even though google.com?hl=en is probably the most used way, I am interested in the local versions as well.
Thanks!
August 11th, 2009 at 6:40 am
[…] of the library, so I’ll link straight to the author’s post, Python Library for Google Search, with examples and a description. The library saves you from […]
August 23rd, 2009 at 7:00 pm
Hi Peteris! I’m using your lib, but i’ve some problems surfing google pages. What is the best way to change it?
How can i know when they’re over?
Regards
stray
August 23rd, 2009 at 7:33 pm
Hi Stray! They are over when get_results() returns an empty list.
To get all results do this:
results = []
while True:
    tmp = gs.get_results()
    if not tmp: # no more results were found
        break
    results.extend(tmp)
August 25th, 2009 at 2:34 am
Hi!
Thanks a lot for the code! I was rather saddened to see pygoogle no longer being maintained, nor Google releasing any SOAP API keys any longer. This is perfect.
One thing I run across- I wondered if there was an easy way to enclose a search in quotes (”") instead of the default?
I’m searching for rather long strings that pretty much require some quotes around it to pull the exact results.
Thanks for any input!
tom
August 25th, 2009 at 3:58 am
[…] presenting you to my actual code, i must really give credit to the great xgoogle library created by Peteris Krumins. I have included it in my own code and it really made my life easier. Also, if you are on the […]
August 28th, 2009 at 4:34 pm
I have an error:
Search failed: Failed getting http://www.google.com/search?hl=en&q=quick+and+dirty&num=50&btnG=Google+Search: HTTP Error 503: Service Unavailable
What i do wrong?
September 6th, 2009 at 11:22 pm
Hi Peter,
I’ve used this library for a variety of things so I thought I would just pop in and say thanks for providing something that works well and is easy to use.
I just started writing a replacement google library for my company’s internal use. Like chad, I am also a huge fan of lxml for a variety of reasons. I also would like to make the library a bit more “pythonic” in general by adding smart generators so that you can iterate over results without worrying about what page they are on. I wrap your SearchResult objects already, so I will probably provide a class/function hook in the constructor so it can yield instances of WhateverClass.
balcon: that 503 error is 99% likely to mean that you tried to scrape google too quickly. Bare minimum time between searches is about 10 seconds if you’re doing more than 5-6 requests.
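One way to act on that advice is a small retry helper that waits between attempts. This is a Python 3 sketch under stated assumptions: `call_with_retries` is a hypothetical name, the 10 second default simply mirrors the figure mentioned above, and `func` would wrap whatever call can fail, for example a page fetch that raises on HTTP 503.

```python
import time


def call_with_retries(func, retries=3, delay=10.0, sleep=time.sleep,
                      retry_on=(Exception,)):
    """Call func(); on failure wait `delay` seconds and try again.

    Re-raises the last error once `retries` attempts have failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            sleep(delay)  # back off before the next attempt


# Demo with a function that always succeeds; no waiting happens here.
result = call_with_retries(lambda: "ok")
print(result)
```

The `sleep` parameter is injectable so the helper can be exercised in tests without actually waiting.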
Cheers,
Nathan
September 30th, 2009 at 1:18 pm
[…] have extended my xgoogle library with a Python module for Google […]
October 7th, 2009 at 5:37 pm
Hi Pete! I’m again here to post :=)
You suggest me that they(google pages) are over when get_results() returns an empty list. Using your lib i’ve found it is not so true in fact look at this scenario:
-> Google results: 1680000
-> Dork: inurl:polito (it’s my university :P)
… I print all links
-> Results: 100
But as you can see they’re only 100 instead of 1680000. I don’t demand to have all these outcomes (I’ve used your examples)
Have fun!
Regards
stray
October 7th, 2009 at 6:51 pm
Stray, here is the code that I just tried:
>>> from xgoogle.search import GoogleSearch
>>> import time
>>>
>>> gs = GoogleSearch("inurl:polito")
>>> gs.results_per_page = 100
>>> res = []
>>> while True:
...     tmp = gs.get_results()
...     if not tmp:
...         break
...     res.extend(tmp)
...     time.sleep(5)
...
>>> print len(res)
618
Seems to work for me.
The thing is that Google can show that it has 10 billion results, but in reality it will return only 1000 for any search. And if it thinks there are some duplicates in those 1000, then it will return even fewer. In this case it returned 618 results.
October 16th, 2009 at 10:26 am
Did someone manage to use different parameters for other google domains and languages?
Kind regards,
lowel
October 27th, 2009 at 8:32 am
I can’t download xgoogle.zip file. Please fix link. Thanks very much.
October 27th, 2009 at 9:23 am
AloneRoad, works for me. Try to see what is going on on your side.
November 9th, 2009 at 7:42 pm
I am getting the following error scraping Google from python:
urllib2.HTTPError: HTTP Error 503: Service Unavailable
Strange thing is that I can view results through a browser on the same machine (same IP etc.) no problem. I am already masquerading the “User Agent Id” to mimic Firefox. I am also collecting the cookies and feeding these back with the request.
Does anyone have an idea as to how Google will be telling the Python screen scraper and the browser apart? Perhaps some other header in the request?
I would be very grateful to get anybody’s thoughts and experiences on this.
Tom
November 23rd, 2009 at 12:32 am
Well done, thanks for providing this.
Given that you are a scientist, have a go at a ‘quick and dirty’, easy-to-use Google Scholar lib next. That’s duly needed, as Google doesn’t provide an API for that service (yet?).
The scientific community will be eternally thankful. Some use cases and suggestions can be found here:
http://code.google.com/p/google-ajax-apis/issues/detail?id=109
November 23rd, 2009 at 1:07 am
Alessandro, thanks for the comment. The never ending list of requests to add API for Google Scholar inspired me to write it right this very moment! I am doing it!
November 23rd, 2009 at 1:31 am
Tom, I didn’t notice your comment.
Perhaps Google figured that your user agent was spammy. Try creating GoogleSearch object with random_agent argument set to true:
November 23rd, 2009 at 2:30 am
Thanks for the code, can’t wait to put it to use!
November 23rd, 2009 at 3:59 am
[…] Python Library for Google Search – good coders code, great reuse (tags: python google search library programming api code) […]
November 24th, 2009 at 9:25 pm
[…] ran the test thanks to Peter Krumins for the XGoogle Python lib, Leif Hedstrom for the Yahoo search Python lib, and the Uswaretch blog for the Bing wrapper in […]
December 1st, 2009 at 10:57 pm
Marhaban (More informal Hi, greetings in Arabic) Peter.
Firstly, Excellent work!
I do very specialized English to Arabic names and terms transcription work; verifying their integrity and veracity.
“Transcriptions” is the formal linguistic terminology for “spellings”.
Without explaining further you can bring up my unique, very easy to use web page:
http://enartrans.com/transcription
Your xgoogle Google parser is what I’ve had in mind for a long time. Once I (the user) verifies which Arabic transcription variation(s) to use as search terms your Google parser is a powerful adjunct.
Here is my slight deviation on your original code with the native Arabic search term: native Arabic “Philadelphia”.
For those of you versed in Arabic or other Semitic languages such as Hebrew Philadelphia in the incorrectly reads left to right where it should read right to left with contiguous characters.
Hopefully on the submit the Arabic characters will retain their at least human readable form even though they’re in the wrong direction and not revert to some %hex encoding.
But no worries! From a functional respect it all correctly “comes out in the wash” (i.e. run the script)
My question is it seems on the whole your script works fine but on looking at a corresponding “native” Google search via Firefox I seem to be missing some URL’s per page.
I was wondering if you have any plans to upgrade to different languages? Maybe there’s some encodings not being recognized by your code … perhaps some setting or designation I can do from my side?
That said the number of URLs I seem to miss nowhere near invalidates using your work as a wonderful adjunct to mine.
Job well done!
Regards,
Joel S.
# -*- coding: utf-8 -*-
import sys
sys.path.append(r'E:\MovableIdle-Python-2.5\xgoogle')
# http://www.catonmat.net/blog/python-library-for-google-search/
from xgoogle.search import GoogleSearch, SearchError

try:
    gs = GoogleSearch("فيلادلفيا")
    gs.results_per_page = 100
    gs.page = 2
    results = gs.get_results()
    counter = 0
    for res in results:
        print res.title.encode('utf8')
        print res.desc.encode('utf8')
        print res.url.encode('utf8')
        counter = counter + 1
        print counter
        print
except SearchError, e:
    print "Search failed: %s" % e
December 1st, 2009 at 11:13 pm
Marhaban Peter; Quick addendum - on pasting my code snippet the Arabic Philadelphia seems to have “righted itself” and appears in perfect,correct human readable form reading right to left.
Yes, for anyone who wants to try my Python snippet it reverts back reading to right in the Python editor.
It should still work fine for you provided you make the proper tweaks from your system.
مع سلامة (Maa Salama or Ciao y’all in Arabic)
Joel S.
December 4th, 2009 at 9:19 am
[…] a wonderful library for analyzing Google search results: Python Library for Google Search. The only thing I was missing in it was […]
December 6th, 2009 at 4:29 pm
Peteris,
Thanks for sharing this code. I was wondering is there a particular reason why you are packaging the BeautifulSoup module in xgoogle?
Recently I’ve run into trouble using a new version of BeautifulSoup with soup2text (http://svn.tools.ietf.org/svn/tools/ietfdb/sprint/73/rjs/ietf/utils/soup2text.py), and using an older version fixed the problem.
Did you encounter a similar situation, and if so, do you know what additions in BeautifulSoup broke your code?
December 6th, 2009 at 6:42 pm
Hi Nicolas,
Yes, I encountered a similar situation. I am packaging BeautifulSoup in my code because it’s the most stable version I have ever used. The new BS uses a different parsing engine and when doing tests it would throw unexpected errors such as EncodeError, IndexError and others. And the old one parses it just fine.
December 8th, 2009 at 12:29 pm
[…] This program searches for plurks that were indexed by Google. It outputs URLs to indexed pages. It’s written in Python and uses my xgoogle library. […]
December 10th, 2009 at 6:16 am
Looks awesome, very thorough. In my script I wrote a simple class with a static search method to grab result links from the page source… all I really needed at the time. Though this will definitely be useful to me in the future. Nicely done.
December 22nd, 2009 at 2:50 am
I am getting a weird error
your example code in the readme file works great in the interactive but fails when I put it in a file
this is the code –>
>>> from xgoogle.search import GoogleSearch
>>> gs = GoogleSearch("catonmat")
>>> gs.results_per_page = 25
>>> results = gs.get_results()
>>> for res in results:
...     print res.title.encode('utf8')
$ ./xgoogle_mod.py
from: can’t read /var/mail/xgoogle.search
./xgoogle_mod.py: line 4: syntax error near unexpected token `(’
./xgoogle_mod.py: line 4: `gs = GoogleSearch(”quick and dirty”)’
Whyever is it looking for /var/mail/xgoogle, when I have it loaded in dist-pkgs where the interactive imports it just fine. What am I missing?
December 22nd, 2009 at 3:26 am
OK. Never mind, I got it working. back-tics crawled in somehow. gs=GoogleSearch(`name’)
January 11th, 2010 at 1:51 pm
Thanks a lot for the awesome code man. This was exactly what I was looking for in order to write a lyric downloader for amarok in python.
January 11th, 2010 at 6:21 pm
“Google’s Terms of Service do not allow the sending of automated queries of any sort to our system without express permission in advance from Google.”
If every website adopted this policy, Google themselves would be out of business tomorrow. Or maybe compete with DMOZ. There probably exists no company that sends out more automated queries than Google. It’s not exactly the height of hypocrisy, because they do respect robots.txt and if you don’t want to be there you don’t have to. However, we all know that if you are not to be found on google, you don’t exist. Descartes famously said, “Cogito ergo sum” I think therefore I am. Today he would say, “Above the fold on google ergo sum.”
January 21st, 2010 at 2:42 am
Thanks for the handy script. I modified browser.py to support all languages when it parses from/to/total numbers:
http://www.cs.bris.ac.uk/pgrad/ebrahimi/tmp/browser.py
January 26th, 2010 at 2:39 pm
I’d like to modify the lib to search different web sites: google.com, google.com.SOME_COUNTRY, blogsearch.google.com, news.google.com
I’ll try this afternoon. Do you think it’s an easy task? I’ll send you a patch then.
January 26th, 2010 at 2:46 pm
It shouldn’t be difficult, Juanjo.
Clone latest code from github:
xgoogle at github.
January 28th, 2010 at 10:29 am
[…] world’ function then a Google one. So i did a quick Google search and found the xgoogle python module i quickly wrote up the function and started testing it and like clock work i would send it an email […]
January 31st, 2010 at 5:00 pm
[…] you to enter a lis tof words, comma seperated Hope this could come in use EDIT: have a look at this too if you would like to use xgoogle for more features like search engine ranking […]
February 2nd, 2010 at 8:19 am
[…] one can opt for what has already been coded, such as Peteris Krumins’s xgoogle library, with a philosophy […]