As promised in my previous post on the xgoogle library, I have added a module to get results from Google Sets.

Google Sets automatically creates groups of related items from a few example items. For example, you feed it "red, green, blue," and it predicts other colors such as "yellow, black, white, brown," and so on.

One of the most fascinating applications of this library is predicting domain names. Most sysadmins have a coherent naming policy for their systems. For example, a sysadmin at a university might call his machines "psychology.university.edu", "art.university.edu", "geography.university.edu", etc. If we feed the names "psychology, art, geography" to Google Sets, it comes up with more names such as "history, mathematics, biology," and others. We can then do DNS scans to find out whether such machines really exist. This is a pretty powerful method for reconnaissance.
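Here is a minimal sketch of that last step: turning predicted names into host names and checking which of them resolve. The helper names "candidate_hostnames" and "resolve_existing" are made up for illustration and are not part of xgoogle. The resolver is a parameter (defaulting to the standard library's socket.gethostbyname) so the logic can be tried without touching the network.

```python
import socket


def candidate_hostnames(names, domain):
    """Build fully qualified host names from predicted labels."""
    hosts = []
    for name in names:
        # Normalize "Computer Science" -> "computer-science"; skip empty items.
        label = name.strip().lower().replace(' ', '-')
        if label:
            hosts.append('%s.%s' % (label, domain))
    return hosts


def resolve_existing(hostnames, resolver=socket.gethostbyname):
    """Return a dict of hostname -> IP address for the names that resolve."""
    found = {}
    for host in hostnames:
        try:
            found[host] = resolver(host)
        except (socket.error, UnicodeError):
            # The name does not resolve or is not a valid host name; skip it.
            continue
    return found
```

In practice the list of names would come from get_results(), and the domain would be the one whose naming policy you are guessing at.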

There are many other interesting applications. Black hat SEOs may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. Various word guessing games can be created.

But my personal goal in writing this library was to use it for my English language perfection and correction tool, which I will release in one of the next posts about this project. I wrote more about this idea in the introductory post on the xgoogle library. Please see that post for more info.

The new module is called "googlesets". To use it, import "GoogleSets" and create an object of this type, passing the constructor the list of items to base the prediction on. Then use the "get_results()" member function to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.

Here is an example usage of the new module. It finds items related to programming languages "python" and "perl":

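from xgoogle.googlesets import GoogleSets

# Seed the prediction with two example items.
gs = GoogleSets(['python', 'perl'])
items = gs.get_results()
for item in items:
    # Results are Unicode strings, so encode them before printing.
    print item.encode('utf8')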

Output:

python
perl
php
ruby
java
javascript
c++
c
cgi
tcl
c#

The output matches that of Google Sets itself:

[Image: Google Sets predicted items from Perl and Python]

See the readme.txt file in the xgoogle archive for more examples.
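As the example output above shows, the seed items ("python" and "perl") come back as part of the results. If you only want the newly predicted items, a small filter can drop them. This is just a sketch; "new_items" is a hypothetical helper and not part of xgoogle.

```python
def new_items(seeds, results):
    """Return the results that are not seed items, preserving their order."""
    # Compare case-insensitively and ignore surrounding whitespace.
    seen = set(s.strip().lower() for s in seeds)
    fresh = []
    for item in results:
        key = item.strip().lower()
        if key not in seen:
            seen.add(key)  # also drops repeated predictions
            fresh.append(item)
    return fresh
```

You would call it as new_items(['python', 'perl'], gs.get_results()).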

Download "xgoogle" library:

Download: xgoogle library (.zip)
Download url: http://www.catonmat.net/download/xgoogle.zip

Have fun and let me know if you find this library useful in any way in your own projects.

Comments

Nobody
August 14, 2009, 01:31

Thanks for the code and its liberal license. Users may wish to note, however, that the Google Sets license allows only personal, not commercial use.

August 14, 2009, 06:50

Nobody, I'm changing the license today. Someone with good knowledge about licensing made a great suggestion today and I will follow his advice! Here is his suggestion.

srid
August 19, 2009, 18:55

I suggest that you register xgoogle in PyPI - http://pypi.python.org/

One can then simply use "easy_install xgoogle" to install the library.

When you register, be sure to give a direct link to the tarball .. or upload the tarball using the 'upload' command in setup.py.

Alec Henriksen
September 14, 2009, 01:25

Hey Peter, this is a really cool library. I've been following this blog since your RedditRiver posts, and I always love these periodic script releases.

April 16, 2010, 06:27

It is a cool python module, your idea is very great.

