I am now on Twitter! Follow me there (my nick is pkrumins), or find me on Google Buzz and Facebook.

As promised in my previous post on the xgoogle library, I have added a module to get results from Google Sets.
Google Sets automatically creates groups of related items from a few example items. For example, if you feed it “red, green, blue”, it predicts other colors such as “yellow, black, white, brown, etc.”
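Google never published how Sets works internally, so the snippet below is only a toy illustration of the idea and not Google's algorithm. It ranks candidate items by how often they appear in the same lists as the seed items: for every list a candidate appears in, it earns one point per seed in that list. The function name `expand_set` and the tiny hand-made list corpus are hypothetical.

```python
def expand_set(seeds, lists, top_n=5):
    """Toy set expansion: score items by co-occurrence with the seeds.

    Each entry in `lists` is a hypothetical group of related items
    (think of a list scraped from a web page). For every list a candidate
    appears in, it earns one point per seed found in that same list.
    """
    seeds = set(s.lower() for s in seeds)
    scores = {}
    for group in lists:
        items = set(x.lower() for x in group)
        overlap = len(seeds & items)
        if overlap == 0:
            continue  # this list has nothing to do with the seeds
        for item in items - seeds:
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


corpus = [
    ["red", "green", "blue", "yellow"],
    ["red", "blue", "black", "white"],
    ["green", "yellow", "brown"],
    ["python", "perl", "ruby"],
]
print(expand_set(["red", "green", "blue"], corpus))
# → ['yellow', 'black', 'white', 'brown']
```

A real system would need a far larger corpus and a smarter weighting, but the principle of "items that keep showing up next to my examples are probably related" is the same.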
One of the most fascinating uses for this library is predicting domain names. Most sysadmins follow a coherent naming scheme for their systems. For example, a sysadmin at a university might name the machines “psychology.university.edu”, “art.university.edu”, “geography.university.edu”, etc. If we feed “psychology, art, geography” to Google Sets, it comes up with more names such as “history, mathematics, biology” and others. We can then run DNS lookups to find out whether such machines really exist. This is a pretty powerful reconnaissance technique.
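The DNS step could look something like the sketch below. It assumes the predicted words have already been fetched (for instance with `GoogleSets(['psychology', 'art', 'geography']).get_results()`), turns each one into a hostname under the domain, and keeps only the names that resolve. The helper names `candidate_hosts` and `find_live_hosts` are made up for this example, and the resolver is passed in as a parameter so it can be swapped out for testing.

```python
import socket


def candidate_hosts(words, domain):
    """Turn predicted words into fully qualified hostnames."""
    hosts = []
    for word in words:
        # Hostname labels cannot contain spaces: lower-case and hyphenate.
        label = word.strip().lower().replace(" ", "-")
        if label:
            hosts.append("%s.%s" % (label, domain))
    return hosts


def find_live_hosts(hosts, resolve=socket.gethostbyname):
    """Return {hostname: ip_address} for the hosts that resolve in DNS."""
    live = {}
    for host in hosts:
        try:
            live[host] = resolve(host)
        except (socket.gaierror, socket.herror, UnicodeError):
            # The name does not exist (or is not a valid hostname): skip it.
            continue
    return live
```

For example, `find_live_hosts(candidate_hosts(predicted, "university.edu"))` would return a dict mapping each existing hostname to its IP address; `university.edu` is just the placeholder domain from the paragraph above. Lookups are done one at a time for clarity; a real scan of many names would probably want concurrency and timeouts.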
There are many other interesting applications. Black-hat SEOs may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. It could also power word-guessing games.
But my personal goal in writing this library was to use it in my English language improvement and correction tool, which I will release in one of the upcoming posts about this project. I wrote more about this idea in the introductory post of the xgoogle library; please see that post for more info.
The new module is called “googlesets”. To use it, import “GoogleSets” and create an object of this type, passing the list of example items to the constructor. Then call the “get_results()” method to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.
Here is an example usage of the new module. It finds items related to the programming languages “python” and “perl”:
from xgoogle.googlesets import GoogleSets

gs = GoogleSets(['python', 'perl'])  # seed items for the prediction
items = gs.get_results()             # list of Unicode strings
for item in items:
    # Encode explicitly so non-ASCII items print safely (Python 2)
    print item.encode('utf8')
Output:
python
perl
php
ruby
java
javascript
c++
c
cgi
tcl
c#
The output matches what the Google Sets website itself returns for the same query.
See the readme.txt file in the xgoogle archive for more examples.
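Because get_results() returns Unicode strings, a small helper that encodes them explicitly can save you from UnicodeEncodeError surprises, for example when a Python 2 script's output is piped to a file and stdout falls back to ASCII. The sketch below is hypothetical (write_results is not part of xgoogle) and assumes UTF-8 output by default.

```python
import io
import sys


def write_results(items, stream=None, encoding="utf-8"):
    """Write Unicode items to a byte stream, one per line, in `encoding`."""
    if stream is None:
        # Python 3 exposes the raw byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    for item in items:
        stream.write(item.encode(encoding) + b"\n")


# Example: encode into an in-memory buffer instead of stdout.
buf = io.BytesIO()
write_results([u"c++", u"c#"], buf)
```

With the library, this would be used as `write_results(gs.get_results())`, which prints each predicted item on its own line in UTF-8.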
Download: xgoogle library (.zip)
Download url: http://www.catonmat.net/download/xgoogle.zip
Have fun and let me know if you find this library useful in any way in your own projects.

August 14th, 2009 at 1:31 am
Thanks for the code and its liberal license. Users may wish to note, however, that the Google Sets license allows only personal, not commercial use.
August 14th, 2009 at 6:50 am
Nobody, I’m changing the license today. Someone with good knowledge of licensing made a great suggestion, and I will follow his advice!
August 19th, 2009 at 6:55 pm
I suggest that you register xgoogle in PyPI - http://pypi.python.org/
One can then simply use “easy_install xgoogle” to install the library.
When you register, be sure to give a direct link to the tarball, or upload the tarball using the ‘upload’ command in setup.py.
September 14th, 2009 at 1:25 am
Hey Peter, this is a really cool library. I’ve been following this blog since your RedditRiver posts, and I always love these periodic script releases.