Google Sets automatically creates groups of related items from a few example items. For example, if you feed it "red, green, blue", it predicts other colors such as "yellow, black, white, and brown".
One of the most fascinating applications of this library is predicting domain names. Most sysadmins have a coherent naming policy for their systems. For example, a sysadmin at a university might name the machines "psychology.university.edu", "art.university.edu", "geography.university.edu", etc. If we feed the names "psychology, art, geography" to Google Sets, it comes up with more names such as "history, mathematics, biology", and others. We can then run DNS lookups to check whether machines with those names really exist. This is a pretty powerful method for reconnaissance.
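The DNS-checking step can be sketched roughly as follows. This is only an illustration and not part of xgoogle: the function name "find_live_hosts" and its "resolve" parameter are my own assumptions, and the candidate labels are assumed to come from a Google Sets prediction. It uses nothing but the standard "socket" module.

```python
import socket


def find_live_hosts(labels, domain, resolve=socket.gethostbyname):
    """Return (hostname, address) pairs for the candidate labels that resolve.

    `labels` stands in for the names predicted by Google Sets (hypothetical
    input); `resolve` is injectable so the lookup can be swapped or stubbed.
    """
    found = []
    for label in labels:
        # Build a candidate such as "history.university.edu".
        hostname = "%s.%s" % (label.strip().lower(), domain)
        try:
            address = resolve(hostname)
        except (socket.error, UnicodeError):
            # The name does not resolve, or is not a valid hostname; skip it.
            continue
        found.append((hostname, address))
    return found
```

Calling find_live_hosts(["history", "mathematics", "biology"], "university.edu") would return only the candidates that actually have a DNS record. Making the resolver a parameter keeps the sketch easy to try out without touching the network.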
There are many other interesting applications. Black-hat SEOs may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. It could also be the basis for various word-guessing games.
But my personal goal in writing this library was to use it in my English language perfection and correction tool, which I will release in one of the next posts about this project. I wrote more about this idea in the introductory post about the xgoogle library. Please see that post for more info.
The new module is called "googlesets". To use it, import "GoogleSets" and create an object of this type, passing the constructor the list of items to base the prediction on. Then call the "get_results()" member function to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.
Here is an example of using the new module. It finds items related to the programming languages "python" and "perl":
    from xgoogle.googlesets import GoogleSets

    gs = GoogleSets(['python', 'perl'])
    items = gs.get_results()
    for item in items:
        print item.encode('utf8')
The output matches that of Google Sets itself.
See the readme.txt file in the xgoogle archive for more examples.
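Since "get_results()" is described above as returning Unicode strings, here is one more small sketch of the encoding advice: saving a list of items to a file as UTF-8. The helper name "save_items" is hypothetical and not part of xgoogle. It takes any list of Unicode strings and relies only on the standard "io" module.

```python
import io


def save_items(items, path):
    """Write one item per line to `path`, encoded as UTF-8.

    `items` is assumed to be a list of Unicode strings, such as the list
    that get_results() is described as returning.
    """
    # newline="\n" keeps line endings identical on every platform.
    with io.open(path, "w", encoding="utf-8", newline="\n") as handle:
        for item in items:
            handle.write(item + u"\n")
```

Letting the file object do the encoding avoids calling encode('utf8') on every item by hand.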
Download: xgoogle library (.zip)
Downloaded: 32546 times.
Download url: http://www.catonmat.net/download/xgoogle.zip
Have fun and let me know if you find this library useful in any way in your own projects.