
On my way back home from university (30 minutes+) I just love to read news from my favorite social news site reddit.com. A few weeks ago I saw this 'Ask Reddit' post which asked if we could get a reddit version for mobile phones. Well, I thought, it's a cool project and I can do it quickly.
While scanning through the comments on the 'Ask Reddit' post, I noticed davidlvann's comment where he said that Digg.com already had an almost plain-text version of Digg, called DiggRiver.com.
It didn't take me long to do a
$ whois redditriver.com
No match for "REDDITRIVER.COM".
to find that the domain RedditRiver.com was not registered! What a great name for a project! I quickly mailed my friend Alexis [kn0thing] Ohanian at Reddit (check his alien blog) to ask permission to do a Reddit River project. Sure enough, he registered the domain for me and I was free to make it happen!
I'll describe how I made the site, and I will release full source code.
Update: The project is now live!
Update: Full source code is now available! It includes all the scripts mentioned here!
Download full redditriver.com source code (downloaded 5381 times)
My language of choice for this project is Python, the same language reddit.com is written in.
This is actually the first real project I am doing in Python (I'm a big Perl fan). I have a good overall understanding of Python but I have never done a project from the ground up! Before doing the project I watched a few Python video lectures and read a bunch of articles to get into a mindset of a Pythonista.
Designing Stages of RedditRiver.com
The main goal of the project was to create a very lightweight version of reddit that would monitor story changes (as stories get voted up and down) on several pages across the most popular subreddits, and that would find mobile versions of the posted stories. By that I mean rewriting URLs: say, a link to The Washington Post gets rewritten to the print version of the same article, or a link to youtube.com gets rewritten to the mobile version of YouTube, m.youtube.com, etc.
The project was done in several separate steps.
- First, I set up the web server to handle Python applications,
- Then I created a few Python modules to extract contents of Reddit website,
- Next I created an SQLite database and wrote a few scripts to save the extracted data,
- Then I wrote a Python module to discover mobile versions of given web pages,
- Finally, I created the web.py application to handle requests to RedditRiver.com!
Setting up the Web Server
I am very lucky to have a full dedicated server sponsored by ZigZap - We Are Tech (I seriously recommend them if you are looking for great hosting!). Being an experienced Linux user, I asked them for a pure Linux server with no software or control panels pre-installed and that's exactly what I got! Thanks, ZigZap! :)
I already run this blog and picurls.com on the server and I had chosen lighttpd web server and PHP programming language for these two projects. To get RedditRiver running, I had to add Python support to the web server.
I decided to run web.py web framework to serve the HTML contents because of its simplicity and because Reddit guys used it themselves after rewriting Reddit from Lisp to Python.
Following the install instructions, getting web.py running on the server was as simple as installing the web.py package!
It was just as easy to get the lighttpd web server to communicate with web.py and my application. This required installing the flup package, which allows lighttpd to interface with web.py.
Update: after setting it all up, and experimenting a bit with web.py (version 0.23) and Cheetah's templates, I found that for some mysterious reason web.py did not handle "#include" statements of the templates. The problem was with web.py's 'cheetah.py' file, line 23, where it compiled the regular expression for handling "#include" statements:
r_include = re_compile(r'(?!\\)#include \"(.*?)\"($|#)', re.M)
When I tested it out in interpreter,
>>> r_include = re.compile(r'(?!\\)#include \"(.*?)\"($|#)', re.M)
>>> r_include.search('#include "foo"').groups()
('foo', '')
>>> r_include.search('foo\n#include "bar.html"\nbaz').groups()
('bar.html', '')
it found #include's across multiline text just fine, but it did not work with my template files. I tested it like 5 times and just couldn't figure out why it was not working.
As RedditRiver is the only web.py application running on my server, I easily patched that regex on line 23 to something trivial and it all started working! I dropped all the negative lookahead magic and checking for end of the line:
r_include = re_compile(r'#include "(.*?)"', re.M)
As I said, I am not sure why the original regex did not work in the web.py application, but did work in the interpreter. If anyone knows what happened, I will be glad to hear from you! :)
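One possible explanation (an assumption, not something confirmed against the original template files) is trailing characters. The original pattern requires the end of the line or a '#' immediately after the closing quote, so a carriage return from Windows line endings or a trailing space would stop it from matching. A minimal sketch of that behaviour in modern Python 3, using plain re.compile in place of web.py's re_compile, with made-up template snippets:

```python
import re

# The original web.py 0.23 pattern and the simplified replacement.
r_original = re.compile(r'(?!\\)#include \"(.*?)\"($|#)', re.M)
r_patched = re.compile(r'#include "(.*?)"', re.M)

# Hypothetical template contents, differing only in what follows the quote.
unix_text = 'foo\n#include "bar.html"\nbaz'
dos_text = 'foo\r\n#include "bar.html"\r\nbaz'    # CRLF line endings
padded_text = 'foo\n#include "bar.html" \nbaz'    # trailing space

for label, text in [('unix', unix_text), ('dos', dos_text), ('padded', padded_text)]:
    # Print whether each pattern finds the #include in this variant.
    print(label, bool(r_original.search(text)), bool(r_patched.search(text)))
```

If the template files contained either of these, the interpreter test with hand-typed strings would pass while the real files would fail, which matches the symptom described above.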
Accessing Reddit Website via Python
I wrote several Python modules (which also work as executables) to access information on Reddit - stories across multiple pages of various subreddits (and front page) and user created subreddits.
As Reddit still does not provide an API to access the information on their site, I had to extract the relevant information from the HTML content of the pages.
The first module I wrote is called 'subreddits.py'. It accesses http://reddit.com/reddits and returns (or prints out, if used as an executable) the list of the most popular subreddits (a subreddit is a reddit for a specific topic, for example, programming or politics).
Get this program here: subreddit extractor (redditriver.com project) (downloaded: 3310 times).
This module provides three useful functions:
- get_subreddits(pages=1, new=False), which gets 'pages' pages of subreddits and returns a list of dictionaries of them. If new is True, gets 'pages' pages of new subreddits (http://reddit.com/reddits/new),
- print_subreddits_paragraph(), which prints subreddits information in human readable format, and
- print_subreddits_json(), which prints it in JSON format. The output is in utf-8 encoding.
The way this module works can be seen from the Python interpreter right away:
>>> import subreddits
>>> srs = subreddits.get_subreddits(pages=2)
>>> len(srs)
50
>>> srs[:5]
[{'position': 1, 'description': '', 'name': 'reddit.com', 'subscribers': 11031, 'reddit_name': 'reddit.com'}, {'position': 2, 'description': '', 'name': 'politics', 'subscribers': 5667, 'reddit_name': 'politics'}, {'position': 3, 'description': '', 'name': 'programming', 'subscribers': 9386, 'reddit_name': 'programming'}, {'position': 4, 'description': 'Yeah reddit, you finally got it. Context appreciated.', 'name': 'Pictures and Images', 'subscribers': 4198, 'reddit_name': 'pics'}, {'position': 5, 'description': '', 'name': 'obama', 'subscribers': 651, 'reddit_name': 'obama'}]
>>>
>>> from pprint import pprint
>>> pprint(srs[3:5])
[{'description': 'Yeah reddit, you finally got it. Context appreciated.',
'name': 'Pictures and Images',
'reddit_name': 'pics',
'subscribers': 4198},
{'description': '',
'name': 'obama',
'reddit_name': 'obama',
'subscribers': 651}]
>>>
>>> subreddits.print_subreddits_paragraph(srs[3:5])
position: 4
name: Pictures and Images
reddit_name: pics
description: Yeah reddit, you finally got it. Context appreciated.
subscribers: 4198
position: 5
name: obama
reddit_name: obama
description:
subscribers: 651
>>>
>>> subreddits.print_subreddits_json(srs[3:5])
[
{
"position": 4,
"description": "Yeah reddit, you finally got it. Context appreciated.",
"name": "Pictures and Images",
"subscribers": 4198,
"reddit_name": "pics"
},
{
"position": 5,
"description": "",
"name": "obama",
"subscribers": 651,
"reddit_name": "obama"
}
]
Or it can be called from the command line:
$ ./subreddits.py --help
usage: subreddits.py [options]
options:
-h, --help show this help message and exit
-oOUTPUT Output format: paragraph or json. Default: paragraph.
-pPAGES How many pages of subreddits to output. Default: 1.
-n Retrieve new subreddits. Default: nope.
This module reused the awesome BeautifulSoup HTML parser module, and simplejson JSON encoding module.
The second program I wrote is called 'redditstories.py' which accesses the specified subreddit and gets the latest stories from it. It was written pretty much the same way I did it for redditmedia project in Perl.
Get this program here: reddit stories extractor (redditriver.com project) (downloaded: 2393 times).
This module also provides three similar functions:
- get_stories(subreddit='front_page', pages=1, new=False), which gets 'pages' pages of stories from subreddit and returns a list of dictionaries of them. If new is True, gets new stories only,
- print_stories_paragraph(), which prints stories information in human readable format, and
- print_stories_json(), which prints it in JSON format. The output is in utf-8 encoding.
It can also be used as a Python module or executable.
Here is an example of using it as a module:
>>> import redditstories
>>> s = redditstories.get_stories(subreddit='programming')
>>> len(s)
25
>>> s[2:4]
[{'title': 'when customers don\'t pay attention and reply to a "donotreply.com" email address, it goes to Chet Faliszek, a programmer in Seattle', 'url': 'http://consumerist.com/371600/the-man-who-owns-donotreplycom-knows-all-the-secrets-of-the-world', 'unix_time': 1206408743, 'comments': 54, 'subreddit': 'programming', 'score': 210, 'user': 'srmjjg', 'position': 3, 'human_time': 'Tue Mar 25 03:32:23 2008', 'id': '6d8xl'}, {'title': 'mysql --i-am-a-dummy', 'url': 'http://dev.mysql.com/doc/refman/4.1/en/mysql-tips.html#safe-updates', 'unix_time': 1206419543, 'comments': 59, 'subreddit': 'programming', 'score': 135, 'user': 'enobrev', 'position': 4, 'human_time': 'Tue Mar 25 06:32:23 2008', 'id': '6d9d3'}]
>>> from pprint import pprint
>>> pprint(s[2:4])
[{'comments': 54,
'human_time': 'Tue Mar 25 03:32:23 2008',
'id': '6d8xl',
'position': 3,
'score': 210,
'subreddit': 'programming',
'title': 'when customers don\'t pay attention and reply to a "donotreply.com" email address, it goes to Chet Faliszek, a programmer in Seattle',
'unix_time': 1206408743,
'url': 'http://consumerist.com/371600/the-man-who-owns-donotreplycom-knows-all-the-secrets-of-the-world',
'user': 'srmjjg'},
{'comments': 59,
'human_time': 'Tue Mar 25 06:32:23 2008',
'id': '6d9d3',
'position': 4,
'score': 135,
'subreddit': 'programming',
'title': 'mysql --i-am-a-dummy',
'unix_time': 1206419543,
'url': 'http://dev.mysql.com/doc/refman/4.1/en/mysql-tips.html#safe-updates',
'user': 'enobrev'}]
>>> redditstories.print_stories_paragraph(s[:1])
position: 1
subreddit: programming
id: 6daps
title: Sign Up Forms Must Die
url: http://www.alistapart.com/articles/signupforms
score: 70
comments: 43
user: markokocic
unix_time: 1206451943
human_time: Tue Mar 25 15:32:23 2008
>>> redditstories.print_stories_json(s[:1])
[
{
"title": "Sign Up Forms Must Die",
"url": "http:\/\/www.alistapart.com\/articles\/signupforms",
"unix_time": 1206451943,
"comments": 43,
"subreddit": "programming",
"score": 70,
"user": "markokocic",
"position": 1,
"human_time": "Tue Mar 25 15:32:23 2008",
"id": "6daps"
}
]
Using it from a command line:
$ ./redditstories.py --help
usage: redditstories.py [options]
options:
-h, --help show this help message and exit
-oOUTPUT Output format: paragraph or json. Default: paragraph.
-pPAGES How many pages of stories to output. Default: 1.
-sSUBREDDIT Subreddit to retrieve stories from. Default:
reddit.com.
-n Retrieve new stories. Default: nope.
These two programs just beg to be converted into a single Python module. They have the same logic with just a few changes in the parser. But for the moment I am generally happy, and they serve the job well. They can also be understood individually without having a need to inspect several source files.
I think that one of the future posts could be a reddit information accessing library in Python.
I can already think of a hundred things someone could do with such a library. For example, one could print out the top programming stories in his or her shell:
$ echo "Top five programming stories:" && echo && ./redditstories.py -s programming | grep 'title' | head -5 && echo && echo "Visit http://reddit.com/r/programming to view them!"
Top five programming stories:

title: Sign Up Forms Must Die
title: You can pry XP from my cold dead hands!
title: mysql --i-am-a-dummy
title: when customers don't pay attention and reply to a "donotreply.com" email address, it goes to Chet Faliszek, a programmer in Seattle
title: Another canvas 3D Renderer written in Javascript

Visit http://reddit.com/r/programming to view them!
Creating and Populating the SQLite Database
The database of choice for this project is SQLite: it is fast and light, and the project is so simple that I can't think of any reason to use a more complicated database system.
The database has a trivial structure with just two tables, 'subreddits' and 'stories'.
CREATE TABLE subreddits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    reddit_name TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    subscribers INTEGER NOT NULL,
    position INTEGER NOT NULL,
    active BOOL NOT NULL DEFAULT 1
);

INSERT INTO subreddits (id, reddit_name, name, description, subscribers, position)
VALUES (0, 'front_page', 'reddit.com front page', 'since subreddit named reddit.com has different content than the reddit.com frontpage, we need this', 0, 0);

CREATE TABLE stories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    url_mobile TEXT,
    reddit_id TEXT NOT NULL,
    subreddit_id INTEGER NOT NULL,
    score INTEGER NOT NULL,
    comments INTEGER NOT NULL,
    user TEXT NOT NULL,
    position INTEGER NOT NULL,
    date_reddit UNIX_DATE NOT NULL,
    date_added UNIX_DATE NOT NULL
);

CREATE UNIQUE INDEX idx_unique_stories ON stories (title, url, subreddit_id);
The 'subreddits' table contains information extracted by 'subreddits.py' module (described earlier). It keeps the information and positions of all the subreddits which appeared on the most popular subreddit page (http://reddit.com/reddits).
Reddit lists 'reddit.com' as a separate subreddit on the most popular subreddit page, but it turned out that it was not the same as the front page of reddit! That's why I insert a fake subreddit called 'front_page' in the table right after creating it, to keep track of both 'reddit.com' subreddit and reddit's front page.
The information in the table is updated by a new program - update_subreddits.py.
View: subreddit table updater (redditriver.com project) (downloaded: 1790 times)
The other table, 'stories' contains information extracted by 'redditstories.py' module (also described earlier).
The information in this table is updated by another new program - update_stories.py.
As it is impossible to keep track of all the scores and comments, and position changes across all the subreddits, the program monitors just a few pages on each of the most popular subreddits.
View: story table updater (redditriver.com project) (downloaded: 1775 times)
These two programs are run periodically by crontab (task scheduler in unix). The program update_subreddits.py gets run every 30 minutes and update_stories.py every 5 minutes.
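To illustrate how an updater can lean on the unique index over (title, url, subreddit_id), here is a minimal sketch in modern Python 3 using the standard sqlite3 module. It is not the code of update_stories.py: the helper name save_story, the in-memory database and the subreddit id 3 are assumptions made for the example, and the story dictionary has the shape returned by redditstories.get_stories().

```python
import sqlite3
import time

# The 'stories' table and its unique index, as defined in the schema above.
STORIES_SCHEMA = """
CREATE TABLE stories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL, url TEXT NOT NULL, url_mobile TEXT,
    reddit_id TEXT NOT NULL, subreddit_id INTEGER NOT NULL,
    score INTEGER NOT NULL, comments INTEGER NOT NULL,
    user TEXT NOT NULL, position INTEGER NOT NULL,
    date_reddit UNIX_DATE NOT NULL, date_added UNIX_DATE NOT NULL
);
CREATE UNIQUE INDEX idx_unique_stories ON stories (title, url, subreddit_id);
"""

def save_story(conn, story, subreddit_id):
    """Hypothetical helper: refresh a known story, or insert it if it is new."""
    cur = conn.execute(
        "UPDATE stories SET score = ?, comments = ?, position = ? "
        "WHERE title = ? AND url = ? AND subreddit_id = ?",
        (story['score'], story['comments'], story['position'],
         story['title'], story['url'], subreddit_id))
    if cur.rowcount == 0:
        # No row matched the unique key, so this story has not been seen yet.
        conn.execute(
            "INSERT INTO stories (title, url, reddit_id, subreddit_id, score, "
            "comments, user, position, date_reddit, date_added) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (story['title'], story['url'], story['id'], subreddit_id,
             story['score'], story['comments'], story['user'],
             story['position'], story['unix_time'], int(time.time())))
    conn.commit()

conn = sqlite3.connect(':memory:')   # throwaway database for the example
conn.executescript(STORIES_SCHEMA)

story = {'title': 'mysql --i-am-a-dummy',
         'url': 'http://dev.mysql.com/doc/refman/4.1/en/mysql-tips.html#safe-updates',
         'unix_time': 1206419543, 'comments': 59, 'score': 135,
         'user': 'enobrev', 'position': 4, 'id': '6d9d3'}

save_story(conn, story, subreddit_id=3)   # first run: inserts the row
story['score'] = 140
save_story(conn, story, subreddit_id=3)   # later run: updates the same row
print(conn.execute("SELECT COUNT(*), MAX(score) FROM stories").fetchone())
```

Running the updater repeatedly then keeps one row per story and only refreshes the score, comment count and position.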
Finding the Mobile Versions of Given Websites
This is probably the most interesting piece of software that I wrote for this project. The idea is to find versions of a website suitable for viewing on a mobile device.
For example, most of the stories on politics subreddit link to the largest online newspapers and news agencies, such as The Washington Post or MSNBC. These websites provide a 'print' version of the page which is ideally suitable for mobile devices.
Another example is websites that have designed a real mobile version of their pages and let the user agent know about it by placing a <link rel="alternate" media="handheld" href="..."> tag in the head section of the HTML document.
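Detecting such a tag takes only a few lines. The sketch below is an illustration in modern Python 3 that uses the standard library's html.parser instead of BeautifulSoup; the class name HandheldLinkFinder, the function find_handheld_link and the example page are all made up for this example.

```python
from html.parser import HTMLParser

class HandheldLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" media="handheld"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        rel = (attrs.get('rel') or '').lower()
        media = (attrs.get('media') or '').lower()
        if rel == 'alternate' and 'handheld' in media and attrs.get('href'):
            self.links.append(attrs['href'])

def find_handheld_link(html):
    """Return the first advertised handheld URL, or None if there is none."""
    finder = HandheldLinkFinder()
    finder.feed(html)
    return finder.links[0] if finder.links else None

# Hypothetical page that advertises a mobile version of itself.
page = ('<html><head><title>Story</title>'
        '<link rel="alternate" media="handheld" href="http://m.example.com/story">'
        '</head><body>...</body></html>')
print(find_handheld_link(page))
```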
I wrote an 'autodiscovery' Python module called 'autodiscovery.py'. This module is used by the update_stories.py program described in the previous section. After getting the list of new reddit stories, update_stories.py tries to autodiscover a mobile version of each story and, if it succeeds, places it in the 'url_mobile' column of the 'stories' table.
Here is an example run of the module from the Python interpreter:
>>> from autodiscovery import AutoDiscovery
>>> ad = AutoDiscovery()
>>> ad.autodiscover('http://www.washingtonpost.com/wp-dyn/content/article/2008/03/24/AR2008032402969.html')
'http://www.washingtonpost.com/wp-dyn/content/article/2008/03/24/AR2008032402969_pf.html'
>>> ad.autodiscover('http://www.msnbc.msn.com/id/11880954/')
'http://www.msnbc.msn.com/id/11880954/print/1/displaymode/1098/'
And it can also be used from command line:
$ ./autodiscovery.py http://www.washingtonpost.com/wp-dyn/content/article/2008/03/24/AR2008032402969.html
http://www.washingtonpost.com/wp-dyn/content/article/2008/03/24/AR2008032402969_pf.html
Source: mobile webpage version autodiscovery (redditriver.com project) (downloaded 2735 times)
This module actually uses a configuration file 'autodisc.conf' which defines patterns to look for in the web page's HTML code. At the moment the config file is pretty primitive and defines just three configuration options:
- REWRITE_URL defines a rule for rewriting the URL of a website that makes it difficult to autodiscover the mobile link. For example, a page could use JavaScript to pop up the print version of the page. In such a case a REWRITE_URL rule can be used to match the host which uses this technique and rewrite part of the URL.
- PRINT_LINK defines what a print link might look like. For example, it could say 'print this page' or 'print this article'. This directive defines such phrases to look for.
- IGNORE_URL defines urls to ignore. For example, a link to a flash animation should definitely be ignored, as it does not define a mobile version at all. You can place the .swf extension in this ignore list to avoid it being downloaded by autodiscovery.py.
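The three directives could be modelled roughly as follows. This is only a sketch: the real syntax of autodisc.conf is not shown in this article, so the Python data structures, the function names and the particular Washington Post rule are assumptions for illustration (the rule simply reproduces the '_pf.html' rewrite seen in the example run above).

```python
import re
from urllib.parse import urlparse

# Hypothetical in-memory equivalents of the three directives.
REWRITE_URL = [
    # (host pattern, URL pattern, replacement)
    (r'(^|\.)washingtonpost\.com$', r'\.html$', '_pf.html'),
]
PRINT_LINK = [r'print this page', r'print this article']
IGNORE_URL = [r'\.swf$']

def is_ignored(url):
    """True if the URL path matches one of the IGNORE_URL patterns."""
    path = urlparse(url).path
    return any(re.search(p, path, re.I) for p in IGNORE_URL)

def rewrite_url(url):
    """Apply the first REWRITE_URL rule whose host pattern matches, else None."""
    host = urlparse(url).hostname or ''
    for host_pattern, pattern, replacement in REWRITE_URL:
        if re.search(host_pattern, host, re.I):
            return re.sub(pattern, replacement, url, count=1)
    return None

def looks_like_print_link(link_text):
    """True if the anchor text contains one of the PRINT_LINK phrases."""
    return any(re.search(p, link_text, re.I) for p in PRINT_LINK)

story = 'http://www.washingtonpost.com/wp-dyn/content/article/2008/03/24/AR2008032402969.html'
print(rewrite_url(story))
print(is_ignored('http://example.com/intro.swf'))
print(looks_like_print_link('Print this article'))
```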
Configuration used by autodiscovery.py: autodiscovery configuration (redditriver.com project) (downloaded 2695 times)
Creating the web.py Application
The final part to the project was creating the web.py application.
It was pretty straightforward to create, as it only required writing the correct SQL expressions for selecting the right data out of the database.
Here is what the controller for the web.py application looks like:
urls = (
'/', 'RedditRiver',
'/page/(\d+)/?', 'RedditRiverPage',
'/r/([a-zA-Z0-9_.-]+)/?', 'SubRedditRiver',
'/r/([a-zA-Z0-9_.-]+)/page/(\d+)/?', 'SubRedditRiverPage',
'/reddits/?', 'SubReddits',
'/stats/?', 'Stats',
'/stats/([a-zA-Z0-9_.-]+)/?', 'SubStats',
'/about/?', 'AboutRiver'
)
The first version of reddit river implements browsable front stories (RedditRiver and RedditRiverPage classes), browsable subreddit stories (SubRedditRiver and SubRedditRiverPage classes), list of the most popular subreddits (SubReddits class), front page and subreddit statistics (most popular stories and most active users, Stats and SubStats classes) and an about page (AboutRiver class).
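To see how such a tuple maps request paths to handler classes, here is a simplified stand-in for the dispatching in modern Python 3. It is a sketch of the idea, not web.py's actual code: the dispatch function is hypothetical and returns the handler name and captured groups instead of instantiating a class.

```python
import re

# The same (pattern, handler name) pairs as in the controller above.
urls = (
    '/', 'RedditRiver',
    r'/page/(\d+)/?', 'RedditRiverPage',
    r'/r/([a-zA-Z0-9_.-]+)/?', 'SubRedditRiver',
    r'/r/([a-zA-Z0-9_.-]+)/page/(\d+)/?', 'SubRedditRiverPage',
    '/reddits/?', 'SubReddits',
    '/stats/?', 'Stats',
    r'/stats/([a-zA-Z0-9_.-]+)/?', 'SubStats',
    '/about/?', 'AboutRiver',
)

def dispatch(path):
    """Return (handler name, captured groups) for the first matching pattern."""
    for pattern, handler in zip(urls[::2], urls[1::2]):
        # Each pattern has to match the whole path, not just a prefix.
        match = re.match('^' + pattern + '$', path)
        if match:
            return handler, match.groups()
    return None, ()

for path in ['/', '/page/3', '/r/programming', '/r/programming/page/2', '/stats/politics/']:
    print(path, dispatch(path))
```

The captured groups (subreddit name, page number) are what the handler's GET method would receive as arguments.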
The source code: web.py application (redditriver.com project) (downloaded: 2735 times)
Release
I have put it online! Click redditriver.com to visit the site.
I have also released the source code. Here are all the files mentioned in the article, and a link to the whole website package.
Download Programs which Made Reddit River Possible
All the programs in a single .zip:
Download link: full redditriver.com source code
Downloaded: 5381 times
Individual scripts:
Download link: subreddit extractor (redditriver.com project)
Downloaded: 3310 times
Download link: reddit stories extractor (redditriver.com project)
Downloaded: 2393 times
Download link: subreddit table updater (redditriver.com project)
Downloaded: 1790 times
Download link: story table updater (redditriver.com project)
Downloaded: 1775 times
Download link: mobile webpage version autodiscovery (redditriver.com project)
Downloaded: 2735 times
Download link: autodiscovery configuration (redditriver.com project)
Downloaded: 2695 times
Download link: web.py application (redditriver.com project)
Downloaded: 2257 times
All these programs are released under GNU GPL license, so you may derive your own stuff, but do not forget to share your derivative work with everyone!
Alexis recently sent me a reddit t-shirt for doing redditmedia project, I decided to take a few photos wearing it :)

Have fun and I hope to hear a lot of positive feedback on redditriver project :)
Learning Python Programming Language Through Video Lectures
One of the upcoming projects I am doing (I will reveal it in one of the next blog posts.) is going to be written entirely in Python. I have a good understanding of Python but, same as I had with JavaScript, I have little experience doing projects from the ground up in it.
Update: the project was redditriver.com, read designing redditriver.com (includes full source code).
Before diving into the project I decided to take a look at a few Python video lectures to learn language idioms and features which I might not have heard of.
Finding Python video lectures was pretty easy as I run a free video lecture blog.
First Python Lecture: Python for Programmers
Interesting moments in the lecture:
- [07:15] There are several Python implementations - CPython, PyPy, IronPython and Jython.
- Python has similarities with [12:04] Java, [15:30] C++ and [19:05] C programming languages.
- [15:37] Python is multi-paradigm language supporting object oriented, procedural, generic and functional programming paradigms.
- [19:49] Python follows C standard's rationale: 1. trust the programmer; 2. don't prevent the programmer from doing what needs to be done; 3. keep the language small and simple; 4. provide only one way to do an operation.
- [13:02] Python code is normally implicitly compiled to bytecode.
- [13:25] Everything inherits from object.
- [14:56] Garbage collection in classic Python happens as soon as possible.
- [24:50] Python has strong but dynamic typing.
- [28:42] Names don't have types, objects do.
- [36:25] Why are there two ways to raise a number to a power (with double star ** operator and pow())? - Because pow() is a three argument function pow(x, y, z) which does x^y mod z.
- [36:52] Python supports plain and Unicode strings.
- [38:40] Python provides several built-in container types: tuple's, list's, set's, frozenset's and dict's.
- [41:55] c[i:j:k] does slicing with step k.
- [42:45] c[i:j] always has first bound included and last bound excluded.
- [44:11] Comparisons can be "chained", for example 3 < x < 9.
- [45:05] False values in Python are 0, "", None, empty containers and False.
- [49:07] 'for' is implemented in terms of iterators.
- [52:18] Function parameters may end with *name to take a tuple of arbitrary arguments, or may end with **name to take a dict of arbitrary arguments.
- [55:39] Generators.
- [01:00:20] Closures.
- [01:02:00] Classes.
- [01:05:30] Subclassing.
- [01:07:00] Properties.
- [01:14:35] Importing modules.
- [01:16:20] Every Python source file is a module, and you can just import it.
- [01:17:20] Packages.
Okay, this talk was a very basic talk and it really was an introduction for someone who never worked in Python. I could not find many interesting points to point out from the lecture, so the last 8 points are just titles of topics covered in the lecture.
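A few of the points above are easy to try out in the interpreter. The snippet below is a small illustration written for modern Python 3 (the lecture predates it); the variable and function names are made up for the example.

```python
# [36:25] Three-argument pow computes (x ** y) % z.
print(pow(2, 10, 1000))

# [41:55] and [42:45] Slices include the first bound, exclude the last,
# and accept an optional step.
letters = list('abcdefgh')
print(letters[2:4])
print(letters[1:7:2])

# [44:11] Comparisons can be chained.
x = 5
print(3 < x < 9)

# [45:05] Zero, empty strings, None, empty containers and False are all false.
falsy = [bool(v) for v in (0, '', None, [], {}, False)]
print(falsy)

# [52:18] *args collects extra positional arguments into a tuple,
# **kwargs collects extra keyword arguments into a dict.
def report(*args, **kwargs):
    return args, kwargs

print(report(1, 2, sep='-'))
```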
Second Python Lecture: Advanced Python or Understanding Python
Interesting moments in the lecture:
- [03:18] Python is designed by implementation.
- [04:20] Everything is runtime (even compiletime is runtime).
- [04:42] A namespace is a dict.
- [05:33] A function is created by having its code compiled to code object, then wrapped as a function object.
- [10:00] Everything is an object and a reference, except variables.
- [11:00] Python has 3-scopes rule - names are either local, global or builtin.
- [11:12] Global names mean they exist in a module, not everywhere!
- [14:02] 'import mod' statement is just a syntactic sugar for mod = __import__("mod").
- [14:15] sys.modules is a dict that caches the modules already imported.
- [14:30] You may set the value of a module name in sys.modules dict to None, to make it unimportable.
- [15:20] Mutable objects are not hashable, most immutable objects are hashable.
- [18:05] Assignments, type checks, identity comparison, 'and or not', method calls are not object hooks.
- [22:15] Any Python object has two special attributes __dict__ which holds per object data and __class__ which refers to the class.
- [27:18] Iterators are not rewindable, reversible or copyable.
- [29:04] Functions with yield return generators.
- [39:20] "New" style classes unified C types and Python classes.
- [47:00] __slots__ prevent arbitrary attribute assignments.
- [48:10] __new__ gets called when the object gets created (__init__ gets called when the object has already been constructed).
- [01:01:40] Inheritance is resolved using a C3 Method Resolution Order algorithm.
- [01:04:57] Unicode in Python.
- [01:06:45] UTF8 is not Unicode, it's a Unicode encoding!
- [01:11:50] codecs module automatically converts between encodings.
- [01:13:00] Recommended reading - Functional Programming HOWTO and Python source code ;)
This lecture gets pretty complicated towards the end as the lecturer goes deep into subjects which require adequate experience with Python.
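Several of these points can be checked with a few lines of code. The sketch below targets modern Python 3; the module name blocked_demo_module and the Point class are hypothetical and exist only for the demonstration.

```python
import sys

# [14:02] 'import json' is roughly json = __import__("json").
json_mod = __import__('json')
same_object = json_mod is sys.modules['json']

# [14:30] A None entry in sys.modules makes that name unimportable.
sys.modules['blocked_demo_module'] = None
try:
    import blocked_demo_module
    import_blocked = False
except ImportError:
    import_blocked = True
finally:
    del sys.modules['blocked_demo_module']

# [47:00] and [48:10] __slots__ forbids new attributes; __new__ runs before __init__.
class Point:
    __slots__ = ('x', 'y')
    calls = []

    def __new__(cls, x, y):
        Point.calls.append('__new__')
        return super().__new__(cls)

    def __init__(self, x, y):
        Point.calls.append('__init__')
        self.x, self.y = x, y

p = Point(1, 2)
try:
    p.z = 3
    slots_enforced = False
except AttributeError:
    slots_enforced = True

# [27:18] and [29:04] A function with yield returns a generator, which cannot be rewound.
def countdown(n):
    while n:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)
second_pass = list(gen)   # already exhausted

print(same_object, import_blocked, Point.calls, slots_enforced, first_pass, second_pass)
```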
Third Python Lecture: Python: Design and Implementation
Video Lecture (wmv, 93 MB)
Download Lecture Slides
Interesting moments in the lecture:
- [01:27] Python started in late 1989, around December 1989.
- [01:57] Python's named after Monty Python's Flying Circus.
- [06:20] Python was first released to USENET and then a public group comp.lang.python was started.
- [08:06] Guido van Rossum, the author of Python, moved to US in 1995.
- [09:58] Python will never become a commercial project thanks to Python Software Foundation, founded in 2001.
- [11:23] Python origins go back to ideas from ABC programming language (indentation for statement grouping, simple control structures, small number of data types).
- [13:01] Being on ABC's implementation team, Guido learned a lot about language design and implementation.
- [16:37] One of the main goals of Python was to make programmer's productivity more important than program's performance.
- [17:10] Original positioning of Python was in the middle between C and sh.
- [21:13] Other languages, such as, Modula-3, Icon and Algol 68 also had an impact on Python's implementation details.
- [24:32] If a feature can be implemented as a clear extension module, it is always preferable to changing the language itself.
- [25:23] The reason Python uses dictionaries for namespaces is that it required minimal changes to the stuff the language already had.
- [28:11] Language features are accepted only if they will be used by a wide variety of users. A recent example of a new language feature is the 'with' statement.
- [31:13] Question from the audience - "Can't the 'with' statement be implemented via closures?"
- [34:25] Readable code is the most important thing.
- [37:57] To add a new language feature, PEP, Python Enhancement Proposal has to be written.
- [40:47] Python's goal was to be cross-platform (hardware & OS) right from the beginning.
- [47:09] Python's lexer has a stack to parse indentation.
- [49:20] Two passes are run over abstract syntax tree, one to generate symbol table and the other to produce bytecode.
- [50:20] Bytecode opcodes are very high level, close to the conceptual primitive operations of the language rather than to what the hardware could do.
- [01:02:54] Jython generates pure Java bytecode.
- [01:03:01] Jython's strings are always Unicode.
- [01:06:45] IronPython is as fast or even faster than CPython.
Question and answer session:
- [01:08:57] Have there been attempts to compile Python to machine code (for example, x86)?
- [01:13:46] Why not use simple tail recursion?
- [01:16:09] How does the garbage collection work?
This video lecture gives an insight on history and development ideas of Python language. I believe it is important to know the history and details of the language design decisions to be really competent in it.
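The indentation stack mentioned at [47:09] can be illustrated with a toy tokenizer. This is a simplified sketch of the idea, not CPython's actual lexer: the function name indent_tokens and the token names are made up for the example, and it only understands spaces.

```python
def indent_tokens(lines):
    """Yield (kind, text) tokens, tracking indentation widths on a stack."""
    stack = [0]                      # indentation widths of the open blocks
    for line in lines:
        if not line.strip():
            continue                 # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(' '))
        if width > stack[-1]:
            stack.append(width)      # deeper than before: one new block opens
            yield ('INDENT', '')
        while width < stack[-1]:
            stack.pop()              # shallower: close blocks until widths match
            yield ('DEDENT', '')
        if width != stack[-1]:
            raise IndentationError('unindent does not match any outer indentation level')
        yield ('LINE', line.strip())
    while len(stack) > 1:            # close whatever is still open at end of input
        stack.pop()
        yield ('DEDENT', '')

source = ['if x:', '    y = 1', '    if z:', '        w = 2', 'done()']
kinds = [kind for kind, text in indent_tokens(source)]
print(kinds)
```

One dedent in the source can close several blocks at once, which is why a stack is needed rather than a single counter.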
There are a few more lectures I have found:
- Advanced Topics in Programming Languages Series: Python Design Patterns (Part 1)
- Advanced Topics in Programming Languages Series: Python Design Patterns (part 2)
- Google Developers Day US - Python Design Patterns (a shorter/summary version of previous two)
- Python 3000 (Lecture on the next major Python version)
- ReUsable Web Components with Python and Future Python Web Development
- Introduction to Python for Plone developers (.mov, 184MB) and Download Slides.
There is also some great reading material available:
- a great, free book on Python - Dive into Python,
- a must-read article on Python coding style - Code Like a Pythonista: Idiomatic Python,
- Essential Python reading List,
- Ten quirky things about Python,
- How not to write Python code
Have fun learning Python!
PS. Do you know any other video lectures on Python that I haven't mentioned here? Feel free to post them in the comments! Thanks! :)
Let me teach you how to work efficiently with command line history in bash.
This tutorial comes with a downloadable cheat sheet that summarizes (and expands on) topics covered in this guide.
Download PDF cheat sheet: bash history cheat sheet (.pdf) (downloaded: 152838 times)
Download ASCII cheat sheet: bash history cheat sheet (.txt) (downloaded: 13653 times)
Download TEX cheat sheet: bash history cheat sheet (.tex) (downloaded: 5567 times)
In case you are a first time reader, this is the 3rd part of the article series on working efficiently in bourne again shell. Previously I have written on how to work efficiently in vi and emacs command editing modes by using predefined keyboard shortcuts (both articles come with cheat sheets of predefined shortcuts).
First, let's review some basic keyboard shortcuts for navigating around previously typed commands.
As you remember, bash offers two modes for command editing - emacs mode and vi mode. In each of these editing modes the shortcuts for retrieving history are different.
Suppose you had executed the following commands:
$ echo foo bar baz
$ iptables -L -n -v -t nat
$ ... lots and lots more commands
$ echo foo foo foo
$ perl -wle 'print q/hello world/'
$ awk -F: '{print$1}' /etc/passwd
$
and you wanted to execute the last command (awk -F ...).
You could certainly hit the up arrow and live happily along, but do you really want to move your hand that far away?
If you are in emacs mode just try CTRL-p which fetches the previous command from history list (CTRL-n for the next command).
In vi mode try CTRL-[ (or ESC) to switch to command mode and then 'k' ('j' for the next command).
There is another, equally quick, way to do that by using bash's history expansion mechanism - event designators. Typing '!!' will execute the previous command (more about event designators later).
Now, suppose that you wanted to execute 'iptables -L -n -v -t nat' command again without retyping it.
A naive user would, again, just keep hitting up-arrow key until he/she finds the command. But that's not the way hackers work. Hackers love to work quickly and efficiently. Forget about arrow keys and page-up, page-down, home and end keys. They are completely useless and, as I said, they are too far off from the main part of the keyboard anyway.
In emacs mode try CTRL-r and type the first few letters of 'iptables', like 'ipt'. That will display the last iptables command you executed. In case you executed more than one iptables command in between, hitting CTRL-r again will display older entries. In case you miss the right command and move too deep into the history list, you can reverse the search direction by hitting CTRL-s (don't forget that by default CTRL-s stops the output to the terminal and you'll get the effect of a "frozen" terminal; hit CTRL-q to "unfreeze" it, and see the stty command to change this behavior).
In vi mode the same CTRL-r and CTRL-s still work but there is another way more specific to vi mode.
Switch to command mode by hitting CTRL-[ or ESC and hit '/', then type a first few characters of 'iptables' command, like 'ipt' and hit return. Bash will display the most recent match found in history. To navigate around use 'n' or just plain '/' to repeat the search in the same direction, and 'N' or '?' to repeat the search in opposite direction!
With event designators you may execute only the most recently executed command matching (or starting with) 'string'.
Try '!iptables' history expansion command which refers to the most recent command starting with 'iptables'.
Another way is to use bash's built in 'history' command then grep for a string of interest and finally use an event designator in form '!N', where N is an integer which refers to N-th command in command history list.
For example,
$ history | grep 'ipt'
    2  iptables -L -n -v -t nat
$ !2    # will execute the iptables command
I remembered another way to execute the N-th command in the history list in vi editing mode. Type 'N' (command number) and then 'G', in this example '2G'.
Listing and Erasing Command History
Bash provides a built-in command 'history' for viewing and erasing command history.
Suppose that we are still working with the same example:
$ echo foo bar baz
$ iptables -L -n -v -t nat
$ ... lots and lots more commands
$ echo foo foo foo
$ perl -wle 'print q/hello world/'
$ awk -F: '{print$1}' /etc/passwd
$
Typing 'history' will display all the commands in bash history along with their line numbers:
1 echo foo bar baz
2 iptables -L -n -v -t nat
... lots and lots more commands
568 echo foo foo foo
569 perl -wle 'print q/hello world/'
570 awk -F: '{print$1}' /etc/passwd
Typing 'history N', where N is an integer, will display the last N commands in the history.
For example, 'history 3' will display:
568 echo foo foo foo
569 perl -wle 'print q/hello world/'
570 awk -F: '{print$1}' /etc/passwd
history -c will clear the history list and history -d N will delete a history entry N.
By default, the history list is kept in user's home directory in a file '.bash_history'.
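The listing and erasing commands above can also be tried from a script. Here is a minimal sketch, assuming a non-interactive bash where the history list starts out empty. It relies on 'history -s', which stores its arguments as a single history entry, and the three stored commands are just the ones from the running example.

```shell
#!/usr/bin/env bash
# Sketch: exercising the 'history' builtin from a script.
# 'history -s' stores its arguments as one history entry, which lets a
# script build a small history list without an interactive session.

history -c                              # start from an empty list
history -s "echo foo bar baz"
history -s "iptables -L -n -v -t nat"
history -s "echo foo foo foo"

history            # lists all three entries with their numbers
history 2          # lists only the last two entries
history -d 2       # deletes entry number 2 (the iptables command)
history            # two entries remain
```

In an interactive session you would not need 'history -s' at all, since bash records the commands as you type them.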
History Expansion
History expansion is done via so-called event designators and word designators. Event designators can be used to recall previously executed commands (events) and word designators can be used to extract command line arguments from the events. Optionally, various modifiers can be applied to the extracted arguments.
Event designators are special commands that begin with a '!' (there is also one that begins with a '^'). They may be followed by a word designator and one or more modifiers. Event designators, word designators and modifiers are separated by a colon ':'.
Event Designators
Let's look at a couple of examples to see how the event designators work.
Event designator '!!' can be used to refer to the previous command, for example,
$ echo foo bar baz
foo bar baz
$ !!
foo bar baz
Here the '!!' executed the previous 'echo foo bar baz' command.
Event designator '!N' can be used to refer to the N-th command.
Suppose you listed the history and got the following output:
1 echo foo foo foo
2 iptables -L -n -v -t nat
... lots and lots more commands
568 echo bar bar bar
569 perl -wle 'print q/hello world/'
570 awk -F: '{print$1}' /etc/passwd
Then the event designator '!569' will execute 'perl ...' command, and '!1' will execute 'echo foo foo foo' command!
Event designator '!-N' refers to current command line minus N. For example,
$ echo foo bar baz
foo bar baz
$ echo a b c d e
a b c d e
$ !-2
foo bar baz
Here the event designator '!-2' executed the command before the previous one, or current command line minus 2.
Event designator '!string' refers to the most recent command starting with 'string'. For example,
$ awk --help
$ perl --help
Then the event designator '!p' or '!perl' or '!per' will execute the 'perl --help' command. Similarly, '!a' will execute the awk command.
An event designator '!?string?' refers to a command line containing (not necessarily starting with) 'string'.
Perhaps the most interesting event designator is the one in form '^string1^string2^' which takes the last command, replaces string1 with string2 and executes it. For example,
$ ehco foo bar baz
bash: ehco: command not found
$ ^ehco^echo^
foo bar baz
Here the '^ehco^echo^' designator replaced the incorrectly typed 'ehco' command with the correct 'echo' command and executed it.
Word Designators and Modifiers
Word designators follow event designators separated by a colon. They are used to refer to some or all of the parameters on the command referenced by event designator.
For example,
$ echo a b c d e
a b c d e
$ echo !!:2
b
This is the simplest form of a word designator. ':2' refers to the 2nd argument of the command (3rd word). In general ':N' refers to Nth argument of the command ((N+1)-th word).
Word designators also accept ranges, for example,
$ echo a b c d e
a b c d e
$ echo !!:3-4
c d
There are various shortcuts, such as, ':$' to refer to the last argument, ':^' to refer to the first argument, ':*' to refer to all the arguments (synonym to ':1-$'), and others. See the cheat sheet for a complete list.
Modifiers can be used to modify the behavior of word designators. For example:
$ tar -xvzf software-1.0.tgz
software-1.0/file
...
$ cd !!:$:r
software-1.0$
Here the 'r' modifier was applied to a word designator which picked the last argument from the previous command line. The 'r' modifier removed the trailing suffix '.tgz'.
The 'h' modifier removes the trailing pathname component, leaving the head:
$ echo /usr/local/apache
/usr/local/apache
$ echo !!:$:h
/usr/local
The 'e' modifier removes all but the trailing suffix:
$ ls -la /usr/src/software-4.2.messy-Extension
...
$ echo /usr/src/*!!:$:e
/usr/src/*.messy-Extension    # ls could have been used instead of echo
Another interesting modifier is the substitute ':s/old/new/' modifier which substitutes new for old. It can be used in conjunction with 'g' modifier to do global substitution. For example,
$ ls /urs/local/software-4.2 /urs/local/software-4.3
/usr/bin/ls: /urs/local/software-4.2: No such file or directory
/usr/bin/ls: /urs/local/software-4.3: No such file or directory
$ !!:gs/urs/usr/
...
This example replaces all occurrences of 'urs' with 'usr' and makes the command correct.
There are a few other modifiers, such as 'p' modifier which prints the resulting command after history expansion but does not execute it. See the cheat sheet for all of the modifiers.
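To get a feel for what the pathname modifiers compute, here is a rough sketch that uses ordinary parameter expansion as a stand-in. It is not history expansion itself: 'word' is a hypothetical variable playing the role of the word a designator would pick, and the equivalence only holds when the word actually contains a '.' (for 'r' and 'e') or a '/' (for 'h' and 't'). The 't' modifier is not shown above; it keeps only the last pathname component.

```shell
#!/usr/bin/env bash
# Rough parameter-expansion equivalents of the :r, :e, :h and :t modifiers.
# 'word' is a hypothetical stand-in for the word a designator would select.

word=/usr/src/software-4.2.messy-Extension

root=${word%.*}        # like :r, drop the trailing suffix
ext=.${word##*.}       # like :e, keep only the trailing suffix (with the dot)
dir=${word%/*}         # like :h, drop the last pathname component
base=${word##*/}       # like :t, keep only the last pathname component

printf '%s\n' "$root" "$ext" "$dir" "$base"
```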
Modifying History Behavior
Bash allows you to modify which commands get stored in the history list, the file where they get stored, the number of commands that get stored, and a few other options.
These options are controlled by setting the HISTFILE, HISTFILESIZE, HISTIGNORE and HISTSIZE shell variables.
HISTFILE, as the name suggests, controls where the history file gets saved.
For example,
$ export HISTFILE=/home/pkrumins/todays_history
will save the commands to a file /home/pkrumins/todays_history
Set it to /dev/null or unset it to avoid getting your history list saved.
HISTFILESIZE controls how many history commands to keep in HISTFILE.
For example,
$ export HISTFILESIZE=1000
will keep the last 1000 history commands.
HISTSIZE controls how many history commands to keep in the history list of current session.
For example,
$ export HISTSIZE=42
will keep 42 last commands in the history of current session.
If this number is less than HISTFILESIZE, only that many commands will get written to HISTFILE.
HISTIGNORE controls the items which get ignored and do not get saved. This variable takes a list of colon separated patterns. Pattern '&' (ampersand) is special in a sense that it matches the previous history command.
There is a trick to make history ignore the commands which begin with a space. The pattern for that is "[ ]*"
For example,
$ export HISTIGNORE="&:[ ]*:exit"
will make bash ignore duplicate commands, commands that begin with a space, and the 'exit' command.
There are several other options of interest controlled by the built-in 'shopt' command.
The options may be set by specifying '-s' parameter to the 'shopt' command, and may be unset by specifying '-u' parameter.
Option 'histappend' controls how the history list gets written to HISTFILE, setting the option will append history list of current session to HISTFILE, unsetting it (default) will make HISTFILE get overwritten each time.
For example, to set this option, type:
$ shopt -s histappend
And to unset it, type:
$ shopt -u histappend
Option 'histreedit' allows users to re-edit a failed history substitution.
For example, suppose you had typed:
$ echo foo bar baz
and wanted to substitute 'test' for 'baz' with the ^baz^test^ event designator, but you made a mistake and typed ^boo^test^. This would lead to a substitution failure because the previous command does not contain the string 'boo'.
If you had this option turned on, bash would put the erroneous ^boo^test^ event designator back on the command line as if you had typed it again.
Finally, option 'histverify' allows users to verify a substituted history expansion.
Based on the previous example, suppose you wanted to execute that 'echo' command again by using the '!!' event designator. If you had this option on, bash would not execute the 'echo' command immediately but would first put it on command line so that you could see if it had made the correct substitution.
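Putting the variables and options from this section together, a ~/.bashrc fragment might look like the sketch below. The sizes are illustrative values, not recommendations, and 'histreedit' and 'histverify' only have an effect when bash is using readline.

```shell
# ~/.bashrc fragment (values are illustrative)
HISTFILE=~/.bash_history      # where the history list is saved
HISTSIZE=1000                 # commands kept in the current session's list
HISTFILESIZE=2000             # commands kept in HISTFILE
HISTIGNORE='&:[ ]*:exit'      # skip duplicates, space-prefixed commands, exit

shopt -s histappend           # append to HISTFILE instead of overwriting it
shopt -s histreedit           # re-edit a failed history substitution
shopt -s histverify           # show the expanded command before running it
```

These are ordinary shell variables that bash reads itself, so they work here without 'export'.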
Tuning the Command Prompt
Here is how my command prompt looks:
Wed Jan 30@07:07:03
pkrumins@catonmat:1002:2:~$
The first line displays the date and time the command prompt was displayed so I could keep track of commands back in time.
The second line displays username, hostname, global history number and current command number.
The global history number allows me to quickly use event designators.
My PS1, primary prompt display variable looks like this:
PS1='\d@\t\n\u@\h:\!:\#:\w$ '
Bash History Cheat Sheet
Here is a summary cheat sheet for working effectively with bash history.
This cheat sheet includes:
- History editing keyboard shortcuts (emacs and vi mode),
- History expansion summary - event designators, word designators and modifiers,
- Shell variables and `shopt' options to modify history behavior,
- Examples
Download Bash History Summary Sheet
PDF format (.pdf):
Download link: bash history cheat sheet (.pdf)
Downloaded: 152838 times
ASCII .txt format:
Download link: bash history cheat sheet (.txt)
Downloaded: 13653 times
LaTeX format (.tex):
Download link: bash history cheat sheet (.tex)
Downloaded: 5567 times
This cheat sheet is released under GNU Free Documentation License.
Are there any tips you want to add?
Working Productively in Bash's Vi Command Line Editing Mode (with Cheat Sheet)
Bash provides two modes for command line editing - emacs and vi. Emacs editing mode is the default and I already wrote an article and created a cheat sheet for this mode.
This time I am going to introduce you to bash's vi editing mode and give out a detailed cheat sheet with the default keyboard mappings for this mode.
The difference between the two modes is what command each key combination (or key) gets bound to. You may inspect your current keyboard mappings with bash's built in bind command:
$ bind -P
abort can be found on "\C-g", "\C-x\C-g", "\M-\C-g".
accept-line can be found on "\C-j", "\C-m".
alias-expand-line is not bound to any keys
...
To get into the vi editing mode type
$ set -o vi
in your bash shell (to switch back to emacs editing mode, type set -o emacs).
If you are used to the vi text editor you will feel right at home.
The editing happens in two modes - command mode and insert mode. In insert mode everything you type gets output to the terminal, but in the command mode the keys are used for various commands.
Here are a few examples with screenshots to illustrate the vi editing mode.
Let '[i]' be the position of cursor in insert mode in all the examples and '[c]' be the position of cursor in command mode.
Examples:
Once you have changed the readline editing mode to vi (by typing set -o vi), you will be working in insert mode.
The example will be performed on this command:
$ echo arg1 arg2 arg3 arg4[i]
Example 1:
Suppose you have typed a command with a few arguments and want to insert another argument before an argument which is three words backward.
$ echo arg1 (want to insert arg5 here) arg2 arg3 arg4[i]
Hit 'ESC' to switch to command mode and press '3' followed by 'B':
$ echo arg1 [c]arg2 arg3 arg4
Alternatively you could have hit 'B' three times: 'BBB'.
Now, enter insert mode by hitting 'i' and type 'arg5 '
$ echo arg1 arg5 [i]arg2 arg3 arg4
Example 2:
Suppose you wanted to change arg2 to arg5:
$ echo arg1 [c]arg2 arg3 arg4
To do this, you can type 'cw' which means 'change word' and just type out 'arg5':
$ echo arg1 arg5[i] arg3 arg4
Or even quicker, you can type 'f2r5', where 'f2' moves the cursor right to next occurrence of character '2' and 'r5' replaces the character under the cursor with character '5'.
Example 3:
Suppose you typed a longer command and noticed that you had made several mistakes, and wanted to do the corrections in the vi editor itself. In command mode you can type 'v' to edit the command in the editor and not on the command line!
Example 4:
Suppose you typed a long command and remembered that you had to execute another one before it. No need to erase the current command! You can switch to command mode by hitting ESC and then type '#' which will send the current command as a comment to the command history. After you type the command you had forgotten, you may go two commands back in history by typing 'kk' (or '2k'), erase the '#' character which was prepended to make it a comment, and execute the command. This makes the whole sequence look like 'ESC 2k0x ENTER'.
These are really basic examples, and it doesn't get much more complex than this. You should check out the cheat sheet for other tips and examples, and try them out!
To create the cheat sheet, I downloaded bash-2.05b source code and scanned through lib/readline/vi_keymap.c source code file and lib/readline/vi_mode.c to find all the default key bindings.
It turned out that the commands documented in vi_keymap.c were all documented in man 3 readline and I didn't find anything new.
After that I checked bashline.c source file function initialize_readline to find how the default keyboard shortcuts were changed. I found that 'CTRL-e' (which switched from vi mode to emacs) got undefined, 'v' got defined which opens the existing command in the editor, and '@' which replaces a macro key (char) with the corresponding string.
The cheat sheet includes:
- Commands for entering input mode,
- Basic movement commands,
- Character finding commands,
- Deletion commands,
- Undo, redo and copy/paste commands,
- Commands for history manipulation,
- Completion commands,
- A few misc. commands, and
- Tips and examples
Download Vi Editing Mode Cheat Sheet
PDF format (.pdf):
Download link: bash vi editing mode cheat sheet (.pdf)
Downloaded: 92993 times
ASCII .txt format:
Download link: bash vi editing mode cheat sheet (.txt)
Downloaded: 17528 times
LaTeX format (.tex):
Download link: bash vi editing mode cheat sheet (latex .tex)
Downloaded: 4751 times
This cheat sheet is released under GNU Free Documentation License.

I have started working on my bachelor's thesis in physics. It's about using genetic algorithms for finding optimal solutions to physics problems. One of the problems I will be solving is simulating equilibrium configurations of two dimensional systems of particles which can attract and repel (dipole systems). This problem is NP-hard, which means there is no known efficient algorithm for finding the exact solution in an adequate time. Several other algorithms can be used to approximate the solution and find near-equilibrium configurations, one of them being the genetic algorithm technique.
The easiest situation is when the particles are confined to a circle whose border they can't cross. The goal is to find how these particles will position themselves inside this circle.
This case has already been tackled using the simulated annealing method and the near-optimum solutions for hundreds of particles have been found. This method uses different principles than the genetic algorithm method which I will describe shortly. The main idea of the simulated annealing method is that the system is given an initial temperature which basically controls how much the particles will fluctuate inside the system. The greater the temperature, the bigger the fluctuations. The temperature gradually gets decreased and the fluctuations get smaller and smaller. Calculating the energy of the system and requiring it to be minimal at each step the temperature gets decreased eventually leads to a solution.
Here is what the solutions for 10, 11, 12, 13, 14 and 15 particles look like, found using simulated annealing algorithms:

Notice how interestingly and non-intuitively the particles position themselves in a case when another particle gets added to a system of 12 particles. Instead of positioning the new particle somewhere in the middle as in transition from 11 particles to 12, the system decides to put it on the border.
In my thesis I will be using genetic algorithms to arrive at the solution for this problem.
Here is a general (not related to my topic of thesis) description of what genetic algorithms are.
Genetic Algorithms 101
Genetic algorithms mimic evolution by natural selection. The basic idea of genetic algorithms is very simple. Genetic algorithms feature populations of individuals which evolve with the use of the principles of selection, variation and inheritance.
One of the ways to implement this idea in computer programs is to represent individuals as strings of binary digits. Each bit in the string represents one gene. Each individual is assigned a numerical evaluation of its merit by a fitness function. The fitness function determines how each gene of an individual will be interpreted and, thus, what specific problem the population will evolve to solve.
Once all individuals in the population have been evaluated, their fitness values are used for selection. Individuals with low fitness get eliminated and the strongest get selected. Inheritance is implemented by making multiple copies of high-fitness individuals. The high-fitness individuals get mutated and crossed over to produce a new population of individuals. Mutation is implemented as flipping individual bits in the binary string representation of an individual and crossover happens as an exchange of binary substrings of two individuals to obtain a new offspring.
By transforming the previous set of individuals to a new one, the algorithm generates a new set of individuals that tend to have better fitness than the previous set of individuals. When the transformations are applied over and over again, the individuals in the population tend to represent improved solutions to whatever problem was posed in the fitness function.
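The steps above can be sketched in a few lines of bash. This is only a toy under stated assumptions and has nothing to do with the particle problem: the fitness function simply counts the '1' bits (so the population evolves toward a string of all ones), and the population size, survivor count, mutation rate and function names are all made up for illustration. It assumes bash 4 or newer for 'mapfile'.

```shell
#!/usr/bin/env bash
# Toy genetic algorithm: evolve 16-bit strings toward all ones.
# The fitness function, sizes and rates are illustrative assumptions.

BITS=16       # genes per individual
POP=20        # individuals per generation
KEEP=10       # high-fitness individuals that survive selection

# Fitness: the number of '1' genes in the individual.
fitness() {
    local ones=${1//0/}          # delete every '0'
    echo "${#ones}"
}

# Mutation: flip the bit at position $2 (0-based) of string $1.
flip_bit() {
    local s=$1 i=$2
    echo "${s:0:i}$(( 1 - ${s:i:1} ))${s:i+1}"
}

# Crossover: head of parent $1 up to cut point $3, then tail of parent $2.
crossover() {
    echo "${1:0:$3}${2:$3}"
}

# A random individual, built one bit at a time.
random_individual() {
    local s='' i
    for (( i = 0; i < BITS; i++ )); do s+=$(( RANDOM % 2 )); done
    echo "$s"
}

# Selection: rank the population by fitness and keep the best KEEP.
select_best() {
    local ind
    for ind in "$@"; do echo "$(fitness "$ind") $ind"; done |
        sort -rn | head -n "$KEEP" | cut -d' ' -f2
}

population=()
for (( n = 0; n < POP; n++ )); do population+=("$(random_individual)"); done

for (( gen = 1; gen <= 30; gen++ )); do
    mapfile -t parents < <(select_best "${population[@]}")
    population=("${parents[@]}")            # inheritance: survivors carry over
    while (( ${#population[@]} < POP )); do
        a=${parents[RANDOM % KEEP]}
        b=${parents[RANDOM % KEEP]}
        child=$(crossover "$a" "$b" $(( RANDOM % BITS )))
        # mutate roughly one child in five
        (( RANDOM % 5 == 0 )) && child=$(flip_bit "$child" $(( RANDOM % BITS )))
        population+=("$child")
    done
done

best=$(select_best "${population[@]}" | head -n 1)
echo "best: $best (fitness $(fitness "$best"))"
```

Because the survivors are copied into the next generation unchanged, the best fitness in the population never goes down from one generation to the next. A real problem would replace 'fitness' with something that decodes the bits into a candidate solution and scores it.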
Here is an illustration of how a genetic algorithm might work:

I look forward to finding out what other interesting projects I can build with what I have learnt.
PS. I am writing blog posts less often now because I am really, really busy with studies. Sorry about that.


