introduction to sqlite database rdbms

If you have been following my blog, you might have noticed that almost all of my projects use the SQLite database engine.

My projects are relatively tiny and low-traffic, and their data is mostly read, not written. These characteristics make SQLite the perfect database for my projects.

In case you did not know, an SQLite database is self-contained within a single file! There are no configuration woes, no network security to worry about, and no hundreds of pages of documentation. It's just a single file!
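To see the single-file claim in action, here is a small sketch in Python using the standard `sqlite3` module (the post itself uses Perl; Python is used here only for illustration). It creates a database in a temporary directory, closes it, and checks that the directory holds exactly one file whose first 16 bytes are the SQLite header string. The file name `demo.db` and the `posts` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def create_single_file_db(directory):
    """Create a tiny database in `directory` and return (files, header)."""
    path = os.path.join(directory, "demo.db")  # hypothetical file name
    conn = sqlite3.connect(path)  # creates the file if it does not exist
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO posts (title) VALUES ('hello')")
    conn.commit()  # the temporary rollback journal is removed on commit
    conn.close()
    # Every SQLite database file starts with this 16-byte magic string
    with open(path, "rb") as f:
        header = f.read(16)
    return sorted(os.listdir(directory)), header


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files, header = create_single_file_db(d)
        print(files)   # ['demo.db']
        print(header)  # b'SQLite format 3\x00'
```

Copying that one file is all it takes to move or back up the whole database (as long as nothing is writing to it at the time).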

See the Distinctive Features of SQLite and Appropriate Uses for SQLite pages to learn when SQLite is a good fit and when it is not.

Here is the lecture by Richard Hipp, the author of SQLite:

Here are some interesting facts from the lecture:

  • [02:50] SQLite is designed to be embedded; it's less than 250 KB in size.
  • [08:00] Uncommon SQLite uses (this part interested me the most): stand-in for a client-server DBMS during testing/debugging. Local database caching. Implementing complex data structures. Sorting large amounts of data. Configuration files. IPC via database. Application file formats.
  • [14:06] SQLite is very convenient to use as a tool to teach the basics of SQL, as it just works.
  • [19:32] Unusual features of SQLite: SQLite does not enforce column data types (you can store a string in an integer column, for example). Instead, SQLite applies type affinity to data inserted into columns. The 'sqlite_master' table stores information about tables. You can attach to multiple databases simultaneously via the ATTACH command, then join or copy across the open databases (for example, to hot-backup a database).
  • [24:40] Anatomy of an SQL database engine.
  • [27:00] SQLite compiles queries to byte code (which can be viewed via the EXPLAIN statement) to be executed in a virtual machine.
  • [28:20] Observations of SQLite: trouble with licensing. It is much easier to generate optimal code for a register-based virtual machine than for a stack-based one. Dynamic typing in databases is a really good thing. Regression tests allow rewriting large parts of SQLite without minor version releases.
  • [36:30] Q and A!
  • [36:35] Is there an ORM tool available for SQLite?
  • [39:30] How is dynamic typing better than static typing in databases?
  • [41:32] What did you mean by 'complex data types'?
  • [43:15] Why is a register-based virtual machine better than a stack-based one?
  • [44:22] Why does SQLite only parse foreign keys but not enforce them?
  • [46:08] Is SQLite an in-memory database?
  • [46:50] What's the future of SQLite?
  • [48:10] My SQLite DB got corrupted, what do I do?
  • [49:30] When does the DB roll back in case of power failure?
  • [50:30] What happens if there is a second power failure while rolling back the queries from the previous power failure?
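A few of the "unusual features" from the list can be tried directly. The sketch below, again using Python's standard `sqlite3` module for illustration, reads table definitions from `sqlite_master`, ATTACHes a second in-memory database and copies a table into it, and collects the virtual machine opcodes that EXPLAIN returns. The `notes` table and the `backup` schema name are made up; the connection is opened in autocommit mode (`isolation_level=None`) because SQLite refuses to ATTACH a database while a transaction is open.

```python
import sqlite3


def demo_unusual_features():
    # Autocommit mode: ATTACH cannot run inside an open transaction
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES ('hello')")

    # sqlite_master holds one row per table/index, including its CREATE statement
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()

    # Attach a second database (in memory here) and copy a table across
    conn.execute("ATTACH DATABASE ':memory:' AS backup")
    conn.execute("CREATE TABLE backup.notes AS SELECT * FROM main.notes")
    copied = conn.execute("SELECT body FROM backup.notes").fetchall()

    # EXPLAIN returns the byte code program; column 1 of each row is the opcode
    opcodes = [row[1] for row in conn.execute("EXPLAIN SELECT body FROM notes")]

    conn.close()
    return tables, copied, opcodes


if __name__ == "__main__":
    tables, copied, opcodes = demo_unusual_features()
    print(tables)
    print(copied)
    print(opcodes)
```

Printing `opcodes` shows the register-machine program SQLite generated for the SELECT; the exact opcodes vary between SQLite versions.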

A few notes from me.

The usage of 'manifest typing' really confused me in this lecture because I, and most of the people I have talked to, use this term to mean 'static typing'. The author of SQLite uses it to mean 'dynamic typing'. I don't know why...
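In SQLite's own documentation, "manifest typing" means that the datatype belongs to each stored value rather than to the column that holds it, which is why it lines up with what most of us would call dynamic typing. The small Python sketch below (standard `sqlite3` module; the table `t` is hypothetical) stores four different values in one INTEGER column and asks SQLite for each value's storage class with `typeof()`. Note how type affinity turns the string '17' into an integer, while 'hello' and 3.5 keep their own types.

```python
import sqlite3


def storage_classes():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    # One INTEGER column, four differently typed values
    conn.executemany("INSERT INTO t (n) VALUES (?)",
                     [(42,), ("17",), ("hello",), (3.5,)])
    rows = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
    conn.close()
    return rows


for value, storage_class in storage_classes():
    print(repr(value), storage_class)
# 42 integer
# 17 integer
# 'hello' text
# 3.5 real
```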

An SQLite database can be managed via the sqlite (or sqlite3) command line tool or a GUI tool such as SQLite Browser (primitive), SQLiteSpy (advanced) or SQLite Manager (a Firefox add-on).

Finally, here are a few articles you should read if you are interested in more advanced SQLite details:

I hope you enjoyed it and have fun using SQLite for your next project!

musical geek friday - jonathan coulton - code monkey

This week on Musical Geek Friday we have the Code Monkey song!

This song was written by Jonathan Coulton as part of his "Thing a Week" musical project, in which he wrote a new song every week and put it on his website. With this song, Jonathan instantly became an Internet hero. He made it to Slashdot and even into The New York Times (the article includes a video interview with him)!

This song is about a type of programmer known as "Code Monkeys". These programmers code without passion, just for money. A person can also be called a code monkey if he's a newbie or is only able to produce low-quality code.

Here it is, the Code Monkey song:


Download this song: code monkey.mp3 (musical geek friday #3)
Downloaded: 30842 times

Download lyrics: code monkey lyrics (musical geek friday #3)
Downloaded: 5976

Lyrics of the "Code Monkey" song:

Code Monkey get up get coffee
Code Monkey go to job
Code Monkey have boring meeting
With boring manager Rob

Rob say Code Monkey very diligent
But his output stink
His code not functional or elegant
What do Code Monkey think?

Code Monkey think maybe manager want to write god damned login page himself
Code Monkey not say it out loud
Code Monkey not crazy, just proud
Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you
Code Monkey like you, youuuuuuuuuuu

Code Monkey hang around at front desk
Tell you sweater look nice
Code Monkey offer buy you soda
Bring you cup, bring you ice

You say no thank you for the soda cause
Soda make you fat
Anyway you busy with the telephone
No time for chat

Code Monkey have long walk back to cubicle he sit down pretend to work
Code Monkey not thinking so straight
Code Monkey not feeling so great
Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you
Code Monkey like you a lot

Code Monkey have every reason
To get out this place
Code Monkey just keep on working
See your soft pretty face
Much rather wake up, eat a coffee cake
Take bath, take nap
This job fulfilling in creative way
Such a load of crap

Code Monkey think someday he have everything even pretty girl like you
Code Monkey just waiting for now
Code Monkey say someday, somehow
Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you
Code Monkey like youuuuuuuuuuuuuuuuuu

If you liked the song, check out YouTube: people have posted videos of themselves dancing to it and have even made computer animations for it. See Code Monkey Videos on YouTube.

Here is my favorite YouTube video of this song:


Have fun and until next geeky Friday!

coding horror keyword analysis

I subscribe to quite a few programming blogs, one of them being Coding Horror.

Coding Horror is written by a guy named Jeff Atwood (you probably knew that already), and his blog has received massive attention, gathering 93 thousand (whoa!) feed subscribers (as of April 2008).

One thing that caught my attention on the Coding Horror blog is that its traffic stats are publicly available!

coding horror traffic statistics

The statistics are hosted by StatCounter, which keeps only the last 500 entries of any traffic activity.

recent keyword activity on codinghorror

I wanted a clearer picture of the most popular keywords that people searched for before landing on the Coding Horror blog.

Thirty minutes later, I had written a Perl program that accessed the statistics, parsed the "Recent Keyword Activity" page, extracted the keywords, and inserted them into an SQLite database.

I always love to describe how my programs work. I'll make it short this time, as we are concentrating on the statistics and not on programming.

The Perl Program

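The Perl program uses (or reuses) a few CPAN modules:

  • DBI (with the DBD::SQLite driver): to talk to the SQLite database
  • WWW::Mechanize: to browse the statistics pages
  • HTML::TreeBuilder: to parse the HTML of the keyword pages
  • Date::Parse: to turn the dates into Unix timestamps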

The program takes two optional arguments:

  • -nodb: do not insert the keywords into the database (just print them out)
  • number: the number of pages to extract keywords from

Here is the source code of the program:

# Peteris Krumins (, 2008
#  --  good coders code, great reuse
# Access traffic statistics and extract a few pages of latest search queries
# Released under GNU GPL
# 2008.04.08: Version 1.0

# run it as 'perl [-nodb] [number of pages to extract]'
# -nodb specifies not to insert keywords in database, just print them to stdout

use strict;
use warnings;

use DBI;
use WWW::Mechanize;
use HTML::TreeBuilder;
use Date::Parse;

# URL to publicly available codinghorror's statcounter stats
my $login_url = '';

# Query used to INSERT a new keyword in the database
my $insert_query = 'INSERT OR IGNORE INTO queries (query, unix_date, human_date) VALUES (?, ?, ?)';

# Path to SQLite database
my $db_path = 'codinghorror.db';

# Insert queries in database or not? Default, yes.
my $do_db = 1;

# Number of pages of keywords to extract. Default 1.
my $pages = 1;

for (@ARGV) {
    $pages = $_ if /^\d+$/;
    $do_db = 0 if /-nodb/;
}

my $dbh;
$dbh = DBI->connect("dbi:SQLite:$db_path", '', '', { RaiseError => 1 }) if $do_db;

# autocheck => 0, so that the explicit error checks below handle failures
my $mech = WWW::Mechanize->new(autocheck => 0);
my $login_req = $mech->get($login_url);

unless ($mech->success) {
    print STDERR "Failed getting $login_url:\n";
    print STDERR $login_req->message, "\n";
    exit 1;
}

unless ($mech->content =~ /Coding Horror/i) {
    # Could not access Coding Horror's stats
    print STDERR "Failed accessing Coding Horror stats\n";
    exit 1;
}

my $kw_req = $mech->follow_link(text => 'Recent Keyword Activity');
unless ($kw_req && $mech->success) {
    print STDERR "Couldn't find 'Recent Keyword Activity' link\n";
    print STDERR $kw_req->message, "\n" if $kw_req;
    exit 1;
}

for my $page (1..$pages) {
    my $tree = HTML::TreeBuilder->new_from_content($mech->content);
    my $td_main_panel = $tree->look_down('_tag' => 'td', 'class' => 'mainPanel');
    unless ($td_main_panel) {
        print STDERR "Unable to find '<td class=mainPanel>'\n";
        exit 1;
    }
    my $table = $td_main_panel->look_down('_tag' => 'table', 'class' => 'standard');
    unless ($table) {
        print STDERR "Unable to find 'table' tag\n";
        exit 1;
    }
    my @trs = $table->look_down('_tag' => 'tr');
    my $idx = 0;
    for my $tr (@trs) {
        # skip the header row
        next unless $idx++;
        my @tds = $tr->look_down('_tag' => 'td');
        unless (@tds == 6) {
            print STDERR "<td> count was not 6!\n";
            next;
        }
        # cells 1 and 2 hold the date and time, cell 4 holds the search query
        my ($date, $time, $query) = map { $_->as_text } (@tds[1..2], $tds[4]);
        next unless $query;
        # the page shows no year, so assume the current one
        my $year = (localtime)[5] + 1900;
        my $ydt = "$date $year $time";
        my $unix_date = str2time($ydt);
        print "$date $year $time: $query\n";
        $dbh->do($insert_query, undef, $query, $unix_date, $ydt) if $do_db;
    }
    # free the memory held by the parse tree
    $tree->delete;

    if ($page != $pages) {
        my $page_req = $mech->follow_link(text => $page + 1);
        unless ($page_req) {
            print STDERR "Couldn't find page ", $page + 1, " of keywords", "\n";
            exit 1;
        }
    }
}

$dbh->disconnect if $dbh;

Download: coding horror keyword scraper

Here is an example run of the program:

$ ./ -nodb 2
8 Apr 2008 03:50:54: media player
8 Apr 2008 03:50:53: physical working environment programmers
8 Apr 2008 03:50:26: nano itx case
8 Apr 2008 03:50:23: how to clean some internet spyware or adware infection
8 Apr 2008 03:50:23: mercurial install tutorial windows
8 Apr 2008 03:50:22: iis 5.1 multiple websites
8 Apr 2008 03:50:17: javascript integer manipulation comparision
8 Apr 2008 03:50:16: build machines pc
8 Apr 2008 03:50:14: manage remote desktop connections
8 Apr 2008 03:50:07: check that all variables are initialized
8 Apr 2008 03:50:00: powergrep older version
8 Apr 2008 03:49:43: software counterfeiting
8 Apr 2008 03:48:59: floppy emulator windows xp
8 Apr 2008 03:48:35: safari rendering cleartype
8 Apr 2008 03:48:18: captchas goole broken
8 Apr 2008 03:48:11: vs2005 ide color
8 Apr 2008 03:47:55: optimising dual core for cubase sx3
8 Apr 2008 03:47:44: micosoft project scheduling
8 Apr 2008 03:47:36: dont buy from craig at australian computer resellers
8 Apr 2008 03:47:32: large scale stored procedures
8 Apr 2008 03:47:31: free diff tool
8 Apr 2008 03:46:58: games that support 3 monitors
8 Apr 2008 03:46:56: firefox multiple times same stylesheet
8 Apr 2008 03:46:48:
8 Apr 2008 03:46:37: apple software serial code blocker
8 Apr 2008 03:46:31: beautiful code jon bentley
8 Apr 2008 03:46:28: system.web.httpparseexception
8 Apr 2008 03:46:23: round in
8 Apr 2008 03:46:15: project postmortem software
8 Apr 2008 03:45:43: programming fun
8 Apr 2008 03:45:33: sending messages over ip using command prompt
8 Apr 2008 03:45:26: where did horror develop?
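The heart of the scraper is the row-extraction loop: skip the header row, require six cells per row, and take the date, time and query from cells 1, 2 and 4. Here is a rough equivalent in Python using only the standard library's `html.parser`, offered as a sketch rather than a port. It assumes it is handed just the keyword table's HTML (the Perl code first locates `<td class="mainPanel">` and `<table class="standard">`, which this sketch skips), and the sample markup is made up.

```python
from html.parser import HTMLParser


class KeywordTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the <td> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def extract_keywords(table_html, year):
    """Return (date string, query) pairs, mirroring the Perl loop."""
    parser = KeywordTableParser()
    parser.feed(table_html)
    parser.close()
    results = []
    for cells in parser.rows[1:]:  # skip the header row
        if len(cells) != 6:  # like the Perl code's six-cell check
            continue
        date, time, query = cells[1], cells[2], cells[4]
        if query:
            # the page shows no year, so the caller supplies one
            results.append((f"{date} {year} {time}", query))
    return results


sample = (  # made-up markup shaped like the keyword table
    "<table><tr><th>#</th><th>Date</th><th>Time</th><th>-</th><th>Query</th><th>-</th></tr>"
    "<tr><td>1</td><td>8 Apr</td><td>03:50:54</td><td>-</td>"
    "<td><a href='#'>media player</a></td><td>-</td></tr></table>"
)
print(extract_keywords(sample, 2008))  # [('8 Apr 2008 03:50:54', 'media player')]
```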

The SQLite Database

The database has just one table, called 'queries', which contains 'query', 'unix_date' and 'human_date' columns (plus an integer 'id' primary key). The 'unix_date' column is used for sorting the entries chronologically, and 'human_date' is there just so I can easily see the date.

Here is the schema of the database:

CREATE TABLE queries (id INTEGER PRIMARY KEY, query TEXT, unix_date INTEGER, human_date TEXT);
CREATE UNIQUE INDEX unique_query_date ON queries (query, unix_date);

As the Perl program is run periodically, it might extract the same keywords several times. I created a UNIQUE index on the 'query' and 'unix_date' fields and left it to SQLite to drop the duplicate records.

The Perl program uses the following SQL query to insert the data into the database:

INSERT OR IGNORE INTO queries (query, unix_date, human_date) VALUES (?, ?, ?)

The 'OR IGNORE' clause makes sure that duplicate records get silently discarded.
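The dedup-by-index trick is easy to try out. This Python sketch (standard `sqlite3` module) creates the same table and UNIQUE index, then loads two overlapping "runs" with the same INSERT OR IGNORE statement. The keyword rows and the small timestamps are made up for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE queries (id INTEGER PRIMARY KEY, query TEXT, unix_date INTEGER, human_date TEXT);
CREATE UNIQUE INDEX unique_query_date ON queries (query, unix_date);
"""
INSERT = ("INSERT OR IGNORE INTO queries (query, unix_date, human_date) "
          "VALUES (?, ?, ?)")


def load_runs(runs):
    """Insert each run of (query, unix_date, human_date) rows; return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for run in runs:
        with conn:  # one transaction per run, committed on exit
            conn.executemany(INSERT, run)
    count = conn.execute("SELECT count(*) FROM queries").fetchone()[0]
    conn.close()
    return count


# Made-up rows: the second run repeats one (query, unix_date) pair from the first
first_run = [("media player", 1000, "t1"), ("nano itx case", 990, "t0")]
second_run = [("media player", 1000, "t1"), ("media player", 1060, "t2")]
print(load_runs([first_run, second_run]))  # 3: the repeated row was ignored
```

Note that the same keyword searched at a different time (1060 above) is still stored, since uniqueness is on the pair of columns.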

Simple Statistics

I have been collecting keywords since March 31, and the database has now grown to 73,336 records and 7 MB (3 MB compressed).

Download: coding horror keyword database (.zip)

I ran a few simple SQL queries against the data in the SQLite Database Browser GUI to find the most popular keywords. I recommend downloading it if you want to play around with the database.

The first query selected the 15 most popular keywords, along with their counts and their percentage of all keywords.

The following SQL query did it:

SELECT
 count(query) c,
 (round(count(query)/(1.0*(select count(*) from queries)),3)*100) || '%',
 query
FROM queries
GROUP BY query
ORDER BY c DESC
LIMIT 15

most popular coding horror’s keywords (sql query in sqlite database browser)
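The same "top N with percentage" query can be run from any SQLite binding. Below is a Python sketch with the standard `sqlite3` module against a tiny made-up `queries` table. The LIMIT is passed as a bound parameter instead of being hard-coded, and the column order (count, percentage, keyword) is an assumption.

```python
import sqlite3

TOP_KEYWORDS = """
SELECT count(query) c,
       (round(count(query) / (1.0 * (SELECT count(*) FROM queries)), 3) * 100) || '%',
       query
FROM queries
GROUP BY query
ORDER BY c DESC
LIMIT ?
"""


def top_keywords(keywords, limit):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, query TEXT)")
    conn.executemany("INSERT INTO queries (query) VALUES (?)",
                     [(k,) for k in keywords])
    rows = conn.execute(TOP_KEYWORDS, (limit,)).fetchall()
    conn.close()
    return rows


sample = ["system idle process"] * 4 + ["coding horror"] * 3 + ["fizzbuzz"]
for row in top_keywords(sample, 15):
    print(row)
# (4, '50.0%', 'system idle process')
# (3, '37.5%', 'coding horror')
# (1, '12.5%', 'fizzbuzz')
```

Multiplying by 1.0 forces floating-point division; without it, SQLite would divide two integers and every percentage would come out as 0.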

I also made a bar chart using the public Google Charts API:

This chart would look much better if it had vertical bars. I couldn't figure out how to add keywords nicely below each bar, though.

Here is what the messy query to the Google Charts API looks like:

's%20Top%2015%20Keywords&cht=bhs&chd=t:100,77,12.07,10.18,9.09,8.74,8.64,8.49,7.05,6.51,5.91,5.71,5.66,5.61,5.22&chs=400x450&chxt=x,y&chxl=0:|0|2013|1:|command%20prompt%20commands|registration%20keys|cmd%20tricks|vista%20media%20center|sql%20joins|command%20prompt|you%20may%20be%20a%20victim...|codinghorror|dual%20core%20vs%20quad%20core|quad%20core%20vs%20dual%20core|cmd%20commands|command%20prompt%20tricks|system%20idle%20processes|coding%20horror|system%20idea%20process
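Reading the data back out of that URL, the `chd` values are the keyword counts scaled so that the top keyword (2013 searches) becomes 100: 1550 / 2013 * 100 is about 77, and 243 / 2013 * 100 is about 12.07. The Python sketch below rebuilds a URL of that shape from raw counts using `urllib.parse`. The parameter names (`chtt`, `cht=bhs`, `chd`, `chs`, `chxt`, `chxl`) are copied from the URL above; the start of the address is cut off in the post, so `CHART_BASE_URL` is a hypothetical placeholder, and reversing the labels simply mirrors the order used in the original URL.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the chart service address is cut off in the post
CHART_BASE_URL = "https://charts.example.invalid/chart"


def scale_to_max(counts):
    """Scale counts so the largest becomes 100, rounded to 2 decimals."""
    top = max(counts)
    return [round(c * 100 / top, 2) for c in counts]


def bar_chart_url(title, labels, counts, size="400x450"):
    """Build a chart URL with the same parameters as the post's URL."""
    scaled = scale_to_max(counts)
    params = {
        "chtt": title,
        "cht": "bhs",
        "chd": "t:" + ",".join(f"{v:g}" for v in scaled),
        "chs": size,
        "chxt": "x,y",
        # axis 0 (x) spans 0..top count; axis 1 (y) gets the keyword labels,
        # listed in reverse of the data order, as in the post's URL
        "chxl": f"0:|0|{max(counts)}|1:|" + "|".join(reversed(labels)),
    }
    # quote (not quote_plus) encodes spaces as %20; keep : , | readable
    return CHART_BASE_URL + "?" + urlencode(params, safe=":,|", quote_via=quote)


print(bar_chart_url("Top Keywords",
                    ["system idle process", "coding horror", "system idle processes"],
                    [2013, 1550, 243]))
```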

To illustrate another way to work with an SQLite database, I ran the same kind of query from the command line and listed the top 50 most popular keywords. Here they are:

$ sqlite3 ./codinghorror.db
sqlite> .header ON
sqlite> .explain ON
sqlite> SELECT count(query) c, query FROM queries GROUP BY query ORDER BY c DESC LIMIT 50;
c     query
----  -------------
2013  system idle process
1550  coding horror
243   system idle processes
205   command prompt tricks
183   cmd commands
176   quad core vs dual core
174   dual core vs quad core
171   codinghorror
142   you may be a victim of software counterfeiting
131   command prompt
119   sql joins
115   vista media center
114   cmd tricks
113   registration keys
105   command prompt commands
105   jeff atwood
99    quad core
96    dell xps m1330 review
89    rainbow tables
84    what is system idle process
82    software counterfeiting
80    fizzbuzz
78    laptop power consumption
77    quad core vs duo core
75    sql join
74    dell xps m1330
74    hard drive temperature
74    vista memory usage
73    source control
70    linked in
69    pontiac aztec
66    pontiac aztek
64    m1330 review
63    cracking
61    consolas
60    captcha
56    hyperterminal
56    ikea jerker
55    code horror
55    polling rate
55    source safe
54    coding horrors
54    dual core or quad core
54    programming quotes
54    visual source safe
53    logparser
51    sourcesafe
51    superfetch
51    three monitors
50    windows experience index

Knowing the most popular keywords can give you some hints about what topics to write about on your blog. For example, an article named 'Windows Command Prompt Tricks' would start bringing in good traffic from search engines instantly!

I ran another batch of queries to find the most popular programming languages among Coding Horror's search keywords. I put the languages I could think of in a langs.txt file and ran the following Perl one-liner:

$ perl -MDBI -wlne 'BEGIN { $, = q/ /; $dbh = DBI->connect(q/dbi:SQLite:codinghorror.db/); } print +($dbh->selectrow_array(q/SELECT count(query) FROM queries WHERE query LIKE ? OR query LIKE ? OR query LIKE ? OR query LIKE ?/, undef, $_, "$_ %", "% $_", "% $_ %"))[0], $_' langs.txt | sort -n -r

It produced the following output:

1127 visual studio
1087 c#
407 c
287 javascript
239 java
139 asp
104 visual basic
59 php
44 ruby
42 python
26 perl
22 lisp
19 erlang
3 pascal
1 tcl
1 prolog
0 ml
0 haskell

I added 'visual studio' to the list of programming languages, as every beginner thinks it actually is a programming language. There were no keywords matching 'C++' because most search engines treat '+' as an operator rather than as part of the search string.
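The one-liner's four LIKE patterns amount to a whole-word match: the language name is the entire query, starts it, ends it, or appears in the middle surrounded by spaces. Here is the same idea as a Python sketch with the standard `sqlite3` module and bound parameters; the sample queries are invented. Keep in mind that SQLite's LIKE is case-insensitive for ASCII letters by default, and that a name containing `%` or `_` would be treated as a wildcard.

```python
import sqlite3

COUNT_SQL = ("SELECT count(query) FROM queries "
             "WHERE query LIKE ? OR query LIKE ? OR query LIKE ? OR query LIKE ?")


def rank_languages(conn, languages):
    """Return (count, language) pairs, most searched first (like `sort -n -r`)."""
    counts = []
    for lang in languages:
        # whole query, at the start, at the end, or in the middle of the query
        patterns = (lang, f"{lang} %", f"% {lang}", f"% {lang} %")
        (n,) = conn.execute(COUNT_SQL, patterns).fetchone()
        counts.append((n, lang))
    return sorted(counts, reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (query TEXT)")
conn.executemany("INSERT INTO queries VALUES (?)", [
    ("python",), ("learn python fast",), ("python tutorial",),
    ("cpython internals",), ("c# generics",), ("objective c",),
])
print(rank_languages(conn, ["python", "c#", "c"]))
```

'cpython internals' is not counted for python, and 'c# generics' is not counted for c, which is exactly what the space-delimited patterns are for.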

I must say that Python is the answer to life, the universe and everything, as it was searched for 42 times!

Here is the same data put on a chart:

Here are some of the most popular search queries among programming languages:

I suggest that you download the keyword database and analyze the data that interests you the most yourself!


Download Perl program: coding horror keyword scraper
Downloaded: 7365 times

Download SQLite database(3 MB): coding horror keyword database (.zip)
Downloaded: 892


leech access - coming at you (leech axss - comin at choo)

Continuing my Friday geek music series, I am presenting to you a very geeky hip-hop song about downloading pirated stuff, such as music, software and movies (so-called "warez"), off the net.

The song was written by a group calling themselves Leech Axss, and it's called "Leech Axss - Coming@Choo".

This song is NSFW (not safe for work), as it contains explicit language! You can still listen to it on your headphones, though. :)

As I mentioned in my first geek music post, I won't just post the song; I'll also provide a little insight into it.

This song is about a lamer trying to gain leech access to some guy's warez ftp server. Access to a site with hundreds of gigabytes of warez, without any intention of uploading new content, is usually called "leech access". It's every beginner's dream to have leech access to some server. Unfortunately, if you are not already well respected, you can't just get leech access. To get access, you must provide some value to the site; for example, you must upload some 0day stuff. Digital content is called 0day if it gets distributed on warez servers before it is actually released by the company.

The lamer in this song is told to use his real email address as the ftp password (note that anonymous ftp access usually asks for an email address as the password). Being totally lame, he provides his real email, gets sent trojans and viruses, gets mail-bombed, and his machine finally gets owned.


Download this song: leech axss - coming at you.mp3 (musical geek friday #2)
Downloaded: 15644 times

Download lyrics (not censored): leech axss - coming at you lyrics (musical geek friday #2)
Downloaded: 8529

Here are the lyrics (I censored the explicit language; see the 'download lyrics' link above for the uncensored version):

where is my snare?
i have no snare in my headphones
oh, there's my snare
in my audio warez folder, ho ho ho ho ho

leech axss, leech axss, leech, leech axss

freebsd is da s**t to me
linux, stick it up in your a**, you get me
you came to f**k with me in the irc
that i didn't give you access to my ftp
little dood, with a f**kin' +v in your nick
you might as well be sucking my motherf**kin' d**k
message of the day says that you are lame
so prevent the pain and get a dc j
leech axss, ain't no dude to f**k with
leech axss, ain't no dude to chat with
'cause i'm downloading chicks-with-d**s.avi
and i'm loadin' edonkey my windows swap file
yo yo yo, where's your 0day
you ain't got no 0day, because you're gay
because you are afraid and so easy to break
make it easy to take over you pc and f**k it up straight

leech axss is comin' at you, your box is mine in minute or two
your firewalls are tumbling down, leeching all the 0day that is found
dvs in you mp3s, you gotta fear my leet-o skillz
comin' inside the megabytes, leech axss you just can't fight

leech axss, leech axss, leech, leech axss ho ho ho ho ho

just put your e-mail in the password-box
now i've got your info, b***h, thanks a lot
i'ma send you a motherf**king e-mail bomb
dos your isp, dada dam da dam
trojan horses and viruses are coming at you
gold-sex is the site where you gonna re-route
meanwhile i hax and gonna gain the root
"f**k you" is message before you reboot
whoops, did i open your cd-drive?
whoops, did i f**king read your mind?
two thousand messages in your icq
and your soundcard just lost the irq
these are the wicked ways of leech axss
i am leet - you're nothing but your daddy's ball sweat
check me in my channel, as me operate
and get more net sex than my n****r, bill gates

leech axss is comin' at you, your box is mine in minute or two
your firewalls are tumbling down, leeching all the 0day that is found
dvs in you mp3s, you gotta fear my leet-o skillz
comin' inside the megabytes, leech axss you just cant fight

leech axss is comin' at you, your box is mine in minute or two
your firewalls are tumbling down, leeching all the 0day that is found
dvs in you mp3s, you gotta fear my leet-o skillz
comin' inside the megabytes, leech axss you just can't fight

ctrl + alt + del


Have fun and until next geeky Friday!

guy l. steele jr. growing a language java acm talk

I found a really exciting video lecture by Guy L. Steele that I'd like to share with you. The title of the lecture is "Growing a Language".

The main question Guy Steele asks during the lecture is: "If I want to help other persons to write all sorts of programs, should I design a small programming language or a large one?" He answers that he should build neither a small nor a large language; instead, he needs to design a language that can grow. The main goal in designing a language should be to plan for growth. The language must start small, and it must grow as the set of its users grows.

As an example, he compares APL and Lisp. APL did not allow its users to grow the language in a "smooth" way: new primitives added by users did not look the same as the built-in primitives, and this made the language hard to grow. In Lisp, on the other hand, new words defined by the user look like language primitives, and language primitives look like user-defined words. This made it easy for users to extend the language, share their code, and grow the language.
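Steele's point about user-defined words looking like primitives is easy to see in any language with operator overloading (overloaded operators and intervals are both among the words he defines in the talk). As a small illustration in Python, not taken from the talk, a hypothetical `Interval` type overloads `+` and `*`, so code that uses it reads just like built-in arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] of real numbers, defined by the user."""
    lo: float
    hi: float

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # the product's bounds are the smallest and largest endpoint products
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


a = Interval(1, 2)
b = Interval(-3, 4)
print(a + b)  # Interval(lo=-2, hi=6)
print(a * b)  # Interval(lo=-6, hi=8)
```

Nothing at the call site reveals that `+` and `*` were added by a user rather than built into the language, which is the kind of smooth growth Steele asks for.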

Mr. Steele also prepared a PDF of his talk. Download it here (mirror, just in case: here).

He currently works at Sun Microsystems, where he is responsible for research in language design and implementation strategies. His bio page at Sun Microsystems says: "He has been praised for an especially clear and thorough writing style in explaining the details of programming languages." This lecture really shows it.

I understood what he was up to from the very beginning of the lecture, but only after the first ten minutes did Guy reveal that "his firm rule for this talk is that if he needs to use a word of two or more syllables, he must define it."

Another thing Guy Steele shows with this talk is how a small language restricts how expressively you can state your thoughts: first you must define a lot of new words before you can express yourself clearly and quickly.

Should a programming language be small or large? A small programming language might take but a short time to learn. A large programming language may take a long, long time to learn, but then it is less hard to use, for we then have a lot of words at hand — or, I should say, at the tips of our tongues — to use at the drop of a hat. If we start with a small language, then in most cases we can not say much at the start. We must first define more words; then we can speak of the main thing that is on our mind. [...] If you want to get far at all with a small language, you must first add to the small language to make a language that is more large.

He makes many more interesting points about how languages should be grown. Just watch the lecture!

He defined the following words during the lecture: woman, person, machine, other, other than, number, many, computer, vocabulary, language, define, program, definition, example, syllable, primitive, because, design, twenty, thirty, forty, hundred, million, eleven, thirteen, fourteen, sixteen, seven, fifty, ago, library, linux, operating system, cathedral, bazaar, pattern, datum, data, object, method, generic type, operator, overloaded, polymorphic, complex number, rational number, interval, vector, matrix, meta.