mysql performance tuningIn this post I'll cover a lecture on MySQL performance tuning.

This lecture is given by Jay Pipes. Jay works at MySQL and has written a book on MySQL. It's called Pro MySQL and it covers intermediate and advanced features of the database. He also has an interesting blog, which I am long subscribed to - Jay Pipes blog.

In this lecture Mr. Pipes talks about core concepts of profiling and benchmarking, about the most common sources of performance problems, about indexing, schema, coding guidelines, and a little about server parameter tuning.

Here is his talk at Google:

The most interesting performance tuning tips from the video:

  • [02:20] Don't benchmark without a goal. Have a goal like "improve performance by 20%". Otherwise you'll waste a lot of time tuning milliseconds out of your application.
  • [02:50] Change just one thing at a time and re-run the benchmarks.
  • [03:40] Disable the query cache by setting the cache size to 0 when running MySQL benchmarks.
  • [05:22] The best tool for profiling MySQL queries is the EXPLAIN command. Understand it!
  • [06:40] Log slow queries and use mysqldumpshow to parse the log. It also has an option (--log-queries-not-using-indexes) of logging any query that does not use an index on a table.
  • [07:40] Jeremy Zawodny wrote the mytop utility for monitoring the threads and overall performance of MySQL servers.
  • [08:55 && 11:30] Repeated queries on an un-indexed field will kill your application faster than anything else.
  • [09:30] Don't de-normalize just because you think it will be faster. Start with normalized database schemes.
  • [10:15] Server parameter tweaking is not a catch-all. Tuning server parameters can help but it's very specific to certain situations.
  • [12:05] If you use MyISAM storage engine, exploit covering indexes.
  • [12:50] Ensure good selectivity on index fields.
  • [14:45] On multi-column indexes, pay attention to order of fields within the index definition.
  • [15:40] Be aware that as your database grows, the data in the indexed fields can gradate, deteriorating the usefulness of that index. As you data grows, always examine if the indexes you originally thought are still relevant to the data.
  • [17:02] Example of a common index problem, where an index is created on multiple fields.
  • [20:30] Use the smallest data types possible. Don't use bigint, when int will do. Or, don't use char(200), when a varchar or smaller char() would do. Using the right type will fit more records in memory or index key block, meaning fewer reads, resulting in faster performance.
  • [21:30] Consider horizontally spitting many-columned tables if they contain a lot of NULLs or rarely used columns.
  • [23:35] Get rid of surrogate keys (with example).
  • [24:05 && 33:20] Be an SQL programmer who thinks in sets, not procedural programming paradigms.
  • [24:35] InnoDB can't optimize SELECT COUNT(*) queries. Use counter tables! That's how to scale InnoDB.
  • [27:20] Always try to isolate index fields on one side of condition in a query (with example).
  • [28:20] Avoid using CURRENT_DATE() as it invalidates the cache.
  • [29:34] Example of using calculated fields when searching on top level domain. Idea - put a reversed TLD in the table.
  • [33:20] Avoid correlated subqueries. Think in sets not loops! Here is a great article on visualizing SQL joins.
  • [34:50] Example of using derived tables to avoid correlated subqueries.
  • [36:25] Be aware of global and per-thread server variables.
  • [37:50] Enable query cache if your application is doing a lot more reads than writes!
  • [28:50] MySQL uses MyISAM for internal data storage.
  • [40:00] MySQL loves ram!
  • [40:35] Q and A.

Jay recently published slides from his Join-Fu talk. Go get them!

I enjoyed this talk a lot. I am an intermediate MySQL user and I had not read his book. It was really informative!

If you want to learn more about MySQL, and don't yet have his book, why not get his book:

reddit hacker topOver the last fortnight I have released two top-like applications to follow Reddit and Hacker News from the console. I called these applications Reddit Top and Hacker Top. I received a few emails and comments asking me to explain how the applications were made. I'll explain it in this article.

A few months ago, while I was creating the Reddit River website, I noticed that Python's standard library included a curses module. Having worked with curses and Curses Development Kit in C, I decided to refresh my curses skills, this time in Python.

Coding the application started with creating two separate Python modules for retrieving stories from Hacker News and Reddit.

If you look at the source code of Hacker Top and Reddit Top programs, you'll notice two Python modules called "" and "".

Both of these modules follow the same interface and provide function get_stories(). The core functionality of this function can be easily understood from this code fragment:

stories = []
for i in range(pages):
    content = _get_page(url)
    entries = _extract_stories(content)
    url = _get_next_page(content)
    if not url:

The function iterates over the given number of Reddit or Hacker News pages and creates a list of objects (of type Story) containing information about stories on each page.

This function takes two optional parameters 'pages' and 'new' ( also takes an optional 'subreddit' parameter). These parameters control how many pages of (new) stories to scrape.

Here is an example of using module to get the titles and scores of the first five most popular stories on Hacker News:

>>> from pyhackerstories import get_stories
>>> stories = get_stories()
>>> for story in stories[:5]:
...     print "%-3d - %s" % (story.score, story.title)
30  - Rutgers Graduate Student Finds New Prime-Generating Formula
59  - Xobni VP Engineering leaves for own startup
69  - The Pooled-Risk Company Management Company
14  - Jeff Bonforte, CEO of Xobni, explains why Gabor left
52  - Google's Wikipedia clone Knol launches.

Each Story object contains the following properties:

  • position - story position
  • id - identifier used by Hacker News or Reddit to identify the story
  • title - title of the story
  • url - web address of the story
  • user - username of the user who submitted the story
  • score - number of upvotes the story has received
  • human_time - time the story was submitted
  • unix_time - unix time the story was submitted
  • comments - number of comments the story has received

Here is another example of retrieving most active Reddit users on Programming Subreddit (based on 5 pages of stories):

>>> from pyredditstories import get_stories
>>> stories = get_stories(subreddit='programming', pages=5)
>>> userdict = {}
>>> for story in stories:
...     userdict[story.user] = userdict.setdefault(story.user, 0) + 1
>>> users = [(userdict[u], u) for u in userdict]
>>> users.sort()
>>> users.reverse()
>>> for user in users[:5]:
...  print "%s: %d" % (user[1], user[0])
<a href="">gst</a>: 19
<a href="">dons</a>: 6
<a href="">llimllib</a>: 4
<a href="">synthespian</a>: 3
<a href="">gthank</a>: 3

The and Python modules can also be used as standalone applications.

Executing with '--help' command line argument tells that:

$ ./  --help
usage: [options]

  -h, --help   show this help message and exit
  -sSUBREDDIT  Subreddit to retrieve stories from. Default:
  -pPAGES      How many pages of stories to output. Default: 1.
  -n           Retrieve new stories. Default: nope.

Here is an example of executing to get five most popular programming stories:

$ ./ -s programming | grep '^title' | head -5

title: "Turns out my nephew is really good with computers, so we're going to give him the job!"
title: Sun Microsystems funding Haskell on multicore OpenSPARC!
title: I love parser combinators [Haskell]
title: 2D collision detection for SVG - demo of intersection routines with SVG/Javascript
title: Priorities: Solaris vs Linux

That's about enough information about these modules to create all kinds of wonderful things! Just play around a little! :)


Now I'll briefly explain how the curses user interface was made. I'll cover only the ideas and will not go deeply into code.

hacker top, detailed mode

I started with reading Curses Programming with Python howto. This howto explained how to do all the basic curses operations in Python - how to get into curses mode, how to create windows, print colorful text and how to handle user input.

The user interface had two requirements - it had to be responsive while the new stories were being retrieved from the sites, and it had to be independent from the website, meaning that it should be easy to display data from a different website with minor (or no) modifications to the interface code.

The first requirement was satisfied by creating a separate thread (see Retriever class) which periodically called get_stories() and passed the stories to the interface via a queue.

The second requirement was already satisfied when I created the and modules. As I mentioned, these modules provided the same interface for retrieving stories and could be used almost interchangeably.

That's basically it! If you have any specific questions, feel free to ask them in the comments!

Ps. I am thinking of releasing Dzone Top and Digg Top, and then merge all these Top programs into a single social news console program. Are there any volunteers who would like to help me?

Download Hacker Top and Reddit Top Programs

Hacker Top

Download link: hacker top program (5889 downloads)

Reddit Top

Download link: reddit top program (4318 downloads)

These programs are released under GNU General Public License.

redditYes, I admit it, I am fanatic about programming! Being fanatic means that I am always staying current with programming news, I am doing a lot of experimentation, and I am being very playful with various programming languages.

Reddit has long been my favorite site for news but very frequently I found myself reading nonsense which had nothing to do with my real passion. I decided to take a radical step and turn Reddit into the definitive source for programming news, or the way I like to say, I decided to read Reddit the fanatic programmer way.

Here is how I did it.

I said no to all the default Reddit communities.

default reddit communities

And subscribed to various programming subreddits!

programming reddit communities
Note: I am also subscribed to math and physics subreddits as those subjects are my hobbies.

This picture includes only a fraction of the programming subreddits I subscribed to! Here is the full list:

I am also subscribed to some related communities:

Doing so turned Reddit's front page into a definitive source for programming news.

reddit’s front page (small)

I invite you to share your methods for following the latest programming buzz in the comments!

A Year of BloggingMy dear readers, it has been a year since I have been blogging here! With this post I want to share my blog statistics with you.

During this year (July 14, 2007 - July 20, 2008) I managed to write 58 posts, which received 808 comments. Based on statistics from Statcounter and Google Analytics, these posts received 574,874 views by 424,292 unique visitors.

Here is a Google Analytics graph showing monthly page views for this period (click for a larger version): page views graph (small)

Here are the top 5 countries my blog readers came from:

  • United States (195,076 visitors)
  • United Kingdom (25,988 visitors)
  • Canda (25,335 visitors)
  • India (13,753 visitors)
  • Germany (13,670 visitors)

Not surprisingly, the top 5 referring sites were all social media and bookmarking sites:

From all the visitors 48,386 went to my site directly and 82,502 were sent here by my darling Google.

During the first year, approximately 1000 people subscribed to my blog. Here is the Feedburner subscriber graph for the year:

feedburner statistics for one year of blogging

If you are interested in my blog, you may subscribe here: catonmat rss feed.

According to Technorati, my blog has received 476 blog reactions and ranks 29,521-st out of 112.8 million blogs!

Many of my posts have been submitted to Reddit, Digg and have been Stumbled. Here are the top 5 most visited posts:

Some of my how-to posts came with downloadable cheat sheets. Here are the top 5 cheat sheets:

During the last year I also did several web projects. As I was busy with physics studies, I created only four web projects (all of these projects are open source):

In March 2008 I started posting geek music on Fridays. Here are the top 5 geek songs:

I am satisfied with where this blog is heading. I'd like to thank all my fans and all the visitors who regularly return to the site!

To make things more challenging, I am setting myself a goal of reaching 5000 subscribers by the end of the next year of blogging (July 2009)! I know that this is very ambitious goal but I am ready to take the challenge!

birthday portal cake

This article is part of the article series "Musical Geek Friday."
<- previous article next article ->

the bittorrent songThis week on Musical Geek Friday - the famous BitTorrent Song!

The song is written and performed by Brent Simon. Brent describes himself as a super nerd who plays the synth and makes original music that's honest and from the heart. He keeps composing and his latest music can be found on his MySpace page - brentsimon.

The song, when I first downloaded it, was actually called "Mininova", in honor of one of the largest BitTorrent sites on the net -

I was curious about the history of this song and emailed Brent. It turns out the original title of the song is "World Wired West" and Mininova torrent renamed to song to "Mininova" when they put it for download on their site!

Here is what Brent had to say about the history of the song:

I decided to write "World Wired West", which has become better known as "the Bittorent song", based off of my experience with the net and torrents. I wanted to write something as geeky as possible and see if anyone knew what I was talking about. It looks like you guys do! The second verse is actually a short poem I wrote for the literary journal back in high school. I was transitioning from DOS to Windows at that point; that's why I have the old DOS references of "Bad command or file name" and D-I-R.

I also asked him about the mysterious final verse, he said:

I had just read Naked Lunch by William S. Burroughs, which was the most confusing and disturbing book I've ever read, and my cousin Jim wanted me to write a song about it. That verse is the combined total of what I could make out from the book and what I remember from the movie (which I haven't seen in many years). There was no way I could write a whole song about it, but I was able to fit a verse together and there you have it.

Here it is! The BitTorrent Song!


Download this song: the bittorrent song.mp3 (musical geek friday #11)
Downloaded: 15569 times

Download lyrics: the bittorrent song lyrics (musical geek friday #11)
Downloaded: 3054 times

Here is the lyrics of The BitTorrent Song:

-Verse 1-
Gather 'round and hear my story
'Bout the new old west
I sought torrents, rips, and ISO's
Logged in under guest

Movies, music, games and porno,
Warez and killer apps,
Compressors, CODECs and keygens
Compilers, hacks and cracks

Data's streamin' my hard drive's screamin'
Fragmenting the night
Bit's are flowin' corruption's growin'
Into every last byte

-Verse 2-
Bad command or file name
OK, now what's next
D-I-R where is my brain
It's coded in base hex

Alright now you'll feel my wrath
Don't ever mess with me
Cram this up your Gigabyte
So long Format C:

Data's streamin' my hard drive's screamin'
Fragmenting the night
Bit's are flowin' corruption's growin'
Into every last byte

-Bonus Verse-
If you watch my lips real closely
You'll see they don't match what I'm saying
My typewriter emits fluids
All of which are intoxicating

Staring at my shoes all after-
Noon on heroin isn't boring
Just to poop I need the surgical
Equivalent of apple coring

Naked Lunch
Naked Lunch
Naked Lunch
Naked Lunch

Here is Brent Simon performing The Bittorrent Song:

Download "The Bittorrent Song"

Download this song: the bittorrent song.mp3 (musical geek friday #11)
Downloaded: 15569 times

Download lyrics: the bittorrent song lyrics (musical geek friday #11)
Downloaded: 3054 times

Click to listen:

Have fun and until next geeky Friday! :)