While browsing links on my favorite programming news site, programming.reddit.com, I stumbled across this link to a video lecture on the upcoming C++ standard, C++0x, by none other than Bjarne Stroustrup himself!

You can start watching it right away or you can download it in DivX, MPEG and other formats.

I have a great interest in the C and C++ family of programming languages and their history, and I have read two of Bjarne's books - The C++ Programming Language and The Design and Evolution of C++. I enjoyed every page of these books. They not only made me a decent C++ programmer but also made me understand how the language was formed, what its goals were, where it was headed and how the language got the various constructs it has now. If you ever consider becoming a great C++ programmer, these books are a must-read.

The most fundamental thing these books taught me was to think at various levels of abstraction and to approach a given programming problem from various programming paradigms.

When I found the link I put aside all the things I was working on and started watching the video lecture! I love C++ that much!

A side note for people wanting to learn C++. I see people argue on programming.reddit.com and other sites that C++ is not worth learning, that it's a dead language, that X is better than C++, etc. Don't listen to this crap! If you ever watched Guy Kawasaki's The Art of the Start video presentation, the 11th point of success is "Don't let the bozos grind you down." That's what they are trying to do if you listen to them! Just start learning C++ and you will succeed with it!

Now, back to the lecture. Here, I cite what the lecturer has to say about his lecture:

A good programming language is far more than a simple collection of features. My ideal is to provide a set of facilities that smoothly work together to support design and programming styles of a generality beyond my imagination. Here, I briefly outline rules of thumb (guidelines, principles) that are being applied in the design of C++0x. Then, I present the state of the standards process (we are aiming for C++09) and give examples of a few of the proposals such as concepts, generalized initialization, being considered in the ISO C++ standards committee. Since there are far more proposals than could be presented in an hour, I'll take questions.

Just like I did while learning JavaScript from video lectures, I am going to timestamp-blog the most interesting things that caught my attention!

Here I list the things that caught my attention in Bjarne's C++ video presentation. The time in the brackets is when it appeared in the video. A '+' before the brackets indicates that I knew it already, a '-' that I didn't (just for personal notes). I will write down some obvious facts about the language even though I know them, so you get an idea of what the lecture was about.

  • +(03:25) C++ is used on both Mars rovers, Spirit and Opportunity. The design life of the rovers was 6 months.
  • +(05:30) C++ is a better C in a way that it can roughly do the same as C but also has many new features
  • +/-(06:55) Highest level goals of C++ are to make it a better language for systems programming and library building and make it easier to teach and learn.
  • (08:46) Joke: The next Intels will execute infinite loop in five minutes and that's why you don't need performance. :)
  • +/-(10:00) The main problem for a new revision of the standard is the popularity of C++. Existing and new users want countless improvements. Adding a new feature needs to keep the existing code absolutely stable. Each new feature makes the language harder to learn.
  • +(15:46) The current C++ ISO standard is from 1998, with a revision in 2003.
  • (15:50) Joke: If you can tell the difference between C++ 1998 and C++ 03, then you have just been reading too many manuals.
  • (16:36) Joke: Some of C++ language developers working on C++0x are very, very keen to get that x to be a decimal number.
  • +(17:20) Voting on C++ standard features is done by nation. Each nation casts one vote.
  • +/-(19:14) Standardization matters because it directly affects millions of people, new techniques need to get into mainstream use, and it's a defense against vendor lock-in.
  • +/-(22:10) Rules of thumb for the standard:
    • Maintain stability and compatibility,
    • Prefer libraries to language extensions,
    • Prefer generality to specialization,
    • Support both experts and novices,
    • Increase type safety,
    • Improve performance and ability to work directly with hardware,
    • Fit into the real world,
    • Make only changes that change the way people think.
  • -(30:28) There are around 100 new proposals for language features.
  • -(33:22) There are far fewer library proposals. Just 11 new proposals for Library TR1!
  • -(35:47) Areas of language change are machine model and concurrency, modules and DLLs, support for generic programming.
  • +(38:01) Vector initialization problem example.
  • (40:50) Even Bjarne made an error when using the verbose syntax for initializing a vector with a list of values from an array:

    (Image: Bjarne's vector initialization mistake, int vs. double)

    "If you have tedious, verbose and indirect code, you make mistakes!"
    /Bjarne Stroustrup/

  • -(41:09) Indirect vector initialization from arrays violates Stroustrup's language design principle (from The Design and Evolution of C++) - "Support user-defined and built-in types equally well."
  • -(41:56) The C++0x solution to the initialization problem is initializer lists, std::initializer_list<type>.
  • -(43:43) There are too many ways to initialize things in C++ and each works only in certain contexts. C++0x introduces a uniform initialization syntax which can be used in any initialization.
  • -(46:50) Fundamental cause of lots of problems in C++ with generic programming is that the compiler doesn't know what template argument types are supposed to do.
  • +/-(49:30) C++ 98 got templates right in a way that parametrization didn't require hierarchies, parametrization could be done with non-types, the code generated had uncompromising efficiency and that it turned out that template instantiation was Turing complete!
  • -(54:42) The aims of concepts in C++0x are direct expression of intent (the lecture got cut here (they ran out of tape or something) :( and the next moment was somewhere in the future), no performance degradation compared to current code, relatively easy implementation within current compilers, and that current template code must remain valid.
  • -(55:33) The lecture continues here from where it was cut. It's something about how the type system ensures correct data types using just declarations, and about compile-time type contracts through templates.
  • -(1:06:14) Quick summary: template aliases, initializer lists, overloading based on concepts, type deduction from initializers, a new for loop for ranges.

After the lecture the following questions were asked:

  • (01:09:40) What's your opinion about the Microsoft implementation of C++?
    • A: Microsoft's implementation is the best out there; they conform to the standards pretty well and the code generated is also good. GNU gcc is also good. Though, they want you to use their "Managed C++" called C++/CLI, which is totally unportable. Apple does the same with their version of C++, which is Objective C/C++, and so does GNU. They all play this game of trying to get users to use just their product and not switch to their competitors' products.

  • (01:11:56) Do you think you'll ever design a new language from scratch?
    • A: Certainly not from scratch. You have to answer the question, why are you designing a language? You design a language to solve a certain problem. If I ever design a new language it will be because I feel that some problem needs a solution.

  • (01:13:39) You mentioned threads, are there other things like transactions and cache management?
    • A: Concurrency is becoming very important. The question is how do you do it? My solution is to provide language primitives out of which you build libraries that use these primitives and provide various models of concurrency. Doing it directly with language primitives is too hard.

  • (01:16:25) How long after the standard is out do you expect to see a production compiler?
    • A: After the C++0x standard is approved and released the vendors will start releasing compilers right away. Some of them have already built in some of the upcoming features.

  • (01:17:55) Is auto like a type inference?
    • A: auto is kinda type inference but it's very simple. You simply look at the type of initializer and you use it.

  • (01:18:47) Would it be useful to have a switch in every compiler for deprecating features?
    • A: Yes, that would be useful because the compilers have to support old features which we would like to get rid of. I have not been able to convince compiler makers to do it.

  • (01:19:16) Is it possible to do garbage collection (GC) cleanly and efficiently in C++?
    • A: Yes, it is possible to do GC in C++. An implementation already exists and I will have a discussion tomorrow on whether to put it in standard. There are two problems, though. One is that people would start writing poor code never caring to free the used memory which would lead to poor performance. The other is that GC can be a performance virus.

  • (01:24:39) A lot of academic institutions have dropped teaching C++. As a result there are a lot of poor coding practices and poor coding solutions coming in from people. Are there any plans to have some documentation on how it would be more teachable?
    • A: I have become an academic for the last couple of years and someone talked me into teaching undergrads. I am more used to serious Ph.D.s from good universities with 10 years of experience and it's not quite the same! :) I tried out ideas and I wrote a textbook which will come out some time next year.

  • (01:26:24) How soon after you created C++ did you see it start to take over the industry?
    • A: The first commercial release was in 1985. I had access to data on how many C++ users there were and kept track of it during the 80s. From 1979 till 1991 the doubling rate was 7.5 months. And now we are at 3 million users.

  • (01:28:25) A lot of template classes at the moment use template hoisting to make them more efficient in terms of code size at compilation time. Is there anything being done to address the issues that make it necessary?
    • A: There is a trick of avoiding a lot of separate template instantiations based on void pointer. I don't see any changes to that. That's a portable way of doing it.

  • (01:29:50) What's your opinion on generic programming at runtime level?
    • A: It would be a good idea, but what's mostly called generic programming at runtime level either has so many indirections that it runs at 1/10th of the speed of non-generic code, or it's not very generic and you can't do any of the interesting stuff.

  • (01:31:33) You talked about having user defined types act the same way as built in types. Pointers are used for various optimizations like function overloading and smart pointers. Do you see a problem here? Is it being solved?
    • A: First of all, I think smart pointers are overused. Secondly, we can emulate inheritance with smart pointers. You can basically build a perfect smart pointer now. I worry about smart pointers because if you use a smart pointer and I give you one and we have no agreement on how mine works, we got a race condition. We have no lock on this code and we got two pieces of code which poke in the same area. You have to be very careful of the semantics of this smart pointer.

  • (01:33:38) There are interesting parallels between templates and duck typing used in dynamic languages. Will templates overtake classes for writing code and filing contracts?
    • A: Yes, templates are roughly the same as the duck typing done dynamically in scripting languages. I think interfaces will be much better specified with concepts, and there is still a large component of duck typing. Templates are becoming more important. Please remember that templates by themselves are nothing! They help you abstract over something.

  • (01:37:14) Have you ever gotten any death threats because of the changes in the language?
    • A: I have never gotten any death threats for any reason. And let's keep it that way!

  • (01:37:28) Is there any particular naming convention you subscribe to?
    • A: Yes, I like underscores. I do not like the camel stuff, it's less readable.

  • (01:37:57) The new language features you come up with. There are so many languages upcoming right now? Do you try to reuse any of the things they have done?
    • A: I try to learn from new languages, and particularly from the users of new languages. But grafting from one language to another is much harder than most people think. When you see something work in one language, you ask what problem they are solving and whether we can solve it as elegantly in C++. If the answer is no, then we see how it can be solved and look at the way it was done in the other language. But simple grafting is a very hard exercise.

  • (01:38:52) When you initially designed the language, did you start from rigorous specifications or how did it start?
    • A: I am trying to be rigorous, but it's still informal in the sense that it is written in English. I started out with the C specification written by Dennis Ritchie. Some things have improved, some have become more obscure because of the extra words people use. We have tried several times to see if we can also make it formal. It would be nice to have formal semantics either for all of it or for parts of it. It has not been that successful over the years. But I am very happy to report that a group from IBM this year managed to prove that the C++ inheritance system is formally sound. It's proven. So 20 years later they proved that I didn't screw up.

  • (01:40:18) With Sun releasing some hardware which runs Java bytecode, are you afraid that it could take away C++'s embedded position?
    • A: Java would kill C++ totally in 2 years, Sun said in 1996. They have sort of been repeating this story over and over again. There is a lot of Java, and there is a lot of C++, and it's a big world.

  • (01:41:05) How do you balance things at compile time and runtime, for example exceptions?
    • A: If you need something at runtime, then you have to have it at runtime. For example, most of the good uses of virtual functions can't be done at compile time because you don't have the information. Talking about exceptions, there are compilers which add no overhead if no exceptions are thrown. There are trade-offs and some things you just need to do at runtime.

You can watch the lecture right here as an embedded flash video, or you can download the lecture:

Ever since I published my personal sed, ed and awk cheat sheets in .pdf and .doc formats, I have been receiving suggestions that I should also create plain text versions of them. People said that it was ridiculous to have UNIX tool cheat sheets in .pdf or Microsoft Word (.doc) formats and not to have them in plain text.

I agreed, converted the UNIX tool cheat sheets to plain text format and did some ASCII art formatting so they look neat.

Enjoy!

(If you also want to download printable .pdf or .doc of these cheat sheets, follow the three links at the beginning of this post!)

UNIX Power Tool Cheat Sheets

AWK Cheat Sheet (.txt):
Download link: awk cheat sheet (.txt)
Downloaded: 108548 times

Sed Cheat Sheet (.txt):
Download link: sed stream editor cheat sheet (.txt)
Downloaded: 37868 times

Ed Cheat Sheet (.txt):
Download link: ed text editor cheat sheet (.txt)
Downloaded: 19876 times

PS. if you notice any bugs, spelling mistakes or just want to thank me, leave a comment :)

While using reddit, I have observed that many titles have "(Pic)", "[Picture]", "(Video)", etc. after them. It means that the content the link points to has a picture or video in it. Sometimes I want to have fun with my friends and go through all the pics or vids. Unfortunately reddit's search is broken and there is really no good way to see the best pics and videos voted up on reddit in the past.

I decided to create a reddit media site which will monitor reddit's front page, collect picture and video links, and build an archive of them over time.

The site has been launched:
visit reddit media now

On the about this blog page I wrote about one of the methods I like to use when developing software (this project requires writing a few tools quickly; more about them below). It is called "the hacker's approach". The hacker's approach is basically writing software as fast as possible, using everything available, without thinking much about best development practices or worrying about what others will think of your code. If you are a good programmer, the quality of the code produced is just a bit worse than if you had written it carefully, but the time saved is enormous.

I will release the full source code of the website together with all the programs that generate it. I will also blog about how the tools work and what ideas I used.

Update: Done! The site is up at reddit media: intelligent fun online.

Reddit Media Website's Technical Design Sketch

I use DreamHost shared hosting to run this website. Overall it is a great hosting company and I have been with them for more than a year now! Unfortunately, since it is shared hosting, the server sometimes gets overloaded and serving dynamic pages can become slow (a few seconds to load).

I want the new website to be as fast as possible even when the server is a bit loaded. I do not want any dynamic parsing to be involved when accessing the website. Because of this I will go with generating static HTML pages.

A Perl script will run every 30 minutes from crontab, fetch the reddit.com website and extract titles and URLs. Another script will add the titles to a lightweight on-disk SQLite database, in case I ever want to make the website dynamic. A third script will use the entries in the database to generate the HTML pages.

Technical Design

A knowledgeable user might ask whether this design has a race condition at the moment a new static page is being generated while a user is requesting the same page. The answer is no. New pages are written to temporary files and then moved in place of the existing ones. The website runs on the Linux operating system, and by looking up `man 2 rename' we find that

If newpath already exists it will be atomically replaced (subject to a
few conditions - see ERRORS below), so that there is no point at which
another process attempting to access newpath will find it missing.

The rename system call is atomic (as long as the temporary file lives on the same filesystem as the target), which means we have no trouble with race conditions!
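The write-to-a-temporary-file-then-rename idea can be sketched as follows. This is a minimal illustration in Python rather than the Perl the real scripts use, and the function name publish_atomically is made up for this example.

```python
import os
import tempfile


def publish_atomically(path, html):
    """Write html to a temporary file next to path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created in the target directory so that both
    # names are on the same filesystem; a rename across filesystems fails.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        # mkstemp creates the file readable by the owner only; open it up
        # so that a web server running as another user can read it.
        os.chmod(tmp_path, 0o644)
        # os.replace maps to rename(2) on POSIX systems: readers see either
        # the old page or the new one, never a missing or half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling publish_atomically("index.html", new_html) from the generator would then swap the page in one step.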

Reddit provides an RSS feed of the front page news. It has the 25 latest news items and maybe 5 of them are media links. That is not enough links to launch the website. People visiting the site will get bored with just 5 links and a few new ones added daily. I need more content right at the moment I launch the site. Or I could launch the site later, when articles have piled up. Unfortunately, I do not want to wait and I want to launch it ASAP! The hacker's approach!

First, I will create a script which will go through all the pages on reddit looking for picture and video links, and insert the found items in the database. It will match patterns in link titles and will match domains which exclusively contain media.
Here is the list of patterns I could come up with which describe pictures and videos:

  • picture
  • pic
  • image
  • photo
  • comic
  • chart
  • video
  • vid
  • clip
  • film
  • movie

And here are the domains found on reddit which exclusively contain media:

  • youtube.com
  • video.google.com
  • liveleak.com
  • break.com
  • metacafe.com
  • brightcove.com
  • dailymotion.com
  • flicklife.com
  • flurl.com
  • gofish.com
  • ifilm.com
  • livevideo.com
  • video.yahoo.com
  • photobucket.com
  • flickr.com
  • xkcd.com

To write this script I will use LWP::UserAgent to get HTML contents and HTML::TreeBuilder to extract titles and links.

This script will output the found items in human readable format, ready for input to another script which will absorb this information and put it in the SQLite database.

This script is called 'reddit_extractor.pl'. It takes one optional argument which is number of reddit pages to extract links from. If no argument is specified, it goes through all reddit pages until it hits the last one. For example, specifying 1 as the first argument makes it parse just the front page. I can now run this script periodically to find links on the front page. No need for parsing RSS.

There is one constant in this script which can be changed. This constant, VOTE_THRESHOLD, sets the number of votes a post on reddit must have received to be collected by our program. I had to add it because, when digging through older reddit posts, media with just 1 or 2 votes turns up, which means it really wasn't that good.

The script outputs each media post matching a pattern or domain in the following format:

title (type, user, reddit id, url)
  • title is the title of the article
  • type is the media type. It can be one of 'video', 'videos', 'picture', 'pictures'. It's plural if the title contains "pics" or "videos" (plural) form of media.
  • user is the reddit user who posted the link
  • reddit id is the unique identifier reddit uses to identify its links
  • url is the url to the media
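As an illustration of this line format, the sketch below builds such a line and parses it back. It is Python, not the Perl of the real scripts, and the helper names are hypothetical. Treating "pictures" and "vids" as plural forms alongside "pics" and "videos" is an assumption of this sketch.

```python
import re

# Assumption: these words in a title mark the plural form of the media type.
PLURAL_RE = re.compile(r"\b(?:pics|pictures|videos|vids)\b", re.IGNORECASE)

# The title may itself contain parentheses and commas, so the line is
# anchored on the trailing "(type, user, reddit id, url)" group.
ENTRY_RE = re.compile(
    r"^(?P<title>.*) \((?P<type>videos?|pictures?), (?P<user>[^,]+), "
    r"(?P<reddit_id>[^,]+), (?P<url>\S+)\)$"
)


def media_type(title, base):
    """base is 'picture' or 'video'; return the plural if the title is plural."""
    return base + "s" if PLURAL_RE.search(title) else base


def format_entry(title, mtype, user, reddit_id, url):
    """Produce one output line: title (type, user, reddit id, url)."""
    return f"{title} ({mtype}, {user}, {reddit_id}, {url})"


def parse_entry(line):
    """Split an output line back into its fields, or return None."""
    match = ENTRY_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

A consumer such as the database inserter could call parse_entry on each line read from stdin and skip lines for which it returns None.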

Script 'reddit_extractor.pl' can be viewed here:
reddit extractor (perl script, reddit media generator)

Then I will create a script which takes this input and puts it into SQLite database. It is so trivial that there is nothing much to write about it.

This script will also be written in the Perl programming language and will use just the DBI and DBD::SQLite modules for accessing the SQLite database.

The script will create an empty database on the first invocation, read the data from stdin and insert the data in the database.

The database design is dead simple. It contains just two tables:

  • reddit which stores the links found on reddit, and
  • reddit_status which contains some info about how the page generator script used the reddit table

Going into more detail, the reddit table contains the following columns:

  • id - the primary key of the table
  • title - title of the media link found on reddit
  • url - url to the media
  • reddit_id - id reddit uses to identify its posts (used by my scripts to link to comments)
  • user - username of the person who posted the link on reddit
  • type - type of the media, can be: 'video', 'videos', 'picture', 'pictures'. It's plural if the title contains "pics" or "videos" (plural) form of media.
  • date_added - the date the entry was added to the database

The other table, reddit_status, contains just two columns:

  • last_id - the last id in the reddit table which the generator script used for generating the site
  • last_run - date of the last successful run of the generator script
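The two tables could be created along these lines. This is a sketch using Python's sqlite3 module instead of Perl's DBI, and the column types, the UNIQUE constraint on reddit_id, the CHECK constraint and the function names are all assumptions. The posts only name the columns.

```python
import sqlite3

# Column names follow the description above; types and constraints are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reddit (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    title      TEXT NOT NULL,
    url        TEXT NOT NULL,
    reddit_id  TEXT NOT NULL UNIQUE,
    user       TEXT,
    type       TEXT CHECK (type IN ('video', 'videos', 'picture', 'pictures')),
    date_added TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS reddit_status (
    last_id  INTEGER,
    last_run TEXT
);
"""


def open_database(path):
    """Open the database, creating the tables on the first invocation."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_entry(conn, entry):
    """Insert one link; a link whose reddit_id is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO reddit (title, url, reddit_id, user, type) "
        "VALUES (:title, :url, :reddit_id, :user, :type)",
        entry,
    )
    conn.commit()
```

With the unique reddit_id, running the extractor on the front page every 30 minutes would not create duplicate rows for links that are still listed.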

This script is called 'db_inserter.pl'. It does not take any arguments but has one constant which has to be changed before use. This constant, DATABASE_PATH, defines the path to the SQLite database. As I mentioned, the database does not have to exist; the script will create one on the first invocation.

These two scripts used together can now be run periodically from crontab to monitor reddit's front page and insert the links in the database. It can be done with a command as simple as:

reddit_extractor.pl 1 | db_inserter.pl

Script 'db_inserter.pl' can be viewed here:
db inserter (perl script, reddit media generator)

Now that we have our data, we just need to display it in a nice manner. That's the job of the generator script.

The generator script will be run after the previous two scripts have been run together and it will use information in the database to build static HTML pages.

Since generating static pages is computationally expensive, the generator has to be smart enough to minimize regeneration of already generated pages. I carefully commented the (pretty simple) algorithm that minimizes regeneration; you can take a look at the 'generate_pages' function in the source.
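To give a feel for what minimizing regeneration can mean, here is one possible scheme, sketched in Python. It is not the algorithm from 'generate_pages'; it simply assumes pages are numbered from the oldest entries upward with a fixed number of entries per page, so that appending new entries touches only the last partly filled page and any pages after it. The function name and parameters are hypothetical.

```python
def pages_to_regenerate(old_count, new_count, per_page):
    """Return the 1-based page numbers that change when entries
    old_count+1 .. new_count are appended to the archive."""
    if new_count <= old_count:
        return []  # nothing new, nothing to rebuild
    # Page that receives the first new entry (the last, partly filled page,
    # or a brand new page if the previous one was exactly full).
    first_dirty = old_count // per_page + 1
    # Page that holds the newest entry.
    last_page = (new_count - 1) // per_page + 1
    return list(range(first_dirty, last_page + 1))
```

For example, with 10 entries per page, going from 20 to 23 entries only requires building page 3, while pages 1 and 2 stay as they are.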

The script generates three kinds of pages at the moment - pages containing all pictures and videos, pages containing just pictures and pages containing just videos.

There is a lot of media featured on reddit, and as the script keeps things cached, the directory sizes can grow pretty quickly. If a file system which performs badly with thousands of files in a single directory is used, the runtime of the script can degrade. To avoid this, the generator stores cached reddit posts in subdirectories named after the first character of their file name. For example, if the name of a cached file is 'foo.bar', then the file is stored at /f/foo.bar.
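A minimal sketch of this first-character layout, in Python for illustration (the helper names and the cache_root parameter are made up for this example):

```python
import os


def cache_path(cache_root, filename):
    """Map 'foo.bar' to <cache_root>/f/foo.bar."""
    if not filename:
        raise ValueError("filename must not be empty")
    return os.path.join(cache_root, filename[0], filename)


def store_cached(cache_root, filename, data):
    """Write data (bytes) to its bucket, creating the subdirectory if needed."""
    path = cache_path(cache_root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Spreading files over one subdirectory per leading character keeps each directory much smaller than a single flat cache directory would be.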

The other thing this script does is locate thumbnail images for media. For example, for YouTube videos, it constructs the URL to their static thumbnails. For Google Video I could not find a public service for easily getting the thumbnail. The only way I found to get a thumbnail of a Google Video is to fetch the contents of the actual video page and extract it from there. The same applies to many other video sites which do not tell developers how to get the thumbnail of a video. Because of this I had to write a Perl module, 'ThumbExtractor.pm', which, given a link to a video or picture, extracts the thumbnail.
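For the YouTube case, constructing the thumbnail URL could look roughly like this. The sketch is in Python, the function name is hypothetical, and the img.youtube.com/vi/<id>/default.jpg layout is the commonly used static thumbnail address, assumed here rather than taken from 'ThumbExtractor.pm'.

```python
from urllib.parse import urlparse, parse_qs


def youtube_thumbnail(url):
    """Return a static thumbnail URL for a YouTube watch link, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return None
    # Watch links carry the video id in the "v" query parameter.
    video_ids = parse_qs(parsed.query).get("v")
    if not video_ids:
        return None
    # Assumed thumbnail layout; no page fetch is needed for this site.
    return f"http://img.youtube.com/vi/{video_ids[0]}/default.jpg"
```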

'ThumbExtractor.pm' module can be viewed here:
thumbnail extractor (perl module, reddit media generator)

Some of the links on reddit point directly to the actual image. I wouldn't want the reddit media site to take long to load, so I set out to find a way to cache small thumbnails on the server where the website is generated.

I had to write another module 'ThumbMaker.pm' which goes and downloads the image, makes a thumbnail image of it and saves to a known path accessible from web server.

'ThumbMaker.pm' module can be viewed here:
thumbnail maker (perl module, reddit media generator)

To manipulate the images (create thumbnails), the ThumbMaker package uses Netpbm open source software.

Netpbm is a toolkit for manipulation of graphic images, including conversion of images between a variety of different formats. There are over 300 separate tools in the package including converters for about 100 graphics formats. Examples of the sort of image manipulation we're talking about are: Shrinking an image by 10%; Cutting the top half off of an image; Making a mirror image; Creating a sequence of images that fade from one image to another.

You will need this software (either compile it yourself or get the precompiled packages) if you want to run the reddit media website generator scripts!

To use the most common image operations easily, I wrote a package 'Netpbm.pm', which provides operations like resize, cut, add border and others.
'Netpbm.pm' package can be viewed here:
netpbm image manipulation (perl module, reddit media generator)

I hit an interesting problem while developing the ThumbExtractor.pm and ThumbMaker.pm packages - what should they do if the link is to a regular website with just images? There is no simple way to download the right image which the website wanted to show to users.
I thought for a moment and came up with an interesting but simple algorithm which finds "the best" image on the site.
It retrieves ALL the images from the site, finds the one with the biggest dimensions and makes a thumbnail out of it. The reasoning is simple: pictures posted on reddit are big and nice, so the biggest picture on the site must be the one that was meant to be shown.
A more advanced algorithm would analyze the image's location on the page and add weight to the score of how good the image is, depending on where it is located. The closer to the center of the screen, the higher the score.
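The selection rule can be sketched like this. It is an illustrative Python version, not the code from 'ImageFinder.pm': the image records are plain dictionaries with hypothetical keys (url, width, height and an optional x offset), and the centering bonus is one arbitrary way to express the "more advanced" idea.

```python
def image_score(img, page_width=None, center_bonus=0.5):
    """Score an image by its area, optionally boosted when it is centered."""
    area = img["width"] * img["height"]
    if page_width is None or "x" not in img:
        return area  # basic algorithm: the biggest image wins
    # Centeredness is 1.0 when the image's horizontal center sits at the
    # middle of the page and falls to 0.0 at the page edge.
    image_center = img["x"] + img["width"] / 2
    offset = abs(image_center - page_width / 2) / (page_width / 2)
    centeredness = max(0.0, 1.0 - offset)
    return area * (1.0 + center_bonus * centeredness)


def find_best_image(images, page_width=None):
    """Return the highest scoring image record, or None for an empty list."""
    if not images:
        return None
    return max(images, key=lambda img: image_score(img, page_width))
```

Without a page_width the function reduces to picking the image with the largest area, which is the simple rule described above.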

For this reason I developed yet another Perl module called 'ImageFinder.pm'. See the 'find_best_image' subroutine to see how it works!

'ImageFinder.pm' module can be viewed here:
best image finder (perl module, reddit media generator)

The generator script also uses CPAN's Template::Toolkit package for generating HTML pages from templates.

The name of the generator script is 'page_gen.pl'. It takes one optional argument, 'regenerate', which if specified clears the cache and regenerates all the pages anew. It is useful when templates are updated or changes are made to the thumbnail generator.

Program 'page_gen.pl' can be viewed here:
reddit media page generator (perl script)

While developing any piece of software I like solving various problems on paper. For example, with this site I had to solve the problems of how to regenerate existing pages minimally and how to resize thumbnails so they look nice.
Here is what the sheet on which I took small notes looked like after the site got published:

reddit media website quick design notes
(Sorry for the quality again, I took the picture with a camera phone in two shots and stitched them together in an image editor.)

The final website is at redditmedia.com address (now moved to http://reddit.picurls.com). Click http://reddit.picurls.com to visit it!

Here are all the scripts packed together with basic documentation:

Download Reddit's Media Site Generator Scripts

All the scripts in a single .zip:
Download link: reddit media website generator suite (.zip)
Downloaded: 1738 times

Individual scripts:

reddit_extractor.pl
Download link: reddit extractor (perl script, reddit media generator)
Downloaded: 4647 times

db_inserter.pl
Download link: db inserter (perl script, reddit media generator)
Downloaded: 3346 times

page_gen.pl
Download link: reddit media page generator (perl script)
Downloaded: 3165 times

ThumbExtractor.pm
Download link: thumbnail extractor (perl module, reddit media generator)
Downloaded: 4018 times

ThumbMaker.pm
Download link: thumbnail maker (perl module, reddit media generator)
Downloaded: 3290 times

ImageFinder.pm
Download link: best image finder (perl module, reddit media generator)
Downloaded: 3412 times

NetPbm.pm
Download link: netpbm image manipulation (perl module, reddit media generator)
Downloaded: 3433 times

For newcomers - What is reddit?

For newcomers, reddit is a social news website where users decide its contents.

From their faq:

What is reddit?

A source for what's new and popular on the web -- personalized for you. We want to democratize the traditional model by giving editorial control to the people who use the site, not those who run it. Your votes train a filter, so let reddit know what you liked and disliked, because you'll begin to be recommended links filtered to your tastes. All of the content on reddit is from users who are rewarded for good submissions (and punished for bad ones) by their peers; you decide what appears on your front page and which submissions rise to fame or fall into obscurity.

Have fun with the website and please tell me what you think about it in the comments! Thanks :)

I decided one day that I want to master Perl's pack() and unpack() functions to be able to manipulate data in Perl efficiently.

Perl's pack and unpack are two functions for transforming data according to a user-defined template, between the guarded way Perl stores values and some well-defined representation as might be required in the environment of a Perl program. Unfortunately, they're also two of the most misunderstood and most often overlooked functions that Perl provides.

As I wrote before, my way of learning these complex functions was to first make a cheat sheet with all the template parameters and then just spend a day reading more about them and experimenting.

As I usually print cheat sheets two pages per side and the pack/unpack cheat sheet consumed just one page, I added Perl's printf/sprintf format and attribute summary.

Here is how I printed this cheat sheet:

perl pack unpack printf sprintf cheat sheet thumbnail
(Sorry for the bad quality, I shot it with my camera phone)

Download Perl's pack/unpack and printf Cheat Sheet

PDF:
Download link: perl's pack/unpack and printf cheat sheet (.pdf)
Downloaded: 126106 times

Microsoft Word 2000 format (.doc):
Download link: perl's pack/unpack and printf cheat sheet (.doc)
Downloaded: 3476 times

I decided I wanted to learn the JavaScript programming language better. I had been programming in it now and then but I had never really developed any good skills in it.

If you have read the "About this blog" page, you know that I also run the Free Science Online blog, which is all about free video lectures online. In my May post I found some really good programming video lectures, 13 of them on JavaScript. Since I run this video lecture blog, I obviously have a great interest in video lectures, so why not try to learn JavaScript better from them?

These lectures are given by Douglas Crockford who is a senior JavaScript Architect at Yahoo!. He is well known for his work in introducing JavaScript Object Notation (JSON).


The first four lectures are on the basics of the language:

Sometimes Yahoo! Video gives this error: Sorry! This video is no longer available on Yahoo! Video. In this case refresh your browser a couple of times!

The next three lectures are on Advanced JavaScript:

Then there are 6 more lectures on JavaScript which should probably be viewed only after you have viewed the ones just mentioned.

While browsing the YUI Theater I found another JavaScript lecture which was published just recently:

My approach to watching video lectures

I have been watching various video lectures for almost 4 years now, mostly on mathematics, physics and theoretical computer science. My approach to getting the most out of the lectures is the following: when I watch a video lecture, I take notes just as if I were in class.
Even better, when I do not understand any part of the lecture I can always rewind it back and see that fragment again. I can also pause the lecture, think for a while and then continue. But that's physics and maths.
Here is a photo of the notes I took while watching MIT's 8.03: Vibrations and Waves (yes, it's available completely free on MIT's OpenCourseWare):

learning 803 vibrations and waves thumbnail pic

I am serious about physics and maths video lectures. As you can see in the image, all the main results are boxed in red and fully derived (even if the professor does not derive them on the blackboard). By the way, one lecture fits perfectly on both sides of an A4 sheet.

Here is a close-up of Lecture 13: Electromagnetic Waves - Plane Wave Solutions to Maxwell's Equations - Polarization - Malus' Law.

MIT's 8.03 - lecture 13 - EM waves, plane waves

(Sorry about the bad quality of the photos, I shot them with my Nokia N73 cell phone camera)

This approach probably does not work for programming languages and computer tools. Because mathematics and physics are mostly done on paper, watching these video lectures and taking notes is actually doing them. The process of taking notes develops the skills because you work with the new concepts/operators/theorems/whatnot. Not so with programming languages. Unless you find an online degree for Java programmers, you can take a book on a new programming language, read it, and the next moment be unable to write even a hello world program, because you have only become familiar with the subject and have not developed the skills. I have experienced this myself.

Here is how I am going to learn JavaScript from these lectures. I might adjust this approach at any moment if I find it is not working; if so, I will update this post accordingly.

I will definitely watch all 11 video lectures. I will start with the first four basic lectures, watch them one by one, take notes as with physics video lectures and experiment as I go.

I will take notes lightly, just enough to make my mind really think over the information I am getting, but with no red boxes around constructs as with physics. Experimentation is the other part of learning: I am going to try the new constructs as soon as I see them so they stick in my mind better.
Update: I dropped the idea of taking any notes on paper, because I am blogging the key points from lectures here.

Also to make this article interesting, I will annotate each lecture if something really interesting catches my eye. As I mentioned, I have programmed JS before so I am not sure how much I will learn from the first four basic lectures.

In my previous blog post I used Windows Script Host to create a program in VBScript. The other language the same scripting host runs is JScript which conforms to the same standard as JavaScript. So I should be safe doing JavaScript experimentation in JScript.

Points that caught my attention in JavaScript Video Lecture Part I

  • (00:45) The world's most misunderstood programming language - it has both "Java" and "Script" in its name, yet it has nothing to do with the Java programming language, and it is a real programming language, not some tiny scripting language
  • (02:38) There are generally no books available to learn JS from - all are bad and full of nasty examples
  • (02:56) The only book recommended is JavaScript: The Definitive Guide, 5th Edition by David Flanagan - the least bad book
  • (03:37) JavaScript is a functional language
  • (08:12) Microsoft reverse engineered original implementation of JavaScript and called it JScript to avoid trademark issues
  • (09:49) During standardization, the original bugs were left as is without fixing to prevent already written programs from breaking (Douglas slips and says "Sun" but he actually means "Microsoft")
  • (12:16) One of the key ideas of the language is prototypal inheritance where objects inherit from objects and there are no classes
  • (12:45) One other key idea is that functions are first-class objects
  • (13:36) There are no integers in the language, everything is represented as 64-bit floating point numbers
  • (14:30) NaN (Not a Number) is not equal to anything, including NaN, which means NaN == NaN is false
  • (15:22) Type of NaN is Number
  • (15:52) + prefix operator does the same thing as Number function
  • (16:08) Always specify the radix argument in the parseInt function, because if the first character is '0' and no radix argument is provided, it will treat the number as octal
  • (17:15) Each character takes 16-bits of memory
  • (17:56) There is no separate character type in JS, characters are represented as strings with a length of 1
  • (19:55) undefined is the default value for uninitialized variables and parameters
  • (20:38) Falsy values are: false, null, undefined, "", 0, NaN. All other values (including all objects) are truthy. Everything else in the language is an object
  • (22:15) Object members can be accessed with dot notation Object.member or subscript notation Object["member"]
  • (22:59) The language is loosely typed but not "untyped"
  • (25:19) Reserved words are overused, they can't be used in dot notation as method names
  • (27:25) Operators == and != can do type coercion, so it's better to use === and !== which do no coercion
  • (28:24) Operator && is also called the 'guard operator', Operator || is also called the 'default operator'
  • (30:16) The bitwise operators convert the operand to a 32-bit signed integer, perform the operation and then turn the result back into 64-bit floating point. Don't use bitwise operators for tricks like multiplying by 4 using << 2. It will not make the code faster.
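
The number and string points above are easy to check in any JavaScript or JScript host. What follows is only a minimal sketch; the variable names are mine and not from the lecture.

```javascript
// NaN is not equal to anything, including itself.
var notANumber = Number("abc");
var nanEqualsItself = (notANumber === notANumber);
var typeOfNaN = typeof notANumber;   // the type is still "number"

// The + prefix operator converts a string the same way Number() does.
var viaPlus = +"42";
var viaNumber = Number("42");

// Always pass the radix to parseInt so a leading "0" is never read as octal.
var parsedDecimal = parseInt("08", 10);

// There is no integer type: every number is a 64-bit float,
// so decimal fractions are only approximated.
var sumIsExact = (0.1 + 0.2 === 0.3);

// There is no character type either: a "character" is a string of length 1.
var firstChar = "hello".charAt(0);

// Bitwise operators convert their operands to 32-bit signed integers first,
// so 2^31 wraps around to a negative number.
var wrapped = 2147483648 | 0;
var shifted = 3 << 2;   // same value as 3 * 4
```

Here nanEqualsItself and sumIsExact are both false, and wrapped comes out negative because the value no longer fits in 32 signed bits.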

After watching the first lecture I decided that there was no point in taking notes on paper: I am blogging the key points here and trying various examples I can come up with in JScript, so taking notes just wastes time.
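
Since trying examples is the whole point, here is a small sketch of the Part I notes on truthiness, equality and the && and || operators. The functions greet and cityOf are hypothetical names made up for illustration.

```javascript
// None of the six falsy values passes an if test.
var falsyValues = [false, null, undefined, "", 0, NaN];
var truthyCount = 0;
for (var i = 0; i < falsyValues.length; i += 1) {
    if (falsyValues[i]) {
        truthyCount += 1;
    }
}

// Everything else is truthy, even an empty object or the string "0".
var emptyObjectIsTruthy = Boolean({});
var stringZeroIsTruthy = Boolean("0");

// == coerces its operands before comparing, === does not.
var looseEqual = (0 == "");
var strictEqual = (0 === "");

// || as the "default operator": fall back when the left side is falsy.
function greet(name) {
    var who = name || "stranger";
    return "Hello, " + who;
}

// && as the "guard operator": stop at the first falsy link in the chain.
function cityOf(person) {
    return person && person.address && person.address.city;
}
```

For example, greet() falls back to "Hello, stranger", and cityOf(null) returns null instead of throwing an error on the missing address.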

Points that caught my attention in JavaScript Video Lecture Part II

  • (00:20) Break statements can have labels
  • (00:41) Iterating over all the members of an object with for (var name in object) { } syntax, will also iterate over inherited members
  • (06:10) There is only global and function scope in JavaScript, there is no block scope
  • (07:40) If there is no expression in a return statement, the value returned is undefined. Except for constructors, whose default return value is this
  • (08:29) An object is an unordered collection of name/value pairs
  • (21:10) All objects are linked directly or indirectly to Object.prototype
  • (23:42) Array indexes are converted to strings and used as names for retrieving values
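
A short sketch of several Part II points, assuming nothing beyond the core language. All names (Child, base, scopeDemo and so on) are invented for the example.

```javascript
// for..in also visits inherited members; hasOwnProperty filters them out.
var base = { inherited: 1 };
function Child() {
    this.own = 2;
}
Child.prototype = base;
var child = new Child();

var allNames = [];
var ownNames = [];
for (var key in child) {
    allNames.push(key);
    if (Object.prototype.hasOwnProperty.call(child, key)) {
        ownNames.push(key);
    }
}

// var has function scope, not block scope.
function scopeDemo() {
    if (true) {
        var insideBlock = "still visible";
    }
    return insideBlock;
}

// A bare return yields undefined.
function noReturnValue() {
    return;
}

// A labeled break leaves the outer loop from inside the inner one.
var found = null;
outer:
for (var row = 0; row < 3; row += 1) {
    for (var col = 0; col < 3; col += 1) {
        if (row * col === 2) {
            found = [row, col];
            break outer;
        }
    }
}

// Array indexes are converted to strings and used as member names.
var letters = ["a", "b"];
var sameSlot = (letters[1] === letters["1"]);
```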

Points that caught my attention in JavaScript Video Lecture Part III

  • (00:29) Functions inherit from Object and can store name/value pairs
  • (02:00) The function statement is just a short-hand for a var statement with a function value
  • (02:35) Functions can be defined inside of other functions
  • (03:08) JavaScript has closures
  • (07:29) There are four ways to call a function: function form, method form, constructor form and apply form
  • (10:00) this is an extra parameter. Its value depends on the calling form
  • (10:30) When a function is invoked, in addition to its parameters, it also gets a special parameter called arguments
  • (11:53) Built-in types can be augmented through (Object|Array|Function|Number|String|Boolean).prototype
  • (14:09) The typeof prefix operator returns 'object' for Array and null types
  • (15:23) eval() is the most misused feature of the language
  • (21:57) In web browsers, the global object is the window object
  • (22:51) Use of the global namespace must be minimized
  • (23:11) Any variable which is not properly declared is assumed to be global by default
  • (23:45) JSLint is a tool which helps identify weaknesses
  • (24:21) Every object is a separate namespace, use an object to organize your variables and functions
  • (27:15) Function scope can create an encapsulation
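
To make the Part III points about closures, calling forms and arguments more concrete, here is a minimal sketch. The names makeCounter, describe, Person and sum are made up for illustration.

```javascript
// Closure: the inner function keeps access to count after makeCounter returns.
function makeCounter() {
    var count = 0;
    return function () {
        count += 1;
        return count;
    };
}
var counter = makeCounter();
counter();
var secondCall = counter();

// The value of this depends on the calling form.
function describe(greeting) {
    return greeting + ", " + this.name;
}
var alice = { name: "Alice", describe: describe };
var methodForm = alice.describe("Hi");                      // this is alice
var applyForm = describe.apply({ name: "Bob" }, ["Hey"]);   // this is the first argument

function Person(name) {
    this.name = name;                                       // this is the new object
}
var constructorForm = new Person("Carol");

// arguments holds every value passed in, declared or not.
function sum() {
    var total = 0;
    for (var i = 0; i < arguments.length; i += 1) {
        total += arguments[i];
    }
    return total;
}
var functionForm = sum(1, 2, 3);   // plain function form

// typeof cannot tell arrays and null apart from other objects.
var typeOfArray = typeof [];
var typeOfNull = typeof null;
```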

Points that caught my attention in JavaScript Video Lecture Part IV

  • (03:30) The language definition is neutral on threads
  • (11:20) When the compiler sees an error, it attempts to replace a nearby linefeed with a semicolon and try again
  • (12:51) Do not use extra commas in array literals. Netscape will tell you that the length of [1,2,3,] is 3 while IE will tell you it's 4
  • (18:48) Key ideas in JavaScript - Load and go delivery, loose typing, objects as general containers, prototypal inheritance, lambda, linkage through global variables
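
The semicolon insertion point is easier to see in code. The sketch below is my own illustration rather than an example from the lecture, and twice and brokenReturn are hypothetical names.

```javascript
function twice(n) {
    return n * 2;
}

// "var" cannot continue the previous statement, so the parser treats the
// line break before it as a semicolon and carries on.
var a = 1
var b = 2

// "(" CAN continue the previous line, so nothing is wrong and no semicolon
// is inserted: this is parsed as  var c = twice(3)
var c = twice
(3);

// A line break right after "return" ends the statement, so the object
// literal on the next line is never returned.
function brokenReturn() {
    return
    { ok: true };
}
```

So c ends up holding 6 rather than the function, and brokenReturn() gives undefined. Writing the semicolons yourself and keeping the opening brace on the same line as return avoids both surprises.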

These four lectures gave me much better theoretical understanding of JavaScript but just a little better practical skills. I should do a project entirely in JavaScript to become more skillful.

I can't wait to see the Advanced JavaScript lectures. At the end of the 4th lecture Douglas said that they will continue with theory of DOM which I will follow and only then continue with Advanced JS.

Points that caught my attention in The Theory of the DOM Part I

  • (03:31) A scripted web browser, Netscape Navigator 2, was first introduced in 1995
  • The best thing to happen to the standards effort was Mozilla abandoning the Netscape layer model in favor of the W3C model
  • (10:03) The browsers Yahoo! wants their JavaScript library to run on are Firefox 1.5, Firefox 2.0, Safari 2, Internet Explorer 6, IE 7 and Opera 9
  • (12:11) The <script> tag first appeared in Netscape Navigator 2
  • (13:05) <!-- --> comment around script was a Netscape 2 hack for Mosaic and Navigator 1.0
  • (14:51) W3C deprecated language=javascript attribute in script tags, don't put it there anymore
  • (18:48) If you call document.write before onload it inserts data into the document, if you call it after, it replaces the document with the new stuff
  • (20:25) name= is used to identify values in form data and to identify a window or frame
  • (20:45) id= is used to uniquely identify an element so that you could get access to it
  • (20:59) Microsoft introduced document.all as a super-collection of all elements with name or id
  • (21:39) W3C instead said use document.getElementById(id) and document.getElementsByName(name)
  • (23:41) Document tree structure is different for IE than for other browsers because Microsoft decided to depart from W3C standard and not to include whitespaces as text nodes in the tree
  • (25:02) document.body gets you to body node of the tree, document.documentElement gets you to the html root tag of the tree

After watching this lecture I decided to add the time at which each point that caught my attention occurs, so that anyone interested in a point can fast forward to that place in the video. Eventually I will go through the earlier videos once more and add timestamps.

Points that caught my attention in The Theory of the DOM Part II

  • (04:32) The guy who designed CSS chose style names that are not very appealing to a programmer. The JavaScript guys converted them to camel case in JavaScript, which is probably the least compatible with the CSS style names
  • (08:31) Replacing a child is done with "java oriented, model based, nothing in common with reality sort of api" through old.parentNode.replaceChild(new, old) where you specify old twice
  • (09:03) It is important to remove any event handlers from the object before you delete it
  • (10:10) Microsoft and their Internet Explorer were the first to realize that it is convenient to provide access to HTML parser and provided innerHTML property which can be assigned a string containing HTML directly
  • (10:50) There is no standard describing innerHTML property
  • (12:12) The browser has an event-driven, single-threaded, asynchronous programming model
  • (12:55) There are three ways to add event handlers - classic mode (node["on" + type] = func), Microsoft mode (node.attachEvent("on" + type, func)) and W3C mode (node.addEventListener(type, func, bool))
  • (14:50) Microsoft does not send an event parameter, they use the global event object instead.
  • (15:58) There are two ways how events are handled - trickling and bubbling
  • (17:23) The extra bool parameter in W3C mode of adding event handlers node.addEventListener(type, func, bool) tells whether the events are processed bottom up (bubbling) or top down (trickling)
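
The three ways of adding event handlers can be wrapped in one helper that picks whichever mode the node supports. This is only a sketch: addEvent and makeHandler are hypothetical helper names, not part of any browser API, and the return value is there just to show which branch ran.

```javascript
// Register handler on node using whichever mode the node supports.
function addEvent(node, type, handler) {
    if (node.addEventListener) {
        // W3C mode; false asks for the bubbling phase.
        node.addEventListener(type, handler, false);
        return "w3c";
    }
    if (node.attachEvent) {
        // Microsoft mode.
        node.attachEvent("on" + type, handler);
        return "microsoft";
    }
    // Classic mode.
    node["on" + type] = handler;
    return "classic";
}

// Microsoft mode passes no event parameter, so fall back to the global
// event object when the handler is called without one.
function makeHandler(callback) {
    return function (e) {
        e = e || (typeof window !== "undefined" ? window.event : undefined);
        return callback(e);
    };
}
```

In a browser this might be used as addEvent(document.getElementById("go"), "click", makeHandler(onGo)), where the id "go" and the function onGo are placeholders.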

Points that caught my attention in The Theory of the DOM Part III

  • (01:26) The biggest memory leaks happen in IE 6
  • (01:33) Because of that you must explicitly remove event handlers from nodes before deleting or replacing them
  • (06:49) self, parent and top are aliases of window object
  • (08:10) A script can access another window if and only if document.domain === otherwindow.document.domain
  • (10:10) There are three ways to get cross browser compatibility - browser detection, feature detection, platform libraries
  • (11:20) Internet Explorer 1.0 identified itself as "Internet Explorer" but many sites refused to serve the contents complaining that it was not "Mozilla" so in version 1.5 IE identifies itself as "Mozilla"
  • (12:16) Browser detection cross compatibility is the least recommended way
  • (15:37) Platform library cross compatibility is the most recommended way
  • (18:48) No browser completely implements the standards and much of the DOM is not in any standards. If there was a 100% standards compliant browser, it would not work!
  • (19:19) When programming DOM: 1) do what works; 2) do what's common; 3) do what's standard
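
The advice to remove event handlers before deleting or replacing a node can be sketched as a small recursive function. This is a simplified illustration, not the code from the lecture: purgeHandlers is a hypothetical name, HANDLER_NAMES is a deliberately short made-up list, and only classic mode handlers (node.onclick and friends) are cleared.

```javascript
// Hypothetical, incomplete list of classic-mode handler properties.
var HANDLER_NAMES = ["onclick", "onmouseover", "onmouseout", "onkeydown", "onkeyup"];

// Set every known handler on node and on all of its descendants to null.
function purgeHandlers(node) {
    var i, children;
    for (i = 0; i < HANDLER_NAMES.length; i += 1) {
        if (typeof node[HANDLER_NAMES[i]] === "function") {
            node[HANDLER_NAMES[i]] = null;
        }
    }
    children = node.childNodes || [];
    for (i = 0; i < children.length; i += 1) {
        purgeHandlers(children[i]);
    }
}
```

The idea is to call purgeHandlers(node) just before removing or replacing the node, so that no handler keeps a reference to it.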

Okay, I watched the whole Theory of the DOM course and gained good theoretical knowledge but basically no practical skills. Getting better with JavaScript and the DOM will require me to do some interesting practical projects with both of them.

Now I am off to watch the Advanced JavaScript lectures and then the remaining ones.

Points that caught my attention in Advanced JavaScript Part I

  • (01:20) In prototypal inheritance objects inherit directly from objects, there are no classes
  • (01:30) An object contains a "secret link" to another object. Mozilla calls it __proto__
  • (03:36) If looking for a member fails, the last object searched is Object.prototype
  • (07:50) When functions are designed to be used with new, they are called constructors
  • (08:13) new Constructor() returns a new object with a link to Constructor.prototype
  • (09:40) Always have your constructors named with a capital letter so you at least develop reflex for putting a new in front of it
  • (09:48) Forgetting the new still makes code work but it does not construct a new object. This is considered one of the language design errors
  • (10:00) When a new function object is created, it is always given a prototype member
  • (10:49) "Differential inheritance" is a special form of inheritance where you specify just the changes from one generation of objects to the next
  • (17:24) JavaScript doesn't have an operator which makes a new object using an existing object as its prototype, so we have to write our own function
  • (18:50) A "public method" is a function that uses this to access its object
  • (21:55) Functions can be used to create module containers
  • (25:01) "Privileged methods" are functions that have access to "secret" information and they are constructed through closures
  • (28:05) "Parasitic inheritance" is a way of creating an object that is an augmented version of an existing object.
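
The "write our own function" point at 17:24 can be sketched in a few lines. The helper is called object here; oldObject and newObject are example names of my own.

```javascript
// Make a new object that uses o as its prototype (its "secret link").
function object(o) {
    function F() {}
    F.prototype = o;
    return new F();
}

var oldObject = {
    name: "old",
    greet: function () {
        return "I am " + this.name;
    }
};

// Differential inheritance: state only what differs from the parent.
var newObject = object(oldObject);
newObject.name = "new";

// greet is not copied; it is found through the link to oldObject.
var newGreeting = newObject.greet();
var oldGreeting = oldObject.greet();

// The link is live: members added to the parent later are visible too.
oldObject.extra = 1;
var seesExtra = (newObject.extra === 1);
```

Engines that implement ECMAScript 5 or later offer Object.create(o), which does the same job as this helper.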

Points that caught my attention in Advanced JavaScript Part II

  • (06:30) Pseudoclassical patterns are less effective than prototypal patterns or parasitic patterns
  • (09:06) Inner functions do not get access to the this of the outer function
  • (09:32) In JavaScript 1.0 there were no arrays
  • (10:31) When arrays were added, nobody remembered to convert the arguments object into an Array object, so it remains an array-like object
  • (15:47) There are debuggers for IE - Microsoft Script Debugger, which is bad and two debuggers built in Visual Studio and Office 2003. Mozilla has Venkman and Firebug. Safari has Drosera
  • (17:30) funny instruction on how to get debugger working with Office 2003 :D
  • (24:29) All implementations of JavaScript have a non-standard debugger statement which causes a breakpoint if a debugger is present
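
Two of the points above, the inner function's this and the array-like arguments object, have common workarounds. A minimal sketch with made-up names (timer, toArray, argumentsHasJoin):

```javascript
// An inner function gets its own this, so the outer one is saved in a
// variable (conventionally called that) which the inner function can see.
var timer = {
    ticks: 0,
    tickTwice: function () {
        var that = this;
        function tick() {
            that.ticks += 1;
        }
        tick();
        tick();
        return this.ticks;
    }
};

// arguments has a length and indexes but none of the Array methods.
function argumentsHasJoin() {
    return typeof arguments.join === "function";
}

// Borrowing slice from Array.prototype turns it into a real array.
function toArray() {
    return Array.prototype.slice.call(arguments);
}
```

With this, toArray(1, 2, 3).join("-") gives "1-2-3", while calling join directly on arguments would fail.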

Points that caught my attention in Advanced JavaScript Part III

  • (04:20) The Array join() method is much faster for concatenating a large set of strings than the + operator
  • (07:03) Just have the server gzip the JavaScript source file to minimize load times, and avoid tools which can introduce bugs, such as minifiers and obfuscators
  • (07:19) JSON
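
Here is a sketch of the two string building styles and of a JSON round trip. It only shows that both styles produce the same string and does not measure speed. The function names are invented, and the built-in JSON object is an assumption: it is present in newer engines, while older ones need a separate JSON library.

```javascript
// Build the string piece by piece with the + operator.
function buildWithPlus(count) {
    var result = "", i;
    for (i = 0; i < count; i += 1) {
        result += "<li>" + i + "</li>";
    }
    return result;
}

// Collect the pieces in an array and join them once at the end.
function buildWithJoin(count) {
    var parts = [], i;
    for (i = 0; i < count; i += 1) {
        parts.push("<li>" + i + "</li>");
    }
    return parts.join("");
}

// JSON: turn a plain object into text and back again.
var settings = { theme: "dark", fontSize: 12 };
var settingsText = JSON.stringify(settings);
var settingsCopy = JSON.parse(settingsText);
```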

The advanced JavaScript lectures provided lots of idioms and patterns used in the language. I did not do much experimentation and have really grasped just the concepts and overall structure of these advanced concepts.
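
To make a few of those patterns concrete, here is a minimal sketch of a module container, a privileged method and parasitic inheritance. All names (counterModule, Container, makeGizmo, makeHoozit) are only illustrative.

```javascript
// Module container: a function called once, whose local variable count
// stays private to the two methods it returns.
var counterModule = (function () {
    var count = 0;
    return {
        increment: function () {
            count += 1;
            return count;
        },
        current: function () {
            return count;
        }
    };
}());

// Privileged method: defined inside the constructor, so through the
// closure it can read secret, which nothing outside can reach.
function Container(secret) {
    this.getSecret = function () {
        return secret;
    };
}

// Parasitic inheritance: make an object with another maker function,
// augment it, and return it.
function makeGizmo(id) {
    return {
        id: id,
        describe: function () {
            return "gizmo " + this.id;
        }
    };
}
function makeHoozit(id) {
    var that = makeGizmo(id);
    that.test = function (testId) {
        return testId === this.id;
    };
    return that;
}
```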

Now I am going to watch "Advancing JavaScript with Libraries" by John Resig. John, creator of the jQuery JavaScript library and author of Pro JavaScript Techniques, is a Mozilla technologist focused on the relationship between Mozilla and the world of JavaScript libraries.

Interesting points from Advancing JavaScript with Libraries Part I

  • (08:20) In IE7 basically the only change to JavaScript was to the XMLHttpRequest object
  • (25:14) There are two standards for querying the DOM document - XPath and CSS 3 selectors
  • (26:13) IE doesn't have very good CSS selector support; because of that, users have been sticking to the bare minimum of CSS selectors, which is roughly equivalent to CSS 1
  • (27:30) jQuery allows selecting elements from the DOM by using CSS 3 selectors which is done in one line of jQuery code instead of 20 - 25 lines of just JavaScript DOM code

Interesting points from Advancing JavaScript with Libraries Part II

  • (05:15) Users expect DOM selectors to behave more like CSS: when a new CSS selector is added, it applies to all the elements it affects. Users expect the same when a chunk of HTML is added to the DOM, namely that the handlers get attached to the new elements without re-running any code
  • (12:30) Object Relational Mappings
  • (14:50) Libraries create new patterns on top of existing APIs

There were not that many points that got me interested because most of it was pretty obvious stuff. It was interesting to see that the DOM is not perfect: there are many bugs in one browser or another, and the lecture explains why the browsers fail and how to work around these DOM related problems. Typical programming meta-problems were also discussed, such as JavaScript trying to get elements before the browser has loaded the DOM, how white space is treated and what methods to use for navigating the DOM. Later the lecture moved on to querying the DOM tree using jQuery and a mix of XPath and CSS 3. Then it discussed how HTML is injected into an existing document, what is tricky about it and what problems can arise. Finally the lecture continued with FUEL and object relational mappings.

The other lecture I watched was "Maintainable JavaScript" by Nicholas Zakas. He is an engineer on the team that brings you My Yahoo!, one of the most popular personalized portals on the web. He is also the author of two books on frontend engineering, including "Professional JavaScript for Web Developers," one of the best tomes of its kind.

Interesting points from Maintainable JavaScript

  • (01:15) It is estimated that as much as 80% of time is spent maintaining existing code
  • (04:30) Maintainable code is understandable, intuitive, adaptive, extendable and debuggable
  • (16:20) There are three layers on the client side - JavaScript for behavior, CSS for presentation and HTML for structure
  • (23:42) Programming practices
  • (26:30) Namespace your objects
  • (32:10) Avoid null comparison, use instanceof or typeof operators
  • (37:15) Write code in separate JavaScript files and use a build process to combine them
  • (40:47) Summary of writing maintainable code - code conventions, loose coupling, programming practices and build process
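
Two of the programming practices above, namespacing and avoiding null comparison, can be sketched briefly. MyApp and its members are hypothetical names, not anything from the lecture.

```javascript
// One global object acts as the namespace for everything else.
var MyApp = {};

MyApp.text = {
    // Test for the type that is actually needed instead of comparing
    // against null: a number or an object would pass "!= null" too.
    shout: function (value) {
        if (typeof value === "string") {
            return value.toUpperCase() + "!";
        }
        return "";
    }
};

MyApp.lists = {
    // instanceof checks that the value really is an array before
    // array methods are used on it.
    sortedCopy: function (values) {
        if (values instanceof Array) {
            return values.slice().sort();
        }
        return [];
    }
};
```

Only MyApp lands in the global namespace, and both functions quietly return an empty result when they are handed a value of the wrong type.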

There are not that many interesting points anymore because most of the stuff has been covered in the previous lectures. Apart from that, this lecture did not teach me much that was new because the material was pretty obvious. The only reason to watch it is to refresh all these obvious suggestions: agree on indentation and naming conventions in your team, comment difficult algorithms and large sections of code, don't write obvious comments, comment hacks, keep coupling loose, be careful with complex code and design patterns, etc.

What do you think about these lectures?