What do you think John McCarthy would say if he saw your code? I don't think he'd like it...

John McCarthy - Programming, You Are Doing It Completely Wrong.

Remember my article on Set Operations in the Unix Shell? I implemented 14 different set operations by using common Unix utilities such as diff, comm, head, tail, grep, wc and others. I decided to create a simpler version of that post that just lists the operations. I also created a **.txt cheat-sheet** version of it and, to make things more interesting, I added an Awk implementation of each set op. If you want a detailed explanation of each operation, go to the original article.

Download .txt right away: set operations in unix shell (.txt)

Download path: **http://www.catonmat.net/download/setops.txt**

## Set Membership

    $ grep -xc 'element' set
    # outputs 1 if element is in set
    # outputs >1 if set is a multi-set
    # outputs 0 if element is not in set

    $ grep -xq 'element' set
    # returns 0 (true) if element is in set
    # returns 1 (false) if element is not in set

    $ awk '$0 == "element" { s=1; exit } END { exit !s }' set
    # returns 0 if element is in set, 1 otherwise.

    $ awk -v e='element' '$0 == e { s=1; exit } END { exit !s }' set

## Set Equality

    $ diff -q <(sort set1) <(sort set2)
    # returns 0 if set1 is equal to set2
    # returns 1 if set1 != set2

    $ diff -q <(sort set1 | uniq) <(sort set2 | uniq)
    # collapses multi-sets into sets and does the same as previous

    $ awk '{ if (!($0 in a)) c++; a[$0] } END{ exit !(c==NR/2) }' set1 set2
    # returns 0 if set1 == set2
    # returns 1 if set1 != set2

    $ awk '{ a[$0] } END{ exit !(length(a)==NR/2) }' set1 set2
    # same as previous, requires >= gnu awk 3.1.5

## Set Cardinality

    $ wc -l set | cut -d' ' -f1
    # outputs number of elements in set

    $ wc -l < set

    $ awk 'END { print NR }' set

## Subset Test

    $ comm -23 <(sort subset | uniq) <(sort set | uniq) | head -1
    # outputs something if subset is not a subset of set
    # does not output anything if subset is a subset of set

    $ awk 'NR==FNR { a[$0]; next } { if (!($0 in a)) exit 1 }' set subset
    # returns 0 if subset is a subset of set
    # returns 1 if subset is not a subset of set

## Set Union

    $ cat set1 set2
    # outputs union of set1 and set2
    # assumes they are disjoint

    $ awk 1 set1 set2
    # ditto

    $ cat set1 set2 ... setn
    # union over n sets

    $ cat set1 set2 | sort -u
    # same, but also works when they are not disjoint

    $ sort set1 set2 | uniq

    $ sort -u set1 set2

    $ awk '!a[$0]++' set1 set2
    # ditto

## Set Intersection

    $ comm -12 <(sort set1) <(sort set2)
    # outputs intersection of set1 and set2

    $ grep -xF -f set1 set2

    $ sort set1 set2 | uniq -d

    $ join <(sort set1) <(sort set2)

    $ awk 'NR==FNR { a[$0]; next } $0 in a' set1 set2

## Set Complement

    $ comm -23 <(sort set1) <(sort set2)
    # outputs elements in set1 that are not in set2

    $ grep -vxF -f set2 set1
    # ditto

    $ sort set2 set2 set1 | uniq -u
    # ditto

    $ awk 'NR==FNR { a[$0]; next } !($0 in a)' set2 set1

## Set Symmetric Difference

    $ comm -3 <(sort set1) <(sort set2) | sed 's/\t//g'
    # outputs elements that are in set1 or in set2 but not both

    $ comm -3 <(sort set1) <(sort set2) | tr -d '\t'

    $ sort set1 set2 | uniq -u

    $ cat <(grep -vxF -f set1 set2) <(grep -vxF -f set2 set1)

    $ grep -vxF -f set1 set2; grep -vxF -f set2 set1

    $ awk 'NR==FNR { a[$0]; next } $0 in a { delete a[$0]; next } 1; END { for (b in a) print b }' set1 set2

## Power Set

    $ p() { [ $# -eq 0 ] && echo || (shift; p "$@") | while read r ; do echo -e "$1 $r\n$r"; done }
    $ p `cat set`

    # no nice awk solution, you are welcome to email me one:
    # peter@catonmat.net

## Set Cartesian Product

    $ while read a; do while read b; do echo "$a, $b"; done < set1; done < set2

    $ awk 'NR==FNR { a[$0]; next } { for (i in a) print i, $0 }' set1 set2

## Disjoint Set Test

    $ comm -12 <(sort set1) <(sort set2)
    # does not output anything if disjoint

    $ awk '++seen[$0] == 2 { exit 1 }' set1 set2
    # returns 0 if disjoint
    # returns 1 if not

## Empty Set Test

    $ wc -l < set
    # outputs 0 if the set is empty
    # outputs >0 if the set is not empty

    $ awk '{ exit 1 }' set
    # returns 0 if set is empty, 1 otherwise

## Minimum

    $ head -1 <(sort set)
    # outputs the minimum element in the set

    $ awk 'NR == 1 { min = $0 } $0 < min { min = $0 } END { print min }' set

## Maximum

    $ tail -1 <(sort set)
    # outputs the maximum element in the set

    $ awk '$0 > max { max = $0 } END { print max }' set

## Have Fun!

Have fun with these ops! If you can think of other solutions, or have any tips or tricks to add, please comment on the article! Thank you!

Thanks to waldner and pgas from #awk in FreeNode. And greetings to Andreas for coming up with the cool power set function for bash!

Download "**Set Operations in Unix Shell**" Document

Download .txt document: set operations in unix shell (.txt)

Download URL: http://www.catonmat.net/download/setops.txt

Download .pdf document: set operations in unix shell (.pdf)

Download URL: http://www.catonmat.net/download/setops.pdf

This week on Musical Geek Friday: an anti-piracy video musical from the 1990s! It's called **Don't Copy That Floppy**.

"Don't Copy That Floppy" was an anti-copyright infringement campaign run by the Software Publishers Association (SPA) beginning in 1992.

In this video two teenagers, Jenny and Corey, are playing a game on a classroom computer. Corey is exuberantly pushing keys and is heavily immersed in the game action; Jenny is beating him. Frustrated, he asks for a rematch, but she has an upcoming class and must leave. He decides he will copy the game so that he can play it at home. Upon inserting his blank floppy disk into the Apple Macintosh computer a video pops up on the computer. This video is of a rapper named MC Double Def DP the "Disk Protector". The DP's role is instructional and he must teach the teenagers that copying games is bad. His method of lecture is a hip-hop style song and dance.

I ripped and edited the audio from the video. You can listen and download it here:

[audio:http://www.catonmat.net/download/dont_copy_that_floppy.mp3]

Download this song: don't copy that floppy (musical geek friday #16)

Download lyrics: don't copy that floppy lyrics (musical geek friday #16)

Here is the whole video. Some parts are really boring and you may want to skip those:

Direct URL: http://www.youtube.com/watch?v=qj8FACzHeko

Here is just the song that I ripped:

Direct URL: http://www.youtube.com/watch?v=XWf_jbrpn4o

Don't Copy That Floppy lyrics:

(Corey: Jenny, hold up. Look, I brought a disk and we could copy this, ok?

We could play it on my brother's computer.

Jenny: Ok, no problem... All we gotta do is... Woah!

Corey: Are you sure you know whatcha doing?)

Did I hear you right, did I hear you sayin'

That you're gonna make a copy of a game without payin'?

Come on, guys, I thought you knew better don't copy that floppy!

Don't don't don't don't...

(Corey: Wait a minute. Who the heck are you, anyway?

Jenny: Yeah. And what are you doing on my computer?)

I'm your MC Double Def DP

That's the Disk Protector for you and the posse

That's your artists, writers, designers and programmers

They pump up the images for games and gramma's that lets you learn, but also play

The games you came here for today

Now I know you love the game and that's alright to do

Because the posse who make them, they love them too

But if you start stealing, there's no more they can do

(Corey: But I just wanted to make one copy!)

You say 'I'll just make a copy, for me and a friend'

Then he'll make one and she'll make one and where will it end?

One leads to another then ten, then more,

And no one buys anything from the store

So no one gets paid and they can't make more

The posse breaks up and they close the door

Don't copy! Don't copy that floppy!

So let me break this down for you:

D-D-Do-Do-Don't

No Carmen Sandiego, no more Oregon Trail

Tetris and the others, they're all gonna fail

Not because we want it but because you're just takin' it

Dis-res-pec-tin' all the folks who are ma-kin' it

The more you take, the less there will be

The disks become fewer, the games fall away

The screen starts to tweak, and then it will fade

Programs fall through a black hole in space

The computer world becomes bleak and stark

Loses its life and the screen goes dark

(Computer: Welcome to the end of the computer age... mwahahahaha...)

But I'm much too strong and you're much too smart

To let that happen to your chances to explore

Parts of the new age just behind the door of your minds

You're the posse of the future and you hold in your brains what's never thought before

And in time, you'll see just so much more

That's why I'm here and that's what I'm fighting for

Don't copy! Don't copy that floppy!

Now let me introduce you, to some of the teams

That will explain a little more about what I mean!

D-D-Do-Do-Don't... Don't copy that floppy!

You see, on these disks we have frozen in time

The creativity of someone's mind

Do you think, that because, with a flick of a key

You can copy that game, that the work is free

This creativity, we protect it by law

We value so highly, what the mind's eye saw

Don't copy! Don't copy that floppy!

D-D-Do-Do-Don't... Don't copy... Don't copy that floppy!

To do the right thing, it's really simple for you

The copyright law, it will tell you what to do

Buy one, for every computer you use

Anything else is like going to the store

Taking the disk, and walking out the door

It's called thiefin', stealin', taking what's not yours

Is that really where you want your life to go?

Think about it, I don't think so.

Don't copy! Don't copy that floppy!

Now you see a game you like and you really want to try it

Don't copy that floppy, just go to the store and buy it

Think of it this way, okay?

When you buy a disk, you're sayin' to the team

You respect what you do and what you're workin' for

We'll keep up our support so you can make up some more

We'll do the right thing and the future will be clear

There will be new programs here at the end

Don't copy! Don't copy that floppy!

Now you know how the games and the programs are made

And what you do to make sure that they're not gonna fade

The bottom line is it's all up to you

There's nothing more that I can do

The goals in your court, dribble, shoot, or pass

I'm sure you'll make your decision with class

Don't copy that floppy

(MC Double Def DP: See ya, I'm outta here!)

Have fun and until next geeky Friday! :)

November 27, 2008

# MIT's Introduction to Algorithms, Lecture 15: Dynamic Programming

This is the tenth post in an article series about MIT's lecture course "**Introduction to Algorithms**." In this post I will review lecture fifteen, which introduces the concept of **Dynamic Programming** and applies it to the **Longest Common Subsequence** problem.

Dynamic programming is a design technique similar to divide and conquer. Divide-and-conquer algorithms partition the problem into independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem. Dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems. A dynamic-programming algorithm solves every subsubproblem just once and then saves its answer in a table, thereby avoiding the work of recomputing the answer every time the subsubproblem is encountered.

Dynamic programming was systematized by Richard E. Bellman. He began the systematic study of dynamic programming in 1955. The word "programming," both here and in linear programming, refers to the use of a tabular solution method and not to writing computer code.

Dynamic programming is typically applied to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value. We call such a solution *an optimal solution*, as opposed to *the optimal solution*, since there may be several solutions that achieve the optimal value.

Dynamic programming can be effectively applied to solve the longest common subsequence (LCS) problem. The problem is stated as follows: given two sequences (or strings) x and y, find a maximum-length common subsequence of x and y. A subsequence keeps the original order of the elements but, unlike a substring, it does not have to be contiguous.

For example, given two sequences x = "ABCBDAB" and y = "BDCABA", the LCS(x, y) = { "BCBA", "BDAB", "BCAB" }. As you can see there are several optimal solutions.

Lecture fifteen introduces dynamic programming via this longest common subsequence problem. It first gives a brute-force, exponential-time algorithm for solving it. The idea of the algorithm is to check every subsequence of x[1..m] (m is the length of sequence x) to see if it is also a subsequence of y[1..n] (n is the length of sequence y). Checking takes O(n) time, and there are 2^{m} subsequences of x. The running time is thus exponential, O(n·2^{m}). That is no good for large sequences, so the lecture continues with a simplification: first compute only the length of a longest common subsequence, and then extend the algorithm to find the LCS itself. The simplified algorithm is recursive in nature and computes the same subproblems over and over. At this point two dynamic programming hallmarks are stated:

1. **Optimal substructure**: an optimal solution to a problem contains optimal solutions to subproblems.
2. **Overlapping subproblems**: a recursive solution contains a "small" number of distinct subproblems repeated many times.

As the subproblems overlap, the lecture introduces the concept of **memoization** (note that it's not memo**r**ization). A better-known word for memoization is caching. The subproblems are cached (memoized) so that they are not recomputed over and over again.
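
To make the caching idea concrete, here is a minimal sketch of a memoized LCS-length computation, written as an awk program wrapped in a bash function. The names `lcs_length_memo`, `lcs`, `max` and `memo` are made up for this illustration and are not taken from the lecture. Each pair (i, j) is solved once and then looked up, so at most (m+1)·(n+1) distinct subproblems are ever computed.

```shell
# lcs_length_memo X Y: print the length of a longest common subsequence.
# Illustrative sketch only. Note that awk -v interprets backslash escapes,
# so input strings containing backslashes would need to be passed differently.
lcs_length_memo() {
  awk -v x="$1" -v y="$2" '
    # LCS length of x[1..i] and y[1..j]; k is a local variable.
    function lcs(i, j,    k) {
      if (i == 0 || j == 0) return 0
      k = i SUBSEP j
      if (k in memo) return memo[k]   # solved before: reuse the cached answer
      if (substr(x, i, 1) == substr(y, j, 1)) {
        memo[k] = lcs(i - 1, j - 1) + 1
      } else {
        memo[k] = max(lcs(i - 1, j), lcs(i, j - 1))
      }
      return memo[k]
    }
    function max(a, b) { return (a > b) ? a : b }
    BEGIN { print lcs(length(x), length(y)) }'
}
```

Calling `lcs_length_memo ABCBDAB BDCABA` prints 4, the length of the common subsequences in the example above.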

The lecture ends by constructing a dynamic programming table for the LCS problem and explaining how to find an LCS from this table.
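
The table construction and the walk back through it can be sketched roughly like this. It is an awk program inside a bash function; `lcs_table` and the array name `c` are hypothetical names chosen for illustration. Here c[i,j] holds the LCS length of x[1..i] and y[1..j], the table is filled row by row, and one LCS is recovered by walking back from c[m,n].

```shell
# lcs_table X Y: print the LCS length, then one longest common subsequence.
# Illustrative sketch only; awk -v interprets backslash escapes in X and Y.
lcs_table() {
  awk -v x="$1" -v y="$2" 'BEGIN {
    m = length(x); n = length(y)
    # Row 0 and column 0 stand for an empty prefix, so they are all zeros.
    for (i = 0; i <= m; i++) c[i, 0] = 0
    for (j = 0; j <= n; j++) c[0, j] = 0
    # Fill the table bottom-up; every cell depends on three earlier cells.
    for (i = 1; i <= m; i++) {
      for (j = 1; j <= n; j++) {
        if (substr(x, i, 1) == substr(y, j, 1)) {
          c[i, j] = c[i-1, j-1] + 1
        } else {
          c[i, j] = (c[i-1, j] >= c[i, j-1]) ? c[i-1, j] : c[i, j-1]
        }
      }
    }
    # Walk back from the bottom-right corner to recover one LCS.
    i = m; j = n; s = ""
    while (i > 0 && j > 0) {
      if (substr(x, i, 1) == substr(y, j, 1)) {
        s = substr(x, i, 1) s; i--; j--
      } else if (c[i-1, j] >= c[i, j-1]) {
        i--
      } else {
        j--
      }
    }
    print c[m, n]
    print s
  }'
}
```

For the example strings, `lcs_table ABCBDAB BDCABA` prints 4 and then BCBA. A different tie-breaking rule in the walk back would produce one of the other longest common subsequences instead.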

You're welcome to watch lecture fifteen:

Topics covered in lecture fifteen:

- [00:20] Dynamic programming.
- [01:47] Longest common subsequence (LCS) problem.
- [03:55] Example of LCS on sequences "ABCBDAB" and "BDCABA".
- [06:55] Brute force algorithm for LCS.
- [07:50] Analysis of brute force algorithm.
- [11:40] Simplification of LCS problem.
- [16:20] Theorem about LCS length.
- [18:25] Proof of the theorem.
- [30:40] Dynamic programming hallmark #1: Optimal substructure.
- [32:25] Example of hallmark #1 on LCS.
- [34:15] Recursive algorithm for longest common subseq.
- [36:40] Worst case analysis of the algorithm.
- [38:10] Recursion tree of algorithm.
- [42:40] Dynamic programming hallmark #2: Overlapping subproblems.
- [44:40] Example of hallmark #2 on LCS.
- [45:50] Memoization algorithm for LCS.
- [48:45] Time and space analysis of memoized algorithm.
- [54:30] Dynamic programming algorithm for LCS.
- [01:01:15] Analysis of dynamic programming algorithm.

Lecture fifteen notes:

Have fun with programming dynamically! The next post will be about graphs, greedy algorithms and minimum spanning trees.

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"). Chapter 15 is called "Dynamic Programming" and covers the topics in this lecture. It also explains the assembly-line scheduling problem, matrix-chain multiplication problem, elements of dynamic programming and optimal binary search trees.

Two weeks ago I had an on-site interview at Google in Mountain View, California! The job interview with Google was an interesting experience and I want to tell you about it.

The position I was interviewing for was a Google SRE. SRE stands for Site Reliability Engineering. Site reliability engineers (SREs) are both software engineers and systems administrators, responsible for Google's production services from end-to-end.

There were eight separate interviews. The first three were over the phone and the remaining five were on-site. The first interview was with the recruiter and was not very technical, but the other seven were very technical.

All the interviews went pretty well, but I just learned that I won't be getting hired. I personally think that I did really well. I answered all the questions, but it seems they were not satisfied. Google and the recruiter didn't give me precise reasons. The recruiter said that "the morning interviews were not that great" and that "I should get more experience to work in their mission critical team."

**Update:** This article has been translated to Japanese.

**Update:** This article has been translated to German.

Here is how it all happened.

Shortly after I published the "Code Reuse in Google Chrome" post I was contacted by a recruiter at Google. The email said:

I recruit top notch Software Engineering talent at Google. I recently came across your name as a possible world class Engineer and am intrigued to know more about you. I promise to exchange some detailed info about us as well.

Interested to hear more? Want to be an impact player at Google? Then please respond with a current (English) copy of your resume and I'll be happy to call you and discuss.

At first I thought I would be applying for a software developer position, but after we went through my skillset, the recruiter concluded that I would better fit as an SRE. I agreed with him. This seemed like a perfect position for me. I love systems administration as much as I love programming.

## First Interview (phone)

The first interview was on the 10th of September with the recruiter. He explained the Google recruitment process to me and we went through my skill set. I had to rank myself from 0 - 10 in a bunch of areas such as C programming, C++ programming, Python programming, networking, algorithms and data structures, distributed systems, Linux systems administration, and others.

As I said, based on my answers we concluded that SRE was the best position for me. An SRE basically has to know everything: algorithms, data structures, programming, networking, distributed systems, scalable architecture, troubleshooting. It's a great hacker position!

After these questions he asked me where I would like to work - Google office in Ireland, Zurich, Mountain View or Australia. I said Mountain View as it's the Googleplex!

The second half of the interview had some basic technical questions, just to make sure I knew something. The questions were about Linux systems administration, algorithms, computer architecture and C programming. I can't go into any details because I signed a non-disclosure agreement. (Update: NDA expired, so I posted all the interview questions at the bottom of this post.)

I made some factual mistakes, but he was satisfied and we scheduled the next phone interview. He warned me that it would be very technical and that I should prepare really well. I asked him to give me plenty of time to prepare, and we scheduled the next interview for the 22nd of September.

He also told me that each phone interview is going to be 45 minutes to 1 hour long.

I started preparing like crazy. I found three presentations on what SRE is all about:

- Engineering Reliability into Web Sites: Google SRE
- Google SRE: That Couldn't Happen to US... Could It?
- Google SRE: Chasing Uptime

Then I found all the other blog posts about interviews and interview questions at Google:

- Corey Trager's Google Interview
- Rod Hilton's Google Interview
- Ben Watson's Google Interview
- Shaun Boyd's Google Interview
- How I Blew My Google Interview by Henry Blodget
- Get That Job at Google by Steve Yegge
- Tales from the Google's interview room
- Google Interview Questions
- Google Interview Questions -- Fun Brain Teasers!
- And some others...

I printed and read four Google research papers:

- The Google File System
- Bigtable: A Distributed Storage System for Structured Data
- MapReduce: Simplified Data Processing on Large Clusters
- and just for fun Failure Trends in a Large Disk Drive Population

I also went through several books:

- the best book on basics of networking "TCP/IP Illustrated"
- the best book on algorithms "MIT's Introduction to Algorithms" + my notes on algorithms
- a book on scalability "Building Scalable Web Sites"

As I did not know if I might get specific programming language questions, I went through a few tens of recipes in C++ Cookbook, Python Cookbook, and Perl Cookbook.

## Second Interview (phone)

The second phone interview was with an engineer from Google. He worked on the Ads team which is responsible for running AdSense, AdWords and other advertisement stuff.

The interview was very technical and started with an algorithmic problem whose data was too large to fit in computer memory. I had to tell him precisely how I would get around this problem and what data structures and algorithms I would use. He also asked me to think out loud. The interview continued with questions about data structures, DNS, the TCP protocol, a security vulnerability associated with TCP, networking in general, and Google itself.

The questions basically were:

- You have a 100GB file but only 1GB of memory. How would you sort it?
- Tell me about your favorite data structure.
- How does DNS work?
- Can DNS work over TCP?
- How do DNS root servers work?
- How does BGP work?
- How does TCP work and what is 3-way handshake?
- How does TCP session spoofing work and how is it prevented?
- What would you change at Google?

After the interview the engineer had to write feedback on me. It was positive and I could move on with the interviews.

## Third Interview (phone)

I gave myself more time to prepare and the third interview was on the 1st of October. It was with an engineer from the Google traffic team.

In this interview I had a very simple programming question and I had to code over the phone. I was free to choose the language and I chose Perl, as it is my favorite programming language. It was impossible to dictate Perl syntax over the phone ("for my dollar sign element open paren at data close paren open curly brace ... close curly brace"), so I submitted my Perl program by email.

The question was: Write a program to find set difference. Given two sets A and B, find elements in A-B, or in other words, find elements in set A that are not in B.

Then the same problem was taken to the next level: what if the data we are working on is gigabytes or terabytes in size? How would my program/solution change?

Finally I had a question about DNS again, then HTTP protocol, routing, and TCP data transfer.

The questions were:

- How does DNS work?
- How does HTTP work?
- If an HTTP request fails, does the operating system retry it, or the browser?

The feedback was positive and I could prepare for the on-site interviews. In my conversation with my recruiter I got to know that there will be five on-site interviews, each exactly 45 minutes long. One on my previous work experience, one on algorithms and data structures, one on troubleshooting and networking, and two on software development with focus on C and C++.

My recruiter suggested that I read a few more documents:

- Google C++ Style Guide
- Web Search for a Planet: The Google Cluster Architecture
- Algorithm Tutorials on TopCoder

I flew out to interview on 24th of October and arrived in California at 8pm. Google paid for my trip, hotel, cab and food.

## Fourth Interview (on-site)

The fourth interview was finally at the Googleplex! At 10am I met my recruiter and we had a 15 minute discussion about the interviews. He told me I would have two interviews now, then one of the Google engineers would take me to lunch at one of Google's restaurants, and then I would have three more interviews.

At 10:15am the first on-site interview began. It was about my previous job experience. I have had a lot of job experience in the past and I decided to talk about a physical security notification system that I coded in C on Linux a few years ago. The system would receive messages through the serial port and send out emails and text messages.

In the last minutes of the interview he asked me some basic Unix filesystem questions. What is an inode?

In all the on-site interviews I was writing and drawing on two big whiteboards.

## Fifth Interview (on-site)

The fifth interview began at 11am. It was a coding session and began with a trick question rather than a real coding problem. The trick question was: what's the angle between the clock hands when it's 3:15? Then I was asked to implement the solution in C for an arbitrary hour:minute. The solution was a mathematical expression that fit in a one-line return statement. No big coding there. Then I was asked to write an implementation of a binary tree. While coding I made a mistake and forgot to initialize part of a data structure that I had malloc()'ed. The program would have segfaulted in real life and I would have noticed the error. I said there were no errors here, but the Google engineer pointed out that I hadn't initialized the data.
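
The clock question comes down to simple arithmetic: the hour hand moves 30 degrees per hour plus 0.5 degrees per minute, and the minute hand moves 6 degrees per minute. The sketch below is an illustration in awk wrapped in a bash function, not the C code from the interview, and `clock_angle` is a made-up name. It returns the smaller of the two angles between the hands.

```shell
# clock_angle HOUR MINUTE: print the smaller angle, in degrees, between
# the hour hand and the minute hand. Illustrative sketch only.
clock_angle() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    a = 30 * (h % 12) + 0.5 * m - 6 * m   # hour hand position minus minute hand position
    if (a < 0) a = -a                     # absolute difference
    if (a > 180) a = 360 - a              # take the smaller of the two angles
    print a
  }'
}
```

For example, `clock_angle 3 15` prints 7.5: at 3:15 the minute hand sits at 90 degrees, while the hour hand has already moved a quarter of the way from 3 to 4, to 97.5 degrees.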

After this interview I was taken to lunch by the engineer who interviewed me on the second (phone) interview. She told me she had been working at Google for two years and was very happy about it. We went to an Asian food restaurant (located in the Googleplex). Then she showed me around the Googleplex.

## Sixth Interview (on-site)

The sixth interview began at 12:45pm. It was a troubleshooting and networking interview. The interviewer drew a network diagram on the whiteboard and had an imaginary problem in mind somewhere in it. I had to ask a bunch of specific networking questions to locate the problem. He was satisfied, and in the last few minutes of the interview he asked me some specific networking device questions, like what the difference between a router and a switch is and what the OSI model is.

## Seventh Interview (on-site)

The seventh interview began at 1:30pm. It was a coding session. I was asked to implement a simple string manipulation subroutine that finds common characters in two C strings. I could use either C or C++. I chose C. Unfortunately I made an off-by-one mistake there - the most common programming mistake in the history of mankind. The whole interview focused on this one problem.

## Eighth Interview (on-site)

The last, eighth, interview began at 2:15pm. It was an algorithms and data structures interview. The problem presented here was similar to the problem in the second interview. Not only was it a problem too large to fit in computer memory, but it was also distributed: how do you sort data that doesn't fit in memory when you have 100 computers to sort it with? I had to do all kinds of trickery to solve it. The interview was very free-style and we talked back and forth about the problem. I arrived at the correct solution near the end of the interview and he said that not many candidates get that far in the solution. I was also asked if I knew MapReduce, and of course I did, as I had read the Google paper. This was basically a MapReduce problem.
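
The single-machine half of this question (sorting a file that is larger than memory) is classically handled with an external merge sort: cut the input into chunks that fit in memory, sort each chunk, and then merge the sorted chunks in one streaming pass. Below is a minimal sketch, assuming bash and the standard `split`, `sort` and `mktemp` utilities. The function name `external_sort` and its lines-per-chunk parameter are hypothetical, with the chunk size standing in for available memory, and this is not necessarily the answer given in the interview.

```shell
# external_sort FILE LINES_PER_CHUNK
# Sketch of an external merge sort; the sorted result goes to stdout.
external_sort() {
  local input=$1 chunk_lines=$2 tmp f
  [ -s "$input" ] || return 0          # empty or missing input: nothing to print
  tmp=$(mktemp -d) || return 1

  # 1. Cut the input into pieces small enough to sort in memory.
  split -l "$chunk_lines" "$input" "$tmp/chunk."

  # 2. Sort every piece on its own, in place.
  for f in "$tmp"/chunk.*; do
    sort -o "$f" "$f"
  done

  # 3. Merge the already-sorted pieces in a single streaming pass.
  sort -m "$tmp"/chunk.*

  rm -rf "$tmp"
}
```

Step 2 is the part that parallelizes naturally: each chunk could be sorted on a different machine before a final merge.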

After the interview the engineer escorted me out to the lobby and I took a cab back to my hotel.

## The End

Overall the Google interviews were super fun. I love trivia questions like they ask in interviews. The interview questions were technical but not very challenging or difficult.

Update: Now that my NDA has expired, here are some of the interview questions that I remember.

- Tell me about one of the projects on your resume.
- What technologies did you use to get this project going?
- What if your project had 5000 or 50000 or 5000000 users?
- What's an inode?
- What's the angle between the clock hands when it's 3:15?
- Write a C function that returns the angle between the clock hands for any (hour, minute).
- Write a binary tree.
- How would you troubleshoot this problem? (A network diagram was presented.)
- What's the difference between a router and switch?
- Implement a routine in C that counts the number of characters in a string.
- Given a 100GB file and a computer with 1GB of memory, how would you sort it?
- Can you make it parallel and solve it on 100 computers?
- What's a priority queue?
- How does BGP work?
- Can DNS use TCP? In which cases does DNS use TCP?
- Implement set difference in any language you like.
- How does HTTP work?
- How does 3 way handshake work in TCP?
- What's `void *`?
- What's the system call for creating files?