This article is part of the article series "Sed One-Liners Explained."
<- previous article next article ->

sed -- the superman of unix stream editingThis is the second part of a three-part article on the famous sed one-liners. This part will explain sed one-liners for selective printing of certain lines. See part one for introduction of the series.

Just like the famous Awk one-liners, sed one-liners are beautiful, tiny little sed programs that span no more than 1 terminal line. They were written by Eric Pement and are floating around on the Internet as 'sed1line.txt' file.

If you are intrigued by this article series, I suggest that you subscribe to my posts!

Eric's sed one-liners are divided into several sections:

Update: Spanish translation of part two is available!

I have also made a sed cheat sheet that summarizes the whole sed utility. I suggest that you print it before you proceed and keep it in front of you. It will help you memorize the commands faster.

Awesome news: I have written an e-book based on this article series. Check it out:

Grab the sed1line.txt file and let's start.

4. Selective Printing of Certain Lines.

44. Print the first 10 lines of a file (emulates "head -10").

sed 10q

This one-liner restricts the "q" (quit) command to line "10". It means that this command gets executed only when sed reads the 10th line. For all the other lines there is no command specified. When there is no command specified, the default action is to print the line as-is. This one-liner prints lines 1-9 unmodified and at 10th line quits. Notice something strange? It was supposed to print first 10 lines of a file, but it seems that it just printed only the first 9... Worry not! The quit command is sneaky in its nature. Upon quitting with "q" command, sed actually prints the contents of pattern space and only then quits. As a result lines 1-10 get printed!

Please see the first part of the article for explanation of "pattern space".

45. Print the first line of a file (emulates "head -1").

sed q

The explanation of this one-liner is almost the same as of the previous. Sed quits and prints the first line.

A more detailed explanation - after the first line has been placed in the pattern space, sed executes the "q" command. This command forces sed to quit; but due to strange nature of the "q" command, sed also prints the contents of pattern space. As a result, only the first line gets printed.

46. Print the last 10 lines of a file (emulates "tail -10").

sed -e :a -e '$q;N;11,$D;ba'

This one-liner is tricky to explain. It always keeps the last 10 lines in pattern space and at the very last line of input it quits and prints them.

I'll try to explain it. The first "-e :a" creates a label called "a". The second "-e" does the following: "$q" - if it is the last line, quit and print the pattern space. If it is not the last line, execute three commands "N", "11,$D" and "ba". The "N" command reads the next line of input and appends it to the pattern space. The line gets separated from the rest of the pattern space by a new line character. The "11,$D" command executes the "D" command if the current line number is greater than or equal to 11 ("11,$" means from 11th line to end of file). The "D" command deletes the portion of pattern space up to the first new line character. The last command "ba" branches to a label named "a" (beginning of script). This guarantees that the pattern space never contains more than 10 lines, because as line 11 gets appended to pattern space, line 1 gets deleted, as line 12 gets appended line 2 gets deleted, etc.

47. Print the last 2 lines of a file (emulates "tail -2").

sed '$!N;$!D'

This one-liner is also tricky. First of all, the "$!" address restricts commands "N" and "D" to all the lines except the last line.

Notice how the addresses can be negated. If "$<command>" restricts a command to the last line, then "$!<command>" restricts the command to all but the last line. This can be applied to all restriction operations.

In this one-liner the "N" command reads the next line from input and appends it to pattern space. The "D" command deletes everything in pattern space up to the first "\n" symbol. These two commands always keep only the most recently read line in pattern space. When processing the second-to-last line, "N" gets executed and appends the last line to the pattern space. The "D" does not get executed as "N" consumed the last line. At this moment sed quits and prints out the last two lines of the file.

48. Print the last line of a file (emulates "tail -1").

sed '$!d'

This one-liner discards all the lines except the last one. The "d" command deletes the current pattern space, reads in the next line, and restarts the execution of commands from the first. In this case it just loops over itself like "dddd...ddd" until it hits the last line. At the last line no command is executed ("$!d" restricted execution of "d" to all the lines but last) and the pattern space gets printed.

Another way to do the same:

sed -n '$p'

The "-n" parameter suppresses automatic printing of pattern space. It means that without an explicit "p" command (or other commands that act directly on the output stream), sed is dead silent. The "p" command stands for "print" and it prints the pattern space. This one-liner calls the "p" command at the very last line of input. All the other lines are silently discarded.

49. Print next-to-the-last line of a file.

Eric gives three different one-liners to do this. The first one prints a blank line if the file contains just 1 line:

sed -e '$!{h;d;}' -e x

This one-liner executes the "h;d" commands for all the lines except the last one ("$!" restricts "h;d" commands to all lines except last). The "h" command puts the current line in hold buffer and "d" deletes the current line, and starts execution at the first sed command ("h;d" gets executed again, and again, ...). At every single line, that line gets copied to hold buffer. At the very last line "h;d" does not get executed. At this moment "x" gets a chance to execute. The "x" command exchanges the contents of hold buffer with pattern space. Remember that the previous line is still in the hold buffer. The "x" command puts it back in pattern space, and sed prints it! There you go, the next-to-last line was printed!

In case there is just 1 line in the file, only the "x" command gets executed. As the hold buffer initially is empty, "x" puts emptiness in pattern space (I use word "put" here but it actually exchanges the pattern space with hold space). Now sed prints the contents of pattern space, but it's empty, so sed prints out just a blank line.

The second prints the first line if the file contains just 1 line:

sed -e '1{$q;}' -e '$!{h;d;}' -e x

This sed-one liner is divided in two parts. The first part "1{$q;}" handles the case when the file contains just a single line. The second part "$!{h;d;} x" is exactly the same as in the previous one-liner! Thus, I need to explain just the first part.

The first part says - if it is the first line "1", then execute "$q". The "$q" command means - if it is the last line, then quit. What it effectively does is it quits if the first line is the last line (i.e. file contains just one line). Remember from one-liner #44 that before quitting sed prints the contents of pattern space. As a result, if the file contains just one line, sed prints it.

The third prints nothing for 1 line files:

sed -e '1{$d;}' -e '$!{h;d;}' -e x

This one-liner is again divided in two parts. The first part is "1{$d;}" and the second is exactly the same as in the previous two one-liners. I will explain just the first part.

The first part says - if it is the first line "1", then execute "$d". The "$d" command means - if it is the last line, then delete the pattern space and start all over again. In case the first line is the last (only one line in file), there is nothing more to be done and sed quits, printing nothing.

50. Print only the lines that match a regular expression (emulates "grep").

sed -n '/regexp/p'

This one-liner suppresses automatic printing of pattern space with the "-n" switch and makes use of "p" command to print only the lines that match "/regexp/". The lines that do not match this regex get silently discarded. The ones that match get printed. That's it.

Another one-liner that does the same:

sed '/regexp/!d'

This one-liner deletes all the lines that do not match "/regexp/". The other lines get printed by default. The "!" before "d" command inverts the line matching.

51. Print only the lines that do not match a regular expression (emulates "grep -v").

sed -n '/regexp/!p'

This one-liner is the inverse of the previous.

The "-n" prevents automatic printing of pattern space. The "/regexp/" restricts the "!p" command only to lines that match "/regexp/", but the "!" switch prevents "p" from acting on these lines. What happens is "p" acts on all lines that do not match "/regexp/", and they get "p"rinted.

sed '/regexp/d'

This one-liner is the inverse of the previous (#50).

This one-liner executed the "d" (delete) command on all lines that match "/regexp/", thus leaving only the lines that do not match. They get printed automatically.

52. Print the line immediately before regexp, but not the line containing the regexp.

sed -n '/regexp/{g;1!p;};h'

This one-liner saves each line in hold buffer with "h" command. If a line matches the regexp, the hold buffer (containing the previous line) gets copied to pattern space with "g" command and the pattern space gets printed out with "p" command. The "1!" restricts "p" not to print on the first line (as there are no lines before the first).

53. Print the line immediately after regexp, but not the line containing the regexp.

sed -n '/regexp/{n;p;}'

First of all, this one-liner disables automatic printing of pattern space with "-n" command line argument. Then, for all the lines that match "/regexp/", this one-liner executes "n" and "p" commands. The "n" command is the only command that depends on "-n" flag explicitly. If "-n" is specified it will empty the current pattern space and read in the next line of input. If "-n" is not specified, it will print out the current pattern space before emptying it. As in this one-liner "-n" is specified, the "n" command empties the pattern space, reads in the next line and then the "p" command prints that line out.

54. Print one line before and after regexp. Also print the line matching regexp and its line number. (emulates "grep -A1 -B1").

sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

First let's look at "h" command at the end of script. It gets executed on every line and stores the current line in pattern space in hold buffer. The idea of storing the current line in hold buffer is that if the next line matches "/regexp/" then the previous line is available in hold buffer.

Now let's look at the complicated "/regexp/{=;x;1!p;g;$!N;p;D;}" command. It gets executed only if the line matches "/regexp/". The first thing it does is it prints the current line number with "=" command. Then, it exchanges the hold buffer with pattern space by using the "x" command. As I explained, the "h" command at the end of the script makes sure that the hold buffer always contains the previous line. Now we have put it in the pattern space with "x" command. Next, if it's not the first line, "1!p" prints the pattern space, effectively printing the previous line. Now the "g" command gets executed. It copies the original line that was just exchanged with hold buffer back to pattern space. Now the "$!N" executes. If it is not the last line, "N" appends the next line to the current pattern space (and separates them with "\n" char). Pattern space now contains the line that matched "/regexp/" and the next line. The "p" command prints that. "D" deletes the current line (line that matched "/regexp/") from pattern space and finally "h" gets executed again, that puts the contents of pattern space into hold buffer. As "D" deleted the current line, the next line was put in hold buffer.

55. Grep for "AAA" and "BBB" and "CCC" in any order.

sed '/AAA/!d; /BBB/!d; /CCC/!d'

This one-liner inverts the "d" command to be executed on lines that do not contain either "AAA", "BBB" or "CCC". If a line does not contain one of them, it gets deleted and sed proceeds to the next line. Only if all three of the patterns are present, does the sed print the line.

56. Grep for "AAA" and "BBB" and "CCC" in that order.

sed '/AAA.*BBB.*CCC/!d'

This one-liner deletes lines that do not match regexp "/AAA.*BBB.*CCC/". For example, a line "AAAfooBBBbarCCC" will get printed but "AAAfooCCCbarBBB" baz will not.

It can also be written as:

sed -n '/AAA.*BBB.*CCC/p'

This one-liner prints lines that contain AAA...BBB...CCC in that order.

57. Grep for "AAA" or "BBB", or "CCC".

sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

This one-liner uses the "b" command to branch to the end of the script if the line matches "AAA" or "BBB" or "CCC". At the end of the script the line gets implicitly printed. If the line does not match "AAA" or "BBB" or "CCC", the script reaches the "d" command that deletes the line.

gsed '/AAA\|BBB\|CCC/!d'

This one-liner works with GNU sed. GNU sed allows alternation operator | to be used to match separate things. It's a more compact way of saying match "AAA" or "BBB", or "CCC".

If you are using GNU sed, then there is actually no need to escape the pipes |. You may specify the "-r" command line option to use extended regular expressions. This way this one liner becomes:

gsed -r '/AAA|BBB|CCC/!d'

or

gsed -rn '/AAA|BBB|CCC/p'

58. Print a paragraph that contains "AAA". (Paragraphs are separated by blank lines).

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

First notice that this one-liner is divided in two parts for clearness. The first part is "/./{H;$!d;}" and the second part is "x;/AAA/!d".

The first part has an interesting pattern match "/./". What do you think it does? Well, a line separating paragraphs would be a blank line, meaning it would not have any characters in it. This pattern matches only the lines that are not separating paragraphs. These lines get appended to hold buffer with "H" command. They also get prevented from printing with "d" command (except for the last line, when "d" does not get executed ("$!" restricts "d" to all but the last line)). Once sed sees a blank line, the "/./" pattern no longer matches and the second part of one-liner gets executed.

The second part exchanges the hold buffer with pattern space by using the "x" command. The pattern space now contains the whole paragraph of text. Next sed tests if the paragraph contains "AAA". If it does, sed does nothing which results in printing the paragraph. If the paragraph does not contain "AAA", sed executes the "d" command that deletes it without printing and restarts execution at first command.

59. Print a paragraph if it contains "AAA" and "BBB" and "CCC" in any order.

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

This one-liner is also split in two parts for clarity. The first part is exactly the same as the first part of previous one-liner. The second part is very similar to one-liner #55 and also the previous.

The "x" command in the 2nd part does exactly the same as in previous one-liner, it exchanges the hold buffer, that contains the paragraph with pattern space. Next sed does three tests - it tests if the paragraph contains "AAA", "BBB" and "CCC". If the paragraph does not contain even one of them, the "d" command gets executed that purges the paragraph. If it contains all three patterns, sed happily prints the paragraph.

60. Print a paragraph if it contains "AAA" or "BBB" or "CCC".

sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

The first part is exactly the same as in previous two one-liners and does not require explanation. The second part that happens to be "-e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d" is almost exactly the same as in one-liner #57.

The "x" command exchanges the paragraph stored in hold buffer with the pattern space. Then it tests if the pattern space (paragraph) contains "AAA", if it does, sed branches to end of script with "b" command, that happily makes sed print the paragraph. If "AAA" did not match, sed does exactly the same testing for pattern "BBB". If it again did not match, it tests for "CCC". If none of these patterns were found, sed executes the "d" command that deletes everything and restarts this one-liner.

Here is another way to do the same with GNU sed:

gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d'

This one-liner is exactly the same as previous one. It just compresses the three tests for "AAA", "BBB" or "CCC" into one "/AAA\|BBB\|CCC/" as explained in one-liner #57.

61. Print only the lines that are 65 characters in length or more.

sed -n '/^.\{65\}/p'

This one-liner prints lines that are 65 characters in length or more. It does it by using a regular expression "^.{65}" that matches any 65 characters at the beginning of line. If there are less than 65 characters, the regex does not match and the line does not get printed (as automatic printing was disabled with "-n" command line option).

62. Print only the lines that are less than 65 chars.

sed -n '/^.\{65\}/!p'

This one-liner inverts the previous one. If the line matches 65 characters, then it is not printed "!p". If it does not match, it gets printed.

Another way to do the same:

sed '/^.\{65\}/d'

This one-liner deletes all lines that match 65 characters. All others implicitly get printed.

63. Print section of a file from a regex to end of file.

sed -n '/regexp/,$p'

This one-liner uses a tricky range match "/regex/,$". It matches lines starting from the first line that matches "/regex/" to the end of file "$". The "p" command prints these lines. All other lines get silently discarded.

64. Print lines 8-12 (inclusive) of a file.

sed -n '8,12p'

This is another type of range match. This range matches a section of lines between two lines numbers (inclusive). In this case it's lines [8 to 12].

sed '8,12!d'

This is the same one-liner, just written differently. It deletes lines that are outside of range [8, 12] and prints those in this range.

65. Print line number 52.

sed -n '52p'

This one-liner restricts the "p" command to line "52". Only this line gets "p"rinted.

sed '52!d'

This one-liner deletes all lines except line 52. Line 52 gets printed.

sed '52q;d'

This one is the smartest. It quits at line 52 with "q" command. The previous two one-liners would loop over all the remaining lines and do nothing. Remember from one-liner #44 that quit command prints the pattern space with it. The "d" command makes sure that no other line gets printed while sed gets to line 52.

66. Beginning at line 3, print every 7th line.

gsed -n '3~7p'

This one-liner uses a line range match extension of GNU sed. A line range in format "first~step" matches every step'th line starting from first. In this one-liner it's "3~7", meaning match every 7th line starting from 3rd. The "-n" flag prevents printing any other lines, and "p" in "3~7p" prints the matched line.

For everyone else, this one-liner works:

sed -n '3,${p;n;n;n;n;n;n;}'

This one-liner executes commands "p;n;n;n;n;n;n" for lines starting the 3rd line. The "3,$" is a line range match that restricts commands by line numbers. The "$" means end of file and "3" means 3rd line.

The "p;n;n;n;n;n;n" command prints the line, then skips 6, prints the 7th, skips 6, prints the 14th, etc. As it starts executing at line 3, the effect is - print line 3, skip 6, print line 10, skip 6, print line 17, .... That is, print every 7th line beginning at 3rd.

67. Print section of lines between two regular expressions (inclusive).

sed -n '/Iowa/,/Montana/p'

This one-liner prints all the lines between the first line that matches a regular expression "Iowa" and the first line that matches a regular expression "Montana".

It uses a range match "/start/,/finish/" that matches all lines starting from a line that matches "start" and ending with the first line that matches "finish".

An Important Comment About Ranges!

I have an important comment about ranges. Ranges in form "/start/,/finish/" always match 2 lines or more. If "/finish/" is on the same line as "/start/" it will not work. Please see the Sed FAQ 3.3 for more details.

Sed One-Liners Explained E-Book

I have written an e-book called "Sed One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - an introduction to sed, a summary of sed addresses and ranges, and debugging sed scripts with sed-sed. Please take a look:

Have Fun!

Have fun with sed, the superman of Unix text stream editing!

If you liked this article, you may also like a very similar article on Famous Awk One-Liners Explained.

Ps. If you notice anything that you can't understand, please let me know in the comments. Thanks!

This article is part of the article series "Vim Plugins You Should Know About."
<- previous article next article ->

Vim Plugins, surround.vimHere comes the second post in the article series "Vim Plugins You Should Know About". This time I am going to introduce you to a plugin called "repeat.vim".

Repeat.vim fixes an important functionality problem in the surround.vim plugin that I wrote about last week. The problem with surround.vim lies in the repeat command "." (dot). If you had applied a surrounding and wanted to repeat it with the "." command, it wouldn't work. This plugin fixes this problem.

So basically, whenever you install surround.vim, you also want to install repeat.vim with it.

There is one catch, though. It does not repeat visual or "ys" commands. Luckily, the "ys" commands can be often be substituted with "cs" commands. For example, if you wanted to do several "ysw"" (wrap a word in quotes), you may type "csw"" and then use "." commands to repeat. The only way to repeat visual commands is to record a macro.

Here is an example usage of the repeat.vim script. Suppose you had typed a sentence and you wanted to wrap all the words in quotes:

|foo bar baz quux muux woox

(| is cursor)

Type csw":

|"foo" bar baz quux muux woox

Now press W.

"foo" |"bar" baz quux muux woox

W moved to the next word and . repeated the wrapping command.

Now do the same 4 more times, and you have the whole line wrapped:

"foo" "bar" "baz" "quux" "muux" "woox"

How to install repeat.vim?

  • 1. Download repeat.vim to ~/.vim/plugin (on Unix/Linux), or ~\vimfiles\plugin (Windows).
  • 2. Restart Vim or source repeat.vim with ":so ~/.vim/plugin/repeat.vim" on Unix or ":so ~/vimfiles/plugin/repeat.vim" on Windows).

Have Fun!

Have fun with surround.vim + repeat.vim. I'll write about a much more life-changing plugin the next time. :)

This article is part of the article series "Awk One-Liners Explained."
<- previous article next article ->

awk programming one-liners explainedThis is the second part of a three-part article on the famous Awk one-liners. This part will explain Awk one-liners for text conversion and substitution. See part one for introduction of the series.

"What are these famous Awk one-liners?", you might wonder? Well, they are concise and beautiful Awk programs that span no more than 70 characters (less than one terminal line). They were written by Eric Pement and are floating around on the Internet as 'awk1line.txt' file.

If you are intrigued by this article series, I suggest that you subscribe to my posts!

Eric Pement's Awk one-liner collection consists of five sections:

I recommend that you print out my Awk Cheat Sheet before you proceed. This way you will have the language reference in front of you, and you will memorize it better.

Awesome news: I have written an e-book based on this article series. Check it out:

Grab the local copy of Awk one-liners file here awk1line.txt and let's roll.

3. Text Conversion and Substitution

21. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Unix.

awk '{ sub(/\r$/,""); print }'

This one-liner uses the sub(regex, repl, [string]) function. This function substitutes the first instance of regular expression "regex" in string "string" with the string "repl". If "string" is omitted, variable $0 is used. Variable $0, as I explained in the first part of the article, contains the entire line.

The one-liner replaces '\r' (CR) character at the end of the line with nothing, i.e., erases CR at the end. Print statement prints out the line and appends ORS variable, which is '\n' by default. Thus, a line ending with CRLF has been converted to a line ending with LF.

22. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Unix.

awk '{ sub(/$/,"\r"); print }'

This one-liner also uses the sub() function. This time it replaces the zero-width anchor '$' at the end of the line with a '\r' (CR char). This substitution actually adds a CR character to the end of the line. After doing that Awk prints out the line and appends the ORS, making the line terminate with CRLF.

23. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Windows/DOS.

awk 1

This one-liner may work, or it may not. It depends on the implementation. If the implementation catches the Unix newlines in the file, then it will read the file line by line correctly and output the lines terminated with CRLF. If it does not understand Unix LF's in the file, then it will print the whole file and terminate it with CRLF (single windows newline at the end of the whole file).

Ps. Statement '1' (or anything that evaluates to true) in Awk is syntactic sugar for '{ print }'.

24. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Windows/DOS

gawk -v BINMODE="w" '1'

Theoretically this one-liner should convert CRLFs to LFs on DOS. There is a note in GNU Awk documentation that says: "Under DOS, gawk (and many other text programs) silently translates end-of-line "\r\n" to "\n" on input and "\n" to "\r\n" on output. A special "BINMODE" variable allows control over these translations and is interpreted as follows: ... If "BINMODE" is "w", then binary mode is set on write (i.e., no translations on writes)."

My tests revealed that no translation was done, so you can't rely on this BINMODE hack.

Eric suggests to better use the "tr" utility to convert CRLFs to LFs on Windows:

tr -d \r

The 'tr' program is used for translating one set of characters to another. Specifying -d option makes it delete all characters and not do any translation. In this case it's the '\r' (CR) character that gets erased from the input. Thus, CRLFs become just LFs.

25. Delete leading whitespace (spaces and tabs) from the beginning of each line (ltrim).

awk '{ sub(/^[ \t]+/, ""); print }'

This one-liner also uses sub() function. What it does is replace regular expression "^[ \t]+" with nothing "". The regular expression "^[ \t]+" means - match one or more space " " or a tab "\t" at the beginning "^" of the string.

26. Delete trailing whitespace (spaces and tabs) from the end of each line (rtrim).

awk '{ sub(/[ \t]+$/, ""); print }'

This one-liner is very similar to the previous one. It replaces regular expression "[ \t]+$" with nothing. The regular expression "[ \t]+$" means - match one or more space " " or a tab "\t" at the end "$" of the string. The "+" means "one or more".

27. Delete both leading and trailing whitespaces from each line (trim).

awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'

This one-liner uses a new function called "gsub". Gsub() does the same as sub(), except it performs as many substitutions as possible (that is, it's a global sub()). For example, given a variable f = "foo", sub("o", "x", f) would replace just one "o" in variable f with "x", making f be "fxo"; but gsub("o", "x", f) would replace both "o"s in "foo" resulting "fxx".

The one-liner combines both previous one-liners - it replaces leading whitespace "^[ \t]+" and trailing whitespace "[ \t]+$" with nothing, thus trimming the string.

To remove whitespace between fields you may use this one-liner:

awk '{ $1=$1; print }'

This is a pretty tricky one-liner. It seems to do nothing, right? Assign $1 to $1. But no, when you change a field, Awk rebuilds the $0 variable. It takes all the fields and concats them, separated by OFS (single space by default). All the whitespace between fields is gone.

28. Insert 5 blank spaces at beginning of each line.

awk '{ sub(/^/, "     "); print }'

This one-liner substitutes the zero-length beginning of line anchor "^" with five empty spaces. As the anchor is zero-length and matches the beginning of line, the five whitespace characters get appended to beginning of the line.

29. Align all text flush right on a 79-column width.

awk '{ printf "%79s\n", $0 }' 

This one-liner asks printf() to print the string in $0 variable and left pad it with spaces until the total length is 79 chars.

Please see the documentation of printf function for more information and examples.

30. Center all text on a 79-character width.

awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }'

First this one-liner calculates the length() of the line and puts the result in variable "l". Length(var) function returns the string length of var. If the variable is not specified, it returns the length of the entire line (variable $0). Next it calculates how many white space characters to pad the line with and stores the result in variable "s". Finally it printf()s the line with appropriate number of whitespace chars.

For example, when printing a string "foo", it first calculates the length of "foo" which is 3. Next it calculates the column "foo" should appear which (79-3)/2 = 38. Finally it printf("%41", "foo"). Printf() function outputs 38 spaces and then "foo", making that string centered (38*2 + 3 = 79)

31. Substitute (find and replace) "foo" with "bar" on each line.

awk '{ sub(/foo/,"bar"); print }'

This one-liner is very similar to the others we have seen before. It uses the sub() function to replace "foo" with "bar". Please note that it replaces just the first match. To replace all "foo"s with "bar"s use the gsub() function:

awk '{ gsub(/foo/,"bar"); print }'

Another way is to use the gensub() function:

gawk '{ $0 = gensub(/foo/,"bar",4); print }'

This one-liner replaces only the 4th match of "foo" with "bar". It uses a never before seen gensub() function. The prototype of this function is gensub(regex, s, h[, t]). It searches the string "t" for "regex" and replaces "h"-th match with "s". If "t" is not given, $0 is assumed. Unlike sub() and gsub() it returns the modified string "t" (sub and gsub modified the string in-place).

Gensub() is a non-standard function and requires GNU Awk or Awk included in NetBSD.

In this one-liner regex = "/foo/", s = "bar", h = 4, and t = $0. It replaces the 4th instance of "foo" with "bar" and assigns the new string back to the whole line $0.

32. Substitute "foo" with "bar" only on lines that contain "baz".

awk '/baz/ { gsub(/foo/, "bar") }; { print }'

As I explained in the first one-liner in the first part of the article, every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". Action statements are applied only to lines that match pattern.

In this one-liner the pattern is a regular expression /baz/. If line contains "baz", the action statement gsub(/foo/, "bar") is executed. And as we have learned, it substitutes all instances of "foo" with "bar". If you want to substitute just one, use the sub() function!

33. Substitute "foo" with "bar" only on lines that do not contain "baz".

awk '!/baz/ { gsub(/foo/, "bar") }; { print }'

This one-liner negates the pattern /baz/. It works exactly the same way as the previous one, except it operates on lines that do not contain match this pattern.

34. Change "scarlet" or "ruby" or "puce" to "red".

awk '{ gsub(/scarlet|ruby|puce/, "red"); print}'

This one-liner makes use of extended regular expression alternation operator | (pipe). The regular expression /scarlet|ruby|puce/ says: match "scarlet" or "ruby" or "puce". If the line matches, gsub() replaces all the matches with "red".

35. Reverse order of lines (emulate "tac").

awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }'

This is the trickiest one-liner today. It starts by recording all the lines in the array "a". For example, if the input to this program was three lines "foo", "bar", and "baz", then the array "a" would contain the following values: a[0] = "foo", a[1] = "bar", and a[2] = "baz".

When the program has finished processing all lines, Awk executes the END { } block. The END block loops over the elements in the array "a" and prints the recorded lines. In our example with "foo", "bar", "baz" the END block does the following:

for (j = 2; j >= 0; ) print a[j--]

First it prints out j[2], then j[1] and then j[0]. The output is three separate lines "baz", "bar" and "foo". As you can see the input was reversed.

36. Join a line ending with a backslash with the next line.

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

This one-liner uses regular expression "/\\$/" to look for lines ending with a backslash. If the line ends with a backslash, the backslash gets removed by sub(/\\$/,"") function. Then the "getline t" function is executed. "Getline t" reads the next line from input and stores it in variable t. "Print $0 t" statement prints the original line (but with trailing backslash removed) and the newly read line (which was stored in variable t). Awk then continues with the next line. If the line does not end with a backslash, Awk just prints it out with "1".

Unfortunately this one liner fails to join more than 2 lines (this is left as an exercise to the reader to come up with a one-liner that joins arbitrary number of lines that end with backslash :)).

37. Print and sort the login names of all users.

awk -F ":" '{ print $1 | "sort" }' /etc/passwd

This is the first time we see the -F argument passed to Awk. This argument specifies a character, a string or a regular expression that will be used to split the line into fields ($1, $2, ...). For example, if the line is "foo-bar-baz" and -F is "-", then the line will be split into three fields: $1 = "foo", $2 = "bar" and $3 = "baz". If -F is not set to anything, the line will contain just one field $1 = "foo-bar-baz".

Specifying -F is the same as setting the FS (Field Separator) variable in the BEGIN block of Awk program:

awk -F ":"
# is the same as
awk 'BEGIN { FS=":" }'

/etc/passwd is a text file, that contains a list of the system's accounts, along with some useful information like login name, user ID, group ID, home directory, shell, etc. The entries in the file are separated by a colon ":".

Here is an example of a line from /etc/passwd file:

pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash

If we split this line on ":", the first field is the username (pkrumins in this example). The one-liner does just that - it splits the line on ":", then forks the "sort" program and feeds it all the usernames, one by one. After Awk has finished processing the input, sort program sorts the usernames and outputs them.

38. Print the first two fields in reverse order on each line.

awk '{ print $2, $1 }' file

This one liner is obvious. It reverses the order of fields $1 and $2. For example, if the input line is "foo bar", then after running this program the output will be "bar foo".

39. Swap first field with second on every line.

awk '{ temp = $1; $1 = $2; $2 = temp; print }'

This one-liner uses a temporary variable called "temp". It assigns the first field $1 to "temp", then it assigns the second field to the first field and finally it assigns "temp" to $2. This procedure swaps the first two fields on every line. For example, if the input is "foo bar baz", then the output will be "bar foo baz".

Ps. This one-liner was incorrect in Eric's awk1line.txt file. "Print" was missing.

40. Delete the second field on each line.

awk '{ $2 = ""; print }'

This one liner just assigns empty string to the second field. It's gone.

41. Print the fields in reverse order on every line.

awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }'

We saw the "NF" variable that stands for Number of Fields in the part one of this article. After processing each line, Awk sets the NF variable to number of fields found on that line.

This one-liner loops in reverse order starting from NF to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), ..., $1. After that it prints a newline character.

42. Remove duplicate, consecutive lines (emulate "uniq")

awk 'a !~ $0; { a = $0 }'

Variables in Awk don't need to be initialized or declared before they are being used. They come into existence the first time they are used. This one-liner uses variable "a" to keep the last line seen "{ a = $0 }". Upon reading the next line, it compares if the previous line (in variable "a") is not the same as the current one "a !~ $0". If it is not the same, the expression evaluates to 1 (true), and as I explained earlier, any true expression is the same as "{ print }", so the line gets printed out. Then the program saves the current line in variable "a" again and the same process continues over and over again.

This one-liner is actually incorrect. It uses a regular expression matching operator "!~". If the previous line was something like "fooz" and the new one is "foo", then it won't get output, even though they are not duplicate lines.

Here is the correct, fixed, one-liner:

awk 'a != $0; { a = $0 }'

It compares lines line-wise and not as a regular expression.

43. Remove duplicate, nonconsecutive lines.

awk '!a[$0]++'

This one-liner is very idiomatic. It registers the lines seen in the associative-array "a" (arrays are always associative in Awk) and at the same time tests if it had seen the line before. If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to "{ print }".

For example, suppose the input is:

foo
bar
foo
baz

When Awk sees the first "foo", it evaluates the expression "!a["foo"]++". "a["foo"]" is false, but "!a["foo"]" is true - Awk prints out "foo". Then it increments "a["foo"]" by one with "++" post-increment operator. Array "a" now contains one value "a["foo"] == 1".

Next Awk sees "bar", it does exactly the same what it did to "foo" and prints out "bar". Array "a" now contains two values "a["foo"] == 1" and "a["bar"] == 1".

Now Awk sees the second "foo". This time "a["foo"]" is true, "!a["foo"]" is false and Awk does not print anything! Array "a" still contains two values "a["foo"] == 2" and "a["bar"] == 1".

Finally Awk sees "baz" and prints it out because "!a["baz"]" is true. Array "a" now contains three values "a["foo"] == 2" and "a["bar"] == 1" and "a["baz"] == 1".

The output:

foo
bar
baz

Here is another one-liner to do the same. Eric in his one-liners says it's the most efficient way to do it.

awk '!($0 in a) { a[$0]; print }'

It's basically the same as previous one, except that it uses the 'in' operator. Given an array "a", an expression "foo in a" tests if variable "foo" is in "a".

Note that an empty statement "a[$0]" creates an element in the array.

44. Concatenate every 5 lines of input with a comma.

awk 'ORS=NR%5?",":"\n"'

We saw the ORS variable in part one of the article. This variable gets appended after every line that gets output. In this one-liner it gets changed on every 5th line from a comma to a newline. For lines 1, 2, 3, 4 it's a comma, for line 5 it's a newline, for lines 6, 7, 8, 9 it's a comma, for line 10 a newline, etc.

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

Have Fun!

Have fun with these one-liners. I hope you learned something new.

If you liked this article, you may also like a very similar article on Famous Sed One-Liners Explained.

Ps. If you notice anything that you can't understand, please let me know in the comments. Thanks!

This article is part of the article series "Vim Plugins You Should Know About."
<- previous article next article ->

Vim Plugins, surround.vimHi all. I am starting a new article series called "Vim Plugins You Should Know About". This series of articles is going to be about Vim plugins that you should know about and perhaps even be using. The first article in this series will be about one of my favorite plugins called "surround.vim".

If you are intrigued by this article series, I suggest that you subscribe to my posts!

Don't worry, I'll finish all the previous article series that I have started (creating and marketing a video downloading tool, famous awk one-liners explained, sed one-liners explained and mit algorithms). I just wanted to utilize my excitement and get this article published ASAP.

Okay, so what is surround.vim plugin? Here is what Tim Pope, the author of this plugin, says:

Surround.vim is all about "surroundings": parentheses, brackets, quotes, XML tags, and more. The plugin provides mappings to easily delete, change and add such surroundings in pairs.

That's what the plugin does. It allows you to easily delete, change and add various "surroundings" in pairs. For example, you may quickly wrap a line in an html tag, or you may delete a pair of { }, or you may add quotes around words, etc.

Let's take a look at a few examples for each surrounding command: delete surrounding, change surrounding, and add surrounding. In the examples | indicates the position of the cursor.

Examples of deleting surroundings:

Surroundings can be deleted with the "ds" command. After you type "ds", the command expects the surrounding target you want to delete. It may be any of three quote marks, ', ", and `, the eight punctuation marks, (, ), {, }, [, ], <, > , and a special 't' target for deleting the innermost HTML tag.

Text              Command    New Text
---------------   -------    -----------
'Hello W|orld'    ds'        Hello World
(12|3+4*56)/2     ds(        123+4*56/2
<div>fo|o</div>   dst        foo

(| is the position of cursor in these examples)

See how easy it was to delete surroundings? Just 3 keystrokes. Compare it with doing it old fashioned way:

Text              Command    New Text
---------------   -------    -----------
'Hello W|orld'    F'x,x      Hello World
(12|3+4*56)/2     Bxf)x      123+4*56/2
<div>fo|o</div>   Bdf>wdf>   foo

(| is the position of cursor in these examples)

That was really messy, wasn't it? I hope you start to see now how handy surround.vim is!

Examples of changing surroundings:

Surroundings can be changed with the "cs" command. The "cs" command, just like "ds" command takes a surrounding target, and it also takes the surrounding replacement. There are a few more surrounding targets for this command. They are w for word, W for word + skip punctuation, s for sentence, and p for paragraph.

Text              Command    New Text
---------------   -------    -----------
"Hello |world!"   cs"'       'Hello world!'
"Hello |world!"   cs"<q>     <q>Hello world!</q>
(123+4|56)/2      cs)]       [123+456]/2
(123+4|56)/2      cs)[       [ 123+456 ]/2
<div>foo|</div>   cst<p>     <p>foo</p>
fo|o!             csw'       'foo'!
fo|o!             csW'       'foo!'

(| is the position of cursor in these examples)

Examples of adding surroundings:

Surroundings can be added with the same "cs" command, which takes a surrounding target, or with the "ys" command that takes a valid vim motion. There is a special "yss" command that applies a surrounding to the whole line, and "ySS" that applies the surrounding to the whole line, places the text on a new line and indents it.

Text              Command      New Text
---------------   -------      -----------
Hello w|orld!     ysiw)        Hello (world)!
Hello w|orld!     csw)         Hello (world)!
fo|o              ysiwt<html>  <html>foo</html>
foo quu|x baz     yss"         "foo quux baz"
foo quu|x baz     ySS"         "
                                   foo quux baz
                               "

(| is the position of cursor in these examples)

You can also add surroundings while in insert mode. That is supported via <CTRL-s> mapping.

Please be very careful with CTRL-s. On many terminals it stops the output and your session will appear to be frozen! If that happens, use CTRL-q to unfreeze it. You may want to remove CTRL+s mapping from your terminal altogether. Add "stty stop ''" to your shell startup file (.bashrc for bash, .kshrc for ksh, etc).

For example (while in insert mode):

Command                  New Text
-------                  ------------
<CTRL-s>"                ""
<CTRL-s><CTRL-s><html>   <html>
                             |
                         </html>

(| is the position of cursor in these examples)

I urge you to try these mappings out. You won't become a hockey player by just looking at the game, you have to play it yourself!

For a complete reference, here are all the commands/mappings surround.vim provides:

Normal mode
-----------
ds  - delete a surrounding
cs  - change a surrounding
ys  - add a surrounding
yS  - add a surrounding and place the surrounded text on a new line + indent it
yss - add a surrounding to the whole line
ySs - add a surrounding to the whole line, place it on a new line + indent it
ySS - same as ySs

Visual mode
-----------
s   - in visual mode, add a surrounding
S   - in visual mode, add a surrounding but place text on new line + indent it

Insert mode
-----------
<CTRL-s> - in insert mode, add a surrounding
<CTRL-s><CTRL-s> - in insert mode, add a new line + surrounding + indent
<CTRL-g>s - same as <CTRL-s>
<CTRL-g>S - same as <CTRL-s><CTRL-s>

How to install surround.vim?

  • 1. Download surround.zip. It will actually be a zip file containing the surround.vim script and documentation.
  • 2. Extract surround.zip to ~/.vim (on Unix/Linux), or ~\vimfiles (Windows).
  • 3. Restart Vim (or just source surround.vim script with ":so ~/.vim/plugin/surround.vim" on Unix or ":so ~/vimfiles/plugin/surround.vim" on Windows).
  • 4. Regenerate helptags with ":helptags ~/.vim/doc" on Unix or ":helptags ~/vimfiles/doc" on Windoze.

Type ":help surround" to read help.

Comments appreciated!

What plugins do you use? Your comments will help me and my blog readers to find more plugins which we might have not heard about. You may also share any tips and aha-moments that you have had while using vim. I might add my own and turn them into a valuable article. Thanks!

This article is part of the article series "MIT Introduction to Algorithms."
<- previous article next article ->

MIT AlgorithmsThis is the eleventh post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lecture sixteen, which introduces the concept of Greedy Algorithms, reviews Graphs and applies the greedy Prim's Algorithm to the Minimum Spanning Tree (MST) Problem.

The previous lecture introduced dynamic programming. Dynamic programming was used for finding solutions to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value. Greedy algorithms are another set of methods for finding optimal solution. A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Greedy algorithms do not always yield optimal solutions, but for many problems they do. In this lecture it is shown that a greedy algorithm gives an optimal solution to the MST problem.

Lecture sixteen starts with a review of graphs. It reminds that a graph G is a pair (V, E), where V is a set of vertices (nodes) and E ⊆ VxV is a set of edges. If E contains ordered pairs, the graph is directed (also called a digraph), otherwise it's undirected. It also gives facts, such as, for all graphs, |E| = O(|V|2), and if a graph is connected then |E| >= |V| - 1. It also mentions handshaking lemma - sum over all vertices of degree of the vertex = 2|E| (Σv∈Vdegree(v) = 2|E|).

The lecture continues with graph representations. Graphs are commonly stored as adjacency matrices or as adjacency lists. Adjacency matrix of a graph G = (V, E) is an nxn matrix A with elements aij = 1, if edge (i, j) ∈ E and aij = 0, if edge (i, j) ∉ E.

Here is an example of adjacency matrix for a digraph G = ({1, 2, 3, 4}, {{1, 2}, {1, 3}, {2, 3}, {4, 3}}):

Adjacency matrix example

You can see that the element aij of matrix A is 1 if there is an edge from i to j, and the element is 0 if there is no edge from i to j (i is row, j is column). The storage space required by this representation is O(|V|2). For dense graphs this is ok, but for sparse it's not ok.

Adjacency list of a given vertex v ∈ V is the list Adj[v] of vertices adjacent to v. Here is an example of adjacency list for the same graph:

Adjacency list example

Storage required by this representation is O(V+E).

The lecture progresses with the problem of minimum spanning trees (MSTs). It's stated as following: given a connected, undirected graph G = (V, E) with edge weight function w: E -> R, find a spanning tree that connects all the vertices of minimum weight.

Here is an example of a MST. The MST of the graph is colored in green. The MST connects all the vertices of the graph so that the weight of edges is minimum.

Minimum Spanning Tree

Analyzing the MST problem in more detail, it is appears that MST contains optimal substructure property - if we take a subgraph and look at its MST, that MST will be a part of the MST of the whole graph. This is dynamic programming hallmark #1 (see previous lecture). It also appears that the problem of finding an MST contains overlapping subproblems, which is dynamic programming hallmark #2. Hallmarks #1 and #2 suggest that we could use a dynamic programming algorithm to find a MST of a graph. Surprisingly, there is something more powerful than dynamic programming algorithm for this problem - a greedy algorithm.

There is a hallmark for greedy algorithms:

  • Greedy choice property - a locally optimal choice is globally optimal.

Lecture continues with a theorem and a proof that a greedy local choice for a MST is globally optimal choice. This is the key idea behind Prim's algorithm for finding a minimum spanning tree of a graph. The idea in the algorithm is to maintain a priority queue Q (keyed on minimum weight of edge connected to a vertex) of vertices that are not yet connected in the MST. At each step the algorithm takes a vertex from Q and goes over all edges, connects the edge with minimum weight to MST and updates keys of neighbor vertices (please see the lecture or lecture notes for the exact algorithm and an example).

The lecture ends with running time analysis of Prim's algorithm. As it uses a priority queue, the running time depends on implementation of the priority queue used. For example, if we use a dumb array as a priority queue, the running time is O(V2), if we use a binary heap, the running time os O(E·lg(V)), if we use a Fibonacci heap, the running time is O(E + V·lg(V)).

Another well known algorithm for MSTs is Kruskal's algorithm (runs in O(E·lg(V)), which uses a disjoint-set data structure, that is not covered in this course (but is covered in Chapter 21 of the book this course is taught from).

The best known algorithm to date for finding MSTs is by David R. Karger (MIT), Philip N. Klein (Brown University) and Robert E. Tarjan (Princeton University). This algorithm was found in 1993, it's randomized in nature and runs in O(E + V) expected time. Here is the publication that presents this algorithm: "A Randomized Linear-Time Algorithm for Finding Minimum Spanning Trees".

You're welcome to watch lecture sixteen:

Topics covered in lecture sixteen:

  • [00:10] Review of graphs.
  • [00:50] Digraph - directed graph.
  • [01:50] Undirected graph.
  • [02:40] Bound of edges in a graph |E| = O(V2).
  • [04:10] Connected graph edges |E| >= |V| - 1.
  • [05:50] Graph representations.
  • [06:15] Adjacency matrix graph representation.
  • [08:20] Example of adjacency matrix of a graph.
  • [09:50] Storage required for an adjacency matrix O(V2). Dense representation.
  • [12:40] Adjecency list graph representation.
  • [13:40] Example of adjecancy list of a graph.
  • [15:50] Handshaking lemma (theorem) for undirected graphs.
  • [18:20] Storage required for adj. list representation O(E + V).
  • [23:30] Minimum spanning trees.
  • [24:20] Definition of MST problem.
  • [28:45] Example of a graph and its minimum spanning tree.
  • [34:50] Optimal substructure property of MST (dynamic programming hallmark #1).
  • [38:10] Theorem that a subgraph's MST is part of whole graph's MST.
  • [42:40] Cut and paste argument proof.
  • [45:40] Overlapping subproblems of MST (dynamic programming hallmark #2).
  • [48:20] Hallmark for greedy algorithms: The greedy choice property - a locally optimal choice is globally optimal.
  • [50:25] Theorem and proof that a local choice of least weight edge is globally optimal for MSTs.
  • [01:01:10] Prim's algorithm for minimum spanning trees.
  • [01:01:50] Key idea of Prim's algorithm.
  • [01:02:55] Pseudocode of Prim's algorithm.
  • [01:07:40] Example of running Prim's algorithm.
  • [01:15:03] Analysis of Prim's algorithm running time.
  • [01:18:20] Running time if we choose array, binary heap and Fibonaci heap.
  • [01:22:40] Kruskal's algorithm.
  • [01:23:00] The best algorithm for MST to date.

Lecture sixteen notes:

MIT Algorithms Lecture 16 Notes Thumbnail. Page 1 of 2.
Lecture 16, page 1 of 2: graphs, graph representations, adjacency lists, minimum spanning trees, cut and paste argument.

MIT Algorithms Lecture 16 Notes Thumbnail. Page 2 of 2.
Lecture 16, page 2 of 2: greedy choice property, prim's algorithm, idea of kruskal's algorithm.

Have fun with greedy programming! The next post will be a trilogy of graph and shortest paths algorithms - Dijkstra's algorithm, Breadth-first search, Bellman-Ford algorithm, Floyd-Warshall algorithm and Johnson's algorithm!

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"). Chapter 16 is called "Greedy Algorithms". It explains an activity-selection problem, elements of greedy algorithms, Huffman codes, Matroid theory, and task scheduling problem. Graphs are reviewed in Appendix B of the book. If these topics excite you, you may want to buy this awesome book: