This article is part of the article series "Awk One-Liners Explained."

This is the third and final part of a three-part article on the famous Awk one-liners. This part explains the Awk one-liners for selective printing and deletion of certain lines. See part one for an introduction to the series.

If you just came to my website, then you might wonder, "What are these Awk one-liners and why are they famous?" The answer is very simple - they are small and beautiful Awk programs that each do one, and only one, text manipulation task very well. They were written by Eric Pement and have been circulating around the Internet as the awk1line.txt text file.

If you are intrigued by this article series, I suggest that you subscribe to my posts, as I will have a lot more interesting and educational articles this year.

Eric Pement's Awk one-liner collection consists of five sections:

Grab my Awk cheat sheet and the local copy of Awk one-liners file awk1line.txt and let's roll.

4. Selective Printing of Certain Lines

45. Print the first 10 lines of a file (emulates "head -10").

awk 'NR < 11'

Awk has a special variable called "NR" that stands for "Number of Records": the total number of input records (by default, lines) read so far. After reading each line, Awk increments this variable by one, so for the first line it's 1, for the second line 2, and so on. (With several input files NR keeps counting across them; the per-file counter is "FNR".) As I explained in the very first one-liner, every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". The "action statements" part gets executed only on those lines that match "pattern" (pattern evaluates to true). In this one-liner the pattern is "NR < 11" and there are no "action statements". The default action in case of missing "action statements" is to print the line as-is (it's equivalent to "{ print $0 }"). The pattern tests whether the current line number is less than 11. If it is, Awk prints the line. As soon as the line number is 11 or more, the pattern evaluates to false and Awk skips the line.

A much better way to do the same is to quit after seeing the first 10 lines (otherwise we are looping over lines > 10 and doing nothing):

awk '1; NR == 10 { exit }'

The "NR == 10 { exit }" part guarantees that as soon as line number 10 is reached, Awk quits. On every line up to and including the 10th, Awk first evaluates "1", which is always true. And as we just learned, a true pattern without the "action statements" part is equivalent to "{ print $0 }", so the first ten lines get printed before Awk exits.
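A quick way to convince yourself that both variants print the same thing is to run them on generated input. This is a minimal sketch, assuming a POSIX shell and the "seq" utility for producing numbered lines; the variable names are made up for the illustration.

```shell
# 20 numbered lines as sample input; both variants should keep lines 1-10.
slow=$(seq 1 20 | awk 'NR < 11')                # scans all 20 lines
fast=$(seq 1 20 | awk '1; NR == 10 { exit }')   # quits right after line 10

printf '%s\n' "$fast"
if [ "$slow" = "$fast" ]; then echo "same output"; fi
```

On 20 lines the difference is invisible, but on a large file the second form stops reading after line 10.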

46. Print the first line of a file (emulates "head -1").

awk 'NR > 1 { exit }; 1'

This one-liner is very similar to the previous one. The pattern "NR > 1" is true only for line numbers greater than one, so its action does not get executed on the first line. On the first line only the "1", the always-true pattern, applies. It makes Awk print the line and read the next line. Now the "NR" variable is 2, and "NR > 1" is true. At this moment "{ exit }" gets executed and Awk quits. That's it. Awk printed just the first line of the file.

47. Print the last 2 lines of a file (emulates "tail -2").

awk '{ y=x "\n" $0; x=$0 }; END { print y }'

Okay, so what does this one do? First of all, notice that "{y=x "\n" $0; x=$0}" action statement group is missing the pattern. When the pattern is missing, Awk executes the statement group for all lines. For the first line, it sets variable "y" to "\nline1" (because x is not yet defined). For the second line it sets variable "y" to "line1\nline2". For the third line it sets variable "y" to "line2\nline3". As you can see, for line N it sets the variable "y" to "lineN-1\nlineN". Finally, when it reaches EOF, variable "y" contains the last two lines and they get printed via "print y" statement.

Think about this one-liner for a second and you'll conclude that it is very inefficient - it reads the whole file line by line just to print out the last two lines! Unfortunately there is no seek() statement in Awk, so you can't seek to the last two lines of the file (that's what tail does). It's recommended to use "tail -2" to print the last 2 lines of a file.
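To see the one-liner at work, here is a small sketch that feeds it four sample lines through printf (the input and the variable name are just assumptions for the demo). Note that on a one-line input the output starts with an empty line, because "x" is still empty when "y" is built.

```shell
# Four sample lines; the END block prints the last two that were remembered.
last_two=$(printf 'a\nb\nc\nd\n' | awk '{ y=x "\n" $0; x=$0 }; END { print y }')
printf '%s\n' "$last_two"
```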

48. Print the last line of a file (emulates "tail -1").

awk 'END { print }'

This one-liner may or may not work. It relies on an assumption that the "$0" variable that contains the entire line does not get reset after the input has been exhausted. The special "END" pattern gets executed after the input has been exhausted (or "exit" called). In this one-liner the "print" statement is supposed to print "$0" at EOF, which may or may not have been reset.

Whether it works depends on your awk's version and implementation. It works with GNU Awk, for example, but doesn't seem to work with nawk or xpg4/bin/awk.

The most compatible way to print the last line is:

awk '{ rec=$0 } END{ print rec }'

Just like the previous one-liner, it's computationally expensive to print the last line of the file this way, and "tail -1" should be the preferred way.

49. Print only the lines that match a regular expression "/regex/" (emulates "grep").

awk '/regex/'

This one-liner uses a regular expression "/regex/" as a pattern. If the current line matches the regex, it evaluates to true, and Awk prints the line (remember that missing action statement is equal to "{ print }" that prints the whole line).

50. Print only the lines that do not match a regular expression "/regex/" (emulates "grep -v").

awk '!/regex/'

Pattern matching expressions can be negated by putting "!" in front of them. If they were to evaluate to true, the "!" in front makes them evaluate to false, and the other way around. This one-liner inverts the regex match of the previous (#49) one-liner and prints all the lines that do not match the regular expression "/regex/".

51. Print the line immediately before a line that matches "/regex/" (but not the line that matches itself).

awk '/regex/ { print x }; { x=$0 }'

This one-liner always saves the current line in the variable "x". When it reads in the next line, the previous line is still available in the "x" variable. If the new line matches "/regex/", it prints out the variable x, and as a result, the previous line gets printed.

It does not work if the first line of the file matches "/regex/", because there is no previous line and an empty line gets printed. In that case we might want to print "match on line 1" instead, for example:

awk '/regex/ { print (x=="" ? "match on line 1" : x) }; { x=$0 }'

This one-liner tests if variable "x" contains something. On the very first line x is still empty, so "match on line 1" gets printed. Otherwise variable "x" gets printed (which, as we found out, contains the previous line). Be aware that x is also empty whenever the previous line was blank, so the message can show up in that case too. Notice that this one-liner uses the ternary operator "foo ? bar : baz", which is short for "if foo, then bar, else baz".
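Both variants can be tried on a three-line sample. This is only a sketch, assuming a POSIX shell; the sample words and variable names are invented for the demo.

```shell
input=$(printf 'alpha\nbeta\ngamma\n')

# A match on line 3 prints line 2.
before=$(printf '%s\n' "$input" | awk '/gamma/ { print x }; { x=$0 }')

# A match on line 1 has no previous line, so the guarded variant prints a message.
first=$(printf '%s\n' "$input" | awk '/alpha/ { print (x=="" ? "match on line 1" : x) }; { x=$0 }')

echo "$before"
echo "$first"
```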

52. Print the line immediately after a line that matches "/regex/" (but not the line that matches itself).

awk '/regex/ { getline; print }'

This one-liner calls the "getline" function on all the lines that match "/regex/". This function sets $0 to the next line (and also updates NF, NR, FNR variables). The "print" statement then prints this next line. As a result, only the line after a line matching "/regex/" gets printed.

If it is the last line that matches "/regex/", there is no next line to read: "getline" returns 0 (end of input) and leaves $0 untouched. In this case the last line itself gets printed.
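The sketch below shows the normal case and the last-line case side by side. The third command is a hypothetical guarded variant (not one of Eric's one-liners) that prints only when "getline" really read a line; the sample input and variable names are assumptions.

```shell
input=$(printf 'start\nvalue\nend\n')

# Normal case: the line after the match is printed.
after=$(printf '%s\n' "$input" | awk '/start/ { getline; print }')

# Match on the last line: getline finds nothing, so the match itself is printed.
at_eof=$(printf '%s\n' "$input" | awk '/end/ { getline; print }')

# Hypothetical guarded variant: print only if getline really read a line.
guarded=$(printf '%s\n' "$input" | awk '/end/ { if ((getline) > 0) print }')

printf 'after=%s at_eof=%s guarded=%s\n' "$after" "$at_eof" "$guarded"
```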

53. Print lines that match any of "AAA" or "BBB", or "CCC".

awk '/AAA|BBB|CCC/'

This one-liner uses a feature of extended regular expressions: the "|" alternation meta-character. This meta-character separates "AAA" from "BBB", and from "CCC", and tries to match them separately on each line. Only the lines that contain one (or more) of them get matched and printed.

54. Print lines that contain "AAA" and "BBB", and "CCC" in this order.

awk '/AAA.*BBB.*CCC/'

This one-liner uses a regular expression "AAA.*BBB.*CCC" to print lines. This regular expression says, "match lines containing AAA followed by any text, followed by BBB, followed by any text, followed by CCC in this order!" If a line matches, it gets printed.
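The difference between "any of them" (#53) and "all of them in order" (#54) is easy to see on a few sample lines. A minimal sketch, with made-up input and variable names:

```shell
input=$(printf 'AAA only\nxx BBB xx\nAAA then BBB then CCC\nCCC BBB AAA\nnothing here\n')

any=$(printf '%s\n' "$input" | awk '/AAA|BBB|CCC/')          # at least one of the three
ordered=$(printf '%s\n' "$input" | awk '/AAA.*BBB.*CCC/')    # all three, in this order

printf '%s\n' "$any" | wc -l     # four of the five lines match the alternation
echo "$ordered"
```

The line "CCC BBB AAA" is matched by the alternation but not by the ordered regex.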

55. Print only the lines that are 65 characters in length or longer.

awk 'length > 64'

This one-liner uses the "length" function. This function is defined as "length([str])" - it returns the length of the string "str". If none is given, it returns the length of the string in variable $0. For historical reasons, the parentheses () at the end of "length" can be omitted. This one-liner tests if the current line is longer than 64 chars. If it is, "length > 64" evaluates to true and the line gets printed.

56. Print only the lines that are less than 64 characters in length.

awk 'length < 64'

This one-liner is almost byte-for-byte identical to the previous one. Here it tests if the length of the line is less than 64 characters. If it is, Awk prints it out. Otherwise nothing gets printed.
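The boundaries of #55 and #56 can be checked with lines of known length. This sketch builds lines of 63, 64 and 65 characters with a small awk BEGIN block (an assumption of the demo, as are the variable names) and reports which lengths each one-liner lets through. A line of exactly 64 characters is printed by neither of them.

```shell
# Build three lines of 63, 64 and 65 "x" characters.
lines=$(awk 'BEGIN { for (n = 63; n <= 65; n++) { s = ""; for (i = 0; i < n; i++) s = s "x"; print s } }')

# Report the lengths each one-liner lets through.
long=$(printf '%s\n' "$lines" | awk 'length > 64' | awk '{ print length($0) }')
short=$(printf '%s\n' "$lines" | awk 'length < 64' | awk '{ print length($0) }')

echo "length > 64 keeps: $long"
echo "length < 64 keeps: $short"
```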

57. Print a section of file from regular expression to end of file.

awk '/regex/,0'

This one-liner uses a pattern match in form 'pattern1, pattern2' that is called "range pattern". The 3rd Awk Tip from article "10 Awk Tips, Tricks and Pitfalls" explains this match very carefully. It matches all the lines starting with a line that matches "pattern1" and continuing until a line matches "pattern2" (inclusive). In this one-liner "pattern1" is a regular expression "/regex/" and "pattern2" is just 0 (false). So this one-liner prints all lines starting from a line that matches "/regex/" continuing to end-of-file (because 0 is always false, and "pattern2" never matches).

58. Print lines 8 to 12 (inclusive).

awk 'NR==8,NR==12'

This one-liner also uses a range pattern in format "pattern1, pattern2". The "pattern1" here is "NR==8" and "pattern2" is "NR==12". The first pattern means "the current line is 8th" and the second pattern means "the current line is 12th". This one-liner prints lines 8 through 12, including both endpoints.
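Both range patterns (#57 and #58) can be tried on numbered lines. A small sketch, assuming "seq" is available; "/4/" stands in for "/regex/" and the variable names are invented.

```shell
from_match=$(seq 1 6 | awk '/4/,0')          # from the first line matching /4/ to the end
by_number=$(seq 1 15 | awk 'NR==8,NR==12')   # lines 8 through 12

echo $from_match
echo $by_number
```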

59. Print line number 52.

awk 'NR==52'

This one-liner tests to see if current line is number 52. If it is, "NR==52" evaluates to true and the line gets implicitly printed out (patterns without statements print the line unmodified).

The more efficient way, though, is to quit after line 52:

awk 'NR==52 { print; exit }'

This one-liner forces Awk to quit after line number 52 is printed. It is the better way to print line 52 because once it is printed there is nothing else to be done, so why loop over the rest of the file doing nothing?
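To make the early exit visible, the sketch below adds an END block to both versions that reports how many lines Awk read. The END block and the variable names are additions for illustration only; they are not part of the original one-liners.

```shell
# The END block (added here only for illustration) reports how many lines were read.
scan_all=$(seq 1 100 | awk 'NR==52; END { print "lines read: " NR }')
quit_early=$(seq 1 100 | awk 'NR==52 { print; exit } END { print "lines read: " NR }')

printf '%s\n' "$scan_all"
printf '%s\n' "$quit_early"
```

Both print line 52, but the first reads all 100 lines while the second stops at 52.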

60. Print section of a file between two regular expressions (inclusive).

awk '/Iowa/,/Montana/'

I explained what a range pattern such as "pattern1,pattern2" does in general in one-liner #57. In this one-liner "pattern1" is "/Iowa/" and "pattern2" is "/Montana/". Both of these patterns are regular expressions. This one-liner prints all the lines starting with a line that matches "Iowa" and ending with a line that matches "Montana" (inclusive).

5. Selective Deletion of Certain Lines

There is just one one-liner in this section.

61. Delete all blank lines from a file.

awk NF

This one-liner uses the special NF variable that contains the number of fields on the line. For empty lines (and, with the default field separator, for lines that contain only spaces or tabs) NF is 0, which evaluates to false, and lines for which the pattern is false do not get printed.

Another way to do the same is:

awk '/./'

This one-liner uses a regular-expression match "." that matches any character. Empty lines do not have any characters, so the pattern does not match and they are not printed. Unlike "awk NF", it keeps lines that contain only whitespace.
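The two one-liners treat whitespace-only lines differently, which a small test input shows. This is a sketch with made-up sample data: two text lines, one empty line and one line holding three spaces.

```shell
# Sample input: two text lines, one empty line, one line holding three spaces.
input=$(printf 'one\n\n   \ntwo\n')

by_nf=$(printf '%s\n' "$input" | awk NF)        # drops empty and whitespace-only lines
by_dot=$(printf '%s\n' "$input" | awk '/./')    # drops only truly empty lines

printf '%s\n' "$by_nf" | wc -l
printf '%s\n' "$by_dot" | wc -l
```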

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

Have Fun!

This concludes the article series about Awk one-liners. I hope that you enjoyed this three-part article and it made you a better Awk programmer!

My future plans are to create an awk1line-explained.txt that will be a supplementary file to the famous awk1line.txt. I am also thinking about publishing a nicely formatted pdf e-book about all the one-liners.

If you liked this article, you may also like a very similar article on Famous Sed One-Liners Explained.

And finally, if you notice anything that you can't understand, please let me know in the comments. Thank you!

Merry Christmas everyone!

If you're celebrating Christmas at your Unix console, wouldn't it be fun to have a Christmas tree in your shell? It sure would! Follow these steps to have your own Christmas tree in the shell.

Step 1: Install Acme::POE::Tree Perl Module.

Type the following at your shell to install the Acme::POE::Tree Perl module:

perl -MCPAN -e 'install Acme::POE::Tree'

If you get notified that you are missing dependencies, answer 'yes' to have them installed.

Step 2: Celebrate Christmas at Console.

Type this to have your Christmas tree up and running:

perl -MAcme::POE::Tree -e 'Acme::POE::Tree->new()->run()'

The result:

[Image: animated Acme::POE::Tree Perl Christmas tree]

Merry Christmas!

Ps. I created a geeky wish list at Amazon.com. I'd appreciate a Christmas or New Year present. The list is here: Peter's Amazon.com Wish List. Thank you! :)

This article is part of the article series "Sed One-Liners Explained."

This is the second part of a three-part article on the famous sed one-liners. This part explains the sed one-liners for selective printing of certain lines. See part one for an introduction to the series.

Just like the famous Awk one-liners, sed one-liners are beautiful, tiny little sed programs that span no more than 1 terminal line. They were written by Eric Pement and are floating around on the Internet as 'sed1line.txt' file.

If you are intrigued by this article series, I suggest that you subscribe to my posts!

Eric's sed one-liners are divided into several sections:

Update: Spanish translation of part two is available!

I have also made a sed cheat sheet that summarizes the whole sed utility. I suggest that you print it before you proceed and keep it in front of you. It will help you memorize the commands faster.

Grab the sed1line.txt file and let's start.

4. Selective Printing of Certain Lines.

44. Print the first 10 lines of a file (emulates "head -10").

sed 10q

This one-liner restricts the "q" (quit) command to line "10". It means that this command gets executed only when sed reads the 10th line. For all the other lines there is no command specified. When there is no command specified, the default action is to print the line as-is. This one-liner prints lines 1-9 unmodified and at 10th line quits. Notice something strange? It was supposed to print first 10 lines of a file, but it seems that it just printed only the first 9... Worry not! The quit command is sneaky in its nature. Upon quitting with "q" command, sed actually prints the contents of pattern space and only then quits. As a result lines 1-10 get printed!

Please see the first part of the article for explanation of "pattern space".

45. Print the first line of a file (emulates "head -1").

sed q

The explanation of this one-liner is almost the same as of the previous. Sed quits and prints the first line.

A more detailed explanation - after the first line has been placed in the pattern space, sed executes the "q" command. This command forces sed to quit; but due to strange nature of the "q" command, sed also prints the contents of pattern space. As a result, only the first line gets printed.
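The print-then-quit behavior of "q" is easy to confirm on numbered input. A minimal sketch, assuming "seq" is available; the variable names are made up.

```shell
first_ten=$(seq 1 20 | sed 10q)   # q prints line 10 before quitting
first_one=$(seq 1 20 | sed q)     # quits (and prints) on the very first line

echo $first_ten
echo "$first_one"
```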

46. Print the last 10 lines of a file (emulates "tail -10").

sed -e :a -e '$q;N;11,$D;ba'

This one-liner is tricky to explain. It always keeps the last 10 lines in pattern space and at the very last line of input it quits and prints them.

I'll try to explain it. The first "-e :a" creates a label called "a". The second "-e" does the following: "$q" - if it is the last line, quit and print the pattern space. If it is not the last line, execute three commands "N", "11,$D" and "ba". The "N" command reads the next line of input and appends it to the pattern space. The line gets separated from the rest of the pattern space by a newline character. The "11,$D" command executes the "D" command if the current line number is greater than or equal to 11 ("11,$" means from the 11th line to the end of file). The "D" command deletes the portion of pattern space up to and including the first newline character and restarts the script without reading new input. The last command "ba" branches to the label named "a" (the beginning of the script); it is reached only while fewer than 11 lines have been read. This guarantees that the pattern space never contains more than 10 lines, because as line 11 gets appended to pattern space, line 1 gets deleted, as line 12 gets appended line 2 gets deleted, etc.
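Running the one-liner on 25 numbered lines shows the sliding window in action. This is just a sketch with generated input, assuming "seq" is available.

```shell
# 25 numbered lines in, the last 10 (16 through 25) out.
last_ten=$(seq 1 25 | sed -e :a -e '$q;N;11,$D;ba')
echo $last_ten
```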

47. Print the last 2 lines of a file (emulates "tail -2").

sed '$!N;$!D'

This one-liner is also tricky. First of all, the "$!" address restricts commands "N" and "D" to all the lines except the last line.

Notice how the addresses can be negated. If "$<command>" restricts a command to the last line, then "$!<command>" restricts the command to all but the last line. This can be applied to all restriction operations.

In this one-liner the "N" command reads the next line from input and appends it to pattern space. The "D" command deletes everything in pattern space up to and including the first "\n" symbol. Together these two commands keep only the most recently read line in pattern space. When processing the second-to-last line, "N" gets executed and appends the last line to the pattern space. The "D" does not get executed, because after "N" the current line is the last one and "$!" excludes it. At this moment sed reaches the end of the script and prints out the last two lines of the file.
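A short run on five numbered lines, as a sketch (the input is generated with "seq" and the variable name is invented):

```shell
# Lines 1-5 in, lines 4 and 5 out.
last_two=$(seq 1 5 | sed '$!N;$!D')
echo $last_two
```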

48. Print the last line of a file (emulates "tail -1").

sed '$!d'

This one-liner discards all the lines except the last one. The "d" command deletes the current pattern space, reads in the next line, and restarts the execution of commands from the first. In this case it just loops over itself like "dddd...ddd" until it hits the last line. At the last line no command is executed ("$!d" restricted execution of "d" to all the lines but last) and the pattern space gets printed.

Another way to do the same:

sed -n '$p'

The "-n" parameter suppresses automatic printing of pattern space. It means that without an explicit "p" command (or other commands that act directly on the output stream), sed is dead silent. The "p" command stands for "print" and it prints the pattern space. This one-liner calls the "p" command at the very last line of input. All the other lines are silently discarded.

49. Print next-to-the-last line of a file.

Eric gives three different one-liners to do this. The first one prints a blank line if the file contains just 1 line:

sed -e '$!{h;d;}' -e x

This one-liner executes the "h;d" commands for all the lines except the last one ("$!" restricts "h;d" commands to all lines except last). The "h" command puts the current line in hold buffer and "d" deletes the current line, and starts execution at the first sed command ("h;d" gets executed again, and again, ...). At every single line, that line gets copied to hold buffer. At the very last line "h;d" does not get executed. At this moment "x" gets a chance to execute. The "x" command exchanges the contents of hold buffer with pattern space. Remember that the previous line is still in the hold buffer. The "x" command puts it back in pattern space, and sed prints it! There you go, the next-to-last line was printed!

In case there is just 1 line in the file, only the "x" command gets executed. As the hold buffer initially is empty, "x" puts emptiness in pattern space (I use word "put" here but it actually exchanges the pattern space with hold space). Now sed prints the contents of pattern space, but it's empty, so sed prints out just a blank line.

The second prints the first line if the file contains just 1 line:

sed -e '1{$q;}' -e '$!{h;d;}' -e x

This sed-one liner is divided in two parts. The first part "1{$q;}" handles the case when the file contains just a single line. The second part "$!{h;d;} x" is exactly the same as in the previous one-liner! Thus, I need to explain just the first part.

The first part says - if it is the first line "1", then execute "$q". The "$q" command means - if it is the last line, then quit. What it effectively does is it quits if the first line is the last line (i.e. file contains just one line). Remember from one-liner #44 that before quitting sed prints the contents of pattern space. As a result, if the file contains just one line, sed prints it.

The third prints nothing for 1 line files:

sed -e '1{$d;}' -e '$!{h;d;}' -e x

This one-liner is again divided in two parts. The first part is "1{$d;}" and the second is exactly the same as in the previous two one-liners. I will explain just the first part.

The first part says - if it is the first line "1", then execute "$d". The "$d" command means - if it is the last line, then delete the pattern space and start all over again. In case the first line is the last (only one line in file), there is nothing more to be done and sed quits, printing nothing.
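The three variants only differ on one-line files, which the following sketch demonstrates. The sample inputs and variable names are assumptions; "wc -l" is used to tell a blank output line apart from no output at all.

```shell
three=$(printf 'a\nb\nc\n')

# On a three-line input all three variants print the next-to-last line, "b".
v1=$(printf '%s\n' "$three" | sed -e '$!{h;d;}' -e x)
v2=$(printf '%s\n' "$three" | sed -e '1{$q;}' -e '$!{h;d;}' -e x)
v3=$(printf '%s\n' "$three" | sed -e '1{$d;}' -e '$!{h;d;}' -e x)
echo "$v1 $v2 $v3"

# On a one-line input they differ: count the output lines and show variant 2's text.
one_v1=$(printf 'only\n' | sed -e '$!{h;d;}' -e x | wc -l | tr -d ' ')
one_v2=$(printf 'only\n' | sed -e '1{$q;}' -e '$!{h;d;}' -e x)
one_v3=$(printf 'only\n' | sed -e '1{$d;}' -e '$!{h;d;}' -e x | wc -l | tr -d ' ')
echo "variant 1 lines: $one_v1, variant 2 text: $one_v2, variant 3 lines: $one_v3"
```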

50. Print only the lines that match a regular expression (emulates "grep").

sed -n '/regexp/p'

This one-liner suppresses automatic printing of pattern space with the "-n" switch and makes use of "p" command to print only the lines that match "/regexp/". The lines that do not match this regex get silently discarded. The ones that match get printed. That's it.

Another one-liner that does the same:

sed '/regexp/!d'

This one-liner deletes all the lines that do not match "/regexp/". The other lines get printed by default. The "!" before "d" command inverts the line matching.

51. Print only the lines that do not match a regular expression (emulates "grep -v").

sed -n '/regexp/!p'

This one-liner is the inverse of the previous.

The "-n" prevents automatic printing of pattern space. The "!" after the "/regexp/" address negates it, so "p" acts only on the lines that do not match "/regexp/", and they get "p"rinted.

sed '/regexp/d'

This one-liner is the inverse of the second one-liner in #50. It executes the "d" (delete) command on all lines that match "/regexp/", thus leaving only the lines that do not match. They get printed automatically.
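The four one-liners from #50 and #51 can be compared on the same input. A minimal sketch; "/an/" stands in for "/regexp/" and the sample words are made up.

```shell
input=$(printf 'apple\nbanana\ncherry\n')

match_p=$(printf '%s\n' "$input" | sed -n '/an/p')     # like grep
match_d=$(printf '%s\n' "$input" | sed '/an/!d')       # like grep
other_p=$(printf '%s\n' "$input" | sed -n '/an/!p')    # like grep -v
other_d=$(printf '%s\n' "$input" | sed '/an/d')        # like grep -v

echo "$match_p / $match_d"
echo $other_p / $other_d
```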

52. Print the line immediately before regexp, but not the line containing the regexp.

sed -n '/regexp/{g;1!p;};h'

This one-liner saves each line in hold buffer with "h" command. If a line matches the regexp, the hold buffer (containing the previous line) gets copied to pattern space with "g" command and the pattern space gets printed out with "p" command. The "1!" restricts "p" not to print on the first line (as there are no lines before the first).

53. Print the line immediately after regexp, but not the line containing the regexp.

sed -n '/regexp/{n;p;}'

First of all, this one-liner disables automatic printing of pattern space with the "-n" command line argument. Then, for all the lines that match "/regexp/", this one-liner executes the "n" and "p" commands. The behavior of the "n" command depends on the "-n" flag. If "-n" is specified, it just replaces the current pattern space with the next line of input. If "-n" is not specified, it prints out the current pattern space before replacing it. As "-n" is specified in this one-liner, the "n" command discards the matching line, reads in the next line, and then the "p" command prints that line out.
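One-liners #52 and #53 can be tried on the same three-line sample. This is a sketch with invented sample words; in both cases the expected neighbor is the middle line.

```shell
input=$(printf 'alpha\nbeta\ngamma\n')

before=$(printf '%s\n' "$input" | sed -n '/gamma/{g;1!p;};h')   # line before the match
after=$(printf '%s\n' "$input" | sed -n '/alpha/{n;p;}')        # line after the match

echo "before gamma: $before"
echo "after alpha: $after"
```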

54. Print one line before and after regexp. Also print the line matching regexp and its line number. (emulates "grep -A1 -B1").

sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

First let's look at "h" command at the end of script. It gets executed on every line and stores the current line in pattern space in hold buffer. The idea of storing the current line in hold buffer is that if the next line matches "/regexp/" then the previous line is available in hold buffer.

Now let's look at the complicated "/regexp/{=;x;1!p;g;$!N;p;D;}" command. It gets executed only if the line matches "/regexp/". The first thing it does is it prints the current line number with "=" command. Then, it exchanges the hold buffer with pattern space by using the "x" command. As I explained, the "h" command at the end of the script makes sure that the hold buffer always contains the previous line. Now we have put it in the pattern space with "x" command. Next, if it's not the first line, "1!p" prints the pattern space, effectively printing the previous line. Now the "g" command gets executed. It copies the original line that was just exchanged with hold buffer back to pattern space. Now the "$!N" executes. If it is not the last line, "N" appends the next line to the current pattern space (and separates them with "\n" char). Pattern space now contains the line that matched "/regexp/" and the next line. The "p" command prints that. "D" deletes the current line (line that matched "/regexp/") from pattern space and finally "h" gets executed again, that puts the contents of pattern space into hold buffer. As "D" deleted the current line, the next line was put in hold buffer.
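Tracing the script by hand is easier with a concrete run. In this sketch "/three/" stands in for "/regexp/" and the four-line input is made up; the output is the line number, the line before the match, the match, and the line after it.

```shell
# Match on line 3 of 4: prints "3", then "two", "three" and "four".
context=$(printf 'one\ntwo\nthree\nfour\n' | sed -n -e '/three/{=;x;1!p;g;$!N;p;D;}' -e h)
printf '%s\n' "$context"
```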

55. Grep for "AAA" and "BBB" and "CCC" in any order.

sed '/AAA/!d; /BBB/!d; /CCC/!d'

This one-liner inverts the "d" command to be executed on lines that do not contain either "AAA", "BBB" or "CCC". If a line does not contain one of them, it gets deleted and sed proceeds to the next line. Only if all three of the patterns are present, does the sed print the line.

56. Grep for "AAA" and "BBB" and "CCC" in that order.

sed '/AAA.*BBB.*CCC/!d'

This one-liner deletes lines that do not match regexp "/AAA.*BBB.*CCC/". For example, a line "AAAfooBBBbarCCC" will get printed but "AAAfooCCCbarBBB" will not.

It can also be written as:

sed -n '/AAA.*BBB.*CCC/p'

This one-liner prints lines that contain AAA...BBB...CCC in that order.
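The "any order" (#55) and "in that order" (#56) one-liners give different results as soon as the patterns appear reversed. A small sketch with made-up sample lines:

```shell
input=$(printf 'AAA BBB CCC\nCCC BBB AAA\nAAA BBB\n')

any_order=$(printf '%s\n' "$input" | sed '/AAA/!d; /BBB/!d; /CCC/!d')
in_order=$(printf '%s\n' "$input" | sed '/AAA.*BBB.*CCC/!d')

printf '%s\n' "$any_order"
echo "in order: $in_order"
```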

57. Grep for "AAA" or "BBB", or "CCC".

sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

This one-liner uses the "b" command to branch to the end of the script if the line matches "AAA" or "BBB" or "CCC". At the end of the script the line gets implicitly printed. If the line does not match "AAA" or "BBB" or "CCC", the script reaches the "d" command that deletes the line.

gsed '/AAA\|BBB\|CCC/!d'

This one-liner works with GNU sed. GNU sed allows alternation operator | to be used to match separate things. It's a more compact way of saying match "AAA" or "BBB", or "CCC".

If you are using GNU sed, then there is actually no need to escape the pipes |. You may specify the "-r" command line option to use extended regular expressions. This way this one liner becomes:

gsed -r '/AAA|BBB|CCC/!d'

or

gsed -rn '/AAA|BBB|CCC/p'

58. Print a paragraph that contains "AAA". (Paragraphs are separated by blank lines).

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

First notice that this one-liner is divided in two parts for clarity. The first part is "/./{H;$!d;}" and the second part is "x;/AAA/!d".

The first part has an interesting pattern match "/./". What do you think it does? Well, a line separating paragraphs would be a blank line, meaning it would not have any characters in it. This pattern matches only the lines that are not separating paragraphs. These lines get appended to hold buffer with "H" command. They also get prevented from printing with "d" command (except for the last line, when "d" does not get executed ("$!" restricts "d" to all but the last line)). Once sed sees a blank line, the "/./" pattern no longer matches and the second part of one-liner gets executed.

The second part exchanges the hold buffer with pattern space by using the "x" command. The pattern space now contains the whole paragraph of text. Next sed tests if the paragraph contains "AAA". If it does, sed does nothing which results in printing the paragraph. If the paragraph does not contain "AAA", sed executes the "d" command that deletes it without printing and restarts execution at first command.
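Here is a sketch of the one-liner run against three short paragraphs (the sample text and variable name are assumptions). Because "H" always puts a newline in front of what it appends, every printed paragraph starts with an empty line.

```shell
# Three paragraphs separated by blank lines; only the first and third contain AAA.
paras=$(printf 'p1 line1\np1 AAA\n\np2 line1\np2 line2\n\np3 AAA\n' |
        sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;')
printf '%s\n' "$paras"
```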

59. Print a paragraph if it contains "AAA" and "BBB" and "CCC" in any order.

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

This one-liner is also split in two parts for clarity. The first part is exactly the same as the first part of previous one-liner. The second part is very similar to one-liner #55 and also the previous.

The "x" command in the 2nd part does exactly the same as in previous one-liner, it exchanges the hold buffer, that contains the paragraph with pattern space. Next sed does three tests - it tests if the paragraph contains "AAA", "BBB" and "CCC". If the paragraph does not contain even one of them, the "d" command gets executed that purges the paragraph. If it contains all three patterns, sed happily prints the paragraph.

60. Print a paragraph if it contains "AAA" or "BBB" or "CCC".

sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

The first part is exactly the same as in previous two one-liners and does not require explanation. The second part that happens to be "-e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d" is almost exactly the same as in one-liner #57.

The "x" command exchanges the paragraph stored in hold buffer with the pattern space. Then it tests if the pattern space (paragraph) contains "AAA", if it does, sed branches to end of script with "b" command, that happily makes sed print the paragraph. If "AAA" did not match, sed does exactly the same testing for pattern "BBB". If it again did not match, it tests for "CCC". If none of these patterns were found, sed executes the "d" command that deletes everything and restarts this one-liner.

Here is another way to do the same with GNU sed:

gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d'

This one-liner is exactly the same as previous one. It just compresses the three tests for "AAA", "BBB" or "CCC" into one "/AAA\|BBB\|CCC/" as explained in one-liner #57.

61. Print only the lines that are 65 characters in length or more.

sed -n '/^.\{65\}/p'

This one-liner prints lines that are 65 characters in length or more. It does it by using a regular expression "^.\{65\}" that matches any 65 characters at the beginning of the line. If there are fewer than 65 characters, the regex does not match and the line does not get printed (as automatic printing was disabled with the "-n" command line option).

62. Print only the lines that are less than 65 chars.

sed -n '/^.\{65\}/!p'

This one-liner inverts the previous one. If the line has at least 65 characters, the regex matches and "!p" does not print it. If it does not match, the line gets printed.

Another way to do the same:

sed '/^.\{65\}/d'

This one-liner deletes all lines that match 65 characters. All others implicitly get printed.
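The 65-character boundary of #61 and #62 can be checked with two lines of known length. In this sketch the sample lines are built with a small awk BEGIN block and measured with awk's length function; both are conveniences of the demo, not part of the sed one-liners.

```shell
# Two sample lines of 64 and 65 "x" characters, built with awk.
lines=$(awk 'BEGIN { for (n = 64; n <= 65; n++) { s = ""; for (i = 0; i < n; i++) s = s "x"; print s } }')

long=$(printf '%s\n' "$lines" | sed -n '/^.\{65\}/p' | awk '{ print length($0) }')
short=$(printf '%s\n' "$lines" | sed -n '/^.\{65\}/!p' | awk '{ print length($0) }')
also_short=$(printf '%s\n' "$lines" | sed '/^.\{65\}/d' | awk '{ print length($0) }')

echo "65 or more: $long, fewer than 65: $short and $also_short"
```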

63. Print section of a file from a regex to end of file.

sed -n '/regexp/,$p'

This one-liner uses a tricky range match "/regexp/,$". It matches lines starting from the first line that matches "/regexp/" to the end of file "$". The "p" command prints these lines. All other lines get silently discarded.

64. Print lines 8-12 (inclusive) of a file.

sed -n '8,12p'

This is another type of range match. This range matches a section of lines between two line numbers (inclusive). In this case it's lines [8 to 12].

sed '8,12!d'

This is the same one-liner, just written differently. It deletes lines that are outside of range [8, 12] and prints those in this range.
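The range forms from #63 and #64 can be tried on numbered lines. A sketch assuming "seq" is available; "/4/" stands in for "/regexp/" and the variable names are invented.

```shell
printed=$(seq 1 15 | sed -n '8,12p')
kept=$(seq 1 15 | sed '8,12!d')
to_end=$(seq 1 6 | sed -n '/4/,$p')   # the regex-to-end-of-file range from #63

echo $printed
echo $kept
echo $to_end
```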

65. Print line number 52.

sed -n '52p'

This one-liner restricts the "p" command to line "52". Only this line gets "p"rinted.

sed '52!d'

This one-liner deletes all lines except line 52. Line 52 gets printed.

sed '52q;d'

This one is the smartest. It quits at line 52 with "q" command. The previous two one-liners would loop over all the remaining lines and do nothing. Remember from one-liner #44 that quit command prints the pattern space with it. The "d" command makes sure that no other line gets printed while sed gets to line 52.

66. Beginning at line 3, print every 7th line.

gsed -n '3~7p'

This one-liner uses an address extension of GNU sed. An address of the form "first~step" matches every step'th line starting from line first. In this one-liner it's "3~7", meaning match every 7th line starting from the 3rd. The "-n" flag prevents printing any other lines, and "p" in "3~7p" prints the matched line.

For everyone else, this one-liner works:

sed -n '3,${p;n;n;n;n;n;n;}'

This one-liner executes commands "p;n;n;n;n;n;n" for lines starting at the 3rd line. The "3,$" is a line range match that restricts commands by line numbers. The "$" means end of file and "3" means 3rd line.

The "p;n;n;n;n;n;n" command prints the current line and then reads six more lines with "n" without printing them (because of "-n"). The next cycle begins on the 7th line after the printed one, and the same commands run again. As it starts executing at line 3, the effect is: print line 3, skip 6, print line 10, skip 6, print line 17, .... That is, print every 7th line beginning at the 3rd.
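The following sketch checks the portable sed version against a 20-line numbered input. The awk command next to it is my own cross-check and not part of the original one-liner collection.

```shell
# Hypothetical input: the numbers 1 to 20, one per line.
input=$(awk 'BEGIN { for (i = 1; i <= 20; i++) print i }')

# Portable sed version: print a line, then swallow the next six.
sed_out=$(printf '%s\n' "$input" | sed -n '3,${p;n;n;n;n;n;n;}')

# Cross-check in awk: lines 3, 10, 17, ... satisfy (NR - 3) % 7 == 0.
awk_out=$(printf '%s\n' "$input" | awk 'NR >= 3 && (NR - 3) % 7 == 0')

printf '%s\n' "$sed_out"
```

Both commands select lines 3, 10 and 17 from this input.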

67. Print section of lines between two regular expressions (inclusive).

sed -n '/Iowa/,/Montana/p'

This one-liner prints all the lines between the first line that matches a regular expression "Iowa" and the first line that matches a regular expression "Montana".

It uses a range match "/start/,/finish/" that matches all lines starting from a line that matches "start" and ending with the first line that matches "finish".

An Important Comment About Ranges!

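I have an important comment about ranges. A range of the form "/start/,/finish/" begins at a line matching "/start/", and sed starts looking for "/finish/" only on the following line. So if "/finish/" matches on the same line as "/start/", the range does not end there; it continues until a later line matches "/finish/" or the input ends. Please see the Sed FAQ 3.3 for more details.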
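A small sketch of this behavior, using made-up input in which the first line contains both "start" and "finish":

```shell
# The first line matches both /start/ and /finish/.
out=$(printf '%s\n' 'start finish' 'middle' 'finish' 'after' |
      sed -n '/start/,/finish/p')

printf '%s\n' "$out"
```

The range does not close on the first line. It runs on through "middle" and ends at the third line, so three lines are printed and only "after" is left out.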

Sed One-Liners Explained E-Book

I have written an e-book called "Sed One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - an introduction to sed, a summary of sed addresses and ranges, and debugging sed scripts with sed-sed. Please take a look:

Have Fun!

Have fun with sed, the superman of Unix text stream editing!

If you liked this article, you may also like a very similar article on Famous Awk One-Liners Explained.

Ps. If you notice anything that you can't understand, please let me know in the comments. Thanks!

This article is part of the article series "Vim Plugins You Should Know About."

Here comes the second post in the article series "Vim Plugins You Should Know About". This time I am going to introduce you to a plugin called "repeat.vim".

Repeat.vim fixes an important functionality problem in the surround.vim plugin that I wrote about last week. The problem with surround.vim lies in the repeat command "." (dot). If you had applied a surrounding and wanted to repeat it with the "." command, it wouldn't work. This plugin fixes this problem.

So basically, whenever you install surround.vim, you also want to install repeat.vim with it.

There is one catch, though. It does not repeat visual or "ys" commands. Luckily, the "ys" commands can often be substituted with "cs" commands. For example, if you wanted to do several "ysw"" (wrap a word in quotes), you may type "csw"" and then use "." commands to repeat. The only way to repeat visual commands is to record a macro.

Here is an example usage of the repeat.vim script. Suppose you had typed a sentence and you wanted to wrap all the words in quotes:

|foo bar baz quux muux woox

(| is cursor)

Type csw":

|"foo" bar baz quux muux woox

Now press W and then . (dot):

"foo" |"bar" baz quux muux woox

W moved to the next word and . repeated the wrapping command.

Now do the same 4 more times, and you have the whole line wrapped:

"foo" "bar" "baz" "quux" "muux" "woox"

How to install repeat.vim?

  1. Download repeat.vim to ~/.vim/plugin (on Unix/Linux) or ~\vimfiles\plugin (on Windows).
  2. Restart Vim, or source repeat.vim with ":so ~/.vim/plugin/repeat.vim" (on Unix) or ":so ~/vimfiles/plugin/repeat.vim" (on Windows).

Have Fun!

Have fun with surround.vim + repeat.vim. I'll write about a much more life-changing plugin the next time. :)

This article is part of the article series "Awk One-Liners Explained."

This is the second part of a three-part article on the famous Awk one-liners. This part will explain Awk one-liners for text conversion and substitution. See part one for introduction of the series.

"What are these famous Awk one-liners?", you might wonder? Well, they are concise and beautiful Awk programs that span no more than 70 characters (less than one terminal line). They were written by Eric Pement and are floating around on the Internet as 'awk1line.txt' file.

If you are intrigued by this article series, I suggest that you subscribe to my posts!

Eric Pement's Awk one-liner collection consists of five sections:

I recommend that you print out my Awk Cheat Sheet before you proceed. This way you will have the language reference in front of you, and you will memorize it better.

Awesome news: I have written an e-book based on this article series. Check it out:

Grab the local copy of Awk one-liners file here awk1line.txt and let's roll.

3. Text Conversion and Substitution

21. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Unix.

awk '{ sub(/\r$/,""); print }'

This one-liner uses the sub(regex, repl, [string]) function. This function substitutes the first instance of regular expression "regex" in string "string" with the string "repl". If "string" is omitted, variable $0 is used. Variable $0, as I explained in the first part of the article, contains the entire line.

The one-liner replaces '\r' (CR) character at the end of the line with nothing, i.e., erases CR at the end. Print statement prints out the line and appends ORS variable, which is '\n' by default. Thus, a line ending with CRLF has been converted to a line ending with LF.

22. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Unix.

awk '{ sub(/$/,"\r"); print }'

This one-liner also uses the sub() function. This time it replaces the zero-width anchor '$' at the end of the line with a '\r' (CR char). This substitution actually adds a CR character to the end of the line. After doing that Awk prints out the line and appends the ORS, making the line terminate with CRLF.
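As a sanity check, here is a sketch that sends a small file through one-liner #22 and then back through one-liner #21, counting CR characters along the way. The temporary file and its contents are made up for the example.

```shell
# Hypothetical two-line Unix text file.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

# LF -> CRLF, then count the CR characters that were added.
cr_added=$(awk '{ sub(/$/, "\r"); print }' "$tmp" | tr -cd '\r' | wc -c)

# LF -> CRLF -> LF, then count the CR characters that are left.
cr_left=$(awk '{ sub(/$/, "\r"); print }' "$tmp" |
          awk '{ sub(/\r$/, ""); print }' | tr -cd '\r' | wc -c)

rm -f "$tmp"
echo "CRs after LF->CRLF: $cr_added, after converting back: $cr_left"
```

The first count equals the number of lines, and the second one is zero.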

23. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Windows/DOS.

awk 1

This one-liner may work, or it may not. It depends on the implementation. If the implementation catches the Unix newlines in the file, then it will read the file line by line correctly and output the lines terminated with CRLF. If it does not understand Unix LF's in the file, then it will print the whole file and terminate it with CRLF (single windows newline at the end of the whole file).

Ps. Statement '1' (or anything that evaluates to true) in Awk is syntactic sugar for '{ print }'.

24. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Windows/DOS

gawk -v BINMODE="w" '1'

Theoretically this one-liner should convert CRLFs to LFs on DOS. There is a note in GNU Awk documentation that says: "Under DOS, gawk (and many other text programs) silently translates end-of-line "\r\n" to "\n" on input and "\n" to "\r\n" on output. A special "BINMODE" variable allows control over these translations and is interpreted as follows: ... If "BINMODE" is "w", then binary mode is set on write (i.e., no translations on writes)."

My tests revealed that no translation was done, so you can't rely on this BINMODE hack.

Eric suggests using the "tr" utility instead to convert CRLFs to LFs on Windows:

tr -d \r

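The 'tr' program is used for translating one set of characters to another. Specifying the -d option makes it delete the listed characters instead of translating them. In this case it's the '\r' (CR) character that gets erased from the input. Thus, CRLFs become just LFs. Note that this command is written for the Windows command prompt; in a Unix shell the backslash has to be quoted (tr -d '\r'), otherwise the shell strips it and tr deletes the letter "r".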
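If you run tr from a Unix shell rather than from the Windows command prompt, quoting matters. The sketch below uses made-up input to show the quoted form and what happens when the quotes are left out.

```shell
# Quoted: tr receives the two characters \r and deletes carriage returns.
stripped=$(printf 'one\r\ntwo\r\n' | tr -d '\r')

# Unquoted: the shell removes the backslash, so tr deletes the letter "r".
mangled=$(printf 'router\n' | tr -d \r)

printf '%s\n%s\n' "$stripped" "$mangled"
```

The second command turns "router" into "oute", which is almost never what you want.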

25. Delete leading whitespace (spaces and tabs) from the beginning of each line (ltrim).

awk '{ sub(/^[ \t]+/, ""); print }'

This one-liner also uses sub() function. What it does is replace regular expression "^[ \t]+" with nothing "". The regular expression "^[ \t]+" means - match one or more space " " or a tab "\t" at the beginning "^" of the string.

26. Delete trailing whitespace (spaces and tabs) from the end of each line (rtrim).

awk '{ sub(/[ \t]+$/, ""); print }'

This one-liner is very similar to the previous one. It replaces regular expression "[ \t]+$" with nothing. The regular expression "[ \t]+$" means - match one or more space " " or a tab "\t" at the end "$" of the string. The "+" means "one or more".

27. Delete both leading and trailing whitespaces from each line (trim).

awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'

This one-liner uses a new function called "gsub". Gsub() does the same as sub(), except it performs as many substitutions as possible (that is, it's a global sub()). For example, given a variable f = "foo", sub("o", "x", f) would replace just one "o" in variable f with "x", making f be "fxo"; but gsub("o", "x", f) would replace both "o"s in "foo" resulting "fxx".

The one-liner combines both previous one-liners - it replaces leading whitespace "^[ \t]+" and trailing whitespace "[ \t]+$" with nothing, thus trimming the string.

To also squeeze the whitespace between fields you may use this one-liner:

awk '{ $1=$1; print }'

This is a pretty tricky one-liner. It seems to do nothing, right? Assign $1 to $1. But no, when you change a field, Awk rebuilds the $0 variable. It takes all the fields and concatenates them, separated by OFS (a single space by default). Every run of whitespace between fields collapses to one space, and leading and trailing whitespace is gone as well.
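The difference between trimming and squeezing is easiest to see on one messy line. This is a sketch with a made-up input line that has spaces and a tab scattered around three words.

```shell
# Trim only: leading and trailing whitespace goes, inner whitespace stays.
trimmed=$(printf '  foo   bar \t baz  \n' |
          awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }')

# Rebuild $0: fields are re-joined with a single space (the default OFS).
squeezed=$(printf '  foo   bar \t baz  \n' | awk '{ $1=$1; print }')

printf '[%s]\n[%s]\n' "$trimmed" "$squeezed"
```

The brackets in the output make the remaining whitespace visible.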

28. Insert 5 blank spaces at beginning of each line.

awk '{ sub(/^/, "     "); print }'

This one-liner substitutes the zero-length beginning of line anchor "^" with five spaces. As the anchor is zero-length and matches the beginning of line, the five space characters get prepended to the line.

29. Align all text flush right on a 79-column width.

awk '{ printf "%79s\n", $0 }' 

This one-liner asks printf() to print the string in $0 variable and left pad it with spaces until the total length is 79 chars.

Please see the documentation of printf function for more information and examples.

30. Center all text on a 79-character width.

awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }'

First this one-liner calculates the length() of the line and puts the result in variable "l". Length(var) function returns the string length of var. If the variable is not specified, it returns the length of the entire line (variable $0). Next it calculates how many white space characters to pad the line with and stores the result in variable "s". Finally it printf()s the line with appropriate number of whitespace chars.

For example, when printing the string "foo", it first calculates the length of "foo", which is 3. Next it calculates how many spaces should precede "foo": int((79-3)/2) = 38. Finally it calls printf("%41s\n", "foo"). Printf() right-aligns "foo" in a 41-character field, so it outputs 38 spaces and then "foo", making the string centered (38*2 + 3 = 79).
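The arithmetic for "foo" can be checked with a short sketch. It runs the one-liner on that single word and measures the result with shell parameter expansion; the variable names are mine.

```shell
# Center the word "foo" on a 79-character width.
centered=$(printf 'foo\n' |
           awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }')

# Strip the word itself to be left with just the padding.
pad=${centered%foo}

echo "leading spaces: ${#pad}, total width: ${#centered}"
```

It reports 38 leading spaces and a total width of 41 characters.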

31. Substitute (find and replace) "foo" with "bar" on each line.

awk '{ sub(/foo/,"bar"); print }'

This one-liner is very similar to the others we have seen before. It uses the sub() function to replace "foo" with "bar". Please note that it replaces just the first match. To replace all "foo"s with "bar"s use the gsub() function:

awk '{ gsub(/foo/,"bar"); print }'

Another way is to use the gensub() function:

gawk '{ $0 = gensub(/foo/,"bar",4); print }'

This one-liner replaces only the 4th match of "foo" with "bar". It uses a never before seen gensub() function. The prototype of this function is gensub(regex, s, h[, t]). It searches the string "t" for "regex" and replaces "h"-th match with "s". If "t" is not given, $0 is assumed. Unlike sub() and gsub() it returns the modified string "t" (sub and gsub modified the string in-place).

Gensub() is a non-standard function and requires GNU Awk or Awk included in NetBSD.

In this one-liner regex = "/foo/", s = "bar", h = 4, and t = $0. It replaces the 4th instance of "foo" with "bar" and assigns the new string back to the whole line $0.

32. Substitute "foo" with "bar" only on lines that contain "baz".

awk '/baz/ { gsub(/foo/, "bar") }; { print }'

As I explained in the first one-liner in the first part of the article, every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". Action statements are applied only to lines that match pattern.

In this one-liner the pattern is a regular expression /baz/. If line contains "baz", the action statement gsub(/foo/, "bar") is executed. And as we have learned, it substitutes all instances of "foo" with "bar". If you want to substitute just one, use the sub() function!

33. Substitute "foo" with "bar" only on lines that do not contain "baz".

awk '!/baz/ { gsub(/foo/, "bar") }; { print }'

This one-liner negates the pattern /baz/. It works exactly the same way as the previous one, except it operates on lines that do not match this pattern.

34. Change "scarlet" or "ruby" or "puce" to "red".

awk '{ gsub(/scarlet|ruby|puce/, "red"); print}'

This one-liner makes use of extended regular expression alternation operator | (pipe). The regular expression /scarlet|ruby|puce/ says: match "scarlet" or "ruby" or "puce". If the line matches, gsub() replaces all the matches with "red".

35. Reverse order of lines (emulate "tac").

awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }'

This is the trickiest one-liner today. It starts by recording all the lines in the array "a". For example, if the input to this program was three lines "foo", "bar", and "baz", then the array "a" would contain the following values: a[0] = "foo", a[1] = "bar", and a[2] = "baz".

When the program has finished processing all lines, Awk executes the END { } block. The END block loops over the elements in the array "a" and prints the recorded lines. In our example with "foo", "bar", "baz" the END block does the following:

for (j = 2; j >= 0; ) print a[j--]

First it prints out a[2], then a[1] and then a[0]. The output is three separate lines "baz", "bar" and "foo". As you can see the input was reversed.
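Here is the same three-line example as a runnable sketch:

```shell
# Feed "foo", "bar", "baz" to the one-liner and capture the result.
reversed=$(printf 'foo\nbar\nbaz\n' |
           awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }')

printf '%s\n' "$reversed"
```

Keep in mind that the whole input is held in memory until the END block runs.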

36. Join a line ending with a backslash with the next line.

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

This one-liner uses regular expression "/\\$/" to look for lines ending with a backslash. If the line ends with a backslash, the backslash gets removed by sub(/\\$/,"") function. Then the "getline t" function is executed. "Getline t" reads the next line from input and stores it in variable t. "Print $0 t" statement prints the original line (but with trailing backslash removed) and the newly read line (which was stored in variable t). Awk then continues with the next line. If the line does not end with a backslash, Awk just prints it out with "1".

Unfortunately this one liner fails to join more than 2 lines (this is left as an exercise to the reader to come up with a one-liner that joins arbitrary number of lines that end with backslash :)).
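If you don't feel like doing the exercise, here is one possible approach, offered as a sketch rather than the definitive answer. Instead of reading ahead with getline, it prints a continued line without the trailing newline, so the next line simply lands right after it. The input is made up, and note that if the very last line ends with a backslash, the output will lack a final newline.

```shell
# Made-up input: "a \", "b \", "c", "d" (the first two lines are continued).
input=$(printf 'a \\\nb \\\nc\nd\n')

# Original one-liner: joins only two lines, the second backslash survives.
original=$(printf '%s\n' "$input" |
           awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1')

# Variant: print continued lines without a newline, everything else as-is.
joined=$(printf '%s\n' "$input" |
         awk '/\\$/ { sub(/\\$/, ""); printf "%s", $0; next } 1')

printf 'original:\n%s\nvariant:\n%s\n' "$original" "$joined"
```

The original leaves "a b \" followed by "c" on a separate line, while the variant produces "a b c".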

37. Print and sort the login names of all users.

awk -F ":" '{ print $1 | "sort" }' /etc/passwd

This is the first time we see the -F argument passed to Awk. This argument specifies a character, a string or a regular expression that will be used to split the line into fields ($1, $2, ...). For example, if the line is "foo-bar-baz" and -F is "-", then the line will be split into three fields: $1 = "foo", $2 = "bar" and $3 = "baz". If -F is not given, Awk splits on runs of whitespace by default, so this line, which contains no whitespace, would have just one field: $1 = "foo-bar-baz".

Specifying -F is the same as setting the FS (Field Separator) variable in the BEGIN block of Awk program:

awk -F ":"
# is the same as
awk 'BEGIN { FS=":" }'

/etc/passwd is a text file that contains a list of the system's accounts, along with some useful information like login name, user ID, group ID, home directory, shell, etc. Each line describes one account, and the fields within a line are separated by a colon ":".

Here is an example of a line from /etc/passwd file:

pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash

If we split this line on ":", the first field is the username (pkrumins in this example). The one-liner does just that - it splits the line on ":", then forks the "sort" program and feeds it all the usernames, one by one. After Awk has finished processing the input, sort program sorts the usernames and outputs them.
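To try this without touching the real /etc/passwd, here is a sketch that works on a small sample file. The "root" and "daemon" entries are made up for the example. It also shows that -F and setting FS in a BEGIN block give the same result.

```shell
# Hypothetical passwd-style sample file with three accounts.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# Field separator given on the command line.
with_flag=$(awk -F ":" '{ print $1 | "sort" }' "$passwd_sample")

# Field separator set in the BEGIN block.
with_begin=$(awk 'BEGIN { FS=":" } { print $1 | "sort" }' "$passwd_sample")

rm -f "$passwd_sample"
printf '%s\n' "$with_flag"
```

Both commands print the three login names in sorted order.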

38. Print the first two fields in reverse order on each line.

awk '{ print $2, $1 }' file

This one liner is obvious. It reverses the order of fields $1 and $2. For example, if the input line is "foo bar", then after running this program the output will be "bar foo".

39. Swap first field with second on every line.

awk '{ temp = $1; $1 = $2; $2 = temp; print }'

This one-liner uses a temporary variable called "temp". It assigns the first field $1 to "temp", then it assigns the second field to the first field and finally it assigns "temp" to $2. This procedure swaps the first two fields on every line. For example, if the input is "foo bar baz", then the output will be "bar foo baz".

Ps. This one-liner was incorrect in Eric's awk1line.txt file. "Print" was missing.

40. Delete the second field on each line.

awk '{ $2 = ""; print }'

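This one-liner just assigns an empty string to the second field, which erases its contents. Note that the field itself still exists: when Awk rebuilds $0, the separators on both sides of the now-empty field remain, so "foo bar baz" becomes "foo  baz" with two spaces.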

41. Print the fields in reverse order on every line.

awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }'

We saw the "NF" variable that stands for Number of Fields in the part one of this article. After processing each line, Awk sets the NF variable to number of fields found on that line.

This one-liner loops in reverse order starting from NF to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), ..., $1. After that it prints a newline character.

42. Remove duplicate, consecutive lines (emulate "uniq")

awk 'a !~ $0; { a = $0 }'

Variables in Awk don't need to be initialized or declared before they are being used. They come into existence the first time they are used. This one-liner uses variable "a" to keep the last line seen "{ a = $0 }". Upon reading the next line, it compares if the previous line (in variable "a") is not the same as the current one "a !~ $0". If it is not the same, the expression evaluates to 1 (true), and as I explained earlier, any true expression is the same as "{ print }", so the line gets printed out. Then the program saves the current line in variable "a" again and the same process continues over and over again.

This one-liner is actually incorrect. It uses a regular expression matching operator "!~". If the previous line was something like "fooz" and the new one is "foo", then it won't get output, even though they are not duplicate lines.

Here is the correct, fixed, one-liner:

awk 'a != $0; { a = $0 }'

It compares the lines as strings, not as regular expressions.
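The "fooz" and "foo" case makes a nice test. This sketch runs both versions on a made-up input of four lines.

```shell
# Made-up input: "foo" follows "fooz", then repeats once.
input=$(printf 'fooz\nfoo\nfoo\nbar\n')

# Regex match: "fooz" contains "foo", so the first "foo" is wrongly dropped.
buggy=$(printf '%s\n' "$input" | awk 'a !~ $0; { a = $0 }')

# String comparison: only the true consecutive duplicate is dropped.
fixed=$(printf '%s\n' "$input" | awk 'a != $0; { a = $0 }')

printf 'buggy:\n%s\nfixed:\n%s\n' "$buggy" "$fixed"
```

The buggy version prints only "fooz" and "bar", while the fixed one prints "fooz", "foo" and "bar".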

43. Remove duplicate, nonconsecutive lines.

awk '!a[$0]++'

This one-liner is very idiomatic. It registers the lines seen in the associative-array "a" (arrays are always associative in Awk) and at the same time tests if it had seen the line before. If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to "{ print }".

For example, suppose the input is:

foo
bar
foo
baz

When Awk sees the first "foo", it evaluates the expression "!a["foo"]++". "a["foo"]" is false, but "!a["foo"]" is true - Awk prints out "foo". Then it increments "a["foo"]" by one with "++" post-increment operator. Array "a" now contains one value "a["foo"] == 1".

Next Awk sees "bar". It does exactly what it did with "foo" and prints out "bar". Array "a" now contains two values "a["foo"] == 1" and "a["bar"] == 1".

Now Awk sees the second "foo". This time "a["foo"]" is true, "!a["foo"]" is false and Awk does not print anything! Array "a" still contains two values "a["foo"] == 2" and "a["bar"] == 1".

Finally Awk sees "baz" and prints it out because "!a["baz"]" is true. Array "a" now contains three values "a["foo"] == 2" and "a["bar"] == 1" and "a["baz"] == 1".

The output:

foo
bar
baz

Here is another one-liner to do the same. Eric in his one-liners says it's the most efficient way to do it.

awk '!($0 in a) { a[$0]; print }'

It's basically the same as previous one, except that it uses the 'in' operator. Given an array "a", an expression "foo in a" tests if variable "foo" is in "a".

Note that an empty statement "a[$0]" creates an element in the array.
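Both variants can be run on the four-line example from above. This is just a sketch to show that they produce the same output.

```shell
# The example input: "foo" appears twice, not next to each other.
input=$(printf 'foo\nbar\nfoo\nbaz\n')

# Counting variant: print a line only when its counter is still zero.
counted=$(printf '%s\n' "$input" | awk '!a[$0]++')

# Membership variant: print a line only if it is not yet an index of "a".
membership=$(printf '%s\n' "$input" | awk '!($0 in a) { a[$0]; print }')

printf '%s\n' "$counted"
```

Either way the first occurrence of each line is kept and the original order is preserved.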

44. Concatenate every 5 lines of input with a comma.

awk 'ORS=NR%5?",":"\n"'

We saw the ORS variable in part one of the article. This variable gets appended after every line that gets output. In this one-liner it gets changed on every 5th line from a comma to a newline. For lines 1, 2, 3, 4 it's a comma, for line 5 it's a newline, for lines 6, 7, 8, 9 it's a comma, for line 10 a newline, etc.
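Here is a sketch of the one-liner at work on ten numbered lines, generated just for this example:

```shell
# Hypothetical input: the numbers 1 to 10, one per line.
joined=$(awk 'BEGIN { for (i = 1; i <= 10; i++) print i }' |
         awk 'ORS=NR%5?",":"\n"')

printf '%s\n' "$joined"
```

The result is two lines, "1,2,3,4,5" and "6,7,8,9,10". If the number of input lines is not a multiple of 5, the last output line ends with a comma instead of a newline.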

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

Have Fun!

Have fun with these one-liners. I hope you learned something new.

If you liked this article, you may also like a very similar article on Famous Sed One-Liners Explained.

Ps. If you notice anything that you can't understand, please let me know in the comments. Thanks!