This article is part of the article series "Awk One-Liners Explained."
awk programming one-liners explained

This is the third and final part of a three-part article on the famous Awk one-liners. This part will explain Awk one-liners for selective printing and deletion of certain lines. See part one for an introduction to the series.

If you just came to my website, then you might wonder, "What are these Awk one-liners and why are they famous?" The answer is very simple - they are small and beautiful Awk programs that each do one and only one text manipulation task very well. They were written by Eric Pement and have been circulating around the Internet as the awk1line.txt text file.

If you are intrigued by this article series, I suggest that you subscribe to my posts, as I will have a lot more interesting and educational articles this year.

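Eric Pement's Awk one-liner collection consists of five sections. This part covers the last two: section 4, selective printing of certain lines, and section 5, selective deletion of certain lines.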

Grab my Awk cheat sheet and the local copy of Awk one-liners file awk1line.txt and let's roll.

4. Selective Printing of Certain Lines

45. Print the first 10 lines of a file (emulates "head -10").

awk 'NR < 11'

Awk has a special variable called "NR" that stands for "Number of Records" - the number of lines read so far across all input files (the per-file counter is called "FNR"). After reading each line, Awk increments this variable by one, so for the first line it's 1, for the second line it's 2, and so on. As I explained in the very first one-liner, every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". The "action statements" part gets executed only on those lines that match "pattern" (the pattern evaluates to true). In this one-liner the pattern is "NR < 11" and there are no "action statements". The default action in case of missing "action statements" is to print the line as-is (it's equivalent to "{ print $0 }"). The pattern tests if the current line number is less than 11. If it is, Awk prints the line. As soon as the line number is 11 or more, the pattern evaluates to false and Awk skips the line.
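
To see what NR actually counts, here is a small sketch that prints NR next to FNR (the per-file counter) for two throwaway files. The file names and their contents are made up purely for illustration.

```shell
# Two small throwaway files (hypothetical sample data).
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"

# Print NR, FNR and the line itself for every line of both files.
nr_fnr=$(awk '{ print NR, FNR, $0 }' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$nr_fnr"
# 1 1 a1
# 2 2 a2
# 3 1 b1
# 4 2 b2
# 5 3 b3

rm -r "$dir"
```

NR keeps counting across both files while FNR restarts at 1 for the second file, so a test like "NR < 11" only behaves like "head -10" when Awk reads a single input.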

A much better way to do the same is to quit after seeing the first 10 lines (otherwise we are looping over lines > 10 and doing nothing):

awk '1; NR == 10 { exit }'

The "NR == 10 { exit }" part guarantees that as soon as line number 10 is reached, Awk quits. For every line up to and including the 10th, Awk first evaluates the pattern "1", which is always true. And as we just learned, a true pattern without the "action statements" part is equal to "{ print $0 }", so the first ten lines get printed!
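
Here is a quick way to check that both variants print the same thing. This is only a sketch: the "gen" helper is a hypothetical stand-in for a real file and just prints the numbers 1 to 20, one per line.

```shell
# Hypothetical helper: prints the numbers 1..20, one per line.
gen() { awk 'BEGIN { for (i = 1; i <= 20; i++) print i }'; }

# Variant 1: test every line, print only the first ten.
head_all=$(gen | awk 'NR < 11')

# Variant 2: print each line, then stop reading once line 10 is done.
head_exit=$(gen | awk '1; NR == 10 { exit }')

printf '%s\n' "$head_exit"
# prints the numbers 1 through 10, one per line
```

On 20 lines the difference is invisible, but on a very large file the second variant stops after line 10 instead of reading everything to the end.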

46. Print the first line of a file (emulates "head -1").

awk 'NR > 1 { exit }; 1'

This one-liner is very similar to the previous one. The "NR > 1" pattern is true only for line numbers greater than one, so its action does not get executed on the first line. On the first line only the "1", the always-true pattern, gets evaluated. It makes Awk print the line and read the next line. Now the "NR" variable is 2, and "NR > 1" is true. At this moment "{ exit }" gets executed and Awk quits. That's it. Awk printed just the first line of the file.

47. Print the last 2 lines of a file (emulates "tail -2").

awk '{ y=x "\n" $0; x=$0 }; END { print y }'

Okay, so what does this one do? First of all, notice that "{y=x "\n" $0; x=$0}" action statement group is missing the pattern. When the pattern is missing, Awk executes the statement group for all lines. For the first line, it sets variable "y" to "\nline1" (because x is not yet defined). For the second line it sets variable "y" to "line1\nline2". For the third line it sets variable "y" to "line2\nline3". As you can see, for line N it sets the variable "y" to "lineN-1\nlineN". Finally, when it reaches EOF, variable "y" contains the last two lines and they get printed via "print y" statement.

Thinking about this one-liner for a second, one concludes that it is very inefficient - it reads the whole file line by line just to print out the last two lines! Unfortunately there is no seek() statement in Awk, so you can't seek to the last two lines of the file (that's what tail does). It's recommended to use "tail -2" to print the last 2 lines of a file.
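
To make the bookkeeping with "x" and "y" concrete, here is a small sketch that runs the one-liner on made-up input, once with four lines and once with a single line.

```shell
# Four sample lines: y ends up holding the last two of them.
last_two=$(printf 'line1\nline2\nline3\nline4\n' |
  awk '{ y=x "\n" $0; x=$0 }; END { print y }')
printf '%s\n' "$last_two"
# line3
# line4

# A single sample line: x is still empty, so y is "\n" followed by the line,
# and the output starts with an empty line.
one_line=$(printf 'only\n' |
  awk '{ y=x "\n" $0; x=$0 }; END { print y }')
printf '%s\n' "$one_line"
```

The single-line case is worth knowing about: the output is an empty line followed by "only", which is not quite what "tail -2" would print.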

48. Print the last line of a file (emulates "tail -1").

awk 'END { print }'

This one-liner may or may not work. It relies on an assumption that the "$0" variable that contains the entire line does not get reset after the input has been exhausted. The special "END" pattern gets executed after the input has been exhausted (or "exit" called). In this one-liner the "print" statement is supposed to print "$0" at EOF, which may or may not have been reset.

Whether it works depends on your Awk version and implementation. It works with GNU Awk, for example, but doesn't seem to work with nawk or xpg4/bin/awk.

The most compatible way to print the last line is:

awk '{ rec=$0 } END{ print rec }'

Just like the previous one-liner, it's computationally expensive to print the last line of the file this way, and "tail -1" should be the preferred way.
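
A minimal sketch of the compatible version on three made-up lines. Because the last line is copied into the "rec" variable on every record, it does not matter what happens to $0 once the input is exhausted.

```shell
# rec is overwritten on every line, so at END it holds the last one.
last=$(printf 'first\nsecond\nthird\n' | awk '{ rec=$0 } END { print rec }')
printf '%s\n' "$last"
# third
```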

49. Print only the lines that match a regular expression "/regex/" (emulates "grep").

awk '/regex/'

This one-liner uses a regular expression "/regex/" as a pattern. If the current line matches the regex, it evaluates to true, and Awk prints the line (remember that missing action statement is equal to "{ print }" that prints the whole line).

50. Print only the lines that do not match a regular expression "/regex/" (emulates "grep -v").

awk '!/regex/'

Pattern matching expressions can be negated by putting "!" in front of them. If they were to evaluate to true, the "!" in front makes them evaluate to false, and the other way around. This one-liner inverts the regex match of the previous (#49) one-liner and prints all the lines that do not match the regular expression "/regex/".
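
Here is a small sketch of one-liners #49 and #50 side by side. The input lines and the regex "apple" are invented for the example.

```shell
# Made-up sample input.
input=$(printf 'apple pie\nbanana split\napple tart\ncherry cake\n')

# Lines that match the regex (like grep).
matching=$(printf '%s\n' "$input" | awk '/apple/')

# Lines that do not match the regex (like grep -v).
non_matching=$(printf '%s\n' "$input" | awk '!/apple/')

printf 'matching:\n%s\n\nnon-matching:\n%s\n' "$matching" "$non_matching"
```

Every input line ends up in exactly one of the two outputs: "apple pie" and "apple tart" in the first, "banana split" and "cherry cake" in the second.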

51. Print the line immediately before a line that matches "/regex/" (but not the line that matches itself).

awk '/regex/ { print x }; { x=$0 }'

This one-liner always saves the current line in the variable "x". When it reads in the next line, the previous line is still available in the "x" variable. If the newly read line matches "/regex/", it prints out the variable x, and as a result, the previous line gets printed.

It does not work well if the first line of the file matches "/regex/": there is no previous line, so an empty line gets printed. In that case we might want to print "match on line 1" instead, for example:

awk '/regex/ { print (x=="" ? "match on line 1" : x) }; { x=$0 }'

This one-liner tests if variable "x" contains something. On the very first line x is always empty, and in that case "match on line 1" gets printed. Otherwise variable "x" gets printed (which, as we found out, contains the previous line). Note that x is also empty when the previous line is blank, so the message can show up for a match that follows a blank line as well. Notice that this one-liner uses a ternary operator "foo?bar:baz" that is short for "if foo, then bar, else baz".
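
A short sketch of both cases on made-up input. The second command is my own variation, not Eric Pement's one-liner: it tests "NR == 1" instead of checking whether x is empty, on the assumption that you only want the message for a match on the real first line.

```shell
# A match on line 3: the line before it (beta) is printed.
prev=$(printf 'alpha\nbeta\nMATCH one\ngamma\n' |
  awk '/MATCH/ { print x }; { x=$0 }')
printf '%s\n' "$prev"
# beta

# A match on line 1: there is no previous line, so print a message instead.
# Variation (assumption): NR == 1 is used rather than x == "".
first=$(printf 'MATCH one\nalpha\n' |
  awk '/MATCH/ { print (NR == 1 ? "match on line 1" : x) }; { x=$0 }')
printf '%s\n' "$first"
# match on line 1
```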

52. Print the line immediately after a line that matches "/regex/" (but not the line that matches itself).

awk '/regex/ { getline; print }'

This one-liner calls the "getline" function on all the lines that match "/regex/". This function sets $0 to the next line (and also updates NF, NR, FNR variables). The "print" statement then prints this next line. As a result, only the line after a line matching "/regex/" gets printed.

If it is the last line that matches "/regex/", then "getline" hits the end of input, returns 0 and does not set $0. In this case the last line gets printed itself. Also note that the line fetched by "getline" is never tested against "/regex/" itself, so when several matching lines follow each other, not every line that follows a match gets printed.
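
The following sketch shows what "getline" does when matching lines follow each other, using the sample input from the comment thread below. For comparison it also runs the flag-based variant suggested there, which remembers whether the previous line matched instead of consuming the next line.

```shell
# Sample input with three consecutive matching lines.
input=$(printf 'baz\nfoo\nfoo\nfoo\nbar\nfoo\nbaz\n')

# getline consumes the line after each match, so that line is never
# tested against the regex itself.
with_getline=$(printf '%s\n' "$input" | awk '/foo/ { getline; print }')
printf '%s\n' "$with_getline"
# foo
# bar
# baz

# Flag variant: f records whether the previous line matched.
with_flag=$(printf '%s\n' "$input" | awk 'f; { f = /foo/ }')
printf '%s\n' "$with_flag"
# foo
# foo
# bar
# baz
```

The flag variant prints every line that directly follows a match, including lines that match themselves, while the "getline" version drops one of the consecutive "foo" lines.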

53. Print lines that match any of "AAA" or "BBB", or "CCC".

awk '/AAA|BBB|CCC/'

This one-liner uses extended regular expressions, which support the "|" alternation meta-character. This meta-character separates "AAA" from "BBB", and from "CCC", and tries to match them separately on each line. Only the lines that contain one (or more) of them get matched and printed.

54. Print lines that contain "AAA" and "BBB", and "CCC" in this order.

awk '/AAA.*BBB.*CCC/'

This one-liner uses a regular expression "AAA.*BBB.*CCC" to print lines. This regular expression says, "match lines containing AAA followed by any text, followed by BBB, followed by any text, followed by CCC in this order!" If a line matches, it gets printed.
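
To contrast one-liners #53 and #54, here is a sketch that runs both on the same made-up input. Only the third line contains AAA, BBB and CCC in that order.

```shell
# Made-up sample input.
input=$(printf 'AAA only\nCCC then BBB then AAA\nAAA x BBB y CCC\nnothing here\n')

# Any of the three strings is enough.
any=$(printf '%s\n' "$input" | awk '/AAA|BBB|CCC/')
printf '%s\n' "$any"
# AAA only
# CCC then BBB then AAA
# AAA x BBB y CCC

# All three strings, in this order.
ordered=$(printf '%s\n' "$input" | awk '/AAA.*BBB.*CCC/')
printf '%s\n' "$ordered"
# AAA x BBB y CCC
```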

55. Print only the lines that are 65 characters in length or longer.

awk 'length > 64'

This one-liner uses the "length" function. This function is defined as "length([str])" - it returns the length of the string "str". If none is given, it returns the length of the string in variable $0. For historical reasons, the parentheses () at the end of "length" can be omitted. This one-liner tests if the current line is longer than 64 chars. If it is, "length > 64" evaluates to true and the line gets printed.

56. Print only the lines that are less than 64 characters in length.

awk 'length < 64'

This one-liner is almost byte-by-byte equivalent to the previous one. Here it tests if the length of the line is less than 64 characters. If it is, Awk prints it out. Otherwise nothing gets printed.
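
The boundaries of #55 and #56 are easy to get wrong, so here is a sketch that feeds both one-liners three generated lines of 63, 64 and 65 characters. The "make_lines" helper is hypothetical, and the added "{ print length($0) }" action is only there to keep the output short; the patterns are the ones from the one-liners.

```shell
# Hypothetical helper: prints three lines of "x" characters,
# 63, 64 and 65 characters long.
make_lines() {
  awk 'BEGIN {
    for (n = 63; n <= 65; n++) {
      s = ""
      for (i = 0; i < n; i++) s = s "x"
      print s
    }
  }'
}

# Print the length of each line that passes the pattern.
long_lengths=$(make_lines | awk 'length > 64 { print length($0) }')
short_lengths=$(make_lines | awk 'length < 64 { print length($0) }')

printf 'length > 64: %s\nlength < 64: %s\n' "$long_lengths" "$short_lengths"
# length > 64: 65
# length < 64: 63
```

A line of exactly 64 characters is printed by neither one-liner.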

57. Print a section of file from regular expression to end of file.

awk '/regex/,0'

This one-liner uses a pattern match in form 'pattern1, pattern2' that is called "range pattern". The 3rd Awk Tip from article "10 Awk Tips, Tricks and Pitfalls" explains this match very carefully. It matches all the lines starting with a line that matches "pattern1" and continuing until a line matches "pattern2" (inclusive). In this one-liner "pattern1" is a regular expression "/regex/" and "pattern2" is just 0 (false). So this one-liner prints all lines starting from a line that matches "/regex/" continuing to end-of-file (because 0 is always false, and "pattern2" never matches).
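
A small sketch of the range pattern with a constant 0 as the end pattern. The input and the regex "START" are made up for the example.

```shell
# The range opens on the first line that matches /START/ and, because
# the end pattern 0 is never true, stays open until end of file.
from_match=$(printf 'header\nskip me\nSTART here\nbody\nfooter\n' | awk '/START/,0')
printf '%s\n' "$from_match"
# START here
# body
# footer
```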

58. Print lines 8 to 12 (inclusive).

awk 'NR==8,NR==12'

This one-liner also uses a range pattern in format "pattern1, pattern2". The "pattern1" here is "NR==8" and "pattern2" is "NR==12". The first pattern means "the current line is 8th" and the second pattern means "the current line is 12th". This one-liner prints all the lines from the 8th through the 12th, inclusive.
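
Here is a sketch that checks the inclusive boundaries on 20 numbered lines. The "gen" helper is a hypothetical stand-in for a real file. The second command uses the plain comparison form that a reader suggests in the comments below, which selects the same lines.

```shell
# Hypothetical helper: prints the numbers 1..20, one per line.
gen() { awk 'BEGIN { for (i = 1; i <= 20; i++) print i }'; }

# Range pattern: opens on line 8, closes after line 12.
by_range=$(gen | awk 'NR==8,NR==12')

# Comparison form: true for lines 8 through 12.
by_compare=$(gen | awk 'NR>7 && NR<13')

printf '%s\n' "$by_range"
# 8
# 9
# 10
# 11
# 12
```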

59. Print line number 52.

awk 'NR==52'

This one-liner tests to see if current line is number 52. If it is, "NR==52" evaluates to true and the line gets implicitly printed out (patterns without statements print the line unmodified).

A more efficient way, though, is to quit after line 52:

awk 'NR==52 { print; exit }'

This one-liner forces Awk to quit after line number 52 is printed. It is the more efficient way to print line 52 because there is nothing else to be done, so why loop over the rest of the file doing nothing?

60. Print section of a file between two regular expressions (inclusive).

awk '/Iowa/,/Montana/'

I explained what a range pattern such as "pattern1,pattern2" does in general in one-liner #57. In this one-liner "pattern1" is "/Iowa/" and "pattern2" is "/Montana/". Both of these patterns are regular expressions. This one-liner prints all the lines starting with a line that matches "Iowa" and ending with a line that matches "Montana" (inclusive).
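
One detail that is easy to miss is that a range pattern can open more than once. The sketch below uses a made-up list of state names in which "Iowa" appears twice and "Montana" only once.

```shell
# The first range runs from Iowa to Montana. The second Iowa opens a new
# range that never sees Montana, so it runs to end of file.
states=$(printf 'Alabama\nIowa\nKansas\nMontana\nNevada\nIowa\nOhio\n' |
  awk '/Iowa/,/Montana/')
printf '%s\n' "$states"
# Iowa
# Kansas
# Montana
# Iowa
# Ohio
```

"Alabama" and "Nevada" are skipped because they fall outside both ranges.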

5. Selective Deletion of Certain Lines

There is just one one-liner in this section.

61. Delete all blank lines from a file.

awk NF

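This one-liner uses the special NF variable that contains the number of fields on the line. For empty lines, NF is 0, which evaluates to false, and a false pattern does not get the line printed. With the default field separator, lines that contain only spaces or tabs also have NF equal to 0, so they are deleted as well.

Another way to do almost the same is: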

awk '/./'

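This one-liner uses a regular-expression match "." that matches any character. Empty lines do not have any characters, so the pattern does not match them. Unlike "awk NF", this version keeps lines that contain only whitespace, because a space is a character, too.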
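
The two versions differ on whitespace-only lines. Here is a sketch with made-up input that contains one empty line and one line made of three spaces.

```shell
# Sample input: "one", an empty line, a line of three spaces, "two".
input=$(printf 'one\n\n   \ntwo\n')

# NF is 0 for both the empty line and the spaces-only line.
by_nf=$(printf '%s\n' "$input" | awk NF)
printf '%s\n' "$by_nf"
# one
# two

# /./ only rejects the truly empty line; the spaces-only line is kept.
by_dot=$(printf '%s\n' "$input" | awk '/./')
printf '%s\n' "$by_dot"
```

The second output has three lines: "one", the line of three spaces, and "two".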

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - an introduction to awk one-liners, a summary of awk special variables, and idiomatic awk.

Have Fun!

This concludes the article series about Awk one-liners. I hope that you enjoyed this three-part article and it made you a better Awk programmer!

My future plans are to create an awk1line-explained.txt that will be a supplementary file to the famous awk1line.txt. I am also thinking about publishing a nicely formatted pdf e-book about all the one-liners.

If you liked this article, you may also like a very similar article on Famous Sed One-Liners Explained.

And finally, if you notice anything that you can't understand, please let me know in the comments. Thank you!

Comments

January 05, 2009, 17:13

Very, very nice Peteris!

Muhammad
January 05, 2009, 17:20

Cool As Usual :D...GO on man i am Periodically checking your Blog.. :)

January 05, 2009, 17:30

Thanks Eric and Muhammad! :)

January 06, 2009, 09:48

Nice Peter, very useful. Thanks.

One more:
Printing last field of last line:

awk '{ f=$NF }; END{ print f }'  file

Which is very useful to print the latest updated file in my dir.

ls -lrt | awk '{ f=$NF }; END{ print f }'

// Jadu

Roman
January 06, 2009, 19:22

"There are just one one-liners in this section."

;=]

January 07, 2009, 01:27

Thanks, Jadu - that is a nice tip!

Roman, fixing it now :) Thanks!

thedailyperlonlinr
January 07, 2009, 23:00

ah you just gave me enough material for 2 months in my new little blog ;-)

February 09, 2009, 13:09

#47 can be written in easy way like this
awk '{ rec[NR]=$0 } END { printf("%s\n%s\n", rec[NR-1], rec[NR]) }' file

can't it?

February 09, 2009, 13:38

in #52 if this is the condition then it gives wrong results

foo
foo
foo
bar

According to the statement it should print "foo foo bar" (on different lines of course) but it prints "foo bar"

February 11, 2009, 05:42

Hi Yogesh Agrawal. In response to your questions,

#47: No, if you do it that way, you keep all the lines in 'rec' variable. If the file is 2 gigabytes in size, you keep 2 gigabytes in memory.

#52: You're right, it should produce output like that. I did not notice it. It's 6am here. Can you help me fix it?

February 11, 2009, 08:07

HI Peteris,
Regarding #52 correction. I got one thing.
If our awk command is like this it will work perfectly.
awk 'var-->0;/foo/{var=1}' file

I gave input file as:

baz
foo
foo
foo
bar
foo
baz

and got output as:
foo
foo
bar
baz
which is correct as per definition.

You may find loophole in this. But till then it is working perfect.
Thanks
yogesh

February 12, 2009, 01:02

Yogesh, I have three answers for you:

awk 'f;{f=/foo/}'
awk 'c&&!--c;/foo/{c=1}'
awk '/foo/{_[NR+1]}NR in _' 

And my friends from #awk channel suggested a bunch of variations:

a) Print all records from some pattern:

     awk '/pattern/{f=1}f' file

b) Print all records after some pattern:

     awk 'f;/pattern/{f=1}' file

c) Print the Nth record after some pattern:

     awk 'c&&!--c;/pattern/{c=N}' file

d) Print every record except the Nth record after some pattern:

     awk 'c&&!--c{next}/pattern/{c=N}' file

e) Print the N records after some pattern:

     awk 'c&&c--;/pattern/{c=N}' file

f) Print every record except the N records after some pattern:

     awk 'c&&c--{next}/pattern/{c=N}' file

Peter Passchier
February 22, 2009, 17:23

Re. oneliner 61, /./ and NF respond differently to lines that just contain spaces, /./ does pick those up.

hunter85
June 04, 2009, 12:05

Amazing Blog.

Paul
July 29, 2009, 16:46

Serious error in #45, #46 and many others.
FNR is the line number in the *current* file.
NR is the cumulative line number across all inputs.

awk 'code' six_lines.txt ten_lines.txt will have
NR == 1 and FNR == 1
NR == 7 and FNR == 1 again
and END will see NR == 16 and FNR == 10.

Programs that check for change in FILENAME would be more efficient to check for FNR == 1.

Awk to print first 10 lines of several files:

awk 'FNR == 1 { print "\n.... " FILENAME; } FNR < 11' six_lines.txt ten_lines.txt

If your awk has "nextfile" it saves reading lines after first 10.

Raj
October 05, 2009, 09:42

Great awk coding, nice work: Peteris, Thanks.

I found another code for 58:

58. Print lines 8 to 12 (inclusive).

awk 'NR>7 && NR<13'

Nik
December 31, 2009, 00:57

Nice site.

I'm stumped with this. I have a file containing
2 AAA
3 AAA
4 AAA
9 AAA
8 XXX
9 XXX
10 XXX
19 XXX
5 BBB
6 BBB
7 BBB
12 BBB
3 FFF
4 FFF
5 FFF
11 FFF

Can I use awk to return an output like
2|9|AAA
8|10|XXX
5|7|BBB
3|4|5|FFF

That is, to return the FIRST and SECOND-TO-THE-LAST numbers for each unique pattern.

Balaji Bodicherla
September 17, 2010, 02:12

I am a vivid fan of awk and sed and always wanted to visit a site that can give one-liners with good explanations. This site is by far the best one I have seen and it really helped me to thoroughly refresh my awk knowledge.

Thanks a lot, keep up the good work. As you mentioned it really helps if you can provide examples for each one-liners, looking forward to get those examples.

Thanks again

February 13, 2011, 21:18

"thinking about publishing a nicely formatted pdf e-book"

A couple of sed commands against the text would turn your one-liners explained into asciidoc format, which in turn can produce html, pdf, docbook, ebook output, not to mention txt. PS: You can even use it for your math notes :-) See: http://www.methods.co.nz/asciidoc/

P_sun
March 23, 2011, 18:32

Hi,
If I want to get the lines 506, 590 and 600 (from text1.log) written to out.log.
The following command allows me to do that:

awk 'NR==506||NR==590 || NR==600' "text1.log" > out.txt

Let us set a=506, b=590 and c=600.
Now what I need is lines 628 (a+(122)*n), 712 (b+(122)*n),
722 (c+(122)*n) ,where n=1-20 written out from text1.log.
Can awk be suitable for this?
Can someone outline how this can be done. I appreciate your help.

If the above needs to be repeated for a series of outputs, text2.log, text3.log, how can this be done? Any help is greatly appreciated.

March 27, 2011, 04:08

Here is the code that should work for you:

for (( n=1; n<21; n++ )); do awk -v n=$n 'NR==a+(122*n)||NR==b+(122*n)||NR==c+(122*n)' a=506 b=590 c=600 text1.log >>out.txt; done

This should be all on one line (a true "one-liner").

paddy1
June 23, 2011, 21:40

I have this file (using Sun OS) and need to delete rows if fields 1 and 3 are repeated

$ cat tt
yy|red|12|500|55
rr|red|12|500|55
yy|yellow|12|600|55
rr|yellow|13|600|55

Expecting output to be

rr|red|12|500|55
rr|yellow|13|600|55

And using this command - nawk '!x[$1,$3]++' FS="|" tt
But it's just cleaning up the 3rd line !!! I just picked up the command from somewhere so not sure what needs to be changed. Help is appreciated :)

yogesh
August 31, 2011, 07:54

if i have a file master.txt containing 4000 lines,
i want to cut (not print, copy) first 10 lines and paste in a file called data1.txt,
then run a command (submit a job) as
"bsub -K -qio ./runjob.txt" .
the file runjob.txt contains a single line as "./doo data1.txt ."
So, I need to delete (remove content of it) the file data1.txt, so that next 10 lines of master.txt will be in data1.txt to process.
there should be 400 iterations.
is there a way to do it in a loop using awk/sed?

Shreya
September 01, 2011, 21:17

Hey what if I need to print a bunch of lines every time after /regex/ ?

September 22, 2011, 09:31

Thanks Peteris. learned a lot.
