Awk Programming September 27, 2008

# Awk One-Liners Explained, Part I: File Spacing, Numbering and Calculations


I noticed that Eric Wendelin wrote an article "awk is a beautiful tool." In this article he said that it was best to introduce Awk with practical examples. I totally agree with Eric.

When I was learning Awk, I first went through Awk - A Tutorial and Introduction by Bruce Barnett, which was full of examples to try out; then I created an Awk cheat sheet to have the language reference in front of me; and finally I went through the famous Awk one-liners (link to .txt file), which were compiled by Eric Pement.

This is going to be a three-part article in which I will explain every single one-liner in Mr. Pement's compilation. Each part will explain around 20 one-liners. If you follow closely then the explained examples will turn you into a great Awk programmer.

Eric Pement's Awk one-liner collection consists of five sections: "File spacing", "Numbering and calculations", "Text conversion and substitution", "Selective printing of certain lines", and "Selective deletion of certain lines".

The first part of the article will explain the first two sections: "File spacing" and "Numbering and calculations." The second part will explain "Text conversion and substitution", and the last part "Selective printing/deleting of certain lines."

I recommend that you print out my Awk cheat sheet before you proceed. This way you will have the language reference in front of you, and you will memorize things better.

These one-liners work with all versions of awk, such as nawk (AT&T's new awk), gawk (GNU's awk), mawk (Michael Brennan's awk) and oawk (old awk).

Awesome news: I have written an e-book based on this article series. Check it out!

Let's start!

## 1. File Spacing

1. Double-space a file.

`awk '1; { print "" }'`

So how does it work? A one-liner is an Awk program and every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". In this case there are two statements "1" and "{ print "" }". In a pattern-action statement either the pattern or the action may be missing. If the pattern is missing, the action is applied to every single line of input. A missing action is equivalent to '{ print }'. Thus, this one-liner translates to:

`awk '1 { print } { print "" }'`

An action is applied only if the pattern matches, i.e., pattern is true. Since '1' is always true, this one-liner translates further into two print statements:

`awk '{ print } { print "" }'`

Every print statement in Awk is silently followed by the ORS - Output Record Separator variable, which is a newline by default. The first print statement with no arguments is equivalent to "print $0", where $0 is a variable holding the entire line. The second print statement prints nothing, but knowing that each print statement is followed by ORS, it actually prints a newline. So there we have it, each line gets double-spaced.
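A quick sanity check on a made-up three-line input shows the effect:

```
# Each input line is printed, followed by an empty line.
printf 'a\nb\nc\n' | awk '1; { print "" }'
```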

2. Another way to double-space a file.

`awk 'BEGIN { ORS="\n\n" }; 1'`

BEGIN is a special kind of pattern which is not tested against the input. It is executed before any input is read. This one-liner double-spaces the file by setting the ORS variable to two newlines. As I mentioned previously, statement "1" gets translated to "{ print }", and every print statement gets terminated with the value of ORS variable.

3. Double-space a file so that no more than one blank line appears between lines of text.

`awk 'NF { print $0 "\n" }'`

The one-liner uses another special variable called NF - Number of Fields. It contains the number of fields the current line was split into. For example, the line "this is a test" splits into four pieces and NF gets set to 4. The empty line "" does not split into any pieces and NF gets set to 0. Using NF as a pattern effectively filters out empty lines. This one-liner says: "If there are any fields, print the whole line followed by a newline."
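On a made-up input with a run of blank lines, the NF pattern squeezes them down to single blanks:

```
# Blank lines are skipped; non-blank lines get exactly one blank line after them.
printf 'first\n\n\nsecond\n' | awk 'NF { print $0 "\n" }'
```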

4. Triple-space a file.

`awk '1; { print "\n" }'`

This one-liner is very similar to previous ones. '1' gets translated into '{ print }' and the resulting Awk program is:

`awk '{ print; print "\n" }'`

It prints the line, then prints a newline followed by terminating ORS, which is newline by default.

## 2. Numbering and Calculations

5. Number lines in each file separately.

`awk '{ print FNR "\t" $0 }'`

This Awk program prepends the predefined variable FNR (the record number in the current file) and a tab (\t) to each line. FNR contains the current line number for each file separately. For example, if this one-liner were called on two files, one containing 10 lines and the other 12, it would number lines in the first file from 1 to 10, then resume numbering from one for the second file and number its lines from 1 to 12. FNR gets reset from file to file.
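Here is the reset in action on two small throwaway files (the file names are made up for the demo):

```
# FNR restarts at 1 for every input file.
printf 'x\ny\n' > part1.txt
printf 'z\n'    > part2.txt
awk '{ print FNR "\t" $0 }' part1.txt part2.txt
rm part1.txt part2.txt
```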

6. Number lines for all files together.

`awk '{ print NR "\t" $0 }'`

This one works the same as #5 except that it uses the NR (record number) variable, which does not get reset from file to file; it counts the input lines seen so far across all files. For example, if it was called on the same two files with 10 and 12 lines, it would number the lines from 1 to 22 (10 + 12).

7. Number lines in a fancy manner.

`awk '{ printf("%5d : %s\n", NR, $0) }'`

This one-liner uses the printf() function to number lines in a custom format. It takes a format string just like C's printf() function. Note that ORS does not get appended after printf(), so we have to print the newline (\n) character explicitly. This one right-aligns line numbers, followed by a space, a colon, and the line.
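For example, on a made-up two-line input the %5d conversion right-aligns the number in a 5-character column:

```
# Line numbers padded to width 5, then " : " and the line.
printf 'hello\nworld\n' | awk '{ printf("%5d : %s\n", NR, $0) }'
```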

8. Number only non-blank lines in files.

`awk 'NF { $0=++a " :" $0 }; { print }'`

Awk variables are dynamic; they come into existence when they are first used. This one-liner pre-increments variable 'a' each time the line is non-empty, then prepends the value of this variable to the beginning of the line and prints it out.

9. Count lines in files (emulates wc -l).

`awk 'END { print NR }'`

END is another special kind of pattern which is not tested against the input. It is executed when all the input has been exhausted. This one-liner outputs the value of the NR special variable after all the input has been consumed. NR contains the total number of lines seen (= the number of lines in the file).
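A quick check on a made-up three-line input:

```
# Emulates `wc -l`: NR in the END block is the total line count.
printf 'a\nb\nc\n' | awk 'END { print NR }'   # prints 3
```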

10. Print the sum of fields in every line.

`awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }'`

Awk has some features of the C language, like the for (;;) { ... } loop. This one-liner loops over all fields in a line (there are NF fields in a line) and accumulates the sum in variable 's'. Then it prints the result and proceeds to the next line.

11. Print the sum of fields in all lines.

`awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'`

This one-liner is basically the same as #10, except that it accumulates the sum across all lines and prints it at the end. Notice how it does not initialize variable 's' to 0; that is not necessary, as variables come into existence dynamically. Also notice how it calls "print s+0" and not just "print s". That matters if there are no fields at all: then "s" never comes into existence and is undefined, and printing an undefined value prints nothing (i.e., just the ORS). Adding 0 performs a mathematical operation, and undef+0 = 0, so it prints "0".
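The +0 trick is easy to verify with empty versus non-empty made-up input:

```
# With no input, s is never set; s+0 coerces the undefined value to 0.
printf '' | awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'       # prints 0
printf '1 2\n3\n' | awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'  # prints 6
```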

12. Replace every field by its absolute value.

`awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }'`

This one-liner uses two other features of the C language, namely the if (...) { ... } statement and omission of curly braces. It loops over all fields in a line and checks if any field is less than 0; if so, it negates the field to make it positive. Fields can be addressed indirectly by a variable. For example, i = 5; $i = "hello" sets field number 5 to the string "hello".

Here is the same one-liner rewritten with curly braces for clarity. The 'print' statement gets executed after all the fields in the line have been replaced by their absolute values.

```
awk '{
    for (i = 1; i <= NF; i++) {
        if ($i < 0) {
            $i = -$i
        }
    }
    print
}'
```
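A quick check of the absolute-value loop on a made-up sample line:

```
# Negative fields are negated; assigning to $i rebuilds $0 with OFS between fields.
printf '%s\n' '-1 2 -3' | awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }'   # prints 1 2 3
```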

13. Count the total number of fields (words) in a file.

`awk '{ total = total + NF }; END { print total+0 }'`
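For example, on a made-up input with lines of 2 and 3 words:

```
# Sums NF over all lines: 2 + 3 = 5 words total.
printf 'a b\nc d e\n' | awk '{ total = total + NF }; END { print total+0 }'   # prints 5
```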

This one-liner matches all the lines and keeps adding the number of fields in each line. The number of fields seen so far is kept in a variable named 'total'. Once the input has been processed, the special pattern 'END { ... }' is executed, which prints the total number of fields. See the 11th one-liner for an explanation of why we "print total+0" in the END block.

14. Print the total number of lines containing word "Beth".

`awk '/Beth/ { n++ }; END { print n+0 }'`

This one-liner has two pattern-action statements. The first one is '/Beth/ { n++ }'. A pattern between two slashes is a regular expression. It matches all lines containing pattern "Beth" (not necessarily the word "Beth", it could as well be "Bethe" or "theBeth333"). When a line matches, variable 'n' gets incremented by one. The second pattern-action statement is 'END { print n+0 }'. It is executed when the file has been processed. Note the '+0' in 'print n+0' statement. It forces '0' to be printed in case there were no matches ('n' was undefined). Had we not put '+0' there, an empty line would have been printed.
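The substring (rather than whole-word) matching is easy to demonstrate on made-up input:

```
# /Beth/ matches any line containing the substring, so "Bethe" counts too.
printf 'Beth\nBethe\nAnna\n' | awk '/Beth/ { n++ }; END { print n+0 }'   # prints 2
```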

15. Find the line containing the largest (numeric) first field.

`awk '$1 > max { max=$1; maxline=$0 }; END { print max, maxline }'`

This one-liner keeps track of the largest number in the first field (in variable 'max') and the corresponding line (in variable 'maxline'). Once it has looped over all lines, it prints them out. Warning: this one-liner does not work if all the values are negative.

Here is the fix:

`awk 'NR == 1 { max = $1; maxline = $0; next; } $1 > max { max=$1; maxline=$0 }; END { print max, maxline }'`
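To see why the fix is needed, here is the fixed version run on made-up input where every first field is negative; seeding max from the first record makes it find -1 correctly:

```
# All-negative first fields; the largest is -1, on line "-1 b".
printf '%s\n' '-3 a' '-1 b' '-7 c' |
awk 'NR == 1 { max = $1; maxline = $0; next } $1 > max { max = $1; maxline = $0 } END { print max, maxline }'
```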

16. Print the number of fields in each line, followed by the line.

`awk '{ print NF ":" $0 }'`

This one-liner just prints out the predefined variable NF - Number of Fields, which contains the number of fields in the line, followed by a colon and the line itself.

17. Print the last field of each line.

`awk '{ print $NF }'`

Fields in Awk need not be referenced by constants. For example, code like 'f = 3; print $f' would print out the 3rd field. This one-liner prints the field whose number is the value of NF, and $NF is the last field in the line.
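A quick demonstration of computed field numbers on made-up input:

```
# $NF picks the last field of each line, whatever NF happens to be.
printf 'one two three\nalpha beta\n' | awk '{ print $NF }'
```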

18. Print the last field of the last line.

`awk '{ field = $NF }; END { print field }'`

This one-liner keeps track of the last field in variable 'field'. Once it has looped over all the lines, variable 'field' contains the last field of the last line, and it just prints it out.

Here is a better version of the same one-liner. It's more common, idiomatic and efficient:

`awk 'END { print $NF }'`

19. Print every line with more than 4 fields.

`awk 'NF > 4'`

This one-liner omits the action statement. As I noted in one-liner #1, a missing action statement is equivalent to '{ print }'.

20. Print every line where the value of the last field is greater than 4.

`awk '$NF > 4'`

This one-liner is similar to #17. It references the last field via the NF variable, and if that field is greater than 4, it prints the line.

## Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look.

## Have fun!

That's it for Part I of the article. The second part will be on "Text conversion and substitution."

Have fun learning Awk! It's a fun language to know. :)


Nice post. In the comment for item 17, shouldn't it say 'f = 3; print $f' instead of 'f = 3; print $3'?

Amazing! Scientists just never cease to amaze me!

Hehe,

Awesome, if the list was a bit longer it would be a must have for me. Can you extend it a bit? :)

Cheers,

Interesting list.

Here's a useful awk one-liner to kill a hanging Firefox process, including all its parents that are part of Firefox (but not the shell or any "higher" ancestors). This is a real-life example which I wrote and used recently:

UNIX one-liner to kill a hanging Firefox process

- Vasudev

Thanks! Many are exactly what I've been looking for.

Question re #6:
Say I want a new file from the concatenation of
foo.log0,
foo.log1,
foo.log2 (increasing date)

and want to number all lines of all log files together.

Is this the right command?

awk '{ print NR "\t" $0 }' foo.log* > foo.txt

or should I do:

awk '{ print NR "\t" $0 }' foo.log0 foo.log1 foo.log2 > foo.txt

thanks

#10 correction.

replace

`s=s+$i`

with

`s=s+i`

ditto #11

Awesome list Peter! These examples are great!

Miguel, thanks, I fixed that mistake.

Eduard, this is a 3 part article. This is just part one. Parts two and three coming soon.

Goofy_barny, no, it's "s=s+$i". It sums the values in each field.

Go_Obama, I also support Obama :) but for your question, do the awk '{ print NR "\t" $0 }' foo.log0 foo.log1 foo.log2 > foo.txt. The other example will put foo.log11 before foo.log2.

Nice explanation of all the awk one liners, thanks peter.

Hello, I have been playing with AWK a little and have this line.

`df -h | grep / | grep % | awk '$(NF-1) >= "20%" {print $NF,": ", $(NF-1)}'`

Everything works great except I have a volume with 3% usage. AWK in my usage of it only appears to evaluate the first number of the "20%".

I have looked around the net some for a resolution to this and read some documentation. I know that it is something simple I am overlooking, but you folks look like you know what you are doing and are active, so I ask: what usage of AWK should I implement to make 1 compare lower than 10?

–Sydney

Sydney, thanks for your question.

I see a couple of mistakes here. First of all you are using grep twice! Awk can do what grep does itself with the /.../ regex pattern matching.

Here is what I came up with (works in GNU Awk only!):

```
df -h | awk '/dev/ { if (strtonum($5) >= 20) { print $NF ": " $(NF-1) } }'
```

And this one works in all Awk's:

```
df -h | awk '/dev/ {
    if (match($5, /^[0-9]+/)) {
        usage = substr($5, RSTART, RLENGTH)
        if (usage >= 20) {
            print $NF ": " $(NF-1)
        }
    }
}'
```

Hope it helps.

If you add 0 to a field it's interpreted as a number. So

df -h | awk '/dev/ { if (strtonum($5) >= 20) { print $NF ": " $(NF-1) } }'

can be made to work in all awk's by

df -h | awk '/dev/ { if (($5 + 0) >= 20) { print $NF ": " $(NF-1) } }'

Regards,
Lee

Thanks so much Peter,

I am only ever using GNU AWK on Redhat or Ubuntu maybe Debian rarely. So I mutated the first into:

`df -h | awk '/\// { if (strtonum($(NF-1)) >= 20) { print $NF ": " $(NF-1) } }'`

I am guessing that strtonum function that you suggested takes it out of the string world. I was trying to compare a string to a number.

The RegEx /\// found all mounts.

df output is not constant in Red Hat with long mount names

`/shrug`

Who knew? So I changed the code to work backwards from the last field.

Thanks a bunch this was an interesting experiment for me that I may turn around and tweak for production.

Hi,

Please let me know the command to list files having a specific number of rows. I want to list all the files having exactly 3 rows.

Regards,

You're welcome, Sydney.

Nice clearly written examples. I've bookmarked it for future reference :)

Thanks!

Great info, I am familiar with Awk but have only used it for very simple field delimiting. I can't wait to read the rest of this series and also your series on Sed.

Thank you!
but where is the second part? I can't find it.

Hi. I've been trying to understand example 8

awk 'NF { $0=++a " :" $0 }; { print }'

but I can't, can anybody explain a little bit more?

Peter, amazing site, great articles. Congrats!

Hey Augusto. Now that I look at one-liner #8, it looks pretty ridiculous. But I didn't write it.

If I wrote it, here is how it would look:

```
awk '{ print ++a, $0 }'
```

At every line increment variable a, and output it together with the line itself.

The explanation of the original one liner is this:

Every line gets read into variable $0. The one-liner modifies this $0: it prepends the contents of variable 'a' to the beginning of $0, but before that, 'a' gets incremented by one by the ++ unary operator.

Hi
if the code goes like awk '{ print ++a, $0 }', the line numbering includes blank lines. But if the code goes awk 'NF {$0=++a ": " $0}; {print}', it will add a line number only to non-blank lines; blank lines will not get a line number.

NF {$0=++a ": " $0} = there must be at least one field for variable "a" to be incremented; otherwise no action is taken for the blank line. At the same time $0 (the whole record) becomes [var a : ORG_LINE], which is then printed (the whole newly created line, including the value of var a) by the print statement after the ;.

For Sydney 10/8/2008, who writes:
AWK in my usage of it only appears to evaluate the first number of the “20%”. so I ask; What usage of AWK should I implement to make 1 compare lower than 10.

Unix awk does not have strtonum, but you don't need it, nor substr. The problem is that variables are treated both as string and numeric by the comparison operators. The trick is to typecast to numeric using (0 + expression), or to string using ("" expression), to force the right comparison. For a field $3 like 17%, use x = 0 + $3 to assign the 17 part to x.
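A quick illustration of the coercion trick on made-up percentage values:

```
# Adding 0 forces a numeric comparison: "3%" + 0 is 3, "95%" + 0 is 95.
printf '3%%\n17%%\n95%%\n' | awk '($1 + 0) >= 20 { print $1 }'   # prints 95%
```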

Your explanation to #17 is needed in #10-#12. I couldn't figure out how the $i value was assigned.

Hi Wondering If I could have some help - sorry for double post

I have a column of numbers sorted in ascending order. I am trying to remove the last 5% of the records then count avg sum etc

The total records are : 99183
I only want to sum the first : 94222 (discarding the outliers 5%)

Where the value of 94222 is in the command line is where I want to use the variable NFIVE -

but if I put variable in awk counts all the records and sums no records.

Desired output:

```
cat <file> |tr -s '=' ' '|sort -k5n | awk '{NFIVE=NR*.95}; {if (NR<94222) TOTAL+=$5} END{printf("COUNT:%d, TOTAL:%d,MEAN:%d\n",NFIVE,TOTAL,TOTAL/NFIVE)}'
```

OUTPUT: (and correct values)
COUNT:94222, TOTAL:19079403, MEAN:202

Incorrect values I get if using NFIVE
EG:

```
cat <FILE> |tr -s '=' ' '|sort -k5n | awk '{NFIVE=NR*.95}; {if (NR<NFIVE) TOTAL+=$5} END{printf("COUNT:%d, TOTAL:%d, MEAN:%d\n",NFIVE,TOTAL,TOTAL/NFIVE)}'
```

COUNT:94222, TOTAL:0, MEAN:0

Thanks for any assistance

How do you print in awk w/out new line char?
e.g.: chkconfig --list | awk '{print $1}'
service1
service2
service3

service1, service2, service3

chkconfig --list | awk 'BEGIN { ORS=" ,"}; {print $1}'
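One caveat with the ORS approach: the separator is also printed after the last item. A sketch (on made-up service names) that joins without a trailing separator, by printing the separator before every item except the first:

```
# sep is empty on the first line, ", " afterwards; END adds the final newline.
printf 'service1\nservice2\nservice3\n' |
awk '{ printf "%s%s", sep, $1; sep = ", " } END { print "" }'
```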

Awesome series!
I recently started playing with sed and awk and this series helped me a lot. Thanks a bunch Peter :)

Simple awk trick for triple-spacing a file:

awk '{ORS="\n\n\n"}1'

Here is a code to unlit bird-style programs:

`awk 'sub(/^>/," ")||($0=" ")'`

Great articles!
However I tried this in my workspace:

`find . -name *.java|xargs awk 'END {print NR}'`

hoping to get the total line number of all java files...

The output I got was:

```
334810
290871
272952
243138
247081
```

I don't know why it got multiple numbers

and when I tried wc -l, the total line number was:246911...
any idea?

Hi Daniel, when you use xargs, a long argument list gets split up and awk is run several times; each invocation prints its own END count, which is why you got multiple numbers. Also note that find . -name '*.java' | awk 'END {print NR}' counts the file names (one per line), not the lines inside the files. To count the total number of lines in all the files, feed their contents to a single awk:

find . -name '*.java' -exec cat {} + | awk 'END { print NR }'

Hah! Nice stuff, this awk. Thanks to these examples, here's what I came up with to monitor what frequency my quad-core CPU currently runs at: cpu-freq-average.sh (lol)

```
#!/bin/bash
awk '{ sum += $1 }; END { printf "∅%.2fGHz\n", sum/NR/10**6 }' \
    <(cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq)
```

It integrates nicely into a tmux status bar - and works for any number of CPUs :D

Hi,

Thought you might be interested in seeing my expanded version of your 'greater than' example #20 which I'm using for practical work purposes:

Print every record from an invoice where the value in field 7 (date pre-formatted by sed as YYYYmmdd) falls on or after April 1, 2011 and is before October 1, 2011.

awk '$7 >= 20110401 && $7 < 20111001'

Cool, I did not realize you could do this with awk! Your example has saved me looping through thousands of sales records and comparing date stamps against the range I needed!

John

Hi guys, someone can help me?
I had to write an awk script that count the number of single a in a file, and counts the number of lines where there are that a...

I wrote only this that give me the total number of a, but I don't know how to find total number of line containing this character:
awk '{for(i=0;i<NF;i++) if ($i=="a") n++} {print "Tot # of a is "n}' filename

I'm looking for a way to output FIND in such a way as to print a file with the fields: "date time size <tab> file_path". I'm new to Linux and my 6TB RAID is still in Windows NTFS. Filenames have spaces and classical music files can have very long names.

This is what I have so far, but awk prints the output of find on 2 lines, instead of 1.

find /mnt/Drive-D/ -type f -exec ls -ld --time-style=long-iso {} \; | awk '{print $6,$7,$5,"\t",$1=$2=$3=$4=$5=$6=$7=""; print $0}'

thanks


Hi,

awk '!arr[$2]++' file.txt
I know that the above command will get unique entries in file.txt based on column 2 (it takes only the first instance), but I want to understand the logic behind it. Please help.
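The logic: arr[$2]++ evaluates to the old count, which is 0 (false) the first time a given second-field value is seen, so !arr[$2]++ is true only for that first occurrence; with no action given, matching lines are printed. A quick run on made-up input:

```
# "c 1" is dropped because second field "1" was already seen on "a 1".
printf 'a 1\nb 2\nc 1\n' | awk '!arr[$2]++'
```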


How to get the 'right' header printed out when using df | awk as below?

df -k /db/test01 | awk '{printf "%-35s %-10s %-10s %-10s %10s %-s\n",$1,$2/1024/1024,$3/1024/1024,$4/1024/1024,$5,$6}'

At the moment, using the command above, the headings for fields 2, 3 and 4 are printed as zeroes. I cannot run df -h as I am on a very old Solaris server :(

Any advice much appreciated. Thanks in advance.

Regarding Point 11,

awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'

It works the same for me as

awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s }'

But if I understand your explanation...I thought I should get only ORS.

-------------------
Also notice how it calls "print s+0" and not just "print s". It is necessary if there are no fields. If there are no fields, "s" never comes into existence and is undefined. Printing an undefined value does not print anything (i.e. prints just the ORS). Adding a 0 does a mathematical operation and undef+0 = 0, so it prints "0".
---------------------

Dear,
I have 2 files with "|" delimiter. I try to join the 2 files with the command join -t"|" -1 1 -2 1 1.txt 2.txt, but the output file gives me only rows (1-9), (90-99) and (990-999). I want to know why.

good blog

This is the best blog on AWK that I ever read. Good job!!

I would like to add another good resource for AWK programming: AWK command. It includes many useful examples.
