This article is part of the article series "Awk One-Liners Explained."

This is the second part of a three-part article on the famous Awk one-liners. This part explains the Awk one-liners for text conversion and substitution. See part one for an introduction to the series.

"What are these famous Awk one-liners?", you might wonder. Well, they are concise and beautiful Awk programs that span no more than 70 characters (less than one terminal line). They were written by Eric Pement and are floating around on the Internet as the 'awk1line.txt' file.

Eric Pement's Awk one-liner collection consists of five sections:

I recommend that you print out my Awk Cheat Sheet before you proceed. This way you will have the language reference in front of you, and you will memorize it better.

Grab a local copy of the Awk one-liners file, awk1line.txt, and let's roll.

3. Text Conversion and Substitution

21. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Unix.

awk '{ sub(/\r$/,""); print }'

This one-liner uses the sub(regex, repl, [string]) function. This function substitutes the first instance of regular expression "regex" in string "string" with the string "repl". If "string" is omitted, variable $0 is used. Variable $0, as I explained in the first part of the article, contains the entire line.

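The one-liner replaces the '\r' (CR) character at the end of the line with nothing, i.e., erases the CR at the end. This works because Awk strips the trailing LF when it reads a line into $0, so a line that ended with CRLF now ends with CR, and that is what /\r$/ matches. The print statement then prints the line and appends the ORS variable, which is '\n' by default. Thus, a line ending with CRLF has been converted to a line ending with LF.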
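As a quick check, the sketch below feeds two CRLF-terminated lines through the one-liner and dumps the resulting bytes with od. The function name crlf_to_lf is made up for this illustration; it is not part of awk1line.txt.

```shell
# Hypothetical wrapper around one-liner 21: strip a trailing CR from each line.
crlf_to_lf() {
    awk '{ sub(/\r$/, ""); print }'
}

# Two CRLF-terminated lines go in; od -c shows the bytes that come out.
printf 'foo\r\nbar\r\n' | crlf_to_lf | od -c
```

The dump should show each word followed by \n only, with no \r left over.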

22. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Unix.

awk '{ sub(/$/,"\r"); print }'

This one-liner also uses the sub() function. This time it replaces the zero-width anchor '$' at the end of the line with a '\r' (CR char). This substitution actually adds a CR character to the end of the line. After doing that Awk prints out the line and appends the ORS, making the line terminate with CRLF.
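To see the added CR for yourself, here is a small sketch that runs the one-liner on two LF-terminated lines and dumps the bytes. The wrapper name lf_to_crlf is my own invention for the example.

```shell
# Hypothetical wrapper around one-liner 22: append a CR to each line.
lf_to_crlf() {
    awk '{ sub(/$/, "\r"); print }'
}

# Two LF-terminated lines go in; od -c should now show \r \n after each word.
printf 'foo\nbar\n' | lf_to_crlf | od -c
```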

23. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Windows/DOS.

awk 1

This one-liner may work, or it may not. It depends on the implementation. If the implementation recognizes the Unix newlines in the file, it will read the file line by line correctly and output the lines terminated with CRLF. If it does not understand the Unix LFs in the file, it will print the whole file and terminate it with CRLF (a single Windows newline at the end of the whole file).

P.S. A pattern of '1' (or anything that evaluates to true) with no action is Awk shorthand for '{ print }'.

24. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Windows/DOS

gawk -v BINMODE="w" '1'

Theoretically this one-liner should convert CRLFs to LFs on DOS. There is a note in GNU Awk documentation that says: "Under DOS, gawk (and many other text programs) silently translates end-of-line "\r\n" to "\n" on input and "\n" to "\r\n" on output. A special "BINMODE" variable allows control over these translations and is interpreted as follows: ... If "BINMODE" is "w", then binary mode is set on write (i.e., no translations on writes)."

My tests revealed that no translation was done, so you can't rely on this BINMODE hack.

Eric suggests using the "tr" utility instead to convert CRLFs to LFs on Windows:

tr -d \r

The 'tr' program translates one set of characters to another. The -d option makes it delete the listed characters instead of translating them. In this case it's the '\r' (CR) character that gets erased from the input. Thus, CRLFs become just LFs.
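A note for when you run tr from a Unix shell rather than from the Windows command prompt: there the backslash has to be quoted, because the shell turns an unquoted \r into a plain r before tr ever sees it. A small sketch with a made-up sample line:

```shell
# Quoted: tr receives the two characters \r and deletes carriage returns.
printf 'foo bar\r\n' | tr -d '\r' | od -c

# Unquoted: the shell drops the backslash, so tr deletes the letter "r" instead.
printf 'foo bar\r\n' | tr -d \r | od -c
```

In the second dump the word "bar" has lost its "r" and the CR is still there.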

25. Delete leading whitespace (spaces and tabs) from the beginning of each line (ltrim).

awk '{ sub(/^[ \t]+/, ""); print }'

This one-liner also uses the sub() function. It replaces whatever matches the regular expression "^[ \t]+" with nothing "". The regular expression "^[ \t]+" means: match one or more spaces " " or tabs "\t" at the beginning "^" of the string.

26. Delete trailing whitespace (spaces and tabs) from the end of each line (rtrim).

awk '{ sub(/[ \t]+$/, ""); print }'

This one-liner is very similar to the previous one. It replaces whatever matches the regular expression "[ \t]+$" with nothing. The regular expression "[ \t]+$" means: match one or more spaces " " or tabs "\t" at the end "$" of the string. The "+" means "one or more".

27. Delete both leading and trailing whitespace from each line (trim).

awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'

This one-liner uses a new function called "gsub". Gsub() does the same as sub(), except it performs as many substitutions as possible (that is, it's a global sub()). For example, given a variable f = "foo", sub("o", "x", f) would replace just the first "o" in variable f with "x", making f be "fxo"; but gsub("o", "x", f) would replace both "o"s in "foo", resulting in "fxx".

The one-liner combines both previous one-liners - it replaces leading whitespace "^[ \t]+" and trailing whitespace "[ \t]+$" with nothing, thus trimming the string.
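Here is a short sketch that tries both things out: the sub() versus gsub() difference on the string "foo", and the trim one-liner on a made-up line with spaces and tabs on both sides. The helper name trim is mine, and the sed call only adds brackets so the line boundaries are visible.

```shell
# sub() changes only the first "o", gsub() changes both.
awk 'BEGIN { f = "foo"; sub("o", "x", f); g = "foo"; gsub("o", "x", g); print f, g }'
# prints: fxo fxx

# Hypothetical helper: trim leading and trailing spaces/tabs (one-liner 27).
trim() {
    awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}

# Brackets make the line boundaries visible in the output.
printf '  \tfoo bar \t \n' | trim | sed 's/.*/[&]/'
# prints: [foo bar]
```

Note that the space between "foo" and "bar" survives, because neither branch of the regular expression can match in the middle of the line.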

To squeeze runs of whitespace between fields down to a single space, you may use this one-liner:

awk '{ $1=$1; print }'

This is a pretty tricky one-liner. It seems to do nothing, right? Assign $1 to $1. But no, when you change a field, Awk rebuilds the $0 variable. It takes all the fields and concatenates them, separated by OFS (a single space by default). Every run of whitespace between fields collapses to one space, and leading and trailing whitespace disappears.
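A minimal sketch of the effect, using a made-up line with a mix of spaces and tabs. The helper name squeeze_fields is hypothetical.

```shell
# Hypothetical helper: force awk to rebuild $0 from its fields.
squeeze_fields() {
    awk '{ $1 = $1; print }'
}

printf '  foo \t  bar    baz  \n' | squeeze_fields
# prints: foo bar baz
```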

28. Insert 5 blank spaces at beginning of each line.

awk '{ sub(/^/, "     "); print }'

This one-liner substitutes the zero-length beginning-of-line anchor "^" with five spaces. As the anchor is zero-length and matches the beginning of the line, the five spaces get prepended to the line.

29. Align all text flush right on a 79-column width.

awk '{ printf "%79s\n", $0 }' 

This one-liner asks printf() to print the string in the $0 variable and left-pad it with spaces until the total length is 79 chars. Lines that are already 79 characters or longer are printed unchanged.

Please see the documentation of printf function for more information and examples.

30. Center all text on a 79-character width.

awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }'

First this one-liner calculates the length() of the line and puts the result in variable "l". Length(var) function returns the string length of var. If the variable is not specified, it returns the length of the entire line (variable $0). Next it calculates how many white space characters to pad the line with and stores the result in variable "s". Finally it printf()s the line with appropriate number of whitespace chars.

For example, when printing the string "foo", it first calculates the length of "foo", which is 3. Next it calculates the number of spaces to put in front of "foo", which is (79-3)/2 = 38. Finally it calls printf("%41s\n", "foo"). The printf() function outputs 38 spaces and then "foo", which centers the string (38*2 + 3 = 79).
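The same arithmetic is easier to eyeball on a narrow width. The sketch below wraps the one-liner in a made-up helper called center that takes the width as an argument (the original hard-codes 79), and uses sed to add brackets so the padding is visible.

```shell
# Hypothetical helper: center each line within a field of the given width.
center() {
    awk -v width="$1" '{
        l = length()
        s = int((width - l) / 2)      # spaces needed on the left
        printf "%" (s + l) "s\n", $0  # right-align within s + l columns
    }'
}

# With a width of 9, "foo" gets int((9-3)/2) = 3 spaces of left padding.
printf 'foo\n' | center 9 | sed 's/.*/[&]/'
# prints: [   foo]
```

Nothing is printed to the right of the text, so the line is centered without trailing spaces.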

31. Substitute (find and replace) "foo" with "bar" on each line.

awk '{ sub(/foo/,"bar"); print }'

This one-liner is very similar to the others we have seen before. It uses the sub() function to replace "foo" with "bar". Please note that it replaces just the first match. To replace all "foo"s with "bar"s use the gsub() function:

awk '{ gsub(/foo/,"bar"); print }'

To replace only one particular match, use the gensub() function:

gawk '{ $0 = gensub(/foo/,"bar",4); print }'

This one-liner replaces only the 4th match of "foo" with "bar". It uses the never-before-seen gensub() function. The prototype of this function is gensub(regex, s, h[, t]). It searches the string "t" for "regex" and replaces the "h"-th match with "s". If "t" is not given, $0 is assumed. Unlike sub() and gsub(), which modify the string in place, gensub() leaves "t" untouched and returns the modified copy.

Gensub() is a non-standard function and requires GNU Awk or Awk included in NetBSD.

In this one-liner regex = "/foo/", s = "bar", h = 4, and t = $0. It replaces the 4th instance of "foo" with "bar" and assigns the new string back to the whole line $0.
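If your Awk has no gensub(), the "replace only the n-th match" idea can be approximated with match() and substr(). This is only a rough sketch under some assumptions of mine: the helper name replace_nth is made up, the regex is passed as a string, every match is non-empty and not anchored with ^, and the replacement is taken literally (no "&" handling as in gensub()).

```shell
# Hypothetical helper: replace only the n-th match of a regex on each line,
# using POSIX awk functions instead of gawk's gensub().
replace_nth() {
    awk -v re="$1" -v repl="$2" -v n="$3" '{
        rest = $0; out = ""; count = 0
        # Walk through the line match by match.
        while (match(rest, re)) {
            count++
            if (count == n) {
                # Splice the replacement in place of the n-th match and stop.
                out = out substr(rest, 1, RSTART - 1) repl
                rest = substr(rest, RSTART + RLENGTH)
                break
            }
            # Not the n-th match yet: copy it through unchanged.
            out = out substr(rest, 1, RSTART + RLENGTH - 1)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf 'foo foo foo foo foo\n' | replace_nth foo bar 4
# prints: foo foo foo bar foo
```

Lines with fewer than n matches come out unchanged.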

32. Substitute "foo" with "bar" only on lines that contain "baz".

awk '/baz/ { gsub(/foo/, "bar") }; { print }'

As I explained in the first one-liner in the first part of the article, every Awk program consists of a sequence of pattern-action statements "pattern { action statements }". Action statements are applied only to lines that match pattern.

In this one-liner the pattern is the regular expression /baz/. If the line contains "baz", the action statement gsub(/foo/, "bar") is executed. And as we have learned, it substitutes all instances of "foo" with "bar". If you want to substitute just the first one, use the sub() function!

33. Substitute "foo" with "bar" only on lines that do not contain "baz".

awk '!/baz/ { gsub(/foo/, "bar") }; { print }'

This one-liner negates the pattern /baz/. It works exactly the same way as the previous one, except it operates on lines that do not match this pattern.
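To compare one-liners 32 and 33 side by side, here is a small sketch that runs both on the same three made-up lines, of which only the second contains "baz".

```shell
# Made-up sample input: only the second line contains "baz".
sample='foo one
foo baz
foo two'

# One-liner 32: substitute only on lines that match /baz/.
printf '%s\n' "$sample" | awk '/baz/ { gsub(/foo/, "bar") }; { print }'

# One-liner 33: substitute only on lines that do not match /baz/.
printf '%s\n' "$sample" | awk '!/baz/ { gsub(/foo/, "bar") }; { print }'
```

The first command changes only the middle line to "bar baz"; the second changes the other two lines and leaves "foo baz" alone.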

34. Change "scarlet" or "ruby" or "puce" to "red".

awk '{ gsub(/scarlet|ruby|puce/, "red"); print}'

This one-liner makes use of the extended regular expression alternation operator | (pipe). The regular expression /scarlet|ruby|puce/ says: match "scarlet" or "ruby" or "puce". If the line matches, gsub() replaces all the matches with "red".

35. Reverse order of lines (emulate "tac").

awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }'

This is the trickiest one-liner today. It starts by recording all the lines in the array "a". For example, if the input to this program was three lines "foo", "bar", and "baz", then the array "a" would contain the following values: a[0] = "foo", a[1] = "bar", and a[2] = "baz".

When the program has finished processing all lines, Awk executes the END { } block. The END block loops over the elements in the array "a" and prints the recorded lines. In our example with "foo", "bar", "baz" the END block does the following:

for (j = 2; j >= 0; ) print a[j--]

First it prints out a[2], then a[1] and then a[0]. The output is three separate lines "baz", "bar" and "foo". As you can see, the input was reversed.
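The "foo", "bar", "baz" example can be run directly. The wrapper name reverse_lines is made up for this sketch.

```shell
# Hypothetical helper around one-liner 35: print the input lines last to first.
reverse_lines() {
    awk '{ a[i++] = $0 } END { for (j = i - 1; j >= 0;) print a[j--] }'
}

printf 'foo\nbar\nbaz\n' | reverse_lines
# prints baz, bar and foo, one per line
```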

36. Join a line ending with a backslash with the next line.

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

This one-liner uses the regular expression "/\\$/" to look for lines ending with a backslash. If the line ends with a backslash, the backslash gets removed by the sub(/\\$/,"") function. Then the "getline t" statement is executed. "Getline t" reads the next line from the input and stores it in variable t. The "print $0 t" statement prints the original line (but with the trailing backslash removed) followed by the newly read line (which was stored in variable t). The "next" statement then makes Awk continue with the next line. If the line does not end with a backslash, Awk just prints it out with "1".

Unfortunately this one-liner fails to join more than 2 lines (coming up with a one-liner that joins an arbitrary number of lines ending with a backslash is left as an exercise to the reader :)).
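One possible answer to that exercise, as a sketch rather than the definitive solution: keep stripping the backslash and appending the next line for as long as sub() reports a substitution. The helper name join_continued is made up, and the check on getline's return value is my own addition so that a backslash on the very last line cannot cause an endless loop.

```shell
# Hypothetical helper: join any number of backslash-continued lines.
join_continued() {
    awk '{
        # sub() returns 1 while the line still ends with a backslash.
        while (sub(/\\$/, "")) {
            # Stop at end of input instead of looping forever.
            if ((getline t) <= 0) break
            $0 = $0 t
        }
        print
    }'
}

# Three continued lines followed by an ordinary one.
printf 'one \\\ntwo \\\nthree\nfour\n' | join_continued
# prints "one two three" and then "four"
```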

37. Print and sort the login names of all users.

awk -F ":" '{ print $1 | "sort" }' /etc/passwd

This is the first time we see the -F argument passed to Awk. This argument specifies a character, a string or a regular expression that will be used to split the line into fields ($1, $2, ...). For example, if the line is "foo-bar-baz" and -F is "-", then the line will be split into three fields: $1 = "foo", $2 = "bar" and $3 = "baz". If -F is not given, Awk splits on whitespace, so this line would contain just one field $1 = "foo-bar-baz".

Specifying -F is the same as setting the FS (Field Separator) variable in the BEGIN block of Awk program:

awk -F ":"
# is the same as
awk 'BEGIN { FS=":" }'

/etc/passwd is a text file that contains a list of the system's accounts, along with some useful information like login name, user ID, group ID, home directory, shell, etc. The fields of each entry are separated by a colon ":".

Here is an example of a line from /etc/passwd file:

pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash

If we split this line on ":", the first field is the username (pkrumins in this example). The one-liner does just that - it splits the line on ":", then forks the "sort" program and feeds it all the usernames, one by one. After Awk has finished processing the input, sort program sorts the usernames and outputs them.
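If you would rather not touch the real /etc/passwd while experimenting, the sketch below uses two passwd-style records (the second one is invented) and shows that -F ":" and BEGIN { FS=":" } pick out the same field.

```shell
# Made-up passwd-style records; the real file is /etc/passwd.
sample='pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash
alice:x:1001:100:Alice Example:/home/alice:/bin/sh'

# -F ":" and BEGIN { FS = ":" } should select the same first field.
printf '%s\n' "$sample" | awk -F ":" '{ print $1 | "sort" }'
printf '%s\n' "$sample" | awk 'BEGIN { FS = ":" } { print $1 | "sort" }'
```

Both commands print alice and then pkrumins, because sort only emits its output once Awk has finished and closed the pipe.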

38. Print the first two fields in reverse order on each line.

awk '{ print $2, $1 }' file

This one-liner is obvious. It reverses the order of fields $1 and $2. For example, if the input line is "foo bar", then after running this program the output will be "bar foo".

39. Swap first field with second on every line.

awk '{ temp = $1; $1 = $2; $2 = temp; print }'

This one-liner uses a temporary variable called "temp". It assigns the first field $1 to "temp", then it assigns the second field to the first field and finally it assigns "temp" to $2. This procedure swaps the first two fields on every line. For example, if the input is "foo bar baz", then the output will be "bar foo baz".

Ps. This one-liner was incorrect in Eric's awk1line.txt file. "Print" was missing.

40. Delete the second field on each line.

awk '{ $2 = ""; print }'

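This one-liner just assigns the empty string to the second field. The field's contents are gone, but the separators on both sides of it remain, so "foo bar baz" becomes "foo  baz" with two spaces in the middle.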
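The sketch below shows what emptying the field leaves behind, together with one possible variant that skips the second field entirely. The helper name drop_field2 is made up, and the brackets added by sed are only there to make the spacing visible.

```shell
# One-liner 40: the second field becomes empty, but both separators remain.
printf 'foo bar baz\n' | awk '{ $2 = ""; print }' | sed 's/.*/[&]/'
# prints: [foo  baz]

# Hypothetical variant: rebuild the line from every field except the second.
drop_field2() {
    awk '{ out = $1; for (i = 3; i <= NF; i++) out = out OFS $i; print out }'
}

printf 'foo bar baz\n' | drop_field2 | sed 's/.*/[&]/'
# prints: [foo baz]
```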

41. Print the fields in reverse order on every line.

awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }'

We saw the "NF" variable, which stands for Number of Fields, in part one of this article. After reading each line, Awk sets the NF variable to the number of fields found on that line.

This one-liner loops in reverse order starting from NF to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), ..., $1. After that it prints a newline character.
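Because the format string is "%s ", the one-liner leaves a space after the last field it prints. If that matters, one way around it (a sketch, with a made-up helper name reverse_fields) is to build the output string first and add a separator only between fields.

```shell
# Hypothetical helper: print fields in reverse order without a trailing space.
reverse_fields() {
    awk '{
        out = ""
        for (i = NF; i > 0; i--) {
            sep = (i > 1 ? OFS : "")   # no separator after the last field printed
            out = out $i sep
        }
        print out
    }'
}

printf 'foo bar baz\n' | reverse_fields | sed 's/.*/[&]/'
# prints: [baz bar foo]
```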

42. Remove duplicate, consecutive lines (emulate "uniq")

awk 'a !~ $0; { a = $0 }'

Variables in Awk don't need to be initialized or declared before they are used. They come into existence the first time they are used. This one-liner uses variable "a" to keep the last line seen "{ a = $0 }". Upon reading the next line, it checks whether the previous line (in variable "a") differs from the current one "a !~ $0". If it does, the expression evaluates to 1 (true), and as I explained earlier, any true expression is the same as "{ print }", so the line gets printed out. Then the program saves the current line in variable "a" again and the same process continues over and over again.

This one-liner is actually incorrect. It uses the regular expression matching operator "!~", which treats the current line as a pattern. If the previous line was something like "fooz" and the new one is "foo", then it won't get output, because "fooz" matches the regular expression /foo/, even though they are not duplicate lines.

Here is the correct, fixed, one-liner:

awk 'a != $0; { a = $0 }'

It compares the lines as strings and not as a regular expression.
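The "fooz" versus "foo" case is easy to reproduce. In this sketch both versions get the same four made-up lines, where only the third line is a real duplicate.

```shell
# "foo" after "fooz" is not a duplicate, yet the !~ version drops it,
# because "fooz" matches the regular expression /foo/.
printf 'fooz\nfoo\nfoo\nbar\n' | awk 'a !~ $0; { a = $0 }'

# The != version compares the lines as strings and keeps it.
printf 'fooz\nfoo\nfoo\nbar\n' | awk 'a != $0; { a = $0 }'
```

The first command prints only "fooz" and "bar"; the second prints "fooz", "foo" and "bar".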

43. Remove duplicate, nonconsecutive lines.

awk '!a[$0]++'

This one-liner is very idiomatic. It registers the lines seen in the associative-array "a" (arrays are always associative in Awk) and at the same time tests if it had seen the line before. If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to "{ print }".

For example, suppose the input is:

foo
bar
foo
baz

When Awk sees the first "foo", it evaluates the expression "!a["foo"]++". "a["foo"]" is false, but "!a["foo"]" is true - Awk prints out "foo". Then it increments "a["foo"]" by one with "++" post-increment operator. Array "a" now contains one value "a["foo"] == 1".

Next Awk sees "bar". It does exactly the same as it did for "foo" and prints out "bar". Array "a" now contains two values "a["foo"] == 1" and "a["bar"] == 1".

Now Awk sees the second "foo". This time "a["foo"]" is true, "!a["foo"]" is false and Awk does not print anything! Array "a" still contains two values "a["foo"] == 2" and "a["bar"] == 1".

Finally Awk sees "baz" and prints it out because "!a["baz"]" is true. Array "a" now contains three values "a["foo"] == 2" and "a["bar"] == 1" and "a["baz"] == 1".

The output:

foo
bar
baz

Here is another one-liner to do the same. Eric in his one-liners says it's the most efficient way to do it.

awk '!($0 in a) { a[$0]; print }'

It's basically the same as the previous one, except that it uses the 'in' operator. Given an array "a", the expression "foo in a" tests whether "a" has an element whose index is the value of variable "foo".

Note that merely referencing "a[$0]" as a statement creates that element in the array.
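Both variants can be checked against the "foo", "bar", "foo", "baz" input from the walkthrough:

```shell
# Counting variant: print a line only the first time its counter is zero.
printf 'foo\nbar\nfoo\nbaz\n' | awk '!a[$0]++'

# Membership variant: print a line only if it is not yet an index of "a".
printf 'foo\nbar\nfoo\nbaz\n' | awk '!($0 in a) { a[$0]; print }'
```

Each command prints foo, bar and baz, one per line, in the order of first appearance.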

44. Concatenate every 5 lines of input with a comma.

awk 'ORS=NR%5?",":"\n"'

We saw the ORS variable in part one of the article. This variable gets appended after every line that gets output. In this one-liner it gets changed on every 5th line from a comma to a newline. For lines 1, 2, 3, 4 it's a comma, for line 5 it's a newline, for lines 6, 7, 8, 9 it's a comma, for line 10 a newline, etc. The assignment doubles as the pattern: its value is either "," or "\n", both non-empty strings and therefore true, so every line gets printed.
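Here is a sketch with seven made-up input lines. I have spelled the assignment out in an action with explicit parentheses, followed by the "1" pattern to print; this is an equivalent form that I am assuming is parsed the same way by more Awk implementations, not Eric's original spelling.

```shell
# Seven input lines, 1 through 7; ORS is a comma except after every 5th line.
printf '%s\n' 1 2 3 4 5 6 7 | awk '{ ORS = (NR % 5 ? "," : "\n") } 1'
```

The output is "1,2,3,4,5" on the first line and "6,7," on the second. When the number of input lines is not a multiple of 5, the last line ends with a comma and no newline.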

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters: an introduction to awk one-liners, a summary of awk special variables, and idiomatic awk.

Have Fun!

Have fun with these one-liners. I hope you learned something new.

If you liked this article, you may also like a very similar article on Famous Sed One-Liners Explained.

Ps. If you notice anything that you can't understand, please let me know in the comments. Thanks!

Comments

Jimmy Dean
December 13, 2008, 16:22

Nice, goodone dude, real good.

Roman
December 14, 2008, 14:52

Regarding 'printf "\n"' — it's still correct to use it in Windows. The LF to CRLF conversion happens in the C runtime, outside awk.

December 14, 2008, 14:59

Oooh! You are absolutely correct, Roman! I did not think about C runtime... Thanks! :)

Fixed in article.

Gunwant
December 17, 2008, 14:09

thanks a lot.

this makes lot of things clear

February 09, 2009, 12:10

Hi,
I didnt understand #43 where u say "If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to “{ print }”. "
Should it be the opposite? Can u explain it in other words?

One more thing in #41 if we have multiple tabs then they will all go vanished. right? we will only get the fields(words)

-yogesh

Peter Passchier
February 22, 2009, 11:23

One-liner 42 is not correct if the first line of a file is empty, because it will match the initial value of a. I came up with:
awk 'BEGIN {srand();a=rand()} a!=$0 {a=$0;print}'

(I didn't understand what the semicolon after a!=$0 does or means, so I removed it, and added a print action whenever the line isn't equal to the previous.)

Paul
July 29, 2009, 16:07

Ref #36. Join a line ending with a backslash with the next line. This joins an arbitrary number of such lines, but breaks (Solaris nawk) at 6144 chars in any merged line.

This works because sub() returns the number of changes it made. No matches is not an error.

awk '{ while (sub (/\\$/, "")) { getline t; $0 = $0 t; } print; }'

Paul
July 29, 2009, 16:25

Reply to Peter Passchier:
February 22nd, 2009 at 11:23 am
Peter came up with:
awk 'BEGIN {srand();a=rand()} a!=$0 {a=$0;print}'

No need to use a random value (insecure too - Sun awk only uses 32768 distinct random values). Best initialiser for a variable is "\n", because that can never be in your input but you can stuff it in a string.

Alternative tool to deal with initial conditions is to factor in a test for first line: NR == 1

(I didn’t understand what the semicolon after a!=$0 does or means, so I removed it, and added a print action whenever the line isn’t equal to the previous.)

The ; is a trick. It makes the one-liner into two lines (!). Before the ";" is a match without an action, so it prints if true. After the ";" is an action without a match, so it always stores the value.

banhao
January 08, 2010, 07:25

I used the command:

awk '{sub(/piobe:/,":piobe:")};{print}' /etc/inittab

but the command will not change the /etc/inittab,it only display the result on the screen.
how to use the awk to change the file directory?
thanks!

January 08, 2010, 16:16

banhao, you can't change the file directly. Do something like

$ awk ... > tmp_file
$ mv tmp_file /etc/inittab

Steve Krampach
January 26, 2010, 19:29

I need to substitute
untouched_data Data Any Any untouched_data
with
untouched_data Data New New untouched_data

How could that be done?

January 27, 2010, 18:41

Steve, use the gsub function:

awk '{ gsub("Any", "New"); print }'

Steve Krampach
January 28, 2010, 16:20

I need to substitute
untouched_data untouched_data Data Any Any untouched_data
with
untouched_data Data New New untouched_data

How could that be done?

Let me clarify;
Data will be the item searched for and may appear anywhere in the line
followed by two unknown variables and any amount of following data.

I need to search for "Data" and replace the to next unknown items ("Any")
with two pieces of data specific to "Data"...

ie. Keying off the name "John" for example;

"This is a speech by John M Doe in April"
(Data Any Any = John M Doe)
replacing it with:
"This is a speech by John Quincy Adams in April" (Data New New = John Quincy Adams)

Thank you in advance!

Gaurav
May 26, 2012, 09:00

hi Steve,

it can also be done in this way

sed -e 's/untouched_data //2' -e 's/Any/New/g'

January 31, 2010, 22:35

Hi Steve Krampach,

Can you use sed?

With sed it's really easy:

sed -e 's/\(This is a speech by \).\+\( in April\)/\1John Quincy Adams\2/'

Input:

This is a speech by John M Doe in April

Output:

This is a speech by John Quincy Adams in April

srinivas dakuri
June 03, 2010, 11:50

hi
the file contains duplicate consecute lines except first column.how i can delete those duplicate lines.

raj
November 19, 2010, 22:41

I have a file with contents like:

abcdefg
hijklmn

and I need to have each of the lines repeated so the end result from the above example will be:

abcdefg abcdefg
hijklmn hijklmn

Any help please?

Thanks,

March 25, 2011, 18:14

hi raj,

echo "srinu" | sed 's/.*/& &/g'

output: srinu srinu

Sam Lachterman
May 08, 2011, 18:15

awk '{ print $0 " " $0 }' myfile

Sam Lachterman
May 08, 2011, 18:18

Regarding the CRLF example, I am confused about the pattern \r$ matching. If the text reads CR LF, then would not the LF between the CR and the end of the line prevent this pattern from matching?

May 09, 2011, 06:33

It reads in the line without the trailing LF. I was just preparing my Awk One-Liners e-book and noticed that I had explained it incorrectly before. When a line gets read in $0, LF gets stripped, so if it ended with CRLF, then now it ends with CR and \r$ matches that CR.

Kundan Chaudhari
October 07, 2011, 21:12

I have pattern similar to below in a file
P1
10,9:12/971552020883
,10:1
,11:1

11,9:15/424030013064669
,10:60

12,18:14/35491004546837
,19:7

15,9:12/971550930015
,10:1
,11:1
,14:1001

I want to have convert it as below

P1
10,9:12/971552020883
10,10:1
10,11:1

11,9:15/424030013064669
11,10:60

12,18:14/35491004546837
12,19:7

15,9:12/971550930015
15,10:1
15,11:1
15,14:1001

October 09, 2011, 14:17

perl -lne '
 if (m|^(\d+),\d+:\d+/\d+|) { $n = $1; print; }
 elsif (m|^,\d+:\d+|) { print $n . $_; }
 else { print }
'

Arun
November 16, 2011, 08:33

Ref#36, to join lines with backslash arbitrarily -
awk '!/\\$/ {ORS="\n"}; /\\$/ {sub(/\\$/,""); ORS=""}; 1'

In above, first condition sets \n as ORS if no backslash, second condition sets ORS as "" and removes backslash from line and final one "; 1" jus prints the line.

Swarup
September 12, 2012, 11:12

Hi
If there are 20 fields in one line and 30 such lines in a file and i want to modify 3rd field of every line with a value say "HERO" and 4th field of every line with a value"ASDF".
What will be the Awk one liner for this.
Can you please respond as this is urgent.

September 12, 2012, 14:14

Here is how:

awk '{$3="HERO"; $4="ASDF"; print}'

Jotne
March 20, 2014, 12:05

35. TAC
This can be simplified:
awk '{a[i++]=$0} END {while(i--) print a[i]}' file
