Awk Programming February 09, 2009

# Update on Awk One-Liners Explained: String and Array Creation


This is an update post on my three-part article Awk One-Liners Explained.

I received an email from Eric Pement (the original author of Awk one-liners) and he said that there was a new version of the awk1line.txt file available. I did a diff and found that there were seven new one-liners in it!

The new file has two new sections, "String Creation" and "Array Creation", and it updates the "Selective Printing of Certain Lines" section. I'll explain the new one-liners in this article.

Eric Pement's original Awk one-liner collection consists of five sections, and I explained them in my previous three articles.

Awesome news: I have written an e-book based on this article series. Check it out:

Okay, let's roll with the new one-liners:

## String Creation

1. Create a string of a specific length (generate a string of x's of length 513).

`awk 'BEGIN { while (a++<513) s=s "x"; print s }'`

This one-liner uses the "BEGIN { }" special block that gets executed before anything else in an Awk program. In this block a while loop appends character "x" to variable "s" 513 times. After it has looped, the "s" variable gets printed out. As this Awk program does not have a body, it quits after executing the BEGIN block.

This one-liner printed the 513 x's out, but you could have used it for anything you wish in BEGIN, main program or END blocks.

Unfortunately this is not the most efficient way to do it. It's a linear time solution. My friend waldner (who, by the way, wrote a guest post on 10 Awk Tips, Tricks and Pitfalls) showed me a solution that's logarithmic time (based on the idea of recursive squaring):

```
function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str : "")
}
```

This function can be used as follows:

```
awk 'BEGIN { s = rep("x", 513) }'
```
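For readers who want to try this, here is a minimal self-contained sketch that puts waldner's rep() function and the BEGIN block into one program. The shell variable `s` and the length check are only illustration and are not part of the one-liner file.

```shell
# Build a 513-character string with the recursive rep() function
# and capture it in a shell variable (the variable name is arbitrary).
s=$(awk '
function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str : "")
}
BEGIN { print rep("x", 513) }')

# Print the length of the generated string.
echo "${#s}"
```

The extra spaces in the parameter list are the usual Awk convention for marking "remain" and "result" as local variables.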

2. Insert a string of specific length at a certain character position (insert 49 x's after 6th char).

`gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'`

This one-liner works only with GNU Awk, because it uses the interval expression ".{6}" in the Awk program's body. Interval expressions were not traditionally available in awk, which is why you have to use the "--re-interval" option to enable them.

For those that do not know what interval expressions are, they are regular expressions that match a certain number of characters. For example, ".{6}" matches any six characters (the any char is specified by the dot "."). An interval expression "b{2,4}" matches at least two, but not more than four "b" characters. To repeat a whole word, you have to group it with parentheses: "(foo){4}" matches "foo" repeated four times, that is, "foofoofoofoo".

The one-liner starts the same way as the previous one: it creates a 49 character string "s" in the BEGIN block. Next, for each line of the input, it calls the sub() function, which replaces the first 6 characters with themselves and "s" appended. The "&" in the sub() function means the matched part of the regular expression. The '"&" s' means the matched part of the regex followed by the contents of variable "s". The "1" at the end of the whole Awk one-liner prints out the modified line (it's shorthand for just "print", which is itself shorthand for "print $0").

The same can be achieved with normal standard Awk:

`awk 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^....../,"&" s) }; 1'`

Here we just match six chars "......" at the beginning of line, and replace them with themselves + contents of variable "s".
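To see the "&" replacement in action, here is a small sketch of the portable version. It inserts only three x's to keep the output short, and the sample input lines are made up for illustration.

```shell
# Insert three x's after the 6th character of each input line.
# Lines shorter than six characters do not match and pass through unchanged.
out=$(printf 'abcdefghij\nshort\n' |
    awk 'BEGIN { while (a++ < 3) s = s "x" } { sub(/^....../, "&" s) } 1')
printf '%s\n' "$out"
```

The first line becomes "abcdefxxxghij", while "short" is printed as it is because the regex needs six characters to match.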

It may get troublesome to insert a string at the 29th position, for example... You'd have to go tapping "." twenty-nine times ".............................". Better use GNU Awk then and write ".{29}".

Once again, my friend waldner corrected me and pointed me to the Awk Feature Comparison chart. The chart suggests that the original one-liner with ".{6}" would also work with POSIX awk, Busybox awk, and Solaris awk.

## Array Creation

3. Create an array from string.

`split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")`

This is not a one-liner per se but a technique to create an array from a string. The split(Str, Arr, Regex) function is used to do that. It splits string Str into fields by regular expression Regex and puts the fields in array Arr. The fields are placed in Arr[1], Arr[2], ..., Arr[N]. The split() function itself returns the number of fields the string was split into.

In this piece of code the Regex is simply space character " ", the array is month and string is "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec". After the split, month[1] is "Jan", month[2] is "Feb", ..., month[12] is "Dec".
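Here is a minimal sketch that wraps the split() call in a BEGIN block so it can be run on its own. The variable "n" and the choice of which elements to print are just for illustration.

```shell
# Split the month list into the array "month" and show the field count
# together with the first and last elements.
out=$(awk 'BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    print n, month[1], month[12]
}')
printf '%s\n' "$out"
```

A separator of exactly one space is a special case in Awk: it splits on runs of whitespace, the same way default field splitting does.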

4. Create an array named "mdigit", indexed by strings.

`for (i=1; i<=12; i++) mdigit[month[i]] = i`

This is another array creation technique and not a real one-liner. This technique creates a reverse lookup array. Remember from the previous "one-liner" that month[1] was "Jan", ..., month[12] was "Dec". Now we want to do the reverse lookup and find the number for each month. To do that we create a reverse lookup array "mdigit", such that mdigit["Jan"] = 1, ..., mdigit["Dec"] = 12.

It's really trivial: we loop over month[1], month[2], ..., month[12] and set mdigit[month[i]] to i. This way mdigit["Jan"] = 1, etc.
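Both techniques are easier to appreciate inside a complete program. The sketch below assumes input lines of the form "month-name day" (a made-up format for illustration) and replaces the month name with its number.

```shell
# Build month[] and the reverse lookup mdigit[] once in BEGIN,
# then translate the month name in the first field of every line.
out=$(printf 'Mar 14\nDec 25\n' | awk '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    for (i = 1; i <= 12; i++) mdigit[month[i]] = i
}
{ print mdigit[$1], $2 }')
printf '%s\n' "$out"
```

This prints "3 14" and "12 25". A month name that is not in the array would produce an empty first column, so real input may need a check with "in".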

## Selective Printing of Certain Lines

5. Print all lines where 5th field is equal to "abc123".

`awk '$5 == "abc123"'`

This one-liner uses idiomatic Awk: if the given expression is true, Awk prints out the line. The fifth field is referenced by "$5" and it's checked to be equal to "abc123". If it is, the expression is true and the line gets printed.

Unwinding this idiom, this one-liner is really equal to:

```
awk '{ if ($5 == "abc123") { print $0 } }'
```
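A quick way to convince yourself that the two forms are equivalent is to run both on the same input. The three sample lines below are made up for illustration.

```shell
# Sample input: the fifth field is "abc123" on the first and third lines.
data='a b c d abc123 f
a b c d other f
1 2 3 4 abc123'

# Pattern-only idiom versus the unwound if/print form.
short=$(printf '%s\n' "$data" | awk '$5 == "abc123"')
long=$(printf '%s\n' "$data" | awk '{ if ($5 == "abc123") { print $0 } }')
printf '%s\n' "$short"
```

Both commands print the first and third lines.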

6. Print any line where field #5 is not equal to "abc123".

`awk '$5 != "abc123"'`

This is exactly the same as the previous one-liner, except that it negates the comparison. If the fifth field "$5" is not equal to "abc123", the line gets printed.

Unwinding it, it's equal to:

```
awk '{ if ($5 != "abc123") { print $0 } }'
```

Another way is to literally negate the whole previous one-liner:

`awk '!($5 == "abc123")'`

7. Print all lines whose 7th field matches a regular expression.

`awk '$7  ~ /^[a-f]/'`

This is also idiomatic Awk. It uses the "~" operator to test whether the seventh field "$7" matches the regular expression "^[a-f]". This regular expression matches when the field starts with a lower-case letter a, b, c, d, e, or f, so the one-liner prints all lines whose seventh field starts with one of those letters.

`awk '$7 !~ /^[a-f]/'`

This one-liner negates the previous one and prints all lines whose seventh field does not start with a lower-case letter a, b, c, d, e, or f.

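Another way to write almost the same thing is:

`awk '$7 ~ /^[^a-f]/'`

Here we negated the group of letters [a-f] by adding "^" inside the brackets. That's a regex trick to know. The two are not strictly identical, though: "^[^a-f]" needs at least one character to match, so a line whose seventh field is empty or missing is printed by the "!~" version but not by this one.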
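The difference between the two negated forms only shows up on lines with fewer than seven fields. Here is a small sketch with made-up sample data (the variable names are arbitrary):

```shell
# The third line has only six fields, so $7 is the empty string there.
data='1 2 3 4 5 6 apple
1 2 3 4 5 6 zebra
1 2 3 4 5 6'

# Negated operator: true whenever $7 does not start with a-f, even if empty.
neg_op=$(printf '%s\n' "$data" | awk '$7 !~ /^[a-f]/')

# Negated bracket expression: needs a first character that is not a-f.
neg_class=$(printf '%s\n' "$data" | awk '$7 ~ /^[^a-f]/')

printf 'with !~:\n%s\n' "$neg_op"
printf 'with [^a-f]:\n%s\n' "$neg_class"
```

The "!~" form prints the "zebra" line and the six-field line, while the "[^a-f]" form prints only the "zebra" line.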

## Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

## Have Fun!

Have fun with these Awk one-liners!


Another way to make n copies of a string s:

```
function repeat(n, s      , str)
{
    str = sprintf("%*s", n, " "); # make n spaces
    gsub(/ /, s, str); # replace space with s
    return str;
}
```

Another idiom I sometimes use makes a string of "-" to underline another string:

```
ul = str; # copy the string
gsub(/./, "-", ul); # replace each char with "-"
print str; # print the string
print ul; # underline it
```

Some of the Awk examples I posted are here: http://unstableme.blogspot.com/search/label/Awk

You really did a great job! But I still have 2 open questions. I'll explain with a practical example, although there are many other situations where the same questions arise.

Say you have an LDAP directory and you want to add an attribute to all the entries of the directory which do not yet have it set.
First you do an LDIF export of your directory, ending up with blocks of this kind:

```
dn: cn=Robert Smith,dc=bechtle,dc=de
objectClass: inetOrgPerson
cn: Robert Smith
cn: bob  smith
sn: smith
uid: rjsmith
homePhone: 555-111-2222
mail: r.smith@example.com

objectClass: inetOrgPerson
sn: marshall
uid: bmarshall
homePhone: 555-111-2222
mail: b.marshall@example.com
```

Then your problem splits into 3 parts:
1) Find which entries (1 dn: line = 1 entry identifier) already have the attribute set.
2) Extract a list of all entries in the LDIF export except the ones from step (1) (which already have the attribute set).
3) Write a script which uses this entry list to add the missing attribute.

I know how to do part (3). The problems are parts (1) and (2), i.e. how to generate the list of entries to be modified. I have a solution, but it is not really elegant:

```
grep -n dn "export-secure.ldif" > ./tmp0.dat
grep -n vkek "export-secure.ldif" >> ./tmp0.dat
sort -n ./tmp0.dat | cut -d":" -f 2- | grep -B1 vkek | grep dn > ./list-vkek.dat

grep dn "export-secure.ldif" > ./tmp.dat
cp tmp.dat save.dat

for NAME in $(awk '{print $2}'  ./tmp1.dat
mv ./tmp1.dat ./tmp.dat

done
```

I am sure there are better ways.
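One possible approach to parts (1) and (2) is Awk's paragraph mode, which reads each blank-line-separated LDIF entry as a single record. The following is only a sketch under several assumptions: the attribute is called "vkek" (taken from the grep commands above), every entry has its own "dn:" line, and the export has no folded (continuation) lines. The sample data, including the second dn and the "vkek" value, is hypothetical.

```shell
# Hypothetical two-entry export; only the first entry has the attribute.
ldif='dn: cn=Robert Smith,dc=bechtle,dc=de
objectClass: inetOrgPerson
cn: Robert Smith
vkek: 42

dn: cn=Bill Marshall,dc=bechtle,dc=de
objectClass: inetOrgPerson
sn: marshall'

# RS="" makes each blank-line-separated block one record;
# FS="\n" makes each line of the block one field.
missing=$(printf '%s\n' "$ldif" | awk '
BEGIN { RS = ""; FS = "\n" }
{
    dn = ""; has = 0
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^dn:/)   dn = $i      # remember the entry identifier
        if ($i ~ /^vkek:/) has = 1      # attribute already present
    }
    if (dn != "" && !has) print dn      # list entries that still need it
}')
printf '%s\n' "$missing"
```

With this sample data only the second dn is printed. In practice the input would come from the export file instead of a shell variable.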

When timing the logarithmic squaring solution in one-liner 1 against the linear one, I found that it only starts to be faster when you want to print way more than 10000!

Wow, Peter. That is a good finding. I had not done timing tests.

1. Create a string of a specific length (generate a string of x’s of length 513).

You don't want the loop, and you don't want the recursion much either.

Two awk standard functions do this FAST.
First make a blank string of the required length.
Then stuff it with the character(s) you want.

You can call it like x = rep( "x", 513);
Or even like x = rep( "Money ", 20);

```
function rep (str, num, result) {
    result = sprintf ("%" num "s", "");
    gsub (/./, str, result);
    return (result);
}
```
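For anyone who wants to run it, here is the same idea as a complete program. This is a sketch: the two calls in BEGIN are arbitrary examples, and the local variable is declared with the usual extra-space convention.

```shell
# Pad an empty string to num spaces, then replace every character with str.
out=$(awk '
function rep(str, num,     result) {
    result = sprintf("%" num "s", "")   # a string of num spaces
    gsub(/./, str, result)              # each space becomes one copy of str
    return result
}
BEGIN { print rep("x", 5); print rep("ab", 3) }')
printf '%s\n' "$out"
```

This prints "xxxxx" and "ababab". One caveat: gsub() treats "&" in the replacement as "the matched text", so a str containing "&" or a backslash would need escaping first.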

Thanks for your great articles about bash and awk, they helped me a lot.

Another fast possibility to generate a string of x's of length 513 is:

`head -c 513 < /dev/zero | tr '\0' 'x'`