Perl One-Liners Explained, Part V: Text conversion and substitution

This is the fifth part of a nine-part article on Perl one-liners. In this part I will create various one-liners for text conversion and substitution. See part one for introduction of the series.

Perl one-liners is my attempt to create "perl1line.txt" that is similar to "awk1line.txt" and "sed1line.txt" that have been so popular among Awk and Sed programmers.

The article on Perl one-liners will consist of nine parts:

Part I: File spacing.
Part II: Line numbering.
Part III: Calculations.
Part IV: String creation and array creation.
Part V: Text conversion and substitution (this part).
Part VI: Selective printing and deleting of certain lines.
Part VII: Handy regular expressions.
Part VIII: Release of perl1line.txt.
Part IX: Release of Perl One-Liners e-book.

After I'm done with explaining the one-liners, I'll release an ebook. Subscribe to my blog to know when that happens!

Awesome news: I have written an e-book based on this article series. Check it out:

Perl book

Alright then, here are today's one-liners:

Text conversion and substitution

62. ROT13 a string.

'y/A-Za-z/N-ZA-Mn-za-m/'

This one-liner uses the y operator (also known as tr operator) to do ROT13. Operators y and tr do string transliteration. Given y/SEARCH/REPLACE/, the operator transliterates all occurrences of the characters found in SEARCH list with the corresponding (position-wise) characters in REPLACE list.

In this one-liner A-Za-z creates the following list of characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

And N-ZA-Mn-za-m creates this list:

NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm

If you look closely you'll notice that the second list is actually the first list offset by 13 characters. Now the y operator translates each character in the first list to a character in the second list, thus performing the ROT13 operation.

If you want to ROT13 the whole file then do this:

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file

The -p argument puts each of file's line in the $ variable, the y does ROT13, and -p prints the $ out. The -l appends a newline to the output.

Note: remember that applying ROT13 twice produces the same string, i.e., ROT13(ROT13(string)) == string.

63. Base64 encode a string.

perl -MMIME::Base64 -e 'print encode_base64("string")'

This one-liner uses the MIME::Base64 module that is in the core (no need to install it, it comes with Perl). This module exports the encode_base64 function that takes a string and returns base64 encoded version of it.

To base64 encode the whole file do the following:

perl -MMIME::Base64 -0777 -ne 'print encode_base64($_)' file

Here the -0777 argument together with -n causes Perl to slurp the whole file into the $_ variable. Then the file gets base64 encoded and printed out, just like the string example above.

If we didn't slurp the file and encoded it line-by-line we'd get a mess.

64. Base64 decode a string.

perl -MMIME::Base64 -le 'print decode_base64("base64string")'

The MIME::Base64 module also exports decode_base64 function that takes a base64-encoded string and decodes it.

The whole file can be similarly decoded by:

perl -MMIME::Base64 -ne 'print decode_base64($_)' file

There is no need to slurp the whole file into $_ because each line of a base64 encoded file is exactly 76 characters and decodes nicely.

65. URL-escape a string.

perl -MURI::Escape -le 'print uri_escape($string)'

You'll need to install the URI::Escape module as it doesn't come with Perl. The module exports two functions - uri_escape and uri_unescape. The first one does URL-escaping (sometimes also referred to as URL encoding), and the other does URL-unescaping (URL decoding).

66. URL-unescape a string.

perl -MURI::Escape -le 'print uri_unescape($string)'

This one-liner uses the uri_unescape function from URI::Escape module to do URL-unescaping.

67. HTML-encode a string.

perl -MHTML::Entities -le 'print encode_entities($string)'

This one-liner uses the encode_entities function from HTML::Entities module. This function encodes HTML entities. For example, < and > get turned into < and >.

68. HTML-decode a string.

perl -MHTML::Entities -le 'print decode_entities($string)'

This one-liner uses the decode_entities function from HTML::Entities module.

69. Convert all text to uppercase.

perl -nle 'print uc'

This one-liner uses the uc function, which by default operates on the $_ variable and returns an uppercase version of it.

Another way to do the same is to use -p command line option that enables automatic printing of $_ variable and modify it in-place:

perl -ple '$_=uc'

The same can also be also achieved by applying the \U escape sequence to string interpolation:

perl -nle 'print "\U$_"'

It causes anything after it (or until the first occurrence of \E) to be upper-cased.

70. Convert all text to lowercase.

perl -nle 'print lc'

This one-liner is very similar to the previous. Here the lc function is used that converts the contents of $_ to lowercase.

Or, using escape sequence \L and string interpolation:

perl -nle 'print "\L$_"'

Here \L causes everything after it (until the first occurrence of \E) to be lower-cased.

71. Uppercase only the first word of each line.

perl -nle 'print ucfirst lc'

The one-liner first applies the lc function to the input that makes it lower case and then uses the ucfirst function that upper-cases only the first character.

It can also be done via escape codes and string interpolation:

perl -nle 'print "\u\L$_"'

First the \L lower-cases the whole line, then \u upper-cases the first character.

72. Invert the letter case.

perl -ple 'y/A-Za-z/a-zA-Z/'

This one-liner does transliterates capital letters A-Z to lowercase letters a-z, and lowercase letters to uppercase letters, thus switching the case.

73. Camel case each line.

perl -ple 's/(\w+)/\u$1/g'

This is a lousy Camel Casing one-liner. It takes each word and upper-cases the first letter of it. It fails on possessive forms like "friend's car". It turns them into "Friend'S Car".

An improvement is:

s/(?&lt;!['])(\w+)/\u\1/g

Which checks if the character before the word is not single quote '. But I am sure it still fails on some more exotic examples.

74. Strip leading whitespace (spaces, tabs) from the beginning of each line.

perl -ple 's/^[ \t]+//'

This one-liner deletes all whitespace from the beginning of each line. It uses the substitution operator s. Given s/REGEX/REPLACE/ it replaces the matched REGEX by the REPLACE string. In this case the REGEX is ^[ \t]+, which means "match one or more space or tab at the beginning of the string" and REPLACE is nothing, meaning, replace the matched part with empty string.

The regex class [ \t] can actually be replaced by \s+ that matches any whitespace (including tabs and spaces):

perl -ple 's/^\s+//'

75. Strip trailing whitespace (space, tabs) from the end of each line.

perl -ple 's/[ \t]+$//'

This one-liner deletes all whitespace from the end of each line.

Here the REGEX of the s operator says "match one or more space or tab at the end of the string." The REPLACE part is empty again, which means to erase the matched whitespace.

76. Strip whitespace from the beginning and end of each line.

perl -ple 's/^[ \t]+|[ \t]+$//g'

This one-liner combines the previous two. Notice that it specifies the global /g flag to the s operator. It's necessary because we want it to delete whitespace at the beginning AND end of the string. If we didn't specify it, it would only delete whitespace at the beginning (assuming it exists) and not at the end.

77. Convert UNIX newlines to DOS/Windows newlines.

perl -pe 's|\n|\r\n|'

This one-liner substitutes the Unix newline \n LF with Windows newline \r\n CRLF on each line. Remember that the s operator can use anything for delimiters. In this one-liner it uses vertical pipes to delimit REGEX from REPLACE to improve readibility.

78. Convert DOS/Windows newlines to UNIX newlines.

perl -pe 's|\r\n|\n|'

This one-liner does the opposite of the previous one. It takes Windows newlines CRLF and converts them to Unix newlines LF.

79. Convert UNIX newlines to Mac newlines.

perl -pe 's|\n|\r|'

Apple Macintoshes used to use \r CR as newlines. This one-liner converts UNIX's \n to Mac's \r.

80. Substitute (find and replace) "foo" with "bar" on each line.

perl -pe 's/foo/bar/'

This one-liner uses the s/REGEX/REPLACE/ command to substitute "foo" with "bar" on each line.

To replace all "foos" with "bars", add the global /g flag:

perl -pe 's/foo/bar/g'

81. Substitute (find and replace) "foo" with "bar" on lines that match "baz".

perl -pe '/baz/ && s/foo/bar/'

This one-liner is equivalent to:

while (defined($line = &lt;&gt;)) {
  if ($line =~ /baz/) {
    $line =~ s/foo/bar/
  }
}

It puts each line in variable $line, then checks if line matches "baz", and if it does, it replaces "foo" with "bar" in it.

Perl one-liners explained e-book

I've now written the "Perl One-Liners Explained" e-book based on this article series. I went through all the one-liners, improved explanations, fixed mistakes and typos, added a bunch of new one-liners, added an introduction to Perl one-liners and a new chapter on Perl's special variables. Please take a look:

Perl book

Have Fun!

Have fun with these one-liners for now. The next part is going to be about selective printing and deleting of certain lines.

Can you think of other text conversion and substitution procedures that I did not include here?