This article is part of the article series "Perl One-Liners Explained."
<- previous article next article ->

Perl One LinersThe ultimate goal of the Perl One-Liners Explained article series was to release the perl1line.txt file. Last week I finished the series and today I am happy to announce perl1line.txt - a collection of handy Perl one-liner scripts.

The perl1line.txt file contains over a hundred short Perl one-line scripts for various text processing tasks. The file processing tasks include: changing file spacing, numbering lines, doing calculations, creating strings and arrays, converting and substituting text, selective printing and deleting of certain lines and text filtering and modifications through regular expressions.

The latest version of perl1line.txt is always at:

  • http://www.catonmat.net/download/perl1line.txt

Here is the full table of contents of perl1line.txt:

  • File Spacing.
  • Line Numbering.
  • Calculations.
  • String Creation and Array Creation.
  • Text Conversion and Substitution.
  • Selective Printing and Deleting of Certain Lines.
  • Handy Regular Expressions.

You can send me bug fixes and updates via GitHub. I put the file in its own perl1line.txt repository. I also accept translations. Send them in!

Awesome news: I have written an e-book based on the one-liners in this file. Check it out:

Here is the whole file of perl1line.txt at version 1.0:

Useful One-Line Scripts for Perl                     Nov 14 2011 | version 1.0
--------------------------------                     -----------   -----------

Compiled by Peteris Krumins (peter@catonmat.net, @pkrumins on Twitter)
http://www.catonmat.net -- good coders code, great reuse

Latest version of this file is always at:

    http://www.catonmat.net/download/perl1line.txt

This file is also available in other languages:
    (None at the moment.)
    Please email me peter@catonmat.net if you wish to translate it.

I am also writing "Perl One-Liners Explained" ebook that's based on
this file. It explains all the one-liners here. Get it soon at:

    http://www.catonmat.net/blog/perl-book/

These one-liners work both on UNIX systems and Windows. Most likely your
UNIX system already has Perl. For Windows get the Strawberry Perl at:

    http://www.strawberryperl.com/

Table of contents:

    1. File Spacing
    2. Line Numbering
    3. Calculations
    4. String Creation and Array Creation
    5. Text Conversion and Substitution
    6. Selective Printing and Deleting of Certain Lines    
    7. Handy Regular Expressions


FILE SPACING 
------------

# Double space a file
perl -pe '$\="\n"'
perl -pe 'BEGIN { $\="\n" }'
perl -pe '$_ .= "\n"'
perl -pe 's/$/\n/'

# Double space a file, except the blank lines
perl -pe '$_ .= "\n" unless /^$/'
perl -pe '$_ .= "\n" if /\S/'

# Triple space a file
perl -pe '$\="\n\n"'
perl -pe '$_.="\n\n"'

# N-space a file
perl -pe '$_.="\n"x7'

# Add a blank line before every line
perl -pe 's//\n/'

# Remove all blank lines
perl -ne 'print unless /^$/'
perl -lne 'print if length'
perl -ne 'print if /\S/'

# Remove all consecutive blank lines, leaving just one
perl -00 -pe ''
perl -00pe0

# Compress/expand all blank lines into N consecutive ones
perl -00 -pe '$_.="\n"x4'


LINE NUMBERING
--------------

# Number all lines in a file
perl -pe '$_ = "$. $_"'

# Number only non-empty lines in a file
perl -pe '$_ = ++$a." $_" if /./'

# Number and print only non-empty lines in a file (drop empty lines)
perl -ne 'print ++$a." $_" if /./'

# Number all lines but print line numbers only non-empty lines
perl -pe '$_ = "$. $_" if /./'

# Number only lines that match a pattern, print others unmodified
perl -pe '$_ = ++$a." $_" if /regex/'

# Number and print only lines that match a pattern
perl -ne 'print ++$a." $_" if /regex/'

# Number all lines, but print line numbers only for lines that match a pattern
perl -pe '$_ = "$. $_" if /regex/'

# Number all lines in a file using a custom format (emulate cat -n)
perl -ne 'printf "%-5d %s", $., $_'

# Print the total number of lines in a file (emulate wc -l)
perl -lne 'END { print $. }'
perl -le 'print $n=()=<>'
perl -le 'print scalar(()=<>)'
perl -le 'print scalar(@foo=<>)'
perl -ne '}{print $.'

# Print the number of non-empty lines in a file
perl -le 'print scalar(grep{/./}<>)'
perl -le 'print ~~grep{/./}<>'
perl -le 'print~~grep/./,<>'

# Print the number of empty lines in a file
perl -lne '$a++ if /^$/; END {print $a+0}'
perl -le 'print scalar(grep{/^$/}<>)'
perl -le 'print ~~grep{/^$/}<>'

# Print the number of lines in a file that match a pattern (emulate grep -c)
perl -lne '$a++ if /regex/; END {print $a+0}'


CALCULATIONS
------------

# Check if a number is a prime
perl -lne '(1x$_) !~ /^1?$|^(11+?)\1+$/ && print "$_ is prime"'

# Print the sum of all the fields on a line
perl -MList::Util=sum -alne 'print sum @F'

# Print the sum of all the fields on all lines
perl -MList::Util=sum -alne 'push @S,@F; END { print sum @S }'
perl -MList::Util=sum -alne '$s += sum @F; END { print $s }'

# Shuffle all fields on a line
perl -MList::Util=shuffle -alne 'print "@{[shuffle @F]}"'
perl -MList::Util=shuffle -alne 'print join " ", shuffle @F'

# Find the minimum element on a line
perl -MList::Util=min -alne 'print min @F'

# Find the minimum element over all the lines
perl -MList::Util=min -alne '@M = (@M, @F); END { print min @M }'
perl -MList::Util=min -alne '$min = min @F; $rmin = $min unless defined $rmin && $min > $rmin; END { print $rmin }'

# Find the maximum element on a line
perl -MList::Util=max -alne 'print max @F'

# Find the maximum element over all the lines
perl -MList::Util=max -alne '@M = (@M, @F); END { print max @M }'

# Replace each field with its absolute value
perl -alne 'print "@{[map { abs } @F]}"'

# Find the total number of fields (words) on each line
perl -alne 'print scalar @F'

# Print the total number of fields (words) on each line followed by the line
perl -alne 'print scalar @F, " $_"'

# Find the total number of fields (words) on all lines
perl -alne '$t += @F; END { print $t}'

# Print the total number of fields that match a pattern
perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
perl -alne '$t += /regex/ for @F; END { print $t }'
perl -alne '$t += grep /regex/, @F; END { print $t }'

# Print the total number of lines that match a pattern
perl -lne '/regex/ && $t++; END { print $t }'

# Print the number PI to n decimal places
perl -Mbignum=bpi -le 'print bpi(n)'

# Print the number PI to 39 decimal places
perl -Mbignum=PI -le 'print PI'

# Print the number E to n decimal places
perl -Mbignum=bexp -le 'print bexp(1,n+1)'

# Print the number E to 39 decimal places
perl -Mbignum=e -le 'print e'

# Print UNIX time (seconds since Jan 1, 1970, 00:00:00 UTC)
perl -le 'print time'

# Print GMT (Greenwich Mean Time) and local computer time
perl -le 'print scalar gmtime'
perl -le 'print scalar localtime'

# Print local computer time in H:M:S format
perl -le 'print join ":", (localtime)[2,1,0]'

# Print yesterday's date
perl -MPOSIX -le '@now = localtime; $now[3] -= 1; print scalar localtime mktime @now'

# Print date 14 months, 9 days and 7 seconds ago
perl -MPOSIX -le '@now = localtime; $now[0] -= 7; $now[4] -= 14; $now[7] -= 9; print scalar localtime mktime @now'

# Calculate factorial of 5
perl -MMath::BigInt -le 'print Math::BigInt->new(5)->bfac()'
perl -le '$f = 1; $f *= $_ for 1..5; print $f'

# Calculate greatest common divisor (GCM)
perl -MMath::BigInt=bgcd -le 'print bgcd(@list_of_numbers)'

# Calculate GCM of numbers 20 and 35 using Euclid's algorithm
perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'

# Calculate least common multiple (LCM) of numbers 35, 20 and 8
perl -MMath::BigInt=blcm -le 'print blcm(35,20,8)'

# Calculate LCM of 20 and 35 using Euclid's formula: n*m/gcd(n,m)
perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'

# Generate 10 random numbers between 5 and 15 (excluding 15)
perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'

# Find and print all permutations of a list
perl -MAlgorithm::Permute -le '$l = [1,2,3,4,5]; $p = Algorithm::Permute->new($l); print @r while @r = $p->next'

# Generate the power set
perl -MList::PowerSet=powerset -le '@l = (1,2,3,4,5); for (@{powerset(@l)}) { print "@$_" }'

# Convert an IP address to unsigned integer
perl -le '$i=3; $u += ($_<<8*$i--) for "127.0.0.1" =~ /(\d+)/g; print $u'
perl -le '$ip="127.0.0.1"; $ip =~ s/(\d+)\.?/sprintf("%02x", $1)/ge; print hex($ip)'
perl -le 'print unpack("N", 127.0.0.1)'
perl -MSocket -le 'print unpack("N", inet_aton("127.0.0.1"))'

# Convert an unsigned integer to an IP address
perl -MSocket -le 'print inet_ntoa(pack("N", 2130706433))'
perl -le '$ip = 2130706433; print join ".", map { (($ip>>8*($_))&0xFF) } reverse 0..3'
perl -le '$ip = 2130706433; $, = "."; print map { (($ip>>8*($_))&0xFF) } reverse 0..3'


STRING CREATION AND ARRAY CREATION
----------------------------------

# Generate and print the alphabet
perl -le 'print a..z'
perl -le 'print ("a".."z")'
perl -le '$, = ","; print ("a".."z")'
perl -le 'print join ",", ("a".."z")'

# Generate and print all the strings from "a" to "zz"
perl -le 'print ("a".."zz")'
perl -le 'print "aa".."zz"'

# Create a hex lookup table
@hex = (0..9, "a".."f")

# Convert a decimal number to hex using @hex lookup table
perl -le '$num = 255; @hex = (0..9, "a".."f"); while ($num) { $s = $hex[($num%16)&15].$s; $num = int $num/16 } print $s'
perl -le '$hex = sprintf("%x", 255); print $hex'
perl -le '$num = "ff"; print hex $num'

# Generate a random 8 character password
perl -le 'print map { ("a".."z")[rand 26] } 1..8'
perl -le 'print map { ("a".."z", 0..9)[rand 36] } 1..8'

# Create a string of specific length
perl -le 'print "a"x50'

# Create a repeated list of elements
perl -le '@list = (1,2)x20; print "@list"'

# Create an array from a string
@months = split ' ', "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
@months = qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/

# Create a string from an array
@stuff = ("hello", 0..9, "world"); $string = join '-', @stuff

# Find the numeric values for characters in the string
perl -le 'print join ", ", map { ord } split //, "hello world"'

# Convert a list of numeric ASCII values into a string
perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
perl -le '@ascii = (99, 111, 100, 105, 110, 103); print map { chr } @ascii'

# Generate an array with odd numbers from 1 to 100
perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
perl -le '@odd = grep { $_ & 1 } 1..100; print "@odd"'

# Generate an array with even numbers from 1 to 100
perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'

# Find the length of the string
perl -le 'print length "one-liners are great"'

# Find the number of elements in an array
perl -le '@array = ("a".."z"); print scalar @array'
perl -le '@array = ("a".."z"); print $#array + 1'


TEXT CONVERSION AND SUBSTITUTION
--------------------------------

# ROT13 a string
'y/A-Za-z/N-ZA-Mn-za-m/'

# ROT 13 a file
perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file

# Base64 encode a string
perl -MMIME::Base64 -e 'print encode_base64("string")'
perl -MMIME::Base64 -0777 -ne 'print encode_base64($_)' file

# Base64 decode a string
perl -MMIME::Base64 -le 'print decode_base64("base64string")'
perl -MMIME::Base64 -ne 'print decode_base64($_)' file

# URL-escape a string
perl -MURI::Escape -le 'print uri_escape($string)'

# URL-unescape a string
perl -MURI::Escape -le 'print uri_unescape($string)'

# HTML-encode a string
perl -MHTML::Entities -le 'print encode_entities($string)'

# HTML-decode a string
perl -MHTML::Entities -le 'print decode_entities($string)'

# Convert all text to uppercase
perl -nle 'print uc'
perl -ple '$_=uc'
perl -nle 'print "\U$_"'

# Convert all text to lowercase
perl -nle 'print lc'
perl -ple '$_=lc'
perl -nle 'print "\L$_"'

# Uppercase only the first word of each line
perl -nle 'print ucfirst lc'
perl -nle 'print "\u\L$_"'

# Invert the letter case
perl -ple 'y/A-Za-z/a-zA-Z/'

# Camel case each line
perl -ple 's/(\w+)/\u$1/g'
perl -ple 's/(?<!['])(\w+)/\u\1/g'

# Strip leading whitespace (spaces, tabs) from the beginning of each line
perl -ple 's/^[ \t]+//'
perl -ple 's/^\s+//'

# Strip trailing whitespace (space, tabs) from the end of each line
perl -ple 's/[ \t]+$//'

# Strip whitespace from the beginning and end of each line
perl -ple 's/^[ \t]+|[ \t]+$//g'

# Convert UNIX newlines to DOS/Windows newlines
perl -pe 's|\n|\r\n|'

# Convert DOS/Windows newlines to UNIX newlines
perl -pe 's|\r\n|\n|'

# Convert UNIX newlines to Mac newlines
perl -pe 's|\n|\r|'

# Substitute (find and replace) "foo" with "bar" on each line
perl -pe 's/foo/bar/'

# Substitute (find and replace) all "foo"s with "bar" on each line
perl -pe 's/foo/bar/g'

# Substitute (find and replace) "foo" with "bar" on lines that match "baz"
perl -pe '/baz/ && s/foo/bar/'


SELECTIVE PRINTING AND DELETING OF CERTAIN LINES
------------------------------------------------

# Print the first line of a file (emulate head -1)
perl -ne 'print; exit'

# Print the first 10 lines of a file (emulate head -10)
perl -ne 'print if $. <= 10'
perl -ne '$. <= 10 && print'

# Print the last line of a file (emulate tail -1)
perl -ne '$last = $_; END { print $last }'
perl -ne 'print if eof'

# Print the last 10 lines of a file (emulate tail -10)
perl -ne 'push @a, $_; @a = @a[@a-10..$#a]; END { print @a }'

# Print only lines that match a regular expression
perl -ne '/regex/ && print'

# Print only lines that do not match a regular expression
perl -ne '!/regex/ && print'

# Print the line before a line that matches a regular expression
perl -ne '/regex/ && $last && print $last; $last = $_'

# Print the line after a line that matches a regular expression
perl -ne 'if ($p) { print; $p = 0 } $p++ if /regex/'

# Print lines that match regex AAA and regex BBB in any order
perl -ne '/AAA/ && /BBB/ && print'

# Print lines that don't match match regexes AAA and BBB
perl -ne '!/AAA/ && !/BBB/ && print'

# Print lines that match regex AAA followed by regex BBB followed by CCC
perl -ne '/AAA.*BBB.*CCC/ && print'

# Print lines that are 80 chars or longer
perl -ne 'print if length >= 80'

# Print lines that are less than 80 chars in length
perl -ne 'print if length < 80'

# Print only line 13
perl -ne '$. == 13 && print && exit'

# Print all lines except line 27
perl -ne '$. != 27 && print'
perl -ne 'print if $. != 27'

# Print only lines 13, 19 and 67
perl -ne 'print if $. == 13 || $. == 19 || $. == 67'
perl -ne 'print if int($.) ~~ (13, 19, 67)' 

# Print all lines between two regexes (including lines that match regex)
perl -ne 'print if /regex1/../regex2/'

# Print all lines from line 17 to line 30
perl -ne 'print if $. >= 17 && $. <= 30'
perl -ne 'print if int($.) ~~ (17..30)'
perl -ne 'print if grep { $_ == $. } 17..30'

# Print the longest line
perl -ne '$l = $_ if length($_) > length($l); END { print $l }'

# Print the shortest line
perl -ne '$s = $_ if $. == 1; $s = $_ if length($_) < length($s); END { print $s }'

# Print all lines that contain a number
perl -ne 'print if /\d/'

# Find all lines that contain only a number
perl -ne 'print if /^\d+$/'

# Print all lines that contain only characters
perl -ne 'print if /^[[:alpha:]]+$/

# Print every second line
perl -ne 'print if $. % 2'

# Print every second line, starting the second line
perl -ne 'print if $. % 2 == 0'

# Print all lines that repeat
perl -ne 'print if ++$a{$_} == 2'

# Print all unique lines
perl -ne 'print unless $a{$_}++'


HANDY REGULAR EXPRESSIONS
-------------------------

# Match something that looks like an IP address
/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/
/^(\d{1,3}\.){3}\d{1,3}$/

# Test if a number is in range 0-255
/^([0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/

# Match an IP address
my $ip_part = qr|([0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])|;
if ($ip =~ /^($ip_part\.){3}$ip_part$/) {
 say "valid ip";
}

# Check if the string looks like an email address
/.+@.+\..+/

# Check if the string is a decimal number
/^\d+$/
/^[+-]?\d+$/
/^[+-]?\d+\.?\d*$/

# Check if the string is a hexadecimal number
/^0x[0-9a-f]+$/i

# Check if the string is an octal number
/^0[0-7]+$/

# Check if the string is binary
/^[01]+$/

# Check if a word appears twice in the string
/(word).*\1/

# Increase all numbers by one in the string
$str =~ s/(\d+)/$1+1/ge

# Extract HTTP User-Agent string from the HTTP headers
/^User-Agent: (.+)$/

# Match printable ASCII characters
/[ -~]/

# Match unprintable ASCII characters
/[^ -~]/

# Match text between two HTML tags
m|<strong>([^<]*)</strong>|
m|<strong>(.*?)</strong>|

# Replace all <b> tags with <strong>
$html =~ s|<(/)?b>|<$1strong>|g

# Extract all matches from a regular expression
my @matches = $text =~ /regex/g;


PERL ONE-LINERS EXPLAINED E-BOOK
--------------------------------

I am writing an ebook based on the one-liners in this file. If you wish to
support my work and learn more about these one-liners, you can get a copy
of my ebook soon at:

    http://www.catonmat.net/blog/perl-book/

The ebook is based on the 7-part article series that I wrote on my blog.
In the ebook I reviewed all the one-liners, improved explanations and added
new ones.

You can read the original article series here:

    http://www.catonmat.net/blog/perl-one-liners-explained-part-one/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-two/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-three/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-four/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-five/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-six/
    http://www.catonmat.net/blog/perl-one-liners-explained-part-seven/


CREDITS
-------

Andy Lester       http://www.petdance.com
Shlomi Fish       http://www.shlomifish.com
Madars Virza      http://www.madars.org


FOUND A BUG? HAVE ANOTHER ONE-LINER?
------------------------------------

Email bugs and new one-liners to me at peter@catonmat.net!


HAVE FUN
--------

I hope you found these one-liners useful. Have fun!

#---end of file---

Perl One-Liners Explained Article Series

If you're curious how the one-liners work, take a look at the Perl One-Liners Explained article series. I have explained all of them in the following articles:

The next, and final, thing I am doing with these one-liners is releasing Perl One-Liners Explained e-book. Stay tuned!

Update: I finished writing the e-book. Check it out!

Awk and Sed One-Liners Explained

I based the perl1line.txt file on the famous awk1line.txt and sed1line.txt files. That's how I actually learned awk and sed. I studied these two files inside out and learned everything I could about sed and awk. Later I decided to contribute back to community and explained all the one-liners in these two article series:

Take a look at these article series. The Awk one has been read over 800,000 times and the sed one over 500,000 times!

Enjoy!

Enjoy and see you later!

This article is part of the article series "Perl One-Liners Explained."
<- previous article next article ->

Perl One LinersThis is the seventh part of a nine-part article on famous Perl one-liners. Perl is not Perl without regular expressions, therefore in this part I will come up with and explain various Perl regular expressions. Please see part one for the introduction of the series.

Famous Perl one-liners is my attempt to create "perl1line.txt" that is similar to "awk1line.txt" and "sed1line.txt" that have been so popular among Awk and Sed programmers, and Unix sysadmins. I will release the perl1line.txt in the next part of the series.

The article on famous Perl one-liners consists of nine parts:

After I am done with the next part of the article, I will release the whole article series as a pdf e-book! Please subscribe to my blog to be the first to get it. You can also follow me on Twitter.

Awesome news: I have written an e-book based on this article series. Check it out:

And here are today's one-liners:

109. Match something that looks like an IP address.

/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/

This regex doesn't guarantee that the thing that got matched is in fact a valid IP. All it does is match something that looks like an IP. It matches a number followed by a dot four times. For example, it matches a valid IP 81.198.240.140 and it also matches an invalid IP such as 923.844.1.999.

Here is how it works. The ^ at the beginning of regex is an anchor that matches the beginning of string. Next \d{1,3} matches one, two or three consecutive digits. The \. matches a dot. The $ at the end is an anchor that matches the end of the string. It's important to use both ^ and $ anchors, otherwise strings like foo213.3.1.2bar would also match.

This regex can be simplified by grouping the first three repeated \d{1,3}\. expressions:

/^(\d{1,3}\.){3}\d{1,3}$/

110. Test if a number is in range 0-255.

/^([0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/

Here is how it works. A number can either be one digit, two digit or three digit. If it's a one digit number then we allow it to be anything [0-9]. If it's two digit, we also allow it to be any combination of [0-9][0-9]. However if it's a three digit number, it has to be either one hundred-something or two-hundred something. If it'e one hundred-something, then 1[0-9][0-9] matches it. If it's two hundred-something then it's either something up to 249, which is matched by 2[0-4][0-9] or it's 250-255, which is matched by 25[0-5].

111. Match an IP address.

my $ip_part = qr|([0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])|;
if ($ip =~ /^($ip_part\.){3}$ip_part$/) {
 say "valid ip";
}

This regexp combines the previous two. It uses the my $ip_part = qr/.../ operator compiles the regular expression and puts it in $ip_part variable. Then the $ip_part is used to match the four parts of the IP address.

112. Check if the string looks like an email address.

/.+@.+\..+/

This regex makes sure that the string looks like an email address. Notice that I say "looks like". It doesn't guarantee it is an email address. Here is how it works - first it matches something up to the @ symbol, then it matches as much as possible until it finds a dot, and then it matches some more. If this succeeds, then it it's something that at least looks like email address with the @ symbol and a dot in it.

For example, cats@catonmat.net matches but cats@catonmat doesn't because the regex can't match the dot \. that is necessary.

Much more robust way to check if a string is a valid email would be to use Email::Valid module:

use Email::Valid;
print (Email::Valid->address('john@example.com') ? 'valid email' : 'invalid email');

113. Check if the string is a decimal number.

Checking if the string is a number is really difficult. I based my regex and explanation on the one in Perl Cookbook.

Perl offers \d that matches digits 0-9. So we can start with:

/^\d+$/

This regex matches one or more digits \d starting at the beginning of the string ^ and ending at the end of the string $. However this doesn't match numbers such as +3 and -3. Let's modify the regex to match them:

/^[+-]?\d+$/

Here the [+-]? means match an optional plus or a minus before the digits. This now matches +3 and -3 but it doesn't match -0.3. Let's add that:

/^[+-]?\d+\.?\d*$/

Now we have expanded the previous regex by adding \.?\d*, which matches an optional dot followed by zero or more numbers. Now we're in business and this regex also matches numbers like -0.3 and 0.3.

Much better way to match a decimal number is to use Regexp::Common module that offers various useful regexes. For example, to match an integer you can use $RE{num}{int} from Regexp::Common.

How about positive hexadecimal numbers? Here is how:

/^0x[0-9a-f]+$/i

This matches the hex prefix 0x followed by hex number itself. The /i flag at the end makes sure that the match is case insensitive. For example, 0x5af matches, 0X5Fa matches but 97 doesn't, cause it's just a decimal number.

It's better to use $RE{num}{hex} because it supports negative numbers, decimal places and number grouping.

Now how about octal? Here is how:

/^0[0-7]+$/

Octal numbers are prefixed by 0, which is followed by octal digits 0-7. For example, 013 matches but 09 doesn't, cause it's not a valid octal number.

It's better to use $RE{num}{oct} because of the same reasons as above.

Finally binary:

/^[01]+$/

Binary base consists of just 0s and 1s. For example, 010101 matches but 210101 doesn't, because 2 is not a valid binary digit.

It's better to use $RE{num}{bin} because of the same reasons as above.

114. Check if a word appears twice in the string.

/(word).*\1/

This regex matches word followed by something or nothing at all, followed by the same word. Here the (word) captures the word in group 1 and \1 refers to contents of group 1, therefore it's almost the same as writing /(word).*word/

For example, silly things are silly matches /(silly).*\1/, but silly things are boring doesn't, because silly is not repeated in the string.

115. Increase all numbers by one in the string.

$str =~ s/(\d+)/$1+1/ge

Here we use the substitution operator s///. It matches all integers (\d+), puts them in capture group 1, then it replaces them with their value incremented by one $1+1. The g flag makes sure it finds all the numbers in the string, and the e flag evaluates $1+1 as a Perl expression.

For example, this 1234 is awesome 444 gets turned into this 1235 is awesome 445.

116. Extract HTTP User-Agent string from the HTTP headers.

/^User-Agent: (.+)$/

HTTP headers are formatted as Key: Value pairs. It's very easy to parse such strings, you just instruct the regex engine to save the Value part in $1 group variable.

For example, if the HTTP headers contain,

Host: localhost:8000
Connection: keep-alive
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_0_0; en-US)
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

Then the regular expression will extract the Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_0_0; en-US) string.

117. Match printable ASCII characters.

/[ -~]/

This is really tricky and smart. To understand it, take a look at man ascii. You'll see that space starts at value 0x20 and the ~ character is 0x7e. All the characters between a space and ~ are printable. This regular expression matches exactly that. The [ -~] defines a range of characters from space till ~. This is my favorite regexp of all time.

You can invert the match by placing ^ as the first character in the group:

/[^ -~]/

This matches the opposite of [ -~].

118. Match text between two HTML tags.

m|<strong>([^<]*)</strong>|

This regex matches everything between <strong>...</strong> HTML tags. The trick here is the ([^<]*), which matches as much as possible until it finds a < character, which starts the next tag.

Alternatively you can write:

m|<strong>(.*?)</strong>|

But this is a little different. For example, if the HTML is <strong><em>hello</em></strong> then the first regex doesn't match anything because the < follows <strong> and ([^<]*) matches as little as possible. The second regex matches <em>hello</em> because the (.*?)</strong> matches as little as possible until it finds </strong>, which happens to be <em>hello</em>.

However don't use regular expressions for matching and parsing HTML. Use modules like HTML::TreeBuilder to accomplish the task cleaner.

119. Replace all <b> tags with <strong>

$html =~ s|<(/)?b>|<$1strong>|g

Here I assume that the HTML is in variable $html. Next the <(/)?b> matches the opening and closing <b> tags, captures the optional closing tag slash in group $1 and then replaces the matched tag with either <strong> or </strong>, depending on if it was an opening or closing tag.

120. Extract all matches from a regular expression.

my @matches = $text =~ /regex/g;

Here the regular expression gets evaluated in the list context that makes it return all the matches. The matches get put in the @matches variable.

For example, the following regex extracts all numbers from a string:

my $t = "10 hello 25 moo 31 foo";
my @nums = $text =~ /\d+/g;

@nums now contains (10, 25, 30).

Perl one-liners explained e-book

I've now written the "Perl One-Liners Explained" e-book based on this article series. I went through all the one-liners, improved explanations, fixed mistakes and typos, added a bunch of new one-liners, added an introduction to Perl one-liners and a new chapter on Perl's special variables. Please take a look:

Have Fun!

Thanks for reading the article! In the next part I am releasing the perl1line.txt that will contain all the one-liners in a single file.

Follow me everywhere!

Testling now has moved to Testling-CI

Don't use anything described below as it doesn't work anymore

All questions about Testling-CI: feedback@browserling.com

We're happy to announce Headless Testling. Headless Testling lets you run your JavaScript tests locally through jsdom and remotely with Testling.

We put headless testling on npm so installing it is as easy as:

npm install -g testling

And now you can run your tests locally!

$ testling test.js 

Or run them on real browsers by specifying --browsers argument:

$ testling test.js --browsers=iexplore/7.0,iexplore/8.0,firefox/3.5

For example, if your test.js is this:

var test = require('testling');

test('json parse', function (t) {
    t.deepEqual(JSON.parse('[1,2]'), [1,2]);
    t.end();
});

Then running testling headlessly you'll get the following output:

node/jsdom                      1/1  100 % ok

But running it with --browsers=iexplore/7.0,iexplore/8.0,firefox/3.5, the test will get executed on native IE7, IE8 and Firefox 3.5:

Bundling...  done

iexplore/7.0        0/1    0 % ok
  Error: 'JSON' is undefined
    at [anonymous]() in /test.js : line: 4, column: 5
    at [anonymous]() in /test.js : line: 3, column: 29
    at test() in /test.js : line: 3, column: 1

  > t.deepEqual(JSON.parse('[1,2]'), [1,2]);

iexplore/8.0        1/1  100 % ok
firefox/3.5         1/1  100 % ok

total               2/3   66 % ok

You can follow the development of headless testling at testling's github repo.

Follow the founders of Browserling on GitHub, Twitter, Plurk, Google+ and Facebook!

And subscribe to my blog for Browserling announcements and all kinds of other awesome blog posts!

Testling now has moved to Testling-CI

Don't use anything described below as it doesn't work anymore

All questions about Testling-CI: feedback@browserling.com

We have amazing news at Browserling. We just launched a new product called Testling! Testling is automated cross-browser JavaScript testing tool. You write the JavaScript test, and we run it on all the browsers behind the scenes and report the results.

It's super easy to use. All you need is curl. Try this: create a file test.js with the following contents:

var test = require('testling');

test('json parse', function (t) {
    t.deepEqual(JSON.parse('[1,2]'), [1,2]);
    t.end();
});

And now do this:

curl -sSNT test.js testling.com/?browsers=iexplore/7.0,iexplore/8.0,chrome/14.0

This will run the JSON test in Internet Explorer 7, Internet Explorer 8 and Chrome 14. It's well known that IE7 does not have a JSON object so the test will fail. However IE8 and Chrome have the global JSON object and the test will succeed:

Here is the output:

Bundling...  done

iexplore/7.0        0/1    0 % ok
  Error: 'JSON' is undefined
    at [anonymous]() in /test.js : line: 4, column: 5
    at [anonymous]() in /test.js : line: 3, column: 29
    at test() in /test.js : line: 3, column: 1

  > t.deepEqual(JSON.parse('[1,2]'), [1,2]);

iexplore/8.0        1/1  100 % ok
chrome/14.0         1/1  100 % ok

total               2/3   66 % ok

It precisely shows what the error was on IE7 - JSON is undefined in test.js, on line 4.

Testling supports all the major browsers:

  • Internet Explorer 6, 7, 8 and 9 (all native).
  • Chrome 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
  • Firefox 3.0, 3.5, 3.6, 4.0, 5.0, 6.0, 7.0.
  • Opera 10.0, 10.5, 11.0, 11.5.
  • Safari 5.0.5, 5.1.

When you run a test, you can specify, which browsers to run tests on via the ?browsers query parameter. For example, to run the test on IE6, Opera 11 and Safari 5.1, do this:

curl -sSNT test.js testling.com/?browsers=iexplore/6.0,opera/11.0,safari/5.1

You can script interactions with websites, too. Here is an example that uses jQuery to submit a login form and compare the greeting text:

var test = require('testling');

test('testling login test', function (t) {
    t.createWindow('http://www.testling.com/test-form/', function (win, $) {
        t.plan(1);

        var form = $('#form')[0];
        $('input[name=login]').val('testling');
        $('input[name=passw]').val('qwerty');

        t.submitForm(form, function (win, $) {
            t.ok($('#welcome p:first').text() === "Login successful.", "login failed");
            t.end();
        });
    });
});

Submitting a form only works in IE8, IE9, Chrome 7-14 and all Firefox browsers. Opera, IE6 and IE7 can't do that yet because of implementation limitations, but we're solving this problem right now.

See Testling documentation for a detailed information how to write tests!

Follow the founders of Browserling on GitHub, Twitter, Plurk, Google+ and Facebook!

And subscribe to my blog for Browserling announcements and all kinds of other awesome blog posts!

This is the world's best introduction to sed - the superman of UNIX stream editing. Originally I wrote this introduction for my second e-book, however later I decided to make it a part of the free e-book preview and republish it here as this article.

Introduction to sed

Mastering sed can be reduced to understanding and manipulating the four spaces of sed. These four spaces are:

  • Input Stream
  • Pattern Space
  • Hold Buffer
  • Output Stream

Think about the spaces this way - sed reads the input stream and produces the output stream. Internally it has the pattern space and the hold buffer. Sed reads data from the input stream until it finds the newline character \n. Then it places the data read so far, without the newline, into the pattern space. Most of the sed commands operate on the data in the pattern space. The hold buffer is there for your convenience. Think about it as temporary buffer. You can copy or exchange data between the pattern space and the hold buffer. Once sed has executed all the commands, it outputs the pattern space and adds a \n at the end.

It's possible to modify the behavior of sed with the -n command line switch. When -n is specified, sed doesn't output the pattern space and you have to explicitly print it with either p or P commands.

Let's look at several examples to understand the four spaces and sed. These are just examples to illustrate what sed looks like and what it's all about.

Here is the simplest possible sed program:

sed 's/foo/bar/'

This program replaces text "foo" with "bar" on every line. Here is how it works. Suppose you have a file with these lines:

abc
foo
123-foo-456

Sed opens the file as the input stream and starts reading the data. After reading "abc" it finds a newline \n. It places the text "abc" in the pattern space and now it applies the s/foo/bar/ command. Since we have "abc" in the pattern space and there is no "foo" anywhere, sed does nothing to the pattern space. At this moment sed has executed all the commands (in this case just one). The default action when all the commands have been executed is to print the pattern space, followed by newline. So the output from the first line is "abc\n".

Now sed reads in the second line "foo" and executes s/foo/bar/. This replaces "foo" with "bar". The pattern space now contains just "bar". The end of the script has been reached and sed prints out the pattern space, followed by newline. The output from the second line is "bar\n".

Now the 3rd line is read in. The pattern space is now "123-foo-456". Since there is "foo" in the text, the s/foo/bar/ is successful and the pattern space is now "123-bar-456". The end is reached and sed prints the pattern space. The output is "123-bar-456\n".

All the lines of the input have been read at this moment and sed exits. The output from running the script on our example file is:

abc
bar
123-bar-456

In this example we never used the hold buffer because there was no need for temporary storage.

Before we look at an example with temporary storage, let's take a look at three command line switches: -n, -e and -i. First -n.

If you specify -n to sed, like this:

sed -n 's/foo/bar/'

Then sed will no longer print the pattern space when it reaches the end of the script. So if you run this program on our sample file above, there will be no output. You must use sed's p command to force sed to print the line:

sed -n 's/foo/bar/; p'

As you can see, sed commands are separated by the ; character. You can also use -e switch to separate the commands:

sed -n -e 's/foo/bar/' -e 'p'

It's the same as if you used ;. Next, let's take a look at the -i command line argument. This one forces sed to do in-place editing of the file, meaning it reads the contents of the file, executes the commands, and places the new contents back in the file.

Here is an example. Suppose you have a file called "users", with the following content:

pkrumins:hacker
esr:guru
rms:geek

And you wish to replace the ":" symbol with ";" in the whole file. Then you can do it as easily as:

sed -i 's/:/;/' users

It will silently execute the s/:/;/ command on all lines in the file and do all substitutions. Be very careful when using -i as it's destructive and it's not reversible! It's always safer to run sed without -i, and then replace the file yourself.

Alternatively you can specify a file extension to the -i command. This way sed will make a backup copy of the file before it makes in-place modifications.

For example, if you specify -i.bak, like this:

sed -i.bak 's/:/;/' users

Then sed will create users.bak before modifying the contents of users file.

Actually, before we look at the hold buffer, let's take a look at addresses and ranges. Addresses allow you to restrict sed commands to certain lines, or ranges of lines.

The simplest address is a single number that limits sed commands to the given line number:

sed '5s/foo/bar/'

This limits the s/foo/bar/ only to the 5th line of file or input stream. So if there is a "foo" on the 5th line, it will be replaced with "bar". No other lines will be touched.

The addresses can be also inverted with the ! after the address. To match all lines that are not the 5th line (lines 1-4, plus lines 6-...), do this:

sed '5!s/foo/bar/'

The inversion can be applied to any address.

Next, you can also limit sed commands to a range of lines by specifying two numbers, separated by a comma:

sed '5,10s/foo/bar/'

In this one-liner the s/foo/bar/ is executed only on lines 5 - 10, inclusive. Here is a quick, useful one-liner. Suppose you want to print lines 5 - 10 in the file. You can first disable implicit line printing with the -n command line switch, and then use the p command on lines 5 - 10:

sed -n '5,10p'

This will execute the p command only on lines 5 - 10. No other lines will be output. Pretty neat, isn't it?

There is a special address $ that matches the last line of the file. Here is an example that prints the last line of the file:

sed -n '$p'

As you can see, the p command has been limited to $, which is the last line of input.

Next, there is also a single regular expression address match like this /regex/. If you specify a regex before a command, then the command will only get executed on lines that match the regex. Check this out:

sed -n '/a\+b\+/p'

Here the p command will get called only on lines that match a\+b\+ regular expression, which means one or more letters "a" followed by one or more letters "b". For example, it prints lines like "ab", "aab", "aaabbbbb", "foo-123-ab", etc. Note how the + has to be escaped. That's because sed uses basic regular expressions by default. You can enable extended regular expressions by using the -r command line switch:

sed -rn '/a+b+/p'

This way you don't need to quote meta-characters like +, ( and ).

There is also an expression to match a range between two regexes. Here is an example,

sed '/foo/,/bar/d'

This one-liner matches all lines between the first line that matches "/foo/" regex and the first line that matches "/bar/" regex, inclusive. It applies the d command that stands for delete. In other words, it deletes a range of lines between the first line that matches "/foo/" and the first line after "/foo/" that matches "/bar/", inclusive.

Now let's take a look at the hold buffer. Suppose you have a problem where you want to print the line before the line that matches a regular expression. How do you do this? If sed didn't have a hold buffer, things would be tough, but with hold buffer we can always save the current line to the hold buffer, and then let sed read in the next line. Now if the next line matches the regex, we would just print the hold buffer, which holds the previous line. Easy, right?

The command for copying the current pattern space to the hold buffer is h. The command for copying the hold buffer back to the pattern space is g. The command for exchanging the hold buffer and the pattern space is x. We just have to choose the right commands to solve this problem. Here is the solution:

sed -n '/regex/{x;p;x}; h'

It works this way - every line gets copied to the hold buffer with the h command at the end of the script. However, for every line that matches the /regex/, we exchange the hold buffer with the pattern space by using the x command, print it with the p command, and then exchange the buffers back, so that if the next line matches the /regex/ again, we could print the current line.

Also notice the command grouping. Several commands can be grouped and executed only for a specific address or range. In this one-liner the command group is {x;p;x} and it gets executed only if the current line matches /regex/.

Note that this one-liner doesn't work if it's the first line of the input matches /regex/. To fix this, we can limit the p command to all lines that are not the first line with the 1! inverted address match:

sed -n '/regex/{x;1!p;x}; h'

Notice the 1!p. This says - call the p command on all the lines that are not the 1st line. This prevents anything to be printed in case the first line matches /regex/.

Well, that's it! I think this introduction explains the most important concepts in sed, including various command line switches, the four spaces and various sed commands.

If you wish to learn more, I suggest you get a copy of my "Sed One-Liners Explained" e-book. The e-book contains exactly 100 well-explained one-liners. Once you work through them, you'll have rewired your brain to "think in sed". In other words, you'll have learned how to manipulate the pattern space, the hold buffer and you'll know when to print the data to get the results that you need.

Have fun!

If you enjoy my writing, you can subscribe to my blog, follow me on Twitter or Google+.