We all know the POSIX regular expression character classes, right? There are 12 standard classes:
[:alnum:] [:digit:] [:punct:] [:alpha:] [:graph:] [:space:] [:blank:] [:lower:] [:upper:] [:cntrl:] [:print:] [:xdigit:]
But have you seen a visual representation of what these classes match? Probably not. Therefore I created a visualization that illustrates which part of the ASCII set each character class matches. Call it a cheat sheet if you like:
A bunch of programs that I used
Just for my own reference, in case I ever need them again, here are the one-liners I used to create this cheat sheet:
perl -nle 'printf "%08b - %08b\n", map { hex "0x".(split / /)[0], hex "0x".(split / /)[1] } $_ '
perl -nle 'printf "%03o - %03o\n", map { (split / /)[0], (split / /)[1] } $_'
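For reference, here is a minimal sketch of what the two one-liners compute, written as a standalone script. It assumes each input line holds two space-separated values (hexadecimal for the first one-liner, decimal for the second). The sub names hex_pair_to_binary and dec_pair_to_octal are made up for illustration and are not part of the original one-liners.

```perl
use strict;
use warnings;

# Hypothetical helper: two space-separated hex values -> binary range.
sub hex_pair_to_binary {
    my ($line) = @_;
    my ($from, $to) = map { hex($_) } (split / /, $line)[0, 1];
    return sprintf '%08b - %08b', $from, $to;
}

# Hypothetical helper: two space-separated decimal values -> octal range.
sub dec_pair_to_octal {
    my ($line) = @_;
    my ($from, $to) = (split / /, $line)[0, 1];
    return sprintf '%03o - %03o', $from, $to;
}

print hex_pair_to_binary('41 5a'), "\n";   # prints 01000001 - 01011010
print dec_pair_to_octal('65 90'), "\n";    # prints 101 - 132
```

Both helpers split the line only once and take a two-element list slice, which keeps the conversion in one place.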
And I used this Perl program to generate and check the red/green matches:
use warnings;
use strict;
my $red = "\e[31m";
my $green = "\e[32m";
my $clear = "\e[0m";
my ($start, $end) = @ARGV;
die 'start or end not given' unless defined $start && defined $end;
my @classes = qw/alnum alpha blank cntrl digit graph lower print punct space upper xdigit/;
for (map { chr } $start..$end) {
    for my $class (@classes) {
        print "${green}1${clear}" if /[[:$class:]]/;
        print "${red}0${clear}" unless /[[:$class:]]/;
    }
    print "\n"
}
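The same test can be turned around to list, for each class, the ASCII code points it matches. This is only a sketch of that idea, not the program used for the cheat sheet. The subs class_members and to_ranges are illustrative names, and the output assumes Perl's default (non-locale) matching for code points 0 to 127.

```perl
use strict;
use warnings;

my @classes = qw/alnum alpha blank cntrl digit graph lower print punct space upper xdigit/;

# Return the ASCII code points (0..127) matched by the named POSIX class.
sub class_members {
    my ($class) = @_;
    my $re = qr/\A[[:${class}:]]\z/;
    return grep { chr($_) =~ $re } 0 .. 127;
}

# Collapse a sorted list of code points into hex ranges such as "30-39".
sub to_ranges {
    my @codes = @_;
    my @ranges;
    while (@codes) {
        my $lo = my $hi = shift @codes;
        # Extend the current range while the next code point is adjacent.
        $hi = shift @codes while @codes && $codes[0] == $hi + 1;
        push @ranges, $lo == $hi ? sprintf('%02X', $lo)
                                 : sprintf('%02X-%02X', $lo, $hi);
    }
    return join ' ', @ranges;
}

# One line per class, e.g. "digit   30-39".
printf "%-7s %s\n", $_, to_ranges(class_members($_)) for @classes;
```

Printing ranges instead of a 0/1 grid gives a compact text summary that can be compared against the colored output of the program above.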
Credits
I was inspired to create this visualization when I saw a similar table for C's ctype.h character classification functions.



Comments
Save on a call to split with
map { (split / /)[0,1] }
:)
Man that is super helpful to have! Great idea, I have a bunch of char cheatsheets I use all the time, but this is a new one. Thanks for sharing!!
I rarely see these character classes used: they tend to obscure the meaning of regular expressions, because support across the standard Unix tools and documentation is uneven. For example, GNU grep supports [:blank:] but the man page for GNU grep doesn't mention it. Solaris grep doesn't support character classes at all, unless you're using /usr/xpg4/bin/grep, which may. YMMV with HP-UX, AIX, IRIX/ULTRIX, etc., which all ship with their own custom implementations of regular expressions. GNU Emacs adds [:unibyte:], [:multibyte:], [:word:], [:nonascii:], [:graph:], [:ascii:], and some of those are also defined in the POSIX standard. Python may not support the "standard" character classes at all; I couldn't find any mention of them in the online docs. The actual meaning of these character classes also varies depending on your locale environment variables. This is sometimes a good thing! Ultimately, unless you control every place where your regular expression is used, you have to fall back to the minimum supported syntax.
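As a rough illustration of falling back to the minimum supported syntax, the sketch below pairs a few classes with explicit bracket expressions and checks in Perl that both forms agree over ASCII. The %fallback table and the disagreements sub are assumptions made for this example. The equivalences are only claimed for plain ASCII input (C/POSIX locale), and the table is deliberately not exhaustive.

```perl
use strict;
use warnings;

# Assumed ASCII-only equivalents of some POSIX classes (illustrative table).
my %fallback = (
    digit  => '[0-9]',
    lower  => '[a-z]',
    upper  => '[A-Z]',
    alpha  => '[A-Za-z]',
    alnum  => '[0-9A-Za-z]',
    xdigit => '[0-9A-Fa-f]',
);

# Return the ASCII code points on which the POSIX class and its
# explicit replacement give different answers (empty list if none).
sub disagreements {
    my ($class) = @_;
    my $posix = qr/\A[[:${class}:]]\z/;
    my $plain = qr/\A$fallback{$class}\z/;
    return grep { (chr($_) =~ $posix) xor (chr($_) =~ $plain) } 0 .. 127;
}

for my $class (sort keys %fallback) {
    my @diff = disagreements($class);
    printf "%-6s %-13s %s\n", $class, $fallback{$class},
        @diff ? 'differs' : 'same over ASCII';
}
```

Explicit ranges like these are understood by essentially every regex implementation, at the cost of ignoring any locale-specific characters the named classes would otherwise pick up.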
It's unique... amazing. Thanks for sharing this.