Justin
February 28, 2013, 17:54

I rarely see these character classes used. They tend to obscure the meaning of a regular expression, and support for them across the standard Unix tools and their documentation is uneven. For example, GNU grep supports [:blank:], but the man page for GNU grep doesn't mention it. Solaris grep doesn't support character classes at all unless you're using /usr/xpg4/bin/grep, which may. YMMV with HP-UX, AIX, IRIX/ULTRIX, etc., which all ship with their own custom implementations of regular expressions. GNU Emacs supports [:unibyte:], [:multibyte:], [:word:], [:nonascii:], [:graph:], and [:ascii:], of which only [:graph:] is also defined in the POSIX standard. Python may not support "standard" character classes at all; I couldn't find any mention of them in the online docs. The actual meaning of these character classes also varies with your locale environment variables, which is sometimes a good thing! Ultimately, unless you have the privilege of controlling where your regular expression is used in all cases, you have to fall back to the minimum supported syntax.
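
To make the portability point concrete, here is a minimal sketch in Python, assuming the standard-library re module (which, as far as I can tell, has no POSIX bracket classes). The pattern names and the compile_quietly helper are made up for illustration. It suggests that [[:blank:]] is not rejected but quietly read as an ordinary set of the characters "[", ":", "b", "l", "a", "n", "k" followed by a literal "]", while the spelled-out fallback [ \t] behaves as intended.

```python
import re
import warnings

# POSIX bracket expression, as accepted by tools such as GNU grep.
POSIX_BLANK = r"[[:blank:]]"
# Minimum-syntax fallback: an explicit space or tab.
PORTABLE_BLANK = r"[ \t]"


def compile_quietly(pattern):
    """Compile a pattern, silencing the 'possible nested set' FutureWarning
    that recent Python versions emit for a '[' inside a character set."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        return re.compile(pattern)


posix_re = compile_quietly(POSIX_BLANK)
portable_re = compile_quietly(PORTABLE_BLANK)

sample = "a b"  # contains one blank (a space)

# The POSIX spelling does not find the space in Python's re module.
posix_hit = posix_re.search(sample) is not None

# The explicit set finds it.
portable_hit = portable_re.search(sample) is not None

# The POSIX spelling instead matches one of "[:blank" followed by "]".
literal_hit = posix_re.search("k]") is not None

print(posix_hit, portable_hit, literal_hit)
```

The silent misparse is arguably worse than an error, which is one more reason to spell the set out when you don't control which engine runs the expression.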
