Character Classes

The matching operator, m/  /, accepts not only string literals but also character classes. Character classes are groups of characters that share certain syntactic traits. (Syntactic refers here to programming languages.) The three most commonly used predefined classes are:

NAME                    NOTATION  CHARACTERS                               NEGATION
digits                  \d        0-9                                      \D
word characters         \w        a-z, A-Z, 0-9, _ (underscore)            \W
whitespace characters   \s        space, tab, newline, return, formfeed    \S

Negation means the logical opposite: \w refers to all word characters, and \W refers to all non-word characters (which includes punctuation marks and similar characters).
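As a rough illustration of the table above, the following sketch counts how many characters of a sample string fall into each class and its negation. The sample string and the helper name count_matches are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times $regex matches in $text.
sub count_matches {
    my ($text, $regex) = @_;
    # Assigning the list of matches to () and then to a scalar yields the count.
    my $count = () = $text =~ /$regex/g;
    return $count;
}

my $sample = "Room 42, 3rd floor";   # made-up sample string

printf "digits:         %d\n", count_matches($sample, qr/\d/);
printf "non-digits:     %d\n", count_matches($sample, qr/\D/);
printf "word chars:     %d\n", count_matches($sample, qr/\w/);
printf "non-word chars: %d\n", count_matches($sample, qr/\W/);
printf "whitespace:     %d\n", count_matches($sample, qr/\s/);
printf "non-whitespace: %d\n", count_matches($sample, qr/\S/);
```

Because every character either belongs to a class or to its negation, each pair of counts adds up to the length of the string.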

To test whether a string contains any printable characters, you could check if it contains any non-whitespace characters:

if ($myString =~ m/\S/)
{
     # ... do something ...
}

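This is equivalent to checking that the string does not consist entirely of whitespace. (Here ^ and $ anchor the pattern to the beginning and end of the string, and * means "zero or more".) Note that the shorter test $myString !~ m/\s/ is not equivalent: it succeeds only if the string contains no whitespace at all.

if ($myString !~ m/^\s*$/)
{
     # ... do something ...
}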

Which one to use is a matter of choice. There's always more than one way to do it...
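To see how such tests behave on concrete input, the sketch below runs m/\S/ and the superficially similar !~ m/\s/ over a few made-up strings. The sub names and sample data are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# True (1) if the string contains at least one non-whitespace character.
sub has_visible_text {
    my ($s) = @_;
    return $s =~ m/\S/ ? 1 : 0;
}

# True (1) if the string contains no whitespace character at all.
sub has_no_whitespace {
    my ($s) = @_;
    return $s !~ m/\s/ ? 1 : 0;
}

# Made-up sample strings, each with a readable label.
my @samples = (
    [ 'empty string',    ""            ],
    [ 'only whitespace', " \t\n"       ],
    [ 'one word',        "hello"       ],
    [ 'two words',       "hello world" ],
);

for my $pair (@samples) {
    my ($label, $s) = @$pair;
    printf "%-16s visible text: %d  no whitespace: %d\n",
        $label, has_visible_text($s), has_no_whitespace($s);
}
```

The two tests disagree for the empty string and for "hello world", so they should not be treated as interchangeable.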

"Custom" Character Classes

Perl also lets you define your own character classes. A typical use is matching a word that has variant spellings. Character classes are defined in square brackets [ . . . ]; any character given within the brackets is part of the class, and the class as a whole matches exactly one character.

The following expression replaces every occurrence of 'grey' or 'gray' with 'pink':

s/gr[ea]y/pink/g;
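A minimal sketch of this substitution applied to a concrete string follows. The sub name pinkify and the sample sentence are invented for the example. The sub works on a copy of its argument, so the caller's string is left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with grey/gray replaced by pink.
sub pinkify {
    my ($text) = @_;              # $text is a copy of the argument
    $text =~ s/gr[ea]y/pink/g;    # [ea] matches exactly one 'e' or one 'a'
    return $text;
}

my $sentence = "The grey cat sat on a gray mat near the Greyhound stop.";
print pinkify($sentence), "\n";
# prints: The pink cat sat on a pink mat near the Greyhound stop.
```

"Greyhound" is left alone because matching is case-sensitive by default and the class only stands in for the single letter between "gr" and "y".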

Inside the brackets, consecutive characters can be expressed as ranges:

a-z ==> any lowercase letter
A-Z ==> any uppercase letter
0-9 ==> any digit

$line =~ m/[1-4]/;

The previous expression returns true if $line contains at least one 1, 2, 3, or 4.
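The range test can be tried out with a short sketch like the one below. The sub name and the sample lines are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $line contains at least one of 1, 2, 3, or 4.
sub contains_one_to_four {
    my ($line) = @_;
    return $line =~ m/[1-4]/ ? 1 : 0;
}

# Made-up sample lines
for my $line ("Gate 3", "Gate 7", "no digits here") {
    printf "%-16s => %d\n", $line, contains_one_to_four($line);
}
# prints:
# Gate 3           => 1
# Gate 7           => 0
# no digits here   => 0
```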

Negation in character classes is expressed by a caret, ^, placed immediately after the opening bracket.

$line =~ m/[^1-4]/;

This expression returns true if $line contains at least one character that is not a 1, 2, 3, or 4.

Note: Outside square brackets, the caret is an anchor that matches at the beginning of the string (or at the beginning of each line when the /m modifier is used).

$line =~ m/^[1-4]/;

The meaning of this expression is very different: it returns true if $line starts with a 1, 2, 3, or 4.
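The difference between the two uses of the caret is easiest to see side by side. The following sketch compares them on a few made-up strings and also combines both carets in a third pattern. All sub names are hypothetical.

```perl
use strict;
use warnings;

# Caret inside the brackets: negated class.
# 1 if $line contains at least one character other than 1, 2, 3, or 4.
sub has_other_char {
    my ($line) = @_;
    return $line =~ m/[^1-4]/ ? 1 : 0;
}

# Caret outside the brackets: anchor.
# 1 if $line starts with 1, 2, 3, or 4.
sub starts_with_one_to_four {
    my ($line) = @_;
    return $line =~ m/^[1-4]/ ? 1 : 0;
}

# Both at once: 1 if $line starts with a character other than 1, 2, 3, or 4.
sub starts_with_other_char {
    my ($line) = @_;
    return $line =~ m/^[^1-4]/ ? 1 : 0;
}

# Made-up sample lines
for my $line ("1234", "42nd street", "route 3") {
    printf "%-12s [^1-4]: %d  ^[1-4]: %d  ^[^1-4]: %d\n",
        $line,
        has_other_char($line),
        starts_with_one_to_four($line),
        starts_with_other_char($line);
}
# prints:
# 1234         [^1-4]: 0  ^[1-4]: 1  ^[^1-4]: 0
# 42nd street  [^1-4]: 1  ^[1-4]: 1  ^[^1-4]: 0
# route 3      [^1-4]: 1  ^[1-4]: 0  ^[^1-4]: 1
```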