Regular expressions are a powerful method of searching for text. A regular expression is a text pattern described using tokens. There are tokens to describe individual characters, words, punctuation, white space, and more. Regular expressions should be considered an advanced topic; if you do not feel comfortable using them, you should continue to use regular matching.
The following is a list of all the tokens you can use, with a description of what each token does.
Token | Description
. | Matches any single character
\d | Matches a single digit from 0 to 9
\D | Matches any single character that is not a digit from 0 to 9, including white space
\w | Matches a single word character: an upper or lowercase letter from a to z, a digit from 0 to 9, or the underscore character _
\W | Matches any single non-word character
\t | Matches the tab character
\n | Matches the new line (linefeed) character
\r | Matches the carriage-return character
\f | Matches the form-feed character
$ | Matches the end of a line
^ | Matches the start of a line
\s | Matches a single white space character, including space, tab, new line, carriage-return, and form-feed
\S | Matches any single non-white-space character
\b | Matches a word boundary: the position between a word character (a character matching \w) and a non-word character, or between a word character and the start or end of a line. It matches a position rather than a character. Normally this is the point next to a space or tab where a word begins or ends, or the beginning or end of a line.
\B | Matches any position that is not a word boundary
{ | Defines the start of a repetition range (see {n}, {n,}, and {n,m} below)
} | Defines the end of a repetition range
( | Defines the start of a group
) | Defines the end of a group
| (vertical bar) | A symbol meaning OR: matches either the expression before it or the expression after it
* | Indicates the preceding token or group should be repeated zero or more times
+ | Indicates the preceding token or group should be repeated one or more times
? | Indicates the preceding token or group should be repeated zero or one time
{n} | Indicates the preceding token or group should be repeated exactly n times
{n,} | Indicates the preceding token or group should be repeated at least n times
{n,m} | Indicates the preceding token or group should be repeated at least n times but no more than m times
\ | Escapes the following reserved character so that it is matched literally instead of acting as a meta character (allows you to match reserved characters like \, +, ., *, {, }, [, ], ?, ^, $, (, ))
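To see how several of these tokens combine, here is a minimal sketch in Java. It assumes the tokens above behave as they do in Java's java.util.regex package, which this page does not confirm. The class name TokenExamples and the helper findAll are hypothetical names used only for illustration. Note that inside a Java string literal each backslash must be doubled, so \d is written as "\\d".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: TokenExamples and findAll are hypothetical names, and
// java.util.regex is assumed to follow the token table above.
final class TokenExamples {

    // Collects every non-overlapping match of regex found in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Order 42 shipped on 2024-01-15 for $3.50";

        // \d+ : one or more digits in a row
        System.out.println(findAll("\\d+", text));                  // [42, 2024, 01, 15, 3, 50]

        // \d{4}-\d{2}-\d{2} : exact repetition counts with {n}
        System.out.println(findAll("\\d{4}-\\d{2}-\\d{2}", text));  // [2024-01-15]

        // \bs\w* : a word boundary, the letter s, then zero or more word characters
        System.out.println(findAll("\\bs\\w*", text));              // [shipped]

        // \$\d+\.\d{2} : the reserved characters $ and . escaped with \
        System.out.println(findAll("\\$\\d+\\.\\d{2}", text));      // [$3.50]

        // (cat|dog)s? : a group with OR, followed by an optional s
        System.out.println(findAll("(cat|dog)s?", "cats and a dog")); // [cats, dog]
    }
}
```

The escaped pattern for the price shows why the \ token matters: without the backslashes, $ would mean end of line and . would match any character.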
In addition to these tokens, you can create character classes to match a single character against a user-defined set of characters.
Pattern | Description
[abc] | Matches any one of the characters within the brackets (in this case, a, b, or c)
[^abc] | Matches any one character except those within the brackets (in this case, anything except a, b, or c)
[a-g] | Matches any one character in the range starting at the letter before the hyphen and ending at the letter after the hyphen (in this case, any of abcdefg)
[a-gm-p] | Matches any one character in either of two ranges (in this case, any of abcdefgmnop)
[a-g[m-p]] | Same as above, written as the union of two sets
[a-z&&[m-p]] | The intersection of the sets a-z and m-p (in this case, any of mnop)
[a-z&&[^m-p]] | Subtracts one set from another (in this case, any letter from a to z except mnop)
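The set operations above can be checked with a short Java sketch. It assumes the character class syntax matches Java's java.util.regex package, which supports the nested union and && intersection forms shown in the table. The class CharacterClassExamples and the helper keepOnly are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: CharacterClassExamples and keepOnly are hypothetical names.
final class CharacterClassExamples {

    // Returns the characters of input that the given character class matches,
    // testing one character at a time.
    static String keepOnly(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            if (p.matcher(ch).matches()) {
                kept.append(ch);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";

        System.out.println(keepOnly("[abc]", letters));          // abc
        System.out.println(keepOnly("[^abc]", "abcd1"));         // d1
        System.out.println(keepOnly("[a-g]", letters));          // abcdefg
        System.out.println(keepOnly("[a-gm-p]", letters));       // abcdefgmnop
        System.out.println(keepOnly("[a-g[m-p]]", letters));     // abcdefgmnop
        System.out.println(keepOnly("[a-z&&[m-p]]", letters));   // mnop
        System.out.println(keepOnly("[a-z&&[^m-p]]", letters));  // abcdefghijklqrstuvwxyz
    }
}
```

Each class still matches only one character at a time; follow it with *, +, or {n} to match a run of characters from the set.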
Additionally, there are predefined Portable Operating System Interface for UNIX (POSIX) character classes that cover commonly used sets of characters.
Pattern | Description
\p{Lower} | Matches a lower case letter from a to z
\p{Upper} | Matches an upper case letter from A to Z
\p{ASCII} | Matches any ASCII character
\p{Alpha} | Matches any upper or lower case letter
\p{Digit} | Matches a digit from 0 to 9
\p{Alnum} | Matches any digit or letter
\p{Punct} | Matches any punctuation symbol
\p{Graph} | Matches any visible character
\p{Print} | Matches any printable character
\p{Blank} | Matches a tab or space
\p{Cntrl} | Matches a control character
\p{XDigit} | Matches a hexadecimal digit
\p{Space} | Matches a white space character
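As a rough illustration, the following Java sketch tests a few of the POSIX classes. It assumes they behave as in Java's java.util.regex package, where these classes cover US-ASCII characters only by default; that assumption is not confirmed by this page. The class PosixClassExamples and the helper matchesAll are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustrative only: PosixClassExamples and matchesAll are hypothetical names.
final class PosixClassExamples {

    // Returns true only when the whole input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One upper case letter followed by one or more lower case letters
        System.out.println(matchesAll("\\p{Upper}\\p{Lower}+", "Hello")); // true

        // Hexadecimal digits only; G is not a hexadecimal digit
        System.out.println(matchesAll("\\p{XDigit}+", "1aF9"));           // true
        System.out.println(matchesAll("\\p{XDigit}+", "1aG9"));           // false

        // Letters and digits only; the space is not alphanumeric
        System.out.println(matchesAll("\\p{Alnum}+", "abc123"));          // true
        System.out.println(matchesAll("\\p{Alnum}+", "abc 123"));         // false

        // \p{Blank} covers tab and space, while \p{Space} also covers new line
        System.out.println(matchesAll("\\p{Blank}", "\n"));               // false
        System.out.println(matchesAll("\\p{Space}", "\n"));               // true

        // By default the classes are ASCII only, so an accented e is not \p{Lower}
        System.out.println(matchesAll("\\p{Lower}", "\u00e9"));           // false
    }
}
```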
If you have additional questions about using regular expressions, please feel free to contact Zizasoft support at support@zizasoft.com.