Regular Expressions

Character	Meaning
\	For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, `/b/` matches the character 'b'. Placing a backslash in front of the b, as in `/\b/`, makes the character special: it now matches a word boundary. -or- For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, `*` is a special character that means 0 or more occurrences of the preceding character should be matched; `/a*/` means match 0 or more a's. To match `*` literally, precede it with a backslash; for example, `/a\*/` matches 'a*'.
^	Matches beginning of input or line. For example, `/^A/` does not match the 'A' in "an A," but does match it in "An A."
$	Matches end of input or line. For example, `/t$/` does not match the 't' in "eater", but does match it in "eat".
*	Matches the preceding character 0 or more times. For example, `/bo*/` matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".
+	Matches the preceding character 1 or more times. Equivalent to `{1,}`. For example, `/a+/` matches the 'a' in "candy" and all the a's in "caaaaaaandy."
?	Matches the preceding character 0 or 1 time. For example, `/e?le?/` matches the 'el' in "angel" and the 'le' in "angle."
.	(The decimal point) matches any single character except the newline character. For example, `/.n/` matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)	Matches 'x' and remembers the match. For example, `/(foo)/` matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the resulting array's elements `[1]`, ..., `[n]`, or from the predefined `RegExp` object's properties `$1`, ..., `$9`.
x|y	Matches either 'x' or 'y'. For example, `/green|red/` matches 'green' in "green apple" and 'red' in "red apple."
{n}	Where `n` is a positive integer. Matches exactly `n` occurrences of the preceding character. For example, `/a{2}/` doesn't match the 'a' in "candy," but it matches all of the a's in "caandy," and the first two a's in "caaandy."
{n,}	Where `n` is a positive integer. Matches at least `n` occurrences of the preceding character. For example, `/a{2,}/` doesn't match the 'a' in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy."
{n,m}	Where `n` and `m` are positive integers. Matches at least `n` and at most `m` occurrences of the preceding character. For example, `/a{1,3}/` matches nothing in "cndy", the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more a's in it.
[xyz]	A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen. For example, `[abcd]` is the same as `[a-d]`. They match the 'b' in "brisket" and the 'c' in "ache".
[^xyz]	A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen. For example, `[^abc]` is the same as `[^a-c]`. They initially match 'r' in "brisket" and 'h' in "chop."
[\b]	Matches a backspace. (Not to be confused with `\b`.)
\b	Matches a word boundary, such as a space. (Not to be confused with `[\b]`.) For example, `/\bn\w/` matches the 'no' in "noonday"; `/\wy\b/` matches the 'ly' in "possibly yesterday."
\B	Matches a non-word boundary. For example, `/\w\Bn/` matches 'on' in "noonday", and `/y\B\w/` matches 'ye' in "possibly yesterday."
\cX	Where X is a control character. Matches a control character in a string. For example, `/\cM/` matches control-M in a string.
\d	Matches a digit character. Equivalent to `[0-9]`. For example, `/\d/` or `/[0-9]/` matches '2' in "B2 is the suite number."
\D	Matches any non-digit character. Equivalent to `[^0-9]`. For example, `/\D/` or `/[^0-9]/` matches 'B' in "B2 is the suite number."
\f	Matches a form-feed.
\n	Matches a linefeed.
\r	Matches a carriage return.
\s	Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to `[ \f\n\r\t\v]`. For example, `/\s\w*/` matches ' bar' in "foo bar."
\S	Matches a single character other than white space. Equivalent to `[^ \f\n\r\t\v]`. For example, `/\S\w*/` matches 'foo' in "foo bar."
\t	Matches a tab.
\v	Matches a vertical tab.
\w	Matches any alphanumeric character including the underscore. Equivalent to `[A-Za-z0-9_]`. For example, `/\w/` matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W	Matches any non-word character. Equivalent to `[^A-Za-z0-9_]`. For example, `/\W/` or `/[^A-Za-z0-9_]/` matches '%' in "50%."
\n	Where n is a positive integer. A back reference to the last substring matching the nth parenthetical in the regular expression (counting left parentheses). For example, `/apple(,)\sorange\1/` matches 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows this table. Note: If the number of left parentheses is less than the number specified in \n, the \n is taken as an octal escape as described in the next row.
\ooctal \xhex	Where `\ooctal` is an octal escape value or `\xhex` is a hexadecimal escape value. Allows you to embed ASCII codes into regular expressions.
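
As a quick check of a few rows in the table, the following sketch runs some of the example patterns against their sample strings. The `samples` and `found` names are introduced here for illustration only, and the code assumes an engine that supports `const` and arrow functions rather than JavaScript 1.2.

```javascript
// Each entry pairs a pattern from the table with its sample string.
const samples = [
  [/bo*/, "A ghost booooed"],
  [/a{1,3}/, "caaaaaaandy"],
  [/\bn\w/, "noonday"],
  [/\wy\b/, "possibly yesterday"],
  [/apple(,)\sorange\1/, "apple, orange, cherry, peach"],
];

// exec returns null when nothing matches; otherwise element [0]
// of the returned array holds the matched text.
const found = samples.map(([re, text]) => {
  const m = re.exec(text);
  return m === null ? null : m[0];
});

console.log(found);
// [ 'boooo', 'aaa', 'no', 'ly', 'apple, orange,' ]
```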

Using Parentheses

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use, as described in "Using Parenthesized Substring Matches" on page 108.

For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (\d means numeric character, * means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.

This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapters 3 and 4", because that string has an 's' rather than a space after 'Chapter' and does not have a period after the '3'.
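
The behavior described above can be checked with a short sketch. The variable names `chapterRe`, `hit`, and `miss` are chosen here for illustration.

```javascript
const chapterRe = /Chapter (\d+)\.\d*/;

// Successful match: element [0] is the whole match and
// element [1] is the remembered chapter number.
const hit = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(hit[0]); // "Chapter 4.3"
console.log(hit[1]); // "4"

// Failed match: exec returns null.
const miss = chapterRe.exec("Chapters 3 and 4");
console.log(miss);   // null
```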

Working With Regular Expressions

Regular expressions are used with the regular expression methods test and exec and with the String methods match, replace, search, and split. These methods are explained in detail at their linked locations.

exec
A regular expression method that executes a search for a match in a string. It returns an array of information.

test
A regular expression method that tests for a match in a string. It returns true or false.

match
A String method that executes a search for a match in a string. It returns an array of information or null on a mismatch.

search
A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.

replace
A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring.

split
A String method that uses a regular expression or a fixed string to break a string into an array of substrings.

When you want to know whether a pattern is found in a string, use the test or search method; for more information (but slower execution) use the exec or match methods. If you use exec or match and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec method returns null (which converts to false).
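
To make the difference between these methods concrete, here is a minimal sketch that calls each of them with the same pattern and string. The names `pattern`, `subject`, and `results` are illustrative, and the `g` flag is left off so that `match` and `exec` return the same kind of array.

```javascript
const pattern = /d(b+)d/;
const subject = "cdbbdbsbz";

const results = {
  test: pattern.test(subject),          // true
  search: subject.search(pattern),      // 1
  exec: pattern.exec(subject),          // ["dbbd", "bb"], with index and input
  match: subject.match(pattern),        // same information as exec here
  failedExec: /xyz/.exec(subject),      // null
  failedSearch: subject.search(/xyz/),  // -1
};

console.log(results);
```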

For information on the predefined RegExp object and its properties, see Chapter 11, "The RegExp Object."

In the following example, the script uses the exec method to find a match in a string.

<SCRIPT LANGUAGE="JavaScript1.2">
myRe=/d(b+)d/g;
myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>

If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:

<SCRIPT LANGUAGE="JavaScript1.2">
myArray = /d(b+)d/g.exec("cdbbdbsbz");
</SCRIPT>

If you want to be able to recompile the regular expression, yet another alternative is this script:

<SCRIPT LANGUAGE="JavaScript1.2">
myRe = new RegExp("d(b+)d", "g");
myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>

With these scripts, the match succeeds and returns the array and updates the properties shown in Table 10.2.

Table 10.2 Results of regular expression execution.

Object	Property or Index	Description	In this example
myArray		The matched string and all remembered substrings	`["dbbd", "bb"]`
	index	The 0-based index of the match in the input string	`1`
	input	The original string	`"cdbbdbsbz"`
	[0]	The last matched characters	`"dbbd"`
myRe	lastIndex	The index at which to start the next match. (This property is set only if the regular expression uses the `g` option, described in "Executing a Global Search and Ignoring Case" on page 110.)	`5`
	source	The text of the pattern	`"d(b+)d"`
RegExp	lastMatch	The last matched characters	`"dbbd"`
	leftContext	The substring preceding the most recent match	`"c"`
	rightContext	The substring following the most recent match	`"bsbz"`

RegExp.leftContext and RegExp.rightContext can be computed from the other values. RegExp.leftContext is equivalent to:

myArray.input.substring(0, myArray.index)

and RegExp.rightContext is equivalent to:

myArray.input.substring(myArray.index + myArray[0].length)
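
As a worked check of the two expressions above, the following sketch computes both contexts from the returned array alone. The names `left` and `right` are introduced here for illustration.

```javascript
const myArray = /d(b+)d/.exec("cdbbdbsbz");

// Everything before the match starts.
const left = myArray.input.substring(0, myArray.index);

// Everything after the match ends.
const right = myArray.input.substring(myArray.index + myArray[0].length);

console.log(left);  // "c"
console.log(right); // "bsbz"
```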

As shown in the second form of this example, you can use the literal form of a regular expression without assigning it to a variable. If you do, however, every occurrence of the literal is a new regular expression. For this reason, if you use the literal form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:

<SCRIPT LANGUAGE="JavaScript1.2">
myRe=/d(b+)d/g;
myArray = myRe.exec("cdbbdbsbz");
document.writeln("The value of lastIndex is " + myRe.lastIndex);
</SCRIPT>

This script displays:

The value of lastIndex is 5

However, if you have this script:

<SCRIPT LANGUAGE="JavaScript1.2">
myArray = /d(b+)d/g.exec("cdbbdbsbz");
document.writeln("The value of lastIndex is " + /d(b+)d/g.lastIndex);
</SCRIPT>

It displays:

The value of lastIndex is 0

The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a literal regular expression, you should first assign it to a variable.
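
One reason to keep the regular expression in a variable is that its lastIndex property lets repeated calls to exec walk through a string. The sketch below assumes a slightly longer sample string than the one used above (a second match, "dbd", is appended for illustration); the `text`, `log`, and `m` names are also illustrative.

```javascript
const myRe = /d(b+)d/g;
const text = "cdbbdbsbz dbd"; // hypothetical input with two matches

const log = [];
let m;
// Each successful exec resumes at myRe.lastIndex and then advances it.
while ((m = myRe.exec(text)) !== null) {
  log.push({ match: m[0], index: m.index, lastIndex: myRe.lastIndex });
}

console.log(log);
// After the failed final exec, lastIndex is reset to 0.
console.log(myRe.lastIndex);
```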

Using Parenthesized Substring Matches

Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, /a(b)c/ matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the RegExp properties $1, ..., $9 or the Array elements [1], ..., [n].

The number of possible parenthesized substrings is unlimited. The predefined RegExp object holds up to the last nine and the returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.

Example 1: The following script uses the replace method to switch the words in the string. For the replacement text, the script uses the values of the $1 and $2 properties.

<SCRIPT LANGUAGE="JavaScript1.2">
re = /(\w+)\s(\w+)/;
str = "John Smith";
newstr = str.replace(re, "$2, $1");
document.write(newstr)
</SCRIPT>

This prints "Smith, John".

Example 2: In the following example, RegExp.input is set by the Change event. In the getInfo function, the exec method uses the value of RegExp.input as its argument. Note that RegExp must be prepended to its $ properties (because they appear outside the replacement string). (Example 3 is a more efficient, though possibly more cryptic, way to accomplish the same thing.)

<HTML>

<SCRIPT LANGUAGE="JavaScript1.2">
function getInfo(){
   re = /(\w+)\s(\d+)/
   re.exec();
   window.alert(RegExp.$1 + ", your age is " + RegExp.$2);
}
</SCRIPT>

Enter your first name and your age, and then press Enter.

<FORM>
<INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);">
</FORM>

</HTML>

Example 3: The following example is similar to Example 2. Instead of using the RegExp.$1 and RegExp.$2, this example creates an array and uses a[1] and a[2]. It also uses the shortcut notation for using the exec method.

<HTML>

<SCRIPT LANGUAGE="JavaScript1.2">
function getInfo(){
   a = /(\w+)\s(\d+)/();
   window.alert(a[1] + ", your age is " + a[2]);
}
</SCRIPT>

Enter your first name and your age, and then press Enter.

<FORM>
<INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);">
</FORM>

</HTML>
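
Examples 2 and 3 rely on RegExp.input being filled in by the Change event and on calling a regular expression directly as a function. In engines that follow the later ECMAScript standard, neither is available, so the string has to be passed to exec explicitly. The following is a minimal sketch under that assumption; `describePerson` is a hypothetical helper, not part of the original examples.

```javascript
// Hypothetical helper: receives the typed text directly instead of
// relying on RegExp.input, and returns the message to display.
function describePerson(text) {
  const a = /(\w+)\s(\d+)/.exec(text);
  if (a === null) {
    return null; // no "name age" pair found
  }
  return a[1] + ", your age is " + a[2];
}

console.log(describePerson("Fred 42")); // "Fred, your age is 42"
console.log(describePerson("Fred"));    // null
```

In a page, such a helper could be called from the onChange handler with the field's value, for example `describePerson(this.value)`.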

Executing a Global Search and Ignoring Case

Regular expressions have two optional flags that allow for global and case insensitive searching. To indicate a global search, use the g flag. To indicate a case insensitive search, use the i flag. These flags can be used separately or together in either order, and are included as part of the regular expression.

To include a flag with the regular expression, use this syntax:

re = /pattern/[g|i|gi]
re = new RegExp("pattern", ['g'|'i'|'gi'])

Note that the flags, i and g, are an integral part of a regular expression. They cannot be added or removed later.

For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.

<SCRIPT LANGUAGE="JavaScript1.2">
re = /\w+\s/g;
str = "fee fi fo fum";
myArray = str.match(re);
document.write(myArray);
</SCRIPT>

This displays ["fee ", "fi ", "fo "]. In this example, you could replace the line:

re = /\w+\s/g;

with:

re = new RegExp("\\w+\\s", "g");

and get the same result.
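
The effect of each flag can be seen by matching the same string three ways. This is a small sketch with an illustrative sample string (`sample`) that mixes upper and lower case.

```javascript
const sample = "Fee fi FO fum";

// No flags: only the first case-sensitive match is returned.
const first = sample.match(/f\w+/);

// g: every case-sensitive match is returned.
const allLower = sample.match(/f\w+/g);

// gi (or ig): every match is returned, ignoring case.
const allAnyCase = sample.match(/f\w+/gi);

console.log(first[0]);   // "fi"
console.log(allLower);   // [ 'fi', 'fum' ]
console.log(allAnyCase); // [ 'Fee', 'fi', 'FO', 'fum' ]
```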

Examples

Changing the Order in an Input String

The following example illustrates the formation of regular expressions and the use of string.split() and string.replace().

It cleans a roughly-formatted input string containing names (first name first) separated by blanks, tabs and exactly one semicolon.

Finally, it reverses the name order (last name first) and sorts the list.

<SCRIPT LANGUAGE="JavaScript1.2">

// The name string contains multiple spaces and tabs,
// and may have multiple spaces between first and last names.
names = new String ( "Harry Trump ;Fred Barney; Helen Rigby ;\
       Bill Abel ;Chris Hand ")

document.write ("---------- Original String" + "<BR>" + "<BR>")
document.write (names + "<BR>" + "<BR>")

// Prepare two regular expression patterns and array storage.
// Split the string into array elements.

// pattern: possible white space then semicolon then possible white space
pattern = /\s*;\s*/

// Break the string into pieces separated by the pattern above
// and store the pieces in an array called nameList
nameList = names.split (pattern)

// new pattern: one or more characters then spaces then characters.
// Use parentheses to "memorize" portions of the pattern.
// The memorized portions are referred to later.
pattern = /(\w+)\s+(\w+)/

// New array for holding names being processed.
bySurnameList = new Array;

// Display the name array and populate the new array
// with comma-separated names, last first.
//
// The replace method removes anything matching the pattern
// and replaces it with the memorized string--second memorized portion
// followed by comma space followed by first memorized portion.
// 
// The variables $1 and $2 refer to the portions
// memorized while matching the pattern.

document.write ("---------- After Split by Regular Expression" + "<BR>")
for ( i = 0; i < nameList.length; i++) {
   document.write (nameList[i] + "<BR>")
   bySurnameList[i] = nameList[i].replace (pattern, "$2, $1")
}

// Display the new array.
document.write ("---------- Names Reversed" + "<BR>")
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}

// Sort by last name, then display the sorted array.
bySurnameList.sort()
document.write ("---------- Sorted" + "<BR>")
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}

document.write ("---------- End" + "<BR>")

</SCRIPT>
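
For comparison, the same split, reorder, and sort steps can be condensed into a single chain. This is only a sketch: it assumes an engine that provides `trim`, `map`, and arrow functions (none of which exist in JavaScript 1.2), writes to the console instead of the document, and trims the input first so the last name carries no trailing space. The name `bySurname` is illustrative.

```javascript
const names = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ";

const bySurname = names
  .trim()                     // drop leading and trailing white space
  .split(/\s*;\s*/)           // split on semicolons and surrounding white space
  .map((name) => name.replace(/(\w+)\s+(\w+)/, "$2, $1")) // last name first
  .sort();                    // sort by last name

console.log(bySurname.join("\n"));
// Abel, Bill
// Barney, Fred
// Hand, Chris
// Rigby, Helen
// Trump, Harry
```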

Using Special Characters to Verify Input

In the following example, a user enters a phone number. When the user presses Enter, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script posts a window thanking the user and confirming the number. If the number is invalid, the script posts a window telling the user that the phone number isn't valid.

The regular expression looks for zero or one open parenthesis \(?, followed by three digits \d{3}, followed by zero or one close parenthesis \)?, followed by one dash, forward slash, or decimal point, which is remembered ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.

The Change event, activated when the user presses Enter, sets the value of RegExp.input.

<HTML>
<SCRIPT LANGUAGE = "JavaScript1.2">

re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/

function testInfo() {
   OK = re.exec()
   if (!OK)
      window.alert (RegExp.input + 
         " isn't a phone number with area code!")
   else
      window.alert ("Thanks, your phone number is " + OK[0])
}

</SCRIPT>

Enter your phone number (with area code) and then press Enter.
<FORM> 
<INPUT TYPE="TEXT" NAME="Phone" onChange="testInfo(this);">
</FORM>

</HTML>
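
The pattern can also be exercised without a form. The sketch below wraps it in a hypothetical `checkPhone` function that takes the entered text as an argument and returns the message instead of showing an alert; the sample numbers are made up for illustration.

```javascript
const phoneRe = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

// Hypothetical helper: returns the message the script would display.
function checkPhone(text) {
  const ok = phoneRe.exec(text);
  if (ok === null) {
    return text + " isn't a phone number with area code!";
  }
  return "Thanks, your phone number is " + ok[0];
}

console.log(checkPhone("(555)-123-4567")); // accepted
console.log(checkPhone("555.123.4567"));   // accepted
console.log(checkPhone("555-123.4567"));   // rejected: separators differ
```

Because the pattern is not anchored with ^ and $, it accepts a valid number that appears anywhere inside longer text, and `ok[0]` then holds only the number itself.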


Last Updated: 10/22/97 11:48:12
