Inside Technique: Easy Cross-Browser Form Validation Using Regular Expressions
By Karen Gayda

Introduction

Validating user input is the bane of every software developer’s existence. When you are developing cross-browser web applications (for IE and Netscape v4.0 or higher), this task becomes even less enjoyable due to the lack of useful intrinsic validation functions in JavaScript. Fortunately, JavaScript 1.2 has incorporated regular expressions. In this article I will present a brief tutorial on the basics of regular expressions and then give some examples of how they can be used to simplify data validation. A demonstration page and a code library of common validation functions have been included to supplement the examples in the article.

Regular Expressions and Patterns

Regular expressions are very powerful tools for performing pattern matches. Perl programmers and UNIX shell programmers have enjoyed the benefits of regular expressions for years. Once you master the pattern language, most validation tasks become trivial. With regular expressions, you can perform complex tasks that once required lengthy procedures in just a few lines of code.

So how are regular expressions implemented in JavaScript? There are two intrinsic objects associated with programming regular expressions: the RegExp object and the Regular Expression object. The RegExp object is the parent to the Regular Expression object. RegExp has a constructor function that instantiates a Regular Expression object, much like the Date object instantiates a new date. If you wanted to create a Regular Expression object, you would use the following syntax:

var RegularExpression = new RegExp( "pattern", ["switch"] );

JavaScript has an alternate syntax for creating Regular Expression objects that implicitly calls the RegExp constructor function. The syntax for that method is the following:

var RegularExpression = /pattern/[switch]
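As a quick illustration, the two forms below should produce equivalent objects. The variable names are hypothetical. One detail worth noting: when the pattern is passed to the constructor as a string, each backslash has to be doubled, because the string literal consumes one level of escaping.

```javascript
// Two ways to create the same regular expression (names are illustrative only).
var objFromConstructor = new RegExp("^\\d{5}$", "i"); // backslash doubled inside a string
var objFromLiteral = /^\d{5}$/i;                      // literal syntax, no doubling needed

// Both objects hold the same pattern text and give the same answer.
var blnSamePattern = (objFromConstructor.source === objFromLiteral.source);
var blnBothMatch = objFromConstructor.test("92101") && objFromLiteral.test("92101");
```

The literal form is usually easier to read; the constructor form is handy when the pattern has to be assembled from a string at run time.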

To use the Regular Expression object to validate the user input you must be able to define a pattern string that represents the search criteria. Patterns are defined using string literal characters and metacharacters. For example, to determine if a string contained a valid US zip code you would use the following search pattern:

/(^\d{5}$)|(^\d{5}-\d{4}$)/

At first glance this looks like a comic strip version of something I might say when my code won’t run. It is actually a pattern that you can use to confirm that a string contains a valid 5-digit zip code or zip+4 zip code. The pattern is divided into two parts. Regular expressions use parentheses for grouping and precedence like mathematical expressions. The part in the first set of parentheses matches a 5-digit zip code. The pipe symbol in between denotes an OR operation. The part contained in the second set of parentheses matches a zip+4 zip code.

For simplicity, let’s deconstruct just the first part of the pattern, ^\d{5}$.

  • ^ indicates the beginning of the string. Using a ^ metacharacter requires that the match start at the beginning.
  • \d indicates a digit character and the {5} following it means that there must be 5 consecutive digit characters.
  • $ indicates the end of the string. Using a $ metacharacter requires that the match end at the end of the string.

Translated to English, this pattern states: “Starting at the beginning of the string there must be nothing other than 5 digits. There must also be nothing following those 5 digits.”
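To see the full zip code pattern at work, here is a minimal sketch that wraps it in a function. The function name isValidZip and the sample values are assumptions made for illustration.

```javascript
// Hypothetical helper built around the zip code pattern discussed above.
function isValidZip(strValue) {
  var objRegExp = /(^\d{5}$)|(^\d{5}-\d{4}$)/;
  return objRegExp.test(strValue);
}

var blnFiveDigit = isValidZip("92101");      // 5-digit form
var blnZipPlus4 = isValidZip("92101-1234");  // zip+4 form
var blnTooShort = isValidZip("9210");        // only 4 digits
var blnTrailing = isValidZip("92101 USA");   // extra characters after the digits
```

The first two calls return true; the last two return false because the ^ and $ anchors leave no room for missing or extra characters.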

Categories Of Regular Expression Pattern Characters

Pattern-matching characters can be grouped into several categories. The following are categorized tables explaining the use of the pattern-matching characters.

Position Matching

Symbol Function
^

Only matches the beginning of a string.

"^P" matches first "P" in "Paul Peterson, President."

$

Only matches the ending of a string.

"t$" matches the last "t" in "A cat in the hat"

\b

Matches any word boundary (test characters must exist at the beginning or end of a word within the string)

"ly\b" matches "ly" in "regular expressions are really cool."

\B

Matches any non-word boundary

“\Bor” matches the “or” in normal but not the one in origami.
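The position-matching examples in the table can be tried out with a short sketch like the following. The variable names are hypothetical; the patterns and sample strings come from the table.

```javascript
// ^ and $ anchor the match to the start and end of the string.
var blnStartsWithP = /^P/.test("Paul Peterson, President.");
var blnEndsWithT = /t$/.test("A cat in the hat");
var blnEndsWithC = /c$/.test("A cat in the hat"); // "c" is not the last character

// \b requires a word boundary: "ly" must be at the end of a word.
var blnLyAtWordEnd = /ly\b/.test("regular expressions are really cool.");

// \B requires a non-boundary: "or" must not start a word.
var blnOrInNormal = /\Bor/.test("normal");
var blnOrInOrigami = /\Bor/.test("origami");
```

Everything here evaluates to true except blnEndsWithC and blnOrInOrigami.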

Literals

Symbol Function
Alphanumeric

Matches alphabetical and numerical characters literally.

“2 days” matches “2 days”

\n

Matches a new line character

\f

Matches a form feed character

\r

Matches carriage return character

\t

Matches a horizontal tab character

\v

Matches a vertical tab character

\?

Matches ?

\*

Matches *

\+

Matches +

\.

Matches .

\|

Matches |

\{

Matches {

\}

Matches }

\\

Matches \

\[

Matches [

\]

Matches ]

\(

Matches (

\)

Matches )

\xxx

Matches the ASCII character expressed by the octal number xxx.

"\50" matches left parentheses character "("

\xdd

Matches the ASCII character expressed by the hex number dd.

"\x28" matches left parentheses character "("

\uxxxx

Matches the Unicode character expressed by the hex number xxxx.

"\u00A3" matches "£".

Character Classes

Symbol Function
[xyz]

Match any one character enclosed in the character set.

"/[AN]BC/" matches ABC and NBC but not BBC since the leading “B” is not in the set.

[^xyz]

Match any one character not enclosed in the character set. The caret indicates that none of the characters in the set may match.

"/[^AN]BC/" matches BBC but not ABC or NBC.

NOTE: the caret used within a character class is not to be confused with the caret that denotes the beginning of a string. Negation is only performed within the square brackets.

.

Match any single character except the newline character.

"/b.t/" matches bat, bit, but, bet.

\w

Match any single word character (a letter, digit, or underscore, as opposed to punctuation or whitespace). Equivalent to [a-zA-Z_0-9].

\W

Match any single non-word character. Equivalent to [^a-zA-Z_0-9].

\d

Match any single digit. Equivalent to [0-9].

\D

Match any non-digit. Equivalent to [^0-9].

\s

Match any single space character. Equivalent to [ \t\r\n\v\f].

\S

Match any single non-space character. Equivalent to [^ \t\r\n\v\f].
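A few of the character classes above, sketched as quick tests. The variable names and the extra sample strings ("user_name42", "a1b", and so on) are made up for illustration.

```javascript
// Character sets and negated sets
var blnABC = /[AN]BC/.test("ABC");     // "A" is in the set
var blnBBC = /[AN]BC/.test("BBC");     // "B" is not in the set
var blnNotAN = /[^AN]BC/.test("BBC");  // "B" is outside the negated set

// The dot matches any single character between "b" and "t"
var blnDot = /b.t/.test("bit");

// Shorthand classes
var blnWordOnly = /^\w+$/.test("user_name42"); // letters, digits, underscore only
var blnHasSpace = /\s/.test("user name");      // contains a space character
var blnHasDigit = /\d/.test("a1b");            // contains a digit
var blnAllDigits = /\D/.test("12345");         // looks for any non-digit
```

Only blnBBC and blnAllDigits come out false; the name blnAllDigits is a little loose, since a false result from the \D test is what tells you the string is all digits.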

Repetition

Symbol Function
{x}

Match exactly x occurrences of a regular expression.

"\d{5}" matches 5 digits.

{x,}

Match x or more occurrences of a regular expression.

"\s{2,}" matches at least 2 space characters.

{x,y}

Matches x to y number of occurrences of a regular expression.

"\d{2,3}" matches at least 2 but no more than 3 digits.

?

Match zero or one occurrences. Equivalent to {0,1}.

"a\s?b" matches "ab" or "a b".

*

Match zero or more occurrences. Equivalent to {0,}.

+

Match one or more occurrences. Equivalent to {1,}.
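The repetition symbols are easiest to compare side by side. The following is a small sketch with hypothetical variable names; the anchors ^ and $ are added so that each test looks at the whole string.

```javascript
// {x}: exactly x occurrences
var blnFive = /^\d{5}$/.test("12345");
var blnFour = /^\d{5}$/.test("1234");

// {x,}: x or more occurrences
var blnTwoSpaces = /\s{2,}/.test("a  b");
var blnOneSpace = /\s{2,}/.test("a b");

// {x,y}: between x and y occurrences
var blnThreeDigits = /^\d{2,3}$/.test("123");
var blnFourDigits = /^\d{2,3}$/.test("1234");

// ?: zero or one, *: zero or more, +: one or more
var blnOptional = /^a\s?b$/.test("ab") && /^a\s?b$/.test("a b");
var blnStarZero = /^ab*c$/.test("ac");  // zero "b" characters is allowed
var blnPlusZero = /^ab+c$/.test("ac");  // at least one "b" is required
var blnPlusMany = /^ab+c$/.test("abbbc");
```

Here blnFour, blnOneSpace, blnFourDigits and blnPlusZero are false; the rest are true.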

Alternation & Grouping

Symbol Function
()

Groups one or more clauses to create a single clause. May be nested.

"(abc)+(def)" matches one or more occurrences of "abc" followed by one occurrence of "def".

|

Alternation combines clauses into one regular expression and then matches any of the individual clauses.

"(ab)|(cd)|(ef)" matches "ab" or "cd" or "ef".

Backreferences

Symbol Function
()\n

Matches a parenthesized clause in the pattern string. n is the number of the clause to the left of the backreference.

"(\w+)\s+\1" matches any word that occurs twice in a row, such as "hubba hubba." The \1 denotes that the first word after the space must match the portion of the string that matched the pattern in the last set of parentheses. If there were more than one set of parentheses in the pattern string you would use \2 or \3 to match the appropriate grouping to the left of the backreference. Up to 9 backreferences can be used in a pattern string.

Pattern Switches

In addition to the pattern-matching characters, you can use switches to make the match global or case-insensitive or both. The following is an example of a pattern string definition that uses a switch:

/\s/g

This pattern and switch combination matches all occurrences of a space because it uses the global switch. Below is a table of pattern switches.

Switches

Property Description
i

Ignore the case of characters.

g

Global search for all occurrences of a pattern

gi

Global search, ignore case.
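The effect of each switch is easiest to see with the String replace() method. This is a minimal sketch; the sample text and variable names are assumptions for illustration.

```javascript
var strText = "Red red RED";

// No switch: case-sensitive, first match only
var strNoSwitch = strText.replace(/red/, "blue");
// i: ignore case, still first match only
var strIgnoreCase = strText.replace(/red/i, "blue");
// g: every match, but still case-sensitive
var strGlobal = strText.replace(/red/g, "blue");
// gi: every match, regardless of case
var strBoth = strText.replace(/red/gi, "blue");

// The /\s/g pattern from above removes every space character
var strNoSpaces = "a b c".replace(/\s/g, "");
```

The results are "Red blue RED", "blue red RED", "Red blue RED", "blue blue blue" and "abc", in that order.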


The Regular Expression Object

The Regular Expression object exposes three methods and two properties. This is the object that performs the work of pattern-matching. The following are tables describing the methods and properties of the regular expression object.

Methods

Method Description
test(string)

Tests a string for pattern matches. This method returns a Boolean that indicates whether or not the specified pattern exists within the searched string. This is the most commonly used method for validation. It updates some of the properties of the parent RegExp object following a successful search.

exec(string)

Executes a search for a pattern within a string. If the pattern is not found, exec() returns a null value. If it finds one or more matches it returns an array of the match results. It also updates some of the properties of the parent RegExp object.

compile(pattern)

Compiles a regular expression pattern. When a regular expression is instantiated, the initial pattern is automatically compiled. Resetting the value of the pattern is slow. To speed up execution of your scripts it is recommended that you use the compile method if you plan to repeatedly update the pattern (such as in a loop).

Properties

Property Description
source

Stores a copy of the regular expression pattern.

lastIndex

The index from which to begin the next search.
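The following sketch shows test(), exec(), source and lastIndex working together on a pattern that uses the global switch. The sample string and variable names are hypothetical.

```javascript
var objRegExp = /\d+/g;
var strInput = "Order 66 shipped in 3 boxes";

// test() finds "66" and, because of the g switch, remembers where it stopped
var blnFound = objRegExp.test(strInput);
var intNext = objRegExp.lastIndex;   // position just past "66"

// exec() continues from lastIndex, so it finds the next number
var arrMatch = objRegExp.exec(strInput);
var strSecond = arrMatch[0];

// No numbers are left: exec() returns null and lastIndex goes back to 0
var arrNone = objRegExp.exec(strInput);
var intReset = objRegExp.lastIndex;

// source holds the pattern text without the slashes or switches
var strPattern = objRegExp.source;
```

For simple validation with test() you normally leave the g switch off, so that each call starts at the beginning of the string.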

The RegExp Object

The RegExp object exposes four properties and no methods. This is the parent to the regular expression object. It is used to instantiate the regular expression object with its constructor and also to store information about its children’s pattern match searches. The RegExp object cannot be created directly, but it is always available. The RegExp object’s properties remain undefined until a child regular expression object successfully performs a search. The following is a table describing the properties of the RegExp object.

Properties

Property Description
$n

n represents a number from 1 to 9
Stores the nine most recently memorized portions of a parenthesized match pattern. For example, if the pattern used by a regular expression for the last match was /(Hello)(\s+)(world)/ and the string being searched was “Hello world” the contents of RegExp.$2 would be all of the space characters between “Hello” and “world”.

index

Stores the beginning character position of the first successful match found in the searched string.

input

Stores the string against which a search was performed.

lastIndex

Stores the beginning character position of the last successful match found in the searched string. If no match was found, the lastIndex property is set to –1.
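The same match information can also be read from the array that exec() returns, which is the approach sketched here. Reading from the array rather than from the RegExp object is my assumption about what is most portable, not something the tables above require. The sample string and variable names are hypothetical.

```javascript
var objRegExp = /(Hello)(\s+)(world)/;
var arrResult = objRegExp.exec("Say Hello   world");

// Element 0 is the whole match; elements 1..n are the parenthesized portions
var strWhole = arrResult[0];
var strSpaces = arrResult[2];      // the same text the article describes for $2

// The array also records where the match began and what was searched
var intStart = arrResult.index;
var strSearched = arrResult.input;
```

In this example strSpaces holds the three space characters between "Hello" and "world", and intStart is 4 because the match begins after "Say ".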

Other Uses For Regular Expressions

Although you can use regular expressions to test a string for validity against a search pattern, there are other uses for them. The String object has four methods that take regular expressions as arguments. Although most of the String methods parallel the methods of the Regular Expression object, the most useful one by far is the replace() method.

You can use the replace() method to reformat a string. This is accomplished by using the $1…$9 properties of the RegExp object. Those properties are populated with the contents of the portions of the searched string that matched the portions of the search pattern contained within parentheses. The following example illustrates how to use the replace method to swap the order of first and last names and insert a comma and a space in between them:

<SCRIPT LANGUAGE="JavaScript1.2">
  var objRegExp = /(\w+)\s(\w+)/;
  var strFullName = 'Jane Doe';
  // Call replace() on the original name; $2 and $1 refer to the captured groups
  var strReverseName = strFullName.replace(objRegExp, '$2, $1');
  document.write(strReverseName);
</SCRIPT>

The output of this code will be “Doe, Jane”. How this works is that the pattern in the first parentheses matches “Jane” and this string is placed in the RegExp.$1 property. The \s (space) character match is not saved to the RegExp object because it is not in parentheses. The pattern in the second set of parentheses matches “Doe” and is saved to the RegExp.$2 property. The String replace() method takes the Regular Expression object as its first argument and the replacement text as the second argument. The $2 and $1 in the replacement text are substitution variables that will substitute the contents of RegExp.$2 and RegExp.$1 in the result string.

You can also use the replace() method to strip unwanted characters from a string before testing the string for validity or before saving the string to a database. It can be used to add formatting characters for the display of a string as well.
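Both ideas, stripping and formatting, can be sketched with a phone number. The function names stripNonDigits and formatPhone are hypothetical, and the 10-digit US layout is an assumption made for the example.

```javascript
// Remove everything that is not a digit (hypothetical helper).
function stripNonDigits(strValue) {
  return strValue.replace(/\D/g, "");
}

// Add display formatting to a 10-digit string (hypothetical helper).
// If the input is not exactly 10 digits, nothing matches and it is returned unchanged.
function formatPhone(strDigits) {
  return strDigits.replace(/^(\d{3})(\d{3})(\d{4})$/, "($1) $2-$3");
}

var strRaw = stripNonDigits("619.555.1212");
var strDisplay = formatPhone(strRaw);
var strUntouched = formatPhone("5551212");
```

Here strRaw is "6195551212", strDisplay is "(619) 555-1212", and strUntouched stays "5551212".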

The following is a table listing the methods of the String object that use regular expressions.

String Methods Using Regular Expressions

Method Description
match( regular expression )

Returns, as an array, the results of search using the supplied Regular Expression object. Similar to the exec() method of the Regular Expression object. Also updates the $1…$9 properties in the RegExp object.

replace( regular expression, replacement text )

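Returns a copy of the string in which the text matched by the regular expression is replaced with the replacement text. The replacement text may include $1…$9 to substitute the corresponding RegExp properties.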

split ( string literal or regular expression )

Returns the array of strings that results when a string is separated into substrings. Splitting is done based on the occurrences of the string literal or regular expression matches.

search( regular expression )

Returns the position of first substring match in a regular expression search. Similar to reading the index property of the RegExp object after executing the exec() or test() methods.
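The other three String methods can be sketched as follows. The sample strings and variable names are made up for illustration.

```javascript
// split(): break a list apart on commas or semicolons, ignoring surrounding spaces
var strList = "apples, pears;plums ,  figs";
var arrFruit = strList.split(/\s*[,;]\s*/);

// search(): position of the first match, or -1 if there is none
var intPos = "Call 555-1212 today".search(/\d{3}-\d{4}/);
var intMissing = "no digits here".search(/\d/);

// match(): array holding the whole match followed by the parenthesized portions
var arrFound = "Call 555-1212 today".match(/(\d{3})-(\d{4})/);
var strExchange = arrFound[1];
var strLine = arrFound[2];
```

arrFruit ends up with the four fruit names, intPos is 5, intMissing is -1, and the two captured portions are "555" and "1212".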


Sample Usage

Now that you’ve been introduced to regular expressions and patterns, let’s look at a few examples of common validation and formatting functions.

Valid Phone Number

Assuming the area code and phone are not separate fields, a valid phone number would consist of the area code contained within parentheses, maybe a space, 3 digits, a dash, and 4 more digits. There should be no leading or trailing characters. The first number of the area code may not be a zero. A regular expression to perform that task would look like this:

var objRegExp  = /^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$/;
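Wrapped in a function, the phone number check might look like the sketch below. The function name isValidPhone and the sample numbers are assumptions for illustration.

```javascript
// Hypothetical wrapper around the phone number pattern above.
function isValidPhone(strValue) {
  var objRegExp = /^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$/;
  return objRegExp.test(strValue);
}

var blnWithSpace = isValidPhone("(619) 555-1212"); // space after the area code
var blnNoSpace = isValidPhone("(619)555-1212");    // the space is optional
var blnLeadingZero = isValidPhone("(019) 555-1212"); // area code starts with zero
var blnNoParens = isValidPhone("619-555-1212");      // parentheses are required
```

The first two calls return true and the last two return false.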

Valid Date Format

A valid short date should consist of a 1- or 2-digit month, a date separator, a 1- or 2-digit day, the same date separator, and a 4-digit year (e.g. 02/02/2000). It would be nice to allow the user to use any valid date separator character that your backend database supports, such as slashes, dashes and periods. You want to be sure the user enters the same date separator character for all occurrences. This can be done with a regular expression like this:

var objRegExp = /^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/

This example uses backreferencing to ensure that the second date separator matches the first one.
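Here is a minimal sketch of the date pattern in use. The function name isValidDateFormat is hypothetical. Keep in mind that the pattern checks only the layout of the string, so a value such as 99/99/2000 still passes; range checking would need additional code.

```javascript
// Hypothetical wrapper around the date format pattern above.
function isValidDateFormat(strValue) {
  var objRegExp = /^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/;
  return objRegExp.test(strValue);
}

var blnSlashes = isValidDateFormat("02/02/2000");
var blnDashes = isValidDateFormat("2-2-2000");
var blnMixed = isValidDateFormat("02/02-2000");    // separators differ, so \1 fails
var blnShortYear = isValidDateFormat("02/02/00");  // year must have 4 digits
var blnBadRange = isValidDateFormat("99/99/2000"); // passes: format only, no range check
```

blnMixed and blnShortYear are false; the other three are true.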

Valid Integer

A valid integer value should contain only digits plus possibly a leading minus sign for negative numbers. A regular expression to do that would look like this:

var objRegExp  = /(^-?\d\d*$)/;
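As a sketch, the integer pattern can be wrapped the same way. The function name isValidInteger and the sample values are assumptions for illustration.

```javascript
// Hypothetical wrapper around the integer pattern above.
function isValidInteger(strValue) {
  var objRegExp = /(^-?\d\d*$)/;
  return objRegExp.test(strValue);
}

var blnPositive = isValidInteger("42");
var blnNegative = isValidInteger("-7");
var blnDecimal = isValidInteger("3.14");  // decimal point is not allowed
var blnCommas = isValidInteger("1,000");  // commas are not allowed
var blnMinusOnly = isValidInteger("-");   // at least one digit is required
var blnEmpty = isValidInteger("");
```

Only the first two calls return true.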

Remove Commas

A user may enter a number separated by commas. You may wish to remove the commas before storing the number. A reformatting function to do that would look like this:

function removeCommas( strValue ) {
  var objRegExp = /,/g; //search for commas globally
  //replace all matches with empty strings
  return strValue.replace(objRegExp,'');
}
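A short usage sketch: the function is repeated here so the example runs on its own, and the sample value and the parseInt() step are assumptions about how you might use the result.

```javascript
// Same function as above, repeated so this example is self-contained.
function removeCommas( strValue ) {
  var objRegExp = /,/g; //search for commas globally
  return strValue.replace(objRegExp,'');
}

var strClean = removeCommas("1,234,567");  // commas stripped from the text
var intValue = parseInt(strClean, 10);     // now safe to convert to a number
var strNoChange = removeCommas("1234");    // no commas, so nothing changes
```

strClean is "1234567", intValue is the number 1234567, and strNoChange is still "1234".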

Demonstration Page

As you can see, complex validation and character replacement tasks can be accomplished quite easily. I have written a library of validation and formatting functions that can be re-used in your applications. This library can be accessed and tested via this demonstration page.


Figure 1. Demonstration Page

Summary

This article presented a brief introduction to regular expressions and showed you how they can be used to simplify cross-browser validation tasks. Regular expressions have a host of other uses and are found in many languages. VBScript 5.0 now incorporates regular expressions as part of its object model. To fully understand and appreciate the power of regular expressions, I recommend that you read further on the subject.

Further Reading

To learn more about regular expressions, I suggest the following:

  • JavaScript Bible, 3rd Edition by Danny Goodman, http://www.idgbooks.com/
  • Myriad Voices Homepage, http://www.myriadvoices.com/
  • Microsoft Developer Network, http://msdn.microsoft.com/
  • Webreference.com’s JavaScript section, http://www.webreference.com/js/column5/
  • Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools by Jeffrey Friedl, http://public.yahoo.com/~jfriedl/regex/

About the Author

Karen Gayda is a Senior Software Engineer in the Internet Services Division at Stellcom Technologies located in San Diego, CA. Her specialties include VB COM, Active Server Pages, Dynamic HTML, SQL Server, and scripting languages. She can be reached at kgayda@yahoo.com.