Inside Technique: Easy Cross-Browser Form Validation Using Regular Expressions

Introduction

Validating user input is the bane of every software developer’s existence. When you are developing cross-browser web applications (for IE and Netscape v4.0 or higher), this task becomes even less enjoyable due to the lack of useful intrinsic validation functions in JavaScript. Fortunately, JavaScript 1.2 has incorporated regular expressions. In this article I will present a brief tutorial on the basics of regular expressions and then give some examples of how they can be used to simplify data validation. A demonstration page and a code library of common validation functions have been included to supplement the examples in the article.

Regular Expressions and Patterns

Regular expressions are very powerful tools for performing pattern matches. Perl programmers and UNIX shell programmers have enjoyed the benefits of regular expressions for years. Once you master the pattern language, most validation tasks become trivial. With regular expressions, a few lines of code can perform complex tasks that once required lengthy procedures.

So how are regular expressions implemented in JavaScript? There are two intrinsic objects associated with programming regular expressions: the RegExp object and the Regular Expression object. The RegExp object is the parent to the Regular Expression object. RegExp has a constructor function that instantiates a Regular Expression object much like the Date object instantiates a new date. If you wanted to create a Regular Expression object, you would use the following syntax:

var RegularExpression = new RegExp("pattern", ["switch"]);

JavaScript has an alternate syntax for creating Regular Expression objects that implicitly calls the RegExp constructor function.
The syntax for that method is the following:

var RegularExpression = /pattern/[switch]

To use the Regular Expression object to validate user input, you must be able to define a pattern string that represents the search criteria. Patterns are defined using string literal characters and metacharacters. For example, to determine whether a string contains a valid US zip code, you would use the following search pattern:

/(^\d{5}$)|(^\d{5}-\d{4}$)/

At first glance this looks like a comic strip version of something I might say when my code won’t run. It is actually a pattern that you can use to confirm that a string contains a valid 5-digit zip code or zip+4 zip code. The pattern is divided into two parts. Regular expressions use parentheses for grouping and precedence, like mathematical expressions. The part in the first set of parentheses matches a 5-digit zip code. The pipe symbol in between denotes an OR operation. The part contained in the second set of parentheses matches a zip+4 zip code. For simplicity, let’s deconstruct just the first part of the pattern, ^\d{5}$.
Translated to English, this pattern states: “Starting at the beginning of the string there must be nothing other than 5 digits. There must also be nothing following those 5 digits.” Here ^ anchors the match to the start of the string, \d matches any single digit, {5} requires exactly five of them, and $ anchors the match to the end of the string.

Categories of Regular Expression Pattern Characters

Pattern-matching characters can be grouped into several categories. The following are categorized tables explaining the use of the pattern-matching characters.

Position Matching
Literals
Character Classes
Repetition
Alternation & Grouping
Backreferences
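To make the categories above more concrete, here is a small sketch with one pattern per category. Apart from the zip code pattern, which comes from this article, the patterns, sample strings and variable names are assumptions made up for illustration.

```javascript
// Illustrative patterns, one per category named above. Only zipPattern
// is taken from the article; the rest are hypothetical examples.

// Position matching: ^ and $ anchor the pattern to the start and end.
var zipPattern = /(^\d{5}$)|(^\d{5}-\d{4}$)/;
console.log(zipPattern.test('92121'));        // true
console.log(zipPattern.test('92121-1234'));   // true
console.log(zipPattern.test('92121-12'));     // false
console.log(zipPattern.test('x92121'));       // false

// Literals and character classes: "ID-" is literal text, [A-F] is a class.
var idPattern = /^ID-[A-F]\d$/;
console.log(idPattern.test('ID-C7'));         // true
console.log(idPattern.test('ID-Z7'));         // false

// Repetition: {2,4} means between two and four of the preceding item.
var repeatPattern = /^a{2,4}$/;
console.log(repeatPattern.test('aaa'));       // true
console.log(repeatPattern.test('aaaaa'));     // false

// Alternation and grouping: (cat|dog) matches either word.
var petPattern = /^(cat|dog)s?$/;
console.log(petPattern.test('dogs'));         // true
console.log(petPattern.test('bird'));         // false

// Backreferences: \1 must repeat whatever the first group matched.
var doubledPattern = /^(\w)\1$/;
console.log(doubledPattern.test('oo'));       // true
console.log(doubledPattern.test('ob'));       // false
```

Each call to test() simply reports whether the whole sample string fits the pattern, which is usually all a validation routine needs.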
Pattern Switches

In addition to the pattern-matching characters, you can use switches to make the match global, case-insensitive, or both. The following is an example of a pattern string definition that uses a switch:

/\s/g

This pattern and switch combination matches all occurrences of a whitespace character (\s matches spaces, tabs and line breaks) because it uses the global switch. Below is a table of pattern switches.

Switches
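The effect of the global (g) and case-insensitive (i) switches can be sketched as follows. The sample strings and variable names are assumptions chosen for illustration.

```javascript
// Hypothetical sample: "One", two spaces, "two", a tab, "THREE".
var strSample = 'One  two\tTHREE';

// Without g, replace() stops after the first whitespace match.
var firstOnly = strSample.replace(/\s/, '_');      // 'One_ two\tTHREE'

// With g, every whitespace character is replaced.
var allMatches = strSample.replace(/\s/g, '_');    // 'One__two_THREE'

// Without i, letter case must match exactly.
var caseSensitive = /three/.test(strSample);       // false

// With i, letter case is ignored.
var caseInsensitive = /three/i.test(strSample);    // true

// Switches can be combined: remove every vowel, upper or lower case.
var vowelsRemoved = 'Regular Expression'.replace(/[aeiou]/gi, '');
console.log(vowelsRemoved);                        // 'Rglr xprssn'
```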
The Regular Expression Object

The Regular Expression object exposes three methods and two properties. This is the object that performs the work of pattern-matching. The following are tables describing the methods and properties of the Regular Expression object.

Methods
Properties
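As a rough sketch of how a Regular Expression object is used in practice, the example below calls two standard methods, test() and exec(), and reads two standard properties, source and lastIndex. Whether these are exactly the members the tables above were meant to list is an assumption on my part, and the pattern and sample strings are hypothetical.

```javascript
// Hypothetical zip code pattern with the two parts captured separately.
var objRegExp = /^(\d{5})(-(\d{4}))?$/;

// test() answers a yes/no question: does the string match?
var blnValid = objRegExp.test('92121-1234');      // true

// exec() returns an array on success: element 0 is the whole match,
// followed by the text captured by each set of parentheses.
var arrMatch = objRegExp.exec('92121-1234');
console.log(arrMatch[0]);                         // '92121-1234'
console.log(arrMatch[1]);                         // '92121'
console.log(arrMatch[3]);                         // '1234'

// exec() returns null when the string does not match.
var arrNoMatch = objRegExp.exec('9212');          // null

// source holds the text of the pattern.
var strSource = objRegExp.source;

// lastIndex is updated by global patterns so that repeated exec()
// calls walk through the string one match at a time.
var objGlobal = /\d+/g;
var strList = '10 apples, 20 pears';
var arrFirst = objGlobal.exec(strList);           // matches '10'
var intAfterFirst = objGlobal.lastIndex;          // 2
var arrSecond = objGlobal.exec(strList);          // matches '20'
```

For plain validation, test() is usually enough; exec() is worth the extra step when you also need the matched pieces.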
The RegExp Object

The RegExp object exposes four properties and no methods. This is the parent to the Regular Expression object. It is used to instantiate the Regular Expression object with its constructor and also to store information about its children’s pattern match searches. The RegExp object cannot be created directly, but it is always available. The RegExp object’s properties remain undefined until a child Regular Expression object successfully performs a search. The following is a table describing the properties of the RegExp object.

Properties
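The way the RegExp object stores information about the most recent search can be sketched with its $1 and $2 properties. The pattern and sample string are assumptions for illustration. Note also that current JavaScript engines treat these static properties as a deprecated legacy feature, so the array returned by exec() is generally the safer way to read captured text.

```javascript
// Hypothetical pattern: captures the parts before and after the @ sign.
var objRegExp = /(\w+)@(\w+)\.com/;
var blnFound = objRegExp.test('Contact: jane@example.com today');

// Copy the values immediately: the next successful match by any
// regular expression overwrites them.
var strUser = RegExp.$1;       // 'jane'
var strDomain = RegExp.$2;     // 'example'

// The same information, read from the array that exec() returns.
var arrMatch = objRegExp.exec('Contact: jane@example.com today');
var strUserFromExec = arrMatch[1];      // 'jane'
var strDomainFromExec = arrMatch[2];    // 'example'
```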
Other Uses For Regular Expressions

Although you can use regular expressions to test a string for validity against a search pattern, there are other uses for them. The String object has four methods that take regular expressions as arguments. Although most of the String methods parallel the methods of the Regular Expression object, the most useful one by far is the replace() method. You can use the replace() method to reformat a string. This is accomplished by using the $1…$9 properties of the RegExp object. Those properties are populated with the portions of the searched string that matched the parts of the search pattern contained within parentheses. The following example illustrates how to use the replace() method to swap the order of first and last names and insert a comma and a space between them:

<SCRIPT LANGUAGE="JavaScript1.2">
var objRegExp = /(\w+)\s(\w+)/;
var strFullName = 'Jane Doe';
var strReverseName = strFullName.replace(objRegExp, '$2, $1');
document.write(strReverseName);
</SCRIPT>

The output of this code will be “Doe, Jane”. This works because the pattern in the first set of parentheses matches “Jane”, and that string is placed in the RegExp.$1 property. The \s (whitespace) character match is not saved to the RegExp object because it is not in parentheses. The pattern in the second set of parentheses matches “Doe” and is saved to the RegExp.$2 property. The String replace() method takes the Regular Expression object as its first argument and the replacement text as its second argument. The $2 and $1 in the replacement text are substitution variables that insert the contents of RegExp.$2 and RegExp.$1 into the result string.

You can also use the replace() method to strip unwanted characters from a string before testing the string for validity or before saving the string to a database. It can be used to add formatting characters for the display of a string as well.
The following is a table listing the methods of the String object that use regular expressions.

String Methods Using Regular Expressions
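As a quick sketch, the four standard String methods that accept a regular expression are replace(), search(), match() and split(). I am assuming these are the four the article has in mind. The name example is the one used earlier; the other sample strings are made up.

```javascript
var strFullName = 'Jane Doe';
var objRegExp = /(\w+)\s(\w+)/;

// replace(): build a new string, reusing the captured pieces.
var strReverseName = strFullName.replace(objRegExp, '$2, $1');
console.log(strReverseName);                        // 'Doe, Jane'

// search(): position of the first match, or -1 if there is none.
var intSpacePos = strFullName.search(/\s/);         // 4
var intDigitPos = strFullName.search(/\d/);         // -1

// match(): without the g switch, returns the whole match followed by
// the captured groups, like the exec() method of the pattern itself.
var arrParts = strFullName.match(objRegExp);
console.log(arrParts[1] + ' / ' + arrParts[2]);     // 'Jane / Doe'

// split(): break a string apart wherever the pattern matches.
var arrItems = 'a, b;c'.split(/[,;]\s*/);
console.log(arrItems.join('|'));                    // 'a|b|c'
```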
Sample Usage

Now that you’ve been introduced to regular expressions and patterns, let’s look at a few examples of common validation and formatting functions.

Valid Phone Number

Assuming the area code and phone number are not separate fields, a valid phone number would consist of the area code contained within parentheses, an optional space, 3 digits, a dash, and 4 more digits. There should be no leading or trailing characters. The first digit of the area code may not be a zero. A regular expression to perform that task would look like this:

var objRegExp = /^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$/;

Valid Date Format

A valid short date should consist of a 1- or 2-digit month, a date separator, a 1- or 2-digit day, a date separator, and a 4-digit year (e.g. 02/02/2000). It would be nice to allow the user to use any valid date separator character that your backend database supports, such as slashes, dashes and periods. You want to be sure the user enters the same date separator character for all occurrences. This can be done with a regular expression like this:

var objRegExp = /^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/;

This example uses backreferencing to ensure that the second date separator matches the first one.

Valid Integer

A valid integer value should contain only digits, plus possibly a leading minus sign for negative numbers. A regular expression to do that would look like this:

var objRegExp = /(^-?\d\d*$)/;

Remove Commas

A user may enter a number separated by commas. You may wish to remove the commas before storing the number. A reformatting function to do that would look like this:

function removeCommas( strValue ) {
  var objRegExp = /,/g; //search for commas globally
  //replace all matches with empty strings
  return strValue.replace(objRegExp,'');
}

Demonstration Page

As you can see, complex validation and character replacement tasks can be accomplished quite easily. I have written a library of validation and formatting functions that can be re-used in your applications.
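To give a feel for what such reusable functions might look like, here is a minimal sketch that wraps the patterns from the Sample Usage section. The function names isValidPhone, isValidDateFormat and isValidInteger are hypothetical and are not necessarily the names used in the library itself; removeCommas is the function shown above.

```javascript
// Hypothetical wrappers around the patterns from the Sample Usage section.

function isValidPhone(strValue) {
  // (nnn) nnn-nnnn, optional space after the area code, no leading zero
  var objRegExp = /^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$/;
  return objRegExp.test(strValue);
}

function isValidDateFormat(strValue) {
  // \1 forces the second separator to be the same as the first
  var objRegExp = /^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/;
  return objRegExp.test(strValue);
}

function isValidInteger(strValue) {
  // optional minus sign followed by one or more digits
  var objRegExp = /(^-?\d\d*$)/;
  return objRegExp.test(strValue);
}

function removeCommas(strValue) {
  var objRegExp = /,/g;   // search for commas globally
  return strValue.replace(objRegExp, '');
}

console.log(isValidPhone('(619) 555-1234'));              // true
console.log(isValidPhone('(019) 555-1234'));              // false
console.log(isValidDateFormat('02/02/2000'));             // true
console.log(isValidDateFormat('02/02-2000'));             // false
console.log(isValidInteger('-42'));                       // true
console.log(isValidInteger(removeCommas('1,234,567')));   // true
```

Keep in mind that the date pattern checks only the format: a string such as 13/45/2000 passes, so a range check on the month and day would still be needed.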
This library can be accessed and tested via this demonstration page.

Figure 1. Demonstration Page

Summary

This article presented a brief introduction to regular expressions and showed you how they can be used to simplify cross-browser validation tasks. Regular expressions have a host of other uses and are found in many languages. VBScript 5.0 now incorporates regular expressions as part of its object model. To fully understand and appreciate the power of regular expressions, I recommend that you read further on the subject.

Further Reading

To learn more about regular expressions, I suggest the following:
Karen Gayda is a Senior Software Engineer in the Internet Services Division at Stellcom Technologies, located in San Diego, CA. Her specialties include VB COM, Active Server Pages, Dynamic HTML, SQL Server, and scripting languages. She can be reached at kgayda@yahoo.com.