Published: Wednesday, March 15, 2000

Utilizing Regular Expressions, Part 2




In Part 1 we gave a high-level overview of using the RegExp object in VBScript and covered the special regular expression characters used for position matching. In this part, we'll examine the other special characters for regular expressions!

Character Classes
Regular expressions contain several special characters to help search for a particular set of characters. The most versatile of these is the pair of square brackets ([]). Brackets allow the matching of a specific set of characters. For example, if you wanted to determine if a string contained any vowels, you could use the following regular expression:

[aeiou]

Note that the brackets search for a single character. The characters listed within the brackets indicate which characters are valid, but, again, only a single character is checked. If you want to match any character other than the ones listed, place a caret (^) immediately after the opening bracket. For example, the following regular expression matches any single character that is not a lowercase vowel (to test that an entire string contains no vowels, combine it with the position characters from Part 1 and a repetition symbol: ^[^aeiou]*$):

[^aeiou]
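The difference between a plain class, a negated class, and an anchored negated class can be sketched as follows. This is a minimal illustration written in Python's re module rather than VBScript's RegExp object (an assumption made so the example is easy to run); the helper name contains is hypothetical.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
has_vowel = re.compile(r"[aeiou]")       # one lowercase vowel, anywhere
has_non_vowel = re.compile(r"[^aeiou]")  # one character that is NOT a lowercase vowel
no_vowels = re.compile(r"^[^aeiou]*$")   # every character is a non-vowel

def contains(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return pattern.search(text) is not None

print(contains(has_vowel, "rhythm"))     # False
print(contains(has_vowel, "regex"))      # True
print(contains(has_non_vowel, "aeiou"))  # False
print(contains(has_non_vowel, "audio"))  # True, because of the "d"
print(contains(no_vowels, "rhythm"))     # True
print(contains(no_vowels, "audio"))      # False
```

Note that [^aeiou] on its own succeeds for "audio": the string contains vowels, but it also contains a character that is not one.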

The hyphen character can be used to denote a range of characters. For example, if you wanted to determine if a string contained any uppercase alphabetical characters, you could use the following regular expression:

[A-Z]
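As a quick check of how ranges behave, here is a small sketch using an uppercase range and a class that combines several ranges. It is written in Python's re module as a stand-in for VBScript's RegExp (an assumption), and the hex-digit class is an illustrative pattern, not one from the article.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
uppercase = re.compile(r"[A-Z]")        # any one uppercase letter A through Z
hex_digit = re.compile(r"[0-9a-fA-F]")  # several ranges can share one class

print(bool(uppercase.search("4Guys")))  # True, because of the "G"
print(bool(uppercase.search("4guys")))  # False
print(bool(hex_digit.search("xyz")))    # False
print(bool(hex_digit.search("xyzF")))   # True
```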

To match any single character, use the period (.). For example, 4.uys would match strings like 4Guys, 45uys, and 4zuys. To match a single "word character," use \w. A word character is defined as any alphanumeric character or an underscore (that is, [a-zA-Z_0-9]). \W is the inverse of \w, matching any character that is neither alphanumeric nor an underscore (that is, [^a-zA-Z_0-9]).

As mentioned earlier in this article, \d matches any single digit, and is synonymous with [0-9]. Its inverse, \D, matches any non-digit, and is synonymous with [^0-9]. \s matches any whitespace character, such as a new-line character (which is represented as \n), a carriage return (\r) or a tab (\t). \S, on the other hand, matches any non-whitespace character.
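The shorthand classes above can be tried out with a short sketch. It uses Python's re module in place of VBScript's RegExp (an assumption), and passes re.ASCII so that \w and \d stick to the ASCII definitions given in the text; without that flag Python also accepts non-ASCII letters and digits.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
# re.ASCII keeps \w and \d limited to [a-zA-Z_0-9] and [0-9].
dot_hit = bool(re.search(r"4.uys", "4Guys"))   # the period matches "G"
dot_miss = bool(re.search(r"4.uys", "4uys"))   # nothing between "4" and "uys"

word_chars = re.findall(r"\w", "a_1 -!", flags=re.ASCII)
non_word_chars = re.findall(r"\W", "a_1 -!", flags=re.ASCII)
digits = re.findall(r"\d", "ASP 3.0", flags=re.ASCII)
non_digits = re.findall(r"\D", "3.0", flags=re.ASCII)
spaces = re.findall(r"\s", "a b\tc\n")
non_spaces = re.findall(r"\S", " a\tb ")

print(dot_hit, dot_miss)   # True False
print(word_chars)          # ['a', '_', '1']
print(non_word_chars)      # [' ', '-', '!']
print(digits)              # ['3', '0']
print(non_digits)          # ['.']
print(spaces)              # [' ', '\t', '\n']
print(non_spaces)          # ['a', 'b']
```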

Character classes are really useful for matching complex patterns. For example, imagine that we wanted to determine if a phone number was valid or not. If we required that phone numbers be in the format (###) ###-####, we could use the following regular expression:

^\(\d\d\d\) \d\d\d-\d\d\d\d$

Note the \( and \). These search for a literal left and right parenthesis, respectively. If you wish to search for a literal character that also has special meaning (like a parenthesis, a period, a bracket, etc.), you must prefix that character with a backslash.
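To see the phone number pattern at work, including the escaped parentheses, consider this minimal sketch. It uses Python's re module rather than VBScript's RegExp (an assumption), and the function name is_valid_phone and the sample numbers are hypothetical.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
# \( and \) match literal parentheses; ^ and $ anchor the whole string.
PHONE = re.compile(r"^\(\d\d\d\) \d\d\d-\d\d\d\d$")

def is_valid_phone(text):
    """Return True if text has the form (###) ###-####."""
    return PHONE.search(text) is not None

print(is_valid_phone("(555) 123-4567"))    # True
print(is_valid_phone("555 123-4567"))      # False: parentheses are required
print(is_valid_phone("(555) 123-45678"))   # False: one digit too many
print(is_valid_phone("x(555) 123-4567"))   # False: ^ rejects the leading "x"
```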

This is useful in form validation (more on this a little later).

Repetition
There are several special symbols that can be used to search for repeating substrings or patterns. Curly braces ({n}) search for exactly n repetitions of the item they follow. For example, to search for three consecutive digits, the following regular expression could be used:

\d{3}

The curly braces can also accept a second parameter, like {n,p}. When using two parameters with the curly braces, you are indicating that you are willing to accept a certain range of repetitions: the first parameter is the lower bound, while the second parameter is the upper bound. So, to match four to six vowels in succession, the following regular expression could be used:

[aeiou]{4,6}

If the second parameter - the upper bound - is left off, the regular expression searches for n or more occurrences. For example, if we wanted to match four or more successive vowels, the regular expression would be adjusted to [aeiou]{4,}.

The question mark (?) matches zero or one occurrence, synonymous with {0,1}. The asterisk (*) matches zero or more occurrences ({0,}) while the plus sign (+) matches one or more occurrences ({1,}).
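The repetition symbols can be compared side by side with a short sketch. It is written in Python's re module as a stand-in for VBScript's RegExp (an assumption); the helper full is hypothetical and uses re.fullmatch so that the entire string must fit the pattern, and the colou?r and ab patterns are illustrative only.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
def full(pattern, text):
    """Return True if the whole of text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"\d{3}", "123"))                # True: exactly three digits
print(full(r"\d{3}", "12"))                 # False
print(full(r"[aeiou]{4,6}", "aeio"))        # True: four vowels
print(full(r"[aeiou]{4,6}", "aei"))         # False: only three
print(full(r"[aeiou]{4,6}", "aeiouaei"))    # False: eight is over the bound
print(full(r"[aeiou]{4,}", "aeiouaei"))     # True: no upper bound
print(full(r"colou?r", "color"))            # True: ? allows zero "u"
print(full(r"colou?r", "colour"))           # True: ? allows one "u"
print(full(r"ab*", "a"))                    # True: * allows zero "b"
print(full(r"ab+", "a"))                    # False: + needs at least one "b"
print(full(r"ab+", "abbb"))                 # True
```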

Repetition matching is another powerful facet of regular expressions. For example, in an earlier example we demonstrated how to match a phone number. What if we wanted to make the area code optional? We could adjust the regular expression to:

^(\(\d\d\d\) )?\d\d\d-\d\d\d\d$

Note that the parentheses around the area code group the entire \(\d\d\d\) (including its trailing space) so the regular expression parser knows what the ? special symbol applies to. Of course, with the curly braces we could pretty up the regular expression a bit to:

^(\(\d{3}\) )?\d{3}-\d{4}$
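A short sketch can confirm that the long and compact forms of the optional-area-code pattern accept and reject the same inputs. It uses Python's re module rather than VBScript's RegExp (an assumption); the names LONG_FORM, SHORT_FORM, and results, and the sample strings, are hypothetical.

```python
import re

# Assumption: Python's re module stands in for VBScript's RegExp object.
LONG_FORM = re.compile(r"^(\(\d\d\d\) )?\d\d\d-\d\d\d\d$")
SHORT_FORM = re.compile(r"^(\(\d{3}\) )?\d{3}-\d{4}$")

samples = [
    "(555) 123-4567",  # area code present
    "123-4567",        # area code omitted
    "(555)123-4567",   # missing the space after the area code
    "555 123-4567",    # area code without parentheses
]

# Map each sample to (long form result, short form result).
results = {
    s: (bool(LONG_FORM.search(s)), bool(SHORT_FORM.search(s)))
    for s in samples
}

for sample, (long_ok, short_ok) in results.items():
    print(sample, long_ok, short_ok)
```

The two columns agree for every sample: the first two strings are accepted and the last two are rejected by both forms.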

  • Read Part 3
  • Read Part 1

