Published: Wednesday, March 15, 2000
Utilizing Regular Expressions
This article is a follow-up to an earlier 4Guys article,
An Introduction to Regular Expression with VBScript.
(If you are not familiar with regular expressions, I highly recommend that you read that
article first!) Since the previous article served more as an introduction to what regular expressions
are and how to use them with VBScript in an ASP page, this follow-up article will focus on more
detailed aspects of regular expressions.
A Quick Overview
A common programming task is to match or find a particular substring within a string.
For simple substrings, the InStr
function works nicely, finding the first instance of a literal substring within a string. If
you want to search for more complex substrings, or wish to search for a pattern rather than a
literal substring, InStr just won't cut it.
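As a rough illustration of that limitation, the sketch below calls InStr twice on a made-up sample string (strText and its contents are assumptions for this example, not taken from the earlier article). The second call shows that InStr treats a pattern as ordinary literal text.

```vbscript
<%
Dim strText
strText = "Order 66 shipped on day 120"   ' hypothetical sample text

' InStr returns the 1-based position of the first literal match, or 0 if none
Response.Write InStr(strText, "shipped") & "<br>"   ' writes 10

' InStr has no notion of patterns: it looks for the literal characters \d{2,3}
Response.Write InStr(strText, "\d{2,3}") & "<br>"   ' writes 0
%>
```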
Enter regular expressions. A regular expression is a string that is used to represent a
complex pattern or substring. For example, a regular expression that identifies a
pattern of two or three successive digits is: \d{2,3}. Don't worry if that looks
like complete gibberish; we'll get to explaining how these regular expressions represent
patterns.
With the VBScript Scripting Engine 5.0 (downloadable for free at http://msdn.microsoft.com/scripting),
VBScript added a regular expression object, RegExp. This object has three properties:
Pattern - which is the actual regular expression; IgnoreCase - a boolean
value indicating whether or not to ignore case; and Global - a boolean value
that indicates whether or not the regular expression should find as many matches as it can in the entire string
or just return the first match.
The RegExp object has three methods as well:
Test, which takes the string to be searched as a parameter, and returns True if
the regular expression is found within the string, False otherwise; Replace,
which takes a string to be searched and a replace string, and replaces the first instance of the
matched regular expression in the search string (or every instance, if Global is True)
with the text in the replace string; and, finally,
Execute, which expects a search string passed in and returns a Matches
collection containing a Match object for each regular expression match found in the
search string.
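To make the three methods concrete, here is a minimal sketch that runs each of them against the \d{2,3} pattern mentioned earlier. The sample string and variable names are assumptions chosen for illustration.

```vbscript
<%
Dim objRegExp, objMatches, objMatch, strInput
Set objRegExp = New RegExp
objRegExp.Pattern = "\d{2,3}"   ' two or three successive digits
objRegExp.Global = True         ' find every match, not just the first

strInput = "Room 12, floor 3, extension 455"   ' hypothetical sample text

' Test: True, since the string contains at least one run of 2-3 digits
Response.Write objRegExp.Test(strInput) & "<br>"

' Replace: with Global = True every match is replaced, so this
' writes "Room #, floor 3, extension #" (the lone 3 is too short to match)
Response.Write objRegExp.Replace(strInput, "#") & "<br>"

' Execute: returns a Matches collection holding one Match object per match
Set objMatches = objRegExp.Execute(strInput)
For Each objMatch In objMatches
  ' FirstIndex is the zero-based position of the match in the search string
  Response.Write objMatch.Value & " found at index " & objMatch.FirstIndex & "<br>"
Next

Set objMatches = Nothing
Set objRegExp = Nothing   'Clean up!
%>
```

With Global left at its default of False, Replace would change only the first match and Execute would return a collection with a single Match.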
In An Introduction to Regular Expression with VBScript we
looked at some simple regular expression matches and demonstrated how to use the Execute
method of the RegExp object. In this article, we'll look at some more advanced
regular expressions, as well as examine the Replace and Test methods.
Position Matching
Regular expressions can be used to match a substring's particular position within a string.
For example, if you wanted to determine if a string began with the substring Scott,
you could use the following code:
<%
Dim objRegExp
Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Pattern = "^Scott"
Dim strStringToSearch
strStringToSearch = "Scott Mitchell is my name."
'Test returns True here, since the string
'starts with the substring Scott
Response.Write objRegExp.Test(strStringToSearch)
Set objRegExp = Nothing 'Clean up!
%>
Note that the Test method takes a single parameter, the string to search. The string
is searched for matches specified by the regular expression in the Pattern property
of the RegExp object. Regular expressions provide four special characters for matching
a pattern or substring at a specific position within a string. They are outlined in the
table below:
| Symbol | Description |
| --- | --- |
| ^ | Matches the regular expression only if it is at the beginning of the search string. |
| $ | Matches the regular expression only if it is at the end of the search string. |
| \b | Matches any word boundary. A word boundary is the virtual space between two words. For example, if you want to replace all instances of the word "hell" in a string, and you just do something like: str = Replace(str, "hell", "heck"), words like "hello" will be changed to "hecko", which is silly. To find only WORDS (and not substrings in other words), use the word boundary symbols like so: \bhell\b. |
| \B | Matches any non-word boundary. |
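The word-boundary case can be sketched as follows, comparing VBScript's built-in Replace function with the Replace method of a RegExp object using \bhell\b. The sample sentence and variable names are assumptions made up for this example.

```vbscript
<%
Dim objRegExp, strText
strText = "hello, what the hell is that?"   ' hypothetical sample text

' The built-in Replace function works on literal substrings,
' so the "hell" inside "hello" is changed too
Response.Write Replace(strText, "hell", "heck") & "<br>"
' writes: hecko, what the heck is that?

Set objRegExp = New RegExp
objRegExp.Pattern = "\bhell\b"   ' "hell" only as a whole word
objRegExp.Global = True          ' replace every whole-word match

Response.Write objRegExp.Replace(strText, "heck") & "<br>"
' writes: hello, what the heck is that?

Set objRegExp = Nothing   'Clean up!
%>
```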
You can use the ^ and $ in conjunction to match an entire string.
For example, if you want to determine if a string contains a single digit and nothing else, you can
use the following regular expression:
^\d$
The \d, as we'll see shortly, matches any single decimal digit (0 - 9).
Without the ^ and $, the regular expression would match any
digit anywhere in the search string. If you wanted to use the Test method to
determine if a particular string contained just one digit and nothing else, you'd
need to use the ^ and $.
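The difference is easy to see by running Test with the unanchored pattern \d and the anchored pattern ^\d$ against a few strings. This is a small sketch; the test strings are made up for illustration.

```vbscript
<%
Dim objRegExp
Set objRegExp = New RegExp

' Unanchored: succeeds if a digit appears anywhere in the string
objRegExp.Pattern = "\d"
Response.Write objRegExp.Test("abc 7 def") & "<br>"   ' True

' Anchored at both ends: the whole string must be exactly one digit
objRegExp.Pattern = "^\d$"
Response.Write objRegExp.Test("abc 7 def") & "<br>"   ' False
Response.Write objRegExp.Test("7") & "<br>"           ' True
Response.Write objRegExp.Test("77") & "<br>"          ' False, two digits

Set objRegExp = Nothing   'Clean up!
%>
```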
Read Part 2