Utilizing Regular Expression SubMatches
By Scott Mitchell
| More About Regular Expressions |
|---|
| This article examines advanced features of regular expressions. For more basic and beginner-level information on regular expressions, as well as a number of articles illustrating various applications of regular expressions, be sure to check out the articles at the Regular Expression Article Index! |
Introduction
With version 5.0 of the VBScript Scripting Engine, Microsoft (finally) added regular expression support for VBScript
in the form of a COM object. (JScript has enjoyed intrinsic regular expression support for quite a bit longer.)
While version 5.0 implemented regular expression support, it only implemented some of the more basic regular
expression features. Fortunately, with the 5.5 version of the scripting engines, Microsoft beefed up their
regular expression object's capabilities, allowing for non-greedy pattern matching
and other powerful features.
This article examines one of these features available with version 5.5 of the scripting
engines: using the SubMatches collection property of the Match object. Therefore, to
utilize the techniques discussed in this article you will need to have Microsoft's scripting engines version 5.5 or
greater installed on your Web server. To determine what server-side scripting language version you are using on
your Web server, check out: Determining the Server-Side Scripting Language and Version.
You can download (for free) the latest version of the scripting engines at:
http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/download/vbsdown.htm.
Referring to Found String Matches
One of the many reasons regular expressions are so useful is that they allow for the matching of an entire string
that can later be referenced. For example, imagine that you wanted to replace all of the instances of a particular
word in a string with that particular instance of the word surrounded by bold tags (i.e., replacing
all instances of your string with <b>your string</b>). This can be easily
accomplished using regular expressions. Our pattern would be:
(\byour string\b)
The parentheses in the above pattern instruct the regular expression engine to remember the matched text so that
we can refer back to it. We do this using the special notation $N, where N is the
position of the parenthesized group in the regular expression, counting from the left. (Hence, with the above pattern we'd use $1.)
Therefore, to replace all instances of your string with <b>your string</b>,
we would only need to call the Replace method like so:
objRegExpInstance.Replace(str, "<b>$1</b>")
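For context, the call above might sit in a complete snippet along these lines. This is a minimal sketch rather than the article's own listing: it assumes the code runs in an ASP page, and the variable names, the sample string, and the choice to set IgnoreCase are illustrative assumptions.

```vbscript
Dim objRegExp, str, strResult

' Create the regular expression object and describe what to look for
Set objRegExp = New RegExp
objRegExp.Pattern = "(\byour string\b)"
objRegExp.Global = True        ' replace every occurrence, not just the first
objRegExp.IgnoreCase = True    ' assumption: match regardless of letter case

' Hypothetical sample text
str = "Put your string here, then use Your String again."

' $1 is replaced by whatever text the parenthesized group matched
strResult = objRegExp.Replace(str, "<b>$1</b>")

' strResult now holds:
' Put <b>your string</b> here, then use <b>Your String</b> again.
Response.Write strResult
```

Because $1 inserts the text that was actually matched, each occurrence keeps its original capitalization inside the bold tags.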
In a previous 4Guys article, Common Applications of Regular Expressions, Richard Lowe demonstrated how to use this method to wrap every word that contains the substring ".NET" in bold tags. A live demo of Richard's code can be seen here.
Another (more powerful) use of the dollar-sign notation is referring back to matched text that is not known
in advance, where only the surrounding pattern is fixed. For example, assume that we had an HTML page
that contained hyperlinks in the form: <a href="URL">URL Description</a>, but,
perhaps for a text newsletter, we wanted to transform those strings into the more text-friendly format of:
URL Description [URL]. This can be easily accomplished in just a few lines of code
with regular expressions.
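One way such a transformation might look is sketched below. The pattern, variable names, and sample HTML are assumptions made for illustration and are not taken from the original article; the non-greedy `.*?` relies on the version 5.5 scripting engines mentioned earlier, and the code assumes it runs in an ASP page.

```vbscript
Dim objRegExp, strHTML, strText

Set objRegExp = New RegExp
' Group 1 ($1) captures the URL, group 2 ($2) captures the link description.
' Double quotes inside a VBScript string literal are written as "".
objRegExp.Pattern = "<a\s+href=""([^""]*)""[^>]*>(.*?)</a>"
objRegExp.Global = True
objRegExp.IgnoreCase = True

' Hypothetical sample HTML
strHTML = "Visit <a href=""http://www.example.com/"">Example Site</a> today."

' Swap the order: description first, then the URL in square brackets
strText = objRegExp.Replace(strHTML, "$2 [$1]")

' strText now holds:
' Visit Example Site [http://www.example.com/] today.
Response.Write strText
```

This sketch only handles anchors whose href attribute comes first and is enclosed in double quotes; real-world HTML would likely need a more forgiving pattern.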
While this is neat and quite useful, what if we wanted to grab just the URL portion of the HREF tags, perhaps
displaying a list of URLs on the page? The regular expression object contains an Execute method that
returns a Matches collection representing each pattern match, but if we were to list all of the
matches using the above code, the match values would appear like so:
<a href="URL">URL Description</a>
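To make the problem concrete, listing the matches could be sketched roughly as follows. As before, the pattern and sample HTML are hypothetical, and the code assumes an ASP page.

```vbscript
Dim objRegExp, objMatches, objMatch, strHTML

Set objRegExp = New RegExp
objRegExp.Pattern = "<a\s+href=""([^""]*)""[^>]*>(.*?)</a>"
objRegExp.Global = True        ' find all matches, not just the first
objRegExp.IgnoreCase = True

' Hypothetical sample HTML containing two hyperlinks
strHTML = "<a href=""http://www.example.com/"">Example</a> and " & _
          "<a href=""http://www.example.org/"">Another</a>"

' Execute returns a Matches collection, one Match object per pattern match
Set objMatches = objRegExp.Execute(strHTML)

For Each objMatch In objMatches
    ' Value holds the full matched text, i.e. the entire anchor tag.
    ' HTMLEncode makes the tag visible in the browser instead of rendering it.
    Response.Write Server.HTMLEncode(objMatch.Value) & "<br>"
Next
```

Each line written out is a complete anchor tag, even though the pattern contains a parenthesized group around the URL.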
Notice that the entire HREF tag is returned, but we only want the URL portion. While we could
use another regular expression on this match or use VBScript's string operators to pick out the URL, both of
those approaches are a bit messy. With version 5.5 of the scripting engines we can refer to the parenthesized
portions of each match through the Match object, much as we referred to them with the $N notation in the
Replace method. The solution: use the SubMatches collection property of the
Match object! We'll examine this property in Part 2.