Published: Monday, April 30, 2001

Utilizing Regular Expression SubMatches

By Scott Mitchell

More About Regular Expressions
This article examines advanced features of regular expressions. For more basic and beginner-level information on regular expressions, as well as a number of articles illustrating various applications of regular expressions, be sure to check out the articles at the Regular Expression Article Index!


With version 5.0 of the VBScript Scripting Engine, Microsoft (finally) added regular expression support for VBScript in the form of a COM object. (JScript has enjoyed intrinsic regular expression support for quite a bit longer.) While version 5.0 implemented regular expression support, it only implemented some of the more basic regular expression features. Fortunately, with the 5.5 version of the scripting engines, Microsoft beefed up their regular expression object's capabilities, allowing for non-greedy pattern matching and other powerful features.

This article examines one of the features available with version 5.5 of the scripting engines: the SubMatches collection property of the Match object. To use the techniques discussed in this article, you will need version 5.5 or later of Microsoft's scripting engines installed on your Web server. To determine which version of the server-side scripting language your Web server is running, check out: Determining the Server-Side Scripting Language and Version. You can download the latest version of the scripting engines for free at: http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/download/vbsdown.htm.

Referring to Found String Matches
One of the many reasons regular expressions are so useful is that they can capture the text a pattern matched so that it can be referred to later. For example, imagine that you wanted to wrap every instance of a particular phrase in a string with bold tags (i.e., replacing all instances of your string with <b>your string</b>). This can be easily accomplished using regular expressions. Our pattern would be:

(\byour string\b)

The parentheses in the above pattern instruct the regular expression engine to remember the matched text so that we can refer back to it. We do this using the special notation $N, where N is the Nth parenthesized group in the regular expression. (Hence, with the above pattern we'd use $1.) Therefore, to replace all instances of your string with <b>your string</b>, we only need to set the Global property to True (otherwise only the first match is replaced) and call the Replace method like so:

str = objRegExpInstance.Replace(str, "<b>$1</b>")
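The call above can be fleshed out into a complete snippet. What follows is only a minimal sketch: the helper name BoldPhrase and the sample sentence are made up for illustration, and IgnoreCase is switched on as an assumption about the desired behavior.

```vbscript
'A minimal sketch: wrap every whole-phrase occurrence of "your string" in <b> tags.
'BoldPhrase is a hypothetical helper name, not part of the RegExp object.
Function BoldPhrase(strInput)
  Dim objRegExp
  Set objRegExp = New RegExp

  objRegExp.Global = True        'replace every occurrence, not just the first
  objRegExp.IgnoreCase = True    'assumption: match "Your String" as well
  objRegExp.Pattern = "(\byour string\b)"

  '$1 refers back to the text captured by the first set of parentheses
  BoldPhrase = objRegExp.Replace(strInput, "<b>$1</b>")

  Set objRegExp = Nothing
End Function

Dim strResult
strResult = BoldPhrase("Is this your string? Yes, Your String it is.")
'strResult is now: Is this <b>your string</b>? Yes, <b>Your String</b> it is.
```

Because $1 inserts the text that was actually matched, the original capitalization of each occurrence is preserved even though the match itself is case-insensitive.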

In a previous 4Guys article, Common Applications of Regular Expressions, Richard Lowe demonstrated how to use this method to wrap every word containing the substring ".NET" in bold tags. A live demo of Richard's code is also available.

Another (more powerful) use of the dollar-sign notation is referring back to matched text whose exact contents we do not know in advance. For example, assume that we had an HTML page that contained hyperlinks in the form: <a href="URL">URL Description</a>, but, perhaps for a text newsletter, we wanted to transform those strings into the more text-friendly format of: URL Description [URL]. This can be easily accomplished in just a few lines of code with regular expressions:

'Assume strHTML contains the HTML with the <a href="URL">URL Description</a>
'We want to store into strText the HTML in strHTML, but with the HREF tags
'changed to a more text-friendly URL Description [URL]

'First, create a reg exp object
Dim objRegExp
Set objRegExp = New RegExp

objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "<a\s+href=""http://(.*?)"">\s*((\n|.)+?)\s*</a>"

'Now, replace the HREF tags with our preferred format
strText = objRegExp.Replace(strHTML, "$2 [http://$1]")

While this is neat and quite useful, what if we wanted to grab just the URL portion of the HREF tags, perhaps to display a list of URLs on the page? The regular expression object contains an Execute method that returns a Matches collection, with one Match object for each place the pattern matched. However, if we were to call Execute with the above pattern and list the matches, each match value would look like this:

<a href="URL">URL Description</a>
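To see this behavior concretely, here is a minimal sketch that runs Execute with the hyperlink pattern and collects each match's Value. The sample HTML and the variable names objMatches, objMatch, and strList are made up for illustration.

```vbscript
'A minimal sketch: list what Execute returns for the hyperlink pattern.
'The sample HTML in strHTML is made up for illustration.
Dim objRegExp, objMatches, objMatch, strHTML, strList

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Global = True    'without this, Execute returns only the first match
objRegExp.Pattern = "<a\s+href=""http://(.*?)"">\s*((\n|.)+?)\s*</a>"

strHTML = "<a href=""http://www.example.com/"">Example</a> and " & _
          "<a href=""http://www.example.org/"">Another</a>"

'Execute returns a Matches collection; each item is a Match object
Set objMatches = objRegExp.Execute(strHTML)

strList = ""
For Each objMatch In objMatches
  'Value holds the entire matched text: the whole <a ...>...</a> tag
  strList = strList & objMatch.Value & vbCrLf
Next

'strList now holds two lines, each a complete tag:
'  <a href="http://www.example.com/">Example</a>
'  <a href="http://www.example.org/">Another</a>
```

If you write strList to an ASP page, pass it through Server.HTMLEncode first so the browser shows the tags as text rather than rendering them as links.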

Notice that the entire HREF tag is returned, but we only want the URL portion. While we could run another regular expression against this match or use VBScript's string functions to pick out the URL, both of those approaches are a bit messy. With version 5.5 of the scripting engines, we can refer to the parenthesized portions of each match in much the same way that we used the $N references in the Replace method. The solution: use the SubMatches collection property of the Match object! We'll examine this property in Part 2.
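As a brief preview, and only as a sketch under stated assumptions (scripting engines version 5.5 or later, made-up sample HTML, and an illustrative variable name strURLs), pulling out just the URLs might look like this:

```vbscript
'A minimal preview sketch; requires version 5.5 or later of the scripting engines.
'The sample HTML is made up for illustration.
Dim objRegExp, objMatches, objMatch, strHTML, strURLs

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "<a\s+href=""http://(.*?)"">\s*((\n|.)+?)\s*</a>"

strHTML = "<a href=""http://www.example.com/"">Example</a> and " & _
          "<a href=""http://www.example.org/"">Another</a>"

Set objMatches = objRegExp.Execute(strHTML)

strURLs = ""
For Each objMatch In objMatches
  'SubMatches is zero-based, so SubMatches(0) holds what $1 would refer to.
  'The pattern leaves "http://" outside the parentheses, so we add it back.
  strURLs = strURLs & "http://" & objMatch.SubMatches(0) & vbCrLf
Next

'strURLs now holds:
'  http://www.example.com/
'  http://www.example.org/
```

Note the off-by-one difference between the two notations: $1 in a replacement string corresponds to SubMatches(0), $2 to SubMatches(1), and so on.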

  • Read Part 2!
