Published: Tuesday, October 31, 2000
Picking Out Delimited Text with Regular Expressions, Part 2
Read Part 1
In Part 1 we looked at the difference between greedy and non-greedy
repetition matching with regular expressions, and we started a coding example. In this part we'll finish
that example, look at a second one, and wrap the article up!
'Create a regular expression object
Dim objRegExp
Set objRegExp = New RegExp
'Set our pattern
objRegExp.Pattern = "<title>(.*?)<\/title>"
objRegExp.IgnoreCase = True
objRegExp.Global = True
'Get the matches from the contents of our HTML file, strContents
Dim objMatches
Set objMatches = objRegExp.Execute(strContents)
At this point we have all of our regular expression "matches" stored in a collection, objMatches.
Since objMatches is just an ordinary collection, we can do a number of things with it:
we can iterate through the collection with a For Each ... Next loop, we can access a particular match
by its index, and so on. If the HTML document lacks a title, the objMatches.Count property will return
zero. Otherwise, we have found a title and want to display it. Keep in mind that the regular expression engine
returns both the delimiters and the text between them. So, if /SomePage.htm
contained a title tag like <TITLE>Hello, World!</TITLE>, the resulting match would
be exactly that, not just "Hello, World!" For that reason, before displaying the title, we need
to strip off the tags. (For that we'll use the Mid function, a standard VBScript string function.)
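If objMatches.Count > 0 Then
  '<title> is 7 characters long and </title> is 8, so the title text starts
  'at character 8 and is 15 characters shorter than the whole match
  Response.Write "The Web page title is: " & _
      Mid(objMatches(0).Value, 8, Len(objMatches(0).Value) - 15)
Else
  Response.Write "No TITLE tag found."
End If
Set objRegExp = Nothing 'Clean up!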
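Because the pattern wraps the title text in parentheses, the Match object's SubMatches collection offers another way to get at the text between the tags without counting characters. The following is only a minimal sketch, not the code used in the live demo: the function name GetPageTitle is made up for illustration, it assumes version 5.5 of the scripting engine, and it returns an empty string when no title is found.

```vbscript
'Hypothetical helper (not part of the original example): returns the text
'between the <title> tags in strContents, or "" if there is no title
Function GetPageTitle(strContents)
  Dim objRegExp, objMatches

  Set objRegExp = New RegExp
  objRegExp.Pattern = "<title>(.*?)<\/title>"
  objRegExp.IgnoreCase = True
  objRegExp.Global = False   'We only need the first title

  Set objMatches = objRegExp.Execute(strContents)

  If objMatches.Count > 0 Then
    'SubMatches(0) holds whatever the first set of parentheses matched
    GetPageTitle = objMatches(0).SubMatches(0)
  Else
    GetPageTitle = ""
  End If

  Set objMatches = Nothing
  Set objRegExp = Nothing
End Function
```

Called as GetPageTitle("<TITLE>Hello, World!</TITLE>"), this returns Hello, World! with no tags attached.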
[Try out the live demo!]
Another use of non-greedy repetition is listing all of the text
that appears between bold tags (or italic tags, or underline tags, or whatnot). To accomplish this, we simply need
to alter our regular expression:
'Set our pattern
objRegExp.Pattern = "<b>(.*?)<\/b>"
objRegExp.IgnoreCase = True
objRegExp.Global = True
'Get the matches from the contents of our HTML file, strContents
Dim objMatches
Set objMatches = objRegExp.Execute(strContents)
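To see why the question mark matters for this pattern, consider a string with two bold sections. The sketch below counts the matches for the greedy and the non-greedy version of the pattern; the helper name CountMatches and the sample string are made up for illustration.

```vbscript
'Hypothetical helper: returns how many times strPattern matches strText
Function CountMatches(strPattern, strText)
  Dim objRE
  Set objRE = New RegExp
  objRE.Pattern = strPattern
  objRE.IgnoreCase = True
  objRE.Global = True
  CountMatches = objRE.Execute(strText).Count
  Set objRE = Nothing
End Function

Dim strSample
strSample = "<b>one</b> and <b>two</b>"

'Greedy: .* runs to the LAST </b>, so the whole string is one match (count is 1)
Response.Write CountMatches("<b>(.*)<\/b>", strSample) & "<BR>"

'Non-greedy: .*? stops at the FIRST </b>, so each bold section
'is its own match (count is 2)
Response.Write CountMatches("<b>(.*?)<\/b>", strSample) & "<BR>"
```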
Now, instead of just outputting one specific match in the objMatches collection, let's list
each match:
Dim objMatch
For Each objMatch in objMatches
  Response.Write objMatch.Value & "<BR>"
Next
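One thing to watch for when running this against real pages: in VBScript regular expressions the period does not match a newline, so a bold section that spans more than one line is skipped by .*?. A common workaround is [\s\S]*?, which matches any character at all and is still non-greedy. The sketch below is an assumption-laden variation, not the code behind the live demo: the name ListBoldText is made up, and it writes only the text inside the tags (via SubMatches) rather than the whole match.

```vbscript
'Hypothetical helper (not from the original example): writes out the text
'inside every <b>...</b> pair in strContents, one per line
Sub ListBoldText(strContents)
  Dim objRE, objBold

  Set objRE = New RegExp
  '[\s\S] matches any character, line breaks included; the trailing ?
  'keeps the repetition non-greedy, just like .*?
  objRE.Pattern = "<b>([\s\S]*?)<\/b>"
  objRE.IgnoreCase = True
  objRE.Global = True

  For Each objBold in objRE.Execute(strContents)
    'SubMatches(0) is the text captured by the parentheses, without the tags
    Response.Write Server.HTMLEncode(objBold.SubMatches(0)) & "<BR>"
  Next

  Set objRE = Nothing
End Sub
```

Calling ListBoldText strContents from the page would then print each bold phrase as plain text.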
[View the live demo!]
I hope this article has proved to be both interesting and educational! Remember, for non-greedy repetition
to work in your regular expressions, you will need version 5.5 or higher of the VBScript Scripting Engine.
Again, to download the latest version of
the VBScript Scripting Engine, visit http://msdn.microsoft.com/scripting/;
to determine what server-side scripting engine version you're using, be sure to read
Determining the Server-Side Scripting Language and Version!
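For a quick check from within a page, VBScript's built-in ScriptEngineMajorVersion and ScriptEngineMinorVersion functions report the engine version. A minimal sketch follows; the function name SupportsNonGreedy and the message text are made up for illustration.

```vbscript
'Hypothetical helper: returns True if the running VBScript engine is
'version 5.5 or later (the ScriptEngine* functions are built into VBScript)
Function SupportsNonGreedy()
  Dim intMajor, intMinor
  intMajor = ScriptEngineMajorVersion
  intMinor = ScriptEngineMinorVersion
  SupportsNonGreedy = (intMajor > 5) Or (intMajor = 5 And intMinor >= 5)
End Function

If Not SupportsNonGreedy() Then
  Response.Write "Scripting engine " & ScriptEngineMajorVersion & "." & _
                 ScriptEngineMinorVersion & " is too old for .*? patterns."
End If
```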
Happy Programming!
Attachments:
Visit the Regular Expressions Article Index
Read Determining the Server-Side Scripting Language and Version
Download the latest VBScript Scripting Engine
Visit the Regular Expressions Forum