To read the article online, visit http://www.4GuysFromRolla.com/webtech/031500-1.shtml

Utilizing Regular Expressions


This article is a follow-up to an earlier 4Guys article, An Introduction to Regular Expression with VBScript. (If you are not familiar with regular expressions, I highly recommend that you read that article first!) Since the previous article served mainly as an introduction to what regular expressions are and how to use them with VBScript in an ASP page, this follow-up article focuses on their more detailed aspects.

A Quick Overview
A common programming task is to match or find a particular substring within a string. For simple substrings, the InStr function works nicely, finding the first instance of a literal substring within a string. If you want to search for more complex substrings, or wish to search for a pattern rather than a literal substring, InStr just won't cut it.
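
As a quick reminder of what InStr offers, here is a minimal sketch. The sample string and variable name are made up for illustration.

<%
  Dim strName                  'hypothetical sample string
  strName = "Scott Mitchell"

  'InStr returns the 1-based position of the first occurrence
  'of the literal substring, or 0 if it is not found
  Response.Write InStr(strName, "Mitch") & "<br>"
  Response.Write InStr(strName, "Jones") & "<br>"
%>

InStr can only answer "where is this exact text?" It has no way to express something like "any two or three digits in a row."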

Enter Regular Expressions. A regular expression is a string that is used to represent a complex pattern or substring. For example, a regular expression that identifies a pattern of two or three successive digits is: \d{2,3}. Don't worry if that looks like complete gibberish; we'll get to explaining how these regular expressions represent patterns.

With the VBScript Scripting Engine 5.0 (downloadable for free at http://msdn.microsoft.com/scripting), VBScript added a regular expression object, RegExp. This object has three properties: Pattern - which is the actual regular expression; IgnoreCase - a boolean value indicating whether or not to ignore case; and Global - a boolean value that indicates whether or not the regular expression should find as many matches as it can in the entire string or just return the first match.

The RegExp object has three methods as well: Test, which takes the string to be searched as a parameter and returns True if the regular expression is found within the string, False otherwise; Replace, which takes a string to be searched and a replace string, and replaces the matched text in the search string with the replace string (every match if Global is True, otherwise only the first); and, finally, Execute, which takes a search string and returns a Matches collection containing a Match object for each regular expression match found in the search string.
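
As a rough sketch of how these properties and methods fit together, the following applies the \d{2,3} pattern from above to a made-up search string. The variable names and the sample string are assumptions for illustration only.

<%
  Dim objRE, colMatches, objMatch
  Set objRE = New RegExp

  objRE.Pattern = "\d{2,3}"    'two or three successive digits
  objRE.IgnoreCase = False     'case makes no difference for digits
  objRE.Global = True          'find every match, not just the first

  Dim strSample                'hypothetical sample string
  strSample = "Call 555 or 42, not 7."

  'Test: is there at least one match?
  Response.Write objRE.Test(strSample) & "<br>"

  'Replace: swap each match for a # character
  Response.Write objRE.Replace(strSample, "#") & "<br>"

  'Execute: list each match and where it was found
  Set colMatches = objRE.Execute(strSample)
  For Each objMatch In colMatches
    Response.Write objMatch.Value & " at position " & _
                   objMatch.FirstIndex & "<br>"
  Next

  Set colMatches = Nothing
  Set objRE = Nothing
%>

Note that the FirstIndex property of a Match object is zero-based, unlike the 1-based positions returned by InStr.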

In An Introduction to Regular Expression with VBScript we looked at some simple regular expression matches and demonstrated how to use the Execute method of the RegExp object. In this article, we'll look at some more advanced regular expressions, as well as examine the Replace and Test methods.

Position Matching
Regular expressions can be used to match a substring's particular position within a string. For example, if you wanted to determine if a string began with the substring Scott, you could use the following code:

<%
  Dim objRegExp
  Set objRegExp = New RegExp
  
  objRegExp.IgnoreCase = True
  objRegExp.Pattern = "^Scott"
  
  Dim strStringToSearch
  strStringToSearch = "Scott Mitchell is my name."
  
  'Test returns True here, since the search string
  'starts with the substring Scott
  Response.Write objRegExp.Test(strStringToSearch)

  Set objRegExp = Nothing     'Clean up!
%>

Note that the Test method takes a single parameter, the string to search. The string is searched for matches specified by the regular expression in the Pattern property of the RegExp object. Regular expressions provide four special characters for matching a pattern or substring at a specific position within a string; they are outlined in the table below:

Symbol   Description
^        Matches the regular expression only if it is at the beginning of the search string.
$        Matches the regular expression only if it is at the end of the search string.
\b       Matches any word boundary. A word boundary is the zero-width position between a word character and a non-word character (or the start or end of the string). For example, if you want to replace all instances of the word "hell" in a string, and you just do something like: str = Replace(str, "hell", "heck"), words like "hello" will be changed to "hecko", which is silly. To find only WORDS (and not substrings in other words), use the word boundary symbols like so: \bhell\b.
\B       Matches any non-word boundary.
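
To make the word boundary row concrete, here is a minimal sketch comparing the built-in Replace function with the Replace method of the RegExp object. The sample string and variable names are assumptions for illustration.

<%
  Dim objRE, strText
  Set objRE = New RegExp

  objRE.Pattern = "\bhell\b"   'hell as a whole word only
  objRE.IgnoreCase = True
  objRE.Global = True          'replace every match, not just the first

  strText = "hello from hell"  'hypothetical sample string

  'The built-in Replace function also rewrites the hell inside hello
  Response.Write Replace(strText, "hell", "heck") & "<br>"

  'The RegExp Replace method leaves hello alone
  Response.Write objRE.Replace(strText, "heck") & "<br>"

  Set objRE = Nothing
%>

The first line written should read "hecko from heck", while the second should read "hello from heck".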

You can use ^ and $ together to match an entire string. For example, if you want to determine whether a string consists of a single digit and nothing else, you can use the following regular expression:

^\d$

The \d, as we'll see shortly, matches any single decimal digit (0 - 9).

Without the ^ and $, the regular expression would match any digit anywhere in the search string. If you wanted to use the Test method to determine whether a particular string contained just one digit and nothing else, you'd need to use the ^ and $.
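
The difference is easy to see with the Test method. The sketch below uses made-up sample strings; only the two patterns come from the discussion above.

<%
  Dim objRE
  Set objRE = New RegExp

  'Unanchored: a digit anywhere in the string is enough
  objRE.Pattern = "\d"
  Response.Write objRE.Test("Route 66") & "<br>"   'True

  'Anchored: the whole string must be exactly one digit
  objRE.Pattern = "^\d$"
  Response.Write objRE.Test("Route 66") & "<br>"   'False
  Response.Write objRE.Test("66") & "<br>"         'False, two digits
  Response.Write objRE.Test("7") & "<br>"          'True

  Set objRE = Nothing
%>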

  • Read Part 2


  • Article Information
    Article Title: Utilizing Regular Expressions
    Article Author: Scott Mitchell
    Published Date: Wednesday, March 15, 2000
    Article URL: http://www.4GuysFromRolla.com/webtech/031500-1.shtml


    Copyright 2017 QuinStreet Inc. All Rights Reserved.