Published: Tuesday, October 31, 2000

Picking Out Delimited Text with Regular Expressions


Problems with Windows Scripting Host 5.5
Some people have reported problems installing WSH 5.5, which is required for the non-greedy repetition regular expressions discussed in this article. For more information on these installation problems, consult this ASPMessageboard post.


Have you ever wanted to parse an HTML document and easily grab the text between certain text delimiters? For example, imagine that we wanted to list all the text in an HTML document that falls within bold tags (<b> ... </b>). Or say that we wanted to grab the text (if any) of the HTML TITLE (i.e., the text between the TITLE tags: <TITLE> ... </TITLE>). While this can be done with standard VBScript string functions, the needed code is often messy, usually requiring multiple variables to hold the indexes where certain substrings start and end.

With regular expressions, however, this is a very easy task! This article will not delve into what, exactly, regular expressions are or the specifics of using them in an ASP page. For more information on these topics be sure to read the articles recommended in the Regular Expressions Article Index and some of the posts at the Regular Expression Forum. If you are familiar with regular expressions, you might think the challenge I propose is easily solved with the following regular expression:

-- in general terms
delimiter(.*)delimiter

-- to find text between bold tags:
<b>(.*)</b>

Well, you're kind of right. For those not familiar with the above regular expression, it basically says, "Search for the opening delimiter (<b>), then look for zero or more characters, and then look for the closing delimiter (</b>)." While this may seem like the right thing to ask for, consider the following HTML document:

<HTML>
<BODY>
  Hello <B>there</B>!  How are <b>you</b> today?
</BODY>
</HTML>

The above regular expression (<b>(.*)</b>) is a bit ambiguous in this scenario. Do you want to return <B>there</B> and <b>you</b> (as two separate strings), or <B>there</B>!  How are <b>you</b> (as one longer string)? Both possible results follow the English explanation given above: the strings in each result begin with the opening delimiter <b>, contain zero or more characters, and end with the closing delimiter </b>. The first result contains two strings, while the second contains just one. If you use the regular expression <b>(.*)</b> with a case-insensitive search (needed here, since the first pair of tags is uppercase), it returns the second result, the longer string <B>there</B>!  How are <b>you</b>.
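To see the greedy behavior for yourself, here is a minimal sketch that runs the pattern against the sample line. It assumes it is saved as a standalone .vbs file and run under Windows Script Host (cscript) with VBScript 5.5 or later; the function name GreedyBoldMatches is hypothetical, and in an ASP page you would use Response.Write instead of WScript.Echo.

```vbscript
Option Explicit

' Hypothetical helper: returns every match of the greedy pattern
' <b>(.*)</b>, separated by "|".
Function GreedyBoldMatches(strHTML)
  Dim objRE, objMatches, objMatch, strResult
  Set objRE = New RegExp
  objRE.Pattern = "<b>(.*)</b>"   ' greedy: .* grabs as much text as it can
  objRE.IgnoreCase = True         ' so both <B> and <b> count as delimiters
  objRE.Global = True             ' return every match, not just the first

  Set objMatches = objRE.Execute(strHTML)
  strResult = ""
  For Each objMatch In objMatches
    If strResult <> "" Then strResult = strResult & "|"
    strResult = strResult & objMatch.Value
  Next
  GreedyBoldMatches = strResult
End Function

WScript.Echo GreedyBoldMatches("Hello <B>there</B>!  How are <b>you</b> today?")
' Prints: <B>there</B>!  How are <b>you</b>
```

Only one match comes back: the .* runs to the end of the line and then backs up just far enough to find the last </b>.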

To get the two shorter strings, we have to tell the regular expression engine that, when searching for zero or more characters between our two delimiters, it should return the match with the fewest characters between the delimiters. This is done with non-greedy repetition. The .* represents greedy repetition: it looks for zero or more characters (emphasis on more). We specify non-greedy repetition, which returns the matches with the fewest characters between the delimiters, by using .*? (note the added question mark).
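Here is the same sketch with the one-character change to non-greedy repetition. As before, it assumes Windows Script Host with VBScript 5.5 or later (both .*? and the SubMatches collection require 5.5), and the function name NonGreedyBoldText is hypothetical. It returns the captured text between each pair of tags rather than the whole match.

```vbscript
Option Explicit

' Hypothetical helper: returns the text between each <b> ... </b> pair,
' separated by "|".
Function NonGreedyBoldText(strHTML)
  Dim objRE, objMatches, objMatch, strResult
  Set objRE = New RegExp
  objRE.Pattern = "<b>(.*?)</b>"  ' non-greedy: .*? stops at the first </b>
  objRE.IgnoreCase = True
  objRE.Global = True

  Set objMatches = objRE.Execute(strHTML)
  strResult = ""
  For Each objMatch In objMatches
    If strResult <> "" Then strResult = strResult & "|"
    ' SubMatches(0) holds the text captured by (.*?), without the tags
    strResult = strResult & objMatch.SubMatches(0)
  Next
  NonGreedyBoldText = strResult
End Function

WScript.Echo NonGreedyBoldText("Hello <B>there</B>!  How are <b>you</b> today?")
' Prints: there|you
```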

Regular expressions became available in VBScript with the 5.0 release. Non-greedy repetition, however, wasn't supported! With the latest release of Microsoft's scripting engines (version 5.5), non-greedy repetition is supported. So, before you can use the code we will examine shortly, you must ensure that you have the VBScript Scripting Engine 5.5 or greater installed on your system. To download the latest version of the VBScript Scripting Engine visit http://msdn.microsoft.com/scripting/; to determine what server-side scripting engine version you're using, be sure to read: Determining the Server-Side Scripting Language and Version!
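VBScript also exposes its own version number through the built-in ScriptEngineMajorVersion and ScriptEngineMinorVersion functions, so a script can check for 5.5 before relying on .*?. The sketch below assumes Windows Script Host; the helper name IsAtLeast55 is hypothetical, and in an ASP page you would swap WScript.Echo for Response.Write to see the server-side engine's version.

```vbscript
Option Explicit

' Hypothetical helper: True if the given engine version is 5.5 or later,
' i.e. new enough for non-greedy repetition such as .*?
Function IsAtLeast55(intMajor, intMinor)
  IsAtLeast55 = (intMajor > 5) Or (intMajor = 5 And intMinor >= 5)
End Function

' Report the engine name and full version, e.g. "VBScript 5.x.yyyy"
WScript.Echo ScriptEngine() & " " & ScriptEngineMajorVersion() & "." & _
  ScriptEngineMinorVersion() & "." & ScriptEngineBuildVersion()

WScript.Echo "Non-greedy repetition supported: " & _
  IsAtLeast55(ScriptEngineMajorVersion(), ScriptEngineMinorVersion())
```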

Let's look at a quick code example that returns the TITLE of an HTML page on the Web server. First we will use the FileSystemObject to read in the contents of a local Web page. Next, we will use a non-greedy repetition regular expression to pick out the text between <TITLE> and </TITLE>. (For this example, greedy repetition would work, in theory, since there should be only one TITLE tag per Web page.) Here is the code that grabs the contents of an HTML file using the FileSystemObject:

'Open an HTML page and read its contents into
'the variable strContents
Dim objFSO
Set objFSO = Server.CreateObject("Scripting.FileSystemObject")

Dim objFile
Set objFile = objFSO.OpenTextFile(Server.MapPath("/SomePage.htm"))

Dim strContents
strContents = objFile.ReadAll

'Clean up...
objFile.Close
Set objFile = Nothing
Set objFSO = Nothing

OK, at this point we have the contents of the HTML page /SomePage.htm read into a local variable, strContents. Now we need to set up our regular expression, which we'll look at in Part 2 of this article! If you are new to regular expressions, I highly recommend that you take a moment to read some of the articles suggested at our Regular Expressions Article Index before continuing on to Part 2.
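Part 2 covers the regular expression step in detail; as a rough preview, here is one possible sketch (not necessarily the approach Part 2 takes) of pulling the TITLE out of a string like strContents. The function name GetTitle is hypothetical. It swaps .*? for [\s\S]*?, a non-greedy "any character" that also matches line breaks, since a plain . does not match line breaks in VBScript regular expressions and some pages split the TITLE across lines. It assumes VBScript 5.5 or later.

```vbscript
Option Explicit

' Hypothetical helper: returns the text between <TITLE> and </TITLE>,
' or an empty string if the page has no TITLE.
Function GetTitle(strHTML)
  Dim objRE, objMatches
  Set objRE = New RegExp
  ' [\s\S]*? = non-greedy run of any characters, line breaks included
  objRE.Pattern = "<title>([\s\S]*?)</title>"
  objRE.IgnoreCase = True   ' match <TITLE>, <title>, <Title>, ...
  objRE.Global = False      ' a page should have only one TITLE

  Set objMatches = objRE.Execute(strHTML)
  If objMatches.Count > 0 Then
    GetTitle = objMatches.Item(0).SubMatches(0)
  Else
    GetTitle = ""
  End If
End Function

' In the ASP page, after the FileSystemObject code has filled strContents:
'   Response.Write "Title: " & Server.HTMLEncode(GetTitle(strContents))
WScript.Echo GetTitle("<HTML><HEAD><TITLE>My Page</TITLE></HEAD></HTML>")
' Prints: My Page
```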

Read Part 2!
