Published: Wednesday, November 14, 2001

Translating from a Custom Markup to HTML using Regular Expressions

By Scott Mitchell

If you run a messageboard-type site, you may want to allow users to enter certain HTML tags but not others. For example, you may want tags like <b> and <u> to take effect, while any other HTML tag is displayed literally, exactly as the user typed it. That is, suppose a user enters the following message:

Hello. I am <b>really</b> confused when it comes to using the <SCRIPT> tag. Help please!

You'd like it to display with "really" in bold and the <SCRIPT> tag shown as literal text:

Hello. I am really confused when it comes to using the <SCRIPT> tag. Help please!

Furthermore, you may wish to add custom tags. For example, if the user enters <highlight>some text</highlight>, the resulting HTML would be <span style="background-color: yellow;">some text</span>, which renders "some text" on a yellow background. In fact, jperkins007 asked this exact question in an ASPMessageboard.com post. This article examines how to convert from a custom markup language to HTML using regular expressions. (If you are new to regular expressions, I highly recommend that you first read An Introduction to Regular Expression with VBScript.)

Finding Delimited Text
This problem essentially boils down to finding delimited text. If we have a custom tag like highlight, we want to find every instance of <highlight>, followed by some text, followed by </highlight>. Fortunately, an existing article on 4Guys already answers this question: Picking Out Delimited Text with Regular Expressions. Simply put, the following regular expression pattern is used:

startingDelimiter((.|\n)*?)endingDelimiter

The ((.|\n)*?) portion translates to: "search for the minimum number of characters between startingDelimiter and endingDelimiter." In our highlight example, the starting and ending delimiters would be <highlight> and </highlight>, respectively. The *? specifies non-greedy repetition, which is discussed further in Picking Out Delimited Text with Regular Expressions. Note that we don't use just ., but (.|\n). The . matches any character except the newline character (\n), so to match text that spans several lines we have to search for any character or the newline character.

Note that this approach requires VBScript version 5.5 or later, since non-greedy pattern matching wasn't available in earlier versions. Again, read Picking Out Delimited Text with Regular Expressions for more information.
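
To see why the non-greedy *? matters, here is a minimal sketch that runs a greedy and a non-greedy version of the pattern over the same text. It assumes the script is run from the command line with Windows Script Host (cscript), which is why it uses WScript.Echo; in an ASP page you would use Response.Write instead. The sample text and variable names are made up for illustration.

```vbscript
Option Explicit

Dim sText, oGreedy, oLazy, oMatch

' Two highlighted runs; the second one spans a line break
sText = "<highlight>one</highlight> plain <highlight>two" & vbLf & "lines</highlight>"

Set oGreedy = New RegExp
oGreedy.Global = True
oGreedy.Pattern = "<highlight>((.|\n)*)</highlight>"     ' greedy: *

Set oLazy = New RegExp
oLazy.Global = True
oLazy.Pattern = "<highlight>((.|\n)*?)</highlight>"      ' non-greedy: *?

' The greedy pattern runs from the first <highlight> to the last </highlight>
WScript.Echo "Greedy matches: " & oGreedy.Execute(sText).Count

' The non-greedy pattern stops at the nearest </highlight> each time
WScript.Echo "Non-greedy matches: " & oLazy.Execute(sText).Count
For Each oMatch In oLazy.Execute(sText)
    ' SubMatches(0) holds the text captured by the outer parentheses
    WScript.Echo "  [" & oMatch.SubMatches(0) & "]"
Next
```

The greedy version should report a single match that swallows the plain text between the two tags, while the non-greedy version should report each highlighted run separately, including the one that contains a newline.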

Replacing Custom Markup Tags with Valid HTML Tags
Now that we know how to find the custom markup tags using regular expressions, we need to deduce how to replace such tags with valid HTML tags. Fortunately the regular expression object contains a Replace method that allows us to hunt through a string for a particular regular expression and replace it with some string. So, to search for use of our custom highlight tag, we first create our regular expression object and set its pattern:

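'Create the Regular Expression object
Dim oRegExp
Set oRegExp = New RegExp
'Replace every match, not just the first one (Global defaults to False)
oRegExp.Global = True
'Match <HIGHLIGHT> as well as <highlight>
oRegExp.IgnoreCase = True
'Specify the pattern
oRegExp.Pattern = "<highlight>((.|\n)*?)</highlight>"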

Great! Now, imagine that we have a variable called userEnteredText, a string containing the message the user posted. We'd like to replace all instances of the matched pattern with the proper HTML: <span style="background-color: yellow;">some text</span>. We can use the Replace method to do this, referring to the text found between the delimiters with the back-reference $1:

userEnteredText = oRegExp.Replace(userEnteredText, _
                           "<span style=""background-color: yellow;"">$1</span>")
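
The snippet above translates the custom tag, but it does not by itself make other tags, such as <SCRIPT>, display as literal text. One possible way to get both behaviors, sketched here as an assumption rather than as the article's method, is to escape all of the markup first and then translate only the permitted tags, matching them in their escaped form. EscapeHtml and TranslateTag are hypothetical helper names; in an ASP page, Server.HTMLEncode could take the place of EscapeHtml. The sketch uses WScript.Echo, so it assumes Windows Script Host (cscript).

```vbscript
Option Explicit

' Hypothetical helper: escapes the characters HTML treats as markup
Function EscapeHtml(sText)
    Dim sResult
    sResult = Replace(sText, "&", "&amp;")    ' must come first
    sResult = Replace(sResult, "<", "&lt;")
    sResult = Replace(sResult, ">", "&gt;")
    EscapeHtml = sResult
End Function

' Hypothetical helper: turns one escaped tag pair back into real HTML.
' Assumes sTagName contains only letters and digits.
Function TranslateTag(sText, sTagName, sReplacement)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Global = True        ' replace every occurrence
    oRegExp.IgnoreCase = True    ' accept <B> as well as <b>
    oRegExp.Pattern = "&lt;" & sTagName & "&gt;((.|\n)*?)&lt;/" & sTagName & "&gt;"
    TranslateTag = oRegExp.Replace(sText, sReplacement)
End Function

Dim userEnteredText
userEnteredText = "I am <b>really</b> confused by the <SCRIPT> tag. " & _
                  "<highlight>Help!</highlight>"

' 1. Neutralize all markup, 2. re-enable only the tags we allow
userEnteredText = EscapeHtml(userEnteredText)
userEnteredText = TranslateTag(userEnteredText, "b", "<b>$1</b>")
userEnteredText = TranslateTag(userEnteredText, "highlight", _
                      "<span style=""background-color: yellow;"">$1</span>")

WScript.Echo userEnteredText
```

With this ordering, the b and highlight tags come back as real HTML, while the <SCRIPT> tag stays escaped and is shown to the reader as plain text.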

Simple enough. Note that these steps must be repeated for each tag in our custom markup language. If the language has only a few custom tags, the steps can be hardcoded. If you want to allow a large number of tags, or let the markup language change easily over time, your best bet is an approach I employed in my latest project, WebForums.NET. WebForums.NET is an online forum system for ASP.NET Web sites and, among its many features, lets the administrator define, via a text file, which HTML tags and which custom tags to allow. For example, an administrator could have a text file like:

<span style="background-color: yellow;"><CONTENTS></span>
<b style="font-size: 24pt;"><CONTENTS></b>

That file would replace all highlight tags with the span code we examined earlier, and all important tags with bold tags in a larger font; <CONTENTS> marks where the text between the custom tags goes. To convert a user's post containing the custom markup to standard HTML, I open the file and loop through its contents, applying the code snippet shown above to each entry. I'll leave it at that, and leave the implementation as an exercise for the reader! :)
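
As a starting point for that exercise, here is a minimal sketch of the loop. It is not the WebForums.NET implementation. The layout of the definitions file is not shown in this article, so the sketch assumes the definitions have already been loaded into a Scripting.Dictionary that maps each tag name to its template; reading the file itself is left out. The variable names are hypothetical, tag names are assumed to contain only letters and digits, and WScript.Echo assumes Windows Script Host (cscript).

```vbscript
Option Explicit

Dim oTags, sTagName, oRegExp, userEnteredText

' Assumed in-memory form of the definitions: tag name -> HTML template.
' A real setup would fill this from the administrator's text file.
Set oTags = CreateObject("Scripting.Dictionary")
oTags.Add "highlight", "<span style=""background-color: yellow;""><CONTENTS></span>"
oTags.Add "important", "<b style=""font-size: 24pt;""><CONTENTS></b>"

userEnteredText = "<important>Note:</important> this is <highlight>some text</highlight>."

Set oRegExp = New RegExp
oRegExp.Global = True        ' replace every occurrence of each tag
oRegExp.IgnoreCase = True    ' accept <HIGHLIGHT> as well as <highlight>

For Each sTagName In oTags.Keys
    ' Build the delimited-text pattern for this tag
    oRegExp.Pattern = "<" & sTagName & ">((.|\n)*?)</" & sTagName & ">"
    ' Swap the <CONTENTS> placeholder for the $1 back-reference
    userEnteredText = oRegExp.Replace(userEnteredText, _
                          Replace(oTags(sTagName), "<CONTENTS>", "$1"))
Next

WScript.Echo userEnteredText
```

Keeping the definitions in a dictionary means that adding a new custom tag only requires a new entry, not new code. A template that needs a literal dollar sign would require extra care, since $ has a special meaning in the replacement string.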

This article demonstrated how to allow a user to post a message in a customized markup language, and have that language translate to HTML. Of course, this approach could be done via XML using XSL as a translation language, and I encourage you to explore that avenue as well. Note that this article has a lot of pre-reading material, so hopefully you took the time to read An Introduction to Regular Expression with VBScript, if needed, and Picking Out Delimited Text with Regular Expressions. Also, you can find out more information about regular expressions at the Regular Expressions Article Index.

Happy Programming!
