Published: Sunday, July 30, 2000

Removing All HTML Tags from a String Using Regular Expressions

By Jóhann Gunnarsson

Have you ever needed to remove all HTML tags from a string, such as the contents of a file? Or perhaps you wanted to display the HTML tags on a Web page as raw text (like showing <B>bold</B> as opposed to bold). The script presented in this article accomplishes both tasks with ease!


To accomplish this, this article includes the code for a function, ClearHTMLTags, which has the following definition:

function ClearHTMLTags(strHTML, intWorkFlow)

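Where strHTML is the string to be cleared of HTML tags and intWorkFlow determines how the tags are handled. A value of 0 simply strips the HTML tags. A value of 1 leaves the tags in place but converts every < and > into &lt; and &gt;, so the browser displays the HTML tags as text (like showing <B>bold</B> as opposed to bold). A value of 2 strips the tags and then converts any stray < or > characters that remain. Values greater than 2 behave like 0.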
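The escaping that intWorkFlow = 1 performs can also be sketched with VBScript's built-in Replace function instead of a regular expression. The helper below, ShowSourceAsText, is hypothetical and not part of the article's code. Unlike ClearHTMLTags, it also escapes &, and it does so first so that the & in a freshly written &lt; is not escaped a second time. This keeps entities such as &amp; in the source visible as text. In an ASP page, the built-in Server.HTMLEncode method is another option.

```vbscript
' Hypothetical helper (not from the original article): render HTML source
' as visible text by escaping "&", "<" and ">".
Function ShowSourceAsText(strHTML)
  Dim s
  ' Escape "&" first; doing it last would turn "&lt;" into "&amp;lt;"
  s = Replace(strHTML, "&", "&amp;")
  s = Replace(s, "<", "&lt;")
  s = Replace(s, ">", "&gt;")
  ShowSourceAsText = s
End Function
```

For example, ShowSourceAsText("<b>Tom &amp; Jerry</b>") returns &lt;b&gt;Tom &amp;amp; Jerry&lt;/b&gt;, which a browser displays as <b>Tom &amp; Jerry</b>.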

The code for ClearHTMLTags is presented below, and there is also a live demo to give the script a run. Note that ClearHTMLTags uses regular expressions to find and remove HTML tags. If you are not familiar with regular expressions, I strongly suggest you study up on them. Start by visiting our Regular Expressions Section!
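To see what the regular expression in ClearHTMLTags does on its own, here is a minimal sketch that applies the same pattern, <[^>]*>, with VBScript's built-in RegExp object. The variable names are illustrative only. The pattern reads: a <, then any run of characters other than >, then the closing >. The second Replace call shows why ClearHTMLTags sets Global = True: without it, only the first match is replaced.

```vbscript
' Minimal sketch: the tag-stripping pattern from ClearHTMLTags, used on its own.
Dim re, strInput, strAllTags, strFirstOnly
Set re = New RegExp
re.Pattern = "<[^>]*>"   ' "<", then zero or more non-">" characters, then ">"
strInput = "<p>Hello, <b>world</b>!</p>"

re.Global = True         ' replace every match
strAllTags = re.Replace(strInput, "")    ' "Hello, world!"

re.Global = False        ' replace only the first match
strFirstOnly = re.Replace(strInput, "")  ' "Hello, <b>world</b>!</p>"

Set re = Nothing
```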

Happy Programming!



    'Coded by Jóhann Haukur Gunnarsson
    '  Purpose: This function clears all HTML tags from a
    '           string using Regular Expressions.
    '   Inputs: strHTML;
    '            A string to be cleared of HTML tags
    '           intWorkFlow;
    '            An integer that, if equal to 0, runs only the RegExp filter
    '              .. 1 runs only the HTML source render filter
    '              .. 2 runs both the RegExp and the HTML source render
    '              .. >2 (or negative) defaults to 0
    '  Returns: A string that has been filtered by the function
    function ClearHTMLTags(strHTML, intWorkFlow)
      'Variables used in the function
      dim regEx, strTagLess
      'Move the string into a private variable
      'within the function
      strTagLess = strHTML
      'regEx initialization: creates a RegExp object
      set regEx = New RegExp
      'Case-insensitive matching
      regEx.IgnoreCase = True
      'Replace every match, not just the first one
      regEx.Global = True
      'Phase I
      '  "bye bye html tags"
      if intWorkFlow <> 1 then
        'this pattern matches any html tag
        regEx.Pattern = "<[^>]*>"
        'all html tags are stripped
        strTagLess = regEx.Replace(strTagLess, "")
      end if
      'Phase II
      '  "bye bye rogue leftovers"
      '  "or, I want to render the source"
      '  "as html."
      'We *might* still have rogue < and >
      'let's be positive that those that remain
      'are changed into html character entities
      if intWorkFlow > 0 and intWorkFlow < 3 then
        'matches a single <
        regEx.Pattern = "[<]"
        strTagLess = regEx.Replace(strTagLess, "&lt;")
        'matches a single >
        regEx.Pattern = "[>]"
        strTagLess = regEx.Replace(strTagLess, "&gt;")
      end if
      'Clean up: destroys the RegExp object
      set regEx = nothing
      'The results are passed back
      ClearHTMLTags = strTagLess
    end function
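One limitation is worth knowing. The pattern <[^>]*> ends a match at the first >, even when that > sits inside a quoted attribute value. With intWorkFlow = 0, the input <a title="a>b">link</a> comes back as b">link. The function below is a hypothetical variant, not the author's code. Its pattern steps over double- or single-quoted attribute values before it looks for the closing >. It is still only a sketch, because a regular expression cannot handle every construct in real-world HTML. For example, a comment such as <!-- a > b --> still leaves residue behind.

```vbscript
' Hypothetical variant of Phase I (not part of the original article):
' the tag pattern skips over quoted attribute values.
Function StripTagsQuoteAware(strHTML)
  Dim re
  Set re = New RegExp
  re.Global = True
  ' "<", then any mix of "..." runs, '...' runs, or single characters
  ' that are neither quotes nor ">", then the closing ">".
  ' (Inside a VBScript string literal, "" stands for one double quote.)
  re.Pattern = "<(""[^""]*""|'[^']*'|[^'"">])*>"
  StripTagsQuoteAware = re.Replace(strHTML, "")
  Set re = Nothing
End Function
```

For example, StripTagsQuoteAware("<a title=""a>b"">link</a>") returns link.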
