To read the article online, visit http://www.4GuysFromRolla.com/webtech/073000-1.shtml

Removing All HTML Tags from a String Using Regular Expressions

By Jóhann Gunnarsson


Have you ever needed to remove all HTML tags from a string? Or perhaps you wanted to display HTML on a Web page as raw markup (showing <B>bold</B> rather than bold)? The script presented in this article accomplishes both tasks with ease!

This article includes the code for a function, ClearHTMLTags, which has the following definition:

function ClearHTMLTags(strHTML, intWorkFlow)

Here strHTML is the string to be cleared of HTML tags, and intWorkFlow determines how the tags are handled: a value of 0 simply strips the HTML tags, while a value of 1 displays the HTML tags as text in the document (showing <B>bold</B> rather than bold). As the comments in the source note, a value of 2 does both, and any value greater than 2 behaves like 0.
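As a rough sketch of how these values might be used, the snippet below calls ClearHTMLTags on a sample string. It assumes the function listed in this article has been pasted into the same .vbs file and that the script is run with cscript under Windows Script Host; the variable name strSample is made up for illustration. In an ASP page, Response.Write would take the place of WScript.Echo.

```vbscript
' Usage sketch: assumes ClearHTMLTags (listed in this article) is defined
' elsewhere in this same script file.
Dim strSample
strSample = "Some <B>bold</B> text"

' 0: strip the tags, leaving only the text between them
WScript.Echo ClearHTMLTags(strSample, 0)   ' Some bold text

' 1: keep the tags, but convert them so a browser shows them as text
WScript.Echo ClearHTMLTags(strSample, 1)
```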

The code for ClearHTMLTags is presented below, and there is also a live demo to give the script a run. Note that ClearHTMLTags uses regular expressions to ease the hunt for, and removal of, HTML tags. If you are not familiar with regular expressions, I strongly suggest you study up on them; start by visiting our Regular Expressions Section!
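To give a feel for the regular expression at the heart of the function, here is a minimal, stand-alone sketch using the VBScript RegExp object. The sample string and variable names are illustrative only, and the snippet assumes it is saved as a .vbs file and run with cscript (in an ASP page, use Response.Write instead of WScript.Echo).

```vbscript
Dim regEx, strSample

Set regEx = New RegExp
' A "<", then any run of characters that are not ">", then a ">"
regEx.Pattern = "<[^>]*>"
' Replace every match in the string, not just the first one
regEx.Global = True

strSample = "<P>Some <B>bold</B> text</P>"

' Each tag is replaced with an empty string
WScript.Echo regEx.Replace(strSample, "")   ' Some bold text

Set regEx = Nothing
```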

Happy Programming!


Attachments:

  • Download the source for ClearHTMLTags in text format
  • View the live demo!


    '[ClearHTMLTags]
    	
    'Coded by Jóhann Haukur Gunnarsson
    'joi@innn.is
    	
    '  Purpose: This function clears all HTML tags from a
    '           string using Regular Expressions.
    '   Inputs: strHTML;
    '            A string to be cleared of HTML TAGS
    '           intWorkFlow;
    '            An integer: 0 runs only the RegExp filter
    '              .. 1 runs only the HTML source render filter
    '              .. 2 runs both the RegExp and the HTML source render
    '              .. >2 defaults to 0
    '  Returns: A string that has been filtered by the function
    	
    function ClearHTMLTags(strHTML, intWorkFlow)
      'Variables used in the function
    		
      dim regEx, strTagLess
    		
      '---------------------------------------
      strTagLess = strHTML
      'Move the string into a private variable
      'within the function
      '---------------------------------------
    
      'regEx initialization
      '---------------------------------------
      set regEx = New RegExp 
      'Creates a regexp object		
      regEx.IgnoreCase = True
      'Ignore case when matching
      regEx.Global = True
      'Global applicability
      '---------------------------------------
    
      'Phase I
      '	"bye bye html tags"
      if intWorkFlow <> 1 then
        '---------------------------------------
        regEx.Pattern = "<[^>]*>"
        'this pattern matches any html tag
        strTagLess = regEx.Replace(strTagLess, "")
        'all html tags are stripped
        '---------------------------------------
      end if
    		
      'Phase II
      '	"bye bye rogue leftovers"
      '	"or, I want to render the source"
      '	"as html."
    
      '---------------------------------------
      'We *might* still have rogue < and > 
      'let's be positive that those that remain
      'are changed into HTML entities
      '---------------------------------------	
    
      if intWorkFlow > 0 and intWorkFlow < 3 then
        regEx.Pattern = "[<]"
        'matches a single <
        strTagLess = regEx.Replace(strTagLess, "&lt;")
        'each < becomes the &lt; entity

        regEx.Pattern = "[>]"
        'matches a single >
        strTagLess = regEx.Replace(strTagLess, "&gt;")
        'each > becomes the &gt; entity
        '---------------------------------------
      end if
    		
      'Clean up
      '---------------------------------------
      set regEx = nothing
      'Destroys the regExp object
      '---------------------------------------	
    		
      '---------------------------------------
      ClearHTMLTags = strTagLess
      'The results are passed back
      '---------------------------------------
    end function
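
The "leftovers" that Phase II guards against can be seen in a small stand-alone sketch. The tag pattern stops at the first >, so a > inside an attribute value ends the match early and the rest of the tag survives. The sample markup and variable names here are made up for illustration; the snippet assumes it is run with cscript, and it uses VBScript's built-in Replace function for the second step purely to keep the example short.

```vbscript
Dim regEx, strSample, strStripped

Set regEx = New RegExp
regEx.Global = True
regEx.Pattern = "<[^>]*>"

' The ALT text contains a ">" of its own
strSample = "<IMG ALT=""a > b"">caption"

' The match ends at the first ">", so part of the tag is left behind
strStripped = regEx.Replace(strSample, "")
WScript.Echo "[" & strStripped & "]"                       ' [ b">caption]

' Converting the stray ">" to an entity keeps it from being read as markup
WScript.Echo "[" & Replace(strStripped, ">", "&gt;") & "]" ' [ b"&gt;caption]

Set regEx = Nothing
```

Passing 2 as intWorkFlow asks the function to run both phases in this order.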
    


  • Article Information
    Article Title: Removing All HTML Tags from a String Using Regular Expressions
    Article Author: Jóhann Gunnarsson
    Published Date: Sunday, July 30, 2000
    Article URL: http://www.4GuysFromRolla.com/webtech/073000-1.shtml

