To read the article online, visit http://www.4GuysFromRolla.com/webtech/061399-3.shtml

Using Regular Expressions to Alter your HTML Code

By Neil Brown


I recently developed a discussion forum for a client and decided that if a user enters a URL or an e-mail address in the body of a message, it should be displayed as a hyperlink. The easiest way to do this was to take advantage of JavaScript's regular expressions.

The user enters the message text in a textarea control, and it is written, exactly as entered, to the message field of the discussion forum table. Before displaying the message back to the user, however, I run it through a function that makes some minor changes to improve its appearance in HTML, turns URLs and e-mail addresses into links, and adds a small Easter egg into the bargain.

I'm not going to discuss how I retrieve the record, as that topic is covered thoroughly on this site. I'll just say that the message text, as it exists in the database, has been assigned to a variable called s_message.


<%
    '...start the page and spit out html to the browser

    'call the function to convert the message
    Response.Write to_html(s_message)
%>

The function looks like this:


<%
Function to_html(s_string)
    to_html = Replace(s_string, """", "&quot;")
    to_html = Replace(to_html, "<", "&lt;")
    to_html = Replace(to_html, ">", "&gt;")
    to_html = Replace(to_html, vbcrlf, "<br>")
    to_html = Replace(to_html, "/&lt;", "<")
    to_html = Replace(to_html, "/&gt;", ">")
    to_html = edit_hrefs(to_html)
End Function
%>

<script language="javascript1.2" runat=server>
function edit_hrefs(s_html){
    // use regular expressions to look for
    // e-mail addresses and urls
    var s_str = new String(s_html);

    // temporarily mask http://www. so the plain www. pass below skips it
    s_str = s_str.replace(/\bhttp\:\/\/www(\.[\w+\.\:\/\_]+)/gi,
        "http\:\/\/¬¤¸$1");

    // link the remaining http:// urls
    s_str = s_str.replace(/\b(http\:\/\/\w+\.[\w+\.\:\/\_]+)/gi,
        "<a href=\"$1\">$1<\/a>");

    // link bare www. addresses, adding the missing http://
    s_str = s_str.replace(/\b(www\.[\w+\.\:\/\_]+)/gi,
        "<a href=\"http://$1\">$1</a>");

    // restore the masked http://www. urls as links
    s_str = s_str.replace(/\bhttp\:\/\/¬¤¸(\.[\w+\.\:\/\_]+)/gi,
        "<a href=\"http\:\/\/www$1\">http\:\/\/www$1</a>");

    // link e-mail addresses
    s_str = s_str.replace(/\b(\w+@[\w+\.?]*)/gi,
        "<a href=\"mailto\:$1\">$1</a>");

    return s_str;
}
</script>

These are the basic steps that the above code follows:
1. Convert quotation marks to the &quot; HTML entity. It's not strictly necessary, but it's nice to have.

2. Convert each less-than character to the HTML-friendly &lt;. This way, a less-than character in the message doesn't confuse the HTML output. It also stops someone from being smart and wrapping, for example, <b>bold</b> tags around parts of the message.

3. Convert each greater-than character to &gt;, for the same reasons as above.

4. Convert carriage return/line feed pairs (vbcrlf) to HTML line break "<br>" tags.

5. This is the egg. It allows people to put simple HTML in the message and get away with it, provided they prefix each angle bracket with a "/" character: typing /<b/>bold/</b/> produces <b>bold</b>. I doubt anyone will ever discover this. If they do, good for them.

6. Finally, call the JavaScript edit_hrefs function.
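
For readers who want to experiment with steps 1 to 5 outside ASP, here is a minimal sketch of the same replacements in JavaScript. The name toHtmlSketch is hypothetical and not part of the original code, and the call to edit_hrefs in step 6 is left out. VBScript's Replace changes every occurrence, so each JavaScript pattern carries the g flag to match that behavior.

```javascript
// Hypothetical JavaScript port of the VBScript to_html steps 1-5
// (the call to edit_hrefs in step 6 is deliberately omitted).
function toHtmlSketch(text) {
    var out = String(text);
    out = out.replace(/"/g, "&quot;");    // step 1: quotation marks
    out = out.replace(/</g, "&lt;");      // step 2: less-than characters
    out = out.replace(/>/g, "&gt;");      // step 3: greater-than characters
    out = out.replace(/\r\n/g, "<br>");   // step 4: CR+LF pairs (vbcrlf)
    out = out.replace(/\/&lt;/g, "<");    // step 5: the egg, "/<" becomes "<"
    out = out.replace(/\/&gt;/g, ">");    //         and "/>" becomes ">"
    return out;
}

console.log(toHtmlSketch('a < b and "c"\r\nnext'));
// prints: a &lt; b and &quot;c&quot;<br>next

console.log(toHtmlSketch("<b>plain</b> /<b/>bold/</b/>"));
// prints: &lt;b&gt;plain&lt;/b&gt; <b>bold</b>
```

The order matters: the egg has to run after the angle brackets are escaped, because it looks for the escaped forms "/&lt;" and "/&gt;" rather than the raw characters.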

Now, let's examine what each line of the JavaScript function does:
1. Create a string object from the variable passed to the function.

2. Find every instance of http://www.[something] and convert it to http://¬¤¸.[something]. This is a temporary measure so that step 4 can process the instances of www.[something] that have no http:// prefix without touching these. So http://www.pinarello.com/ is changed to http://¬¤¸.pinarello.com/.

3. Convert the remaining http://[something] URLs, where something doesn't begin with "www", to anchor tags. For example, http://uk.imdb.com/ will be changed to <a href="http://uk.imdb.com/">http://uk.imdb.com/</a>.

4. Look for all instances of www.[something] and convert them to links. For example, www.principia.dk will be converted into <a href="http://www.principia.dk">www.principia.dk</a>.

5. Return to the instances of http://¬¤¸.[something] that were temporarily created in step 2 and turn them into links with the "www" restored. For example, http://¬¤¸.pinarello.com/, created in step 2, is now converted to <a href="http://www.pinarello.com/">http://www.pinarello.com/</a>.

6. Finally, look for e-mail addresses and convert them to <a href="mailto:[e-mail address]">[e-mail address]</a>. The regular expression I used to find e-mail addresses is different from Ian Stalling's (http://www.4guysfromrolla.com/webtech/052899-1.shtml), and it hasn't been thoroughly tested, although I haven't yet found an e-mail address that it fails on. If you do have problems, you may like to replace my code with Ian's.
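
The walk-through above can be tried on the article's own sample addresses with a standalone sketch like the one below. It assumes a modern JavaScript engine rather than the server-side script host, and it is not a drop-in replacement for edit_hrefs. The name linkifySketch and the address neil@example.com are made up for illustration, the three-character placeholder is written with \u escapes so that file encoding cannot garble it, and the escapes the engine does not need have been dropped from the patterns.

```javascript
// Sketch of the placeholder technique described in steps 1-6.
// \u00AC\u00A4\u00B8 is the article's three-character placeholder.
function linkifySketch(text) {
    var s = String(text);                                   // step 1
    // step 2: mask http://www. so that step 4 does not match inside it
    s = s.replace(/\bhttp:\/\/www(\.[\w+.:\/_]+)/gi,
        "http://\u00AC\u00A4\u00B8$1");
    // step 3: link the remaining http:// urls
    s = s.replace(/\b(http:\/\/\w+\.[\w+.:\/_]+)/gi,
        '<a href="$1">$1</a>');
    // step 4: link bare www. addresses, adding http:// to the href
    s = s.replace(/\b(www\.[\w+.:\/_]+)/gi,
        '<a href="http://$1">$1</a>');
    // step 5: turn the masked urls into links with "www" restored
    s = s.replace(/\bhttp:\/\/\u00AC\u00A4\u00B8(\.[\w+.:\/_]+)/gi,
        '<a href="http://www$1">http://www$1</a>');
    // step 6: link e-mail addresses
    s = s.replace(/\b(\w+@[\w+.?]*)/gi,
        '<a href="mailto:$1">$1</a>');
    return s;
}

console.log(linkifySketch("http://www.pinarello.com/"));
// prints: <a href="http://www.pinarello.com/">http://www.pinarello.com/</a>
console.log(linkifySketch("http://uk.imdb.com/"));
// prints: <a href="http://uk.imdb.com/">http://uk.imdb.com/</a>
console.log(linkifySketch("www.principia.dk"));
// prints: <a href="http://www.principia.dk">www.principia.dk</a>
console.log(linkifySketch("neil@example.com"));
// prints: <a href="mailto:neil@example.com">neil@example.com</a>
```

One behavior worth knowing when testing: the character classes include ".", so a full stop typed directly after a URL or address is swallowed into the link. For instance, "Mail neil@example.com." yields a mailto link ending in "com.".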

Hopefully, you can use this function to your advantage. Maybe you can improve it, clean it up, add new functionality, whatever. If so, have fun!
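
As one possible clean-up, and not the approach used in this article, the three kinds of match can be handled in a single pass by giving replace a callback function, which removes the need for the temporary placeholder. This is only a sketch under assumptions: the name linkifyOnePass is hypothetical, it needs a script engine that supports function replacements, and the patterns are slightly different from the ones above (any http:// address is accepted, and an e-mail address must have at least one character after the @).

```javascript
// Hypothetical single-pass variant: one pattern with three alternatives
// and a replacement callback, so no temporary placeholder is needed.
var LINK_PATTERN =
    /\b(?:(http:\/\/[\w+.:\/_]+)|(www\.[\w+.:\/_]+)|(\w+@[\w.]+))/gi;

function linkifyOnePass(text) {
    return String(text).replace(LINK_PATTERN,
        function (match, httpUrl, bareWww, email) {
            // exactly one of the three groups is defined for each match
            if (httpUrl) {
                return '<a href="' + httpUrl + '">' + httpUrl + '</a>';
            }
            if (bareWww) {
                return '<a href="http://' + bareWww + '">' + bareWww + '</a>';
            }
            return '<a href="mailto:' + email + '">' + email + '</a>';
        });
}

console.log(linkifyOnePass("See http://www.pinarello.com/ or www.principia.dk"));
// prints: See <a href="http://www.pinarello.com/">http://www.pinarello.com/</a> or <a href="http://www.principia.dk">www.principia.dk</a>
```

Because the http:// alternative is tried first at each position, an address such as http://www.pinarello.com/ is consumed whole before the www. alternative gets a chance to match inside it.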

Happy Programming!


Article Information
Article Title: Using Regular Expressions to Alter your HTML Code
Article Author: Neil Brown
Published Date: Sunday, June 13, 1999
Article URL: http://www.4GuysFromRolla.com/webtech/061399-3.shtml


Copyright 2017 QuinStreet Inc. All Rights Reserved.