Chapter 2: Common ASP.NET Code Techniques
3. Using Regular Expressions
Regular expressions are a neat and efficient way to perform extremely powerful string pattern matching and replacing. For example, imagine that you had an HTML document and you wanted to pick through it to retrieve all the text between any bold tags (between <b> and </b>). Although this can be done with String class methods such as IndexOf and Substring, it would involve an unnecessarily complex loop with some ugly code. Instead, a relatively simple regular expression could be used.
Originally, with classic ASP, regular expression support was a language-specific feature. That is, it was up to the scripting language being used to support regular expressions. With the release of the Microsoft Scripting Engine Version 5.0, Microsoft released a regular expression COM component to handle regular expression support in the same manner regardless of the scripting language.
With the .NET Framework, regular expressions are supported via a number of classes in the System.Text.RegularExpressions namespace. In this section we will examine these classes and look at some code samples. This section is not intended to teach regular expression fundamentals; rather, it aims to illustrate how to work with regular expressions using the classes in the System.Text.RegularExpressions namespace.
For some very useful, real-world regular expressions, be sure to check out Chapter 3, which has a section on common regular expression validations for the RegularExpressionValidator validation control. Also take a peek at Appendix B, "Commonly Used Regular Expression Templates."
The class in the System.Text.RegularExpressions namespace that handles the bulk of the regular expression work is the Regex class. The constructor for this class is very important because it requires the most essential part of a regular expression: the pattern. Three forms of the constructor are as follows:
'Parameterless constructor
Regex()

'Requires the string pattern
Regex(pattern)

'Requires the string pattern and regular expression options
Regex(pattern, options)
The pattern parameter, if specified, needs to be of type String. The options parameter, if specified, needs to be a member of the RegexOptions enumeration. The RegexOptions enumeration (also found in the System.Text.RegularExpressions namespace) contains a number of options you can set when creating a Regex object instance. Some of the more useful RegexOptions enumeration members include:
Compiled: If you are going to use a specific regular expression repeatedly in a single ASP.NET Web page, you can achieve a small performance gain by setting the Compiled option. With this option set, the regular expression is compiled to MSIL code instead of being interpreted, which speeds up matching at the cost of some extra start-up time.
IgnoreCase: By default, regular expressions are case-sensitive. Include this option if you wish to have a case-insensitive regular expression.
RightToLeft: By default, regular expressions parse through the input string in a left-to-right manner. If you wish to reverse this order, specify this option.
To specify multiple RegexOptions members in the Regex constructor, combine them with a bitwise OR (the Or operator in VB.NET, the pipe character (|) in C#). For example, to create a regular expression instance that is both compiled and case insensitive, we could use the following statements:
'In VB.NET
Dim objRegex as Regex = New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Compiled)

// In C#
Regex objRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
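To see what these options change in practice, here is a minimal console sketch rather than an ASP.NET page. The module name RegexOptionsSketch, the helper FirstWord, and the sample strings are made up for illustration, and the syntax assumes a current version of VB.NET.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexOptionsSketch
    ' Hypothetical helper: returns the first run of word characters
    ' found when the pattern \w+ is applied with the given options.
    Function FirstWord(source As String, opts As RegexOptions) As String
        Dim wordPattern As New Regex("\w+", opts)
        Return wordPattern.Match(source).Value
    End Function

    Sub Main()
        ' Matching is case-sensitive unless IgnoreCase is set.
        Console.WriteLine(Regex.IsMatch("HELLO", "hello"))                          ' False
        Console.WriteLine(Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase)) ' True

        ' RightToLeft starts the search at the end of the input string.
        Console.WriteLine(FirstWord("one two three", RegexOptions.None))            ' one
        Console.WriteLine(FirstWord("one two three", RegexOptions.RightToLeft))     ' three

        ' Members are combined with Or.
        Dim both As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.RightToLeft
        Console.WriteLine(FirstWord("one two three", both))                         ' three
    End Sub
End Module
```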
The Regex class contains a number of methods for finding matches to a pattern in a string, replacing instances of the matching pattern with another string and testing to see if a matching pattern exists in a string. Let's look at the IsMatch method, which tests to see if a pattern is found in a string.
There are both static and nonstatic versions of the IsMatch method. The nonstatic version, which requires a Regex instance, requires only one parameter, the string to search for the pattern. (Recall that you must supply the pattern for the regular expression in the constructor.) A very simple IsMatch example can be seen in Listing 2.3.1.
Listing 2.3.1 The IsMatch Method Determines if a Pattern Is Found in a String
1: <%@ Import Namespace="System.Text.RegularExpressions" %>
2: <script language="VB" runat="server">
3:   Sub Page_Load(sender as Object, e as EventArgs)
4:     Dim str as String = "Reality, the external world, exists " & _
5:       "independent of man's consciousness, independent of any " & _
6:       "observer's knowledge, beliefs, feelings, desires or fears. This means that A is A..."
7:
8:     ' Check to see if the string str contains the pattern 'A is A'
9:     Dim regexp as Regex = new Regex("A is A", RegexOptions.IgnoreCase)
10:
11:     If regexp.IsMatch(str) then
12:       Response.Write("This is an Ayn Rand quote.")
13:     Else
14:       Response.Write("I don't know who said this.")
15:     End If
16:   End Sub
17: </script>
Because the Regex class exists in the System.Text.RegularExpressions namespace, our first line of code imports the proper namespace so that we can refer to the Regex class without fully qualifying it (line 1). On lines 4 through 6, a string, str, is hard-coded with a quote from Ayn Rand. Next, on line 9, an instance of the Regex class is created. This instance, regexp, is created with the Regex constructor that takes two parameters: the pattern string and a RegexOptions value. The pattern "A is A" will simply match the substring "A is A"; the option RegexOptions.IgnoreCase indicates that the search should not be case sensitive.
On line 11, the IsMatch method is used to check whether the substring "A is A" exists in the string str. IsMatch returns a Boolean value: True if the pattern is found in the passed-in string, False otherwise. If the substring "A is A" is found in str, "This is an Ayn Rand quote." is displayed; otherwise "I don't know who said this." is displayed. As you might have guessed, the output of Listing 2.3.1, when viewed through a browser, is "This is an Ayn Rand quote."
As mentioned earlier, there is also a static version of the IsMatch method. The static version takes either two or three parameters. The first parameter is the input string, the second is the regular expression pattern, and the optional third parameter is the RegexOptions value for the regular expression. In Listing 2.3.1, line 9 could be removed and line 11 replaced with the following:
If Regex.IsMatch(str, "A is A", RegexOptions.IgnoreCase) then
Finding out whether a regular expression pattern exists in a string is all well and good, but being able to grab a listing of the substrings that matched would be ideal. The Matches method of the Regex class has such functionality. The non-static version of the method expects a single parameter, the string to search, and returns the resulting matches as a MatchCollection.
Listing 2.3.2 uses the Matches method to list all the text between the bold tags in an HTML document. This code borrows some file-reading code from Listing 2.2.2 to read in the contents of a text file on the Web server. The output is shown in Figure 2.13.
Listing 2.3.2 The Matches Method Will Return All the Matching Regular Expression Patterns in a String
1: <%@ Import Namespace="System.IO" %>
2: <%@ Import Namespace="System.Text.RegularExpressions" %>
3: <script language="VB" runat="server">
4:   Sub Page_Load(sender as Object, e as EventArgs)
5:     'Read in the contents of the file strFilePath
6:     Dim strFilePath as String = "C:\Inetpub\wwwroot\Index.htm"
7:     Dim objFileInfo as FileInfo = new FileInfo(strFilePath)
8:     Dim objStream as StreamReader = objFileInfo.OpenText()
9:     Dim strContents as String = objStream.ReadToEnd()
10:     objStream.Close()
11:
12:     'List the text between the bold tags:
13:     Dim regexp as Regex = New Regex("<b>((.|\n)*?)</b>", RegexOptions.IgnoreCase)
14:
15:     Response.Write("<u><b>Items Between Bold Tags in the HTML page " & _
16:       strFilePath & ":</b></u><br>")
17:
18:     'Create a Match object instance / iterate through the MatchCollection
19:     Dim objMatch as Match
20:     For Each objMatch in regexp.Matches(strContents)
21:       Response.Write("<li>" & objMatch.ToString() & "<br>")
22:     Next
23:   End Sub
24: </script>
Listing 2.3.2 begins with two Import directives: the first imports the System.IO namespace to assist with our use of the FileInfo class (line 1); the second imports the System.Text.RegularExpressions namespace to assist with our use of the Regex and Match classes (line 2).
When the Imports are out of the way, Listing 2.3.2 starts by opening and reading the contents of a hard-coded HTML file (lines 6 through 10). The contents of this HTML file are stored in the variable strContents. (This code snippet should look familiar; it's taken directly from Listing 2.2.2!)
Next, a regular expression instance is created with a pattern set to <b>((.|\n)*?)</b>. This might seem a bit confusing, especially if you are not very familiar with regular expressions. Translated to English, the pattern would read: "Find a bold tag (<b>), find zero or more characters, and then find a closing bold tag (</b>)." The period (.) is a special character in regular expressions, meaning, "Match any character except the new line character." The new line character is represented by \n, so (.|\n) matches any character at all, new lines included. The asterisk (*) means to match zero or more occurrences of the preceding expression, and the question mark following the asterisk means to perform "nongreedy" matching, that is, to match as few characters as possible. For a more thorough explanation of all these terms, be sure to read "Picking Out Delimited Text with Regular Expressions," available at http://www.4guysfromrolla.com/webtech/103100-1.shtml.
If you are new to regular expressions the above regular expression may look quite confusing. To learn more about the basics of regular expressions be sure to check out Appendix B, and the "Other Resources" section at the end of this chapter.
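The effect of the nongreedy question mark is easiest to see by running the pattern with and without it. The following is a small console sketch, not part of Listing 2.3.2; the module name GreedySketch, the helper FirstMatch, and the sample HTML string are illustrative assumptions.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedySketch
    ' Hypothetical helper: returns the text of the first match of pattern in html.
    Function FirstMatch(html As String, pattern As String) As String
        Return Regex.Match(html, pattern, RegexOptions.IgnoreCase).Value
    End Function

    Sub Main()
        Dim html As String = "<b>one</b> and <b>two</b>"

        ' Greedy: * grabs as much as it can, so the match runs from the
        ' first <b> all the way to the last </b>.
        Console.WriteLine(FirstMatch(html, "<b>((.|\n)*)</b>"))   ' <b>one</b> and <b>two</b>

        ' Nongreedy: *? stops at the first </b> it can reach.
        Console.WriteLine(FirstMatch(html, "<b>((.|\n)*?)</b>"))  ' <b>one</b>
    End Sub
End Module
```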
When we have our regular expression object instance, we're ready to call the Matches method. The Matches method returns a MatchCollection class instance, which is, essentially, a collection of Match objects. The Match class contains various properties and methods that provide information on a particular match of a regular expression pattern in a string.
To iterate through the MatchCollection, we first create a Match instance, objMatch, on line 19. Next, a For Each ... Next loop is used to iterate through each resulting Match returned by the Matches method on the HTML contents of the file (line 20). On line 21, the matched text is output. Figure 2.13 shows the output of Listing 2.3.2 when the file C:\Inetpub\wwwroot\Index.htm contains the following text:
<html>
<head><title>Hello, World!</title></head>
<body bgcolor=white text=black>
<b>Bold</b> text in HTML is defined using the <b>bold</b> tags:
<code><b> ... </b></code>. For example, <b>this sentence is bold.</b>
This sentence is not bold. <i>This sentence is in italics!</i>
<p>
To learn more about the <b>bold syntax</b>, read a book covering
HTML syntax in-depth!
</body>
</html>
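Listing 2.3.2 writes out each whole match, tags included. If only the text inside the tags is wanted, the Groups property of the Match class can be used, because the pattern wraps that text in parentheses. The sketch below is a console program under that assumption; the module name BoldTextSketch, the function TextBetweenBoldTags, and the sample string are hypothetical, and the generic List(Of String) assumes a current version of VB.NET.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BoldTextSketch
    ' Hypothetical helper: returns only the text inside each <b>...</b>
    ' pair, without the surrounding tags.
    Function TextBetweenBoldTags(html As String) As List(Of String)
        Dim boldPattern As New Regex("<b>((.|\n)*?)</b>", RegexOptions.IgnoreCase)
        Dim results As New List(Of String)()

        For Each objMatch As Match In boldPattern.Matches(html)
            ' Groups(0) is the whole match; Groups(1) is the outer
            ' parenthesized group, i.e. the text between the tags.
            results.Add(objMatch.Groups(1).Value)
        Next
        Return results
    End Function

    Sub Main()
        Dim html As String = "<b>Bold</b> text uses the <B>bold</B> tags."
        For Each inner As String In TextBetweenBoldTags(html)
            Console.WriteLine(inner)
        Next
    End Sub
End Module
```

Run against the sample string, this sketch prints Bold and bold on separate lines; the uppercase <B> tags are matched because of RegexOptions.IgnoreCase.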
Another useful capability of regular expressions is powerful string replacement. For example, many Web sites have a search feature in which the user enters a keyword on which to search and various results are returned. Wouldn't it be nice to have the words in the results that matched the search keyword highlighted?
Listing 2.3.3 contains the code for an ASP.NET page that allows the user to enter both a search term and some text that the search term will be applied to. Any instances of the search term are highlighted. The output is shown in Figure 2.14.
Listing 2.3.3 Regular Expressions Perform Search and Replace Features on Complex Patterns
1: <%@ Import Namespace="System.Text.RegularExpressions" %>
2: <script language="VB" runat="server">
3:   Sub btnSubmit_OnClick(sender as Object, e as EventArgs)
4:     'Create a regex object
5:     Dim strTerms as String = txtSearchTerms.Text
6:     Dim regexp as Regex = new Regex("\b" & strTerms & "\b", RegexOptions.IgnoreCase)
7:
8:     'Replace all search terms in txtText
9:     Dim strNewText as String = regexp.Replace(txtText.Text, _
10:       "<span style='color:black;background-color:yellow'>" & strTerms & "</span>")
11:
12:     lblResults.Text = "<p><hr><p><b><u>Search Results:</u></b><br>" & strNewText
13:   End Sub
14: </script>
15:
16: <html>
17: <body>
18:   <form runat="server">
19:     <b>Enter a search term:</b><br>
20:     <asp:textbox id="txtSearchTerms" runat="server" />
21:     <p>
22:     <b>Enter some text:</b><br>
23:     <asp:textbox id="txtText" runat="server" TextMode="MultiLine"
24:       Cols="50" Rows="6" />
25:     <p>
26:     <asp:button id="btnSubmit" runat="server" OnClick="btnSubmit_OnClick"
27:       text="Search Entered Text For Keyword" />
28:     <p><asp:label id="lblResults" runat="server" />
29:   </form>
30: </body>
31: </html>
Listing 2.3.3 is another code example that doesn't contain a Page_Load event handler. Therefore, let's begin our examination with the HTML portion of the script (lines 16 through 31). Because Listing 2.3.3 uses postback forms, a server-side form is created on line 18. Next, a pair of text boxes is created: the first, txtSearchTerms, is the text box in which the user enters a search term (line 20); the second, txtText, is a multiline text box in which the user enters the text to search (lines 23 and 24). On lines 26 and 27, a button control is created that, when clicked, will fire the btnSubmit_OnClick event handler. Finally, on line 28 a label control, lblResults, is created; this label control will display the user's text entered in txtText with all instances of the search term entered in txtSearchTerms highlighted.
The btnSubmit_OnClick event handler, starting at line 3, begins by reading the value of the txtSearchTerms text box into a string variable, strTerms (line 5). Next, a Regex object instance is created with a pattern of the user-entered search term surrounded by \b. In regular expressions, \b is a special character representing a word boundary. Adding this both before and after the search term has the effect of highlighting only those occurrences of the search term that are their own words. That is, if the user enters a search term of "in" and the text, "I sleep in the incinerator," the word "in" will be highlighted, but the "in" in "incinerator" will not. Also, the RegexOptions.IgnoreCase option is specified, indicating that the search and replace will be case-insensitive (line 6).
On lines 9 and 10, the Replace method of the Regex class is used. The Replace method accepts two parameters: the string to search for the pattern and the string to replace any found matches. In English, lines 9 and 10 say, "Search the contents of txtText.Text looking for any matches to the pattern (the search term as its own word). If you find any, replace it with a highlighted version of the search term."
The Replace method returns a string that has had all matches in its first parameter replaced by the second. On line 9 we set this return string equal to the variable strNewText. Finally, line 12 outputs the highlighted results, strNewText.
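Two details of the approach in Listing 2.3.3 are worth knowing about: a search term containing regular expression metacharacters (such as a period or a parenthesis) is interpreted as part of the pattern, and every match is replaced with the term exactly as the user typed it, so "IN" in the text becomes "in". The following console sketch shows one possible variation that uses Regex.Escape and the $& substitution to address both. The module name HighlightSketch and the function Highlight are hypothetical and are not part of the listing.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HighlightSketch
    ' Hypothetical helper: wraps each whole-word occurrence of term
    ' in a highlighting <span>.
    Function Highlight(source As String, term As String) As String
        ' Regex.Escape makes characters such as "." or "(" in the term literal.
        Dim pattern As String = "\b" & Regex.Escape(term) & "\b"

        ' "$&" in the replacement string stands for the text that was
        ' actually matched, so the original capitalization is kept.
        Dim replacement As String = _
            "<span style='color:black;background-color:yellow'>$&</span>"

        Return Regex.Replace(source, pattern, replacement, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' "In" and "IN" are highlighted with their own casing; "inn" is left alone.
        Console.WriteLine(Highlight("In the inn, IN time.", "in"))
    End Sub
End Module
```

This sketch uses the static Replace method, which takes the input string, the pattern, the replacement string, and the options in a single call, so no Regex instance is needed.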
The Regex class contains a number of other useful methods. One really neat method that I encourage you to examine in detail on your own is the Split method. This method is similar to the Split method of the String class, except that instead of taking a string delimiter to split a string into an array, it accepts a regular expression as a delimiter!
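As a brief illustration of the Split method, the console sketch below splits a list whose items are separated by either commas or semicolons, with optional whitespace around each separator. The module name SplitSketch, the function SplitList, and the sample input are assumptions made for this example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SplitSketch
    ' Hypothetical helper: splits on a comma or semicolon, swallowing
    ' any whitespace on either side of the delimiter.
    Function SplitList(source As String) As String()
        Return Regex.Split(source, "\s*[,;]\s*")
    End Function

    Sub Main()
        ' Prints apples, oranges, pears, and plums, one per line.
        For Each item As String In SplitList("apples, oranges;pears ;  plums")
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

The Split method of the String class could handle the two delimiter characters, but the surrounding whitespace would have to be trimmed from each item in a separate step.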
4. Generating Images Dynamically
There are many real-world scenarios in which the ability to create graphic images on-the-fly is needed. For example, imagine that you had a database table with monthly profits. It would be nice to be able to allow a user to visit a reporting page and have a chart dynamically created as a GIF file on the Web server and seamlessly embedded into the report's ASP.NET page.
This was an impossible task in classic ASP without the use of a third-party component (or without some very ugly and hacked-together code). With ASP.NET, though, creating images on-the-fly in a wide variety of formats is quite possible and easy, thanks to the inherent support of image generation in the .NET Framework.
The .NET Framework contains an extensive drawing API, offering a multitude of classes with an array of methods and properties for drawing all sorts of shapes and figures. In fact, there is such an assortment of drawing functions that an entire book could be dedicated to the topic. Therefore, we will only look at a few of the more useful drawing functions. I highly suggest that you take ample time to root through all the classes in the System.Drawing namespace and its derivative namespaces. There are many classes within these namespaces with literally hundreds of methods and properties!
When an image is dynamically created using the .NET Framework, it is created as an in-memory image. Methods can be used to send this in-memory image to disk on the Web server or to a stream (such as the Response stream). In this section we'll examine both how to save an image to a file on the Web server and how to send a dynamically created image to the user's browser!