Chapter 2: Common ASP.NET Code Techniques
6. Network Access via an ASP.NET Page
With classic ASP, performing network access through an ASP page was impossible without the use of a third-party or custom-developed COM component. For example, grabbing the HTML output of a Web page on a remote Web server via an ASP page was only possible with a component of some kind, such as ASPHTTP from ServerObjects.com. The .NET Framework, however, contains a plethora of classes to assist with network access. These numerous classes are all located in the System.Net namespace.
A very common need for Web developers is the ability to grab the HTML content of a Web page on a remote server. Perhaps the developer wants to perform a screen scrape, grabbing specific portions of the HTML output to integrate into his own Web page. To assist in this task, the .NET Framework provides an easy-to-use class, WebClient, that can be used for accessing information over the Internet.
Listing 2.6.1 contains a "Poor Man's Internet Explorer," a browser within a browser. The code in Listing 2.6.1, when viewed through a browser, presents the user with a text box into which he can enter a fully qualified URL (such as http://www.4GuysFromRolla.com). After the user enters a URL and clicks the Go button, the requested Web page is displayed along with the request and response headers. Output is shown in Figure 2.20.
Listing 2.6.1 The .NET Framework Provides Internet Access from an ASP.NET Page
1: <%@ Import Namespace="System.Net" %>
2: <script language="VB" runat="server">
3:   Sub btnSubmit_OnClick(sender as Object, e as EventArgs)
4:     'Create a WebClient instance
5:     Dim objWebClient as New WebClient()
6:
7:     Dim strHeader as String
8:     lblHTML.Text = "<b>Request Header Information:</b><br>"
9:     For Each strHeader in objWebClient.Headers
10:      lblHTML.Text &= strHeader & " - " & _
11:            objWebClient.Headers(strHeader) & "<br>"
12:    Next
13:
14:    'Read the Response into an array of bytes, but use the UTF8Encoding
15:    'class to convert the byte array into a string
16:    Dim objUTF8 as New UTF8Encoding()
17:    Dim strRequestedHTML as String
18:    strRequestedHTML = objUTF8.GetString(objWebClient.DownloadData(txtURL.Text))
19:
20:
21:    lblHTML.Text &= "<p><b>Response Header Information:</b><br>"
22:    For Each strHeader in objWebClient.ResponseHeaders
23:      lblHTML.Text &= strHeader & " - " & _
24:            objWebClient.ResponseHeaders(strHeader) & "<br>"
25:    Next
26:
27:    'Output the contents of the Web request
28:    lblHTML.Text &= strRequestedHTML
29:
30:  End Sub
31: </script>
32:
33: <html>
34: <body>
35:   <form runat="server">
36:     <font size=+1><b>Poor Man's Internet Explorer</b></font>
37:     <br>Browse the Web:
38:     <asp:textbox id="txtURL" runat="server" /><br>
39:     <i>Enter a URL starting with <code>http://</code></i><br>
40:     <asp:button id="btnSubmit" runat="server" Text=" Go! "
41:           OnClick="btnSubmit_OnClick" />
42:
43:     <p><hr><p>
44:     <asp:label id="lblHTML" runat="server" />
45:   </form>
46: </body>
47: </html>
Listing 2.6.1 starts out with some familiar lines of code. Line 1 Imports the System.Net namespace, which contains the WebClient class that we'll be using.
Because Listing 2.6.1 contains no Page_Load event handler, let's start our examination of the code with the HTML content. On line 35 a postback form is created. Inside this form are two form elements: a text box, txtURL, into which the user will enter the URL she wants to visit (line 38); and a Go button, btnSubmit, for the user to click to load the entered URL (lines 40 and 41). Note that the button has its OnClick event wired up to the btnSubmit_OnClick event handler. When this button is clicked, the form will be submitted; when the page reloads, the btnSubmit_OnClick event handler will execute.
The btnSubmit_OnClick event handler, starting on line 3, is responsible for grabbing the HTML from the URL entered by the user in the txtURL text box. To grab the HTML at that URL, we simply need to create an instance of the WebClient class and call its DownloadData method.
Whenever a client sends an HTTP request to the server, a number of optional headers are usually passed along with various bits of information. Likewise, when the server returns the content to the client, a number of headers can be sent along with the data. (These headers are referred to as request headers and response headers, respectively.) The WebClient class has both Headers and ResponseHeaders properties, for working with the request and response headers, respectively. These properties are instances of the WebHeaderCollection class, which holds the protocol headers associated with a request or a response. Through these properties, a developer can programmatically send request headers and iterate through the response headers.
To explicitly send a header to the server when making a Web request from an ASP.NET page, simply use the Add method of the WebHeaderCollection class to add a new request header, passing the header's name and its value as the two arguments.
Therefore, if you wanted to add the specific request header Foo: Bar, in Listing 2.6.1 you would need to add the line objWebClient.Headers.Add("Foo", "Bar") before line 18 (where the actual Web request is initiated).
Recall that the Headers property returns a WebHeaderCollection class instance. The WebHeaderCollection class is derived from the NameValueCollection class, which means that we can treat the Headers property much like an ordinary Hashtable collection. On lines 9 through 12, the request headers are displayed using a For Each loop. On lines 22 through 25, the response headers are displayed in a similar fashion.
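The header handling described above can be tried outside of a Web page. The following is a minimal sketch, assuming a console application rather than an ASP.NET page; the module name HeaderDemo is hypothetical, and no Web request is actually made.

```vb
Imports System
Imports System.Net

Module HeaderDemo
    Sub Main()
        Dim objWebClient As New WebClient()

        ' Add the custom request header "Foo: Bar" before any request is made
        objWebClient.Headers.Add("Foo", "Bar")

        ' Iterating the collection yields the header names;
        ' indexing by name returns the matching value
        Dim strHeader As String
        For Each strHeader In objWebClient.Headers
            Console.WriteLine(strHeader & " - " & objWebClient.Headers(strHeader))
        Next
    End Sub
End Module
```

Because the collection is keyed by header name, the same loop works unchanged for the ResponseHeaders property once a request has completed.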
Take a moment to examine Figure 2.20. Note that no request headers are listed. This is because the WebClient class does not place any headers in its Headers collection by default. That is, if you want particular request headers to be sent along, you must explicitly supply them.
Next, we need to actually do the work of downloading the HTML for the requested URL. This is accomplished via the WebClient class's DownloadData method (line 18). This method accepts a single parameter, the URL of the content to retrieve, and returns an array of bytes containing the result of the Web request. Since we'd rather deal with a String than an array of bytes, on line 16 a UTF8Encoding class is instantiated and, on line 18, the GetString method is used to convert the array of bytes returned by DownloadData into a String, which is then assigned to the variable strRequestedHTML (line 18). Listing 2.6.1 concludes by listing the response headers (lines 22 through 25) and then outputting strRequestedHTML (line 28).
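Listing 2.6.1 does not guard against a failed request, such as a mistyped URL or an unreachable server. The sketch below shows one way the download step might be wrapped, assuming a console application; the helper name FetchHtml is hypothetical, and the code assumes, as the listing does, that the remote page is encoded as UTF-8.

```vb
Imports System
Imports System.Net
Imports System.Text

Module DownloadDemo
    ' Hypothetical helper: downloads strURL and decodes the bytes as UTF-8
    Function FetchHtml(ByVal strURL As String) As String
        Dim objWebClient As New WebClient()
        Dim bytData() As Byte = objWebClient.DownloadData(strURL)
        Return Encoding.UTF8.GetString(bytData)
    End Function

    Sub Main()
        Try
            Console.WriteLine(FetchHtml("http://www.4GuysFromRolla.com"))
        Catch ex As WebException
            ' DownloadData throws a WebException when the request fails
            Console.WriteLine("Request failed: " & ex.Message)
        End Try
    End Sub
End Module
```

In an ASP.NET page, the same Try ... Catch block could be placed around line 18 of Listing 2.6.1, with the error message written to lblHTML instead of the console.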
Another useful class in the System.Net namespace is the Dns class, which can be used to resolve DNS hostnames into IP addresses. (This capability was not available in classic ASP without the use of a third-party component.) Such a technique could be used to verify email addresses by ensuring that the domain name specified by the user actually resolves to an IP address.
Listing 2.6.2 illustrates how to use the Dns class's Resolve method, along with a regular expression, to build a fairly reliable email address validation ASP.NET page. Of course, the only way to truly guarantee that a user has entered his own valid email address is to require him to respond to a confirmation email message. However, the technique shown in Listing 2.6.2 is more reliable than just checking the email address against a regular expression.
Listing 2.6.2 ASP.NET Provides Built-In DNS Resolving Support
1: <%@ Import Namespace="System.Net" %>
2: <%@ Import Namespace="System.Net.Sockets" %>
3: <%@ Import Namespace="System.Text.RegularExpressions" %>
4: <script language="VB" runat="server">
5:   Sub btnSubmit_OnClick(sender as Object, e as EventArgs)
6:     'Check to make sure that the email addy is in the right form
7:     Dim strEmail as String, strPattern as String
8:     strEmail = txtEmail.Text
9:     strPattern = "^[\w-_\.]+\@([\w]+\.)+\w+$"
10:
11:    If Not Regex.IsMatch(strEmail, strPattern, RegexOptions.IgnoreCase) then
12:      'Invalid email address form!
13:      Response.Write("<font color=red><i>Your email address is in an" & _
14:              " illegal format.</i></font><p>")
15:    Else
16:      'Check to see if the domain name entered in the email address exists
17:      Dim strDomain as String
18:      strDomain = strEmail.Substring(strEmail.IndexOf("@") + 1)
19:
20:      'Attempt to Resolve the hostname
21:      Dim strIP as String
22:      try
23:        strIP = Dns.Resolve(strDomain).AddressList(0).ToString()
24:
25:        'If we reach here, we have a valid email address, so do whatever
26:        'processing or whatnot needs to be done...
27:        Response.Write("<b>Valid email address. Your domain name has " & _
28:                "an IP of " & strIP & ". Thank you!</b>")
29:      catch se as SocketException
30:        'The DNS resolve was unsuccessful...
31:        strIP = se.Message
32:
33:        Response.Write("<font color=red><i>" & strIP & "</i></font><p>")
34:
35:      end try
36:    End If
37:  End Sub
38: </script>
39:
40: <html>
41: <body>
42:   <form runat="server">
43:     <br><b>Enter your Email address:</b>
44:     <asp:textbox id="txtEmail" runat="server" />
45:     <p>
46:     <asp:button id="btnSubmit" runat="server" Text="Check Email"
47:           OnClick="btnSubmit_OnClick" />
48:   </form>
49: </body>
50: </html>
Listing 2.6.2 begins by Importing three namespaces: System.Net, which contains the definition for the Dns class; System.Net.Sockets, which contains the definition for the SocketException class that we'll need to use if the user enters an email address with an invalid domain name; and System.Text.RegularExpressions, because we are going to use a regular expression to ensure that the email address is in the proper format.
The code in Listing 2.6.2 creates a postback form with a text box, txtEmail, for the user to enter his email address (lines 42 and 44, respectively). The page also displays a button titled Check Email, which, when clicked, will submit the form, causing the btnSubmit_OnClick event handler to fire. This button, btnSubmit, is created on lines 46 and 47.
The task of the btnSubmit_OnClick event handler is to ensure that the user has entered a valid email address. We begin by reading the value of the txtEmail text box into a String variable, strEmail (line 8). Then, on line 9, we create a regular expression pattern, strPattern, which, in short, looks for one or more characters preceding the @ sign, followed by one or more groups of characters that each end in a period, and then followed by a final grouping of one or more characters. Whew, that sounds overly complex! Basically we are trying to ensure that the user has entered the following:
SomeCharacters@(SomeCharacters.(ONE OR MORE TIMES))SomeCharacters
Next, on line 11, we call the static version of the IsMatch method of the Regex class to determine whether our pattern is found in the email address. If it is not, lines 12 through 14 are executed, which display an error message. If the pattern is found, the code following the Else statement on line 15 is executed.
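To get a feel for what the pattern accepts and rejects, it can be run against a few sample strings. The following is a minimal sketch, assuming a console application; the module name and the sample addresses are invented for illustration only.

```vb
Imports System
Imports System.Text.RegularExpressions

Module EmailPatternDemo
    Sub Main()
        ' Same pattern as line 9 of Listing 2.6.2
        Dim strPattern As String = "^[\w-_\.]+\@([\w]+\.)+\w+$"

        ' Sample inputs chosen for illustration only
        Dim arrSamples() As String = {"scott@example.com", _
                                      "first.last@mail.example.org", _
                                      "scott@example", _
                                      "example.com"}

        Dim strSample As String
        For Each strSample In arrSamples
            ' IsMatch returns True when the whole string fits the pattern
            Console.WriteLine(strSample & " -> " & _
                Regex.IsMatch(strSample, strPattern, RegexOptions.IgnoreCase).ToString())
        Next
    End Sub
End Module
```

The first two samples should match. The third has no period after the @ sign and the fourth has no @ sign at all, so neither should match.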
We could have simply used a RegularExpressionValidator control in our HTML document to ensure that the email address was valid. For more information on the RegularExpressionValidator control, refer to Chapter 3.
If the email address is in the proper format, all that is left to do is ensure that the domain name portion of the email address is valid. To do this, we must first snip out the domain name from the email address, which we do on line 18 using the Substring and IndexOf methods of the String class. When we have the domain name, we are ready to try to resolve it into an IP address: if we can resolve the domain name into an IP address, the domain name is valid; otherwise it is invalid.
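The domain extraction on line 18 of Listing 2.6.2 can be isolated as follows. This is a minimal sketch, assuming a console application; the helper name GetDomain and the sample address are hypothetical.

```vb
Imports System

Module DomainDemo
    ' Hypothetical helper: returns everything after the first @ sign
    Function GetDomain(ByVal strEmail As String) As String
        ' IndexOf gives the position of the @ sign; adding 1 skips past it
        Return strEmail.Substring(strEmail.IndexOf("@") + 1)
    End Function

    Sub Main()
        Console.WriteLine(GetDomain("scott@example.com"))   ' prints example.com
    End Sub
End Module
```

Note that if the string contains no @ sign, IndexOf returns -1, so Substring(0) hands back the entire string. This is one reason the regular expression check needs to run before the domain is extracted.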
The Resolve method of the Dns class resolves a domain name (in the form of a string) into an instance of the IPHostEntry class, which contains a list of IPAddress class instances (AddressList) that express the domain name as IP addresses. If the domain name cannot be resolved into an IP address, a SocketException is thrown. For that reason, we must place the call to the Resolve method in a Try ... Catch block. On line 22 we begin the Try ... Catch block and, on line 23, make a call to Resolve, a static method. Because Resolve returns an IPHostEntry instance, we take the first IPAddress in its AddressList and call that object's ToString() method before assigning the result to our string variable, strIP.
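In later versions of the .NET Framework (2.0 and up), the Resolve method is marked obsolete in favor of Dns.GetHostEntry. The following is a minimal sketch of the same check using that method, assuming a console application and a newer Framework version than the one the listing targets; the helper name DomainResolves and the sample domain are hypothetical.

```vb
Imports System
Imports System.Net
Imports System.Net.Sockets

Module DnsDemo
    ' Hypothetical helper: True if strDomain resolves to at least one address
    Function DomainResolves(ByVal strDomain As String) As Boolean
        Try
            Dim objHost As IPHostEntry = Dns.GetHostEntry(strDomain)
            Return objHost.AddressList.Length > 0
        Catch se As SocketException
            ' Thrown when the hostname cannot be resolved
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(DomainResolves("example.com"))
    End Sub
End Module
```

Returning a Boolean keeps the lookup separate from the page output, so the calling code can decide which message to show the user.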
If the domain name is successfully resolved into an IP address, line 27 will be reached, where a message is displayed informing the user that his email address is valid (and also displaying the resolved IP address). If, however, the domain name was not resolved to an IP address, the Catch portion of the Try ... Catch block will be reached (line 29). On line 31, strIP is assigned the value of the Message property of the SocketException exception. On line 33, this Message is displayed as an error message to the user.
Figures 2.21, 2.22, and 2.23 show the output of Listing 2.6.2 for various scenarios. Figure 2.21 shows the error message a user will receive if she enters her email address in an invalid format. Figure 2.22 depicts the output the user will be presented with if he enters an invalid domain name in his email address. Finally, Figure 2.23 shows the output of Listing 2.6.2 when the user enters an email address in a valid format with a valid domain name.
Understand that a clever user could easily bypass this check system by entering an invalid username portion of his email address with a valid domain name. For example, the domain name yahoo.com is, of course, valid. However, I suspect the username ThisIsAReallyLongNameAndIAmTryingToProveAPoint is not a registered @yahoo email address. Therefore, if a user were to enter the following:
ThisIsAReallyLongNameAndIAmTryingToProveAPoint@yahoo.com
the script would identify that email address as valid, even though it almost certainly does not exist. As mentioned earlier, if you must guarantee that a user enters a valid email address, it is imperative that you send him a confirmation email and require him to respond.