Published: Friday, July 06, 2001
Screen Scrapes in ASP.NET
By Scott Mitchell
Introduction
One of the great things about ASP.NET is that many tasks that required a COM component in classic ASP are
easily accomplished in a native ASP.NET Web page. This is because ASP.NET has access to the hundreds of
.NET Framework classes, the same set of classes that all .NET applications (from an ASP.NET Web page to
a stand-alone Windows application) use. (As of July 6th, 2001, ASP.NET is in Beta2 and
can be downloaded for free from http://www.ASP.NET.)
Some of the neat things that can be accomplished in an ASP.NET page, which in classic ASP required a component,
include:
- Performing screen scrapes
- Sending email messages
- Working with regular expressions
- Creating dynamic GIF and JPG images
- Working with the Web server's file system
- Accessing the Windows Event Log and Performance Counters
- ... the list goes on and on!
This article will focus on how to quickly and easily perform a screen scrape via an ASP.NET Web page using the
System.Net.WebClient class.
Performing Screen Scrapes in Classic ASP
Before we delve into performing screen scrapes with ASP.NET, let's look at what was required to accomplish this with
classic ASP. Since classic ASP has no built-in way to initiate an HTTP request, a COM component is required. There are a number of
free COM components that can perform screen scrapes, such as ASPTear and AspHttp. Not surprisingly, there is a
gaggle of articles on 4Guys explaining how to perform screen scrapes in classic ASP. If you are interested in
learning more about this, be sure to check out the following articles:
Grabbing Information from Other Servers, Grabbing Table Columns from Other Pages, and this FAQ.
Performing Screen Scrapes in ASP.NET
To perform a screen scrape in ASP.NET, we will be using the WebClient class, which is located in the
System.Net namespace. (If you have spent much time working with the Beta1 code, note that this class is
new in the Beta2 release.) The complete technical docs for this class can be
found here.
The simplest use of this class involves just a few lines of code. The steps we need to perform include:
- Creating an instance of the class
- Calling the DownloadData method, passing in the URL we wish to scrape (which returns an array of Bytes)
- Converting the downloaded data's Byte array into a String
Once we accomplish these three steps, we'll have successfully scraped the content from another URL and have it
sitting in a String variable, ready to manipulate or display however we see fit. So let's get started! Below
you will see the code for an ASP.NET Web page coded in VB.NET that tackles all three steps in the
Page_Load event handler:
<%@ Import Namespace="System.Net" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    'STEP 1: Create a WebClient instance
    Dim objWebClient as New WebClient()

    'STEP 2: Call the DownloadData method
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim aRequestedHTML() as Byte
    aRequestedHTML = objWebClient.DownloadData(strURL)

    'STEP 3: Convert the Byte array into a String
    Dim objUTF8 as New UTF8Encoding()
    Dim strRequestedHTML as String
    strRequestedHTML = objUTF8.GetString(aRequestedHTML)

    'WE'RE DONE! - display the string
    lblHTMLOutput.Text = strRequestedHTML
  End Sub
</script>
<html>
<body>
  <h1>Screen Scrape of www.aspmessageboard.com</h1>
  <p>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>
Note that we accomplish our three steps in the Page_Load event handler. Of course, steps 2 and 3
can be collapsed into a single statement, which shortens the code considerably:
<%@ Import Namespace="System.Net" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    'STEP 1: Create a WebClient instance
    Dim objWebClient as New WebClient()

    'STEP 2 and 3
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim objUTF8 as New UTF8Encoding()
    lblHTMLOutput.Text = objUTF8.GetString(objWebClient.DownloadData(strURL))
  End Sub
</script>
<html>
<body>
  <h1>Screen Scrape of www.aspmessageboard.com</h1>
  <p>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>
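Once the scraped content is sitting in a String, it can be manipulated like any other string. As a rough sketch of what that might look like, the page below pulls the contents of the remote page's title tag out with a regular expression and HTML-encodes the scraped markup so that the browser shows the source rather than rendering it. The lblTitle label, the regular expression pattern, and the pre tag are assumptions made for illustration, and the code assumes the remote page is UTF-8 encoded, as the listings above do.

```vb
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.Text" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    'Scrape the page, exactly as in the shortened example
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim objWebClient as New WebClient()
    Dim objUTF8 as New UTF8Encoding()
    Dim strRequestedHTML as String
    strRequestedHTML = objUTF8.GetString(objWebClient.DownloadData(strURL))

    'ASSUMPTION: a simple pattern that captures the text between the title tags
    Dim objMatch as Match = Regex.Match(strRequestedHTML, "<title[^>]*>(.*?)</title>", _
                                        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    If objMatch.Success Then
      lblTitle.Text = Server.HtmlEncode(objMatch.Groups(1).Value.Trim())
    Else
      lblTitle.Text = "(no title found)"
    End If

    'HTML-encode the markup so the browser displays it as text
    lblHTMLOutput.Text = Server.HtmlEncode(strRequestedHTML)
  End Sub
</script>
<html>
<body>
  <h1>Screen Scrape of www.aspmessageboard.com</h1>
  <p>Title: <asp:label id="lblTitle" runat="server" /></p>
  <pre><asp:label id="lblHTMLOutput" runat="server" /></pre>
</body>
</html>
```

A regular expression is fine for grabbing one well-known tag, but it is not a general-purpose HTML parser, so treat this as a starting point rather than a robust solution.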
And there you have it, performing screen scrapes in ASP.NET! Pretty simple, eh? For more information on
ASP.NET be sure to check out the ASP.NET Article Index!
For More Complicated Screen Scrapes...
The WebClient class is handy for making very simple HTTP requests. However, if you need a more advanced
HTTP request (one that tunnels through a proxy server, perhaps, or one with a specific timeout), you'll
need to use the HttpWebRequest class instead. (Actually, the WebClient class uses the
HttpWebRequest class internally.)
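To give a feel for the difference, here is a minimal sketch of the same screen scrape performed with HttpWebRequest. It is an illustration only: the 10-second timeout, the commented-out proxy address (proxy.example.com is hypothetical), and the assumption that the remote page is UTF-8 encoded are all choices made for this example.

```vb
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    Const strURL as String = "http://www.aspmessageboard.com/"

    'WebRequest.Create returns an HttpWebRequest for http:// URLs
    Dim objRequest as HttpWebRequest = CType(WebRequest.Create(strURL), HttpWebRequest)

    'ASSUMPTION: give up after 10 seconds (the value is in milliseconds)
    objRequest.Timeout = 10000

    'ASSUMPTION: hypothetical proxy address; uncomment and adjust if needed
    'objRequest.Proxy = New WebProxy("http://proxy.example.com:8080")

    Dim objResponse as WebResponse = objRequest.GetResponse()
    Try
      'ASSUMPTION: the remote page is UTF-8 encoded
      Dim objReader as New StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF8)
      lblHTMLOutput.Text = objReader.ReadToEnd()
      objReader.Close()
    Finally
      'Always release the connection, even if reading fails
      objResponse.Close()
    End Try
  End Sub
</script>
<html>
<body>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>
```

If the request times out or the server cannot be reached, GetResponse throws a WebException, so a production page would likely want to catch it and show a friendlier message.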
For more information on performing more "interesting" HTTP requests using the HttpWebRequest class, refer to
A Deeper Look at Performing HTTP Requests in an ASP.NET Page.
If you need to make authenticated HTTP requests, check out Making Authenticated HTTP Requests from an ASP.NET Page.
Happy Programming!