Screen Scrapes in ASP.NET
By Scott Mitchell
An Updated Version of this Article is Available
If you are using ASP.NET version 3.5 or beyond, consider using the Html Agility Pack, a free, open-source library that greatly simplifies screen scraping and parsing HTML documents. Read Parsing HTML Documents with the Html Agility Pack for more information.
Introduction
One of the great things about ASP.NET is that many things that required a COM component in classic ASP are
easily accomplished through a native ASP.NET Web page. This is because ASP.NET has access to the hundreds of
.NET Framework classes, which are the same set of classes that all .NET applications (from an ASP.NET Web page to
a stand-alone Windows application) use. (As of July 6th, 2001, ASP.NET is in Beta2 and
can be downloaded for free from http://www.ASP.NET.)
Some of the neat things that can be accomplished in an ASP.NET page, which in classic ASP required a component, include:
- Performing screen scrapes
- Sending email messages
- Working with regular expressions
- Creating dynamic GIF and JPG images
- Working with the Web server's file system
- Accessing the Windows Event Log and Performance Counters
- ... the list goes on and on!
This article will focus on how to quickly and easily perform a screen scrape via an ASP.NET Web page using the
System.Net.WebClient class.
Performing Screen Scrapes in Classic ASP
Before we delve into performing screen scrapes with ASP.NET, let's look at what was required to accomplish this with
classic ASP. Since classic ASP cannot initiate an HTTP request, a COM component is required. There are a number of
free COM components that can perform screen scrapes, such as ASPTear and AspHttp. Not surprisingly, there are a
gaggle of articles on 4Guys explaining how to perform screen scrapes in classic ASP. If you are interested in
learning more about this, be sure to check out the following articles:
Grabbing Information from Other Servers, Grabbing Table Columns from Other Pages, and this FAQ.
Performing Screen Scrapes in ASP.NET
To perform a screen scrape in ASP.NET, we will be using the WebClient class. (For those who have spent
much time working with the Beta1 code, understand that this class is new to the Beta2 code release.) This class is
located in the System.Net namespace. (The complete technical docs for this class can be found here.)
The simplest use of this class involves just a few lines of code. The steps we need to perform include:
- Creating an instance of the class
- Calling the DownloadData method, passing in the URL we wish to scrape (which returns an array of Bytes)
- Converting the downloaded data's Byte array into a String
Once we accomplish these three steps, we'll have successfully scraped the content from another URL and have it
sitting in a String variable, ready to manipulate or display however we see fit. So let's get started! Below
you will see the code for an ASP.NET Web page coded in VB.NET that tackles all three steps in the
Page_Load event handler:
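A minimal sketch of such a page follows. The URL http://www.example.com/ and the Label name lblHTMLOutput are placeholders chosen for illustration, and the conversion step assumes the remote page is UTF-8 encoded; adjust these to suit the page you are scraping.

```vbnet
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.Text" %>
<script runat="server">
  Sub Page_Load(sender As Object, e As EventArgs)
    ' STEP 1: Create an instance of the WebClient class
    Dim objWebClient As New WebClient()

    ' STEP 2: Call DownloadData, passing in the URL to scrape
    ' (placeholder URL; substitute the page you want to scrape)
    Const strURL As String = "http://www.example.com/"
    Dim aRequestedHTML() As Byte = objWebClient.DownloadData(strURL)

    ' STEP 3: Convert the Byte array into a String
    ' (assumes the remote page is UTF-8 encoded)
    Dim strRequestedHTML As String = Encoding.UTF8.GetString(aRequestedHTML)

    ' Display the scraped markup, HTML-encoded so it appears as text
    lblHTMLOutput.Text = Server.HtmlEncode(strRequestedHTML)
  End Sub
</script>
<html>
<body>
  <asp:Label id="lblHTMLOutput" runat="server" />
</body>
</html>
```

If you would rather render the scraped markup than show its source, assign strRequestedHTML to the Label directly instead of passing it through Server.HtmlEncode, keeping in mind that doing so injects another site's HTML into your page.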
Note that we accomplish our three steps in the Page_Load event handler. Of course, all of this
can be shortened into two lines of code:
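One way the condensed form might look is sketched here as a complete page so it can be run on its own. The two lines inside Page_Load do all the work; the URL and the Label name lblHTMLOutput are placeholders for illustration, and UTF-8 is assumed for the remote page's encoding.

```vbnet
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.Text" %>
<script runat="server">
  Sub Page_Load(sender As Object, e As EventArgs)
    ' Line 1: create the WebClient
    Dim objWebClient As New WebClient()
    ' Line 2: download, decode (UTF-8 assumed), and display in one statement
    lblHTMLOutput.Text = Server.HtmlEncode(Encoding.UTF8.GetString(objWebClient.DownloadData("http://www.example.com/")))
  End Sub
</script>
<asp:Label id="lblHTMLOutput" runat="server" />
```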
And there you have it, performing screen scrapes in ASP.NET! Pretty simple, eh? For more information on ASP.NET be sure to check out the ASP.NET Article Index!
For More Complicated Screen Scrapes...
The WebClient class is handy for making very simple HTTP requests. However, if you need a more advanced
HTTP request - one that tunnels through a proxy server, perhaps, or one that you want to specify a timeout for - you'll
need to use the HttpWebRequest class instead. (Actually, the WebClient class uses the
HttpWebRequest class internally.)
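As a rough sketch of what such a request could look like, the page below sets a timeout and a proxy on an HttpWebRequest before reading the response. The URL, the proxy address proxy.example.com:8080, the ten-second timeout, and the Label name lblHTMLOutput are all assumptions made for illustration; replace the proxy with your own or remove that line if you do not need one.

```vbnet
<%@ Page Language="VB" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>
<script runat="server">
  Sub Page_Load(sender As Object, e As EventArgs)
    ' Create the request (placeholder URL) and cast it to HttpWebRequest
    Dim objRequest As HttpWebRequest = _
        CType(WebRequest.Create("http://www.example.com/"), HttpWebRequest)

    ' Give up if no response arrives within 10 seconds (value is in milliseconds)
    objRequest.Timeout = 10000

    ' Route the request through a proxy server (hypothetical address)
    objRequest.Proxy = New WebProxy("http://proxy.example.com:8080")

    ' Send the request and read the whole response body as text
    Dim objResponse As WebResponse = objRequest.GetResponse()
    Dim objReader As New StreamReader(objResponse.GetResponseStream())
    Try
      lblHTMLOutput.Text = Server.HtmlEncode(objReader.ReadToEnd())
    Finally
      ' Release the connection whether or not the read succeeded
      objReader.Close()
      objResponse.Close()
    End Try
  End Sub
</script>
<asp:Label id="lblHTMLOutput" runat="server" />
```

If the timeout elapses or the proxy cannot be reached, GetResponse throws a WebException, so a production page would likely wrap the call in its own Try...Catch block.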
For more information on performing more "interesting" HTTP requests using the |
Happy Programming!