Published: Friday, July 06, 2001

Screen Scrapes in ASP.NET

By Scott Mitchell


An Updated Version of this Article is Available
If you are using ASP.NET version 3.5 or beyond, consider using the Html Agility Pack, a free, open-source library that greatly simplifies screen scraping and parsing HTML documents. Read Parsing HTML Documents with the Html Agility Pack for more information.

Introduction
One of the great things about ASP.NET is that many tasks that required a COM component in classic ASP are easily accomplished natively in an ASP.NET Web page. This is because ASP.NET has access to the hundreds of .NET Framework classes, the same set of classes that all .NET applications (from an ASP.NET Web page to a stand-alone Windows application) use. (As of July 6th, 2001, ASP.NET is in Beta2 and can be downloaded for free from http://www.ASP.NET.)

Some of the neat things that can be accomplished in an ASP.NET page, which in classic ASP required a component, include:

  • Performing screen scrapes
  • Sending email messages
  • Working with regular expressions
  • Creating dynamic GIF and JPG images
  • Working with the Web server's file system
  • Accessing the Windows Event Log and Performance Counters
  • ... the list goes on and on!

This article will focus on how to quickly and easily perform a screen scrape via an ASP.NET Web page using the System.Net.WebClient class.

Performing Screen Scrapes in Classic ASP
Before we delve into performing screen scrapes with ASP.NET, let's look at what was required to accomplish this with classic ASP. Since classic ASP has no built-in way to initiate an HTTP request, a COM component is required. There are a number of free COM components that can perform screen scrapes, such as ASPTear and AspHttp. Not surprisingly, there is a gaggle of articles on 4Guys explaining how to perform screen scrapes in classic ASP. If you are interested in learning more about this, be sure to check out the following articles: Grabbing Information from Other Servers, Grabbing Table Columns from Other Pages, and this FAQ.

Performing Screen Scrapes in ASP.NET
To perform a screen scrape in ASP.NET, we will be using the WebClient class, which is located in the System.Net namespace. (If you have been working with the Beta1 code, note that this class is new in the Beta2 release.) (The complete technical docs for this class can be found here.)

The simplest use of this class involves just a few lines of code. The steps we need to perform include:

  1. Creating an instance of the class
  2. Calling the DownloadData method, passing in the URL we wish to scrape (the method returns the page's content as an array of Bytes)
  3. Converting the downloaded data's Byte array into a String

Once we accomplish these three steps, we'll have successfully scraped the content from another URL and have it sitting in a String variable, ready to manipulate or display however we see fit. So let's get started! Below you will see the code for an ASP.NET Web page coded in VB.NET that tackles all three steps in the Page_Load event handler:

<%@ Import Namespace="System.Net" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    'STEP 1: Create a WebClient instance
    Dim objWebClient as New WebClient()


    'STEP 2: Call the DownloadData method
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim aRequestedHTML() as Byte

    aRequestedHTML = objWebClient.DownloadData(strURL)

    'STEP 3: Convert the Byte array into a String
    Dim objUTF8 as New UTF8Encoding()
    Dim strRequestedHTML as String
    strRequestedHTML = objUTF8.GetString(aRequestedHTML)


    'WE'RE DONE! - display the string
    lblHTMLOutput.Text = strRequestedHTML
  End Sub
</script>

<html>
<body>
  <h1>Screen Scrape of www.aspmessageboard.com</h1>
  <p>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>

Note that we accomplish our three steps in the Page_Load event handler. Of course, steps 2 and 3 can be condensed into a single statement:

<%@ Import Namespace="System.Net" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    'STEP 1: Create a WebClient instance
    Dim objWebClient as New WebClient()


    'STEPS 2 and 3: Download the data and convert it into a String in one statement
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim objUTF8 as New UTF8Encoding()
    lblHTMLOutput.Text = objUTF8.GetString(objWebClient.DownloadData(strURL))
  End Sub
</script>

<html>
<body>
  <h1>Screen Scrape of www.aspmessageboard.com</h1>
  <p>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>

And there you have it, performing screen scrapes in ASP.NET! Pretty simple, eh? For more information on ASP.NET be sure to check out the ASP.NET Article Index!
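
One thing the examples above leave out is what happens when the remote site is down or the URL is wrong. In that case DownloadData throws a WebException, which surfaces as an unhandled error on your page. The following is a minimal sketch, not a definitive implementation, of wrapping the scrape in a Try ... Catch block. It reuses the URL and Label from the examples above, assumes the remote page is UTF-8 encoded, and uses an error message of my own choosing:

```vb
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.Text" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    Const strURL as String = "http://www.aspmessageboard.com/"
    Dim objWebClient as New WebClient()
    Dim objUTF8 as New UTF8Encoding()

    Try
      'Download the page and decode it (assumption: the remote page is UTF-8)
      lblHTMLOutput.Text = objUTF8.GetString(objWebClient.DownloadData(strURL))
    Catch ex as WebException
      'DownloadData throws a WebException when the request fails;
      'HtmlEncode the message so it is displayed as plain text
      lblHTMLOutput.Text = "Unable to retrieve " & strURL & ": " & _
                           Server.HtmlEncode(ex.Message)
    End Try
  End Sub
</script>

<html>
<body>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>
```

Catching only WebException, rather than every Exception, keeps unrelated bugs in the page from being silently swallowed.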

For More Complicated Screen Scrapes...
The WebClient class is handy for making very simple HTTP requests. However, if you need a more advanced HTTP request - one that tunnels through a proxy server, perhaps, or one that you want to specify a timeout for - you'll need to use the HttpWebRequest class instead. (Actually, the WebClient class uses the HttpWebRequest class internally.)
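
To give a rough idea of what this looks like, here is a hedged sketch of the same screen scrape performed with HttpWebRequest and a ten-second timeout. The timeout value, the commented-out proxy host and port (proxy.example.com, 8080), and the error message are illustrative assumptions, not values from any real setup; the sketch also assumes the remote page is UTF-8 encoded:

```vb
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.Text" %>
<script language="VB" runat="server">
  Sub Page_Load(sender as Object, e as EventArgs)
    Const strURL as String = "http://www.aspmessageboard.com/"

    'For an http:// URL, WebRequest.Create returns an HttpWebRequest
    Dim objRequest as HttpWebRequest = CType(WebRequest.Create(strURL), HttpWebRequest)

    'Give up if no response arrives within 10 seconds (value is in milliseconds)
    objRequest.Timeout = 10000

    'Hypothetical proxy host and port - uncomment and adjust for your network
    'objRequest.Proxy = New WebProxy("proxy.example.com", 8080)

    Dim objResponse as HttpWebResponse = Nothing
    Dim objReader as StreamReader = Nothing
    Try
      objResponse = CType(objRequest.GetResponse(), HttpWebResponse)

      'Read the whole response body (assumption: the remote page is UTF-8)
      objReader = New StreamReader(objResponse.GetResponseStream(), Encoding.UTF8)
      lblHTMLOutput.Text = objReader.ReadToEnd()
    Catch ex as WebException
      'Timeouts and other request failures are reported as a WebException
      lblHTMLOutput.Text = "Request failed: " & Server.HtmlEncode(ex.Message)
    Finally
      'Always release the reader and the underlying connection
      If Not objReader Is Nothing Then objReader.Close()
      If Not objResponse Is Nothing Then objResponse.Close()
    End Try
  End Sub
</script>

<html>
<body>
  <asp:label id="lblHTMLOutput" runat="server" />
</body>
</html>
```

The extra code buys control over the request: the timeout and proxy are set on the request object before GetResponse is called.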

For more information on performing more "interesting" HTTP requests using the HttpWebRequest class, refer to A Deeper Look at Performing HTTP Requests in an ASP.NET Page. If you need to make authenticated HTTP requests, check out Making Authenticated HTTP Requests from an ASP.NET Page.

Happy Programming!
