To read the article online, visit http://www.4GuysFromRolla.com/webtech/040600-1.shtml

Getting Information from Another Web Page Without Using a Third-Party Component

By Nathan Pond


Microsoft has provided us with a way to download a web page from ASP using the Microsoft Internet Transfer Control. Be warned, though: Microsoft itself cautions that using this method might not be such a great idea. So before you use the techniques discussed in this article, be sure to read http://support.microsoft.com/support/kb/articles/q188/9/55.asp and be aware of the dangers.

These are very simple examples of how to download the HTML from a specified URL. More specifically, we are going to get the world population from http://www.census.gov/cgi-bin/ipc/popclockw and display it on our page. We will download the page, then use Regular Expressions to extract the world population. (For more information on Regular Expressions, be sure to visit our Regular Expressions Information page!) I have included both a JScript and a VBScript example: the JScript version is much cleaner, in my opinion, but most ASP developers use VBScript, so I wrote a version for that as well. I couldn't get the Regular Expression to do exactly what I wanted in VBScript; if you can fix that, the two versions would be identical. Anyway, here's the JScript code:

<%@ Language = JScript%>
<%
// The URL to download
var url = "http://www.census.gov/cgi-bin/ipc/popclockw";

// Create instance of Inet Control
var inet = new ActiveXObject("InetCtls.Inet");

// Set the timeout property
inet.RequestTimeOut = 20;

// Set the URL property of the control
inet.Url = url;

// Actually download the file
var s = inet.OpenURL();

// Regular expression to find the string stored between
// the <H1> tags.  This is where the world population is.
var rWorldPop = /<H1>(.*)<\/H1>/i;
// Execute the regular expression on the raw HTML
rWorldPop.exec( s );
var sWorldPop = RegExp.$1;
%>
<HTML>
<HEAD>
<TITLE>The world population</TITLE>
</HEAD>
<BODY>

<P>The World Population is: <%=sWorldPop %></P>

</BODY>
</HTML>

As you can see, we create an instance of InetCtls.Inet to download the page, and the regular expression then extracts the text between the <H1> tags. We store that text in a variable and display it in the HTML. The Internet control has many properties and methods, but we use only three: RequestTimeout (how long, in seconds, to wait for the page), URL (the URL that we want to download), and OpenURL() (which actually downloads the web page and returns the raw HTML).
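
One thing the listing leaves implicit is what happens when the page contains no <H1> element: exec() returns null in that case, so checking its return value is safer than reading RegExp.$1. The following is only a sketch of a more defensive extraction, written as plain JavaScript that should also run as JScript. The function name extractHeading and the sample strings are made up for illustration, and the [^<]* pattern assumes the heading holds plain text with no nested tags.

```javascript
// Hypothetical helper (not part of the original article): returns the text
// of the first <H1> element, or an empty string when none is found.
function extractHeading(html) {
  // [^<]* stops at the first "<", so the match cannot run past </H1>.
  // Assumption: the heading holds plain text with no nested tags.
  var match = /<H1>([^<]*)<\/H1>/i.exec(html);
  // exec() returns null when nothing matches; match[1] is the first group.
  return match ? match[1] : "";
}

// Made-up sample input standing in for the downloaded page.
var sample = "<html><body><h1>6,070,581,231</h1><h1>other</h1></body></html>";
var sWorldPop = extractHeading(sample);        // first heading only
var sMissing = extractHeading("<p>none</p>");  // empty string, no error
```

In the ASP page, the call would take the string returned by inet.OpenURL() in place of the sample.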

Now on to the VBScript code. The only difference in logic is that the Regular Expression returns the <H1> tags along with the World Population, so I also had to do a simple Replace() to get rid of them.

COMMENT FROM ALERT 4GUYS READER BILL WILKINSON:
Rather than using a regular expression, you can also use:

	sHTML = inet.OpenURL
	first = InStr( sHTML, "<H1>" ) + 4
	last = InStr( first, sHTML, "</H1>" )
	sWorldPop = Mid( sHTML, first, last-first )

Or any of several other 2 or 3 statement schemes. You could argue that my version relies upon finding <H1> instead of <h1>, where the regular expression scheme says "ignoreCase". Fine, but then you turn around and do a pair of Replace calls that depend upon <H1> and </H1> !! (We could both fix our code using UCase, of course.)
END COMMENT
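
For comparison, the reader's InStr/Mid idea can be sketched in JScript terms with indexOf and substring, including the upper-casing fix the comment mentions. This is an illustration only: the helper name extractBetween and the sample string are made up, and the code assumes that upper-casing does not change the length of the string (true for ASCII markup).

```javascript
// Hypothetical helper mirroring the InStr/Mid approach in JScript terms.
function extractBetween(html, openTag, closeTag) {
  // Search an upper-cased copy so <h1> and <H1> are treated alike.
  // Assumption: upper-casing does not change the string's length
  // (true for ASCII markup), so indexes line up with the original.
  var upper = html.toUpperCase();
  var first = upper.indexOf(openTag.toUpperCase());
  if (first < 0) { return ""; }
  first += openTag.length;
  var last = upper.indexOf(closeTag.toUpperCase(), first);
  if (last < 0) { return ""; }
  // Slice the original string so the extracted text keeps its own case.
  return html.substring(first, last);
}

// Made-up sample input; the tags are lower case on purpose.
var sPop = extractBetween("<body><h1>6,070,581,231</h1></body>", "<H1>", "</H1>");
```

Unlike the four-line version in the comment, this returns an empty string when either tag is missing instead of slicing at an invalid position.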

<% Option Explicit %>
<%
Dim url       'The URL to download
Dim inet      'Object for Inet Control
Dim sHTML     'String to hold HTML from download
Dim rWorldPop 'var to hold regular expression
Dim objCols   'Object to hold collections from Regular expression
Dim sWorldPop 'string to hold the world population
Dim objMatch  'Object for matches

url = "http://www.census.gov/cgi-bin/ipc/popclockw"

'Create instance of Inet Control
Set inet = Server.CreateObject("InetCtls.Inet")

'Set the timeout property
inet.RequestTimeOut = 20

'Set the URL property of the control
inet.Url = url

'Actually download the file
sHTML = inet.OpenURL()

'Regular expression to find the string stored between
'the <H1> tags.  This is where the world population is.
Set rWorldPop = New regexp
rWorldPop.Pattern = "<H1>(.*)<\/H1>"
rWorldPop.Global = False
rWorldPop.IgnoreCase = True
'Execute the regular expression on the raw HTML
Set objCols = rWorldPop.Execute( sHTML )

'Step through our matches
For Each objMatch in objCols
	sWorldPop = sWorldPop & objMatch.Value
Next

'Clean up
Set rWorldPop = Nothing
Set objCols = Nothing
Set inet = Nothing

'Strip the <H1> tags off of the world population
sWorldPop = Replace(Replace(sWorldPop, "<H1>", ""), "</H1>", "")

%>
<HTML>
<HEAD>
<TITLE>The world population</TITLE>
</HEAD>
<BODY>

<P>The World Population is: <%=sWorldPop %></P>

</BODY>
</HTML>
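
The VBScript listing reads objMatch.Value, which is the entire matched text with the tags included; that is why the Replace() calls are needed. The JScript version reads the first captured group instead. The difference can be sketched in JavaScript as follows, using a made-up sample string in place of the downloaded page. If your scripting engine's Match object provides a SubMatches collection (an assumption worth checking against your installed version), objMatch.SubMatches(0) would play the role of m[1] here.

```javascript
// Made-up sample standing in for the downloaded HTML.
var sHTML = "<BODY><H1>6,070,581,231</H1></BODY>";
var m = /<H1>(.*)<\/H1>/i.exec(sHTML);

// m[0] is the whole match, comparable to objMatch.Value in VBScript.
var sWhole = m ? m[0] : "";
// m[1] is the first parenthesized group, comparable to RegExp.$1.
var sGroup = m ? m[1] : "";

// Stripping the tags from the whole match, ignoring case, gives the same text.
var sStripped = sWhole.replace(/<\/?H1>/gi, "");
```

Because the stripping pattern carries the i flag, it also handles lower-case <h1> tags, which a case-sensitive Replace() on "<H1>" would miss.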

I won't explain this code any further, because it is fairly self-explanatory. I just wanted to share the Internet Controls with everyone. If you have any questions, be sure to e-mail me at npond@bgnet.bgsu.edu, and I'll get back to you.

Happy Programming!


  • Article Information
    Article Title: Getting Information from Another Web Page Without Using a Third-Party Component
    Article Author: Nathan Pond
    Published Date: Thursday, April 06, 2000
    Article URL: http://www.4GuysFromRolla.com/webtech/040600-1.shtml

