Published: Thursday, April 06, 2000
Getting Information from Another Web Page Without Using a Third-Party Component
By Nathan Pond
Microsoft has provided us with a way to download a web page from ASP using the Microsoft Internet Transfer
Control. Be warned, though: Microsoft itself cautions that using this method
might not be such a great idea. So before you go on and use the techniques discussed in
this article, be sure to check out
http://support.microsoft.com/support/kb/articles/q188/9/55.asp
and be aware of the dangers.
These are very simple examples of how to download the HTML from a specified URL. More specifically, we are going
to get the world population from http://www.census.gov/cgi-bin/ipc/popclockw
and display it on our page. We will download the page, then use Regular Expressions to extract the world
population.
(For more information on Regular Expressions, be sure to visit our
Regular Expressions Information page!)
I have included both a JScript and VBScript example, simply because the JScript version is much cleaner, in my
opinion, but most ASP developers use VBScript, so I wrote a version for that as well. I couldn't get the
Regular Expression to do exactly what I wanted in VBScript, so if you could fix that then the versions would
be identical. Anyway, here's the JScript code:
<%@ Language = JScript%>
<%
// The URL to download
var url = "http://www.census.gov/cgi-bin/ipc/popclockw";
// Create instance of Inet Control
var inet = new ActiveXObject("InetCtls.Inet");
// Set the timeout property
inet.RequestTimeOut = 20;
// Set the URL property of the control
inet.Url = url;
// Actually download the file
var s = inet.OpenURL();
// Regular expression to find the string stored between
// the <H1> tags. This is where the world population is.
var rWorldPop = /<H1>(.*)<\/H1>/i;
// Execute the regular expression on the raw HTML
rWorldPop.exec( s );
var sWorldPop = RegExp.$1;
%>
<HTML>
<HEAD>
<TITLE>The world population</TITLE>
</HEAD>
<BODY>
<P>The World Population is : <%=sWorldPop %></P>
</BODY>
</HTML>
As you can see, we create an instance of InetCtls.Inet to download the page, then the regular
expression extracts the text between the <H1> tags. We throw that into a variable and
then display it in HTML. The Internet control has many properties and methods, but we use only three:
RequestTimeout (which specifies how long to wait for the page), URL (which is the
URL that we want to download), and OpenURL() (which actually downloads the web page and returns
the raw HTML).
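The extraction step can be tried on its own, away from the control. What follows is only a minimal sketch in plain JavaScript, meant for a current JavaScript engine rather than classic ASP. The function name extractFirstH1 and the sample markup are hypothetical. It also swaps in a non-greedy capture and reads the group from the value returned by exec() rather than from RegExp.$1, so it is a variation on the listing above, not a copy of it.

```javascript
// Hypothetical helper: return the text between the first <H1>...</H1> pair,
// or null when the markup contains no such pair.
function extractFirstH1(html) {
  // [\s\S]*? is non-greedy and also matches newlines, so the capture stops
  // at the first closing tag instead of the last one on the page.
  var match = /<H1>([\s\S]*?)<\/H1>/i.exec(html);
  return match ? match[1] : null;
}

// Made-up sample markup standing in for the downloaded page.
var sample = "<html><body><h1>1,234,567</h1><H1>other</H1></body></html>";
var population = extractFirstH1(sample);
var missing = extractFirstH1("<p>no heading here</p>");
```

Returning null on a failed match makes it easy to show a fallback message if the remote page changes its layout.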
Anyway, on to the VBScript code. The only difference in logic is that the Regular Expression returns the
<H1> tags along with the World Population, so I also had to do a simple
Replace() to get rid of them.
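The tags show up because the VBScript loop reads each match's Value, which is the whole matched text rather than the parenthesized group. As a rough illustration in JavaScript (the sample string is made up), index 0 of the exec() result plays the role of Value and index 1 holds the group. VBScript releases that expose a SubMatches collection on the Match object should offer the same shortcut, though that is an assumption not tested here.

```javascript
// Made-up sample markup; the real page content will differ.
var html = "<H1>1,234,567</H1>";
var match = /<H1>(.*)<\/H1>/i.exec(html);

var wholeMatch = match[0]; // entire matched text, tags included
var captured = match[1];   // only the text inside the parentheses

// Stripping the tags from the whole match with a case-insensitive
// replace gives the same text as reading the capture group directly.
var stripped = wholeMatch.replace(/<\/?H1>/gi, "");
```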
COMMENT FROM ALERT 4GUYS READER BILL WILKINSON:
Rather than using a regular expression, you can also use:
sHTML = inet.OpenURL
first = InStr( sHTML, "<H1>" ) + 4
last = InStr( first, sHTML, "</H1>" )
sWorldPop = Mid( sHTML, first, last-first )
Or any of several other 2 or 3 statement schemes.
You could argue that my version relies upon finding <H1> instead
of <h1>, where the regular expression scheme says "ignoreCase".
Fine, but then you turn around and do a pair of Replace calls that
depend upon <H1> and </H1> !! (We could both fix our code
using UCase, of course.)
END COMMENT
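The position-based approach from the comment, together with the upper-casing fix it mentions, might look like this in JavaScript. This is a sketch only: extractBetween is a hypothetical helper and the sample markup is made up.

```javascript
// Hypothetical helper mirroring the InStr/Mid approach, made case-insensitive
// by searching an upper-cased copy of the markup.
function extractBetween(html, openTag, closeTag) {
  // Assumes upper-casing does not change the string length, which holds
  // for plain ASCII markup, so positions line up with the original.
  var upper = html.toUpperCase();
  var first = upper.indexOf(openTag.toUpperCase());
  if (first === -1) return null;
  first += openTag.length; // skip past the opening tag
  var last = upper.indexOf(closeTag.toUpperCase(), first);
  if (last === -1) return null;
  // Slice from the original string so the text keeps its own casing.
  return html.substring(first, last);
}

var result = extractBetween("<p>x</p><h1>1,234,567</h1>", "<H1>", "</H1>");
```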
<% Option Explicit %>
<%
Dim url 'The URL to download
Dim inet 'Object for Inet Control
Dim sHTML 'String to hold HTML from download
Dim rWorldPop 'var to hold regular expression
Dim objCols 'Object to hold collections from Regular expression
Dim sWorldPop 'string to hold the world population
Dim objMatch 'Object for matches
url = "http://www.census.gov/cgi-bin/ipc/popclockw"
'Create instance of Inet Control
Set inet = Server.CreateObject("InetCtls.Inet")
'Set the timeout property
inet.RequestTimeOut = 20
'Set the URL property of the control
inet.Url = url
'Actually download the file
sHTML = inet.OpenURL()
'Regular expression to find the string stored between
'the <H1> tags. This is where the world population is.
Set rWorldPop = New regexp
rWorldPop.Pattern = "<H1>(.*)<\/H1>"
rWorldPop.Global = False
rWorldPop.IgnoreCase = True
'Execute the regular expression on the raw HTML
Set objCols = rWorldPop.Execute( sHTML )
'Step through our matches
For Each objMatch in objCols
sWorldPop = sWorldPop & objMatch.Value
Next
'Clean up
Set rWorldPop = Nothing
Set objCols = Nothing
'Strip the <H1> tags off of the world population
sWorldPop = Replace(Replace(sWorldPop, "<H1>", ""), "</H1>", "")
%>
<HTML>
<HEAD>
<TITLE>The world population</TITLE>
</HEAD>
<BODY>
<P>The World Population is: <%=sWorldPop %></P>
</BODY>
</HTML>
I won't explain this code any further, because it is fairly self-explanatory. I just wanted to share the
Internet Controls with everyone. If you have any questions, be sure to e-mail me at
npond@bgnet.bgsu.edu, and I'll get back to you.
Happy Programming!
Related Articles:
Using ASPTear to Grab Information Off of Other Web Servers