Getting Information from Another Web Page Without Using a Third-Party Component
By Nathan Pond
Microsoft provides a way to download a web page from ASP: the Microsoft Internet Transfer Control. Be warned, though, that Microsoft itself cautions it might not be such a great idea to use this method. So before you use the techniques discussed in this article, be sure to check out http://support.microsoft.com/support/kb/articles/q188/9/55.asp and be aware of the dangers.
These are very simple examples of how to download the HTML from a specified URL. More specifically, we are going to get the world population from http://www.census.gov/cgi-bin/ipc/popclockw and display it on our page. We will download the page, then use Regular Expressions to extract the world population. (For more information on Regular Expressions, be sure to visit our Regular Expressions Information page!)
I have included both a JScript and a VBScript example. The JScript version is much cleaner, in my opinion, but most ASP developers use VBScript, so I wrote a version for that as well. I couldn't get the Regular Expression to do exactly what I wanted in VBScript, so if you can fix that, the two versions would be identical. Anyway, here's the JScript code:
As you can see, we create an instance of InetCtls.Inet to download the page, then the regular expression extracts the text between the <H1> tags. We put that into a variable and then display it in HTML. The Internet Transfer Control has many properties and methods, but we use only three: RequestTimeout (which specifies how long to wait for the page), URL (the URL that we want to download), and OpenURL() (which actually downloads the web page and returns the raw HTML).
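The steps described above can be sketched in JScript roughly as follows. This is a minimal sketch, assuming the page runs under classic ASP on IIS with the Internet Transfer Control registered on the server. The function names extractH1, downloadPage and showWorldPopulation, the exact regular expression, and the 20-second timeout are assumptions made for illustration; only InetCtls.Inet, RequestTimeout, URL and OpenURL() come from the description above.

```javascript
// Pull out the text between the first <H1> and </H1>, ignoring case.
// Returns an empty string when no heading is found.
// Assumption: the heading contains plain text only (no nested tags).
function extractH1(html) {
  var match = /<h1[^>]*>([^<]*)<\/h1>/i.exec(html);
  if (!match) {
    return "";
  }
  // Trim surrounding whitespace with a regex, since older JScript
  // engines do not provide String.prototype.trim.
  return match[1].replace(/^\s+|\s+$/g, "");
}

// ASP-only: Server is the ASP intrinsic object, so this runs only under IIS.
function downloadPage(url, timeoutSeconds) {
  var inet = Server.CreateObject("InetCtls.Inet");
  inet.RequestTimeout = timeoutSeconds; // how long to wait for the page
  inet.URL = url;                       // the page we want to download
  return inet.OpenURL();                // downloads and returns the raw HTML
}

// ASP-only: fetch the population page and write the figure into the response.
function showWorldPopulation() {
  var html = downloadPage("http://www.census.gov/cgi-bin/ipc/popclockw", 20);
  Response.Write("World population: " + extractH1(html));
}
```

In an ASP page declared with the JScript language directive, calling showWorldPopulation() inside the page's script block would write the figure into the output. Keeping the extraction in its own function means it can be checked against a sample HTML string without touching the network.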
Anyway, on to the VBScript code. The only difference in logic is that the Regular Expression returns the <H1> tags along with the world population, so I also had to do a simple Replace() to get rid of them.
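To illustrate that clean-up step, here is a small JScript sketch of stripping the tags from a match that still includes them. It is an illustration only, not the VBScript listing itself; the function name stripH1Tags is hypothetical, and it uses a case-insensitive pattern so that both <H1> and <h1> are removed.

```javascript
// Remove opening and closing H1 tags (any case, with or without attributes)
// from a matched fragment such as "<H1>6,080,141,683</H1>".
function stripH1Tags(matchedText) {
  return matchedText.replace(/<\/?h1[^>]*>/gi, "");
}
```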
COMMENT FROM ALERT 4GUYS READER BILL WILKINSON:
Rather than using a regular expression, you can also use:
sHTML = inet.OpenURL
first = InStr( sHTML, "<H1>" ) + 4
last = InStr( first, sHTML, "</H1>" )
sWorldPop = Mid( sHTML, first, last-first )
Or any of several other 2 or 3 statement schemes.
You could argue that my version relies upon finding <H1> instead of <h1>, where the regular expression scheme says "ignoreCase". Fine, but then you turn around and do a pair of Replace calls that depend upon <H1> and </H1>!! (We could both fix our code using UCase, of course.)
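The upper-casing fix mentioned in the comment can be sketched in JScript like this, as a counterpart to the InStr/Mid approach. The function name extractBetween is hypothetical. The idea is to search an upper-cased copy of the HTML for the tag positions, then cut the result out of the original string so the extracted text keeps its own case.

```javascript
// Return the text between openTag and closeTag, matching the tags
// without regard to case. Returns "" if either tag is missing.
// Assumption: upper-casing does not change the string's length,
// which holds for plain ASCII markup.
function extractBetween(html, openTag, closeTag) {
  var upper = html.toUpperCase();
  var first = upper.indexOf(openTag.toUpperCase());
  if (first === -1) {
    return "";
  }
  first += openTag.length; // skip past the opening tag itself
  var last = upper.indexOf(closeTag.toUpperCase(), first);
  if (last === -1) {
    return "";
  }
  return html.substring(first, last);
}
```

For example, extractBetween(sHTML, "<H1>", "</H1>") would find the heading whether the page writes the tag as <H1> or <h1>. Unlike the one-line InStr version, it also returns an empty string instead of misbehaving when a tag is missing.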
I won't explain this code any further, because it is fairly self-explanatory. I just wanted to share the Internet Transfer Control with everyone. If you have any questions, be sure to e-mail me at firstname.lastname@example.org, and I'll get back to you.