SyteKit - Part One
By Marc Draco
Sytekit is a suite of ASP routines that anyone, whatever their level of skill, can use to improve their pages. Although Sytekit is being marketed as ASPExpressware (more of which later), I'm presenting a large part of it on 4Guys for its educational value. Putting away my soapbox, it's off to the first utility. This recursive engine forms the driving force behind Floyd, the site search engine part of Sytekit. You can see Floyd in action at www.nima99.co.uk.
Floyd, presented in part 2 of this series (coming soon!), spiders an entire website, either locally or on the server, and maps it ready for high-speed searching just like a real, live search engine. At Floyd's heart is a version of this nifty little routine, which locates its position on the server and scans all the folders and subfolders for objects of interest.
If you're new to programming you might not have come across recursion before. It's a well-understood and widely used technique for examining trees and other hierarchical structures. Confused? Don't worry. Most modern operating systems have something like Windows Explorer, which shows the folder (directory) list as a tree. Each branch sub-divides into zero or more branches which eventually lead to directories. It's just like a family tree with great, great... granddaddy at the top and his descendants below. In fact, you can read a previously written article on 4Guys dealing with recursion: Recursion: Why It's Cool.
Figure 1 shows this in more detail. Folders (directories) are shown in green and files are shown as grey boxes; this also proves I cannot draw for toffee. Each folder can have zero or more descendants; it may even be completely empty like the one on the extreme right. It looks very complex and, in reality, it's usually a lot worse than this!
Searching this lot iteratively (that's in a repeating loop) would be a nightmare to program, but using a divide-and-conquer approach it's a breeze. Here's what we do:
1) Look in a directory and get the list of its contents.
2) Repeat:
3) Display the name of the file/folder we find.
4) If we find a directory, make a note of where we are and return to Step 1.
5) Until we've run out of things to process.
6) Return to where we were called from.
It's this last bit that confuses people when they first see it, because this is the essence of recursion. At Step 6, the program "unwinds" back to the point just after it branched (Step 5) and carries on with the next item in that folder. Eventually, the outermost call runs out of things to process and "falls off" Step 6.
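The numbered steps can be sketched in a few lines. This is only an illustration written in Python, not the Sytekit ASP code; the function name list_tree and its depth and found parameters are made up for the example. It records each name with its depth instead of displaying it, so the result is easy to inspect.

```python
import os

def list_tree(path, depth=0, found=None):
    """Recursively list everything under `path`, depth-first.

    Returns a list of (depth, name) pairs. Hypothetical helper for
    illustration; it is not part of Sytekit.
    """
    if found is None:
        found = []
    # Step 1: look in the directory and get the list of its contents.
    with os.scandir(path) as entries:
        items = sorted(entries, key=lambda e: e.name)
    # Step 5 is the end of this loop: we stop when nothing is left.
    for item in items:
        # Step 3: note the name of the file or folder we found.
        found.append((depth, item.name))
        # Step 4: for a directory, the call stack remembers where we
        # are while the function calls itself (back to Step 1).
        if item.is_dir(follow_symlinks=False):
            list_tree(item.path, depth + 1, found)
    # Step 6: return to wherever we were called from.
    return found
```

Calling list_tree(".") and printing each name indented by its depth gives a plain-text version of the tree. The "note of where we are" in Step 4 is never written explicitly: each call keeps its own loop position, which is why the unwinding at Step 6 just works.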
If all that left you feeling that you wished you hadn't started, read it again and you'll get it. When you get back here, you'll understand recursion. I know that because, if you follow that instruction to the letter, you're recursing through this article!
Dazed and Coded
OK, let's take a closer look at how this all works (note that the complete code is available for downloading). This:
Tells us where we are executing on the server. Don't be fooled into thinking that the directory you're executing in is the same one you FTP to; it probably isn't! (If you're using this on a desktop, you could set an absolute path like this:
Path = "c:"
and get a listing of your entire hard disk, at least until the script times out. More of that later.)
How many times have you seen that one? Straight out of Microsoft's documentation, it creates an instance of the FileSystemObject, which we'll need to access files and folders. If you're new to the FileSystemObject, there's a great FileSystemObject FAQ on 4Guys. We'll be using the FileSystemObject in ScanFolders like this:
This asks the FSO we created earlier for details on the current folder. Note that the folder is handed in to the function as a parameter; that's important, because each recursive call works on a different folder. The list of files within the folder comes from the Files property of the FolderInfo object. Next, we iterate through each file, displaying its name (coloured according to extension for a bit of pizzazz). For the sake of this example, I've kept it very simple. A more complex scan will be found in Floyd, in Part Two of this series.
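As a rough picture of the "list the files, coloured by extension" step, here is a small sketch in Python rather than ASP. Everything in it is an assumption made for illustration: the names COLOURS, render_file and render_files, the particular colours, and the choice of a font tag for the markup. The article does not say which colours or tags the real routine uses.

```python
import html
import os

# Hypothetical extension-to-colour table (not taken from Sytekit).
COLOURS = {".asp": "red", ".htm": "blue", ".html": "blue", ".gif": "green"}

def render_file(name):
    """Return one line of HTML showing `name`, coloured by extension."""
    ext = os.path.splitext(name)[1].lower()
    colour = COLOURS.get(ext, "black")  # unknown extensions stay black
    return '<font color="{}">{}</font><br>'.format(colour, html.escape(name))

def render_files(folder):
    """Return HTML for every file (not subfolder) directly in `folder`.

    The folder is passed in as a parameter, so the same function can
    be reused for any folder the scan reaches.
    """
    with os.scandir(folder) as entries:
        names = sorted(e.name for e in entries if e.is_file())
    return "\n".join(render_file(n) for n in names)
```

Escaping the file name before writing it into the page is a small safety measure added here; it keeps names containing characters such as & or < from breaking the HTML.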
Finally, the following piece of code performs the recursion:
Wrapping each subfolder's listing in a blockquote indents the text one step further at every level. You may want to examine the HTML generated by this code to see how it all fits together. Recursion isn't so much tricky as one of those things that suddenly clicks into place, so if you don't grasp it straight away, don't worry, you will in time.
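To see how nesting blockquotes produces the indentation, here is a minimal Python sketch of the idea. It is not the article's ASP listing; the function name render_tree and the exact markup (bold folder names, a line break after each entry) are assumptions chosen for the example.

```python
import html
import os

def render_tree(path):
    """Return HTML listing `path`, with each subfolder's contents
    wrapped in a <blockquote> so deeper levels are indented further.

    Hypothetical helper for illustration only.
    """
    with os.scandir(path) as entries:
        items = sorted(entries, key=lambda e: e.name)
    parts = []
    for item in items:
        name = html.escape(item.name)
        if item.is_dir(follow_symlinks=False):
            parts.append("<b>{}</b><br>".format(name))
            # The recursive call: render the subfolder first, then
            # wrap whatever it produced in one more blockquote.
            inner = render_tree(item.path)
            parts.append("<blockquote>{}</blockquote>".format(inner))
        else:
            parts.append("{}<br>".format(name))
    return "\n".join(parts)
```

Because every level adds exactly one blockquote around the level below it, the browser does the indenting and the code never needs to track how deep it is.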
Time for Bed
IIS only allows scripts to execute for a specified amount of time before it decides they have got lost and stops them with a message like this:
The maximum amount of time for a script to execute was exceeded. You can change this limit by specifying a new value for the property Server.ScriptTimeOut or by changing the value in the IIS administration tools
While you can increase the timeout, I don't recommend it; only very large sites (thousands of pages) will take that long to scan. This only really becomes an issue for Floyd, as we'll see in the next part. Until then, happy programming!
So What's ASPExpressware?
I use David Wier's ASP Express to develop all my ASP code but it's Shareware and I'm running out of evaluation time. Being a programmer of more years than I care to recall, I believe that it's good karma to pay for tools we use. In order to do this on my limited income, I'm asking a small fee for Sytekit to cover the expense of registering ASP Express. Registration details to follow.