Published: Saturday, December 04, 1999
Searching Through the Text of Each File on a WebSite, Part 3
In Part 2 we discussed all but the recursive searching function
for search.asp. At this point, we have read in the form field variables, and used the
FileSystemObject to obtain the information for the folder we are interested in searching.
All we have left is to write the function that will search through a particular folder and its subfolders,
examining the contents of the files, looking for potential matches.
This function has the following definition:
    GetFiles(FolderObject, TermsArray, LastFileFound, SearchUsingAndBooleanLogic, FilesFoundCount)
The FolderObject is one of the objects encompassed by the FileSystemObject
library (which was obtained using the GetFolder method). TermsArray is
the array we created in Part 2. This array should contain one or more
elements, each corresponding to one of the search terms entered by the user.
LastFileFound is a string variable, representing the file name and path of the
last listed file. SearchUsingAndBooleanLogic is a boolean variable that, if True,
indicates that the user wants to search using AND between each term. If this variable is
False, then the user wants to search using an OR between each term. Finally,
FilesFoundCount is the number of matching files we've currently found.
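Here is the exact function definition used in search.asp. It takes one extra parameter, bolLFFound,
a boolean that records whether we have already reached the last file listed on the previous results page:
    Function GetFiles(objFolder, aLookFor, strLF, bolLFFound, bolAnd, iCount)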
Now, let's delve into this function! First off, there are two times when we want to exit the function
immediately. First, if iCount is greater than the number of items we want to show per page, we need to
exit the function. (Since this function may be called recursively, we need to have these checks at the
beginning of our function.) Also, if the current folder being iterated begins with an underscore, we
can assume it is a folder like _private or _vti_conf, and we don't want to
iterate through those. The following lines of code should start off GetFiles:
      'Exit function if current folder begins with an underscore
      If Left(objFolder.Name, 1) = "_" Then Exit Function
      'If iCount > # of hits per page, exit function
      Const iListPerPage = 9
      If iCount > iListPerPage Then Exit Function
Now we need to iterate through each file in the folder object objFolder. We'll do so using
a For Each ... Next loop. First, we'll need to define a number of variables, though:
      Dim objFile, objTextStream, objFSO, strContents, iUBound, iLoop, bolValid
      Dim strTitle, iPos, strDesc
      iUBound = UBound(aLookFor)
      'Iterate through each file in objFolder
      For Each objFile In objFolder.Files
        '... continued in next code example ...
We don't want to search every file, only those files that have the .SHTML extension. For
your site, you may only want to search .ASP files, or .ASP files and
.HTML files. Also, we do not want to search through any files until we've reached the last
file listed on the previous results page. Why spend the time opening a file and looking through its
contents if we know that we may have already listed it? Finally, we do not want to read a file whose
size is zero bytes, because calling ReadAll on an empty file raises an error!
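If you want to search more than one file type, the extension test can be pulled into a small helper.
The following is only a sketch: the function name IsSearchableFile and the list of extensions are
assumptions made for illustration and are not part of search.asp.

```vbscript
'Hypothetical helper (not in search.asp): returns True if the file name
'ends in one of the extensions we have chosen to search.
Function IsSearchableFile(strFileName)
  Dim aExt, iDot, strExt, i

  'Assumed list of extensions; adjust it for your own site
  aExt = Array("SHTML", "ASP", "HTML")
  IsSearchableFile = False

  'Find the last dot; a name with no dot has no extension
  iDot = InStrRev(strFileName, ".")
  If iDot = 0 Then Exit Function

  'Compare the upper-cased extension against each allowed value
  strExt = UCase(Mid(strFileName, iDot + 1))
  For i = 0 To UBound(aExt)
    If strExt = aExt(i) Then
      IsSearchableFile = True
      Exit Function
    End If
  Next
End Function
```

With a helper like this, the extension check inside the loop could be written as
If IsSearchableFile(objFile.Name) Then.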
If we do reach a point where we want to search through a file (that is, the file has the proper extension,
we have already found the last file, and the file size is greater than zero bytes), then we want to use the
FileSystemObject to open the text file and read its contents into a
single string variable. The following code performs these checks and, if the current file passes
them, reads the contents into a string variable, strContents:
        'Do we need to search this file?
        If UCase(Right(objFile.Name, 6)) = ".SHTML" Then
          If bolLFFound Then
            If objFile.Size > 0 Then
              'Read the contents of the file into a string variable
              '(the second argument, 1, opens the file for reading)
              Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
              Set objTextStream = objFSO.OpenTextFile(objFile.Path, 1)
              strContents = objTextStream.ReadAll
              objTextStream.Close
              Set objFSO = Nothing
              '... continued in next code block ...
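The size check and the read can also be wrapped together so that an empty file never reaches ReadAll.
This is a minimal sketch rather than the code used in search.asp: the name ReadFileContents is
hypothetical, and it assumes the code runs inside an ASP page where the Server object is available.

```vbscript
'Hypothetical helper (not in search.asp): returns the full text of a file,
'or an empty string if the file is missing or has a size of zero bytes.
Function ReadFileContents(strPath)
  Const ForReading = 1
  Dim objFSO, objTextStream

  ReadFileContents = ""
  Set objFSO = Server.CreateObject("Scripting.FileSystemObject")

  If objFSO.FileExists(strPath) Then
    'Skip empty files so that ReadAll is never called on them
    If objFSO.GetFile(strPath).Size > 0 Then
      Set objTextStream = objFSO.OpenTextFile(strPath, ForReading)
      ReadFileContents = objTextStream.ReadAll
      objTextStream.Close
      Set objTextStream = Nothing
    End If
  End If

  Set objFSO = Nothing
End Function
```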
Now that we have the contents of the file, we need to determine whether or not the terms in
aLookFor exist within the contents of the file. We need to iterate from
LBound(aLookFor) to UBound(aLookFor). If the user wants to search using
AND logic, then each element in aLookFor must exist in the file's contents.
If, however, the user wants to search using OR logic, then at least one element in aLookFor
needs to exist in the file's contents. The following For statement will determine whether
or not a file is valid - that is, whether the file has the terms needed to satisfy the conditions specified
by the boolean logic and the search terms. Also, if the file is valid, it is
displayed as a hit.
              If bolAnd Then bolValid = True Else bolValid = False
              For iLoop = 0 To iUBound
                'The final argument, 1, makes the comparison case-insensitive
                If InStr(1, strContents, aLookFor(iLoop), 1) > 0 Then
                  If Not bolAnd Then bolValid = True
                Else
                  If bolAnd Then bolValid = False
                End If
              Next
              If bolValid Then
                'Display the file as a hit
                Response.Write objFile.Name & "<BR>"
                'Increment the # of files found
                iCount = iCount + 1
              End If
              '... code continued in next code block ...
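As a variation, the AND/OR test can be factored into a function of its own that stops as soon as the
answer is known. This is a sketch under stated assumptions, not the code from search.asp: the name
TermsMatch is hypothetical, and the early exits are an addition that the inline loop does not have.

```vbscript
'Hypothetical helper (not in search.asp): returns True if strContents
'satisfies the search terms in aLookFor under AND or OR logic.
Function TermsMatch(strContents, aLookFor, bolAnd)
  Dim iLoop, bolFound

  For iLoop = LBound(aLookFor) To UBound(aLookFor)
    'Case-insensitive search for the current term
    bolFound = (InStr(1, strContents, aLookFor(iLoop), 1) > 0)

    'AND logic: a single missing term means the file is not a match
    If bolAnd And Not bolFound Then
      TermsMatch = False
      Exit Function
    End If

    'OR logic: a single term that is present means the file is a match
    If Not bolAnd And bolFound Then
      TermsMatch = True
      Exit Function
    End If
  Next

  'Reaching this point means every term was found (AND logic)
  'or no term was found (OR logic)
  TermsMatch = bolAnd
End Function
```

Inside the file loop, the call would look like If TermsMatch(strContents, aLookFor, bolAnd) Then.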
The For loop iterates through all of the elements in the array aLookFor (recall
that iUBound was set to UBound(aLookFor) before our For Each ... Next
loop). If the file is "valid," then we want to display the title and description of the file, with a link to
the file; the code above just writes out the file name, and I will leave the rest as an exercise for the
reader. (Note that each article on 4Guys has, at the top, a title comment of the form
<!--TITLE:Title-->, and a similar comment for the description.)
It is important that, when we find a file, we increment iCount.
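One possible starting point for the title part of that exercise is sketched below. It assumes the title
comment appears exactly in the form <!--TITLE:Title--> described above; the function name ExtractTitle
is hypothetical and is not taken from search.asp.

```vbscript
'Hypothetical helper (not in search.asp): returns the text between
'"<!--TITLE:" and the next "-->", or an empty string if no such comment exists.
Function ExtractTitle(strContents)
  Const TITLE_OPEN = "<!--TITLE:"
  Dim iStart, iEnd

  ExtractTitle = ""

  'Locate the opening of the title comment (case-insensitive)
  iStart = InStr(1, strContents, TITLE_OPEN, 1)
  If iStart = 0 Then Exit Function

  'Move past the opening marker, then find the end of the comment
  iStart = iStart + Len(TITLE_OPEN)
  iEnd = InStr(iStart, strContents, "-->", 1)
  If iEnd = 0 Then Exit Function

  ExtractTitle = Trim(Mid(strContents, iStart, iEnd - iStart))
End Function
```

A caller could fall back to objFile.Name whenever ExtractTitle returns an empty string.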
What do we do when we've displayed our nth link (where n is the number of total "hits" we want to
display per paged result)? At that point, we need to exit the function - no need to continue processing
any files when we know we aren't going to be displaying any of them. For this reason, immediately after
our display and iCount incrementation, we need to check to see if iCount is
greater than iListPerPage, our constant that represents the number of links we are going to
show per page. The following code will accomplish this:
              If iCount > iListPerPage Then
                strLF = FormatURL(objFile.Path)
                Exit Function
              End If
              '... continued in next code block ...
Note that, if we have listed all of our links, we need to set strLF to the URL of the
final file we processed. The FormatURL function takes a physical path and translates it
into a URL-type path. We'll look at this function in a bit.
Earlier in our For Each ... Next loop, we checked whether bolLFFound
was True. If it was, we processed the file. If it is False, that means
we have not yet reached the last link shown on the previous search page, so
we need to check whether it is time to make it True. All we do is compare
strLF to the URL of the current file. If they match, then it is time to set bolLFFound
to True. The following code example will do that for us:
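            End If   'closes: If objFile.Size > 0
          ElseIf FormatURL(objFile.Path) = strLF Then
            bolLFFound = True
          End If     'closes: If bolLFFound
        End If       'closes: If UCase(Right(objFile.Name, 6)) = ".SHTML"
      Next
      '... continued in next code block ...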
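The "skip until the last listed file, then start listing" idea can be seen in isolation with a plain array
in place of the folder. This is an illustrative sketch only: ListNextPage, aUrls, and iPerPage are
hypothetical names, and treating an empty strLF as "first page" is an assumption, since the
initialization of bolLFFound is not shown in this part.

```vbscript
'Hypothetical demo (not in search.asp): lists up to iPerPage URLs that
'come after strLF in aUrls, then records the last one listed in strLF.
Sub ListNextPage(aUrls, strLF, iPerPage)
  Dim bolLFFound, iCount, i

  'Assumption: an empty strLF means we are on the first page
  bolLFFound = (strLF = "")
  iCount = 0

  For i = 0 To UBound(aUrls)
    If bolLFFound Then
      Response.Write aUrls(i) & "<BR>"
      iCount = iCount + 1
      If iCount >= iPerPage Then
        'Remember where this page ended so the next page can resume
        strLF = aUrls(i)
        Exit Sub
      End If
    ElseIf aUrls(i) = strLF Then
      'Reached the last URL from the previous page; list from the next one
      bolLFFound = True
    End If
  Next
End Sub

Dim strLast
strLast = ""
'First call lists /a.shtml and /b.shtml and sets strLast to "/b.shtml"
ListNextPage Array("/a.shtml", "/b.shtml", "/c.shtml"), strLast, 2
'Second call skips up to and including /b.shtml and lists /c.shtml
ListNextPage Array("/a.shtml", "/b.shtml", "/c.shtml"), strLast, 2
```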
We have now completed our For Each ... Next loop! We still need to recurse through all
of the subfolders, though, so we need to loop through each subfolder of objFolder,
recursively calling GetFiles. If you're unfamiliar with recursion, you really should read
Recursion, Why it's Cool. The following code will iterate through
each of the subfolders of objFolder, and each of those subfolders' subfolders, and so on.
Once we've done this, we're done with our function, GetFiles. Here's the final code:
      Dim objSubFolder
      For Each objSubFolder In objFolder.SubFolders
        GetFiles objSubFolder, aLookFor, strLF, bolLFFound, bolAnd, iCount
      Next
    End Function
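Note that we recursively pass GetFiles all of the same variables. Because VBScript passes
arguments by reference by default, changes that a recursive call makes to strLF, bolLFFound, and
iCount are visible to the caller. This concludes our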
examination of the code for search.asp. We still need to look at the function FormatURL,
and we also need to look at displaying our search results and providing the Show more results
link! We'll tackle these issues in Part 4.
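To see the same recursive pattern on its own, here is a small sketch that counts every file beneath a
folder, carrying a running total through the recursive calls by reference. The name CountFiles is
hypothetical, and the example assumes it runs inside an ASP page where Server and Response are available.

```vbscript
'Hypothetical demo (not in search.asp): adds the number of files in
'objFolder and all of its subfolders to iTotal.
Sub CountFiles(objFolder, iTotal)
  Dim objSubFolder

  'Count the files directly inside this folder
  iTotal = iTotal + objFolder.Files.Count

  'Recurse into each subfolder; iTotal is passed by reference,
  'so each call adds to the same running total
  For Each objSubFolder In objFolder.SubFolders
    CountFiles objSubFolder, iTotal
  Next
End Sub

Dim objFSO, iTotal
Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
iTotal = 0
CountFiles objFSO.GetFolder(Server.MapPath("/")), iTotal
Response.Write "Files found: " & iTotal
Set objFSO = Nothing
```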
Read Part 4
Read Part 2
Read Part 1