Published: Thursday, November 30, 2000

Have you ever performed a search on 4Guys? If not, take a moment to visit the search page and try one. I've received a number of questions from users asking how to search a Web site, so I thought a description of how I search 4Guys would make a great article!
Originally, when 4Guys was much smaller and received less daily traffic, my search engine used the FileSystemObject to search through the text of each file on the Web server every time a search was performed. In fact, I wrote an article on how to do this back in December 1999: Searching Through the Text of Each File on a WebSite.

There are two common techniques used for content-rich sites like 4Guys. One is to store the contents of each article in a database and create a single ASP page to display any of them; sites like ASPWatch.com and SQLTeam.com use this technique. The other is to have a separate Web page for each article; sites like 4Guys and ASP101.com follow this model. Since each 4Guys article exists as its own file, a textual search using the FileSystemObject was a plausible solution. However, as the number of articles and visitors grew (as of 11/28/00 there are over 725 articles and over 100,000 daily page views), the FileSystemObject approach slowed down considerably, since every search had to open and read every file.

I looked at using Index Server, but had fits getting it set up; also, I wanted to create some sort of custom database repository of the content on 4Guys. I then looked at a product like XCache. For those unfamiliar with XCache, it is an application that lets a Webmaster build a database of the site's content. Then, on a regular schedule, XCache goes through the database and turns it into a series of static HTML pages. This approach enhances performance, since it removes all database calls (and all ASP execution time) from the site.

Rather than go with any of these solutions, I decided to create my own. Since I already (at the time) had about 300 articles, and I was very comfortable with my process for adding new content to the site, I didn't want to make any changes that would disrupt existing content or my methodology for adding new articles.
Therefore, I decided to do roughly the inverse of what XCache does: rather than creating a database of my site's content and scheduling the creation of a static version of the site, I decided to write a script that would scan my existing (and future) static content and build up a database from it.
With that in mind, I created a database table,
For each article on 4Guys, I'd add a row to the table. I automated this process by
creating a simple script that would iterate through the ASP pages that comprised each article on 4Guys
and use the FileSystemObject to populate each of the columns. I then used the task scheduler to schedule this
script to execute once a day, late at night. (Each time it ran, it obliterated all of the contents of the
The script that builds the database each night borrows much of its code from Searching Through the Text of Each File on a WebSite: the code presented in Part 3 of that article is reused in the database-building script, with some added code to insert a row into the database for each article found. I won't go into detail on how the database-building script works, since it should be fairly self-explanatory if you've thoroughly read Searching Through the Text of Each File on a WebSite. The database-building script's source can be viewed here.
Please do take a moment and check out the database-building script.
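The real database-building script is ASP/VBScript built on the FileSystemObject. To illustrate the same idea outside of ASP, here is a minimal Python sketch that walks a folder of `.asp` files, wipes the index table, and inserts one row per article, with `sqlite3` standing in for the site's database. The table name `search_index`, its columns (`url`, `title`, `body`), and the assumption that each title sits in a `<title>` element are all hypothetical; the actual table name and header markers are not given here. The `search` function shows one plausible way such a table could then be queried with `LIKE`.

```python
import os
import re
import sqlite3

# Hypothetical: assumes each article's title is inside a <title> element.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_fields(path):
    """Read one article file and return (title, text) for indexing."""
    with open(path, encoding="utf-8", errors="replace") as f:
        html = f.read()
    match = TITLE_RE.search(html)
    # Fall back to the file name when no title is found.
    title = match.group(1).strip() if match else os.path.basename(path)
    # Crude tag stripping, then collapse whitespace runs to single spaces.
    text = TAG_RE.sub(" ", html)
    return title, " ".join(text.split())


def rebuild_index(conn, root):
    """Wipe the hypothetical search_index table and refill it from .asp files."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_index "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute("DELETE FROM search_index")  # start fresh on every run
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".asp"):
                continue  # only article pages are indexed
            path = os.path.join(dirpath, name)
            title, body = extract_fields(path)
            url = "/" + os.path.relpath(path, root).replace(os.sep, "/")
            conn.execute(
                "INSERT INTO search_index (url, title, body) VALUES (?, ?, ?)",
                (url, title, body),
            )
            count += 1
    conn.commit()  # the wipe and all inserts become visible together
    return count


def search(conn, term):
    """Return (url, title) pairs whose title or body contains term."""
    # term is not escaped, so any % or _ in it act as LIKE wildcards.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, title FROM search_index "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Scheduling it nightly would simply mean running `rebuild_index` from a scheduled task. Because Python's `sqlite3` module (in its default transaction mode) opens a transaction before the `DELETE`, the wipe and the reinserts are committed together, so another connection searching mid-rebuild should still see the previous rows rather than an empty table. Searches then hit a single table instead of opening every file on the server.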
It is important to realize that each article on 4Guys has an HTML header. Go ahead and do a View/Source on
this article and you will see what I mean. The title of the article is wedged between a
Now that we have the