Published: Tuesday, November 10, 1998
Implementing a Search Engine in ASP
Introduction:
As a web site grows, finding content on the site becomes increasingly
difficult. To combat the difficulty of finding relevant information on a
large site, many developers turn to writing a search engine for their
site. This article discusses how to implement such a system using Active
Server Pages and SQL Server.
There are two "types" of search engines. Both take a search string from
the user to begin, but what, exactly, they search differs. A completely
dynamic search engine for a completely dynamic web site will hit a
database table which ties each article's URL to that article's description. The
database can then compare the user's search request to the descriptions of
the available articles and return the relevant URLs.
Another approach is to do an actual text search through each of the files.
For example, say that the user searched for "Microsoft." Your search
engine would then look through all of your HTML files and return the URLs
of those which had the word "Microsoft" somewhere in the document. Such a
system is used for this web site's search engine. In my opinion, such a
system is much easier to write in Perl (the language this site's search
engine is written in) than in Active Server Pages; however, it is quite
possible to write a text-finding search system in ASP.
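For comparison, here is a minimal sketch of the text-search approach. It is written in Python rather than Perl or ASP, purely for illustration. It walks a directory of .htm/.html files and returns a URL for every file whose text contains the search word, ignoring case. The `text_search` function and its `base_url` parameter are hypothetical. The sketch searches the raw HTML, so words inside tags and attributes also count as matches.

```python
import os


def text_search(root_dir, word, base_url="/"):
    """Return URLs of .htm/.html files under root_dir that contain word.

    Case-insensitive substring match on the raw file text, so words inside
    tags and attributes also count. base_url is a hypothetical prefix used
    to turn a file path relative to root_dir into a URL.
    """
    needle = word.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.lower().endswith((".htm", ".html")):
                continue  # only look at HTML files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                if needle in f.read().lower():
                    rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
                    matches.append(base_url + rel)
    return matches
```

Each search rereads every file. That is acceptable for a small site, but the cost grows with the amount of content.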
In this article I plan to implement the former search engine, the dynamic
search engine. For this example I will make a table called ArticleURL,
which will have the following definition:
ArticleURL
| Column       | Type         |
|--------------|--------------|
| ArticleURLID | int PK       |
| URL          | varchar(255) |
| Title        | varchar(100) |
| Description  | varchar(255) |
Now that we've got our table definition, let's look at how our web
visitors will enter their queries.
Search Querying
A search engine is rather useless unless users can make queries and get
results back. Let's examine how we will code the first part: accepting the
user's search request. All we will need is a simple HTML FORM
which takes input from the user and passes it on to an ASP page. Here is
an example of a file we'll call SearchStart.htm:
<HTML>
<BODY>
<FORM METHOD=POST ACTION="Search.asp?ID=0">
Search for: <INPUT TYPE=TEXT NAME="txtSearchString" SIZE="50">
<P>
<INPUT TYPE=SUBMIT>
</FORM>
</BODY>
</HTML>
This, of course, is not a pretty HTML page, but the functionality is
there. Many things could be done to enhance this page. For example, I
recommend adding a JavaScript function that makes sure the user is
actually searching for something (i.e., not just clicking Submit with an
empty search string).
Now that we have the query, we need to look at the second phase of any
search engine: retrieving the data and presenting it to the user. Here is
where the real fun begins!
Retrieving the Data and Presenting It:
Our ASP page, Search.asp, must perform a few steps. First, it must
parse the FORM variable txtSearchString. Right now, I am assuming that
the space-separated words in txtSearchString will be ANDed together.
You can alter this (have them ORed), or, to make it more professional,
you can let the user choose which Boolean operator to place between
words.
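The parsing step can be sketched in Python for illustration. This hypothetical `build_where` function splits the search string on whitespace, so repeated spaces are ignored. It lets the caller choose AND or OR. It returns the WHERE fragment with `?` placeholders plus a separate list of LIKE patterns. That way the user's text is passed as query parameters (for example, to `sqlite3`'s `execute`) instead of being pasted into the SQL string.

```python
def build_where(search, boolean="AND"):
    """Turn a search string into a parameterized WHERE fragment.

    Words are split on whitespace (repeated spaces are ignored) and each
    becomes one "Description LIKE ?" test, joined with AND or OR. The
    LIKE patterns are returned separately, to be bound as parameters.
    """
    if boolean not in ("AND", "OR"):
        raise ValueError("boolean must be 'AND' or 'OR'")
    words = search.split()
    if not words:
        raise ValueError("empty search string")
    clause = "(" + f" {boolean} ".join("Description LIKE ?" for _ in words) + ")"
    params = [f"%{w}%" for w in words]
    return clause, params
```

For "Magnum P I" this yields three ANDed `Description LIKE ?` tests with the patterns `%Magnum%`, `%P%` and `%I%`. Any `%` or `_` the user types still acts as a LIKE wildcard; the sketch leaves out escaping them to stay short.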
Next, Search.asp will need to hit the database table
ArticleURL and return the data in a user-friendly fashion. Also, we will
want to display the results only 10 records at a time, so logic will need
to be implemented to handle this as well. Let's look at some code.
<%
'Connect to Database
Dim Conn
Set Conn = Server.CreateObject("ADODB.Connection")
Conn.Open Application("MyConnectString")
'Set these up to your preference
Dim DefaultBoolean, RecordsPerPage
DefaultBoolean = "AND"
RecordsPerPage = 10
'Get our form variable
Dim strSearch
strSearch = Request.form("txtSearchString")
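'Get our current ID. This lets us know where we are
'(anything missing or non-numeric is treated as 0, the first page)
Dim ID
ID = Trim(Request.QueryString("ID"))
If ID <> "" And IsNumeric(ID) Then
ID = CLng(ID)
Else
ID = 0
End If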
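'Set up our SQL Statement
Dim strSQL, tmpSQL
strSQL = "SELECT * FROM ArticleURL WHERE "
tmpSQL = ""
'OK, we need to parse our string here. Split it on spaces,
'skip empty words (from repeated spaces), and double any single
'quotes so that a search like O'Brien doesn't break the SQL
Dim arrWords, strWord, i
arrWords = Split(Trim(strSearch), " ")
For i = 0 To UBound(arrWords)
strWord = Replace(arrWords(i), "'", "''")
If strWord <> "" Then
If tmpSQL <> "" Then tmpSQL = tmpSQL & " " & DefaultBoolean & " "
tmpSQL = tmpSQL & "Description LIKE '%" & strWord & "%'"
End If
Next
'Nothing to search for? Send the user back to the search form
If tmpSQL = "" Then Response.Redirect "SearchStart.htm"
tmpSQL = "(" & tmpSQL & ")"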
'Now, we've got to make sure we only get the right records
strSQL = strSQL & tmpSQL & " AND ArticleURLID > " & ID
strSQL = strSQL & " ORDER BY ArticleURLID" 'Important!
'Make our Recordset variable and get the results
Dim rsResults
Set rsResults = Server.CreateObject("ADODB.Recordset")
'Get the right number of records per page
rsResults.MaxRecords = RecordsPerPage
'Set our recordset properties (include ADOVBS.inc for the
'constant definitions!)
rsResults.CursorType = adForwardOnly
'Get our data, using the connection we opened above
rsResults.Open strSQL, Conn
'OK, we've got the data, let's display it in HTML
'First, though, let's get the total number of records
Dim rsTotalRecords
strSQL = "SELECT COUNT(*) FROM ArticleURL WHERE " & tmpSQL
Set rsTotalRecords = Conn.Execute(strSQL)
'We also need the max ID value for our search
Dim rsMaxID
strSQL = "SELECT MAX(ArticleURLID) FROM ArticleURL WHERE " & tmpSQL
Set rsMaxID = Conn.Execute(strSQL)
%>
<HTML>
<BODY>
<% if rsResults.EOF then 'No matches found
%>
No matches found! Try broadening your search
criteria.<P>
<A HREF="SearchStart.htm">Return to
Search</A>
<% Else
Dim iCurrentID
While Not rsResults.EOF
iCurrentID = rsResults("ArticleURLID") %>
<A HREF="<%=rsResults("URL")%>">
<%=rsResults("Title")%></A>
<%=rsResults("Description")%><P>
<% rsResults.MoveNext
Wend %>
<P>
<%=rsTotalRecords(0)%> Found!<BR>
<% if iCurrentID < rsMaxID(0) then %>
<!-- We have at least another record... -->
<FORM METHOD=POST
ACTION="Search.asp?ID=<%=iCurrentID%>">
<INPUT TYPE=HIDDEN NAME="txtSearchString"
VALUE="<%=Server.HTMLEncode(Request.Form("txtSearchString"))%>">
<INPUT TYPE=SUBMIT VALUE="Next">
</FORM>
<% end if
end if 'End if for .EOF clause above %>
</BODY>
</HTML>
Note: Please forgive me if there are many errors or
typos. I wrote this code while writing this article. It has not been
fully tested. In theory it should work. More important than running
source code are the ideas behind the code. Source code is a mere
transformation of ideas into something a computer can understand. If you
truly understand the ideas, the code should write itself.
Hopefully you can understand what this code is doing. This
file, Search.asp, will be called the first time a search is
executed and each time the user wants to view the next N records. To
start out, the file gets the search string and the current ID. The
current ID is an important value: it tells this page which records we've
already seen. The SQL searches for records that have an ArticleURLID
greater than the passed-in ID. To start off, we pass in an ID of 0, so
all records (assuming ArticleURLID was set up as an
IDENTITY(1,n) column) will be included.
Next we parse our search string into a string variable called
tmpSQL. If the user searched on "Magnum P I", tmpSQL would contain
(Description LIKE '%Magnum%' AND Description LIKE '%P%' AND
Description LIKE '%I%'). We then add ArticleURLID > ID to our WHERE
clause, where ID is the ID we pass into Search.asp.
Next, we create an instance of an ADO Recordset object and set its
MaxRecords property to N, where N is the number of rows we want to display
per page. This way our recordset object will hold at most N records.
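The paging query can be sketched in Python with the standard `sqlite3` module. It fetches only rows past the last ID seen, in ID order, and at most N of them. SQLite's `LIMIT` stands in for ADO's `MaxRecords`. `fetch_page` is a hypothetical helper. Its `where` and `params` arguments are assumed to be a trusted WHERE fragment with `?` placeholders and the values to bind. The demo data is made up.

```python
import sqlite3


def fetch_page(conn, where, params, last_id=0, per_page=10):
    """Return up to per_page matching rows with ArticleURLID > last_id.

    Mirrors Search.asp's "... AND ArticleURLID > ID ORDER BY ArticleURLID"
    query, with LIMIT standing in for ADO's MaxRecords. `where` must come
    from trusted code (placeholders only), never from raw user input.
    """
    sql = (
        "SELECT ArticleURLID, URL, Title FROM ArticleURL "
        f"WHERE {where} AND ArticleURLID > ? "
        "ORDER BY ArticleURLID LIMIT ?"
    )
    return conn.execute(sql, [*params, last_id, per_page]).fetchall()


# Demo data: seven made-up articles, IDs 1-7; the odd IDs mention "asp".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArticleURL (ArticleURLID INTEGER PRIMARY KEY,"
    " URL TEXT, Title TEXT, Description TEXT)"
)
conn.executemany(
    "INSERT INTO ArticleURL (URL, Title, Description) VALUES (?, ?, ?)",
    [(f"/a{i}.htm", f"Article {i}", "asp tips" if i % 2 else "perl")
     for i in range(1, 8)],
)
page1 = fetch_page(conn, "(Description LIKE ?)", ["%asp%"], last_id=0, per_page=2)
# The last ID on page 1 is where page 2 starts.
page2 = fetch_page(conn, "(Description LIKE ?)", ["%asp%"], last_id=page1[-1][0], per_page=2)
```

The `ORDER BY ArticleURLID` is what makes this scheme work. Without it, "the last ID on this page" would not reliably mark where the next page begins.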
Finally we get the total number of records which match our search criteria
and the maximum ID which matches our criteria. We need the maximum ID to
determine whether we are currently on the last page of results. Once we
have all of this data we are ready to display our information.
We start out by seeing if we have any information in the first place!
You'll note the if rsResults.EOF then check. If no records are
found, then we inform the user that we could find no results and provide a
link back to the SearchStart.htm page from which they came.
If, however, rsResults is not empty, we iterate through the recordset. We
then check to see if our last ArticleURLID is less than the maximum ID.
If it is, then we know we have at least one more record to show, so we
display the "Next" button which will display the next N records.
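The "is there another page?" test can be sketched in Python with `sqlite3`. It looks up the largest matching ArticleURLID and compares it with the last ID shown. `has_next_page` is a hypothetical helper. Its `where` and `params` arguments are assumed to be a trusted WHERE fragment with `?` placeholders and the values to bind. The demo rows are made up.

```python
import sqlite3


def has_next_page(conn, where, params, last_shown_id):
    """True if a matching row has an ArticleURLID above last_shown_id.

    Same test as Search.asp's "iCurrentID < MAX(ArticleURLID)". MAX()
    returns NULL (None) when nothing matches, which means no next page.
    """
    (max_id,) = conn.execute(
        f"SELECT MAX(ArticleURLID) FROM ArticleURL WHERE {where}", params
    ).fetchone()
    return max_id is not None and last_shown_id < max_id


# Demo data: IDs 1-4, where only IDs 1 and 3 mention "asp".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArticleURL (ArticleURLID INTEGER PRIMARY KEY,"
    " URL TEXT, Title TEXT, Description TEXT)"
)
conn.executemany(
    "INSERT INTO ArticleURL (URL, Title, Description) VALUES (?, ?, ?)",
    [("/a1.htm", "A1", "asp"), ("/a2.htm", "A2", "perl"),
     ("/a3.htm", "A3", "asp"), ("/a4.htm", "A4", "perl")],
)
```

Another design avoids the extra MAX query. It fetches N + 1 rows and displays only N; if the extra row came back, there is a next page.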
Areas for Improvement:
As I'm sure you have noticed, this search engine solution leaves a lot to
be desired as far as functionality goes when we compare it to standard
internet search engines. For example, there is no Back button, only a
Next button. Also, you cannot do any complex boolean searches, such as:
"Microsoft AND 'Active Server Pages' AND NOT (VBScript OR JScript)".
Both of these features can be implemented, though!
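Turning a query like the one above into a SQL WHERE clause takes a small parser. Below is a minimal recursive-descent sketch in Python; it is not the author's parser. It assumes this precedence: OR binds loosest, then AND, then NOT, with parentheses for grouping. Adjacent terms are treated as implicitly ANDed, which matches the simple version's default. It accepts single- or double-quoted phrases. It emits `Description LIKE ?` placeholders plus a parameter list. The names `tokenize`, `_Parser` and `to_where` are hypothetical.

```python
import re

# One token: "(", ")", a 'single-quoted' or "double-quoted" phrase, or a bare word.
TOKEN_RE = re.compile(r"""\s*(?:(\()|(\))|'([^']*)'|"([^"]*)"|([^\s()'"]+))""")


def tokenize(query):
    """Split a query into (kind, value) pairs; kind is (, ), AND, OR, NOT or TERM."""
    tokens, pos, query = [], 0, query.strip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse near {query[pos:]!r}")
        lparen, rparen, squote, dquote, word = m.groups()
        if lparen or rparen:
            tokens.append((lparen or rparen, None))
        elif squote is not None or dquote is not None:
            tokens.append(("TERM", squote if squote is not None else dquote))
        elif word.upper() in ("AND", "OR", "NOT"):
            tokens.append((word.upper(), None))
        else:
            tokens.append(("TERM", word))
        pos = m.end()
    return tokens


class _Parser:
    """expr := term (OR term)*; term := factor ([AND] factor)*;
    factor := NOT factor | ( expr ) | TERM."""

    def __init__(self, tokens):
        self.tokens, self.i, self.params = tokens, 0, []

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, got {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def expr(self):
        parts = [self.term()]
        while self.peek() == "OR":
            self.take("OR")
            parts.append(self.term())
        return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"

    def term(self):
        parts = [self.factor()]
        # An explicit AND, or simply another factor (implicit AND).
        while self.peek() in ("AND", "NOT", "(", "TERM"):
            if self.peek() == "AND":
                self.take("AND")
            parts.append(self.factor())
        return parts[0] if len(parts) == 1 else "(" + " AND ".join(parts) + ")"

    def factor(self):
        if self.peek() == "NOT":
            self.take("NOT")
            inner = self.factor()
            # Groups come back already parenthesized; wrap anything else.
            return "NOT " + (inner if inner.startswith("(") else f"({inner})")
        if self.peek() == "(":
            self.take("(")
            inner = self.expr()
            self.take(")")
            return inner
        self.params.append("%" + self.take("TERM") + "%")
        return "Description LIKE ?"


def to_where(query):
    """Return (sql_fragment, params) for a boolean search query."""
    tokens = tokenize(query)
    if not tokens:
        raise ValueError("empty query")
    parser = _Parser(tokens)
    sql = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected {parser.peek()}")
    return sql, parser.params
```

The fragment and parameters can then be passed to a DB-API driver. For example, with `sqlite3`: `conn.execute("SELECT * FROM ArticleURL WHERE " + sql, params)`. As written, a search for the literal words AND, OR or NOT must be quoted.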
Personally, I have written a parser which accepted complex boolean
searches similar to the one shown above and transformed them into a SQL
WHERE clause. To implement a Back feature, I would recommend a dynamic
Array (or stack). You would need to put this in a Session-level variable.
Each time the user hits Next, you will want to push the
Request.QueryString("ID") onto the stack. When they hit the "Back" button
you'll want to pop the last ID off the stack and pass it as ID to
Search.asp.
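The Back-button idea can be sketched with a plain Python list used as a stack. Here a dict named `session` stands in for ASP's Session object, and `on_next`/`on_back` are hypothetical handler names. On Next, the ID the current page started from is pushed. On Back, it is popped, with 0 (the first page) as the fallback when the stack is empty.

```python
def on_next(session, current_id):
    """User clicked Next: remember the ID the current page started from."""
    session.setdefault("id_stack", []).append(current_id)


def on_back(session):
    """User clicked Back: return the ID to pass to Search.asp as ?ID=.

    Pops the most recently pushed ID; with nothing left, fall back to 0,
    which is the first page.
    """
    stack = session.get("id_stack", [])
    return stack.pop() if stack else 0


# Walk-through with a plain dict standing in for the ASP Session object.
session = {}
on_next(session, 0)    # on page 1 (ID=0), the user clicks Next
on_next(session, 10)   # on page 2 (e.g. ID=10), the user clicks Next again
```

After these two clicks, the first Back returns 10 (page 2) and the second returns 0 (page 1).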
From Alert 4Guys Reader Sam N.:
I feel that it's important to point out that the system you suggest scales
extremely poorly. Using the LIKE operator and substring matches against
large database tables always requires a full table scan, so the time to
query increases linearly in the amount of content.
To actually build a production full-text search engine, an indexer is
typically required, either in the database or (more commonly) as a
separate application.
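To make the reader's point concrete, here is a toy in-memory inverted index in Python. It is not Index Server or any real product's design. It maps each lowercased word to the set of article IDs whose description contains that word, and it is built once, ahead of time. An AND query then becomes a few set intersections instead of a LIKE scan over every row. Unlike `LIKE '%word%'`, it matches whole words only. The function names and the `{ArticleURLID: Description}` input shape are assumptions.

```python
from collections import defaultdict


def build_index(articles):
    """Map each lowercase word to the set of article IDs containing it.

    articles: dict of ArticleURLID -> Description (hypothetical shape).
    """
    index = defaultdict(set)
    for article_id, description in articles.items():
        for word in description.lower().split():
            index[word].add(article_id)
    return index


def search_all(index, query):
    """Return sorted IDs of articles containing every word in query (AND)."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```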
(To learn more about Index Server, Microsoft's indexing software, be sure to visit
our Learn More section!)
Conclusion:
In this article we've examined how to implement a simplistic dynamic
search engine using Active Server Pages and SQL Server. While the model
implemented in this article is not exactly "feature-ful," it does search,
and presents the basic ideas behind a search engine. With some
modifications, none of them major, this system could be transformed into
a very impressive, professional-looking search engine.
Happy Programming!