Searching Your Website with Microsoft Index Services
By Chris Gray
Virtually every Web application has search functionality, from big eCommerce sites like Amazon.com down to small blogs or personal sites that might have only a handful of pages. Search has become so ubiquitous that when visitors to your site want to find an article they read in the past, they'll immediately start hunting for the search capability.
From the website developer's perspective, providing search capabilities requires two steps:
- First, you must create an index of the site's contents. An index of a website is analogous to the index in a book. If you want to read about a particular topic in the book you can quickly find the page(s) through the index, as opposed to having to read through the book's entire contents.
- Once an index has been created you need to be able to search through the index. Essentially, a user will enter a search string and you'll need to find the matching, relevant results in the index, displaying them to the user.
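The two steps above can be illustrated with a toy inverted index. This is a minimal sketch in Python, chosen only for brevity; it is not how Indexing Services is implemented, and the page paths, page text, and function names are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Step 1: map every word to the set of page paths that contain it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index, query):
    """Step 2: return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical site content, keyed by virtual path.
pages = {
    "/articles/indexing.html": "Configuring the indexing catalog",
    "/articles/search.html": "Building a search page for the catalog",
    "/about.html": "About this site",
}
index = build_index(pages)
print(search(index, "catalog"))         # both article pages
print(search(index, "search catalog"))  # only /articles/search.html
```

The lookup never rereads the pages themselves, which is the same reason a book index is faster than reading the whole book.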
This article examines using Microsoft Index Services for your site's search functionality. With Index Services you can choose a specific group of documents or HTML pages to be indexed, and then create an ASP.NET page that queries this index. We'll build a simple, fast, and extensible search tool using .NET and Microsoft Indexing Services along with the Adobe IFilter plug-in, which allows MS Indexing Services to index PDF documents and display them in your search results. Read on to learn more!
Configuring Microsoft Indexing Services
The first step in creating an index for your search application is to configure Indexing Services on the IIS server that your Web application will be running on. To do this you need access to the Web server itself. Open the Microsoft Management Console by clicking Start, then Run; type mmc and click OK. Next, to open the Indexing Services snap-in, you must:
- Click File,
- Click Add/Remove Snap-In,
- Click Add,
- Select the Indexing Service Snap-In,
- Click Add,
- Click Finish,
- Close the dialog
To create a new catalog - which is the vernacular Microsoft uses for an index - right-click on the Indexing Service node, click New and then Catalog. You then need to choose a location to store the catalog file. Once you've done that, expand the catalog that you just created and click on the Directories icon. Right-click on the Directories folder, click New and then Directory, and add the directory or directories that contain the content that you want to search. These directories can reside anywhere that the host computer can access; virtual directories and even UNC paths may be used. However, each directory that is indexed must either reside physically, or be included as a virtual directory, in the root of the website that you are indexing. If you specify a directory that is not in the web root as a physical folder or virtual directory, its documents will still appear in your search results, but their links will be broken.
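The rule about the web root can be expressed as a small check. Indexing Services works out virtual paths on its own; the Python sketch below only illustrates why a file outside the web root and outside every virtual directory has no working link. The folder names, the `/manuals` alias, and the `resolve_vpath` helper are hypothetical.

```python
from pathlib import PureWindowsPath


def resolve_vpath(file_path, web_root, virtual_dirs):
    """Return the site-relative URL for an indexed file, or None if the file
    is not reachable through the web root or any virtual directory."""
    path = PureWindowsPath(file_path)
    # Check virtual directories first, then the physical web root as "/".
    candidates = list(virtual_dirs.items()) + [("/", web_root)]
    for alias, folder in candidates:
        try:
            relative = path.relative_to(PureWindowsPath(folder))
        except ValueError:
            continue  # not inside this folder; try the next mapping
        return alias.rstrip("/") + "/" + relative.as_posix()
    return None


# Hypothetical server layout.
web_root = r"C:\Inetpub\wwwroot"
virtual_dirs = {"/manuals": r"D:\Shared\Manuals"}

print(resolve_vpath(r"C:\Inetpub\wwwroot\docs\guide.pdf", web_root, virtual_dirs))
print(resolve_vpath(r"D:\Shared\Manuals\setup.doc", web_root, virtual_dirs))
print(resolve_vpath(r"E:\Archive\old.html", web_root, virtual_dirs))
```

The third file would still be indexed if its folder were added to the catalog, but no URL maps to it, so its search result could only produce a broken link.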
Indexing Services will effectively index HTML, Word, and, once properly configured, PDF documents. To ensure that your required directories will be indexed you should verify that the index flag is properly set on the files and folders. You can verify this setting by right-clicking on any folder or file and selecting Properties. Click the "Advanced" button and make sure that the "For fast searching, allow Indexing Service to index this folder" checkbox is checked, as shown in the screenshot to the right.
Next, you want to set the properties of this catalog so that the HTML paths can be used, and so that Indexing Services will generate abstracts for the documents as they are indexed. To do this, right-click on the catalog you just created and select Properties. On the Tracking tab, make sure that the "WWW Server:" field is set to the website that your application will be running from. This ensures that the HTML paths work as they should when you get to building the front-end for the search. If you want to display a short excerpt of each article along with your search results, go to the Generation tab, uncheck "Inherit above settings from Service," then check "Generate abstracts" and set the number of characters you wish to have displayed in each abstract.
If you want your search to include PDF documents, then you must install the Adobe IFilter extension. You can download this free of charge from Adobe: http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611. This plug-in is a standard Windows installer and requires no additional configuration. After the plug-in has been installed, PDF documents will automatically be included in the search results as they are indexed, without any user intervention.
When you navigate to the Directories folder in the catalog that you've created, you may notice that one or more directories appear in addition to the ones you added in the previous step. These are website shares added automatically by Indexing Services, and they need to be excluded from indexing if you don't want your search to include them. To exclude these directories, you must find them in the file system via Windows Explorer. Next, right-click the folder and choose Properties. From the dialog that appears, click Advanced and uncheck the box that says "For fast searching, allow Indexing Service to index this folder." (See the screenshot above.) This will exclude the folder from your search results. The configuration of Indexing Services is now complete.
As you can see, an index may include as little as one folder of documents or as much as an entire website or group of websites. It's up to you to determine the breadth of the index. However, since Index Services does not crawl links like a spider, it will only catalog file system objects. Thus, the results from this search will include static files such as HTML pages, Word documents, and PDF documents, but not any dynamically generated pages. Changes made to these static documents will be picked up by Indexing Services and will very quickly be reflected in your search results.
Searching the Index
Once the index has been created, the next step is to build a search page that allows the website visitor to search through the index. To do this, you need, at minimum, a TextBox Web control for the end user to enter search terms, a Button Web control to initiate the search, and a Repeater control to display the results. The following markup shows the bare minimum markup for a simple search page:
This results page will display a list of results with a line for the document title followed by the abstract, which is generated by indexing services. Let's take a look at the code-behind class.
In the code-behind page, an OleDbConnection is attached to the Indexing Services catalog that we set up earlier. Once connected, the catalog can be searched using a variety of query languages, including SQL syntax. You can read about each of the language options here: Query Languages for Indexing Services. For this example, I'm going to use the IS Query Language to perform a freetext search, which allows for natural language search capabilities, but you can modify your search to use Boolean, phrase, or any of the other query types that Indexing Services supports.
To set up the connection to the Indexing Services catalog you need to set up an OleDb connection as follows:
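The connection itself needs only two strings: a connection string that names the catalog, and the query text. The Python sketch below just builds those two strings so their shape is easy to see; it is not the article's code-behind. It assumes the Indexing Service OLE DB provider name `MSIDXS`, a hypothetical catalog called `SearchCatalog`, and the SQL syntax with a `FREETEXT` predicate rather than whichever query form the sample application actually uses.

```python
def build_connection_string(catalog):
    """OLE DB connection string for an Indexing Services catalog."""
    return f"Provider=MSIDXS;Data Source={catalog};"


def build_freetext_query(search_text):
    """Build a freetext query over the whole catalog, best matches first."""
    # Double any single quotes so user input cannot end the string literal
    # (the usual SQL literal escaping; assumed to apply here as well).
    safe_text = search_text.replace("'", "''")
    return (
        "SELECT Doctitle, Filename, Vpath, Rank, Characterization "
        "FROM SCOPE() "
        f"WHERE FREETEXT('{safe_text}') "
        "ORDER BY Rank DESC"
    )


# "SearchCatalog" is a made-up catalog name.
print(build_connection_string("SearchCatalog"))
print(build_freetext_query("adobe's ifilter"))
```

In the ASP.NET page, the first string would be handed to the OleDbConnection and the second to the command or data adapter that runs against it.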
The fields returned from querying the index include:
- Doctitle: The title of the document, which is the text between the <title> tags in an HTML document or the text in the title field of a Word or PDF document.
- Filename: The physical name of the file that the result was returned from.
- Vpath: The virtual path of the file that the result was returned from. This is the field you use to build an HTML link to the file.
- Rank: The relevance of the returned result.
- Characterization: The abstract for the document, usually the first 320 characters.
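To show how these fields fit together on a results page, here is a minimal Python sketch that turns one result row into a link followed by its abstract. The row values are invented, and falling back to Filename when Doctitle is empty is an assumption about what you would want to display, not something the index does for you.

```python
from html import escape


def render_result(row):
    """Render one result row as an HTML link followed by its abstract."""
    href = escape(row["Vpath"], quote=True)
    # Fall back to the file name when the document has no title.
    title = escape(row.get("Doctitle") or row["Filename"])
    abstract = escape(row.get("Characterization") or "")
    return f'<a href="{href}">{title}</a><br />{abstract}'


# Hypothetical row, shaped like the fields listed above.
row = {
    "Doctitle": "",
    "Filename": "guide.pdf",
    "Vpath": "/docs/guide.pdf",
    "Rank": 870,
    "Characterization": "Setup & install instructions",
}
print(render_result(row))
```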
Tying it All Together
While there are a number of ways in which you can display the results from your search, a Repeater is likely the most efficient and gives you the greatest control over how your results are formatted. The attached sample application demonstrates how to bind the results of your search to a Repeater control. It also adds paging to the results, which makes them easier to use, as shown in the screenshot below. The results can easily be modified to show paths to documents or display the ranking of each result.
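The paging arithmetic is the same whatever control displays the results. The Python sketch below shows one way to pick the rows for a given page; the `get_page` helper and the page size of 10 are assumptions for illustration, and the sample application's own paging code may work differently.

```python
import math


def get_page(results, page, page_size=10):
    """Return (rows for the requested page, total number of pages)."""
    total_pages = max(1, math.ceil(len(results) / page_size))
    # Clamp out-of-range page numbers instead of raising.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return results[start:start + page_size], total_pages


# 25 stand-in results, numbered 1 to 25.
results = list(range(1, 26))
rows, total_pages = get_page(results, page=3)
print(rows, total_pages)
```

Clamping the page number means a stale or hand-edited page link shows the nearest valid page rather than an error.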
This search tool is small, fast, and simple enough to deploy for searching individual folders of specific content in your intranet or Internet site. However, it can easily be used to search entire sites composed of thousands of documents. Due to the power of Microsoft Indexing Services, all that you will need to do is alter the scope of the Indexing Services catalog to include any folders you want indexed. New documents added to these folders, and changes to existing documents, are picked up by Indexing Services automatically.
For more information about Indexing Services, be sure to read the following resources:
- Introduction to MS Index Server
- A tutorial on creating a search page with FrontPage, which also lists all of the fields that you can search and/or display using MS IS
- Documentation for the different query languages you can use with IS
- IFilter Shop - A collection of plug-ins for IS that allow you to search more types of content