Spell Check a String with ASP and Microsoft Word Automation
By Eric Blanpain
More than 50% of search queries on my site return no results simply because of misspelled words in the search query. Search engines like Google provide helpful suggestions on new searches if you happen to misspell a word in your search query. Perhaps you've wondered how you could do the same for the search engine on your own Web site? In this article we'll look at a server-side solution that uses Microsoft Word to provide spell checking and suggestions for your search engine.
Essentially, the spell-checking code opens the Word object on the server, adds the user's search query to a document, and then uses Word's proofing capabilities to return the suggested spelling corrections (if any). Because we want to do this server-side, we cannot use Word's interactive spell checker. It opens a dialog box and waits until a user manually confirms each correction. Instead we will use the SpellingErrors property, which returns a collection of the misspelled words without any user interaction. Suggestions for each misspelled word are then retrieved with the GetSpellingSuggestions method.
Using Microsoft Word on the server side has some performance and security implications, as discussed later in the article. For this code to work, Word must be installed on the Web server. It must also be set up so that the anonymous Web user (IUSR_machinename) can perform Word automation (see Setting Up Word Security below). Performance is also a bit sluggish: a spell check can take up to several seconds for larger search queries. You should consider using this approach only on lightly loaded Web servers.
Spell Checking with Microsoft Word
To spell check the user's search engine query, we first create an instance of the Microsoft Word object on the server.
Next, we create a new Word document whose content is the user's query, so that Word has text to proofread. (QueryText is a string variable that holds the user's query.)
Now we want to check whether there are any spelling errors in the document. To do this we read the document's SpellingErrors property, which returns a collection of misspelled words. The collection's Count property indicates how many words were misspelled. If Count is greater than 0, we know the query contains misspelled words.
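The Count test amounts to an early exit: if nothing is misspelled, there is nothing to suggest. The sketch below expresses that logic in Python rather than ASP so it can run without Word installed. It rests on two assumptions. A hypothetical toy word list, KNOWN_WORDS, stands in for Word's dictionary. A plain list stands in for the SpellingErrors collection. It does not automate Word.

```python
from typing import List

# Hypothetical toy dictionary standing in for Word's; this is NOT Word automation.
KNOWN_WORDS = {"spell", "check", "search", "engine", "query", "word"}


def spelling_errors(text: str) -> List[str]:
    """Approximate the SpellingErrors collection: every word that is not
    in KNOWN_WORDS (case-insensitive) counts as misspelled."""
    return [w for w in text.split() if w.lower() not in KNOWN_WORDS]


def has_spelling_errors(query: str) -> bool:
    """The article's test: continue only when the error count is above 0."""
    errors = spelling_errors(query)
    return len(errors) > 0  # len() plays the role of SpellingErrors.Count


if __name__ == "__main__":
    for q in ("search engine query", "serch engine qurey"):
        print(f"{q!r}: misspelled = {has_spelling_errors(q)}")
```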
If there are spelling errors, we loop through each word in the document and determine whether it is spelled correctly. If it is, we display it as spelled correctly. Otherwise we call the GetSpellingSuggestions method, which returns a collection of suggested spellings for the misspelled word (possibly empty). We take the first item in this collection as our suggestion to the user for how to re-enter the search query.
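The per-word loop can be sketched the same way. This Python sketch makes two substitutions. difflib.get_close_matches from Python's standard library stands in for GetSpellingSuggestions; it ranks candidates by string similarity, which is not Word's algorithm. VOCABULARY is a hypothetical toy word list. As in the article, only the first suggestion for each misspelled word is used.

```python
import difflib
from typing import List, Optional

# Hypothetical toy vocabulary standing in for Word's dictionary.
VOCABULARY = ["spell", "check", "search", "engine", "query", "word", "server"]


def suggestions_for(word: str) -> List[str]:
    """Stand-in for GetSpellingSuggestions: up to three candidates,
    best match first, ranked by difflib similarity (not Word's method)."""
    return difflib.get_close_matches(word.lower(), VOCABULARY, n=3, cutoff=0.6)


def suggest_query(query: str) -> Optional[str]:
    """Walk the query word by word, keeping correct words and replacing
    each misspelled word with its first suggestion, if there is one.
    Returns None when no word was changed (nothing to suggest)."""
    corrected: List[str] = []
    changed = False
    for word in query.split():
        if word.lower() in VOCABULARY:
            corrected.append(word)           # spelled correctly: keep it
            continue
        candidates = suggestions_for(word)
        if candidates:
            corrected.append(candidates[0])  # first suggestion, as in the article
            changed = True
        else:
            corrected.append(word)           # no suggestion: leave the word alone
    return " ".join(corrected) if changed else None
```

For example, suggest_query("serch engine qurey") returns "search engine query". A query with no misspellings returns None, so the page can skip showing a suggestion.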
That's all there is to it code-wise. Of course, for the code to work it is imperative that you set up Word's security settings properly.
Setting Up Word Security
To run this code you will need Microsoft Word installed on the Web server with the proper security settings, which allow the anonymous Internet user to access Word on the server. There is a good Microsoft article on how to achieve this, available at: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q288367. Make sure you read this article! If you don't specify the security settings properly, the code will not work and you will get a "(0x800A175D) Could not open macro storage..." error message.
I've found it best to create an MSWordUser group account and make the anonymous Web user account a member of it. Alternatively, you may set up Word so that it runs as the interactive user (faster to set up, but less safe). Information on setting up Word to use the interactive user can be found at: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q288366.
Performance of this Approach
Each request to the ASP page opens and closes Word, which is costly; hence, this approach is best suited for sites with limited load. In my informal tests, the execution time for a three-word query is typically 0.5 seconds once the objects are created (usually between 0.5 and 1 second). I have not run any formal performance benchmarks, but would be delighted if someone were interested in doing so and sharing the results.
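For anyone taking up the benchmarking invitation, a simple timing harness is a reasonable starting point. This Python sketch uses hypothetical names. It times repeated calls to any spell-check function and reports the median wall-clock time per call. The check argument would wrap whatever actually drives Word, which is not shown here. Timing the very first call separately could help separate object-creation cost from per-query cost.

```python
import statistics
import time
from typing import Callable, Iterable, List


def median_latency(check: Callable[[str], object],
                   queries: Iterable[str],
                   repeats: int = 5) -> float:
    """Call check(query) `repeats` times for each query and return the
    median wall-clock time per call, in seconds."""
    samples: List[float] = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            check(query)  # hypothetical: the function that spell checks via Word
            samples.append(time.perf_counter() - start)
    if not samples:
        raise ValueError("need at least one query and repeats >= 1")
    return statistics.median(samples)
```

The median is used rather than the mean so that an occasional slow call (for example, Word starting up) does not dominate the result.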
Since many users will likely search using the same queries, one potential optimization would be to save a search string once it has been spell checked and shown to contain no errors. Such a "correct" query could be stored in a database. When a user runs a search, a quick lookup in this table would show whether the spell checker needs to be invoked at all.
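The caching idea can be sketched as a small wrapper. This Python sketch rests on three assumptions, all with hypothetical names. An in-memory set stands in for the database table. has_errors stands in for the Word spell check. Normalizing case and whitespace before the lookup is not something the article specifies. As the article suggests, only queries found to be error-free are stored.

```python
from typing import Callable, Set


class CorrectQueryCache:
    """Remembers queries already shown to contain no spelling errors, so the
    expensive checker runs only for queries not yet known to be correct."""

    def __init__(self, has_errors: Callable[[str], bool]) -> None:
        self._has_errors = has_errors        # the expensive check (e.g. via Word)
        self._known_correct: Set[str] = set()  # stands in for the database table
        self.checker_calls = 0               # how often the expensive check ran

    @staticmethod
    def _normalize(query: str) -> str:
        # Assumption: case and extra whitespace do not affect correctness.
        return " ".join(query.lower().split())

    def is_correct(self, query: str) -> bool:
        key = self._normalize(query)
        if key in self._known_correct:
            return True                      # cache hit: skip the spell checker
        self.checker_calls += 1
        if self._has_errors(query):
            return False                     # misspelled queries are not cached
        self._known_correct.add(key)
        return True
```

Because misspelled queries are never stored, repeated misspellings still reach the checker; only the error-free case gets cheaper.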
Using Microsoft Word server-side is not for everyone (be sure to read this Microsoft article first), but there are situations where server-side Office automation is really helpful. Just be sure to do adequate performance testing before deploying your Office automation-based Web site. Also make certain that the security settings are set up properly to allow Office automation from an ASP page.
Eric Blanpain, just turned 40, runs a company based in Paris, France, that markets scientific instrumentation. As a marketing and Internet/e-commerce/ASP consultant, he has helped many companies in the field expand their sales dramatically! Inquiries welcome.