This tip comes from Lee Jae.
Introduction
The situation: you've got a database containing information about each employee in your company. You
decide to provide a search interface that lets fellow employees look up their coworkers by last or
first name. The only problem? People search for terms like Mike, and no results come back because,
perhaps, Mike's name is stored as Michael in the database. Urg. Ideally, you'd like to return results
that sound like Mike, but how can you do this? Actually, it's quite simple with SQL Server's built-in
SOUNDEX function! Read on to learn more!
I've encountered a situation where a client wanted a utility that finds matches for similar-sounding
names. For example, when typing "BILL", it needs to also find "WILL" or "WILLIAM". This is difficult
because a computer has no innate concept of hearing: a computer is good at black-and-white comparisons
(x = y), but usually pretty bad at fuzzy ones (is x "like" y?).
Initially, in a quest to solve this problem, I attempted to use SQL's LIKE operator; however, that
approach yielded unwanted results. For example, when using LIKE '%ANA%', words like tatiANA are
returned as well, because the % wildcards match any characters before and after ANA.
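The solution, I soon discovered, was SQL Server's SOUNDEX function. SOUNDEX converts a string into a
four-character code: the string's first letter followed by three digits encoding the consonant sounds
that come after it (padded with zeros if needed). The letters A, E, I, O, U, H, W, and Y are ignored
unless they are the first letter, so similar-sounding names tend to produce the same code, and comparing
codes finds names that sound alike even when they are spelled differently.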
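To make the encoding concrete, here is a minimal Python sketch of the classic American Soundex rules that SOUNDEX is modeled on. It is an illustration, not SQL Server's implementation: the names `soundex` and `CODES` are hypothetical, and stopping at the first non-letter character is an assumption chosen so that "Jean-Pierre" is coded like "Jean", consistent with the query results this article reports.

```python
from string import ascii_uppercase

# Digit assigned to each consonant group in classic American Soundex.
# Vowels and Y get no digit; H and W are handled separately below.
_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return a four-character Soundex code such as 'J500' (hypothetical helper)."""
    word = word.strip().upper()
    # Assumption: stop at the first character outside A-Z,
    # so "Jean-Pierre" is coded from "JEAN" only.
    end = 0
    while end < len(word) and word[end] in ascii_uppercase:
        end += 1
    word = word[:end]
    if not word:
        return ""

    result = [word[0]]               # the first letter is always kept as-is
    prev = CODES.get(word[0], "")    # its digit still suppresses an identical next digit
    for ch in word[1:]:
        if ch in "HW":
            continue                 # ignored, and does not separate equal digits
        code = CODES.get(ch, "")
        if not code:
            prev = ""                # vowel or Y: ignored, but separates equal digits
            continue
        if code != prev:
            result.append(code)      # adjacent letters with the same digit count once
        prev = code
    # Pad with zeros (or truncate) to exactly one letter plus three digits.
    return ("".join(result) + "000")[:4]


for name in ["John", "Jean-Pierre", "Robert", "Rupert", "Tymczak"]:
    print(name, soundex(name))
```

Under these rules "John" and "Jean-Pierre" both reduce to J500, which is why an equality comparison on SOUNDEX codes treats them as a match. SQL Server's own function may differ from this sketch in edge cases (for example, in how H and W are treated), so check its actual output when exact codes matter.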
In order to show the usefulness of SOUNDEX, let's look at a simple example that searches for records
in the following data table:
Essentially, the above query states: "Give me all rows whose FirstName column sounds like 'John'." In
a search interface, you'd substitute the user-entered search term (ideally passed as a query parameter)
for the hard-coded value "John". The above query would return two rows: John Badagliacca and
Jean-Pierre Blaise.
Another option is the DIFFERENCE function, which compares the SOUNDEX codes of two strings and returns
a value from 0 to 4 indicating how closely one word matches another phonetically (0 meaning weak or no
similarity and 4 meaning very similar or identical). Using this approach, you could specify a threshold
of, say, 3, or even let users define their own threshold. To learn more about DIFFERENCE and SOUNDEX,
read: SOUNDEX String Comparison.
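The threshold idea can be sketched in Python as well. This block repeats a compact Soundex helper so it runs on its own, then scores two names by counting the positions at which their four-character codes agree (0 to 4). That positional count is a simplifying assumption, not SQL Server's documented algorithm, so real DIFFERENCE scores may differ; the helper names `difference` and `sounds_like`, the sample names, and the default threshold of 3 are all hypothetical.

```python
from string import ascii_uppercase

_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Classic American Soundex; stops at the first non A-Z character (an assumption)."""
    word = word.strip().upper()
    end = 0
    while end < len(word) and word[end] in ascii_uppercase:
        end += 1
    word = word[:end]
    if not word:
        return ""
    result, prev = [word[0]], CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue                      # H and W never separate equal digits
        code = CODES.get(ch, "")          # "" for vowels and Y
        if code and code != prev:
            result.append(code)
        prev = code                       # a vowel resets prev, so a repeat digit counts again
    return ("".join(result) + "000")[:4]


def difference(a: str, b: str) -> int:
    """Approximate DIFFERENCE: positions where the two Soundex codes agree (0-4).

    Hypothetical approximation; SQL Server's actual scoring may not match it.
    """
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


def sounds_like(query: str, names: list[str], threshold: int = 3) -> list[str]:
    """Keep the names whose approximate DIFFERENCE from the query meets the threshold."""
    return [name for name in names if difference(query, name) >= threshold]


names = ["Michael", "Mike", "Mitchell", "John", "Jean-Pierre"]  # illustrative sample data
print(sounds_like("Mike", names))
```

In this sketch "Mike" (M200) and "Michael" (M240) agree in three positions, so a threshold of 3 returns both of them even though their codes are not equal, while "Mitchell" (M324) and "John" (J500) fall below it.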
Happy Programming!