This FAQ is a follow-up to a previous FAQ on ASPFAQs.com. The FAQ "How can I remove multiple spaces between words in a string? That is, if I have:
Hi     there
how can I get:
Hi there?"
examined a method for removing extraneous spaces within a string. That FAQ prompted a number of readers to write in asking the follow-up question: "How can I count the total number of words in a string?" This FAQ illustrates how to accomplish this!
There are many scenarios in which you may wish to count the number of words in a string. For example, imagine that you run a Web site with a classified section and you restrict users to posting a classified ad of at most, say, 200 words (or perhaps you charge for the ad based on the number of words in the ad).
As with removing extraneous spaces in a string, there are a number of ways to count the words in a string. One method involves using split to turn the string into an array. Basically, you are just using the VBScript split function to break the string apart at each space character. (To learn more about split, be sure to read Parsing with split.) So, if you have the string:
str = "Today is a great day indeed, Bob."
And you use split to break it down into an array like so:
aWords = split(str, " ")
aWords would have the following elements:
aWords(0) == "Today"
aWords(1) == "is"
aWords(2) == "a"
aWords(3) == "great"
aWords(4) == "day"
aWords(5) == "indeed,"
aWords(6) == "Bob."
So, to get the total number of words, all you have to do is use UBound(aWords) + 1 (you need to add one because UBound(aWords) returns 6, the index of the last element, and the array is indexed from zero). Things get a little more complex with this technique if your string contains multiple consecutive spaces, like:
str = "Hi.  How are you?"
Note that there are two spaces between "Hi." and "How are you?" When using split, this will return the array as:
aWords(0) == "Hi."
aWords(1) == ""
aWords(2) == "How"
aWords(3) == "are"
aWords(4) == "you?"
Ah! The empty string between the two spaces is being counted as a word (see aWords(1)). To compensate for this, we need to strip out all of the extraneous spaces in the string before applying the split solution. Fortunately, there is a previous FAQ demonstrating how to remove extraneous spaces in a string: "How can I remove multiple spaces between words in a string?" Using the code presented in that FAQ, we have:
str = "Hi.  How are you?"
'Start by trimming leading/trailing spaces
str = Trim(str)
'Now, while we have 2 consecutive spaces, replace them
'with a single space...
Do While InStr(1, str, "  ") > 0
    str = Replace(str, "  ", " ")
Loop
'Split on the single spaces that remain and count the elements
aWords = split(str, " ")
Response.Write "There are " & (UBound(aWords) + 1) & " words in " & str
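The trim, collapse, and split steps can also be wrapped in a reusable function. What follows is only a minimal sketch: the name CountWordsBySplit is hypothetical (it does not appear in either FAQ), and it assumes words are separated by ordinary space characters only, so tabs and line breaks are not treated as separators.

```vbscript
' Hypothetical helper, not part of the original FAQ code.
' Counts words by collapsing runs of spaces and splitting on a single space.
Function CountWordsBySplit(ByVal s)
    Dim aWords

    ' Remove leading/trailing spaces first
    s = Trim(s)

    ' An empty (or all-space) string contains no words
    If Len(s) = 0 Then
        CountWordsBySplit = 0
        Exit Function
    End If

    ' Collapse every run of spaces down to a single space
    Do While InStr(1, s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop

    ' Each remaining space separates exactly two words
    aWords = Split(s, " ")
    CountWordsBySplit = UBound(aWords) + 1
End Function
```

Calling CountWordsBySplit("Hi.  How are you?") returns 4, since the doubled space no longer produces an empty array element.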
Neat, eh? There is, however, a much cleaner way of counting the number of words in a string, and it involves regular expressions. (For more information on regular expressions, be sure to visit the Regular Expressions Article Index!) The regular expression to count the number of words in a string uses the non-greedy repetition pattern matching symbol. This special symbol is only available in the regular expression engine that ships with the Microsoft Scripting Engines version 5.5 or greater. To learn more about this special non-greedy matching symbol, be sure to read: Picking Out Delimited Text with Regular Expressions.
To count the number of words in a sentence, our regular expression should search for one or more word characters surrounded by word boundaries. A word boundary marks the beginning or end of a word: it is the position between a word character and a non-word character (such as a space or punctuation mark), or between a word character and the start or end of the string. For example, in the string "Hello, how are you?" there is a word boundary on each side of every word. The first occurs right before the first letter of the string, the second right before the comma after "Hello", the next right before the "h" in "how," and so on. Regular expressions have a special character when searching for a word boundary:
\b. Since we are looking for one or more word characters between word boundaries, our regular expression is:
\b(\w+?)\b
The \w character translates to any word character (any alphanumeric character or the underscore); the + means match one or more such characters; the ? means to apply the non-greedy search, which basically means match the fewest number of characters that appear between two word boundaries. So, in plain English, the regular expression states: "Match one or more word characters between word boundaries."
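To see which pieces of a string the pattern picks out, the matches can be listed one by one. This is a rough sketch rather than code from the original FAQ: the function name ListWordMatches and the "|" separator are assumptions made for illustration.

```vbscript
' Hypothetical helper, not part of the original FAQ code.
' Returns every match of \b(\w+?)\b in s, joined with "|".
Function ListWordMatches(ByVal s)
    Dim regex, m, result

    Set regex = New RegExp
    regex.Global = True              ' find every match, not just the first
    regex.Pattern = "\b(\w+?)\b"

    result = ""
    For Each m In regex.Execute(s)
        ' Put a separator between matches, but not before the first one
        If Len(result) > 0 Then result = result & "|"
        result = result & m.Value
    Next

    ListWordMatches = result
End Function
```

For the string "Hello, how are you?" this returns "Hello|how|are|you": the comma, the spaces, and the question mark fall outside the word boundaries and are never part of a match.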
Unfortunately, apostrophes count as word boundaries, meaning that a contraction is matched as two words (the part before the apostrophe and the part after it), so a string containing a contraction is over-counted by one. So... how can we fix this? It's a bit of a hack, but in the call to the Execute function we can replace all apostrophes with blank strings. Examine the example below to see how this is done.
Once we Execute this regular expression, we simply need to count the number of Matches returned, and that will let us know how many words are in our string. An example can be seen below:
Dim regex
Set regex = New RegExp
regex.IgnoreCase = True
regex.Global = True
regex.Pattern = "\b(\w+?)\b"
'Remember to remove all apostrophes in str!
'Note the Replace statement in the Execute function
Response.Write "<p>There are " & _
    FormatNumber(regex.Execute(Replace(str, "'", "")).Count, 0) & _
    " words in your sentence(s): """ & _
    str & """.</p>"
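If the count is needed in more than one place, the same technique can be packaged as a function. This is a sketch under stated assumptions: the name CountWordsRegex is hypothetical, and the test string "It's funny" is an illustrative example of my own choosing, not one taken from the original FAQ.

```vbscript
' Hypothetical helper, not part of the original FAQ code.
' Counts words as matches of \b(\w+?)\b after stripping apostrophes.
Function CountWordsRegex(ByVal s)
    Dim regex

    Set regex = New RegExp
    regex.Global = True              ' count every match, not just the first
    regex.Pattern = "\b(\w+?)\b"

    ' Remove apostrophes so a contraction is matched as a single word
    CountWordsRegex = regex.Execute(Replace(s, "'", "")).Count
End Function
```

With the apostrophe removed, CountWordsRegex("It's funny") returns 2; matching the unmodified string with the same pattern would find three matches ("It", "s", and "funny"). Note that this approach treats any run of word characters as a word, so hyphenated terms are still counted as separate words.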
Personally, I prefer the regular expression way: it's compact code and doesn't require any messy looping code. (Of course, there is a third way this could be done: with gratuitous use of Mids and a plethora of VBScript's other string functions. I prefer the two approaches shown here for their cleanliness and readability.)