The 4 Guys Present: ASPFAQs.com

Question:

How can I count the total number of words that appear in a string?


Answer: This FAQ is a follow-up to a previous FAQ on ASPFAQs.com: "How can I remove multiple spaces between words in a string? That is, if I have: Hi    there how can I get: Hi there?" That FAQ examined a method for removing extraneous spaces within a string, and it prompted a number of readers to write in with a follow-up question: "How can I count the total number of words in a string?" This FAQ illustrates how to accomplish this!

There are many scenarios in which you may want to count the number of words in a string. For example, imagine that you run a Web site with a classified section and you restrict users to posting a classified ad of no more than, say, 200 words (or perhaps you charge for the ad based on the number of words in it).

As with removing extraneous spaces from a string, there are a number of ways to count the words in a string. One method uses the VBScript split function, with the space character as the delimiter, to turn the string into an array. (To learn more about split be sure to read Parsing with join and split.) So, if you have the string:

Dim str
str = "Today is a great day indeed, Bob."

And you use split to break it down into an array like so:

Dim aWords
aWords = split(str, " ")

The array aWords would have the following elements:

aWords(0) == "Today"
aWords(1) == "is"
aWords(2) == "a"
aWords(3) == "great"
aWords(4) == "day"
aWords(5) == "indeed,"
aWords(6) == "Bob."

So, to get the total number of words, all you have to do is use UBound(aWords) + 1. (You need to add one because the array is indexed from zero, so UBound(aWords) returns 6 for these seven elements.) Things get a little more complex with this technique if the string contains multiple consecutive spaces, like:

Dim str
str = "Hi.  How are you?"

Note that there are two spaces between "Hi." and "How are you?" When using split this will return the array as:

aWords(0) == "Hi."
aWords(1) == ""
aWords(2) == "How"
aWords(3) == "are"
aWords(4) == "you?"

Ah! split treats the empty string between the two consecutive spaces as a word of its own (see aWords(1)), so UBound(aWords) + 1 would report five words instead of four. To compensate for this we need to strip out all of the extraneous spaces in the string before applying the split solution. Fortunately there is a previous FAQ demonstrating how to remove extraneous spaces in a string: How can I remove multiple spaces between words in a string? Using the code presented in that FAQ, we have:

Dim str
str = "Hi.  How are you?"

'Start by trimming leading/trailing spaces
str = Trim(str)

'Now, while we have 2 consecutive spaces, replace them
'with a single space...
Do While InStr(1, str, "  ") > 0
  str = Replace(str, "  ", " ")
Loop

Dim aWords
aWords = split(str, " ")
Response.Write "There are " & UBound(aWords) + 1 & " words in " & str
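
The split technique above can be wrapped in a reusable function. What follows is only a minimal sketch: the name CountWordsBySplit is hypothetical (it does not come from the original FAQ), the code is assumed to run inside an ASP page's server-side script block, and only the space character is treated as a word separator, so tabs and line breaks are not handled.

```vbscript
' Hypothetical helper (not part of the original FAQ): counts words by
' collapsing runs of spaces and then splitting on a single space.
Function CountWordsBySplit(ByVal s)
  ' Start by trimming leading/trailing spaces
  s = Trim(s)

  ' An empty string contains no words, so return early
  If Len(s) = 0 Then
    CountWordsBySplit = 0
    Exit Function
  End If

  ' While there are 2 consecutive spaces, replace them with one
  Do While InStr(1, s, "  ") > 0
    s = Replace(s, "  ", " ")
  Loop

  ' UBound is zero-based, so add 1 to get the number of elements
  CountWordsBySplit = UBound(Split(s, " ")) + 1
End Function

Response.Write CountWordsBySplit("Hi.  How are you?")   ' writes 4
```

Passing the string ByVal means the caller's variable is left untouched even though the function trims and collapses its own copy.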

Neat, eh? There is, however, a much cleaner way of counting the number of words in a string, and it involves regular expressions. (For more information on regular expressions be sure to visit the Regular Expressions Article Index!) The regular expression used to count the words relies on the non-greedy repetition symbol, which is only available in the regular expression engine that ships with the Microsoft Scripting Engines version 5.5 or greater. To learn more about non-greedy matching be sure to read: Picking Out Delimited Text with Regular Expressions.

To count the number of words in a sentence our regular expression should search for one or more word characters surrounded by word boundaries. A word boundary marks the beginning or end of a word: it is the position between a word character and a non-word character such as a space or punctuation mark, or the start or end of the string. For example, in the string "Hello, how are you?" each word has two word boundaries around it. The first occurs right before the first letter of the string, the second right before the comma after "Hello", the next right before the "h" in "how," and so on. Regular expressions have a special token for matching a word boundary: \b. Since we are looking for one or more word characters between word boundaries, our regular expression is:

\b(\w+?)\b

The \w character matches any word character (a letter, a digit, or an underscore); the + means match one or more such characters; the ? makes the match non-greedy, which basically means match the fewest characters that still satisfy the pattern. (Because the pattern has a \b on both sides, the non-greedy ? does not change which words are found here; a plain \w+ matches the same words.) So, in plain English, the regular expression states: "Match one or more word characters between word boundaries."
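
To see what the pattern actually picks out, the matches can be listed one by one. This is a small illustrative sketch rather than code from the original FAQ: the variable names re, matches, and m are arbitrary, and the code is assumed to run inside an ASP page's server-side script block.

```vbscript
Dim re, matches, m
Set re = New RegExp
re.Global = True              ' find every match, not just the first
re.Pattern = "\b(\w+?)\b"

Set matches = re.Execute("Hello, how are you?")

' Writes Hello, how, are, and you, each followed by a line break
For Each m In matches
  Response.Write Server.HTMLEncode(m.Value) & "<br>"
Next

Response.Write "Total: " & matches.Count   ' writes Total: 4
```

The comma, the spaces, and the question mark never appear in a match because none of them is a word character.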

Unfortunately apostrophes count as word boundaries, meaning the string:

I'm funny.

Will be counted as three words: I, m, and funny. So... how can we fix this? It's a bit of a hack, but we can remove all apostrophes from the string (by replacing them with empty strings) before passing it to the Execute method. Examine the example below to see how this is done.

Once we Execute this regular expression, we simply need to count the number of Matches returned and that will let us know how many words are in our string. An example can be seen below:

'Sample input: the sentence from the apostrophe example above
Dim str
str = "I'm funny."

Dim regex
Set regex = New RegExp
regex.IgnoreCase = True
regex.Global = True
regex.Pattern = "\b(\w+?)\b"

'Remember to remove all apostrophes in str!
'Note the Replace statement in the Execute function
Response.Write "<p>There are " & _
                FormatNumber(regex.Execute(Replace(str,"'","")).Count, 0) & _
                " words in your sentence(s): """ & _
                str & """.</p>"

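The regular expression technique described above can also be wrapped in a function and used to enforce a word limit such as the 200-word classified ad restriction mentioned earlier. Treat this as a sketch under assumptions: CountWordsByRegex and adText are hypothetical names that do not come from the original FAQ, and the code is assumed to run inside an ASP page's server-side script block.

```vbscript
' Hypothetical helper (not part of the original FAQ): counts words
' with the \b(\w+?)\b pattern after removing apostrophes.
Function CountWordsByRegex(ByVal s)
  Dim re
  Set re = New RegExp
  re.Global = True            ' count every match, not just the first
  re.Pattern = "\b(\w+?)\b"

  ' Strip apostrophes first so a contraction counts as one word
  CountWordsByRegex = re.Execute(Replace(s, "'", "")).Count
End Function

Dim adText
adText = "I'm funny."         ' hypothetical ad text for illustration

If CountWordsByRegex(adText) > 200 Then
  Response.Write "Sorry, ads are limited to 200 words."
Else
  Response.Write "Your ad has " & CountWordsByRegex(adText) & " words."
End If
' writes: Your ad has 2 words.
```

Keep in mind that only apostrophes are stripped, so a hyphenated term such as "well-known" is still counted as two words, because the hyphen is not a word character.
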
Personally I prefer the regular expression way: the code is compact and doesn't require any messy looping. (Of course there is a third way this could be done: with gratuitous use of InStrs, Mids, and a plethora of VBScript's other string functions. I prefer the two approaches shown above for their cleanliness and readability.)

Happy Programming!


FAQ posted by Scott Mitchell at 4/7/2001 8:08:34 PM to the Strings category. This FAQ has been viewed 88,303 times.
