Published: Thursday, January 04, 2001

Efficiently Reading Large Text Files

By Bret Hern


For More Information...
For more information on the FileSystemObject, be sure to check out the FileSystemObject FAQs Category at ASPFAQs.com!


Is there a big, honking text file standing between you and performance nirvana? Wondering how you can find the needle in that 10 MB haystack? About to give up on the FileSystemObject? What follows is a way to read those big files quickly enough to make Evelyn Wood jealous.

At relatively small sizes (roughly 100 KB or less), using the standard methods of the FileSystemObject and TextStream objects to read in an entire file is reasonably snappy. However, once file sizes get into megabyte territory, the standard approaches begin to have, er, issues. Let's take a scenario where you need to read a text file to determine if a keyword is present. For the text file, I set up three test cases: file sizes of 10 KB, 100 KB, and 1,000 KB. In each test, the keyword to be found was placed at the tail end of the file.

Here's the standard, "brute-force" method of loading up the entire file into a single buffer variable:

const ForReading = 1
dim strSearchThis
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

' Slurp the entire file into one string, then search it
strSearchThis = objTS.ReadAll
if instr(strSearchThis, "keyword") > 0 then
    Response.Write "Found it!"
end if

objTS.Close
set objTS = nothing
set objFS = nothing

While this works fine at smaller file sizes, once we break the megabyte barrier, we're looking at script timeouts. Notice the explosion in time required to complete the above task against the 1,000 KB file: only the most dedicated of users will hang around that long.

Test #1: Brute Force
(all times in seconds)
                      10 KB File   100 KB File   1000 KB File
TextStream ReadAll        0.01         0.62          73.56

Clearly that won't work for our large-file search. You might then be tempted to simply parse the file line by line to get it into our search string, thinking that a hard-working loop might outperform the ReadAll method. You would be wrong. The following method, with just a simple string concatenation, is even slower:

const ForReading = 1
dim strSearchThis
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

' Build the search string one line at a time -- each pass through
' the loop pays for a full string copy via concatenation
do until objTS.AtEndOfStream
  strSearchThis = strSearchThis & objTS.ReadLine
loop

if instr(strSearchThis, "keyword") > 0 then
  Response.Write "Found it!"
end if

objTS.Close
set objTS = nothing
set objFS = nothing

Test #2: Standard Parse
(all times in seconds)

                      10 KB File   100 KB File   1000 KB File
Standard Parse            0.02         1.27         162.44

It turns out that string concatenation is one of the slower operations in the scripting engine, and this method's performance reflects that. (Of course, if you could count on the keyword being fully contained on one line, and you had no other use for the file beyond this one check, you could simply parse the file and perform the INSTR check on each line of the loop without taking the concatenation hit. That approach would be extremely fast, but it's a bit of a cheat for the topic at hand, so let's move on.)
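For completeness, that line-by-line "cheat" might look something like the following sketch. (This finds the keyword only if it never spans a line break, which is exactly the assumption that makes it a cheat.)

const ForReading = 1
dim objFS
dim objTS
dim blnFound
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

' Check each line as it's read -- no concatenation, so no copy hit
blnFound = False
do until objTS.AtEndOfStream or blnFound
  if instr(objTS.ReadLine, "keyword") > 0 then blnFound = True
loop

objTS.Close
set objTS = nothing
set objFS = nothing

if blnFound then Response.Write "Found it!"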

Now, there is a way, using an extremely counterintuitive dynamic array-building approach, to build up a searchable string that results in very fast performance. Despite what you've always heard about REDIMing arrays being a bad idea, it turns out that the array-processing overhead is minuscule compared to the string concatenation issues noted above. Here's how this approach lays out:

const ForReading = 1
dim strSearchThis
redim arrSearchThis(-1)   ' start with a zero-element dynamic array
dim i
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

' Grow the array one element per line -- far cheaper than
' repeatedly concatenating onto an ever-longer string
i = 0
do until objTS.AtEndOfStream
  redim preserve arrSearchThis(i)
  arrSearchThis(i) = objTS.ReadLine
  i = i + 1
loop

' A single Join builds the search string in one pass
strSearchThis = join(arrSearchThis, vbCrLf)
if instr(strSearchThis, "keyword") > 0 then
  Response.Write "Found it!"
end if

objTS.Close
set objTS = nothing
set objFS = nothing

Test #3: Redimmed Array
(all times in seconds)

                      10 KB File   100 KB File   1000 KB File
Redimmed Array            0.02         0.15           2.05

Not bad, eh? I'd generally stop when I get a 30- or 40-fold performance improvement, but as every good infomercial commands: wait, there's more!

Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of characters to read from the textstream. By instantiating an additional object (a File object), we can obtain the size of the file to be read, and then use the Read(n) method to race through our file in a single gulp. As it turns out, this "read bytes" method is extremely fast by comparison:

const ForReading = 1
const TristateFalse = 0   ' open the file as ASCII
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

' Read the whole file in one gulp; for an ASCII file the byte
' count (objFile.Size) equals the character count Read expects
strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "keyword") > 0 then
    Response.Write "Found it!"
end if

objTS.Close
set objTS = nothing
set objFile = nothing
set objFS = nothing

Test #4: Read Bytes
(all times in seconds)

                      10 KB File   100 KB File   1000 KB File
Read Bytes                0.01         0.03           0.28

A pretty good day's work. We started at over a minute to perform this read/search, and we're now down to well under a second. While there is some minor additional overhead associated with the extra File object, the massive speed improvement is in most cases an appropriate tradeoff. Wrap a function declaration around this snippet and you've got another good tool for the toolbox!
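For instance, the snippet could be wrapped up along these lines. (The function name ReadTextFile is my own choice for illustration, not something the FileSystemObject provides.)

' Returns the entire contents of a text file as a single string,
' using the fast Read(n) technique shown above
function ReadTextFile(strPath)
  const ForReading = 1
  const TristateFalse = 0
  dim objFS, objFile, objTS
  set objFS = Server.CreateObject("Scripting.FileSystemObject")
  set objFile = objFS.GetFile(strPath)
  set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
  if objFile.Size > 0 then
    ReadTextFile = objTS.Read(objFile.Size)
  else
    ReadTextFile = ""   ' guard against Read(0) on an empty file
  end if
  objTS.Close
  set objTS = nothing
  set objFile = nothing
  set objFS = nothing
end function

' Usage:
if instr(ReadTextFile(Server.MapPath("myfile.txt")), "keyword") > 0 then
  Response.Write "Found it!"
end if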

Test Summary
                      10 KB File   100 KB File   1000 KB File
TextStream ReadAll        0.01         0.62          73.56
Standard Parse            0.02         1.27         162.44
Redimmed Array            0.02         0.15           2.05
Read Bytes                0.01         0.03           0.28

Test Conditions...
All tests were performed on an otherwise idle Web server configured with 128 MB RAM and a single 450 MHz Pentium II processor, running Windows 2000 Advanced Server (IIS 5.0). The test timings were done with the VBScript Timer function, meaning that at the low-end extremes (the 10 KB file readings), it would be imprudent to read too much into the hundredths-of-a-second differences between methods. All timings included both setup tasks (variable dimensioning) and shutdown tasks (object destruction). (For information on timing the execution of ASP scripts, be sure to read: Timing ASP Execution Using a Profiling Component and Timing the Execution of Your ASP Scripts!)
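A Timer-based harness along these lines will reproduce the measurements (a sketch; Timer returns the number of seconds elapsed since midnight, which is fine for short, same-day runs):

dim sngStart, sngElapsed
sngStart = Timer

' ... method under test goes here, including variable
' dimensioning and object destruction ...

sngElapsed = Timer - sngStart
Response.Write "Elapsed: " & FormatNumber(sngElapsed, 2) & " seconds"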

Credits: Billy Monroe asked the question in the microsoft.public.scripting.vbscript newsgroup that got this ball rolling, Bill James brought forward the "Redimmed Array" approach, and Al Dunbar joined me in wondering aloud about the relative speed of the Read(n) function. This article wouldn't have happened without them.

