Published: Thursday, January 04, 2001

Efficiently Reading Large Text Files
By Bret Hern


For More Information...
For more information on the FileSystemObject, be sure to check out the FileSystemObject FAQs Category at ASPFAQs.com!


Is there a big, honking text file standing between you and performance nirvana? Wondering how you can find the needle in that 10 MB haystack? About to give up on the FileSystemObject? What follows is a way to read those big files quickly enough to make Evelyn Wood jealous.

At relatively small sizes (~100 KB or less), using the standard methods of the FileSystemObject and TextStream object to read in entire files is reasonably snappy. However, once the file sizes get into megabyte territory, the standard approaches begin to have, er, issues. Let’s take a scenario where you need to read a text file to determine if a keyword is present. For the text file, I set up three test cases: file sizes of 10 KB, 100 KB and 1,000 KB. In each test, the keyword to be found was placed at the tail end of the file.

Here’s the standard, "brute-force" method of loading up the entire file into a single buffer variable:

const ForReading = 1
dim strSearchThis
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

strSearchThis = objTS.ReadAll
if instr(strSearchThis, "keyword") > 0 then
    Response.Write "Found it!"
end if

While this works fine at smaller file sizes, once we break the megabyte barrier, we’re looking at script timeouts. Notice the explosion in time required to complete the above task against the 1 MB file - only the most dedicated of users will hang around that long.

Test #1: Brute Force
(all times in seconds)

Method               10 KB File   100 KB File   1000 KB File
TextStream ReadAll   0.01         0.62          73.56

Clearly that won’t work for our large file search. You might then be tempted to simply parse the file line by line to get it into our search string, thinking that a hard-working loop might outperform the ReadAll method. You would be wrong. The following method, with just a simple string concatenation, is even slower:

const ForReading = 1
dim strSearchThis
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

do until objTS.AtEndOfStream
  strSearchThis = strSearchThis & objTS.ReadLine
loop

if instr(strSearchThis, "keyword") > 0 then
  Response.Write "Found it!"
end if

Test #1: Standard Parse
(all times in seconds)

Method               10 KB File   100 KB File   1000 KB File
Standard Parse       0.02         1.27          162.44

It turns out that string concatenation is one of the slower operations in the VBScript engine, and this method’s performance reflects that: each pass through the loop copies the ever-growing string into a new one, so the cost climbs much faster than the file size does. (Of course, if you could count on the keyword being fully contained on one line, and you had no other use for the file beyond this one check, you could simply parse the file and perform the InStr check on each line inside the loop without taking the concatenation hit. That approach would be extremely fast, but it’s a bit of a cheat for the topic at hand, so let’s move on.)
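For the curious, the per-line "cheat" mentioned in the parenthetical might look something like the sketch below. It assumes the keyword never straddles a line break, and the blnFound flag is a name made up for this illustration. No timings were taken for it in the tests reported here.

```vbscript
' Sketch only: test each line as it is read, with no concatenation.
' Assumes the keyword is always fully contained on a single line.
const ForReading = 1
dim objFS
dim objTS
dim blnFound   ' hypothetical flag, not part of the timed tests
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

blnFound = False
do until objTS.AtEndOfStream
  if instr(objTS.ReadLine, "keyword") > 0 then
    blnFound = True
    exit do   ' no need to read the rest of the file
  end if
loop

objTS.Close
set objTS = Nothing
set objFS = Nothing

if blnFound then Response.Write "Found it!"
```

Because the loop exits on the first hit, the worst case is still a keyword at the tail end of the file, which is how the test files here were built.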

Now, there is a way, using an extremely counterintuitive dynamic array-building approach, to build up a searchable array that results in very fast performance. Despite what you’ve always heard about REDIMing arrays as a bad idea, it turns out that the array processing overhead is minuscule compared to the string concatenation issues noted above. Here’s how this approach lays out:

const ForReading = 1
dim strSearchThis
redim arrSearchThis(-1)
dim i
dim objFS
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objTS = objFS.OpenTextFile(Server.MapPath("myfile.txt"), _
                               ForReading)

i = 0
do until objTS.AtEndOfStream
  redim preserve arrSearchThis(i)
  arrSearchThis(i) = objTS.ReadLine
  i = i + 1
loop

strSearchThis = join(arrSearchThis, VbCrLf)
if instr(strSearchThis, "keyword") > 0 then
  Response.Write "Found it!"
end if

Test #1: Redimmed Array
(all times in seconds)

Method               10 KB File   100 KB File   1000 KB File
Redimmed Array       0.02         0.15          2.05

Not bad, eh? I generally stop when I get a 30 or 40-fold performance improvement, but as every good infomercial commands, wait, there’s more!

Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of characters to read from the stream. By instantiating an additional object (a File object), we can obtain the size of the file in bytes. For a file opened as ASCII, where each character occupies one byte, that size is also the character count, so passing it to Read(n) pulls in the entire file in a single call. As it turns out, the "read bytes" method is extremely fast by comparison:

const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "keyword") > 0 then
    Response.Write "Found it!"
end if

Test #1: Read Bytes
(all times in seconds)

Method               10 KB File   100 KB File   1000 KB File
Read Bytes           0.01         0.03          0.28

A pretty good day’s work. We started at over a minute to perform this read/search, and we’re now down well under a second. While there would be some minor additional overhead associated with the additional object, the massive speed improvement would in most cases be an appropriate tradeoff. Wrap a function declaration around this snippet and you've got another good tool for the toolbox!
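One way that function wrapper might look is sketched below. The name ReadWholeFile and its strPath parameter are invented for this example. The zero-length check is an added precaution rather than something the timed tests covered.

```vbscript
' Hypothetical helper: returns the full contents of an ASCII text file
' as one string, using the Read(n) technique shown above.
' strPath is a physical path, such as the result of Server.MapPath.
function ReadWholeFile(strPath)
  const ForReading = 1
  const TristateFalse = 0   ' open the file as ASCII
  dim objFS
  dim objFile
  dim objTS

  ReadWholeFile = ""
  set objFS = Server.CreateObject("Scripting.FileSystemObject")
  set objFile = objFS.GetFile(strPath)

  ' Skip the read for a zero-length file; there is nothing to fetch.
  if objFile.Size > 0 then
    set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
    ReadWholeFile = objTS.Read(objFile.Size)
    objTS.Close
    set objTS = Nothing
  end if

  set objFile = Nothing
  set objFS = Nothing
end function

' Usage: the same keyword check as before, now a one-liner.
if instr(ReadWholeFile(Server.MapPath("myfile.txt")), "keyword") > 0 then
  Response.Write "Found it!"
end if
```

GetFile raises a runtime error if the file does not exist, so a caller that cannot guarantee the path may want to test it with objFS.FileExists first.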

Test Summary
(all times in seconds)

Method               10 KB File   100 KB File   1000 KB File
TextStream ReadAll   0.01         0.62          73.56
Standard Parse       0.02         1.27          162.44
Redimmed Array       0.02         0.15          2.05
Read Bytes           0.01         0.03          0.28

Test Conditions...
All tests were performed on an otherwise idle webserver configured with 128 MB RAM and a single 450 MHz Pentium II processor, running Windows 2000 Advanced Server (IIS 5.0). The test timings were taken with the VBScript Timer function, meaning that at the low-end extremes (the 10 KB file readings), it would be imprudent to read too much into differences of hundredths of a second between methods. All timings included both setup tasks (variable dimensioning) and shutdown tasks (object destruction). (For information on timing the execution of ASP scripts, be sure to read: Timing ASP Execution Using a Profiling Component and Timing the Execution of Your ASP Scripts!)
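For readers who want to repeat the measurements, a minimal sketch of Timer-based timing follows. It is not the exact harness used for the numbers above, and the variable names sngStart and sngElapsed are chosen for illustration.

```vbscript
' Sketch only: time a block of script with the VBScript Timer function,
' which returns the number of seconds elapsed since midnight.
dim sngStart
dim sngElapsed

sngStart = Timer

' ... place the setup, read/search and cleanup code being measured here ...

sngElapsed = Timer - sngStart
' Timer resets at midnight, so correct for a run that crossed it.
if sngElapsed < 0 then sngElapsed = sngElapsed + 86400

Response.Write "Elapsed: " & FormatNumber(sngElapsed, 2) & " seconds"
```

Running each test several times and averaging would help smooth out the coarse resolution noted above for the smallest files.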

  • By Bret Hern

    Credits: Billy Monroe asked the question in the microsoft.public.scripting.vbscript newsgroup that got this ball rolling, Bill James brought forward the "Redimmed Array" approach, and Al Dunbar joined me in wondering aloud about the relative speed of the Read(n) function. This article wouldn't have happened without them.

