To read the article online, visit http://www.4GuysFromRolla.com/webtech/041206-1.shtml

Efficiently Paging Through Large Result Sets in SQL Server 2000

By Scott Mitchell


A Major Improvement in Performance
A couple of alert readers have written in to provide some tips on improving the performance of this approach. Greg Hamilton has been kind enough to write up an article exploring various improvements that can speed up my approach by more than an order of magnitude. In short, don't use the stored procedure discussed in this article, but rather check out Greg's advice in A More Efficient Method for Paging Through Large Result Sets.

Introduction
When displaying large amounts of data to a user, it's important that the information be presented in digestible chunks rather than slamming it all down their throats at once. For example, searching for "Microsoft" on Google returns approximately 2,190,000,000 results, but thankfully Google only shows me ten of those at a time. When providing paging access to particularly large result sets - tables with tens of thousands, hundreds of thousands, or millions of records - care must be taken in querying the data such that only the particular page of data requested is returned.

Both the DataGrid in ASP.NET 1.x and GridView in ASP.NET 2.0 providing two paging flavors:

  • Default paging - easy to implement but naively retrieves all records from the database and then trims the results to show only those for the requested page
  • Custom paging - requires the developer to correctly retrieve only the precise subset of records to show for the current page; requires more effort to implement
While default paging is quick and easy to implement, for sufficiently-sized result sets custom paging is a must. In March 2006 I authored a couple of articles here on 4Guys on how to efficiently page through large result sets using custom paging in the GridView with the ObjectDataSource and SQL Server 2005's new ROW_NUMBER() keyword. This new keyword simplifies efficiently retrieving a particular subset of data ordered by row number. After authoring these articles, I received many requests from readers for a look at how to implement this type of efficient custom paging using SQL Server 2000 (which lacks the ROW_NUMBER() keyword).

A previous article on 4Guys, Paging through Records using a Stored Procedure by Daniel Anderson, provides one approach that far outperforms the default paging implementation. However, it has a few areas that can be updated to improve the performance. This article looks at updating Daniel's stored procedure to provide an even-more efficient approach. The stored procedure presented at the end of this article can be used for classic ASP applications, custom paging with the DataGrid in ASP.NET 1.x, or used by the ObjectDataSource to provide custom paging for the GridView in ASP.NET 2.0 applications. Read on to learn more!

Examining Daniel's Approach
In Paging through Records using a Stored Procedure, Daniel Anderson provided a stored procedure named sp_PagedResults that accepted two integer input parameters - @Page and @RecsPerPage - and returned those records from an underlying database table that belonged to the specified page of date. For example, if you wanted to show the second page of data, where a "page" is 10 records, you'd simply call sp_PagedRecords passing in 2 and 10 as values for @Page and @RecsPerPage.

Grabbing a page of data from a table that has a sequentially increasing unique ID without any "holes" is a cinch and can be done using a simple WHERE clause. For example, if we have data in a table that looks like:

Employees
EmployeeIDLastName FirstNameDepartmentID SalaryHireDate
1MitchellScott.........
2LeeSam.........
3SmithHans.........
4JohnsonErnie.........
5GonzalezLaura.........
6WangHugo.........
7JacksonTito.........
8MaherTodd.........
..................

Since there are no gaps or holes in the EmployeeID values, we can easily get a particular page of data using the generic query:

SELECT ...
FROM Employees
WHERE EmployeeID BETWEEN (page * recsPerPage) + 1 AND (page * recsPerPage) + recsPerPage

This assumes that page is indexed starting at zero, meaning that page would be 0 for the first page of data, 1 for the second, and so on. Both the DataGrid and GridView start the page index at zero.

For example, if we wanted to show three employees per page, and wanted the first page of data, we'd do:

SELECT ...
FROM Employees
WHERE EmployeeID BETWEEN (0 * 3) + 1 AND (0 * 3) + 3 -- Which is tantamount to... SELECT ...
FROM Employees
WHERE EmployeeID BETWEEN 1 AND 3

Which would return the first three employees. Requesting to see the second page, we'd have:

SELECT ...
FROM Employees
WHERE EmployeeID BETWEEN (1 * 3) + 1 AND (1 * 3) + 3 -- Which is tantamount to... SELECT ...
FROM Employees
WHERE EmployeeID BETWEEN 4 AND 6

Which would return employees with EmployeeIDs 4, 5, and 6 - the second page of data.

This approach works great if there are no holes in primary key column values of the data being paged through. But holes can be created when records are deleted. For example, if Sam Lee and Hans Smith were fired, our Employees table would now look like:

Employees
EmployeeIDLastName FirstNameDepartmentID SalaryHireDate
1MitchellScott.........
4JohnsonErnie.........
5GonzalezLaura.........
6WangHugo.........
7JacksonTito.........
8MaherTodd.........
..................

We now have a hole from EmployeeID 1 to 4 and the simple algorithm examined a moment ago will fail.

To accommodate for these holes, Daniel Anderson's stored procedure creates a synthetic increasing ID. It does so by with the following algorithm:

  1. Create a temporary table that has a column for each column in the data to page through along with a column named ID that's marked as an IDENTITY column. An IDENTITY column is one whose value is auto-incremented with each record inserted.
  2. Dump the entire contents of the table to page through into the temp table. This will set the values in the ID column to 1, 2, 3, ...
  3. Apply the earlier, simplistic formula to a SELECT query against the temp table and its ID column
You can learn more about Daniel's stored procedure in Paging through Records using a Stored Procedure; the stored procedure's contents can be viewed in their entirety here.

At first glance it may seem like such an approach would be no better than just returning all of the records. After all, we're still working with all of the records in Employees when dumping them into the temp table. While this is true, Daniel's approach greatly reduces the traffic between the database server and web server. While both approaches work with the Employees table's data in its entirety, the stored procedure only returns 10 records (or however many records per page are configured).

Modernizing Daniel's Stored Procedure
Before we look at improving the performance of Daniel's stored procedure, let's first modernize it a bit. The stored procedure was written back in 1999 and uses some techniques that aren't "best practices." For one, the stored procedure name is prefixed with sp_, which causes SQL Server to first search the master database. This only impacts a slight performance impact, but why not clean it up here while we have the chance? (See Should I Use the sp_ Prefix for Procedure Names? for a more thorough discussion on this topic.)

Second, rather than using a temporary table, let's use a table variable. Moreover, Microsoft recommends: "use table variables whenever possible except when there is a significant volume of data and there is repeated use of the table." Since this stored procedure just dumps data into the temp table once, reads from it, and then is done with it, using a table variable would adhere to Microsoft's recommendation.

Finally, in ASP.NET 2.0 the custom paging logic used by the ObjectDataSource and GridView defines the records to access not by page index and records per page, but by the start row index and the maximum number of rows to return. Therefore, let's augment the stored procedure to take in the integer input parameters @startRowIndex and @maximumRows rather than @Page and @RecsPerPage. This has the nice side-effect of simplifying the formula used in the final query's WHERE clause.

Improving the Stored Procedure's Performance
Daniel's approach is costly in part because it dumps all of the columns from the table to be paged through into the temp table. This requires a complete table scan on the data to be paged through. Instead, we can create the table variable to hold only the primary key column(s) of the table to be paged through. Since these records likely exist in a clustered index, populating the table variable likely be done entirely from memory, without needing to hit the disk.

The remaining columns in the table whose data is being paged through can be accessed and returned in the stored procedure's final query, which grabs the particular subset of records from the table variable. After modernizing and optimizing Daniel's original stored procedure we end up with:

CREATE PROCEDURE PagedResults_New
(
   @startRowIndex      int,
   @maximumRows      int
)
AS


--Create a table variable
DECLARE @TempItems TABLE
(
   ID int IDENTITY,
   EmployeeID int
)

-- Insert the rows from tblItems into the temp. table
INSERT INTO @TempItems (EmployeeID)
SELECT EmployeeID
FROM Employees

-- Now, return the set of paged records
SELECT e.*, d.[Name] as DepartmentName
FROM @TempItems t
   INNER JOIN Employees e ON
      e.EmployeeID = t.EmployeeID
   INNER JOIN Departments d ON
      d.DepartmentID = e.DepartmentID
WHERE ID BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
GO

A Major Improvement in Performance
A couple of alert readers have written in to provide some tips on improving the performance of this approach. Greg Hamilton has been kind enough to write up an article exploring various improvements that can speed up my approach by more than an order of magnitude. In short, don't use the stored procedure discussed in this article, but rather check out Greg's advice in A More Efficient Method for Paging Through Large Result Sets.

Comparing the Performance of the Old and New Approach
To ascertain the improvements in performance by restricting the table variable to only including the primary key column(s) from the table to be paged through I used SQL Profiler and compared the different queries on the Employees table with different record counts. Keep in mind that these results are highly unscientific. I ran each of them numerous times to "warm them up" and then recorded nine test runs and computed the average. But still, I was doing this on my personal machine while running a variety of other programs. Additionally, I did not simulate any load that multiple, concurrent visitors to a website would incur. The results follow and are also avaiable at the end of this article as an Excel spreadsheet. In the New stored procedure results, the percentage improvement over the old approach is included for reference:

 Records
in
Employees
Avg. ReadsAvg. Duration
(in milliseconds)
Old50,00055,360425
 100,000110,492850
 200,000220,7261,567
New50,00051,637
6.7% improvement
307
27.7% improvement
 100,000103,028
6.7% improvement
536
36.9% improvement
 200,000205,876
6.7% improvement
1,033
34.1% improvement

During my testing I wondered if specifying the table variable's ID column as a primary key would boost performance. Marking the ID column as a primary key would create a clustered index, meaning that the results would be sorted by the ID column and the index could be scanned when finding correct subset of records in the final query. However, the index would incur additional overhead when dumping the contents of the table to be paged through into the table variable. The spreadsheet at the end of this article shows the numbers, but in my tests marking ID column as a primary key exhibited worse performance than not, although it was still better than the "old" approach of dumping the entire table contents into the temp table (the "old" approach).

While this approach is, perhaps, acceptable for paging through large result sets in SQL Server 2000, as illustrated in Custom Paging in ASP.NET 2.0 with SQL Server 2005, SQL Server 2005's ROW_NUMBER() technique affords much better performance, taking on average a mere 3 milliseconds on average to return 10 records from a total table of 50,000. Compare that to the approach examined in this article, which consumed 307 milliseconds - two orders of magnitude worse!

Before you swear off the approach discussed in this article, keep in mind that bringing back all the records takes considerably longer than dumping them into a table variable. From the Custom Paging in ASP.NET 2.0 with SQL Server 2005 article, paging through 50,000 records using the GridView's default paging, which retrieves all records from the Employees table, took on average 1,411 milliseconds. Also, since the ROW_NUMBER() option is a feature new to SQL Server 2005, if you're still using SQL Server 2000 you're stuck with the table variable technique.

Conclusion
In this article we looked at how to retrieve a specific page of data from a table using a stored procedure. This technique is useful for providing efficient, custom paging through large result sets with the ASP.NET 1.x DataGrid and ASP.NET 2.0 GridView. This technique can also be used in classic ASP applications or any other data-driven application that needs to work with paged data. If you are using SQL Server 2005, you should instead use the ROW_NUMBER() keyword as discussed in Custom Paging in ASP.NET 2.0 with SQL Server 2005.

Happy Programming!

  • By Scott Mitchell

    Attachments:

  • Download the stored procedures, tables, and data used for this article (includes the spreadsheet with the results from testing)
  • A Major Improvement in Performance
    A couple of alert readers have written in to provide some tips on improving the performance of this approach. Greg Hamilton has been kind enough to write up an article exploring various improvements that can speed up my approach by more than an order of magnitude. In short, don't use the stored procedure discussed in this article, but rather check out Greg's advice in A More Efficient Method for Paging Through Large Result Sets.


    Article Information
    Article Title: Efficiently Paging Through Large Result Sets in SQL Server 2000
    Article Author: Scott Mitchell
    Published Date: Wednesday, April 12, 2006
    Article URL: http://www.4GuysFromRolla.com/webtech/041206-1.shtml


    Copyright 2014 QuinStreet Inc. All Rights Reserved.
    Legal Notices, Licensing, Permissions, Privacy Policy.
    Advertise | Newsletters | E-mail Offers