By Scott Mitchell
In early November 2005, Microsoft launched three new product lines: Visual Studio 2005, SQL Server 2005, and the .NET Framework version 2.0 (which includes ASP.NET version 2.0). SQL Server 2005 is the newest version of Microsoft's database platform, with the last major SQL Server release dating back to SQL Server 2000. Over the past five years of development, SQL Server has added a vast array of new features and capabilities, which are summed up on Microsoft's website in What's New in SQL Server 2005? The new features that will likely be of particular interest to developers building web-based applications that use SQL Server 2005 as a backend data store, include the new Transact-SQL (T-SQL) enhancements, better Visual Studio integration, integration with the CLR/.NET Framework, and the SQL Server 2005 Management Studio application, which is a slicker version of SQL Server 2000's Enterprise Manager.
SQL Server 2005's T-SQL enhancements make writing certain types of queries significantly easier than in SQL Server 2000. While the new additions to 2005's T-SQL syntax could be replicated either in SQL Server 2000 or in the application layer, with 2005 the syntax is much more terse, readable, and sensible. In this article we'll look at SQL Server 2005's rank-related functions, which simplify the process of assigning ranks to query results, retrieve results with a particular rank, and other rank-related tasks common to reporting. Read on to learn more!
The Data Model and Ranked Results Basics
Before we look at how to accomplish these common query patterns, let's first create a data model against which we can run these queries. I am going to be using the free Express version of SQL Server 2005 for my demos, and have included the database and an ASP.NET 2.0 website for download at the end of this article. (Like Visual Studio, SQL Server ships with a number of different "versions." The Express version is a free version that is designed for hobbyists, students, and the like. If you download and install Visual Web Developer - the express version of Visual Studio geared toward web developers - you can indicate to install SQL Server 2005 Express edition as well.)
For these examples we'll use a database that has products, salespersons (employees), customers, and orders. This is modeled
using five tables:
Products tables contain a
row for each customer, employee, and product, respectively. Each time a customer makes a purchase, a record is added
Orders table indicating the customer that made the purchase, the employee that made the sale, and the
date the order was placed. The
OrderItems maps each product purchased for an order, the quantity of product,
and the total price billed (assuming that larger purchases may be discounted). The following figure shows these tables,
their fields, and the relationships among them.
As the diagram illustrates, the
OrderItems establishes a many-to-many join between the
When building reports or analyzing data, often users or management are interested in seeing the data ranked in some manner.
For example, your boss may want to have a report that shows the top 10 selling items, or the three people in the sales team who
have generated the greatest revenue in the third quarter. More complicated examples might be returning just the 3rd through
5th ranked salespersons. With SQL Server 2000 queries returning the top ranking items could be accomplished using
ROWCOUNT keywords, as discussed in Retrieving
the Top N Records from a SQL Query. To retrieve a particular ranked subset, you needed to use a
derived tables- or view-based approach.
SQL Server 2005 provides four new ranking functions:
NTILE. While these are definitely a step
above what SQL Server 2000 offers, the ranking functions still have some limitations that require the use of dervied tables
or views for more interesting applications. Let's look at each of these individually.
Numbering Rows with
ROW_NUMBER function assigns an ordinal
value with each record returned, with the ordinal values depending on a particular
ORDER BY clause used in
tandem with the function. The syntax for
order-by-clause). For example, the following query would return the products from most expensive to least
expensive, with a rank assigned to each product based
The results would be:
By default, the
ROW_NUMBER function assigns a increasing value to each record in the resultset. With the
optional partition parameter, you can instruct the
ROW_NUMBER function to start renumbering whenever
there's a change in the value of the column for partitioning. To demonstrate this, I created a view called
that returns each
OrderID and the total amount billed for that order using the following query syntax:
This returns each unique order in the
OrderItems table along with the sum of
pertaining to that order. With this view we can use the
ROW_NUMBER method to rank those orders by most amount
of money spent using the following:
This will return results like:
Notice that certain customers are on this list multiple times (Bob, Tito, and Bruce). Perhaps rather than seeing all orders,
ordered by sales amount, we are interested in seeing the top orders per customer. We can accomplish this with the
PARTITION BY clause in the
ROW_NUMBER function like so:
This will return the following results:
While these results are definitely cool, the bad news is that you can't use the
ROW_NUMBER() (or any of the
other ranking functions) in the
WHERE clause. That is, you might want to say, "Give me the products ranked
by price whose rank is between 5 and 8." To accomplish this you'd need to use a derived table or view. For example, you
could put the above query in a view called
vwPriceRankedProducts and then get back just the 5th through
8th ranked products use the query:
Handling Ties with
ROW_NUMBER function blindly increments the value for each returned result based on the optional partition
clause and the required order by clause. However, sometimes you may wish to handle ties differently, instead assigning the
same value to ties. For example, earlier when displaying the list of total order amounts the amount of the order placed by
Tito on 12/22/2005 is the same as the amount of the order placed by Bruce on 1/5/2006. However, the
function ranks these two rows as #4 and #5, rather than using #4 for both.
The results will be:
Notice how the two orders with the same amount are both ranked #4. The difference between
is in how they begin renumbering after the tied results.
RANK bypasses as many numbers as there are tied results.
In our example above, since there are two results tied at #4, following that rank #5 is skipped and the ranking resumes with
DENSE_RANK, on the other hand, picks up with the next number. Had we replaced the function name
DENSE_RANK in the above query, Tito's order on 12/18/2005 for $12.44 would have the rank #5.
DENSE_RANK can use the optional
Grouping Ranked Results with
The final ranking-related T-SQL function new to SQL Server 2005 is
operates like the other ranking functions, but breaks down the results into int number of groups, assigning the
same rankings for all records in the specified group. You can use
NTILE to break results down into halves, thirds,
quarters, and so on, as this example shows:
With the results being:
In the downloadable demo I have an example that uses a view with
NTILE to allow the user to selectively
view various quartiles of the highest order amounts.
In this article we examined four functions new to SQL Server 2005. These four functions -
NTILE - make it easier to rank results
than was previously available with SQL Server 2000. These functions are but a small sample of the new T-SQL features available
with SQL Server 2005. Future articles will highlight other useful facets!