Common Table Expressions (CTE) in SQL Server 2005By Scott Mitchell
When crafting a query in SQL, there are often times when we may need to operate over a set of data that doesn't inherently exist within the system. For example, the database for an eCommerce web application would have the standard tables -
OrderDetails, and so on - but we may need to
run reports on a particular subset of the data or against aggregate data across these tables. Or the reporting queries we need
might need to group or filter by results returned by scalar subqueries. Typically, views
are used to break down complex queries into digestible chunks or to provide scalar subquery results that can be grouped and filtered.
Views, however, are sometimes overkill, as they are permanent objects at the system-level. If we only need to reference
this complex query in a single stored procedure or UDF, another option is to use a derived
table. Unfortunately, derived tables muddle the readability of the query and must be repeated for each use in a statement.
Common Table Expressions, or CTE, are a new construct introduced in Microsoft SQL Server 2005 that offer a more readable form of the derived table that can be declared once and referenced multiple times in a query. Moreover, CTEs can be recursively defined, allowing a recursive entity to be enumerated without the need for recursive stored procedures. In this article we'll examine the benefits, uses, and syntax of both recursive and non-recursive CTEs. Read on to learn more!
|Obtaining the Northwind Traders Database for SQL Server 2005|
The examples used in this article are all operating against the Northwind Traders database, which has a simple eCommerce schema
A Simple Common Table Expression Example
Before we dive into the syntax or gritty details of CTEs, let's start by looking at a simple example. I think you'll agree that even without knowing the syntax of CTEs, they are pretty readable and straightforward (the hallmark of a well-designed programming language construct).
This query creates a CTE named
ProductAndCategoryNamesOverTenDollars that returns the name, category name, and
price of those products whose unit price exceeds $10.00. Once the CTE has been defined, it must then immediately be used in
a query. The query treats the CTE as if were a view or table in the system, returning the three fields defined by the CTE
UnitPrice), ordered alphabetically by category, then
by price, and then alphabetically by product name.
In short, a Common Table Expression allows us to define a temporary, view-like construct. We start by (optionally) specifying
the columns it returns, then define the query. Following that, the CTE can be used in a
Common Table Expression Syntax
A Common Table Expression contains three core parts:
- The CTE name (this is what follows the
- The column list (optional)
- The query (appears within parentheses after the
ProductAndCategoryNamesOverTenDollars CTE only applies to the first query following it. So when the second
query is reached,
ProductAndCategoryNamesOverTenDollars is undefined, resulting in an
"Invalid object name 'ProductAndCategoryNamesOverTenDollars'" error message.
You can, however, define multiple CTEs after the
WITH keyword by separating each CTE with a comma. For
example, the following query uses two CTEs. The subsequent
SELECT query then uses an
to match together the records from the two CTEs:
Unlike a derived table, CTEs can be defined just once, yet appear multiple times in the subsequent query. To demonstrate this,
consider the following example: the Northwind database's
Employees table contains an optional
ReportsTo column that, if specified, indicates the employee's manager.
ReportsTo is a
self-referencing foreign key, meaning that, if provided, it refers back to another
EmployeeID in the
Employees table. Imagine that we wanted to display a list of employees including how many other employees they
directly managed. This could be done using a simple, CTE-free
SELECT statement, but let's use a CTE for now (for
reasons which will become clear soon):
This query will return the employees records, showing each employee's last and first name along with how many other employees they manage. As the figure below shows, only Andrew Fuller and Steven Buchanan are manager material.
Now, imagine that our boss (Andrew Fuller, perhaps) comes charging into our office and demands that the report also lists
each employee's manager's name and number of subordinates (if the employee has a manager, that is - Mr. Fuller is all to quick to
point out that he reports to no one). Adding such functionality is a snap with the CTE - just add it in a
With this additional
LEFT JOIN, the employee's manager's results are returned; if there's no manager for the
NULLs are returned instead.
|When to Use Common Table Expressions|
Common Table Expressions offer the same functionality as a view, but are ideal for one-off usages where you don't necessarily
need a view defined for the system. Even when a CTE is not necessarily needed (as when listing just the employees and
their subordinate count in the example above), it can improve readability. In Using Common
Table Expressions, Microsoft offers the following four advantages of CTEs:
Using scalar subqueries (such as the
CTEs can also be used to recursively enumerate hierarchical data. We'll examine this next!
Recursive Common Table Expressions
Recursion is the process of defining a solution to a problem in terms of itself. For example, a teacher needs to sort a stack of tests alphabetically by the students' names. She could process the tests one at a time and, for each test, insert it into the appropriate spot to the left (called insertion sort), probably the way most people sort a hand of cards (at least that's the way I do it). However, depending on the distribution of the tests, the size of the work space, the number of tests to sort, and so on, it may be far more efficient to break down the problem into pieces. Rather than doing an insertion sort right off the bat, it might first make sense to divide the stack of papers in half, and then do an insertion sort on one half, an insertion sort on the second half, and then a merge of the two piles. Or perhaps it would make sense to divide the tests into four piles, or eight piles. (This approach is referred to as merge sort.)
With a recursive solution you will always have the following two pieces:
- The base case - what to do when you're done recursing. After dividing the tests into separate piles of say, eight elements per pile, the base case is to sort these piles via insertion sort.
- The recursive step - the action to perform that involves plugging the input "back into" the system. For merge sort, the recursive step is the division of one pile into two. Then into four. Then into eight, and so on, until the base case is reached.
Returning to CTEs... the
Employees database table holds the corporate hierarchy within its rows. Imagine that good ol'
Andrew Fuller has come back and insisted on a report that would list all persons in the company along with their position in
the hierarchy. Since the
Employees table can capture an arbitrary number of hierarchy levels, we need a recursive
solution. Enter CTEs...
Like any recursive definition, a recursive Common Table Expression requires both a base case and the recursive step. In SQL parlance,
this translates into two SQL queries - one that gets the "initial" data
UNIONed with one that performs the recursion.
Employees example, the base case is returning those employees that have no manager:
The recursion includes a query on the CTE itself. The following shows the CTE - with both the base case and recursive step -
along with a
SELECT query that returns the rows from the CTE:
The recursion occurs in the second query in the CTE by joining the results of
Employees against the
CTE itself (
EmployeeHierarchy) where the employees'
ReportsTo field matches up to the CTE's
EmployeeID. Included in this query is the
HierarchyLevel field, which returns 1 for the base case
and one greater than the previous level for each recursive step down the hierarchy. As requested, this resultset clearly
shows that Mr. Fuller is the alpha male in this organization. Furthermore, we can see that Steven, Laura, Nancy, Janet, and
Margaret make up the second tier in the organizational hierarchy, while poor Anne, Robert, and Michael are down at the bottom:
|Alternatives to Recursive Common Table Expressions|
As we saw in this article, enumerating hierarchical data recursively can be accomplished via CTEs (for more on using recursive
CTEs, don't forget to check out the official documentation - Recursive
Queries Using Common Table Expressions). However, there are other
options as well. One choice is to perform the recursion at the ASP/ASP.NET layer. That is, read in all employee information
to a Recordset of DataSet in code, and then recurse there. My article Efficiently
DisplayingParent-Child Data discusses this approach.
If you need to perform the recursion in SQL, you can use recursive stored procedures, as discussed in The Zen of Recursion. If you are designing a data model that needs to support hierarchical data, your best bet is to bake in some lineage information directly into the table from the get-go. See SQL for Threaded Discussions and More Trees & Hierarchies in SQL for more information.
One of the many new features in SQL Server 2005 are Common Table Expressions (CTEs), which provide a more readable and usable approach to derived tables. Additionally, CTEs may be recursively defined, allowing a recursive entity to be enumerated without the need for recursive stored procedures. For more on the new features found in SQL Server 2005, be sure to also check out Returning Ranked Results with Microsoft SQL Server 2005 and
TRY...CATCH in SQL Server 2005.