Summarizing Data with
By Scott Mitchell
CUBE Queries, Part 2
In Part 1 we examined how to use the
GROUP BY clause
to apply an aggregate function to partitions of a table, as opposed to the entire table. However, as
we saw, oftentimes we want to add summary data. For example, if we are using
GROUP BY to
sum across various partitions of the table, we might want to include a total sum for each partition dimension,
as well as a grand total sum. We'll see how to accomplish this using a single SQL statement in this part.
CUBE Statement in a SQL Query
In order to create a data cube all we need to do is alter our SQL statement slightly, adding
WITH CUBE to the end of the query. That is, if we change our SQL query to:
We'll get the results back in a data cube.
CUBE was added to SQL Server 6.5; earlier versions of SQL Server do not
include this functionality.) Specifically, the results returned for our particular data
set will be:
As you can see, there are a lot of NULLs in the results. Each NULL represents a "sum" column or row.
For example, the last record in the results shows the sum of all points for Shooting Guards.
The result with two NULLs shows the total points across all teams and positions. This query can be prettied
up using the
ISNULL() function to replace the NULL values with text like "All":
Which will replace all the NULLs in the output with the text "All".
Applications of Data Cubes
Data cubes are used primarily in data warehousing, which is a field of database research that examines storing vast amounts of data from numerous data stores for the purpose of report information. In data warehousing, there is often data that contains many "dimensions".
For example, a retailer like Wal-Mart might have sales data that contains the following information for each sale made:
- The item sold
- The store location the item was sold from
- The date the sale was made
- What method of payment was used (credit card/cash/check/etc.)
Using this information, a four-dimensional data cube could be constructed. With this data cube, executives could examine data from a very fine-level of detail to a very broad-level. For example, an executive might want to see the total number of XBox sales in Wal-Mart store #1233 sold on May 4th, 2003 and paid for with cash. A broader query might be to see how many XBoxes were sold in all of the locations on May 4th, 2003, or how many XBoxes has store #4421 sold in total? Or how many items were sold in store #2111 and paid for by check? Even broader queries could be along the lines of how many total products has store #5331 sold? Or, how many total XBoxes have sold at all Wal-Mart stores? Or, the broadest possible query: How many total items have been sold across all stores, all days, and all payment methods?
For more information on data cubes and the
WITH CUBE syntax, check out the following
- SQL Server 6.5: Cube & Rollup
- Enhancing SQL Server Functionality with Functions
- Summarizing Data with CUBE and ROLLUP
In this article we examined the
CUBE SQL statement and how to use it to return data cubes
from a single SQL query. Data cubes, as we discussed, enhance the results from simple
statements by providing summary data across all dimensions of the partitions of the data.