Published: Wednesday, April 30, 2003
Summarizing Data with CUBE Queries, Part 2
By Scott Mitchell
Read Part 1
In Part 1 we examined how to use the GROUP BY clause
to apply an aggregate function to partitions of a table, as opposed to the entire table. However, as
we saw, oftentimes we want to add summary data. For example, if we are using GROUP BY to
sum across various partitions of the table, we might want to include a total sum for each partition dimension,
as well as a grand total sum. We'll see how to accomplish this using a single SQL statement in this part.
Using the CUBE Statement in a SQL Query
In order to create a data cube all we need to do is alter our SQL statement slightly, adding
WITH CUBE to the end of the query. That is, if we change our SQL query to:
SELECT Team, [Position], SUM(Points)
FROM PlayerStats
GROUP BY Team, [Position]
WITH CUBE
|
We'll get the results back in a data cube.
(Note that CUBE was added to SQL Server 6.5; earlier versions of SQL Server do not
include this functionality.) Specifically, the results returned for our particular data
set will be:
Team | Position | |
| Kings | Point Guard | 875 |
| Kings | Power Forward | 1542 |
| Kings | NULL | 2,417 |
| Lakers | Center | 1,841 |
| Lakers | Power Forward | 522 |
| Lakers | Shooting Guard | 2,461 |
| Lakers | NULL | 4,824 |
| Rockets | Center | 1,104 |
| Rockets | NULL | 1,104 |
| NULL | NULL | 8,345 |
| NULL | Center | 2,945 |
| NULL | Point Guard | 875 |
| NULL | Power Foward | 2,064 |
| NULL | Shooting Guard | 2,461 |
As you can see, there are a lot of NULLs in the results. Each NULL represents a "sum" column or row.
For example, the last record in the results shows the sum of all points for Shooting Guards.
The result with two NULLs shows the total points across all teams and positions. This query can be prettied
up using the ISNULL() function to replace the NULL values with text like "All":
SELECT ISNULL(Team, 'All'), ISNULL([Position], 'All'), SUM(Points)
FROM PlayerStats
GROUP BY Team, [Position]
WITH CUBE
|
Which will replace all the NULLs in the output with the text "All".
Applications of Data Cubes
Data cubes are used primarily in data warehousing, which is a field of database research
that examines storing vast amounts of data from numerous data stores for the purpose of report information.
In data warehousing, there is often data that contains many "dimensions".
For example, a retailer like Wal-Mart might have sales data that contains the following information
for each sale made:
- The item sold
- The store location the item was sold from
- The date the sale was made
- What method of payment was used (credit card/cash/check/etc.)
Using this information, a four-dimensional data cube could be constructed. With this data cube,
executives could examine data from a very fine-level of detail to a very broad-level. For example,
an executive might want to see the total number of XBox sales in Wal-Mart store #1233 sold on May 4th, 2003
and paid for with cash. A broader query might be to see how many XBoxes were sold in all of the locations
on May 4th, 2003, or how many XBoxes has store #4421 sold in total? Or how many items were sold in
store #2111 and paid for by check? Even broader queries could be along the lines of how many total products
has store #5331 sold? Or, how many total XBoxes have sold at all Wal-Mart stores? Or, the broadest possible query:
How many total items have been sold across all stores, all days, and all payment methods?
For more information on data cubes and the WITH CUBE syntax, check out the following
articles:
Conclusion
In this article we examined the CUBE SQL statement and how to use it to return data cubes
from a single SQL query. Data cubes, as we discussed, enhance the results from simple GROUP BY
statements by providing summary data across all dimensions of the partitions of the data.
Happy Programming!
By Scott Mitchell