When you think ASP, think...
Recent Articles
All Articles
ASP.NET Articles
ASPFAQs.com
Message Board
Related Web Technologies
User Tips!
Coding Tips

Sections:
Sample Chapters
Commonly Asked Message Board Questions
JavaScript Tutorials
MSDN Communities Hub
Official Docs
Security
Stump the SQL Guru!
XML Info
Information:
Feedback
Author an Article
Technology Jobs
ASP ASP.NET ASP FAQs Message Board Feedback ASP Jobs
Print this page.
Published: Wednesday, April 30, 2003

Summarizing Data with CUBE Queries

By Scott Mitchell


Introduction
One of the motivations behind using databases to hold information is so that decisions can be made on the data. For example, the benefit of a retailer storing product sales information in a database is that the retailer can examine reports on the data, showing interesting facts, such as what products are the top sellers, where a particular item is selling best, the gross revenue generated from all sales of a particular item or store, and so on.

- continued -

'

Typically, interesting data has different dimensions. A dimension can be thought of as an attribute with a set of possible values. In the database world, this maps to a table column. For example, an eCommerce store might have a table tracking all sales, with a record for each item sold. One dimension is the item that was sold; another, the store from which is was sold. Oftentimes, it is useful to view summary data for a particular dimension, such as all items sold at a particular store, or all of the sales for a particular store. In this article we'll examine how to use native SQL commands to provide such summary queries.

First Things First - Examining GROUP BY
Before we can discuss what ROLLUP and CUBE are, we must first take a look at the GROUP BY clause, which is a powerful clause of the SQL SELECT statement that gets far less press than it deserves. Simply, the GROUP BY clause allows for aggregate functions to be applied to particular partitions of a database table. These partitions are specified via the column list in the GROUP BY clause.

Now that I have you thoroughly confused, let me attempt to demystify this confusion by presenting an example. Consider that we have a table to track stats for various basketball players in the NBA. This table, let's call it PlayerStats, might have the following structure:

PlayerStats
Namevarchar(50), Primary Key
Teamvarchar(50)
Positionvarchar(10)
Pointsint
Reboundsint
Blocksint

For our example, let's assume this table has the following data:

PlayerStats
NameTeamPositionPointsReboundsBlocks
Kobe BryantLakersShooting Guard2,46156467
ShaqLakersCenter1,841742159
Robert HorryLakersPower Forward52251461
Chris WebberKingsPower Forward1,54270488
Mike BibbyKingsPoint Guard8751478
Yao MingRocketsCenter1,104675147

Earlier I mentioned that the GROUP BY clause is useful when using aggregate functions. Before we can examine the GROUP BY function, let's take a moment to discuss SQL aggregate functions. An aggregate function is a function whose result is based on the results in a number of rows. Common SQL aggregate functions include COUNT, MIN, MAX, AVG, SUM, and so on. For example, we can determine the total number of points scored by players in the PlayerStats table by executing the following SQL query:

SELECT SUM(Points)
FROM PlayerStats

This query would return a single row with a single column, the value in that column being 8,345 - the sum of all of the points of all of the players. However, what if we wanted to view the points by position, or by team? Note that for this particular problem, we don't want to apply to the SUM aggregate function to all of the records in the PlayerStats table; rather, we want this function to be applied to various partitions of the table. So, if we want the total points by team, we want the function applied to the partition of the players who play for the Lakers; we also want the function applied to the partition of players who play for the Kings; furthermore, we want the function applied to the partition of players who play for the Rockets.

To construct such a query we use the GROUP BY clause. The GROUP BY clause specifies what table column(s) should be used to partition the records in the table. For example, to partition the table by the Team column, we'd use the following SQL query:

SELECT Team, SUM(Points)
FROM PlayerStats
GROUP BY Team

The results of this query would be three rows, one for each unique team. Specifically, the results would be:

Team 
Lakers4,824
Kings2,417
Rockets1,104

For more information on the GROUP BY clause, refer to the previous article, Using the GROUP BY Clause.

Grouping on Multiple Columns
The GROUP BY clause can be used on multiple columns in a table. For example, imagine that we wanted to partition our table into subsets by team and position. That is, instead of wanting to see the sum of all points for each team, we want to see the sum of points for the Laker point guards, shooting guards, small forwards, power forwards, and centers. Similarly, we want to see the sum of points for the Kings point guards, shooting guards, and so on. And similarly for all teams in the table. To accomplish this, we could use the following SQL statement:

SELECT Team, [Position], SUM(Points)
FROM PlayerStats
GROUP BY Team, [Position]

The reason Position has to be in brackets is because Position is a reserved word in SQL Server 2000...

The results returned by this query for our given data set would not be terribly interesting (since there are no records in the table where there are more than one player on the same team that play the same position). The results of this query on our current data set would be:

TeamPosition 
LakersCenter1,841
RocketsCenter1,104
KingsPoint Guard875
KingsPower Forward1542
LakersPower Forward522
LakersShooting Guard2,461

(If an additional record were added to the PlayerStats table, say, Bobby Jackson, Kings, Point Guard, 150 points, the value for the Kings, Point Guard result would be 1,025 (the sum of all points by Kings point guards).)

Summing Up the Points for Team and Position
Grouping on multiple columns is nice, but commonly for reports not only are we interested in the points per team per position, but also in the total number of points per team and the total number of points per position. That is, we would want the following results:

 Point GuardShooting Guard Power ForwardCenterSUM
Lakers 2,4615221,8414,824
Kings875 1,542 2,417
Rockets   1,1041,104
SUM8752,461 2,0642,9458,345

Note that the bottom-most row and right-most column are "SUM" columns, summing up their respective rows and columns. The bottom right hand corner is a "sum of sums". To put it another way, each of the bottom-most row's column shows the sum of points for each position. The right-most column shows the sum of points for each team. The bottom right hand corner shows the sum of points across all teams and positions. And, of course, in the table one can quickly find the sum of points for the players of a particular position on a particular team.

This data structure is, in the database world, referred to as a cube. A cube is a structure that provides aggregate functions to multiple dimensions. The output above is an example of a two-dimensional cube. However, cubes of any dimension can be created. (In fact, you can have one-dimension cubes.) Specifically, a data cube presents summary data for each of the dimensions used to partition a table.

Imagine for a moment that we were asked by our boss to create a Web report that displayed the NBA player statistics as shown above. How would we accomplish this? One way would to be to use a SQL statement with a GROUP BY clause on both the Team and Position columns. Then, in our ASP or ASP.NET Web page, we'd create two arrays to hold the sum of points for each "dimension" (points by team and points by player). Finally, to compute the bottom right hand cell, we'd need to just sum up the values in one of the two arrays.

While this approach is plausible, there's an even easier way to accomplish the same thing using one SQL statement. We'll look at this in Part 2 of this article.

  • Read Part 2

  • Software Developer / Programmer - Distributed Systems (NYC)
    Next Step Systems
    US-NY-New York

    Justtechjobs.com Post A Job | Post A Resume


    ASP.NET [1.x] [2.0] | ASPMessageboard.com | ASPFAQs.com | Advertise | Feedback | Author an Article