Published: Saturday, January 12, 2002
Using the Group By Clause
By Scott Mitchell
Introduction
While most developers are familiar with the basic SQL syntax for constructing SELECT statements,
many folks have rarely, if ever, used one of the more powerful clauses of the SELECT
statement: Group By. I think the reason of this lack of
use of this clause is two-fold:
- You don't really need to ever use the
Group By clause of the SELECT statement. You could simply
do some Recordset processing in the ASP page to simulate these results.
- The
Group By clause doesn't get much press in Web sites,
books, etc.
This article examines what these two clauses do and presents a number of common queries that become rather
simple with the use of Group By.
What Does Group By Do?
SQL Server, if you knew it or not, supports what are called aggregate functions. These functions -
SUM, COUNT, AVG, MAX, MIN, etc. -
can appear in the SELECT clause. For example, if we had an Employee database with the
following tables:
| Employee |
EmployeeID | int, PK |
Name | varchar(100) |
DepartmentID | int, FK |
Salary | money |
|
| Department |
DepartmentID | int, PK |
Name | varchar(100) |
Description | text |
|
Note that in this data model there would be a one to many relationship between Department.DepartmentID
and Employee.DepartmentID; this foreign key identifies what department each employee
belongs to (departments might have values such as Executive, IT, Sales, Marketing, Engineering, etc.).
Now, imagine that we wanted to view the average salary for our employees. We could accomplish this using
the AVE aggregate function like so:
SELECT AVG(Salary) AS AverageSalary
FROM Employees
|
This would return a Recordset with one row and one column, this single cell containing the average
salaries of our employees. In an ASP page you could access this value from the Recordset using:
objRS("AverageSalary").
Now imagine that we wanted to get the average salary for each department. Intuitively we may decide
to write our SQL query as follows:
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
|
However, such a query will throw an error. This error is thrown because the SQL query processor can't
retrive the value for a single row (DepartmentID) at the same time it is computing an
aggregate value (AVG(Salary)) unless you use the Group By clause.
Essentially, when you leave off the Group By clause the aggregate function is applies to
all of the rows in the Employees table that match the WHERE clause conditions.
If you wish to apply the aggregate function to subsets of the rows of the Employees table,
though, you must provide information on how to partition the table. Specifically, we instruct SQL Server
to partition up the results of the Employees table into sets determined by the value of
DepartmentID.
Now that you're thoroughly confused, let's look at an example, to help clear things up. Imagine our
Employees table had the following data:
Data from Employees Table |
| EmployeeID | Name | DepartmentID | Salary |
| 1 | Scott Mitchell | 1 | 75,000 |
| 2 | Tina Lee | 1 | 65,000 |
| 3 | Todd Maher | 5 | 175,000 |
| 4 | Kevin Joy | 1 | 50,000 |
| 5 | James Miller | 3 | 45,000 |
| 6 | Justin Casey | 3 | 60,000 |
Now, if we apply the followign SQL statement:
SELECT AVG(Salary) AS AverageSalary
FROM Employees
|
We are asking SQL to find the average of all the salaries in the Employees table.
(Of course we could limit this via a WHERE clause, such as finding the average salary for
employees that made more than $50,000.) Essentially SQL iterates through our set of rows (the rows
of Employees and sums up the salaries, dividing the by the total number of salaries summed
up. This would result in a value of $78,333.33, the average salary.
Now imagine that we wanted to find the average salary of each department. That is, we want to
find the average salary of those in department 1 (Scott, Tina, and Kevin), the average salary of those
in department 3 (James and Justin) and the average salary of those in department 5 (Todd). To accomplish
this, we need to have the set of rows returned by the Employees table partitioned by
DepartmentID. By using the following SQL statement:
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID
|
We essentially break the Employees table into three smaller tables: one containing those who
are in department 1; another containing those in department 3; and a third one containing those in
department 5. So the returned results will each department listed along with the average salary in the
department. The return result will be:
| DepartmentID | AverageSalary |
| 1 | $63,333.33 |
| 3 | $52,500.00 |
| 5 | $175,000.00 |
Summary
To summarize, the Group By clause partitions up a table by the columns specified; these
partitions then have the aggregate function applied to each of them separately. The Group By
clause can be very useful especially in data models where there are a number of related tables with
one-to-many or many-to-many relationships between them. The Group By clause allows you to
apply aggregate functions over partitions of these related tables. For more information on the
Group By clause be sure to check out the SQL Books Online.
Happy Programming!
By Scott Mitchell