When you think ASP, think...
Recent Articles
All Articles
ASP.NET Articles
ASPFAQs.com
Message Board
Related Web Technologies
User Tips!
Coding Tips
Search

Sections:
Book Reviews
Sample Chapters
Commonly Asked Message Board Questions
JavaScript Tutorials
MSDN Communities Hub
Official Docs
Security
Stump the SQL Guru!
Web Hosts
XML
Information:
Advertise
Feedback
Author an Article
Jobs

ASP ASP.NET ASP FAQs Message Board Feedback ASP Jobs
 
Print this Page!
Published: Wednesday, August 5, 2009

An Extensive Examination of LINQ: Grouping and Joining Data

By Scott Mitchell


A Multipart Series on LINQ
This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.

  • An Introduction to LINQ - provides an overview of the purpose of LINQ, its design goals, and core components.
  • Extension Methods, Implicitly Typed Variables, and Object Initializers - looks at three language enhancements to VB and C# that, in part, allow for LINQ's unique syntax and functionality.
  • Lambda Expressions and Anonymous Types - explores two more language enhancements to VB and C# that permit LINQ's unique syntax and functionality.
  • The Ins and Outs of Query Operators - learn how query operators provide a universal approach to querying and modifying enumerable collections of data.
  • The Standard Query Operators - explore LINQ's standard query operators, a suite of built-in query operators for working with enumerable data.
  • Using the Query Syntax - learn how to write and use C# and Visual Basic's new query syntax, which lets you write LINQ queries using SQL-like syntax.
  • Grouping and Joining Data - examines the standard query operators and query syntax used to group and join data.
  • Introducing LINQ to XML - provides an overview of working with XML data using the LINQ to XML API.
  • Querying and Searching XML Documents Using LINQ to XML - examines querying and filtering XML documents using the LINQ to XML API.
  • Extending LINQ - Adding Query Operators - shows how to extend the functionality of LINQ by adding your own query operators.
  • (Subscribe to this Article Series! )

    Introduction


    As discussed in An Introduction to LINQ, one of the cornerstones of LINQ is the standard query operators, which are a set of extension methods on the IEnumerable interface added to the .NET Framework version 3.5. The standard query operators can be applied to any enumeration - any collection of "things." In The Standard Query Operators installment we looked at some of the more common query operators, such as Where, Select, OrderBy, and others. Each standard query operator can be classified as a certain type of operator. There are aggregation operators like Count, Sum, and Max; element operators like First, Last, and ElementAt let you pick out a specific element from a sequence; and the ordering operators OrderBy and OrderByDescending order the elements of a sequence based on a specified sorting criteria.

    Another class of query operators that we've yet to explore are grouping and joining operators. The grouping and joining operators work with two (or more) sequences and combine them together, much like how a JOIN in SQL combines records from two (or more) tables into a single resultset. Through LINQ's standard query operators (or via its query syntax), it is possible to perform: nested (or grouped) queries; cross joins, or the Cartesian product of two sequences; inner joins; and left outer joins.

    This article explores the grouping and joining operators available in LINQ with a number of examples in both Visual Basic and C#. As with the previous installments in this article series, the complete code is available for download. Read on to learn more!

    - continued -

    Examining the Sample Data


    The examples in this article look at how to create LINQ queries that group and join the same set of sample data. If you download the demos (available at the end of this article) you'll find a class file named FauxNorthwind.cs in the App_Code/CSCode folder. This file defines a namespace (FauxNorthwind) and three classes within that namespace. The first two, Category and Product, mimic the schema of the Category and Product tables in the Northwind database. The Category class has CategoryId and CategoryName properties, while the Product class has the properties ProductId, ProductName, Category, UnitPrice, and Discontinued.

    public class Category
    {
       public int CategoryId { get; set; }
       public string CategoryName { get; set; }
    }

    public class Product
    {
       public int ProductId { get; set; }
       public string ProductName { get; set; }
       public Category Category { get; set; }
       public decimal? UnitPrice { get; set; }
       public bool Discontinued { get; set; }
    }

    The third class, Data, is a static class that defines the "records" in the sample data via two static arrays. There's an array of type Category named Categories and an array of type Product named Products. The Categories array is populated with six elements, while the Products array defines 15 products. (These arrays are populated using the collection initializer and object initializer enhancements added to C# 3.0 and discussed in the Extension Methods, Implicitly Typed Variables, and Object Initializers installment in this article series.)

    A snapshot of the data defined in the Categories and Products arrays follows:

    public static class Data
    {
       public static Category[] Categories =
             {
                new Category() { CategoryId = 1, CategoryName = "Beverages" },
                new Category() { CategoryId = 2, CategoryName = "Confections" },
                new Category() { CategoryId = 3, CategoryName = "Condiments" },
                ...
             };

       public static Product[] Products =
             {
                new Product() { ProductId = 1, ProductName = "Tasty Ale", UnitPrice = 1.95M, Discontinued = false, Category = Categories[0] },
                new Product() { ProductId = 2, ProductName = "Pastry Puffs", UnitPrice = 4.95M, Discontinued = false, Category = Categories[1] },
                new Product() { ProductId = 3, ProductName = "Honey Mustard", UnitPrice = 3.50M, Discontinued = false, Category = Categories[2] },
                ...
             };
    }

    Nested Queries


    When working with two parent/child-related sequences - categories and products, in this example - you may want to access all of the records in the parent sequence one at a time and, for each parent record, work with its children. Such functionality is possible in LINQ in two ways: by writing a nested query or by using LINQ's GroupBy operator. Underneath the covers, both approaches are the same. They differ only in the syntax. As we'll see, a nested query is a bit more verbose and not as readable as using the GroupBy operator.

    To understand how a nested query works, let's first look at writing a query that returns the categories. Using the query syntax added to C# 3.0 and Visual Basic 9 we could write:

    // C#
    var CategoriesAndProducts =
             from category in FauxNorthwind.Data.Categories
             select category;


    ' VB
    Dim CategoriesAndProducts = _
             From category In FauxNorthwind.Data.Categories _
             Select category

    Now, imagine that for each category we wanted to get back the total number of products. We could accomplish this by using a nested query. A nested query is a query that appears in the Select portion of the query. Consequently, the nested query is executed once per item returned from the outer query. If that doesn't quite make sense, perhaps this syntax will clarify the issue:

    // C#
    var CategoriesAndProducts =
       from category in FauxNorthwind.Data.Categories
       select new
       {
          CategoryName = category.CategoryName,
          ProductCount =
             (from product in FauxNorthwind.Data.Products
              where product.Category == category
              select product).Count()

       }

    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       Select _
          CategoryName = category.CategoryName, _
          ProductCount = (From product In FauxNorthwind.Data.Products _
                          Where product.Category.Equals(category)).Count()

    This new query uses projection to create an anonymous type with two members: CategoryName, which contains the name of the category record; and ProductCount, which indicates the total number of products associated with this particular category. Note that the nested query gets those products whose Category property matches the current Category object from the outer query. In plain English, for each category in the FauxNorthwind.Data.Categories array a new object is created. In creating that object the ProductCount is saying, "Get me all of the products that have the same category as the one that is currently being processed and then count up how many elements are in this nested query's results." The end result is that for each category in the FauxNorthwind.Data.Categories array a new object with the category's CategoryName property and the total number of products associated with that category is returned.

    We could add another nested query, one that returns the entire sequence of related products. To do so simply add another member to the anonymous type like so:

    // C#
    var CategoriesAndProducts =
       from category in FauxNorthwind.Data.Categories
       select new
       {
          CategoryName = category.CategoryName,
          ProductCount =
             (from product in FauxNorthwind.Data.Products
              where product.Category == category
              select product).Count(),
          Products =
             (from product in FauxNorthwind.Data.Products
              where product.Category == category
              select product)

       };


    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       Select _
          CategoryName = category.CategoryName, _
          ProductCount = (From product In FauxNorthwind.Data.Products _
                          Where product.Category.Equals(category)).Count(), _
          Products = (From product In FauxNorthwind.Data.Products _
                       Where product.Category.Equals(category))

    The demos available at the end of this article bind the grouped and joined data to a GridView to illustrate how each query operator works. The above query generates a grid with the following data:

    Each category is listed along with its associated products.

    Note that resulting data includes all of the categories regardless of whether they have any associated products. This is because the outer query gets all categories and then gets the related products (and the number of related products) via a nested query. Consequently, each category in the FauxNorthwind.Data.Categories array will be returned; it doesn't matter if there are any related products or not.

    Grouping Data


    While nested queries certainly will get the job done, LINQ includes a more terse and readable syntax for grouping data. The GroupBy standard query operator allows the items in query to be grouped by some property. Like many other standard query operators, the GroupBy operator returns a sequence. Specifically, each element returned by GroupBy implements the interface IGrouping<TKey, TSource>. This interface has a strongly-typed key (of type TKey) and a sequence of grouped items of type TSource. That description can be a bit confusing. Perhaps an example will help.

    Imagine that we want to get all of the products and group them by category so that we end up with a sequence of categories and for each category we have a sequence of its corresponding products. This could be accomplished using the GroupBy standard query operator like so:

    // C#
    var CategoriesAndProducts = FauxNorthwind.Data.Products
                       .GroupBy(p => p.Category)
                       .Select(catGroup => new
                          {
                             CategoryName = catGroup.Key.CategoryName,
                             ProductCount = catGroup.Count(),
                             Products = catGroup

                          }
                       );


    ' VB
    Dim CategoriesAndProducts = FauxNorthwind.Data.Products. _
                       GroupBy(Function(p) p.Category). _
                       Select(Function(catGroup) New With { _
                              .CategoryName = catGroup.Key.CategoryName, _
                              .ProductCount = catGroup.Count(), _
                              .Products = catGroup
    _
                           } _
                       )

    The GroupBy operator uses a lambda expression to indicate what property the source data (the Product objects in FauxNorthwind.Data.Products) should be grouped by. In this case we group by the Category property. The net result is that the GroupBy operator returns a sequence with five elements - one element for each of the five different Category objects found in the FauxNorthwind.Data.Products array. Each element has a Key property (that returns the Category object) along with a sequence of the Product objects that belong to that group.

    The statement in the code snippet above takes these results from the GroupBy operator and uses the Select operator to project them into a more usable format. Namely, the Select operator creates an anonymous type with three properties:

    • CategoryName - the name of the category. Note that the category information is accessible via the Key property.
    • ProductCount - the total number of elements in the group's sequence. In other words, the number of products associated with this category.
    • Products - the sequence associated with this group. Namely, the products that belong to the category group.
    The download includes a demo that displays these results in a GridView. The screen shot below shows this demo page when visited through a web page. Note that the grid below does not include categories that have no products (Meat/Produce), whereas the screen shot that showed the nested query included such categories. The reason is because in the nested query demo we explicitly queried the Categories array and then ran a nested query to get the related products. In other words, that query started by getting all of the categories and then each category's products (if any existed). However, the GroupBy example explicitly gets all products and then groups them by category. Those categories that have no products are never retrieved.

    Each product is listed, grouped by category.

    You can also use the GroupBy operator using the query syntax. The code snippet below shows an equivalent statement to the standard query operator usage above, but uses the query syntax rather than extension methods.

    // C#
    var CategoriesAndProducts =
          from product in FauxNorthwind.Data.Products
          group product by product.Category into categoryProducts
          select new
          {
             CategoryName = categoryProducts.Key.CategoryName,
             ProductCount = categoryProducts.Count(),
             Products = categoryProducts

          };


    ' VB
    Dim CategoriesAndProducts = _
       From product In FauxNorthwind.Data.Products _
       Group product By key = product.Category Into Group_
          Select _
             CategoryName = key.CategoryName, _
             ProductCount = Group.Count(), _
             Products = Group

    When using the query syntax you use the Into keyword to name the sequence of groups returned by the GroupBy operator. In C# you can use any name (such as categoryProducts), but in Visual Basic you must use the keyword Group. Furthermore, in Visual Basic you specify the name of the Key property directly in the grouping syntax - Group product By key = product.Category Into Group.

    Joins


    Nested queries and grouping in LINQ is useful, but differs from the manner by which different entities are typically combined in relational databases. In SQL, you can return data from two (or more) tables into a single result set using a join. There are different types of joins in SQL, including inner joins, outer joins, and cross joins. I don't want to belabor how joins work in SQL in this article, as this topic is covered in depth elsewhere. See the SQL Joins tutorial at w3schools.com, or Jeff Atwood's A Visual Explanation of SQL Joins post for more information.

    LINQ includes a Join standard query operator that allows for SQL JOIN-like behavior and syntax (when using LINQ's query syntax). This article shows how to perform four different types of joins in LINQ:

    • Group Join,
    • Inner Join,
    • Left Outer Join, and
    • Cross Join

    Group Joins


    A group join is similar to using the GroupBy operator, although the syntax and semantics differ slightly. In a nutshell, a group join uses the Join operator and the Into keyword to group the results of the join. The following query returns all categories and uses a group join to create a group of related products for each category.

    // C#
    var CategoriesAndProducts =
          from category in FauxNorthwind.Data.Categories
          join product in FauxNorthwind.Data.Products on
             category equals product.Category into categoryProducts

          select new
          {
             CategoryName = category.CategoryName,
             ProductCount = categoryProducts.Count(),
             Products = categoryProducts

          };


    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       Group Join product In FauxNorthwind.Data.Products _
          On category Equals product.Category Into Group
    _
          Select _
             CategoryName = category.CategoryName, _
             ProductCount = Group.Count(), _
             Products = Group

    Note that to use a group join in Visual Basic you must use the syntax Group Join instead of join, as is used in C#. In both cases, the results are grouped via the Into keyword. As in the GroupBy operator example, C# can group the data into a group of any name whereas in Visual Basic you must use the name Group.

    The results of the group join are identical to the nested query's. Namely, the result is a sequence with one element for each category, regardless of whether the category has any associated products. Each element has a "sub-sequence" that contains the products associated with that category (if any). Refer to the first screen shot in this article for a visual representation of these results.

    Inner Joins


    An inner join queries two sequences. For each pair of elements examined, the pair is returned as output if the pair match on some specified criteria. Otherwise the pair is omitted from the results. Let's look at an example. The following query syntax queries two sequences - FauxNorthwind.Data.Categories and FauxNorthwind.Data.Products - joining together the Product and Category objects with matching categories. The Join operator generates a sequence whose elements have two properties: category and product. This result is projected via the Select operator into an anonymous type that returns the just the CategoryName property from the category object along with the entire product object. (An equivalent query using extension methods instead of the query syntax is available in the code download; examine the comments in the InnerJoin.aspx page's code-behind class.)

    // C#
    var CategoriesAndProducts =
          from category in FauxNorthwind.Data.Categories
          join product in FauxNorthwind.Data.Products on
             category equals product.Category

          select new
          {
             category.CategoryName,
             product

          };

    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       Join product In FauxNorthwind.Data.Products _
          On category Equals product.Category
    _
          Select _
             CategoryName = category.CategoryName, _
             product

    The above query syntax is quite similar to SQL's JOIN syntax. Here we query "from" the FauxNorthwind.Data.Categories "table" and join with the FauxNorthwind.Data.Products "table" where the Category objects line up. If it helps, you can envision the above LINQ query as the following SQL statement:

    SELECT Categories.CategoryName, Products.*
    FROM Categories
       INNER JOIN Products ON
          Categories.CategoryId = Products.CategoryId

    The LINQ query generates a sequence with 15 elements, one for each product in the "database." Each item in the resulting sequence includes the category's name and all fields from the Product object. The following screen shot shows a GridView that displays the CategoryName property in the first column and the product's ProductName and UnitPrice properties in the second column.

    The results when joining categories and products.

    The result from a LINQ Join differs than those generated by a nested query, the GroupBy operator, and by group joins. The nested query, GroupBy operator, and group joins return a sequence that contains a "sub-sequence." For example, when using the GroupBy operator to group products by category, the resulting output was a sequence that contains a Key (the Category) and a sub-sequence of Product objects that belonged to that category. On the other hand, the Join operator returns a sequence that contains the two elements being queries (each Category and Product object whose categories align).

    Left Outer Joins


    An inner join only returns items where the two input sets intersect. For example, in the above screen shot the only categories that are returned are those that have a matching product. Similarly, any products without a category would be omitted. An left outer join returns all records from the first sequence (the "left" one) regardless of whether there is a matching element in the second sequence (the "right" one).

    A left outer join is implement in LINQ by using a group join on the "left" table and then using another From operator to get the matching products, if any exist. The two From operators are needed to return the data in the same format as the inner join (having the category and product properties side-by-side rather than in a sub-sequence). The following query uses a left outer join to return a record for all categories, regardless of whether there are matching products.

    // C#
    var CategoriesAndProducts =
    from category in FauxNorthwind.Data.Categories
       join product in FauxNorthwind.Data.Products on
          category equals product.Category into categoryProducts
    from product in categoryProducts.DefaultIfEmpty()

    select new
    {
       category.CategoryName,
       product

    };


    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       Group Join product In FauxNorthwind.Data.Products _
          On category Equals product.Category Into Group _
       From product In Group.DefaultIfEmpty()
    _
          Select _
             CategoryName = category.CategoryName, _
             product

    Note that there is both a group join and an additional From operator that queries the resulting group join sub-sequence, which are the products that match the category. The DefaultIfEmpty() extension method is used here to return null if there are no items in the sub-sequence (i.e., if there are no matching products for the category).

    The resulting output of the above query is a sequence with 16 elements. One element for each product plus one element for the category that has no products associated with it (Meat/Produce). The screen shot below shows this data when bound to a GridView.

    The results of a left outer join. Note that all categories, even those without any products, are included in the results.

    Cross Joins


    A cross join is the Cartesian product between the two entities performing the join. In other words, the cross join outputs an element for every single possible combination of elements in the two inputs. When doing a cross join on two sequences with sizes A and B, the resulting cross join has A x B results. A cross join in LINQ is accomplished by having two From operators that specify the two sequences to combine, as the following query syntax illustrates:

    // C#
    var CategoriesAndProducts =
          from category in FauxNorthwind.Data.Categories
          from product in FauxNorthwind.Data.Products

          select new
          {
             category.CategoryName,
             product.ProductName,
             product.UnitPrice
          };


    ' VB
    Dim CategoriesAndProducts = _
       From category In FauxNorthwind.Data.Categories _
       From product In FauxNorthwind.Data.Products
    _
        Select _
           category.CategoryName, _
           product.ProductName, _
           product.UnitPrice

    The above query generates a resulting sequence with 90 elements and contains all of the category/product combinations, even if the combinations aren't valid. For instance, the resulting sequence includes the pairing of category and product Meat/Produce and Chai, even though Chai product belongs to the Beverages category and the Meat/Produce category has no associated products.

    Because so many records are returned by the cross join, a screen shot would be impractical here. To see the output of the cross join download the code available at the end of this article and run the CrossJoin.aspx demo.

    Happy Programming!

  • By Scott Mitchell


    Attachments:


  • Download the code associated with this article series
  • Further Reading


  • GroupBy Operator Syntax
  • Join Operator Syntax
  • 101 Visual Basic LINQ Samples: Grouping Operators - Join Operators
  • A Multipart Series on LINQ
    This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.

  • An Introduction to LINQ - provides an overview of the purpose of LINQ, its design goals, and core components.
  • Extension Methods, Implicitly Typed Variables, and Object Initializers - looks at three language enhancements to VB and C# that, in part, allow for LINQ's unique syntax and functionality.
  • Lambda Expressions and Anonymous Types - explores two more language enhancements to VB and C# that permit LINQ's unique syntax and functionality.
  • The Ins and Outs of Query Operators - learn how query operators provide a universal approach to querying and modifying enumerable collections of data.
  • The Standard Query Operators - explore LINQ's standard query operators, a suite of built-in query operators for working with enumerable data.
  • Using the Query Syntax - learn how to write and use C# and Visual Basic's new query syntax, which lets you write LINQ queries using SQL-like syntax.
  • Grouping and Joining Data - examines the standard query operators and query syntax used to group and join data.
  • Introducing LINQ to XML - provides an overview of working with XML data using the LINQ to XML API.
  • Querying and Searching XML Documents Using LINQ to XML - examines querying and filtering XML documents using the LINQ to XML API.
  • Extending LINQ - Adding Query Operators - shows how to extend the functionality of LINQ by adding your own query operators.
  • (Subscribe to this Article Series! )



    ASP.NET [1.x] [2.0] | ASPMessageboard.com | ASPFAQs.com | Advertise | Feedback | Author an Article