Published: Wednesday, July 7, 2010

An Extensive Examination of LINQ: Extending LINQ - Adding Query Operators

By Scott Mitchell


A Multipart Series on LINQ
This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.

  • An Introduction to LINQ - provides an overview of the purpose of LINQ, its design goals, and core components.
  • Extension Methods, Implicitly Typed Variables, and Object Initializers - looks at three language enhancements to VB and C# that, in part, allow for LINQ's unique syntax and functionality.
  • Lambda Expressions and Anonymous Types - explores two more language enhancements to VB and C# that permit LINQ's unique syntax and functionality.
  • The Ins and Outs of Query Operators - learn how query operators provide a universal approach to querying and modifying enumerable collections of data.
  • The Standard Query Operators - explore LINQ's standard query operators, a suite of built-in query operators for working with enumerable data.
  • Using the Query Syntax - learn how to write and use C# and Visual Basic's new query syntax, which lets you write LINQ queries using SQL-like syntax.
  • Grouping and Joining Data - examines the standard query operators and query syntax used to group and join data.
  • Introducing LINQ to XML - provides an overview of working with XML data using the LINQ to XML API.
  • Querying and Searching XML Documents Using LINQ to XML - examines querying and filtering XML documents using the LINQ to XML API.
  • Extending LINQ - Adding Query Operators - shows how to extend the functionality of LINQ by adding your own query operators.

    Introduction


    As discussed in earlier installments of this article series - most notably in An Introduction to LINQ and The Standard Query Operators - one of LINQ's primary components is its set of standard query operators. A query operator is a method that operates on a sequence of data and performs some task based on that data. Query operators are implemented as extension methods on types that implement the IEnumerable<T> interface. Some of the standard query operators that we've explored throughout the articles in this series include: Count, Average, First, Skip, Take, Where, and OrderBy, among others.

    While these standard query operators provide a great deal of functionality, there may be situations where they fall short. The good news is that it's quite easy to create your own query operators. Underneath the covers, query operators are just methods that extend types that implement IEnumerable<T> and iterate over the sequence performing some task, such as computing the total number of items in the sequence, computing the average, filtering the results, or ordering them. This article examines how to extend LINQ's functionality by creating your own extension methods. Read on to learn more!


    A Quick Primer on Query Methods


    In a prior article, The Ins and Outs of Query Operators, we looked at the underlying functionality of query methods. (If you have not yet read that article I strongly suggest reading it before continuing on with this one.) Recall that, in a nutshell, query methods iterate over a sequence of data; specifically, a query operator iterates over a sequence that implements IEnumerable<T>, which includes arrays, lists, stacks, queues, and dictionaries, among many other types. (Older collection classes that implement only the non-generic IEnumerable interface, such as the ADO.NET DataRowCollection class, can be used with query operators after a call to Cast<T> or OfType<T>.)

    Query operators can be classified as to what type of value they return. For instance, some standard query operators return a scalar value based on some aggregating function (such as determining the maximum value in the sequence), while others return a single element from the sequence or a new sequence altogether. All query operators can be classified into one of the following categories:

    • Aggregate operators - aggregate operators return a scalar value based on some aggregating operation. For instance, the standard query operators Sum and Average are examples of aggregate operators because they return a scalar value (a number) based on an aggregate operation (summing or averaging some value in the underlying sequence).
    • Single element operators - single element operators return precisely one element from the underlying sequence. First and Single are examples of standard query operators that return a single element.
    • Sequence operators - sequence operators return a sequence. The returned sequence may be a subset of the underlying sequence, some modification of the underlying sequence, or an entirely new one. The Where standard query operator is an example of a sequence operator as it returns the elements in the underlying sequence that match particular filter criteria.
    • Grouping operators - grouping operators return a group of sequences from a single underlying sequence. GroupBy is an example of a grouping standard query operator.
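    To make the four categories concrete, here is a minimal sketch that applies one standard query operator from each category to the same sequence. The class and method names (OperatorCategoriesDemo, Summarize) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper used only to illustrate the four operator categories.
public static class OperatorCategoriesDemo
{
    public static string Summarize(IEnumerable<int> numbers)
    {
        // Aggregate operator: reduces the whole sequence to one scalar value.
        int total = numbers.Sum();

        // Single element operator: returns exactly one element of the sequence.
        int first = numbers.First();

        // Sequence operator: returns a new sequence (here, a filtered subset).
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Grouping operator: splits one sequence into keyed groups of elements.
        IEnumerable<IGrouping<bool, int>> byParity = numbers.GroupBy(n => n % 2 == 0);

        return string.Format("sum={0}; first={1}; evens={2}; groups={3}",
                             total, first, string.Join(",", evens), byParity.Count());
    }
}
```

    Calling OperatorCategoriesDemo.Summarize(new[] { 3, 8, 15, 4, 23 }) returns the string "sum=53; first=3; evens=8,4; groups=2".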
    In this article we will create a number of useful, real-world query operators that you can use in your projects and as a basis for creating your own custom query operators. Let's get started!

    Creating Standard Deviation and Variance Aggregate Query Operators


    When writing a SQL query you can utilize a number of aggregate functions, such as AVG, COUNT, MIN, MAX, and SUM. The standard query operators include similar aggregate operators; however, SQL offers a number of statistical aggregate functions not found in the standard query operators, including functions for computing the variance and the standard deviation (VAR, VARP, STDEV, and STDEVP in Microsoft SQL Server, for example).

    In a nutshell, the variance and standard deviation are statistical measures that indicate how the values of a sequence are dispersed. For example, the sequences (8, 10, 12) and (2, 3, 25) both have the same average (10), but the elements in the latter sequence are much more dispersed than the numbers in the first. Computing the variance is pretty straightforward and can be done in a few simple steps:
    1. Compute the average value of the sequence.
    2. For each element in the sequence determine the difference between the number and the average computed in step (1).
    3. Square each of the differences determined in step (2) and sum these numbers.
    4. Divide the result in step (3) by the number of elements in the sequence.
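    The standard deviation is simply the square root of the variance. (Because step (4) divides by the number of elements, these steps yield the population variance; the sample variance divides by one less than the number of elements instead.)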
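    As a worked illustration of the four steps, here they are applied to the two example sequences mentioned earlier. The symbols are notation introduced for this sketch only: N is the number of elements, x_i is an element, x-bar is the average, and sigma is the standard deviation.

```latex
% General form of steps (1) through (4), followed by the square root
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2,
\qquad \sigma = \sqrt{\sigma^2}

% Sequence (8, 10, 12), average 10
\sigma^2 = \frac{(-2)^2 + 0^2 + 2^2}{3} = \frac{8}{3} \approx 2.67,
\qquad \sigma \approx 1.63

% Sequence (2, 3, 25), average 10
\sigma^2 = \frac{(-8)^2 + (-7)^2 + 15^2}{3} = \frac{338}{3} \approx 112.67,
\qquad \sigma \approx 10.61
```

    Both sequences share the same average, yet the second has a standard deviation roughly six and a half times larger, which matches the intuition that its values are far more spread out.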

    The following extension methods compute the variance and standard deviation for a sequence of decimal values. (I've included just the C# version of these query operators here in this article; download the code available at the end of this article to see the Visual Basic version of these query operators.)

    public static double Variance(this IEnumerable<decimal> source)
    {
       // Compute the average of the sequence
       var avg = source.Average();

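       // Sum up the difference of each element from the average, squared.
       // The loop variable must be a decimal; declaring it as an int would
       // silently truncate the fractional part of every element.
       var runningSum = 0.0M;
       foreach (decimal value in source)
          runningSum += (value - avg) * (value - avg);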

       // return the runningSum divided by the number of elements
       return Convert.ToDouble(runningSum / source.Count());
    }

    public static double StdDeviation(this IEnumerable<decimal> source)
    {
       // The standard deviation is the square root of the variance
       return Math.Sqrt(source.Variance());
    }

    As you can see, these two methods are implemented as extension methods on a type that implements the IEnumerable<decimal> interface. (To compute the standard deviation and variance on sequences of integers, doubles, and other numeric types you'd need to create additional Variance and StdDeviation methods that applied to types of IEnumerable<int>, IEnumerable<float>, and so on.) The Variance method starts by computing the average using the Average standard query operator. Next, it enumerates the elements in the underlying sequence (source) and, for each element, determines the square of the number less the average value. These numbers are summed into a variable named runningSum, which is then divided by the number of elements in the sequence. This is the variance.
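    One possible way to avoid writing a separate overload for every numeric type is to accept a selector that projects each element to a double, much like the selector-based overloads of the Average standard query operator. The following is only a sketch under that assumption; it is not part of the article's download, and the class name StatisticsExtensions is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical selector-based variants of the Variance and StdDeviation operators.
public static class StatisticsExtensions
{
    public static double Variance<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");

        // Project into a list once so the source is enumerated a single time.
        List<double> values = source.Select(selector).ToList();
        if (values.Count == 0)
            throw new InvalidOperationException("Sequence contains no elements");

        // Average, then the sum of squared differences, divided by the count.
        double avg = values.Average();
        double sumOfSquares = values.Sum(v => (v - avg) * (v - avg));
        return sumOfSquares / values.Count;
    }

    public static double StdDeviation<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // The standard deviation is the square root of the variance.
        return Math.Sqrt(source.Variance(selector));
    }
}
```

    With this in place a sequence of integers could be handled with a call such as new[] { 8, 10, 12 }.Variance(n => (double)n), and a sequence of objects could be handled by selecting one of its numeric properties. The trade-off is that the computation is done with doubles, so it gives up the precision of decimal arithmetic.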

    The StdDeviation method computes the variance of the sequence using the just-defined Variance method and returns its square root.

    With these query operators complete they can now be used on any sequence of decimals, much like how the aggregate standard query operators can be used. The download includes a demo that allows the user to type a comma-delimited list of numbers into a textbox. These numbers are split apart and fed into a list of decimal values named sequenceOfDecimals. The Variance and StdDeviation query operators are then used to display these statistical metrics. The code snippet below shows how the Variance and StdDeviation query operators can be called like any other query operator on an object that implements IEnumerable<decimal>. The screen shot below the code snippet shows the output when viewed through a browser.

    // Display the standard deviation and variance of the sequence of decimals entered by the user
    lblResults.Text = string.Format("The variance is {0:N2} and the standard deviation is {1:N2}...",
                                    sequenceOfDecimals.Variance(),
                                    sequenceOfDecimals.StdDeviation()
    );

    [Screen shot: The variance and standard deviation of the entered numbers are displayed.]

    Returning Elements of a Sequence in Random Order


    Given a sequence of elements how would you return, say, three random elements? Or all of the elements but in a random order? There is no standard query operator for randomizing a sequence, so let's create one!

    As noted in Techniques for Randomly Reordering an Array, it's very easy to write a shuffle algorithm that does not generate truly random shuffles. A naive implementation can easily overweight certain permutations. One fairly simple algorithm that avoids such pitfalls is the Fisher-Yates shuffle. The Fisher-Yates shuffle randomly reorders the elements in an array in such a way that the various permutations are equally likely.
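    To illustrate how a naive shuffle can overweight certain permutations, consider one common mistake: swapping each position with a position chosen from the entire array rather than from the not-yet-placed portion. The sketch below (an illustration written for this discussion, with hypothetical names) enumerates every combination of choices such a shuffle could make on a three-element array. There are 3 x 3 x 3 = 27 equally likely combinations but only 6 permutations, and 27 cannot be divided evenly by 6, so some orderings must turn up more often than others.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demonstration class; not part of the article's download.
public static class NaiveShuffleBias
{
    // Counts how many of the 27 possible pick combinations of the naive
    // shuffle lead to each ordering of the three elements A, B, and C.
    public static Dictionary<string, int> CountNaiveOutcomes()
    {
        const int n = 3;
        var counts = new Dictionary<string, int>();

        for (int r0 = 0; r0 < n; r0++)
        for (int r1 = 0; r1 < n; r1++)
        for (int r2 = 0; r2 < n; r2++)
        {
            char[] items = { 'A', 'B', 'C' };
            int[] picks = { r0, r1, r2 };

            for (int i = 0; i < n; i++)
            {
                // Naive step: swap position i with ANY position in the array.
                char tmp = items[i];
                items[i] = items[picks[i]];
                items[picks[i]] = tmp;
            }

            string key = new string(items);
            int seen;
            counts.TryGetValue(key, out seen);   // seen is 0 if key is absent
            counts[key] = seen + 1;
        }

        return counts;
    }
}
```

    Printing the dictionary returned by NaiveShuffleBias.CountNaiveOutcomes() shows that the tallies are not all equal. The Fisher-Yates shuffle avoids this by choosing only among the remaining positions at each step, giving 3 x 2 x 1 = 6 combinations, one per permutation.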

    A thorough description of the Fisher-Yates shuffle is beyond the scope of this article; refer to Techniques for Randomly Reordering an Array for more information. When preparing for this article I came across a code sample on StackOverflow by user LukeH that implements the Fisher-Yates shuffle as a query operator. Here is the code (which you can also find in the downloadable demo):

    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
       if (source == null)
          throw new ArgumentNullException("source");

       Random rng = new Random();
       T[] items = source.ToArray();

       for (int n = 0; n < items.Length; n++)
       {
          int k = rng.Next(n, items.Length);

          yield return items[k];

          items[k] = items[n];
       }
    }

    Another approach for getting the elements of a sequence in random order is to use the OrderBy standard query operator sorting on a new GUID. The code for this is much simpler, as it relies on an existing standard query operator:

    public static IEnumerable<T> Shuffle2<T>(this IEnumerable<T> source)
    {
       if (source == null)
          throw new ArgumentNullException("source");

       return source.OrderBy(elem => Guid.NewGuid());
    }

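    Either one of these two methods will do the trick, although they are not equivalent. Shuffle copies the sequence once and then makes a single linear pass over it, whereas Shuffle2 must sort the entire sequence and relies on GUIDs, which are designed to be unique rather than uniformly random. The demo available for download includes both.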

    To get one or more random elements from a sequence using either one of these shuffle query operators you would write code like the following:

    // Get a single random element...
    var oneRandomElement = collection.Shuffle().First();

    // Get three random elements...
    var aSequenceOfThreeRandomElements = collection.Shuffle().Take(3);

    In fact, you could very easily create another query method that returns a single random element - you would create an extension method on IEnumerable<T> that would return an object of type T and could do so with one line of code: return collection.Shuffle().First();. (The demo available for download includes such a query operator named Random.)
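    A minimal sketch of such a single-element operator might look like the following. The name Random comes from the article, but the body shown here is an assumption about how it could be written rather than a copy of the code in the download; the enclosing class name RandomExtensions is hypothetical, and the Shuffle operator is repeated so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomExtensions
{
    // The Fisher-Yates based Shuffle operator from the listing above.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        // Fully qualified so the type is not confused with the Random method below.
        var rng = new System.Random();
        T[] items = source.ToArray();

        for (int n = 0; n < items.Length; n++)
        {
            int k = rng.Next(n, items.Length);
            yield return items[k];
            items[k] = items[n];
        }
    }

    // Assumed implementation: returns one randomly chosen element.
    // First() throws an InvalidOperationException if the sequence is empty.
    public static T Random<T>(this IEnumerable<T> source)
    {
        return source.Shuffle().First();
    }
}
```

    Because Shuffle yields its elements lazily, First() stops the loop after the first element has been produced, although the call to ToArray still copies the whole sequence up front.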

    To see the Shuffle query operator in action, check out the Shuffle.aspx demo, which is part of the download at the end of this article. Much like the variance and standard deviation demo, this one prompts the user for input, in this case a sentence. The words in the sentence are then read into an array and the Shuffle query operator is used to randomize their order. Finally, this randomized ordering is displayed.

    [Screen shot: The words in the sentence entered by the user are shuffled and displayed.]

    Conclusion


    LINQ offers a variety of standard query operators that can be used to compute aggregate values, retrieve a single element, retrieve a sequence of elements, or even generate groups from a sequence. While these standard query operators may seem mystically powerful, there is no magic going on. Query operators are simply extension methods on types that implement IEnumerable<T>. As we saw in this article, you can create your own query operators with just a dash of code and a sprinkle of imagination.

    Happy Programming!


  • Download the code associated with this article series
  • Further Readings:


  • An Introduction to LINQ
  • The Ins and Outs of Query Operators
  • The Standard Query Operators
  • Techniques for Randomly Reordering an Array
  • A Shuffle Query Operator

