An Extensive Examination of LINQ: Extending LINQ - Adding Query Operators
By Scott Mitchell
A Multipart Series on LINQ
This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.
As discussed in earlier installments of this article series - most notably in An Introduction to LINQ and The Standard Query Operators - one of LINQ's primary components is its set of standard query operators. A query operator is a method that operates on a sequence of data and performs some task based on that data. Query operators are implemented as extension methods on types that implement the IEnumerable<T> interface. The standard query operators that we've explored throughout the articles in this series include OrderBy, among others.
While these standard query operators provide a great deal of functionality, there may be situations where they fall short. The good news is that it's quite easy to create your own query operators. Underneath the covers, query operators are just methods that extend types that implement IEnumerable<T> and iterate over the sequence performing some task, such as computing the total number of items in the sequence, computing the average, filtering the results, or ordering them. This article examines how to extend LINQ's functionality by creating your own extension methods. Read on to learn more!
A Quick Primer on Query Methods
In a prior article, The Ins and Outs of Query Operators, we looked at the underlying functionality of query methods. (If you have not yet read that article I strongly suggest reading it before continuing on with this one.) Recall that, in a nutshell, query methods iterate over a sequence of data; specifically, a query operator iterates over a sequence that implements
IEnumerable<T>, which includes arrays, lists, stacks, queues, dictionaries, and ADO.NET-related classes like
DataRowCollection, among many other types.
Query operators can be classified as to what type of value they return. For instance, some standard query operators return a scalar value based on some aggregating function (such as determining the maximum value in the sequence), while others return a single element from the sequence or a new sequence altogether. All query operators can be classified into one of the following categories:
- Aggregate operators - aggregate operators return a scalar value based on some aggregating operation. For instance, the standard query operators Sum and Average are examples of aggregate operators because they return a scalar value (a number) based on an aggregate operation (summing or averaging some value in the underlying sequence).
- Single element operators - query operators that return precisely one element from the underlying sequence. Standard query operators such as Single return a single element.
- Sequence operators - sequence operators return a sequence. The returned sequence may be a subset of the underlying sequence, some modification of the underlying sequence, or an entirely new one. The Where standard query operator is an example of a sequence operator, as it returns the elements in the underlying sequence that match a particular filter criterion.
- Grouping operators - grouping operators return a group of sequences from a single underlying sequence. GroupBy is an example of a grouping standard query operator.
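To make the four categories concrete, here is a small sketch that calls one standard query operator from each category on the same array. The class name, variable names, and sample numbers are made up for illustration; only the operators themselves (Average, Single, Where, and GroupBy) are part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorCategoriesDemo
{
    public static void Main()
    {
        int[] numbers = { 3, 8, 15, 22, 42 };

        // Aggregate operator: returns a scalar computed from the whole sequence.
        double average = numbers.Average();

        // Single element operator: returns exactly one element
        // (Single throws if zero or more than one element matches).
        int onlyOver40 = numbers.Single(n => n > 40);

        // Sequence operator: returns a new sequence (here, the even numbers).
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Grouping operator: returns a sequence of groups, each with a Key.
        IEnumerable<IGrouping<bool, int>> byParity = numbers.GroupBy(n => n % 2 == 0);

        Console.WriteLine("Average: {0}", average);
        Console.WriteLine("Only value over 40: {0}", onlyOver40);
        Console.WriteLine("Even count: {0}", evens.Count());
        foreach (IGrouping<bool, int> group in byParity)
            Console.WriteLine("IsEven = {0}: {1} item(s)", group.Key, group.Count());
    }
}
```

Note how the return type alone tells you the category: a number, a single int, an IEnumerable<int>, or a sequence of IGrouping objects.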
Creating Standard Deviation and Variance Aggregate Query Operators
When writing a SQL query you can utilize a number of aggregate functions, such as SUM. The standard query operators include similar aggregate operators; however, SQL offers a number of statistical aggregate functions not found in the standard query operators, including:
- STDEVP - computes the standard deviation of all values in a specified expression, and
- VARP - computes the variance of all values in a specified expression.
The variance of a sequence of numbers is computed using the following steps; the standard deviation is simply the square root of the variance:
1. Compute the average value of the sequence.
2. For each element in the sequence determine the difference between the number and the average computed in step (1).
3. Square each of the differences determined in step (2) and sum these numbers.
4. Divide the result in step (3) by the number of elements in the sequence.
The following extension methods compute the variance and standard deviation for a sequence of decimal values. (I've included just the C# version of these query operators here in this article; download the code available at the end of this article to see the Visual Basic version of these query operators.)
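A minimal sketch of what these two operators could look like is shown here. It is written from the description in this article rather than copied from the download, so treat the details as assumptions: the class name StatisticsExtensions, the parameter name source, the ToList buffering, and the empty-sequence check are mine, while the method names Variance and StdDeviation and the runningSum variable come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatisticsExtensions
{
    // Population variance: the mean of the squared differences from the average.
    public static decimal Variance(this IEnumerable<decimal> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // Buffer the values once so the sequence is not enumerated twice (assumption).
        List<decimal> values = source.ToList();
        if (values.Count == 0)
            throw new InvalidOperationException("Sequence contains no elements.");

        // Step 1: compute the average with the standard Average operator.
        decimal avg = values.Average();

        // Steps 2 and 3: square each difference from the average and sum the squares.
        decimal runningSum = 0;
        foreach (decimal value in values)
        {
            decimal difference = value - avg;
            runningSum += difference * difference;
        }

        // Step 4: divide by the number of elements.
        return runningSum / values.Count;
    }

    // Standard deviation: the square root of the variance.
    public static decimal StdDeviation(this IEnumerable<decimal> source)
    {
        // Math.Sqrt works on doubles, so the result is only as precise as a double.
        return (decimal)Math.Sqrt((double)source.Variance());
    }
}
```

Because Math.Sqrt only accepts a double, this sketch gives up some of decimal's precision when taking the square root; that trade-off is usually acceptable for display purposes.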
As you can see, these two methods are implemented as extension methods on a type that implements the IEnumerable<decimal> interface. (To compute the standard deviation and variance on sequences of integers, doubles, and other numeric types you'd need to create additional methods that applied to types such as IEnumerable<float>, and so on.) The Variance method starts by computing the average using the Average standard query operator. Next, it enumerates the elements in the underlying sequence and, for each element, determines the square of the difference between the number and the average value. These numbers are summed into a variable named runningSum, which is then divided by the number of elements in the sequence. This is the variance. The StdDeviation method computes the variance of the sequence using the just-defined Variance method and returns its square root.
With these query operators complete they can now be used on any sequence of decimals, much like how the aggregate standard query operators can be used. The download includes a demo that allows the user to type a comma-delimited list of numbers into a textbox. These numbers are split apart and fed into a list of decimal values. The Variance and StdDeviation query operators are then used to display these statistical metrics.
The code snippet below shows how the Variance and StdDeviation query operators can be called like any other query operator on an object that implements IEnumerable<decimal>. The screen shot below the code snippet shows the output when viewed through a browser.
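As a rough console-based sketch of that calling pattern (the actual demo is a web page and is not reproduced here), the following splits a comma-delimited string into a list of decimals and calls both operators on it. The ParseNumbers helper, the class names, and the sample input are hypothetical, and the two operators are repeated in compact form only so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class StatisticsExtensions
{
    // Compact stand-ins for the operators described above.
    public static decimal Variance(this IEnumerable<decimal> source)
    {
        List<decimal> values = source.ToList();
        decimal avg = values.Average();
        // The average of the squared differences is the population variance.
        return values.Average(v => (v - avg) * (v - avg));
    }

    public static decimal StdDeviation(this IEnumerable<decimal> source)
    {
        return (decimal)Math.Sqrt((double)source.Variance());
    }
}

public static class VarianceDemo
{
    // Hypothetical helper: turns "2, 4, 4" into a list of decimal values.
    public static List<decimal> ParseNumbers(string input)
    {
        return input.Split(',')
                    .Select(s => decimal.Parse(s.Trim(), CultureInfo.InvariantCulture))
                    .ToList();
    }

    public static void Main()
    {
        List<decimal> numbers = ParseNumbers("2, 4, 4, 4, 5, 5, 7, 9");

        // Called just like Average, Sum, or any other aggregate operator.
        Console.WriteLine("Variance: {0}", numbers.Variance());
        Console.WriteLine("Standard deviation: {0}", numbers.StdDeviation());
    }
}
```

A real page would also need to guard against blank or non-numeric entries, since decimal.Parse throws a FormatException on input it cannot convert.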
Returning Elements of a Sequence in Random Order
Given a sequence of elements how would you return, say, three random elements? Or all of the elements but in a random order? There is no standard query operator for randomizing a sequence, so let's create one!
As noted in Techniques for Randomly Reordering an Array, it's very easy to write a shuffle algorithm that does not generate truly random shuffles. A naive implementation can easily overweight certain permutations. One fairly simple algorithm that avoids such pitfalls is the Fisher-Yates shuffle. The Fisher-Yates shuffle randomly reorders the elements in an array in such a way that the various permutations are equally likely.
A thorough description of the Fisher-Yates shuffle is beyond the scope of this article; refer to Techniques for Randomly Reordering an Array for more information. When preparing for this article I came across a code sample on StackOverflow by user LukeH that implements the Fisher-Yates shuffle as a query operator. Here is the code (which you can also find in the downloadable demo):
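What follows is my own minimal sketch of a Fisher-Yates shuffle written as a query operator. It is not a verbatim copy of the StackOverflow sample, so the class name, the private iterator method, and the overload that takes a Random are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Convenience overload that creates its own random number generator.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    // Overload that accepts a Random, which makes results repeatable with a seed.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (rng == null) throw new ArgumentNullException("rng");
        return ShuffleIterator(source, rng);
    }

    private static IEnumerable<T> ShuffleIterator<T>(IEnumerable<T> source, Random rng)
    {
        // Copy the sequence so the caller's collection is never modified.
        T[] buffer = source.ToArray();

        for (int i = buffer.Length - 1; i >= 0; i--)
        {
            // Pick a random slot among the elements not yet returned (0..i).
            int j = rng.Next(i + 1);
            yield return buffer[j];

            // Move the last unpicked element into the slot just vacated.
            buffer[j] = buffer[i];
        }
    }
}
```

Each pass picks uniformly from the elements that have not yet been returned, which is what makes every ordering equally likely, at least to the extent that Random itself is uniform. Splitting the argument checks from the iterator means a null argument fails at the call site rather than on first enumeration.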
Another approach for getting the elements of a sequence in random order is to use the
OrderBy standard query operator sorting on a new
GUID. The code for this is much simpler, as it relies on an existing standard query operator:
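A sketch of this simpler approach might look like the following. The method name ShuffleByGuid and the class name are hypothetical; they were chosen here only to keep this version distinct from the Fisher-Yates one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GuidShuffleExtensions
{
    // Orders the elements by a freshly generated GUID, one per element.
    public static IEnumerable<T> ShuffleByGuid<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return source.OrderBy(item => Guid.NewGuid());
    }
}
```

Keep in mind that GUIDs are designed to be unique rather than statistically random, so this version is best thought of as "random enough" for display purposes. It also sorts the sequence, which does more work than the Fisher-Yates approach on large sequences.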
Either one of these two methods will do the trick. The demo available for download includes both.
To get one or more random elements from a sequence using either one of these shuffle query operators you would write code like the following:
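For instance, combining a shuffle with the standard Take operator returns a handful of random elements. In this sketch the names array and class names are made up, and Shuffle is a one-line stand-in (the OrderBy/GUID approach) included only so the sample compiles on its own; a Fisher-Yates version could be dropped in without changing the calling code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleStandIn
{
    // Stand-in shuffle so this sample is self-contained.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.OrderBy(item => Guid.NewGuid());
    }
}

public static class RandomPickDemo
{
    public static void Main()
    {
        string[] names = { "Ann", "Bob", "Cal", "Dee", "Eve" };

        // Three random elements: shuffle, then take the first three.
        foreach (string name in names.Shuffle().Take(3))
            Console.WriteLine(name);

        // All of the elements, but in a random order.
        List<string> everyone = names.Shuffle().ToList();
        Console.WriteLine(string.Join(", ", everyone.ToArray()));
    }
}
```

Because a shuffle returns each element at most once, Take(3) never yields the same element twice.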
In fact, you could very easily create another query method that returns a single random element. You would create an extension method on IEnumerable<T> that returns an object of type T, and could do so with one line of code: return collection.Shuffle().First();. (The demo available for download includes such a query operator.)
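A sketch of such a single-element operator appears below. The name RandomElement is hypothetical and is not necessarily the name used in the download; Shuffle is again a one-line stand-in included so the sample compiles by itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomElementExtensions
{
    // Stand-in shuffle (OrderBy on a new GUID) so this sample is self-contained.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.OrderBy(item => Guid.NewGuid());
    }

    // Hypothetical name: returns one randomly chosen element of the sequence.
    public static T RandomElement<T>(this IEnumerable<T> collection)
    {
        // First throws InvalidOperationException if the sequence is empty.
        return collection.Shuffle().First();
    }
}
```

With this in place, a call such as names.RandomElement() reads like any other single element operator. Swapping First for FirstOrDefault would return the type's default value for an empty sequence instead of throwing.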
To see the Shuffle query operator in action, check out the Shuffle.aspx demo, which is part of the download at the end of this article.
Much like the variance and standard deviation demo, this one prompts the user for input, in this case a sentence. The words in the sentence are then read into an array and the Shuffle query operator is used to randomize their order. Finally, this randomized ordering is displayed.
LINQ offers a variety of standard query operators that can be used to compute aggregate values, retrieve a single element, retrieve a sequence of elements, or even generate groups from a sequence. While these standard query operators may seem mystically powerful, there is no magic going on. Query operators are simply extension methods on types that implement
IEnumerable<T>. As we saw in this article, you can create your own query operators with just a dash of code and a sprinkle of imagination.