An Extensive Examination of LINQ: An Introduction to LINQ
By Scott Mitchell
A Multipart Series on LINQ
This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.
LINQ, or Language INtegrated Query, is a set of classes added to the .NET Framework 3.5 along with language enhancements added to C# 3.0 and Visual Basic 9, the versions of the languages that ship with Visual Studio 2008. LINQ adds a rich, standardized query syntax as a first-class citizen in .NET programming languages, allowing developers to interact with any type of data.
Consider a typical data-driven application. There may be times when you are working with a database, displaying records or editing, inserting, and deleting data. Certain parts of the application may require retrieving certain elements from an XML file, or constructing an XML file based on user input. Or perhaps you have a collection of objects returned from a business object that you now want to work with by sorting them, computing the average value of a particular numeric property, and displaying only those objects that meet a specified criterion.
Prior to LINQ, working with each data source required writing a different style of code. Moreover, working with external resources like databases, XML files, and the like typically involved communicating with that external resource in some syntax specific to that resource. To retrieve data from a database you need to send it a string that contains the SQL query to execute; likewise, working with a subset of the XML elements in an XML document involves specifying an XPath expression in the form of a string. With LINQ, the idea is that you can work with disparate data sources using a similar style, without having to know a separate syntax for communicating with each data source (e.g., SQL or XPath) and without having to resort to passing opaque strings to external resources.
This article is the first in a series of articles that explores the goals of LINQ, its underpinnings, its syntax, and LINQ providers like LINQ to Objects, LINQ to XML, LINQ to SQL, and so forth. This inaugural article offers an overview of LINQ, looks at some simple examples of using the LINQ classes and syntax, and examines the core LINQ classes in the .NET Framework. Read on to learn more!
The Case for LINQ
Many applications use an external resource in some form or another, the most common one being a database. Because of the physical and logical separation between the runtime executing a program and the external resource, there are bound to be a number of extra steps that the developer working with the resource has to perform. What's more, the information passed to the external resource and the information received from it usually must undergo some transformation. The extra work involved in communicating with an external resource is best seen through an example. Imagine that you were working on a data-driven web application and were in the midst of building the Data Access Layer (DAL), working on a routine that sent a query to the database, populated the results in a collection of business objects, and returned this collection. The code for this method might look like the following:
In order to send a query to the database we must first establish a connection to the database. We then must encode the logic - the SQL query, its parameters, and the parameters' values - into strings that are supplied to the SqlCommand object. And because these inputs are encoded into opaque data (strings, for instance), there is no compile-time error checking and very limited debugging support. For example, if a typo in the SELECT query causes the table name to be misspelled, the error won't surface until runtime, when this page is visited. (And typos are easy to make, seeing as there's no IntelliSense support.) Ideally, Visual Studio would display an error message alerting us to the incorrect table name when building the application. Another mismatch between the programming language and the database is that the data returned by the database is transformed for us into objects accessible through the SqlDataReader, but these objects are not strongly typed, as we would like them to be. To get this data into strongly-typed objects we must write code ourselves that enumerates the database results and populates each record into a corresponding object.
LINQ was designed to address the issues illustrated by the example above. LINQ aims to offer a unified syntax for working with data, be it data from a database, an XML file, or a collection of objects. With LINQ you don't need to know the intricacies of SQL, the ins and outs of XPath, or the various ways to work with a collection of objects. All you need be familiar with is LINQ's classes and the associated language enhancements centered around LINQ. This leads into another design goal of LINQ: to add first-class constructs to C# and Visual Basic that allow for SQL SELECT-like syntax for querying any data source. In other words, LINQ aims to move the query out of an opaque string and into keywords in the language, a move that allows for type safety, IntelliSense support, compile-time error checking, and enhanced debugging scenarios.
How Is LINQ Implemented?
LINQ was introduced in the .NET Framework 3.5 (the Visual Studio 2008 cycle) and is composed of three main components:
- Standard Query Operators - a set of extension methods in the .NET Framework that can be used to work with any collection of objects that implements the IEnumerable<T> interface. A class that implements the IEnumerable<T> interface must provide an enumerator for iterating over a collection of a specific type (T). All arrays inherently implement IEnumerable<T>, as do most of the built-in collection objects, like Dictionary<K,T> and so forth. Using these operators you can: filter the results; perform aggregate operations like sum, min, max, and average; join two collections based on matching keys; order the results; group the results; determine the total number of elements in the collection; and so forth.
Specifically, these extension methods are defined in the System.Linq namespace in the System.Core.dll assembly. We'll look at an example of using the standard query operators later on in this article.
- Language Extensions - to make the standard query operators easier to use, and to offer a more SQL-like syntax in C# and Visual Basic, Microsoft added a number of
new extensions to C# 3.0 and Visual Basic 9. These extensions include implicitly typed variables, anonymous types, object initializers, and lambda expressions; each extension
will be explored in detail in future installments. What's important to understand is that these extensions are simply syntactic sugar. Behind the scenes, the
compiler converts the syntax made possible by these enhancements into calls to the standard query operators. We'll look at an example of using the language extensions
later on in this article.
- LINQ Providers - it is possible to create a class known as a LINQ Provider that takes a LINQ query, examines it, and dynamically generates a method that executes
an equivalent query against a specific data source. The .NET Framework ships with four LINQ Providers: LINQ to Objects, which executes a LINQ query against a collection
of objects; LINQ to XML, for querying XML documents; LINQ to SQL, which allows LINQ queries to operate against a Microsoft SQL Server database; and LINQ to DataSets, which
executes LINQ queries against ADO.NET DataSets.
In addition to these four providers there are other LINQ Providers available. Microsoft has created a LINQ Provider that operates against its Entity Framework, for example, as well as one that operates against ADO.NET Data Services. And many open-source projects and third-party companies that offer some sort of data store or middle-tier library for working with data have a LINQ Provider so that LINQ queries can be executed against their data store or against their middle-tier implementation. For example, there's a LINQ Provider for NHibernate, an open-source Object/Relational Mapping (O/RM) tool.
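To make the first bullet above more concrete, here is a minimal C# sketch of several of the operations it lists: filtering, counting, averaging, ordering, grouping, and joining. The class name, the Scores array, and the Grades dictionary are hypothetical and invented for this illustration; the arrow syntax (s => s >= 70) is the lambda expression syntax mentioned in the second bullet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorTour
{
    // Hypothetical sample data, invented for this illustration.
    static readonly int[] Scores = { 72, 95, 88, 61, 95 };
    static readonly Dictionary<int, string> Grades = new Dictionary<int, string>
    {
        { 61, "D" }, { 72, "C" }, { 88, "B" }
    };

    // Filter, then count: how many scores are 70 or higher?
    public static int PassingCount()
    {
        return Scores.Where(s => s >= 70).Count();
    }

    // Aggregate: the mean of all scores (Sum, Min, and Max are called the same way).
    public static double AverageScore()
    {
        return Scores.Average();
    }

    // Order: highest score first.
    public static int[] HighestFirst()
    {
        return Scores.OrderByDescending(s => s).ToArray();
    }

    // Group: how many times does each distinct score occur?
    public static Dictionary<int, int> Frequency()
    {
        return Scores.GroupBy(s => s).ToDictionary(g => g.Key, g => g.Count());
    }

    // Join: pair each score with the dictionary entry whose key matches it.
    // Scores with no matching key (95 here) drop out of the result.
    public static string[] MatchedGrades()
    {
        return Scores.Join(Grades, s => s, g => g.Key, (s, g) => g.Value).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine("Passing: {0}", PassingCount());
        Console.WriteLine("Average: {0}", AverageScore());
        Console.WriteLine("Highest: {0}", HighestFirst()[0]);
        Console.WriteLine("95 occurs {0} times", Frequency()[95]);
        Console.WriteLine("Grades:  {0}", string.Join(", ", MatchedGrades()));
    }
}
```

Both the array and the dictionary work here because each implements IEnumerable<T>; the dictionary is enumerated as a sequence of key/value pairs.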
Using the Standard Query Operators Against a Collection of Objects
The standard query operators are implemented as a number of extension methods on the IEnumerable<T> interface. That means that if we have an object that implements IEnumerable<T> at our disposal we can use the variety of standard query operators to work with that collection. As noted earlier, all arrays in .NET implement IEnumerable<T>. Therefore, let's take an array and practice using these standard query operators.
Let's start with a simple example. (All of the LINQ examples examined in this tutorial are provided in both C# and VB code and are available for download at the end of this article.) The following code creates an array that contains the first nine Fibonacci numbers. Two of the standard query operators are then used: Count() and Average(), which return the number of elements in the collection and the average value of the elements in the collection, respectively. These values are then displayed in a Label.
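A minimal C# sketch of this example follows. It assumes a console application in place of the ASP.NET page and Label, and the class and method names are hypothetical; only the fibNum array name, the two operators, and the message text come from the article.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class FibonacciAverage
{
    // The first nine Fibonacci numbers.
    static readonly int[] fibNum = { 1, 1, 2, 3, 5, 8, 13, 21, 34 };

    public static string Describe()
    {
        int count = fibNum.Count();        // number of elements in the array
        double average = fibNum.Average(); // mean of the elements

        // The invariant culture keeps the decimal separator a period on any machine.
        return string.Format(CultureInfo.InvariantCulture,
            "The first {0} elements of Fibonacci sequence have an average value of {1:N2}!",
            count, average);
    }

    public static void Main()
    {
        // Stand-in for assigning the message to a Label's Text property.
        Console.WriteLine(Describe());
    }
}
```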
The page, when visited through a browser, displays the output: "The first 9 elements of Fibonacci sequence have an average value of 9.78!"
As evidenced by the example above, the Count() and Average() standard query operators do not require any input parameters. Other operators, such as the Where operator, require an input; in the case of the Where operator you must supply information as to how the data is to be filtered. But what sort of input parameter would a Where operator require? Imagine you were writing a function that was supposed to filter a collection of objects based on a filtering condition supplied through an input parameter to the function. How would you write that function, given that you don't even know what type of collection of objects you are going to be filtering in the first place?
In short, the Where operator must accept a function as input. The Where operator will then call the passed-in function, passing it each element in the collection of objects and asking that function, "Should this item be included in the results?" Similarly, many other standard query operators require that a function be passed in as an input parameter. For example, the OrderBy operator must be passed a function that indicates what field each object in the collection is to be sorted by. It's a complex concept to wrap your head around at first.
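One way to picture this is to write a simplified filter by hand. The sketch below is a hypothetical stand-in for Where, not the framework's actual implementation: the Filter method, the IsLong test, and the sample words are all invented for illustration. The method knows nothing about the element type; it simply calls the function it was given once per element.

```csharp
using System;
using System.Collections.Generic;

public static class FilterSketch
{
    // Hypothetical stand-in for Where: returns the elements for which
    // the supplied function answers true.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> include)
    {
        foreach (T item in source)
        {
            if (include(item))
                yield return item;
        }
    }

    // An ordinary named method that serves as the filtering condition.
    static bool IsLong(string word)
    {
        return word.Length > 4;
    }

    public static void Main()
    {
        string[] words = { "LINQ", "query", "XML", "provider" };

        // IsLong is passed as a value; Filter invokes it for every word.
        foreach (string word in Filter<string>(words, IsLong))
            Console.WriteLine(word); // prints "query" and then "provider"
    }
}
```

Because the filtering logic arrives as a parameter, the same Filter method works for strings, numbers, or business objects without modification.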
.NET has long allowed developers to reference functions as a variable of sorts and to pass that variable to methods. Doing so involves creating the function as you normally
would and then creating a delegate that references the function. This delegate can then be passed around and used to invoke the function it points to. Some of the
language extensions added in C# 3.0 and Visual Basic 9 were added specifically to make it possible to tersely create a function so that it can be called or passed into another
method in just one line of code. (As noted earlier, we'll explore these language extensions in greater detail in future installments.) You can see this new syntax in action
in the following example, which uses the Where operator to filter the first nine Fibonacci numbers and compute the average of only the odd numbers.
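A minimal C# sketch of this example follows, again assuming a console application rather than the ASP.NET page; the class and method names are hypothetical. For comparison it also shows the same filter written as a C# 2.0-style anonymous method, the more verbose way of creating a delegate inline.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class OddFibonacciAverage
{
    // The first nine Fibonacci numbers.
    static readonly int[] fibNum = { 1, 1, 2, 3, 5, 8, 13, 21, 34 };

    // Lambda expression: "n => n % 2 == 1" is the function handed to Where.
    public static double WithLambda()
    {
        return fibNum.Where(n => n % 2 == 1).Average();
    }

    // The same filter as a C# 2.0-style anonymous method.
    public static double WithAnonymousMethod()
    {
        return fibNum.Where(delegate(int n) { return n % 2 == 1; }).Average();
    }

    public static string Describe()
    {
        return string.Format(CultureInfo.InvariantCulture,
            "Of the first {0} elements of Fibonacci sequence the odd numbers have an average value of {1:N2}!",
            fibNum.Count(), WithLambda());
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```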
The output of the code, when viewed through a browser, is: "Of the first 9 elements of Fibonacci sequence the odd numbers have an average value of 7.33!"
Most of the code is the same as the first example, but on the line where the average is computed I've added a call to the Where operator. The Where operator expects a function that takes as input an element of the collection being filtered and returns a Boolean indicating whether the element belongs in the return set. In short, the function passed into the Where operator returns true if the number being enumerated MOD 2 equals 1. The operation X MOD Y (written X % Y in C#) returns the remainder of X / Y, so X MOD 2 returns 0 if X is even and 1 if X is odd.
Also note how the standard query operators can be chained together, one after the other. I have fibNum.Where(function).Average(), which first applies the Where operator to the fibNum array and then takes the resulting filtered set and applies the Average operator to that.
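The chain can also be written one step at a time, which makes the intermediate result visible. The sketch below is illustrative only: the class and method names are hypothetical, and the second method (using OrderByDescending, Take, and Sum) is an added example of a longer chain rather than something taken from the article's listings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainingSketch
{
    // The first nine Fibonacci numbers.
    static readonly int[] fibNum = { 1, 1, 2, 3, 5, 8, 13, 21, 34 };

    // The chain split into its two steps.
    public static double StepByStep()
    {
        // Where hands back another IEnumerable<int>: 1, 1, 3, 5, 13, 21
        IEnumerable<int> odds = fibNum.Where(n => n % 2 == 1);

        // Average is then applied to that filtered sequence.
        return odds.Average();
    }

    // A longer chain: filter, sort descending, keep the first three, total them.
    public static int SumOfThreeLargestOdds()
    {
        return fibNum.Where(n => n % 2 == 1)
                     .OrderByDescending(n => n)
                     .Take(3)
                     .Sum(); // 21 + 13 + 5
    }

    public static void Main()
    {
        Console.WriteLine(StepByStep());
        Console.WriteLine(SumOfThreeLargestOdds());
    }
}
```

Chaining works because each operator that returns a sequence returns an IEnumerable<T>, which is exactly what the next operator extends.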
Using the Language Extensions to Write More SQL-Like Queries
In addition to the standard query operators, Microsoft added a host of language extensions to C# and Visual Basic to allow for a more SQL-like syntax in working with LINQ. We'll explore this new syntax in a future article. For now I want to just show the syntax so you can see and appreciate how LINQ does truly offer SQL-like syntax in C# and Visual Basic. The following example is the same as the last one - it takes the first nine Fibonacci numbers and computes the average of the odd numbers - but does so using the language extensions rather than the extension methods.
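A minimal C# sketch of the query syntax follows, with the equivalent extension-method call alongside it for comparison. The class and method names are hypothetical. Note that C# spells the keywords in lowercase (from, where, select), whereas Visual Basic capitalizes them (From, Where, Select).

```csharp
using System;
using System.Linq;

public static class QuerySyntaxSketch
{
    // The first nine Fibonacci numbers.
    static readonly int[] fibNum = { 1, 1, 2, 3, 5, 8, 13, 21, 34 };

    // Query syntax: reads much like a SQL SELECT statement.
    public static double WithQuerySyntax()
    {
        var oddNumbers = from n in fibNum
                         where n % 2 == 1
                         select n;

        return oddNumbers.Average();
    }

    // The equivalent call written directly against the standard query operators.
    public static double WithMethodSyntax()
    {
        return fibNum.Where(n => n % 2 == 1).Average();
    }

    public static void Main()
    {
        Console.WriteLine(WithQuerySyntax());
        Console.WriteLine(WithMethodSyntax());
    }
}
```

Both methods return the same value, since the compiler translates the query syntax into standard query operator calls.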
These language extensions offer full IntelliSense support, compile-time type and syntax checking, and debugging capabilities. As you can probably surmise by inspecting the code, the From statement enumerates a specified collection of objects (fibNum, in this case) and can have other operators applied to it, such as Where.
This article is the first in a series of articles on LINQ, its syntax, its uses, and LINQ Providers, like LINQ to XML and LINQ to SQL. In this installment we looked at the motivation for LINQ and its key design goals. We then talked about the three main cornerstones of LINQ - the standard query operators, the language extensions, and LINQ providers. This was followed by a look at some simple examples using the standard query operators and language extensions.
We've only begun to scratch the surface of LINQ, and I'm sure that this article has raised many more questions than it has answered. In the upcoming tutorials we will explore the C# and Visual Basic language enhancements that make LINQ possible: extension methods, anonymous types, lambda expressions, and so forth. Once we have a solid understanding of these concepts we'll be ready to delve into how to use LINQ to query databases, XML files, and other data stores.