An Extensive Examination of LINQ: Querying and Searching XML Documents Using LINQ to XML
By Scott Mitchell
XML is an increasingly popular way to encode documents, data, and electronic messages. Over the years Microsoft has offered a variety of libraries to facilitate creating, modifying, querying, and searching XML documents. LINQ to XML is a relatively new set of XML-related classes in the .NET Framework (found in the System.Xml.Linq namespace), which enable developers to work with XML documents using LINQ's features, syntax, and semantics. As discussed in an earlier article, Introducing LINQ to XML, LINQ to XML is a simpler and easier-to-use API than previous libraries. Because LINQ to XML can utilize LINQ's query syntax and assortment of standard query operators, LINQ to XML code is usually very terse and readable.
This article continues our look at LINQ to XML. Specifically, we explore how to query XML documents using axis methods as well as how to search and filter XML documents using both LINQ's Where method and XPath expressions. Read on to learn more!
Note: If you have not yet read Introducing LINQ to XML please do so before reading this article...
Retrieving Child Elements
As discussed in Introducing LINQ to XML, the most frequently used class in the LINQ to XML API is the XElement class, which represents an XML element. This class is used when programmatically constructing an XML document, when loading an XML document, and when searching, filtering, or otherwise enumerating the elements within an XML document. Its Load method loads an XML document from disk or over the Internet and returns the root of the just-loaded document. The Value property returns the concatenated text content of the element and of its descendants.
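To make the behavior of the Value property concrete, here is a minimal sketch. The class name, the method name, and the <mfr> element are assumptions made for illustration; only the "Avocado Dip" name comes from this article.

```csharp
using System;
using System.Xml.Linq;

public static class ValueSketch
{
    // Hypothetical fragment: a <food> element with two text-bearing children.
    public static string ConcatenatedValue()
    {
        XElement food = XElement.Parse(
            "<food><name>Avocado Dip</name><mfr>Sunnydale</mfr></food>");

        // Value joins the text of <name> and <mfr> with no separator.
        return food.Value;
    }

    public static void Main()
    {
        Console.WriteLine(ConcatenatedValue()); // prints "Avocado DipSunnydale"
    }
}
```

Because the text of every descendant is run together, Value is most useful on leaf elements such as <name>.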
When working with an XML document we are often interested in a particular element or attribute value or a particular subset of elements and attribute values. The XElement object has a number of helpful methods that we can use to retrieve such data. Let's start by looking at two of the most commonly used methods, Elements and Element. The Elements method returns all of the child elements of the current element. You can optionally pass in an element name, in which case only those child elements with a matching name are returned. The Element method requires a name as an input parameter and returns the first child element with that name.
The Elements and Element methods - along with a number of other methods we'll be examining in this article - are referred to as axis methods and operate relative to the current node. To hammer home this point, let's look at an example. For this example and others in this article I will be using an XML file named NutritionInfo.xml. This XML file can be found in the demo available for download at the end of this article.
The NutritionInfo.xml document contains nutritional information about a variety of food items. Here is a snippet of this XML document:
The <nutrition> element is the root element and contains a single <daily-values> child element, which spells out the recommended daily allotments for the various nutritional metrics provided by each <food> item. Note that there is only one <daily-values> element. Following this sole <daily-values> element there are a number of <food> elements that spell out the nutritional information for a number of food items. The snippet above shows a single <food> element describing the nutritional information for Avocado Dip.
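The following sketch illustrates how the Elements method behaves against a document with this shape. The inline XML is a cut-down stand-in for NutritionInfo.xml: the element names follow the description above, but the <sodium> entry, the second food item, and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ElementsSketch
{
    // Stand-in for NutritionInfo.xml (assumed content, not the real file).
    public const string SampleXml =
        "<nutrition>" +
          "<daily-values><sodium units='mg'>2400</sodium></daily-values>" +
          "<food><name>Avocado Dip</name></food>" +
          "<food><name>Bagel</name></food>" +
        "</nutrition>";

    // Names of every direct child of the given element, in document order.
    public static List<string> ChildNames(XElement root)
    {
        return root.Elements().Select(e => e.Name.LocalName).ToList();
    }

    // Number of direct children named <food>.
    public static int FoodCount(XElement root)
    {
        return root.Elements("food").Count();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);
        Console.WriteLine(string.Join(", ", ChildNames(root)));
        Console.WriteLine(FoodCount(root));

        // <name> is a grandchild of the root, so Elements("name") finds nothing.
        Console.WriteLine(root.Elements("name").Any());
    }
}
```

The last call is the important one: axis methods such as Elements only look one level beneath the element they are called on.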
Now, imagine that we wanted to retrieve the name of the first food item in the XML document. To accomplish this we'd need to start by loading the XML document. Recall that the Load method returns the root of the document as an XElement object (in this example, the <nutrition> element).
Now that we have a reference to the root we can get the first <food> element by calling root.Element("food"). This syntax says, in English, "Get me the root's first child element named <food>." (If there are no <food> child elements then root.Element("food") will return null.) Once we have the first <food> element we can get its <name> element using the same syntax, this time calling Element("name") on the <food> element.
Note that to get the <name> element we call the <food> element's Element method. Had we called root.Element("name") we'd get back a null value because the root element does not have any <name> child elements (it only has <daily-values> and <food> child elements).
Now that we have the <name> element (of the first <food> element) we can get its text value ("Avocado Dip", in this example) by using its Value property.
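Put together, the steps above might be sketched as follows. To stay self-contained this sketch parses an inline stand-in document with XElement.Parse rather than calling XElement.Load on the NutritionInfo.xml file; the class and method names are assumptions.

```csharp
using System;
using System.Xml.Linq;

public static class FirstFoodNameSketch
{
    // Cut-down stand-in for NutritionInfo.xml (assumed content).
    public const string SampleXml =
        "<nutrition>" +
          "<daily-values />" +
          "<food><name>Avocado Dip</name></food>" +
        "</nutrition>";

    public static string FirstFoodName(XElement root)
    {
        // First <food> child of the root (null if there is none).
        XElement firstFood = root.Element("food");

        // First <name> child of that <food> element.
        XElement name = firstFood.Element("name");

        return name.Value;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);

        Console.WriteLine(FirstFoodName(root));          // prints "Avocado Dip"
        Console.WriteLine(root.Element("name") == null); // prints "True"
    }
}
```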
Another important class in the LINQ to XML API is the XAttribute class, which represents an XML attribute. The XElement class has two methods that return XAttribute objects:
Attribute(attributeName) - returns an XAttribute object for a specific attribute, and
Attributes - two overloads; the first accepts no input parameters and returns all attributes of the XElement; the second overload accepts an attribute name and returns a collection of attributes of the XElement with a matching name.
The XAttribute class has a Value property, which returns the value of the attribute.
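A short sketch of these two methods follows. The <calories> element and its total and fat attributes are described later in this article; the fat value of 100 and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeSketch
{
    // Assumed sample element; only total="110" is taken from the article.
    public const string SampleXml = "<calories total='110' fat='100' />";

    // Attribute(name) returns a single XAttribute (or null if it is missing).
    public static string Total(XElement calories)
    {
        return calories.Attribute("total").Value;
    }

    // Attributes() returns every attribute on the element.
    public static string Summary(XElement calories)
    {
        return string.Join(", ",
            calories.Attributes().Select(a => a.Name.LocalName + "=" + a.Value));
    }

    public static void Main()
    {
        XElement calories = XElement.Parse(SampleXml);

        Console.WriteLine(Total(calories));                    // prints "110"
        Console.WriteLine(Summary(calories));                  // prints "total=110, fat=100"
        Console.WriteLine(calories.Attributes("fat").Count()); // prints "1"
    }
}
```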
Let's look at using the Attribute method to retrieve calorie information for the first food item (Avocado Dip). The XML document records calorie information using a <calories> element with two attributes - total and fat - which give the total calories and the calories from fat, respectively. To retrieve the total calories programmatically we could chain the calls together: root.Element("food").Element("calories").Attribute("total").Value. The calories from fat are retrieved the same way, using Attribute("fat").
This syntax, I think, is pretty readable. For example, to get the total calories for the first food item we say, "Hey, root, give me your first <food> element and then, from that, give me the first <calories> element and then from that get the total attribute and then give me its value." In the case of Avocado Dip, this returns a value of "110".
While the above syntax is quite terse and readable, it does make a number of presumptions - namely that there will be at least one <food> child element under the root, that that <food> element will have a <calories> child, and that the <calories> element will have a total attribute specified. If any of these elements or attributes are missing the above code will throw a NullReferenceException because the Element and Attribute methods return null when no match is found. To more safely query the XML document you would need to get the pieces one at a time and ensure that a null value was not returned; the code in the demo available for download has a sample of this more careful syntax.
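The more careful approach might look something like the sketch below. This is not the code from the demo download; it is one possible way to check each step, and the class name, method names, and inline sample XML are assumptions.

```csharp
using System;
using System.Xml.Linq;

public static class SafeCaloriesSketch
{
    // Cut-down stand-in for NutritionInfo.xml (assumed content).
    public const string SampleXml =
        "<nutrition>" +
          "<food><name>Avocado Dip</name><calories total='110' fat='100' /></food>" +
        "</nutrition>";

    // Terse version: throws NullReferenceException if any piece is missing.
    public static string TotalCaloriesChained(XElement root)
    {
        return root.Element("food").Element("calories").Attribute("total").Value;
    }

    // Careful version: checks each step and returns null when a piece is missing.
    public static string TotalCaloriesOrNull(XElement root)
    {
        XElement food = root.Element("food");
        if (food == null) return null;

        XElement calories = food.Element("calories");
        if (calories == null) return null;

        XAttribute total = calories.Attribute("total");
        return total == null ? null : total.Value;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);
        Console.WriteLine(TotalCaloriesChained(root)); // prints "110"

        XElement empty = XElement.Parse("<nutrition />");
        Console.WriteLine(TotalCaloriesOrNull(empty) == null); // prints "True"
    }
}
```

Returning null is only one design choice; depending on the application it may be more appropriate to return a default value or to report which piece was missing.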
Returning Descendant and Ancestor Elements
The Element and Elements methods only search the set of child elements. For XML documents specifying a hierarchical structure, such as the XML format of the Web.sitemap file, there may be elements with the same name buried at arbitrary depths. To search across all descendants of the current node (and not just children) use the Descendants method. The Descendants method has two overloads. The first accepts no input parameters and returns all descendant elements. The second accepts a name and returns only those descendants whose name matches.
The following snippet of code shows how to use the Descendants method to determine how many <siteMapNode> elements exist in the Web.sitemap file. (If you are unfamiliar with the Web.sitemap file, it is an XML-formatted file that developers can create to define a logical structure to their site. Once defined, navigation web controls like the Menu or TreeView can be used to display this site structure. The Web.sitemap file is composed of an arbitrary number of <siteMapNode> elements, where each <siteMapNode> element represents a section on the site. These elements can be (and often are) nested. See Examining ASP.NET's Site Navigation for more information on this file and ASP.NET's site map functionality.)
The code here is a little bit more involved than previous examples because the Web.sitemap file uses XML namespaces. If you examine the Web.sitemap file you'll find that its root element (<siteMap>) defines a namespace named "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0":
Querying an XML document that uses namespaces requires that the namespaces be included in the querying syntax. This is accomplished by creating an XNamespace object that specifies the namespace name and then including it as part of the name in the XElement's methods. In the above example this is accomplished by creating an XNamespace object named siteMapNS and then including it when calling the Descendants method: root.Descendants(siteMapNS + "siteMapNode").
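A self-contained sketch of this technique follows. The inline site map is a made-up example (its URLs and titles are assumptions), as are the class and method names; the namespace name and the siteMapNS variable come from the text above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SiteMapCountSketch
{
    // Hypothetical Web.sitemap content with nested <siteMapNode> elements.
    public const string SampleSiteMap =
        "<siteMap xmlns='http://schemas.microsoft.com/AspNet/SiteMap-File-1.0'>" +
          "<siteMapNode url='~/Default.aspx' title='Home'>" +
            "<siteMapNode url='~/About.aspx' title='About' />" +
            "<siteMapNode url='~/Products.aspx' title='Products'>" +
              "<siteMapNode url='~/Widgets.aspx' title='Widgets' />" +
            "</siteMapNode>" +
          "</siteMapNode>" +
        "</siteMap>";

    public static int CountSiteMapNodes(XElement root)
    {
        // The namespace must be part of the element name being searched for.
        XNamespace siteMapNS = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";
        return root.Descendants(siteMapNS + "siteMapNode").Count();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleSiteMap);

        Console.WriteLine(CountSiteMapNodes(root));                 // prints "4"
        Console.WriteLine(root.Descendants("siteMapNode").Count()); // prints "0"
    }
}
```

The second call shows what happens when the namespace is left out: the name no longer matches, so no elements are returned.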
Along with the Descendants method, the XElement class also offers an Ancestors method. This method is the inverse of Descendants - rather than returning the elements (or matching elements) beneath the element it returns the parent element, the grandparent element, and so forth, all the way up to the root. See the demo available for download for an example using the Ancestors method.
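As a rough illustration of Ancestors, the sketch below walks up from a nested element in a small, made-up site map. To keep it short the sample omits the XML namespace; the titles and the class and method names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AncestorsSketch
{
    // Hypothetical, namespace-free site map with three nested sections.
    public const string SampleXml =
        "<siteMap>" +
          "<siteMapNode title='Home'>" +
            "<siteMapNode title='Products'>" +
              "<siteMapNode title='Widgets' />" +
            "</siteMapNode>" +
          "</siteMapNode>" +
        "</siteMap>";

    // Titles of the <siteMapNode> ancestors of the node with the given title,
    // starting with the parent and working up toward the root.
    public static List<string> AncestorTitles(XElement root, string title)
    {
        XElement node = root.Descendants("siteMapNode")
                            .First(n => (string)n.Attribute("title") == title);

        return node.Ancestors("siteMapNode")
                   .Select(a => (string)a.Attribute("title"))
                   .ToList();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);
        Console.WriteLine(string.Join(" > ", AncestorTitles(root, "Widgets")));
        // prints "Products > Home"
    }
}
```

Calling Ancestors() with no name would also include the <siteMap> root element itself.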
Searching / Filtering an XML Document
Because the LINQ to XML API gives us full access to LINQ's standard query operators, searching or filtering an XML document is very straightforward. As discussed in previous installments of this article series, the Where extension method can operate on an enumeration and filter certain elements out of that enumeration using lambda expressions.
For example, use the following Where clause to retrieve only those food items with less than 300 total calories:
The code here says, in English, "Give me all <food> child elements off the root and then only return those whose total attribute has a value less than 300." Bear in mind that in the lambda expression in the Where method we are dealing with XElement objects; in other words, each f here is an XElement that represents a particular <food> element in the XML document. Consequently, to retrieve the calorie information for each <food> element we use f.Element("calories") to get a reference to the <calories> element and then Attribute("total").Value to get the value of the total attribute. The Value property returns a string, so we need to convert this string into a decimal value in order to compare it to a numeric value, in this case 300.
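A filter along these lines might be sketched as follows. The sample data (other than Avocado Dip's 110 total calories), the class and method names, and the choice of decimal.Parse for the conversion are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class WhereFilterSketch
{
    // Cut-down stand-in for NutritionInfo.xml (assumed content).
    public const string SampleXml =
        "<nutrition>" +
          "<food><name>Avocado Dip</name><calories total='110' fat='100' /></food>" +
          "<food><name>Bagel</name><calories total='300' fat='35' /></food>" +
          "<food><name>Cheesecake</name><calories total='420' fat='250' /></food>" +
        "</nutrition>";

    // Names of the <food> items whose total calories are below the limit.
    public static List<string> FoodsUnder(XElement root, decimal maxCalories)
    {
        return root.Elements("food")
            // Value is a string, so convert it before the numeric comparison.
            .Where(f => decimal.Parse(
                            f.Element("calories").Attribute("total").Value,
                            CultureInfo.InvariantCulture) < maxCalories)
            .Select(f => f.Element("name").Value)
            .ToList();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);
        Console.WriteLine(string.Join(", ", FoodsUnder(root, 300))); // prints "Avocado Dip"
    }
}
```

Note that the comparison is strict, so an item with exactly 300 total calories is excluded.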
In addition to searching and filtering XML documents using the LINQ standard query operators you can use XPath expressions.
XPath is a standardized syntax for filtering XML documents. To filter documents using XPath expressions use the
XPathSelectElements method, which is an extension method defined in the
System.Xml.XPath namespace. The following example uses an XPath expression to return only those food items with less than 300 calories:
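An XPath-based version of the same filter might look like the sketch below. The XPath expression shown is one reasonable way to express the condition and is not necessarily the one used in the demo; the sample data and the class and method names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathFilterSketch
{
    // Cut-down stand-in for NutritionInfo.xml (assumed content).
    public const string SampleXml =
        "<nutrition>" +
          "<food><name>Avocado Dip</name><calories total='110' fat='100' /></food>" +
          "<food><name>Bagel</name><calories total='300' fat='35' /></food>" +
          "<food><name>Cheesecake</name><calories total='420' fat='250' /></food>" +
        "</nutrition>";

    // Names of the <food> items with fewer than 300 total calories.
    public static List<string> FoodsUnder300(XElement root)
    {
        // The expression is evaluated relative to the root element:
        // select <food> children whose <calories> total attribute is below 300.
        return root.XPathSelectElements("food[calories/@total < 300]")
                   .Select(f => f.Element("name").Value)
                   .ToList();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(SampleXml);
        Console.WriteLine(string.Join(", ", FoodsUnder300(root))); // prints "Avocado Dip"
    }
}
```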
Personally, I prefer using LINQ's standard query operators. Using the standard query operators and lambda expressions you get IntelliSense and compile-time checking. Moreover, the same standard query operators can be used with LINQ to Objects, LINQ to SQL, LINQ to Entities, or any other LINQ provider. XPath expressions, on the other hand, are opaque strings. There is no compile-time checking - you need to actually execute the code to see if the XPath expression is valid and returns the expected results. And XPath's syntax is specific to XML.
Check out the demo for more searching and filtering code examples. The demo includes a web page that allows the user to search for food items that meet a variety of criteria, including upper bounds for the calories, grams of fat, and milligrams of sodium, as well as the presence of certain vitamins or minerals. The screen shot below shows this page from the demo in action and includes code showing how to filter using the standard query operators and using XPath expressions.
At this point we have examined how to create, query, and filter XML documents using the LINQ to XML API. In a future installment we'll see how to edit existing XML documents by modifying existing values and by adding and removing XML elements.
Until then... Happy Programming!