By Scott Mitchell
Over the past several years the popularity of website syndication through XML syndication standards like RSS and ATOM has flourished. Originally, syndication feeds were the domain of technical sites (like 4Guys, which has an RSS feed of the most recent articles) and blogs. However, more and more mainstream sites are picking up on the trend, helping information consumers stay up to date with a website's content.
Typically sites that offer a syndication feed provide a little orange button named XML or RSS that, when clicked, either takes the user to a complete listing of the site's syndication feeds, or directly to the syndicated content. (A sample of the orange XML button can be seen on the right.) Since this syndicated content is simply XML data, the user's browser will display the XML content. This can be confusing to users - they click on the button and are taken to a page with seemingly cryptic content. This can be particularly confusing to computer novices, and can even throw more savvy computer users for a loop. (There have been multiple occasions where a 4Guys reader emailed me thinking he had found a site bug - clicking on the orange XML button displayed the site's content in XML instead of HTML.)
Fortunately this confusion can be alleviated through the use of an XSL stylesheet. With an XSL stylesheet you provide a file that translates the syndication XML into browser-friendly HTML markup. This transformation happens on the client-side, in the user's browser. When viewing a site's syndication feed, then, a user will see a visually-pleasing result free from cryptic XML. However, the underlying XML data still remains, meaning that the site feed still works seamlessly with aggregators.
I recently upgraded the 4Guys syndication feed to provide a more aesthetically-pleasing output when viewed through a browser. Read on to learn how I accomplished this and how you can turn your site's feed markup into more browser-friendly output.
A Brief Primer on Site Syndication
Many sites today syndicate their content, which can be integrated into other websites, or can be viewed by individuals via an assortment of desktop applications. For example, CNet's News.com site can be syndicated, meaning you can add News.com's latest headlines to your website or can be notified through an aggregator program when News.com publishes new content. A plethora of other websites, especially blogs, provide this syndication feature.
The syndicated content is provided as XML typically using one of two standard syndication formats:
- RSS - RSS was the first attempt at site syndication and remains the most popular syndication format in use due to its simplicity. For more information on RSS be sure to read What is RSS? A description of the RSS spec can be found at http://blogs.law.harvard.edu/tech/rss.
- ATOM - ATOM is the proposed successor to RSS, and offers more capabilities than mere syndication. ATOM, however, is used less frequently than RSS for syndicating content. You can learn more about ATOM at AtomEnabled.org.
A Brief Primer on XSL Stylesheets
The syndication standards (RSS and ATOM) are simply specifications on how to format data into an XML format. That is, both RSS and ATOM syndication feeds use XML to express their data. (If you are unfamiliar with what XML is, be sure to check out the XML Tutorials on XMLFiles.com.) The challenge that faces us is that web browsers are built to render HTML, not arbitrary XML content. If you visit a syndication feed directly through your browser you will see this XML content, unformatted, as shown in the screenshot below.
Clearly this unformatted XML is an eyesore and potentially very confusing to a user who is unfamiliar with syndication feeds, XML content, and so on. What we need to be able to do, then, is transform the RSS or ATOM XML content into an HTML form that the browser knows how to display. XSL is a technology that translates XML from an original form into another form. Hence we can use XSL to transform the RSS or ATOM XML into HTML.
XSL transformations can occur in one of two places - on the web server or on the client (thereby having the transformation performed by the visitor's browser). In this case we want the transformation to happen on the client because we want to be able to provide just one version of the site syndication feed. If we insisted on doing the transformation on the server, we'd either have to have two copies of the site's feed - one in the native XML and one in the transformed HTML - or we'd have to be able to detect whether the request for the XML content was coming from a browser or from an aggregator. (Remember, the aggregator software wants the raw XML; furthermore, many browsers serve as aggregators themselves, so this check likely cannot be performed reliably.) Hence, we need to be able to provide just one raw XML feed with instructions on how to translate it to HTML in case it's being visited by a browser.
This involves two steps:
- Creation of an XSL stylesheet that translates the raw XML site feed into HTML, and
- A means to associate that XSL stylesheet with the raw XML site feed.
This article presents an XSL stylesheet for transforming RSS version 2.0 markup into HTML. It does not, however, provide a thorough discussion of XSL's syntax or semantics. For more information on XSL be sure to check out the following resources:
- Using XSL Stylesheets to Translate XML into HTML
- What is XSLT and how does it relate to XML?
- XSL Tutorials on XMLFiles.com
When working with XSL keep in mind that you're simply transforming existing XML data into some other form. When turning RSS XML into HTML we need to know what portions of the XML document we want to work with and where these portions should be squirted into the resulting HTML output. It is imperative, then, that you be familiar with the XML structure you're transforming; if you're new to RSS take a moment to review the RSS 2.0 specification at http://blogs.law.harvard.edu/tech/rss.
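As a reminder of the structure being transformed, here is a minimal sketch of an RSS 2.0 document. The site name, URLs, and item text are made-up placeholders, not the 4Guys feed; only the element names come from the RSS 2.0 format.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <!-- Channel-level information about the feed as a whole -->
    <title>Example Site Articles</title>
    <link>http://www.example.com/</link>
    <description>The most recent articles from Example Site.</description>

    <!-- One <item> element per syndicated entry -->
    <item>
      <title>First Article</title>
      <link>http://www.example.com/articles/first.html</link>
      <!-- HTML inside a description is escaped so the feed stays well-formed -->
      <description>This is a &lt;b&gt;great article&lt;/b&gt;.</description>
    </item>
    <item>
      <title>Second Article</title>
      <link>http://www.example.com/articles/second.html</link>
      <description>A plain-text summary of the second article.</description>
    </item>
  </channel>
</rss>
```

The paths an XSL stylesheet would select against this document are /rss/channel/title for the feed title and /rss/channel/item for the individual entries.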
The stylesheet I created has two general portions: a header, which lists the title of the RSS feed along with some instructions/information for the user; and a details section, which lists the details of each <item> element. Let's examine each of these portions of the XSL document separately, starting with the header portion.
Note that the XSL stylesheet starts with an <xsl:stylesheet> element. Inside of that root element are <xsl:template> elements, which attempt to match nodes from the original XML data. For each match, the content in the <xsl:template> element is emitted. There's only one <xsl:template> in this XSL stylesheet, and it is set to match against the original XML content's root element. When this match is made (and it will be made exactly once, since all well-formed XML documents have exactly one root element), the appropriate HTML markup is emitted. Note that we can grab the value of a particular XML element from the original XML document using the <xsl:value-of> element. For example, in the HTML <title> tag the RSS feed's title is displayed using <xsl:value-of select="/rss/channel/title" />.
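The structure described above can be sketched as follows. This is a minimal illustration rather than the actual 4Guys stylesheet: the match="/" pattern, the heading, and the explanatory paragraph are assumptions, and the client-side script wiring is left out.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the XSL processor to serialize the result as HTML -->
  <xsl:output method="html" />

  <!-- The stylesheet's only template; "/" matches the root of the feed,
       so this template fires exactly once -->
  <xsl:template match="/">
    <html>
      <head>
        <!-- Pull the feed's title out of the source XML -->
        <title><xsl:value-of select="/rss/channel/title" /></title>
      </head>
      <body>
        <h1><xsl:value-of select="/rss/channel/title" /></h1>
        <!-- Illustrative instructions for a visitor who clicked the XML button -->
        <p>
          This page is a syndication feed. Copy its address into an
          aggregator program to be notified of new content.
        </p>
        <!-- The details section, one block per <item>, would follow here -->
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```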
The three parts worth noting in this header section are: the <script> tag in the <head>; the element with id="cometestme" at the start of the <body>; and the call to the script's startup function from the onload event handler in the <body> tag. The client-side script, which you can view at http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js, was written by Sean M. Burke and provides the disable-output-escaping feature for browsers that don't support it. (We'll talk more about disable-output-escaping in a bit.) The element with id="cometestme" is used by the script to test whether or not the browser supports disable-output-escaping. The onload call kicks off the disable-output-escaping script when the page loads.
|Remember: XSL Translates XML From One Form to Another|
|When tinkering with the XSL stylesheet don't forget that XSL is designed to translate a valid, well-formed XML document into another valid, well-formed XML document. Therefore, the HTML markup within the XSL stylesheet must be well-formed according to XML's formatting rules. That is, tags must be nested properly, the tag names are case sensitive, attribute values must be quoted, and so on.|
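To make the well-formedness requirement concrete, here is a small illustrative fragment; the file names and text are placeholders. The markup in the comment is the kind of loose HTML a browser tolerates in an ordinary page but that an XML parser will reject inside a stylesheet.

```xml
<!-- Tolerated by browsers as HTML, but NOT well-formed XML
     (mismatched case, unquoted attribute values, unclosed tags):

       <P>See the <a href=archive.html>archive</a>.<br>
       <img src=logo.gif>

     A well-formed equivalent that can safely appear in a stylesheet: -->
<div>
  <p>See the <a href="archive.html">archive</a>.<br /></p>
  <img src="logo.gif" alt="Site logo" />
</div>
```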
Let's now look at the details section of the XSL stylesheet:
The body section uses an <xsl:for-each> element to loop through each of the <item> elements in the original XML document. For each <item> element the title is displayed first, with the description in a <div> beneath it.
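The loop just described might look something like the sketch below. It is an illustration under assumptions, not the article's actual listing: the <h3> heading and the link around the title are choices made for this example, and the surrounding page markup is pared down to the minimum.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />

  <xsl:template match="/">
    <html>
      <body>
        <!-- Visit every <item> under the feed's <channel>, in document order -->
        <xsl:for-each select="/rss/channel/item">
          <!-- Inside the loop, paths are relative to the current <item> -->
          <h3>
            <!-- {link} is an attribute value template: it inserts the item's <link> text -->
            <a href="{link}"><xsl:value-of select="title" /></a>
          </h3>
          <!-- The description sits in a <div> beneath the title -->
          <div>
            <xsl:value-of select="description" />
          </div>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```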
Note that the description element, emitted using <xsl:value-of select="description" disable-output-escaping="yes" />, has disable-output-escaping enabled. What disable-output-escaping does is prevent the browser from HTML encoding the XML content it receives. That is, imagine that our description has the content: "This is a <b>great article</b>." By default, the browser will display that literally, tags and all: "This is a <b>great article</b>." However, we want to see "This is a great article." with the words "great article" rendered in bold. By setting disable-output-escaping="yes", we instruct the browser's XSL parsing engine to not escape the output, so the <b> tag is treated as markup rather than displayed as text. One issue with this, however, is that not all browsers support disable-output-escaping. That is where the script we discussed earlier comes in. The client-side script at http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js first grabs the element with id="cometestme" to see if the browser supports disable-output-escaping. If the browser does not support disable-output-escaping, then the script searches for HTML elements with name="decodeable" and makes the necessary changes. (That's why the <div> that contains the <xsl:value-of select="description" disable-output-escaping="yes" /> has name="decodeable": so that browsers that don't support disable-output-escaping will still render the description's HTML correctly.)
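The difference between the two behaviors can be seen side by side in the fragment below, which is meant to sit inside an <xsl:for-each> over the feed's items. It is a sketch: the comments show what an XSL processor that honors disable-output-escaping would be expected to emit for the sample description used above.

```xml
<!-- Source feed contains:
     <description>This is a &lt;b&gt;great article&lt;/b&gt;.</description> -->

<!-- Default behavior: the text is escaped again on output,
     so the browser shows the <b> tags literally -->
<div>
  <xsl:value-of select="description" />
</div>
<!-- emitted as: <div>This is a &lt;b&gt;great article&lt;/b&gt;.</div> -->

<!-- Escaping disabled: the text is written out as-is, so the browser
     treats <b> as markup. name="decodeable" marks the element for the
     fallback script in browsers that ignore disable-output-escaping -->
<div name="decodeable">
  <xsl:value-of select="description" disable-output-escaping="yes" />
</div>
<!-- emitted as: <div name="decodeable">This is a <b>great article</b>.</div> -->
```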
Wiring Up the XSL Stylesheet to the Site's Feed
The final step is to associate the XSL stylesheet we just created with the website's syndication feed. This requires adding just a single line, an xml-stylesheet processing instruction that points to the stylesheet, to the top of the syndication feed's emitted markup.
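A feed with the stylesheet reference in place might begin as shown below. The relative href of rss2html.xsl assumes the stylesheet sits in the same directory as the feed, and the channel content is placeholder text; adjust both for your own site.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The processing instruction goes after the XML declaration
     and before the root <rss> element -->
<?xml-stylesheet type="text/xsl" href="rss2html.xsl"?>
<rss version="2.0">
  <channel>
    <title>Example Site Articles</title>
    <link>http://www.example.com/</link>
    <description>The most recent articles from Example Site.</description>
    <!-- <item> elements follow as usual -->
  </channel>
</rss>
```

Because the reference is a processing instruction rather than an element, it does not change the RSS content itself.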
With that, when visiting the site's feed through a browser the browser will see the stylesheet reference, pull down the XSL stylesheet, and make the transformation, displaying the resulting HTML markup. Aggregators, on the other hand, will ignore the XSL stylesheet and just work with the feed's XML data. The screenshot below shows the 4Guys site feed using the XSL stylesheet we just examined when viewed through a browser.
You can view everything I did in this article directly through your browser. Want to see the full XML for the 4Guys site feed? Visit http://aspnet.4guysfromrolla.com/rss/rss.aspx and then choose View Source in your browser. You can find the XSL stylesheet online at http://aspnet.4guysfromrolla.com/rss/rss2html.xsl.
In this article we saw how to display an ugly, cryptic syndication feed in a browser in a more vibrant and attractive light. This was accomplished by using an XSL stylesheet that was written to correctly transform an RSS 2.0 feed into HTML. (You can, of course, tweak the stylesheet to adjust the appearance of your site's HTML output; also, if you use a different syndication format you'll likely need to change the XSL stylesheet, matching on the appropriate nodes.) With just a little bit of work you can make your site's syndication feeds less confusing to users who may accidentally stumble upon them.