Published: Wednesday, October 05, 2005
Displaying Browser-Friendly RSS Feeds
By Scott Mitchell
Introduction
Over the past several years the popularity of website syndication through XML syndication standards like RSS and ATOM
has flourished. Originally, syndication feeds were the domain of technical sites (like 4Guys, which has an
RSS feed of the most recent articles) and blogs. However,
more and more mainstream sites are picking up on the trend, helping information consumers stay up to date with a website's
content.
Typically sites that offer a syndication feed provide a little orange button named XML or RSS that, when clicked, either takes
the user to a complete listing of the site's syndication feeds, or directly to the syndicated content. (A sample of the
orange XML button can be seen on the right.) Since this syndicated
content is simply XML data, the user's browser will display the XML content. This can be confusing to users - they click on
the button and are taken to a page with seemingly cryptic content. This can be particularly confusing to computer novices,
and can even throw more savvy computer users for a loop. (There have been multiple occasions where a 4Guys reader emailed me
thinking he had found a site bug - clicking on the orange XML button displayed the site's content in XML instead of HTML.)
Fortunately this confusion can be alleviated through the use of an XSL stylesheet. With an XSL stylesheet you provide
a file that translates the syndication XML into browser-friendly HTML markup. This transformation happens on the client-side,
in the user's browser. When viewing a site's syndication feed, then, a user will see a visually-pleasing result free from cryptic
XML. However, the underlying XML data still remains, meaning that the site feed still works seamlessly with aggregators.
I recently upgraded the 4Guys syndication feed to provide a more
aesthetically-pleasing output when viewed through a browser. Read on to learn how I accomplished this and how you can turn your
site's feed markup into more browser-friendly output.
A Brief Primer on Site Syndication
Many sites today syndicate their content, which can be integrated into other websites, or can be viewed by
individuals via an assortment of desktop applications. For example, CNet's News.com site
can be syndicated, meaning you can add News.com's
latest headlines to your website or can be notified through an aggregator program when News.com publishes new content.
A plethora of other websites, especially blogs, provide this syndication feature.
The syndicated content is provided as XML typically using one of two standard syndication formats:
- RSS - RSS was the first attempt at site syndication and remains the most popular syndication format in use due to
its simplicity. For more information on RSS be sure to read What
is RSS? A description of the RSS spec can be found at
http://blogs.law.harvard.edu/tech/rss.
- ATOM - ATOM is the proposed successor to RSS, and offers more capabilities than mere syndication. ATOM, however, is
used less frequently than RSS for syndicating content. You can learn more about ATOM at AtomEnabled.org.
This article doesn't intend to serve as an in-depth look at what syndication is or how to syndicate your website's content.
If you need more information on these topics, I'd encourage you to read
Syndicating Your Web Site's Content with RSS
and
Syndicating Your Web Site's Content with RSS and ASP.NET.
There are also articles here on 4Guys on how to programmatically consume an RSS feed and display it in an ASP.NET page - see
A Custom ASP.NET Server Control for Displaying RSS Feeds.
A Brief Primer on XSL Stylesheets
The syndication standards (RSS and ATOM) are simply specifications on how to format data into an XML format. That is,
both RSS and ATOM syndication feeds use XML to express their data. (If you are unfamiliar with what XML is, be sure
to check out the XML Tutorials on XMLFiles.com.) The challenge that faces us
is that web browsers are built to render HTML, not arbitrary XML content. If you visit a syndication feed directly
through your browser you will see this XML content, unformatted, as shown in the screenshot below.
Clearly this unformatted XML is an eyesore and potentially very confusing to a user who is unfamiliar with syndication feeds,
XML content, and so on. What we need to be able to do, then, is transform the RSS or ATOM XML content into an HTML form
that the browser knows how to display. XSL is a technology that translates XML from an original form into another form. Hence
we can use XSL to transform the RSS or ATOM XML into HTML.
XSL transformations can occur in one of two places - on the web server or on the client (thereby having the transformation performed
by the visitor's browser). In this case we want the transformation to happen on the client because we want to be able to provide
just one version of the site syndication feed. If we insisted on doing the transformation on the server, we'd either have to
have two copies of the site's feed - one in the native XML and one in the transformed HTML - or we'd have to be able to detect
whether the request for the XML content was coming from a browser or from an aggregator. (Remember, the aggregator software
wants the raw XML; furthermore, many browsers serve as aggregators themselves, so this check cannot likely be performed reliably.)
Hence, we need to be able to provide just one raw XML feed with instructions on how to translate it to HTML in case it's being
visited by a browser.
This involves two steps:
- Creation of an XSL stylesheet that translates the raw XML site feed into HTML, and
- A means to associate that XSL stylesheet with the raw XML site feed.
The first task will require us to create a new file whose content is the XSL stylesheet that creates the appropriate HTML markup
based on the feed's XML data. This will take a bit of work and will need to be customized based on what syndication format you
use (my example is a stylesheet for RSS version 2.0). The second task, though, is simple - all you have to do is add a
single line to the syndication feed that points to the XSL stylesheet. When a web browser visits the syndication feed it
will see this line and apply the XSL stylesheet, displaying the HTML-formatted content. Aggregators, however, will ignore this stylesheet, and simply parse
the raw XML data.
This article presents an XSL stylesheet for transforming RSS version 2.0 markup into HTML. It does not, however, provide
a thorough discussion on XSL's syntax or semantics. For more information on XSL be sure to check out the following resources:
Creating an XSL Stylesheet for Translating RSS 2.0 into HTML
When working with XSL keep in mind that you're simply transforming existing XML data into some other form. When turning
RSS XML into HTML we need to know what portions of the XML document we want to work with and where these portions should be
squirted into the resulting HTML output. It is imperative, then, that you be familiar with the XML structure you're transforming;
if you're new to RSS take a moment to review the RSS 2.0 specification at
http://blogs.law.harvard.edu/tech/rss.
The stylesheet I created has two general portions: a head, which lists the title of the RSS feed along with some instructions/information
for the user; and a details section, which lists each of the <item> element details. Let's examine each
of these portions of the XSL document separately, starting with the header portion:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="/rss/channel/title" /></title>
        <style type="text/css">
          ... CSS styles ...
        </style>
        <script type="text/javascript"
                src="http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js" />
      </head>
      <body onload="go_decoding();">
        <div id="cometestme" style="display:none;">
          <xsl:text disable-output-escaping="yes">&amp;amp;</xsl:text>
        </div>
        <div align="center">
          <div id="headerInfo">
            <h1><xsl:value-of select="/rss/channel/title" /></h1>
            <p align="center" id="headerText">This page is the syndication feed for
              <b><xsl:value-of select="/rss/channel/title" /></b>. You can subscribe
              to this feed using an aggregator program and be kept abreast of the
              latest 4GuysFromRolla.com articles. ...</p>
          </div>
        </div>
        ... Details section removed ... will be examined shortly! ...
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Note that the XSL stylesheet starts with an <xsl:stylesheet> element. Inside of that root element
are <xsl:template> elements, which attempt to match nodes from the original XML data. For each match,
the content in the <xsl:template> element is emitted. There's only one <xsl:template> element
in this XSL stylesheet, and it is set to match against the root of the original XML document (/). When
this match is made (and it will be made exactly once, since every XML document has exactly one root),
the appropriate HTML markup is emitted. Note that we can grab the value of a particular XML element from the original
XML document using the <xsl:value-of> element. For example, in the HTML <title>
tag the RSS feed's title is displayed using <xsl:value-of select="/rss/channel/title" />.
The three parts worth noting in this header section are: the <script> tag in the <head>
section; the <div> with id="cometestme" at the start of the <body>; and
the call to the go_decoding() JavaScript function in the onload event handler in the
<body> tag.
The client-side script, which you can view at http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js,
was written by Sean M. Burke and provides the disable-output-escaping feature for browsers that don't support it. (We'll
talk more about disable-output-escaping in a bit.) The <div> with id="cometestme" is used
by the script to test whether or not the browser supports disable-output-escaping. The call to go_decoding()
on page load kicks off the disable-output-escaping script.
Remember: XSL Translates XML From One Form to Another
When tinkering with the XSL stylesheet don't forget that XSL is designed to translate a valid, well-formed XML document
into another valid, well-formed XML document. Therefore, the HTML markup within the XSL stylesheet must be well-formed
according to XML's formatting rules. That is, tags must be nested properly, the tag names are case sensitive, attribute
values must be quoted, and so on.
Let's now look at the details section of the XSL stylesheet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      ...
      <body onload="go_decoding();">
        ... Header section removed ...
        <div align="center">
          <div id="feedItems">
            <xsl:for-each select="/rss/channel/item">
              <div class="rssItem">
                <h2 class="rssTitle">
                  <a href="{link}"><xsl:value-of select="title" /></a>
                </h2>
                <div name="decodeable" class="rssDescription">
                  <xsl:value-of select="description" disable-output-escaping="yes" />
                </div>
              </div>
            </xsl:for-each>
          </div>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The body section uses an <xsl:for-each> element to loop through each of the <item> elements
in the original XML document. For each <item> element the title is displayed in an <h2>
and the description in a <div> beneath it.
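For readers more comfortable with script than with XSL, the loop above is roughly analogous to the following JavaScript sketch. The renderItems and escapeHtml names are hypothetical, and the sketch assumes the feed's items have already been parsed into plain objects with title, link, and description properties. In the real page the transformation is performed by the browser's XSL engine, not by script.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the same markup the <xsl:for-each> emits,
// one "rssItem" block per feed item.
function renderItems(items) {
  return items
    .map(function (item) {
      return (
        '<div class="rssItem">' +
        '<h2 class="rssTitle"><a href="' + escapeHtml(item.link) + '">' +
        escapeHtml(item.title) + "</a></h2>" +
        // The description is written out as-is, which mirrors
        // disable-output-escaping="yes" in the stylesheet.
        '<div name="decodeable" class="rssDescription">' + item.description + "</div>" +
        "</div>"
      );
    })
    .join("");
}
```

The title and link are escaped while the description is not, which is the same distinction the stylesheet draws between its two <xsl:value-of> elements.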
Note that the description element, emitted using
<xsl:value-of select="description" disable-output-escaping="yes" />, has disable-output-escaping enabled.
What disable-output-escaping does is prevent the browser's XSL engine from escaping the markup characters in the content it
emits. That is, imagine that our description has the content: "This is a &lt;b&gt;great article&lt;/b&gt;." By default, the browser
will display that as so: "This is a <b>great article</b>." However, we want to see "This is a
great article." with the words "great article" rendered in bold. By setting disable-output-escaping="yes", we instruct the
browser's XSL parsing engine to not escape the output, so the <b> tag is treated as markup rather than displayed as text.
One issue with this, however, is that not all browsers support disable-output-escaping. That is where the script we
discussed earlier comes in. The
client-side script at http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js first grabs the
<div> with id="cometestme" to see if the browser supports disable-output-escaping.
If the browser does not support disable-output-escaping, then the script searches for HTML
elements with name="decodeable" and makes the necessary changes. (That's why the <div>
that contains the <xsl:value-of select="description" disable-output-escaping="yes" /> has name="decodeable" -
so that browsers that don't support disable-output-escaping will still display the description's markup unescaped.)
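To make the decoding step concrete, here is a minimal JavaScript sketch of the kind of work such a fallback has to do. It is not the code in disableOutputEscaping.js; the decodeEntities name is hypothetical, and the sketch handles only the five predefined XML entities.

```javascript
// Hypothetical helper: turn escaped markup such as "&lt;b&gt;" back into "<b>".
// Each entity is decoded exactly once, so "&amp;lt;" becomes "&lt;", not "<".
function decodeEntities(text) {
  var entities = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return text.replace(/&(lt|gt|amp|quot|apos);/g, function (match, name) {
    return entities[name];
  });
}
```

In a browser that ignores disable-output-escaping, a function along these lines could be applied to the escaped markup of each element with name="decodeable", with the result assigned back so that the browser parses it as HTML.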
Wiring Up the XSL Stylesheet to the Site's Feed
The final step is to associate the XSL stylesheet we just created with the website's syndication feed. This requires just
adding a single line to the top of the syndication feed's emitted markup:
<?xml-stylesheet type="text/xsl" href="urlToXSLstyleSheet" version="1.0"?>
With that, when visiting the site's feed through a browser the browser will see the stylesheet reference, pull down the
XSL stylesheet, and make the transformation, displaying the resulting HTML markup. Aggregators, on the other hand, will ignore
the XSL stylesheet and just work with the site's feed's XML data. The screenshot below shows the 4Guys site feed using
the XSL stylesheet we just examined when viewed through a browser.
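If your feed is generated by code rather than stored as a static file, the stylesheet reference can be added when the feed is written out. The sketch below, in JavaScript for illustration only, inserts the processing instruction after the XML declaration. The addStylesheetReference name is hypothetical, and the sketch assumes the feed is available as a string and that the stylesheet URL contains no double quotes.

```javascript
// Hypothetical helper: add an xml-stylesheet processing instruction to a feed.
// Assumes xslUrl contains no double quotes.
function addStylesheetReference(feedXml, xslUrl) {
  var instruction = '<?xml-stylesheet type="text/xsl" href="' + xslUrl + '"?>';
  // The XML declaration, when present, must stay at the very start of the
  // document, so the instruction goes immediately after it.
  var declaration = /^<\?xml\s[^?]*\?>\s*/.exec(feedXml);
  if (declaration) {
    var end = declaration[0].length;
    return feedXml.slice(0, end) + instruction + "\n" + feedXml.slice(end);
  }
  // No declaration: the instruction can simply lead the document.
  return instruction + "\n" + feedXml;
}
```

Either way the instruction ends up ahead of the feed's root element, which is where a browser looks for it.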
You can view everything I did in this article directly through your browser. Want to see the full XML for the 4Guys site
feed? Visit http://aspnet.4guysfromrolla.com/rss/rss.aspx and
then do a View/Source. You can find the XSL stylesheet online at http://aspnet.4guysfromrolla.com/rss/rss2html.xsl.
Finally, the disableOutputEscaping.js JavaScript file can be grabbed at
http://aspnet.4guysfromrolla.com/rss/disableOutputEscaping.js.
Conclusion
In this article we saw how to display an ugly, cryptic syndication feed in a browser in a more vibrant and attractive light.
This was accomplished by using an XSL stylesheet that was written to correctly transform an RSS 2.0 feed into HTML. (You can,
of course, tweak the stylesheet to adjust the appearance of your site's HTML output; also, if you use a different syndication
format you'll likely need to change the XSL stylesheet, matching on the appropriate nodes.) With just a little bit of work
you can make your site's syndication feeds less confusing to users who may accidentally stumble upon them.
Happy Programming!