Published: Wednesday, December 03, 2003
The XHTML Way, Part 2
By Vlad Alexander
Read Part 1
In Part 1 we looked at the differences between HTML 4, XHTML 1.0, and XHTML 1.1,
as well as why XHTML provides a superior option when creating content-managed Web sites involving numerous content providers
and Web developers. In this second and final part, we'll look at an XHTML 1.1 WYSIWYG editor and discuss some ideas
to keep in mind when working on HTML 4 sites today in order to make a smooth transition to XHTML in the future.
The New Content Managers
In the last few years, the responsibility for managing Web-based content has rapidly and irrevocably shifted
from programmers to non-technical business users, the vast majority of whom have no understanding of technical
standards or best practices. This is understandable since it is not their role. However, most business authors use
WYSIWYG editors to manage content and those editors present a major headache for developers, since they consistently
generate vast amounts of non-compliant code that severely limits the availability and portability of data.
The contentEditable interface found in Internet Explorer is by far the most popular WYSIWYG interface and numerous vendors have
written wrappers for it, producing in-browser WYSIWYG editors, some of which are really well done. However, at
their core these are all HTML 4 editors, with the major limitation that they fuse data with formatting. True,
a handful of vendors have attempted to clean up HTML 4 code to the level of XHTML 1.0 Transitional, but all flavors
of XHTML 1.0 still tolerate inline formatting, which we've seen is the core of the problem. Virtually all
HTML 4/XHTML 1.0 WYSIWYG editors have this weakness. So what would an XHTML 1.1 WYSIWYG editor look like?
The first thing that strikes you when you see an XHTML 1.1 WYSIWYG editor, such as XStandard,
is the absence of formatting tools like color pickers or font selectors.
(Note: The author works for the company that develops XStandard.) While these tools are popular
in word processors, they are dangerous in WYSIWYG editors because they permit users to make arbitrary formatting decisions
that compromise the look and feel of Web sites. They also format content for its appearance, not its semantic meaning.
This can be seen in the example below, where it is impossible to attach any additional meaning (semantics) to
"War And Peace". The only information you can attach to "War And Peace" is formatting details that are probably only
useful in this context.
By contrast, XHTML 1.1 WYSIWYG editors permit authors to attach real meaning to data. For example, the simple
drop-down menu seen below allows authors to choose from a list of semantically rich names (called Styles) when
formatting content. This means that Styles like "Quote" or "Character" not only determine how content will be
formatted, they also attach meaning that can be useful for parsing and reusing data.
Behind each Style name are instructions for the type of markup to create. For example, a Style called "Underline"
could create <span class="underline"> markup, and a Style called "Chapter Title" could create
<h1 class="title"> markup. Formatting would then be applied to the content through external or
embedded CSS.
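As a rough illustration of the Style-to-markup mapping just described, here is a minimal Python sketch. The STYLES table and the apply_style function are hypothetical names invented for this example; they are not how XStandard or any other editor is actually implemented.

```python
from xml.sax.saxutils import escape

# Hypothetical Style table: each Style name maps to an element and a class.
# The two entries mirror the "Underline" and "Chapter Title" examples in the text.
STYLES = {
    "Underline": ("span", "underline"),
    "Chapter Title": ("h1", "title"),
}

def apply_style(style_name, text):
    """Wrap text in the markup that the named Style stands for."""
    tag, css_class = STYLES[style_name]
    # escape() converts &, < and > so the generated markup stays well formed.
    return '<%s class="%s">%s</%s>' % (tag, css_class, escape(text), tag)

print(apply_style("Chapter Title", "War And Peace"))
# prints: <h1 class="title">War And Peace</h1>
```

The appearance itself would live in a style sheet rule such as h1.title, so the look can change later without touching the content.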
Plan Ahead – Future-Proof Your Data
Whether you're ready to make your Web site XHTML 1.1 compliant or prefer to stay with an older spec like
HTML 4 for a little longer, you can start protecting your data today by capturing it as XHTML 1.1. By all means, continue
to lay out your site in HTML 4, but maintain your content in XHTML 1.1. Content authored today in XHTML 1.1
can be integrated into page layouts that meet earlier specs like HTML 4, and can also be inserted into new layouts when
you upgrade to more recent specs. XSLT can also be used to transform XHTML 1.1 into just about any text-based format.
There's simply no reason not to start authoring content today in XHTML 1.1.
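Part of what makes this reuse possible is that XHTML 1.1 content is well-formed XML, so any XML parser can read it. The sketch below uses Python's standard xml.etree.ElementTree module rather than XSLT, and it is only an illustration: the CONTENT fragment and the texts_by_class helper are made up for this example, with class names borrowed from the samples in this article.

```python
import xml.etree.ElementTree as ET

# Illustrative content fragment; the class names come from examples in this article.
CONTENT = """<div>
  <h1 class="title">War And Peace</h1>
  <p><span class="important">Stock market up 10%</span></p>
</div>"""

def texts_by_class(xhtml, class_name):
    """Return the text of every element whose class attribute equals class_name."""
    root = ET.fromstring(xhtml)
    return ["".join(el.itertext())
            for el in root.iter()
            if el.get("class") == class_name]

print(texts_by_class(CONTENT, "title"))      # prints: ['War And Peace']
print(texts_by_class(CONTENT, "important"))  # prints: ['Stock market up 10%']
```

Content marked up with font tags and inline colors offers nothing comparable to query, which is the practical argument for capturing meaning in element and class names.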
Rules Of The Road
The following are some rules for future-proofing the most valuable aspect of your Web site - your data. These rules
are important because they separate data from formatting and turn your data into XHTML.
- Don't use formatting to convey meaning. If you change the foreground color of text to green, there has got to
be a good reason for doing this. At the very least, use a generic tag like <span> or <div> with a meaningful
class name. For example: <span class="important">Stock market up 10%</span>.
- Use CSS for formatting. Only external or embedded CSS, never inline CSS.
- Make sure that markup is well formed. This means all tags must be properly nested and non-empty tags must have
a closing tag. For example: <p>Hello World!</p>
- Empty elements like <br>, <hr>, <img>, etc. must end with a />. You can insert a space before the slash for
backwards compatibility with older browsers. For example: <br /> and <hr />.
- Get into the habit of writing all element and attribute names in lowercase. XHTML 1.0 and later versions require this.
- Make sure all attribute values have quotes around them. For example: <table width="100%">.
- Use semantically rich tags like <acronym>, <abbr>, <dfn>, <code>, <strong>, <em>. For <acronym>,
<abbr> and <dfn>, use the title attribute to describe the contents.
For example: <dfn title="A program intended to enhance the operation of a parent application.">plug-in</dfn>
(See http://www.w3.org/TR/xhtml1/ for more rules.)
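Several of these rules can be checked mechanically. The following Python sketch is one possible way to do so with the standard xml.etree.ElementTree parser; the check_fragment function is a hypothetical helper written for this article, not part of any tool mentioned here. It is not a validator: it checks well-formedness, lowercase names, and inline CSS only, and because no DTD is loaded it will also report named entities such as &nbsp; as errors.

```python
import xml.etree.ElementTree as ET

def check_fragment(fragment):
    """Return a list of rule violations found in an XHTML content fragment."""
    try:
        # Wrap the fragment so that several top-level elements are allowed.
        root = ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError as err:
        # Bad nesting, missing end tags, <br> without "/", unquoted attributes.
        return ["not well formed: %s" % err]
    problems = []
    for el in root.iter():
        if el.tag != el.tag.lower():
            problems.append("uppercase element name: " + el.tag)
        for name in el.attrib:
            if name != name.lower():
                problems.append("uppercase attribute name: " + name)
            if name == "style":
                problems.append("inline CSS on <%s>" % el.tag)
    return problems

print(check_fragment('<p>Hello World!</p>'))
# prints: []
print(check_fragment('<span style="color: green">Stock market up 10%</span>'))
# prints: ['inline CSS on <span>']
print(check_fragment('<table width=100%></table>'))
# prints a one-item list whose message starts with "not well formed:"
```

A check like this could run whenever content is saved, so that problems are caught before they reach the content store.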
Conclusion
Moving to XHTML 1.1 does not have to be an immediate either/or choice. XHTML can co-exist with HTML. But there
is one thing you should stop doing as soon as you can: stop using inline formatting. Use external or embedded CSS
instead. This one rule will at least extend the life of your data and ensure its future availability.
Happy Programming!
Vlad Alexander is a software engineer at Belus Technology, makers of
XStandard. He is one of the developers of Apple Computer's original
Apple Store and has worked on projects for the US Department of Defense (Washington, DC), Platinum Technology (Chicago, IL),
Delta Dental (Seattle, WA) and SHS.com (Anacortes, WA).