News & Views: What is XML, Anyway?

News & Views Software Review


By Michael Hendry

Originally published in News & Views May 2000 issue.

Copyright 2000 STC-Philadelphia Metro Chapter. For permission to reprint this article, contact the Managing Editor.


I will address one of my pet peeves first by saying what XML is not: XML is not a database front end. Regardless of what the tool vendors are trying to tell us, XML is for marking up text. It's ours, and don't let the programmers take it away from us. Now the good news: XML is easy!

So what is it?

XML (eXtensible Markup Language) is not a markup language; it is a language to create markup languages (called a "metalanguage"). I could start talking about it being an offshoot of SGML, but I won't. Just use XML to clearly describe the logical structure of your document. Worry about the rest later.

Notice the underlined words. Markup tags can have three meanings: structure, style, and semantics.

  • structure tags convey how the tagged element fits into the hierarchy of the document
  • style tags specify how the element should be displayed
  • semantic tags convey the meaning of the element

HTML is a hybrid of structure and style tags and isn't very good at conveying either. For example, in HTML, this article would use tags like <BODY>, <H1>, <H2>, <I>, <P>, and <UL>. XML tags combine structure and semantics; in XML, this article could use tags like <TechnologyReview>, <Title>, <Author>, <BodyText>, and <TagList>. The result: XML files are human-readable! The official XML specification says "Terseness in XML markup is of minimal importance." Don't be terse! Write your tags so they clearly convey the structure and meaning of your document.
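To make the tag names above concrete, here is a minimal sketch that marks up a fragment of this article and prints its outline, using Python's standard xml.etree.ElementTree module. Only the five tag names come from the text; the nesting, the <Tag> child element, and the sample content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for this article. Only the tag names TechnologyReview,
# Title, Author, BodyText, and TagList come from the text; the nesting and
# the Tag element are assumed for illustration.
ARTICLE_XML = """<?xml version="1.0"?>
<TechnologyReview>
  <Title>What is XML, Anyway?</Title>
  <Author>Michael Hendry</Author>
  <BodyText>Markup tags can have three meanings.</BodyText>
  <TagList>
    <Tag>structure</Tag>
    <Tag>style</Tag>
    <Tag>semantics</Tag>
  </TagList>
</TechnologyReview>
"""


def outline(xml_text):
    """Return (depth, tag) pairs in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(element, depth):
        result.append((depth, element.tag))
        for child in element:
            walk(child, depth + 1)

    walk(root, 0)
    return result


if __name__ == "__main__":
    # Print each tag indented by its depth in the hierarchy.
    for depth, tag in outline(ARTICLE_XML):
        print("  " * depth + tag)
```

The printed outline is just the tag names, indented by depth, which is the point: the tags alone tell you what each piece of the document is.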

So how do I use it?

First map out the structure of the document you want to code. Then start writing tags. Follow these simple rules and you will have well-formed XML:

  • put in the XML header
  • match all start tags with an end tag (note: tags are case sensitive)
  • nest your tags properly (Microsoft users beware!); don't let elements overlap
  • use "entity references" for special characters: "&amp;" instead of "&"
  • make sure all your attribute values are in quotes
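One way to try out the rules in the list above is to hand small samples to an XML parser and see which ones it rejects. The sketch below does that with Python's standard xml.etree.ElementTree module. The sample strings and the is_well_formed helper are made up for this example. Note that this parser accepts a document without the XML header, so that rule is not exercised here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one good document, then one violation per rule.
SAMPLES = {
    "good": '<?xml version="1.0"?><Note kind="tip"><B>Fish &amp; chips</B></Note>',
    "missing end tag": "<Note><B>text</Note>",
    "case mismatch": "<Note>text</note>",
    "overlapping elements": "<B><I>text</B></I>",
    "bare ampersand": "<Note>Fish & chips</Note>",
    "unquoted attribute": "<Note kind=tip>text</Note>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        verdict = "well-formed" if is_well_formed(text) else "rejected"
        print(label, "->", verdict)
```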

Now I recommend you get a good book on XML to understand the fine points of these rules, but hopefully this will allay any fears you may have.

How will my XML documents look?

Now here's the rub. You have to use a style sheet. Although it seems with HTML that you don't need a style sheet, you do use one: if not your own, then the one built into the browser. But if you are making up tags as you go along, the browser is not going to know what to do with them. XML can make use of Cascading Style Sheets (CSS) or eXtensible Stylesheet Language (XSL).
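As a rough sketch of the CSS route, the Python script below writes an XML file and a matching CSS file side by side. The XML points at the style sheet with an xml-stylesheet processing instruction, and the CSS gives every made-up tag an explicit display rule. The file names, the folder name, the sample content, and the particular style rules are all assumptions for illustration; how a given browser renders the result depends on its XML and CSS support.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hypothetical style sheet: each made-up tag gets an explicit display rule,
# because the browser has no built-in defaults for it.
REVIEW_CSS = """\
Title    { display: block; font-size: 24pt; font-weight: bold; }
Author   { display: block; font-style: italic; }
BodyText { display: block; margin-top: 1em; }
"""

# The xml-stylesheet processing instruction names the CSS file to apply.
REVIEW_XML = """\
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="review.css"?>
<TechnologyReview>
  <Title>What is XML, Anyway?</Title>
  <Author>Michael Hendry</Author>
  <BodyText>XML is easy!</BodyText>
</TechnologyReview>
"""


def write_review(folder):
    """Write review.xml and review.css into folder and return both paths."""
    # Fail early (ET.ParseError) if the sample is not well-formed.
    ET.fromstring(REVIEW_XML)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    xml_path = folder / "review.xml"
    css_path = folder / "review.css"
    xml_path.write_text(REVIEW_XML, encoding="utf-8")
    css_path.write_text(REVIEW_CSS, encoding="utf-8")
    return xml_path, css_path


if __name__ == "__main__":
    print(write_review("xml_demo"))
```

Opening the generated review.xml in a browser that supports XML with CSS is a quick way to see how much of the formatting it honors.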

Parts is parts

The design goals of XML were 1) to be easy, 2) to have better linking than HTML (XLL), and 3) to have better style sheets than CSS (XSL). Unfortunately, the designers gave goal 2 priority over goal 3, and neither got done, which set the usefulness of XML back about two years. Figure 1 shows the parts that make up an XML project. Here's a rundown:

  • the XML document
  • the style sheet
  • the Document Type Definition (DTD), which is optional. Use a DTD to enforce rules and reject documents that do not meet them.
  • the links. Based on the eXtensible Linking Language (XLL), these can be internal or external, and are more powerful than HTML linking. But they're not ready yet, so I won't go any further here.

Figure 1: The Parts of an XML Document

It's worth an aside to note that the "display" can be anything. With MusicML, an XML-based language for music, a browser could display the sheet music of a song, while a synthesizer could play the song from the same XML file. This begins to show why XML is so exciting and promises so much.

Style sheets

Fortunately, XML can use CSS as well as XSL, because XSL isn't completely ready yet. CSS level 1 supports basic formatting and is straightforward. CSS level 2 combines formatting with placement, animation, and other advanced features. XSL is broken out into two parts: XSL Transformations (XSLT) and XSL formatting.

XSL formatting is not released yet. XSLT, which is released, provides a processing language to transform the XML document. Its most useful function at present is to transform XML to HTML for display on current browsers. For example, an XSLT processor can map the <Title> tag to <H1>, <Author> to <H2>, etc., and create an HTML document. In the future, different XSL files can transform an XML document for different output devices.
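The tag mapping described above can be imitated in a few lines of Python. To be clear, this is not XSLT: Python's standard library has no XSLT processor, so the sketch below is a plain-Python stand-in that only shows the idea of renaming tags to produce HTML. The Title and Author mappings come from the text; the BodyText mapping, the DIV fallback, the to_html helper, and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

# Mapping from the article's example: Title -> H1, Author -> H2.
# The BodyText -> P entry and the DIV fallback are assumptions.
TAG_MAP = {"Title": "H1", "Author": "H2", "BodyText": "P"}


def to_html(xml_text):
    """Build a small HTML document by renaming each child of the XML root."""
    source = ET.fromstring(xml_text)
    html = ET.Element("HTML")
    body = ET.SubElement(html, "BODY")
    for child in source:
        # Unknown tags fall back to DIV so no content is dropped.
        target = ET.SubElement(body, TAG_MAP.get(child.tag, "DIV"))
        target.text = child.text
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<TechnologyReview>"
        "<Title>What is XML, Anyway?</Title>"
        "<Author>Michael Hendry</Author>"
        "<BodyText>XML is easy!</BodyText>"
        "</TechnologyReview>"
    )
    print(to_html(sample))
```

A real XSLT style sheet expresses the same mapping declaratively, as template rules, and can do far more than rename tags.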

(Note: for more information about HTML and CSS, I recommend downloading the official specifications from www.w3c.org. They are clear and complete. For XML, don't bother. The XML spec is written in the language of Computer Science, so unless you are familiar with computer language specifications, the spec will not help much.)

Tool support

Currently, Microsoft Internet Explorer 5.0 supports XML with CSS level 1. This means you can start experimenting with XML and use it under controlled conditions. Netscape 6.0 (now Mozilla again), due out later this year, will support more of CSS level 2. While I think we still have a couple of years before we can unleash XML documents on the general public, XML is exciting enough to warrant getting involved now.



Last updated: June 6, 2000 (mvh)