Whitespace handling in XML
In The Annotated XML Specification, Tim Bray explains a common confusion about whitespace in XML:
XML has an incredibly simple rule about how to handle white space, that is contained in this one sentence: "If it ain't markup, it's data." Under no circumstances will an XML processor discard some white space because, in the processor's opinion, it is not "significant".
Let's look at our white space example again:
<p>Little boys, ingredients for:
 <ol>
  <li>Snips,</li>
  <li>snails,</li>
  <li>puppy dogs' tails.</li>
 </ol>
</p>
An XML processor will pass the application not just the title and the ingredients, but all the white space characters you can see before the <ol> and <li> tags, and also the line-end characters you can't see; in this case, 7 of them. (But note that an XML processor will clean up the line-ends as described in the next section, so while apps are going to have to wrestle with white space, they won't have to deal with CR-NL on windows and CR on Mac and NL on Unix.)
This behavior is going to cause some surprises and problems for XML users and programmers, because we've come to expect (as a result of working with SGML and HTML) "insignificant" white space to auto-magically vanish.
On the other hand, those who've actually worked with real SGML tools will generally approve of XML's behavior, because it has an important virtue, namely that the rule is simple and anyone can understand it: all white space gets passed through, always.
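To see the rule in action, here is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice of tool, not something the quoted text mentions). It parses a copy of the example, written with CR-NL line ends and with one or two spaces of indentation; the indent widths and the names DOC, root, ol and all_text are assumptions made for illustration. ElementTree exposes the character data around child elements as .text and .tail.

```python
import xml.etree.ElementTree as ET

# A copy of the example, deliberately written with Windows-style CR-NL
# line ends. The indentation widths are an assumption for illustration.
DOC = (
    "<p>Little boys, ingredients for:\r\n"
    " <ol>\r\n"
    "  <li>Snips,</li>\r\n"
    "  <li>snails,</li>\r\n"
    "  <li>puppy dogs' tails.</li>\r\n"
    " </ol>\r\n"
    "</p>"
)

root = ET.fromstring(DOC)
ol = root[0]

# Nothing is discarded: the white space between tags is reported as
# ordinary character data in .text (before the first child) and .tail
# (after an element's end tag).
print(repr(root.text))          # title, line end, indent before <ol>
print(repr(ol.text))            # line end and indent before the first <li>
print([li.tail for li in ol])   # line end and indent after each </li>
print(repr(ol.tail))            # line end after </ol>

# Line ends are normalized by the parser, so no CR characters survive.
all_text = "".join(root.itertext())
print("\r" in all_text)
```

If the application wants the "insignificant" white space gone, it has to strip it itself, for example by calling .strip() on each .text and .tail value; the parser will not make that decision for it.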
posted Wed 31 Mar 2004 in /software/xml
Copyright (C) 1999-2007 Martin Pool.