Martin Pool's blog

PyRXP and XML as tuple trees

PyRXP has an interesting way of representing XML documents as Python tuple trees, which makes data encoded as XML pretty easy to deal with. Unfortunately it does not come with Python or Debian, and you need to build the rxp library first.
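To give a feel for the tuple-tree idea without needing PyRXP installed, here is a minimal sketch. It assumes the four-slot node layout (tag name, attribute dict or None, list of children or None, spare slot) that I understand PyRXP to use. The tree is written by hand rather than produced by the parser, and the `children` and `text` helpers are my own illustration, not part of PyRXP.

```python
# Hand-written stand-in for the tuple tree a parse of
#   <order id="42"><item>tea</item><item>milk</item></order>
# is assumed to produce. Layout per node (an assumption, see above):
#   (tag_name, attributes_or_None, children_or_None, spare)
tree = (
    "order",
    {"id": "42"},
    [
        ("item", None, ["tea"], None),
        ("item", None, ["milk"], None),
    ],
    None,
)


def children(node, name):
    """Yield the child elements of `node` whose tag is `name`."""
    _tag, _attrs, kids, _spare = node
    for kid in kids or []:
        # Children are either nested tuples (elements) or plain strings (text).
        if isinstance(kid, tuple) and kid[0] == name:
            yield kid


def text(node):
    """Concatenate all character data below `node`."""
    _tag, _attrs, kids, _spare = node
    parts = []
    for kid in kids or []:
        parts.append(kid if isinstance(kid, str) else text(kid))
    return "".join(parts)


order_id = tree[1]["id"]
items = [text(item) for item in children(tree, "item")]
print(order_id, items)
```

Because the nodes are plain tuples, lists and strings, ordinary indexing, unpacking and list comprehensions are all you need to walk the document.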

David Mertz introduces PyRXP.

Uche Ogbuji has some criticisms of its handling of Unicode, and pointers to related tools.

XmlMarshaller also looks pretty interesting and is pure Python.

Illogical Markup

Dave Thomas says:

It seems to me that DocBook is falling in to the same trap as the rest of the XML world, confusing tedious verbosity for semantic information.

XML is like

From Slashdot, via Rusty:

XML is like violence: If it doesn't solve your problem, you aren't using enough of it.

Whitespace handling in XML

In The Annotated XML Specification, Tim Bray explains a common confusion about whitespace in XML:

XML has an incredibly simple rule about how to handle white space, that is contained in this one sentence: "If it ain't markup, it's data." Under no circumstances will an XML processor discard some white space because, in the processor's opinion, it is not "significant".

Let's look at our white space example again:

<p>Little boys, ingredients for:
  <ol>
    <li>Snips,</li>
    <li>snails,</li>
    <li>puppy dogs' tails.</li>
  </ol>
</p>

An XML processor will pass the application not just the title and the ingredients, but all the white space characters you can see before the <ol> and <li> tags, and also the line-end characters you can't see; in this case, 7 of them. (But note that an XML processor will clean up the line-ends as described in the next section, so while apps are going to have to wrestle with white space, they won't have to deal with CR-NL on windows and CR on Mac and NL on Unix.)

This behavior is going to cause some surprises and problems for XML users and programmers, because we've come to expect (as a result of working with SGML and HTML) "insignificant" white space to auto-magically vanish.

On the other hand, those who've actually worked with real SGML tools will generally approve of XML's behavior, because it has an important virtue, namely that the rule is simple and anyone can understand it: all white space gets passed through, always.
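The rule is easy to see with a parser. This is a rough sketch using Python's standard `xml.etree.ElementTree` on the example above. The `strip_ignorable` helper is my own name for the kind of clean-up an application has to do for itself if it wants the indentation gone, since the parser will not do it.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<p>Little boys, ingredients for:\n"
    "  <ol>\n"
    "    <li>Snips,</li>\n"
    "    <li>snails,</li>\n"
    "    <li>puppy dogs' tails.</li>\n"
    "  </ol>\n"
    "</p>"
)

root = ET.fromstring(DOC)
ol = root[0]

# The parser hands over every space and newline between the tags.
print(repr(root.text))   # text before <ol>, including the newline and indent
print(repr(ol.text))     # whitespace between <ol> and the first <li>
print(repr(ol[0].tail))  # whitespace after the first </li>


def strip_ignorable(elem):
    """Drop whitespace-only text and tail strings, in place.

    Deciding which whitespace is "insignificant" is the application's
    job, so this policy lives here rather than in the parser.
    """
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


cleaned = ET.fromstring(DOC)
strip_ignorable(cleaned)
print(ET.tostring(cleaned, encoding="unicode"))
```

Note that the helper only removes strings that are entirely whitespace. The trailing newline and indent after "ingredients for:" stay, because they are part of a text node that also carries real content.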

Sick of XML? Try YAML!

From Slashdot:

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. YAML(tm) is a balance of the following design goals:

  • YAML documents are very readable by humans.
  • YAML interacts well with scripting languages.
  • YAML uses host languages' native data structures.
  • YAML has a consistent information model.
  • YAML enables stream-based processing.
  • YAML is expressive and extensible.
  • YAML is easy to implement.
