
XML too inefficient for the Web

In response to the February 10 special report by Evan Hansen and Paul Festa, "":

I have to shake my head yet again at the silliness still being promoted by my colleagues. XML is a dismal failure. This kind of silliness, XML especially, is killing our industry. Server-side Java and XML have done more to stifle innovation than Microsoft has.

In systems programming we care about performance and efficiency. When I first ran into XML as a method for transporting data, I asked two very fundamental questions: How does markup get applied to a database table, and why would you pay the price of 40 bytes of XML markup to deliver four bytes of integer data?
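As a rough sketch of that arithmetic (the element name and the value below are made up for illustration; any four-byte integer wrapped in typical tags shows a similar ratio):

```c
/* Minimal sketch, assuming a hypothetical <quantity> element:
 * compare the size of the actual payload with the size of the
 * XML message that carries it. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    int32_t value = 1234;                        /* the actual payload */
    const char *xml = "<quantity>1234</quantity>";

    printf("binary payload: %zu bytes\n", sizeof value);  /* 4 bytes  */
    printf("XML message:    %zu bytes\n", strlen(xml));   /* 25 bytes */
    return 0;
}
```

Even in this small case the markup outweighs the data by roughly five to one, and longer element names or attributes only widen the gap.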

The problem with XML is that it assumes the markup and the data are intertwined (hence the word "markup"). It also encourages the use of a general XML parser: why write an optimized parser when you can grab a working general one? Writing parsers is hard, expensive work.

However, a general XML parser is so inefficient that nobody who cares about performance would even begin to consider using one.

Suppose TCP/IP were replaced with SOAP (Simple Object Access Protocol), a network transport for businesses. TCP/IP is a hard-coded, very efficient protocol/language and parser. Field widths are fixed for fast parsing, and the data values the network transport software itself deals with are extremely limited, which again makes for fast parsing.
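To see what fixed field widths buy, here is a minimal sketch. The layout mirrors the first bytes of a real TCP header (the two 16-bit port fields in network byte order); the sample bytes themselves are made up. Extracting a field is a load at a known offset and a byte swap, with no scanning, no tokenizing, and no character-to-number conversion.

```c
/* Minimal sketch: reading fixed-width fields at fixed offsets,
 * the way a TCP/IP stack does, requires no general parser at all. */
#include <stdio.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);   /* big-endian to host order */
}

int main(void)
{
    /* First four bytes of a hypothetical TCP header:
     * source port 80, destination port 54321. */
    const uint8_t header[] = { 0x00, 0x50, 0xD4, 0x31 };

    printf("source port:      %u\n", (unsigned)read_be16(header));      /* offset 0 */
    printf("destination port: %u\n", (unsigned)read_be16(header + 2));  /* offset 2 */
    return 0;
}
```

A text format like SOAP gives up exactly this property: every field becomes variable-length text that must be scanned for delimiters and converted before it can be used.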

Now imagine a general parser handling SOAP in place of TCP/IP. If that happened, the Web would grind to a halt in less than a second, because general parsers are so slow, and XML parsers are slower still.

The same thought experiment can be applied to databases and their data. XML is inefficient to an unusable degree. Using XML to deliver small tables of aggregate data on a Web page is fine, since HTML is too restrictive for that.

But using XML to deliver terabytes of data, whether by marking up the data in the database itself or by applying the markup dynamically on delivery, is so inefficient it defies discussion. Why would anyone sane even consider it? It blows my mind. Time and time again, when I bring these points up to my colleagues, they have no answer, or they are not systems programmers and say really uninformed things like "CPU cycles are free" or "disk space is free."

Mybrid Spalding
Mountain View, Calif.