XML: Too much of a good thing?

Explosion of special-interest XML dialects could mean the standard is a success or could be the start of a new headache.

David Becker Staff Writer, CNET News.com
Despite rumors to the contrary, the adult entertainment industry is not developing its own dialect of Extensible Markup Language dubbed XXXML.

Aside from that, it's hard to find an industry or interest that isn't taking advantage of the fast-growing standard for Web services and data exchange. In the six years since the main XML specification was first published, it's spawned hundreds of dialects, or schemas, benefiting everyone from butchers to bulldozer operators wishing to easily exchange information electronically.

While some industry observers worry proliferation has gone too far, potentially creating new instances of the interoperability problems that XML was meant to solve, proponents say the explosion of schemas is a testament to the format's success.

Tim Bray, co-inventor of the main XML specification, said the proliferation of special-interest XML dialects validates what he and his colleagues set out to achieve.

"The idea from the start was to make it as easy as possible for people to come up with their own special languages for their specific problems," Bray said. "In the big picture, I think XML is more successful than any of us who designed (it) ever thought it would be."

XML is most often lauded as a foundation for delivering Web services and is the base for plans from Microsoft and other software makers to ease the development and maintenance of business programs. Web services and XML are also major components of Indigo, a new communications subsystem that's slated to be part of Longhorn, the next major release of Windows. Microsoft recently revised its plans for Longhorn and said it will make Indigo available for Windows XP and other current versions of Windows, meaning that it should soon become even easier to exchange XML data between computers.

Also, XML data exchange is a must for companies wishing to join the growing movement toward building new business software using a more flexible model called a "services-oriented architecture." Proponents say SOAs can make software easier to reconfigure as needs change and that they're cheaper to maintain in the long run.

As a vehicle for describing complex sets of data in a globally comprehensible way that works smoothly across the Internet, however, XML is already there. Just ask your local chicken farmer, who is, or soon will be, benefiting from Meat and Poultry XML (mpXML), an offshoot of the Global Standards Management Process that is designed to meet the special needs of producers, retailers and distributors of flesh food.

Turns out meat is a classic example of an industry with agreed-on data sets (Prime or Choice? Wing or drumstick? Fresh or frozen?) where speedy electronic transmission of data can be a major asset, said Blake Ashby, executive vice president of mpXML.org.

"Anything our people can do to move that product through the supply chain faster pays off for them in less shrinkage" and spoilage, he said. "Without a system, the managers of these (grocery store) meat departments have to spend time walking the aisle and seeing what they have too much of and when it expires."

Relatively speedy industry agreement on XML has helped producers and sellers boost business and prepare for new challenges, Ashby said. "The need for a global standard has really increased, especially now that Congress is pushing for country-of-origin labeling," he said.

The benefits of XML were similarly obvious for the newspapers and other media outlets that need to deal with voluminous and often inconsistently formatted statistics reported on sports pages, said Alan Karben, chairman of the SportsML Working Group, a branch of the International Press Telecommunications Council that oversees Sports Markup Language.

"Because people's appetites for esoteric sports statistics are so insatiable, the data reports that get exchanged and formatted for display are often incredibly intricate," Karben wrote in an e-mail exchange. "For our industry, the benefits of XML are clear: consistent input no matter what the provider, what the sport, what the native language."

XML has succeeded, co-creator Bray said, because it has solved several of the more vexing challenges for electronic data exchange, including the growing need to deal with diverse languages and character sets.

"One of the big problems is internationalization," Bray said. "One of the reasons XML took off is because it solved a lot of those issues with Unicode, which was fairly new at that point."

How much is too much?
While XML makes it easy to create special-purpose dialects, the privilege shouldn't be abused, Bray warned. Competing schemas handling similar tasks create the potential for confusion and broken connections. Consider musical notation, where there are at least a half-dozen projects to apply XML to standardizing music scores. Similarly, the seemingly arcane field of cave exploration has inspired at least three attempts at XML data standards.

"There's an incentive to create a language to solve your specific problem," Bray said. "But if there's something out there already that might serve your need, you should consider using it."

Ron Schmelzer, an analyst for research company ZapThink, says industry leaders typically have little trouble agreeing on what data needs to be represented in an XML schema but get hung up on how to do it--sometimes creating conflicting specifications.

"When you have two different organizations trying to push two different vocabularies for solving the same problem, it doesn't help the supply chain," Schmelzer said. "If you're a small guy, supporting a bunch of different schemas gets difficult."

XML everywhere

Extensible Markup Language offshoots such as Web logging foundation Really Simple Syndication and new Microsoft Office formats get all the attention, but XML is transforming the way people exchange information in countless areas.

Some examples you might have missed:

• LandXML is a format for arranging data on terrain. It's most commonly used to feed data from engineering applications that design roads, construction sites and other projects directly into navigation systems. Bulldozers and other construction vehicles use the systems to eliminate most of the need to have surveyors on site during construction.

• Karst Markup Language is one of several efforts to develop an XML schema optimized for sharing data from cave surveys and maps.

• Recipe Markup Language uses XML to create a standardized format for organizing and presenting cooking directions.

• MusicXML is one of several efforts to create an XML format for expressing music and notations. Among the potential benefits, scores could be fed directly into MIDI systems for playback.

• Theological Markup Language is meant to standardize scriptural citations and other references to theological documents.

• Mind Reading Markup Language is an apparently farcical and now abandoned project to mess with your head.

Source: CNET News.com research

But proliferating schemas are more often a reflection of the complexity of the data that needs to be described, said Chuck Allen, director of the HR-XML Consortium, a human resources trade group shepherding more than a dozen XML offshoots to standardize data formats in areas such as payroll and stock-incentive plans.

"There has been some concern about hundreds of standards groups duplicating efforts, and there are cases where some of these groups could look over the other's shoulders more closely," Allen said. "But it gets complicated when you're trying to draft metadata standards to capture all this very complex domain knowledge."

Allen said his group employs sensible standards to ensure new XML projects truly serve a purpose. "We need at least three organizational sponsors and 10 participants," he said. "The main criteria are 'Is it in our domain?' and 'Is anybody else doing something about it?'"

Likewise, it would have been easy for the insurance industry to spawn a wealth of standards specialized for everything from boat coverage to reinsurance. But Lloyd Chumbley, assistant vice president of standards for trade group Acord, said the industry had a head start because it had already centralized on common paper forms, mainly to ensure agents could easily share data with insurers.

"When you're trying to do quotes for a policy, the last thing you need is to have to talk several different languages to communicate with several different insurers," he said. "The insurance industry for the most part has been using standardized forms generated by Acord since the 1960s, and that helped us maintain a single point of reference as everything got digitized."

Chumbley said the main proliferation challenge in the insurance industry is the localized schemas that have emerged to reflect changes in national laws. "We deal with a lot of different organizations internationally to consolidate XML schemas and definitions," he said. "When you're dealing across different cultures and legal systems, it takes time, but we're making progress."

Allen also expects consolidation of XML dialects. "There's been a lot of speculation as to whether there'll be more convergence, and I think that is going to be the case," he said. "I think it'll be because of IP (intellectual property) issues...which are sometimes more costly than the actual development. It takes a lot of resources to review the patent libraries, police the group's IP policies. If you have fewer organizations, there's fewer IP agreements."

John Simpson, author of several XML-related books, said the proliferation of XML dialects to describe similar data sets isn't the chaos machine one might assume, thanks to the ease of translating from one dialect to another.

"The fact there are different standards is immaterial...it's almost trivial to get it from one dialect into another," Simpson said, crediting the simplicity and integrity of the main XML specification.

"They came up with really simple rules for how the XML spec is going to develop, and those have allowed tremendous flexibility," said Simpson, who created his own schema for classifying "B" movies. "People refer to XML as a language, but it's really a grammar for inventing new languages or describing ones that already exist. The XML spec itself is this kind of wonderful chameleon."
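To give a rough sense of the kind of translation Simpson describes, here is a minimal Python sketch that maps one made-up recipe dialect onto another. Both dialects, and every element and attribute name in them (`recipe`, `item`, `dish`, `ingredient` and so on), are hypothetical and invented for illustration; they are not taken from Recipe Markup Language or any other real schema. Real dialects that disagree on meaning, not just on tag names, would take more work than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialect A: data lives mostly in attributes.
SOURCE = """<recipe name="Roast chicken">
  <item qty="1" unit="whole">chicken</item>
  <item qty="2" unit="tbsp">olive oil</item>
</recipe>"""


def translate(source_xml: str) -> str:
    """Rewrite hypothetical dialect A as hypothetical dialect B,
    which uses child elements instead of attributes."""
    src = ET.fromstring(source_xml)

    dst = ET.Element("dish")
    ET.SubElement(dst, "title").text = src.get("name")

    ingredients = ET.SubElement(dst, "ingredients")
    for item in src.findall("item"):
        ingredient = ET.SubElement(ingredients, "ingredient")
        ET.SubElement(ingredient, "name").text = item.text
        # Combine the two source attributes into one target element.
        amount = f'{item.get("qty")} {item.get("unit")}'
        ET.SubElement(ingredient, "amount").text = amount

    return ET.tostring(dst, encoding="unicode")


translated = translate(SOURCE)
print(translated)
```

Because both documents follow the same underlying XML grammar, a standard parser can read either one, and the translation reduces to renaming and rearranging nodes.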

Stephen O'Grady, an analyst at researcher RedMonk, agreed that the simplicity of the base XML standard makes it easy to accommodate multiple dialects, but he envisions a kind of Darwinian selection for competing schemas: Multiple approaches to similar problems blossom, the market states a preference and supporting software is tweaked to push data from one XML dialect into another.

"Because XML is the way it is, it's usually not intrinsically difficult to extract information," O'Grady said. "The situation with (Web log formats) RSS and Atom is a good example. I think it's likely the market will end up deciding one is the way to go over another, and then it's a pretty easy task to consolidate."
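As a hedged illustration of O'Grady's point, the Python sketch below pulls the same title-and-link pairs out of an RSS-style feed and an Atom-style feed. The two sample feeds are invented for this example, the `entries` function is a hypothetical helper, and the Atom sample uses the namespace of the Atom 1.0 format; real feeds carry many more fields and variations than this handles.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds, in ElementTree's {uri}tag notation.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Invented sample feeds carrying the same single entry.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example feed</title>
<entry><title>First post</title><link href="http://example.com/1"/></entry>
</feed>"""


def entries(feed_xml: str) -> list:
    """Return (title, link) pairs from either feed dialect."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS keeps the link as element text.
        return [(item.findtext("title"), item.findtext("link"))
                for item in root.iter("item")]
    if root.tag == ATOM_NS + "feed":
        # Atom keeps the link in an href attribute.
        return [(entry.findtext(ATOM_NS + "title"),
                 entry.find(ATOM_NS + "link").get("href"))
                for entry in root.iter(ATOM_NS + "entry")]
    raise ValueError(f"unrecognized feed dialect: {root.tag}")


print(entries(RSS_SAMPLE))
print(entries(ATOM_SAMPLE))
```

The dialect-specific logic is confined to a few lines per format, which is roughly why consolidating on one format later is seen as a manageable task.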

High-tech chess players, meanwhile, have a bounty of options. With five projects and counting underway to develop an XML-based system for describing chess moves, about the only apparent agreement is that one side has to be white and the other black.