
Taking XML's measure

XML co-inventor Tim Bray helped define a new computing standard seven years ago. Now he says it's time for the next step.

David Becker, Staff Writer, CNET News.com
Tim Bray and his colleagues in the World Wide Web Consortium had a very specific mission when they set out to define a new standard seven years ago. They needed a new format for Internet-connected systems to exchange data, a task being handled with increasing awkwardness by HyperText Markup Language.

The solution Bray helped concoct was XML (Extensible Markup Language), which has since become one of the building blocks of information technology and today serves as the basic language for disparate computing systems to exchange data. Microsoft is betting heavily on XML-based technology that will turn the new version of Office into a conduit for viewing and exchanging data from backend systems. The biggest players in technology are betting heavily on Web services based on XML. And corporate giants such as Wal-Mart Stores are relying on XML to streamline their business processes.

Bray has since gone on to address another big challenge--the visual representation of data--with his company, Antarctica, which sells tools that display information from Web searches, corporate portals and other sources in an intuitive map-based format.

Bray talked about the spread of XML, challenges in search technology and other concerns with CNET News.com.

Q: What was the intent in creating XML?
A: This was 1996, and the Web was already wildly popular beyond anybody's dreams. And it was pretty clear that, while there were a lot of really good things about the Web architecture, there was a severe barrier to extending it beyond presenting information to people.

They were already talking about doing micropayments in e-commerce and collaborative activities of various kinds, and for that, the Web needed to work machine-to-machine. It was also the case that authoring for the Web was getting more and more industrial, and it was starting to look more and more like conventional publishing. You needed to do a lot of repurposing and syndication and stuff.

And you thought HTML wasn't going to be sufficient?
It was pretty clear that HTML didn't provide a very good answer for any of these things. HTML was and is outstanding as a means of delivering information to people. But as a means of communicating machine-to-machine, it suffered. It suffered because there was a tradition of laxity. It suffered because it came with a fixed set of hard-wired tags, and you couldn't make your own. And there was this thing called SGML (Standard Generalized Markup Language), which had been around for decades and seemed to have a lot of the missing pieces for what people wanted to do on the Web.

The number of people who'd been involved in the SGML world and had real exposure to the Web was really small, and essentially all of them got together and formed a working group under the leadership of (W3C leader) Jon Bosak, and the idea was simply to provide something you could use for industrial publishing--what we would now call business-to-business on the Web.

The first wave of attention for XML focused on Web services, but it seems that it's really coming into play as a sort of lubricant for allowing data exchange between heterogeneous systems. Is that what you expected?
There is no doubt whatsoever that if you go into an environment where XML is really, truly being used right now, it's that lubricant role you described. It's lightweight, quick and dirty enterprise application integration. The world is a heterogeneous place. Given the pace of mergers and acquisitions and the desire to centralize in the enterprise, there's a lot of big, hard, ugly integration problems everybody faces. And it suddenly became apparent at some point that almost every application, no matter how old, had a Web server on it. And you could achieve remarkably acceptable results in enterprise application integration simply by binding a set of XML messages to ship back and forth. There's just an immense amount of that happening right now.
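
(For illustration only: a rough Python sketch of the kind of quick-and-dirty exchange Bray describes--posting an XML message to a legacy application's Web server and reading the XML reply. The endpoint URL and element names are invented for the example.)

```python
# Lightweight enterprise integration via XML over HTTP: send an XML request
# to a legacy application's web server and parse the XML reply.
# Endpoint and element names are hypothetical.
import urllib.request
import xml.etree.ElementTree as ET

request_xml = """<?xml version="1.0" encoding="UTF-8"?>
<orderStatusRequest>
  <orderId>12345</orderId>
</orderStatusRequest>"""

req = urllib.request.Request(
    "http://legacy-erp.example.com/orders",      # hypothetical legacy endpoint
    data=request_xml.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
)

with urllib.request.urlopen(req) as resp:
    reply = ET.fromstring(resp.read())           # parse the XML reply

# Pull out just the fields this side of the integration cares about.
status = reply.findtext("status")
shipped = reply.findtext("shipDate")
print(f"Order 12345: {status}, shipped {shipped}")
```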

What about the Web services part, where it seems like it's been more sizzle than steak to date?
It depends what level you talk about. If you want to deploy an application across the Web on a fairly large scale, or if you've got an application you want to deploy across a network, there have been a variety of ways to get the systems to talk to each other--CORBA (Common Object Request Broker Architecture), Java RMI (Remote Method Invocation), things like that.

The notion of integrating these things based on a loosely coupled exchange of messages, with the messages formatted in XML--that's clearly a winner and is already being done a lot. In a lot of cases, it's done simply on an ad hoc basis--a couple of guys got together and decided that this is what they needed. The notion of formalizing that and building tool kits around it is fine. So, the idea of Web services is real; something that will pay for itself big time.

Having said that, there is this huge, sprawling stack of standards built on top of standards for orchestration and choreography and routing. And I don't have the slightest clue what some of these guys are talking about. To a certain extent, yes, there are castles being built in the sky.

But the basic materials, like SOAP (Simple Object Access Protocol), are deployed in most of the server and client infrastructures. SOAP is real. WSDL (Web Services Description Language)--I don't use it myself, but it certainly works. There's outstanding integration for people who are in the Microsoft world and the Visual Studio .Net environment. If you have a Web service and can write a WSDL description of it, you can pull together a nice application with an amazingly small amount of work on the .Net machine. So, I think there's some real steak there with the sizzle.
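
(To make the "steak" concrete: a rough Python sketch of a hand-rolled SOAP 1.1 call, the sort of plumbing the WSDL-driven tool kits generate for you. The service URL, operation name and target namespace are invented for the example.)

```python
# A hand-rolled SOAP 1.1 request: wrap the call in a SOAP envelope, POST it,
# and read the response body. Service URL, operation and namespace are
# hypothetical; real tool kits generate this from a WSDL description.
import urllib.request
import xml.etree.ElementTree as ET

envelope = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetQuote xmlns="http://example.com/stockservice">
      <symbol>SUNW</symbol>
    </GetQuote>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "http://example.com/stockservice",           # hypothetical endpoint
    data=envelope.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": "http://example.com/stockservice/GetQuote",
    },
)

with urllib.request.urlopen(req) as resp:
    body = ET.fromstring(resp.read())

# Namespaced lookup of the (hypothetical) result element.
ns = {"s": "http://example.com/stockservice"}
print(body.findtext(".//s:GetQuoteResult", namespaces=ns))
```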

The XML standard is partly meant to set the grounds for the free interchange of data. Are you concerned about companies piling proprietary stuff on top of the standard?
Anybody who's reasonable has to have concerns about that. It's obviously in the interest of a vendor that has substantial market share to achieve customer lock-in. XML does make it qualitatively harder to achieve customer lock-in, because it comes with a predisposition towards openness. It makes it harder technologically, and it also comes with social expectations. If you publish an XML format, and it's proprietary gibberish, you're going to catch some heat--from the press, from analysts, from customers. So I think it just makes it harder for a company like Microsoft to achieve lock-in.

To the extent that I've looked at the (XML) formats for Office 2003, I can deal with them. They're not simple, but then, Word isn't a simple product. But if need be, I could write a script to process a Word XML file and extract the text of all paragraphs with certain references--which would have been a very daunting task with previous editions of Word.
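
(A rough sketch of the kind of script Bray describes, assuming the Word 2003 WordprocessingML namespace; the file name and style filter are invented for the example.)

```python
# Pull the text of every paragraph carrying a particular style out of a
# Word 2003 "Save as XML" file. The namespace below is the Word 2003
# WordprocessingML namespace; file name and style name are hypothetical.
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ns = {"w": W}

doc = ET.parse("report.xml")                      # a Word 2003 XML document

for para in doc.iter(f"{{{W}}}p"):                # every w:p paragraph
    style = para.find("w:pPr/w:pStyle", ns)
    if style is not None and style.get(f"{{{W}}}val") == "Quote":
        # Concatenate the w:t text runs that make up the paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        print(text)
```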

So, yeah, there's room for concern. As an industry, we have to be vigilant to preserve open access to our own data. But we are moving in the right direction.

Are the successful companies going to be the ones that find the magic balance between true XML interoperability and putting in enough of their own secret sauce to give them a business advantage?
Absolutely. If you bring an application to market and wave the XML banner, what that means to me is that you're willing to accept input in XML, and you'll give me back information in XML, without stealing any of it. What you do inside your own application is none of my concern. All I care about is: Does it produce the business value I want?

At my company, we take in XML, and we provide XML output. But inside, there's no XML at all; it's all highly proprietary data structures. That's where the real strength of XML is--at the periphery, at the interchange.
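
(A rough Python sketch of that "XML at the periphery" pattern: parse the incoming document into ordinary internal structures, do the real work there, and only serialize back to XML on the way out. The element names are invented for the example.)

```python
# XML at the edges, proprietary structures inside: parse on the way in,
# work on plain objects, serialize on the way out. Element names are
# hypothetical.
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Node:                        # internal representation -- no XML in sight
    name: str
    weight: float

def read_nodes(xml_text: str) -> list[Node]:
    root = ET.fromstring(xml_text)
    return [Node(e.get("name"), float(e.get("weight", "0")))
            for e in root.iter("node")]

def write_nodes(nodes: list[Node]) -> str:
    root = ET.Element("nodes")
    for n in nodes:
        ET.SubElement(root, "node", name=n.name, weight=str(n.weight))
    return ET.tostring(root, encoding="unicode")

nodes = read_nodes('<nodes><node name="music" weight="3"/></nodes>')
nodes.sort(key=lambda n: n.weight, reverse=True)   # internal logic happens here
print(write_nodes(nodes))
```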

You've made some comments about XML being too hard for developers. Does that still hold true?
I wrote an essay about XML being too hard to program. I put a lot of thought into that. I was careful about what I wrote, and I stand by every word of it. The business value of XML is plenty high--high enough to certainly justify its deployment. The progress in making it easy for programmers to use it, to generate it--that progress hasn't been as good as I would have liked. When I personally write applications that consume or export XML, it seems to me like more work than it really ought to be to do it.

Having said that, in terms of interoperability and openness and attractiveness in the marketplace, it's more than worthwhile. The answer is better software, and we're getting that. In particular, the XML handling class in .Net is a substantial step forward from what's been available before in terms of the amount of work required to get the job done. I think XML itself, in terms of an open, interoperable and internationalized data format, was a pretty substantial lurch forward. So it shouldn't be surprising that the actual tools are going to take a while to catch up. And that's happening. I'm actually quite pleased by the progress.

Can you describe what Antarctica is doing and how XML fits in?
The premise of Antarctica is that enterprises are typically really good at collecting information, but Wall Street doesn't reward you for collecting information. Many people in chief information officer and chief technology officer roles would share the perception that the enterprise world in general is not doing a good enough job in getting value out of the large inventory of information that's built up.

So I became interested in mining--getting more value out of all this data that's built up. I concluded that one of the main pain points that's preventing people from getting adequate return on investment from data inventories is the user interface. There's an analogy with the advent of the graphical user interface for PCs. Before that, computers were something only a very small minority of the population used. Once everybody got a GUI (graphical user interface) with a desktop metaphor, the use of the computer became much more widespread.

It's Antarctica's hypothesis that by putting a graphic interface somewhat in the spirit of the desktop metaphor on complex information spaces, we can open up the value in there. In our case, the metaphor isn't a desktop--it's a map.

On the XML front, we're using it on the periphery. On the back end, you can either talk to an SQL database directly or send us an XML input file and we'll read that. Interestingly, in our actual deployments, there's been only one case in which somebody's elected to talk to the database. Everyone else has been happy with XML.

There's a lot of business interest in search now. Do you think companies would be better off focusing on user interface issues than algorithms?
Absolutely. There's no reason to expect that search is going to get that much better.

The basic algorithms by which search is done have not improved much since about 1975. The only way to improve the situation is by enhancing search engines with more deterministic metadata, essentially adding knowledge management techniques that give you more information from which to draw connections. If you look at the victory of Google in the search engine business, it wasn't because they had better search techniques. It's because they deployed one key metadata value--how many pages are linked to this one--to enhance the relevancy of their results. The same concepts need to be applied to the enterprise.
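
(A toy Python sketch of that point: rerank pages that match a query by a single extra metadata value--the count of inbound links. The link graph and result list are invented for the example.)

```python
# Rerank matching pages by one metadata signal: how many pages link to each
# one. The link graph and the list of matches are made up for the example.
from collections import Counter

links = [("a.html", "c.html"), ("b.html", "c.html"), ("d.html", "b.html")]
matches = ["b.html", "c.html", "e.html"]          # pages containing the query term

inbound = Counter(target for _, target in links)  # inbound-link count per page

# Pages with more inbound links float to the top of the results list.
ranked = sorted(matches, key=lambda page: inbound[page], reverse=True)
print(ranked)                                     # ['c.html', 'b.html', 'e.html']
```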

There are really two ways to get information: search and browse. And browse has a lot of potential. But to work, the drill down has to be intuitive. It cannot be stupid. You have to be really aggressive about bringing the relevant stuff to the top. You can't force the person to go through multiple levels to get to what they want.

And visual representation plays a big role in that?
Right. It turns out that the display technique that delivers the most data per square inch is cartography. That's why we're using a map metaphor. The whole notion of the search engine results list, for which I'm partly responsible, is terribly information-thin. Google creates the illusion that the results list is somewhat one-dimensional: This is the most interesting; this is less so.

But if I type "bicycle" into Google, it doesn't know what I'm looking for. I may be looking for bicycle racing results, or I may be interested in that song by Queen. With a map, I get started off right away: Here are the matches for bicycle in music; here are the matches for outdoor sports; here are the ones for shopping.

You've been a major contributor to the W3C. What's your view of the standards process? It seems that in some cases, like XML, it works very well, but there are others like SVG (Scalable Vector Graphics) where it's been quite slow.
Standards processes don't do well in dealing with new technologies, so I disagree that being ahead of the market is a good thing. The standards process works best when you've got a problem that's already been solved, and we have a consensus on what the right way to go is, and you just need to write down the rules.

That's totally what XML was. There had been 15 years of SGML, so there was a really good set of knowledge as to how markup and text should work. And the Web had been around for five years, so we knew how URLs (Uniform Resource Locators) worked, and Unicode had been around, so we knew how to do internationalization. XML just took those solved problems, packaged them up neatly and got consensus on it all.

SVG is a different thing. I haven't given up on SVG; I think it has a bright future, because it really is better than the alternatives. And they didn't invent much stuff. People from Adobe and Microsoft know this stuff.

XML probably couldn't be done anymore at W3C. The reason XML was so successful is that nobody noticed--we came in low, fast and under the radar, and it was already finished by the time the big vendors noticed it. Now, any time there's a new initiative around XML, there are instantly 75 vendors who want to go on the working group.