Web speech spec gets tongue-tied

The intellectual property claims of some contributors to the VoiceXML standard could keep the technology in limbo.

Paul Festa Staff Writer, CNET News.com
Paul Festa covers browser development and Web standards.
The Web's leading standards group called on developers to implement its nearly finished specification for bringing voice interaction to Web sites and applications.

But the intellectual property claims of a handful of contributors, including Philips Electronics and Rutgers University, threaten to keep the specification tied up in negotiations, the standards body warned.

The World Wide Web Consortium (W3C) on Tuesday issued VoiceXML 2.0 as a candidate recommendation, the penultimate stage in the consortium's approval process.

The job of VoiceXML--part of the W3C's Voice Browser Activity--is to let people interact with Web content and applications using natural and synthetic speech, other kinds of prerecorded audio, and touch-tone keypads.

In addition to adding speech as a mode of interaction for everyday Web surfing, the W3C has its eye on other applications. These include the use of speech for the visually impaired and for people accessing the Web while driving.

The group called VoiceXML a central part of its work on voice-computer interaction.

"The VoiceXML language is the cornerstone of what we call the W3C speech interface framework--a collection of interrelated languages that are used to create speech applications," said Jim Larson, co-chair of the W3C's voice browser working group and manager of advanced human I/O (input/output) at Intel. "Using these types of applications, the computer can ask questions and the user can respond using words and phrases or by touching the buttons on their touch-tone phone."

Other W3C specifications control individual pieces of the voice-browsing puzzle. The Speech Synthesis Markup Language (SSML), for example, describes how the computer pronounces words, with attention to voice inflection, volume and speed. The Speech Recognition Grammar Specification (SRGS) establishes what a user must say in response to a computer prompt. And the Semantic Interpretation for Speech Recognition (Semantic Interpretation) strips down text and translates it to a form that the computer can understand.

"All of the other languages deal with one single aspect of voice applications, but what really controls the conversation and makes it possible is VoiceXML," Larson said. "It's the thing that allows you to converse with a computer."

VoiceXML also has some complementary specifications in progress outside the W3C. One such project, Larson said, comes from the Speech Application Language Tags (SALT) Forum, which is cooperating with the W3C and shares members with that group. The forum, founded by Microsoft, Intel, Philips and Cisco Systems, among others, is working on more detailed, lower-level programming technology.

Larson said the SALT Forum had given the W3C its latest work, which he said would be incorporated into the next version of VoiceXML.

Sticking points

With version 2.0 all but finished and industry heavyweights behind it, VoiceXML has considerable momentum. But the technology could wind up stuck in candidate recommendation limbo unless W3C members can resolve an intellectual property dispute.

The problem is that version 1.0 was developed not at the W3C but under the auspices of the VoiceXML Forum, an organization founded by AT&T, IBM, Lucent Technologies and Motorola. That group contributed its work to the W3C in May 2000.

The VoiceXML Forum was chartered under a so-called RAND policy, which meant that companies could retain intellectual property rights to the technologies they contributed under "reasonable and nondiscriminatory" terms. That policy changed last year to a royalty-free version, similar to the policy the W3C reaffirmed in 2002 after heated public debate.

But the VoiceXML spec was designed under RAND terms, and several organizations that contributed code under that policy are balking at the W3C's request that they relinquish their intellectual property claims.

"We've been working diligently on getting our members to change from RAND to royalty-free and have successfully done this with most of the members," Larson said. "But there are still two or three that still want to have a RAND license. That was the original policy that they signed up for, and they are not willing to change their position."

Those clinging to their RAND status include Philips, Avaya and Rutgers. Microsoft and IBM, which have resisted royalty-free pressures in other contexts, have relinquished their claims on VoiceXML, Larson said.

Philips, Avaya and Rutgers could not be reached for comment.

In response to the intellectual property imbroglio, the W3C will set up a patent advisory group to work out the differences. Until that group is formed and has completed its work, VoiceXML will not proceed to full recommendation status.

"No one quite knows how it's going to work out," Larson said. "But (the group) will try to come up with creative solutions to make as many people happy as possible. We will not issue a recommendation until this is resolved in some fashion."