W3C looks at next-gen voice technologies

The World Wide Web Consortium says next-gen VoiceXML will include specs for speaker verification.

The World Wide Web Consortium on Tuesday said the next generation of VoiceXML will include specifications for speaker verification.

The W3C, a standards-setting body for the Internet, said it will draft specifications for VoiceXML 3.0, which will support voice-based identity verification for users transacting business by phone or using voice on computers.

VoiceXML technologies are usually used to let users issue commands by voice rather than with keystrokes. A number of businesses rely on the technology to improve profits by automating processes and reducing staff.

But users and businesses are becoming increasingly concerned about the security of those transactions, given the growing number of cases where security has been breached.

"Speaker verification and identification is not only the best biometric for securing telephone transactions and communications, it can work seamlessly with speech recognition and speech synthesis in VoiceXML deployments," Ken Rehor, newly elected chairman of the VoiceXML Forum, said in a statement.

The W3C has now completed its desired requirements for VoiceXML 3.0 and expects to have a working draft of the specifications by the end of the first quarter, said James Larson, co-chair of the W3C Voice Browser Working Group.

In addition to the speaker identification requirements for VoiceXML 3.0, the W3C addressed the issue of extending its Speech Synthesis Markup Language (SSML) functionality to certain languages including Mandarin, Japanese and Korean.

SSML is designed to allow developers to control various aspects of speech from pitch to volume to pronunciation.

"The Chinese tags will include the right tone so the right meaning is conveyed, and boundaries so users know where words start and end," Larson said.

In Mandarin, for example, the syllable "mai" can mean either "buy" or "sell," depending on the tone the speaker uses.

Tags in SSML already help speech synthesizers produce correct word pronunciations, Larson said. For example, aluminum is pronounced one way in the United States, but the same word has a different pronunciation in Canada and the U.K.
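As a rough illustration of the kind of control described above, the following Python sketch builds a small SSML 1.0 document that sets pitch and volume with the `prosody` element and spells out a pronunciation with the `phoneme` element. The helper name `build_ssml` and the IPA transcriptions are assumptions made for this example, not material from the W3C announcement, and the transcriptions are approximate.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SSML 1.0 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(word, ipa, lang="en-US", pitch="medium", volume="medium"):
    """Return an SSML string that speaks `word` with an explicit pronunciation.

    Hypothetical helper for illustration only. `ipa` is an IPA transcription;
    `pitch` and `volume` take SSML prosody labels such as "low" or "loud".
    """
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang}
    )
    # <prosody> controls how the enclosed text is spoken.
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "volume": volume})
    # <phoneme> overrides the synthesizer's default pronunciation of the word.
    phoneme = ET.SubElement(prosody, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Approximate transcriptions, assumed for the sake of the example.
    print(build_ssml("aluminum", "əˈluːmɪnəm", lang="en-US"))
    print(build_ssml("aluminium", "ˌæljʊˈmɪniəm", lang="en-GB", pitch="low"))
```

A synthesizer that supports the `phoneme` element would read each word using the supplied transcription rather than its own dictionary entry, which is how a single document can carry both regional pronunciations.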

Although SSML is already a standard, the W3C group working on the language extensions hopes to hold its first formal meeting in March to develop a document on extension requirements, Larson noted.