A budding standard, the brainchild of tech giants AT&T, IBM, Lucent Technologies and Motorola, is fueling new software that allows people to use voice commands via their phones--either cell or land-based--to browse the Web. Users of the technology can check e-mail, make reservations and perform other tasks simply by speaking commands.
The technology, called VoiceXML, is now winding its way through the World Wide Web Consortium Internet standards body, which is reviewing the specification and could make it a formal standard by year's end.
Proponents of VoiceXML say standardization is crucial for the market for Web voice access software and services to take off. The standard gives software and hardware makers, as well as service providers and other companies using the technology, a common way to build software to offer Web information and services over the phone. But many technology issues remain undecided, such as interface design and implementation of the standard in software. In many ways, the voice Web of today is still relatively primitive, more like the simple HTML Web interface of 1996, say analysts.
"You're not going to be able to run your business from your phone, but you can check the status of things, which is important," said Kevin Dick, a consultant, author of an XML book and analyst at Kevin Dick Associates. "If you think about what a manager or mobile professional does, a lot of it is authorizing things and checking the status of things, or assigning things to other people. All those operations can be performed pretty well with VoiceXML (enabled software)."
The potential audience for the technology is huge, backers claim. "There are ten times as many telephones in the world as there are PCs. If people use telephones to access the Net, it opens up a big new audience," said Jim Larson, a manager at Intel's Architecture Labs and chair of the Web Consortium's effort to make VoiceXML a standard. "With a standard, everyone can learn the same language and write (new phone) applications faster."
Even though the VoiceXML specification hasn't been finalized, tech companies and telecommunications service providers alike have flocked to support the technology and are already offering new software and services that tie the telephone to the Internet. The technology has gained the support of nearly 500 companies, including IBM, networking giant Cisco Systems, database software maker Oracle and stock brokerage firm Charles Schwab.
One notable holdout is Microsoft. The world's largest software maker is following the standard's progress, but has not announced any VoiceXML-enabled products.
VoiceXML technology is already in use: AT&T Wireless, Sprint PCS, Japan Telecom and Qwest Communications International allow cell phone owners to use voice commands to browse the Web, so they can hail a cab, buy movie tickets, or have a computer-generated voice read them news and traffic information.
Another reason for the rush to develop voice-driven Web interfaces, particularly for cell phones, is pending legislation in at least 35 states that would make it illegal to drive while holding a cell phone to the ear, except during emergency calls. New York has already passed a law banning the practice. Voice command recognition using VoiceXML could help cell phone carriers, such as Verizon and Alcatel, provide hands-free Web surfing. Both carriers, along with automaker DaimlerChrysler, are members of the VoiceXML Forum, an industry organization founded by AT&T, IBM, Lucent and Motorola to promote the VoiceXML specification.
Other companies are building services on VoiceXML that are expected to debut later this year. Internet telephony company Net2Phone will offer a service that will allow people to make Net-based phone calls just by picking up a telephone and speaking a person's name. The service will also feature a computer-generated voice that will read e-mail, schedule and contact information aloud.
Companies working on VoiceXML services plan to offer the ability to verbally reserve airplane tickets or trade stocks without talking to a live customer service representative. On corporate networks, VoiceXML-enabled software will eventually let salespeople access their company's network and check on the status of a customer's order, for instance.
Technology companies said supporting a standard that has not been finalized is not a concern. Cisco executives say they recently built VoiceXML into some new Internet telephony equipment because their telecommunications service provider customers demanded it. Cisco will simply upgrade its software once VoiceXML is finalized as a formal standard.
"We've done this countless times before," said Mathew Lodge, a Cisco product manager. "There (are) lots of Internet drafts that change substantially and we support (them)."
Speak to me
VoiceXML is just one flavor of XML (Extensible Markup Language), a Web standard for information exchange that not only allows companies to easily and cheaply conduct online transactions with their customers and partners, but also delivers sound, video and other data across the Web.
While HTML, the language for creating Web sites, has a predefined vocabulary, XML allows software developers to define their own vocabulary for building custom systems that can exchange data. VoiceXML does for voice what HTML does for Web sites, Larson said.
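To make the idea of an XML vocabulary for voice concrete, here is a minimal sketch in Python that builds a small voice menu as markup. The tag names (vxml, form, field, prompt, option) are loosely modeled on VoiceXML and have not been validated against the specification; the function name, the "main_menu" and "topic" identifiers and the prompt text are invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_voice_menu(prompt_text, choices):
    """Return a VoiceXML-style menu document as a string.

    Illustrative only: element and attribute names are assumptions
    modeled on VoiceXML, not a checked, conforming document.
    """
    vxml = ET.Element("vxml", {"version": "2.0"})
    form = ET.SubElement(vxml, "form", {"id": "main_menu"})
    field = ET.SubElement(form, "field", {"name": "topic"})

    # The prompt is what a text-to-speech engine would read to the caller.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text

    # Each option is one word or phrase the caller is allowed to say.
    for choice in choices:
        option = ET.SubElement(field, "option")
        option.text = choice

    return ET.tostring(vxml, encoding="unicode")


document = build_voice_menu("Say weather or sports.", ["weather", "sports"])
print(document)
```

The point of the sketch is that the menu is plain markup: a voice browser, rather than a screen browser, would be the program that interprets it.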
Analysts and technology executives say accessing the Web over regular and mobile phones could prove popular, especially for people who need small bits of information right away or need to make a quick transaction, but are away from their PCs. Using voice commands over a cell phone to navigate the Web is an alternative to text browsing on phones' small screens, they say.
VoiceXML--combined with speech recognition technology and software that translates text to speech--is also improving customer service phone calls. For example, if you're calling a bank to get your account balance, you can verbally request what you want without having to go through a series of instructions that ask you to press numbers on your phone's keypad.
"A voice interface is better than a phone pad on a cell phone. The phones are getting smaller, but your thumbs aren't," said Bill Dykas, IBM's strategic alliance manager and chair of the VoiceXML Forum, the technology group that created the initial standard.
But analysts say the technology could flop if people dislike the user interface. IDC analyst Mark Winther said voice user interfaces today are comparable to the simple Web site designs of five or six years ago.
"There's a long way to go. It's very awkward and klugey," Winther said. "There's a danger of doing them badly. If it's not done right, people will hate them. But there is a recognition that it's a challenge and companies are doing a lot to do it right."
For example, people can access Web information over the phone now by saying one or two words into the phone, such as "weather," or "sports." The vision is to eventually allow people to speak naturally and in complete sentences, Winther said.
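The one-word interaction Winther describes can be sketched as a simple keyword lookup. This Python sketch assumes a speech recognizer has already turned the caller's audio into text; the handler table, function name and reply strings are hypothetical and do not reflect any vendor's actual software.

```python
# Hypothetical keyword-to-action table for a voice portal.
HANDLERS = {
    "weather": lambda: "Reading the weather forecast.",
    "sports": lambda: "Reading the sports headlines.",
}

REPROMPT = "Sorry, please say weather or sports."


def handle_utterance(recognized_text):
    """Route recognized speech to an action by exact keyword match."""
    keyword = recognized_text.strip().lower()
    handler = HANDLERS.get(keyword)
    if handler is None:
        # Anything outside the small vocabulary, including a full
        # sentence, falls through to a reprompt.
        return REPROMPT
    return handler()


print(handle_utterance("Weather"))
print(handle_utterance("What is the weather like today?"))
```

A full sentence fails here even though it contains a known keyword, which is the gap between today's keyword menus and the natural speech that Winther says is the eventual goal.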
Using voice commands to surf the Web can be seen as a potential replacement for the difficult-to-use Web browsers on cell phones, which let people grab Web information by pressing keys on the phone's keypad. Service providers offer these mobile Web connections through the Wireless Application Protocol (WAP). Winther, however, says the two technologies can work hand in hand.
"If you are driving in your car and call a voice (Web) portal for directions, it's much better to have the directions on your (cell phone) screen than to have to write them down while driving," he said.
Companies are developing ways to allow the two technologies to work together, Winther added. The phones don't have enough bandwidth to integrate the two services, but some companies are "building tricks" to make it happen, he said.
Speech recognition companies Nuance and SpeechWorks have updated their software to support VoiceXML. Their customers, such as American Airlines, United Parcel Service and E*Trade, use existing speech-recognition technology to offer voice-activated services to track flight and delivery times as well as stock prices.
VoiceXML has spawned new companies, such as Tellme Networks, HeyAnita, BeVocal and VoiceGenie, which either offer voice portal services or sell the software that allows service providers and businesses to offer voice portal services. The voice-activated services provide basic information, such as stock quotes, traffic information and news headlines, as well as the ability to buy movie tickets online.
Analysts say voice portals are catching on. Some industry analysts suggest that every major U.S. wireless carrier will offer this service by year's end. While Sprint and Qwest charge a small monthly fee, AT&T offers the service for free, but its customers are forced to listen to advertisements when they first log on to the system.
Tellme executives say future services will include the ability to speak commands to a voice mail system, such as "delete message" or "move to the next message," rather than having to push buttons on the keypad.
If the VoiceXML market takes off, it could be a boon for telecommunications carriers that have built Internet-based voice networks, such as Qwest, Genuity, iBasis and ITXC, which could carry the traffic on their networks.
Analysts say they expect the technology to become popular among consumers as well as in the corporate world, especially at companies with customer service centers. VoiceXML is making it easier for customers to resolve their issues on their own, rather than having to talk to a customer service agent, they say.
In fact, tech companies such as IBM and Lucent are already selling products based on VoiceXML that will allow businesses to create voice-based Web sites. Microsoft executives say they expect to offer similar software in the future.
Dick said he envisions a future when employees can access their corporate network using VoiceXML. For example, a saleswoman can have customer information read to her, such as the last time her company contacted the customer. He also expects new voice software that will allow executives to dial in and hear that they have expense reports waiting for them. The executives can review them over the phone and verbally authorize or refuse the expense reports.