Big Blue will announce later this month that it will ship by June the first product from Xperanto, an initiative aimed at helping companies fetch information from many data sources all at once, from sales records to documents stored in e-mail servers.
Meanwhile, Microsoft and BEA Systems are approaching the same problem with similar technology, while database market leader Oracle favors a different approach. At stake is the global market for database software and services, which was worth roughly $9 billion in 2001. Each company is hoping to establish its technology as a way to sell new database servers or add-on servers specifically designed for integration, said analysts.
Database companies have been tackling the idea of the "federated," or virtual, database for years, although many attempts have failed because of the poor performance of distributed queries, said Philip Russom, an analyst with Giga Information Group. System complexity and the lack of a universal data language such as Extensible Markup Language (XML) also sidetracked earlier efforts.
But improvements in querying technology over the past two years, coupled with faster hardware and networks, make federated data or enterprise information integration (EII) systems--such as those built using new database technology from IBM and others--more credible, he said. One battleground for database makers supplying this new technology will be in customer-support call centers, which routinely need to access data from multiple sources.
"If EII vendors can address performance issues, then call centers will be the killer application," said Russom. "But at this point the scenarios are theoretical. It's very difficult to find a reference site where a team has implemented an EII solution."
Russom said EII is viable for running reports that analyze company operations, a task that doesn't require rapid responses from databases. The federated approach also has advantages over data warehousing projects, in which companies ship data to a central store at scheduled intervals.
While data warehouses typically cost $1 million a year to maintain, EII products cost tens of thousands of dollars and also deliver the most up-to-date information, he noted.
IBM's Xperanto, which builds on XML, a standard for data exchange, is based on the concept of federated data management. Instead of creating a single, larger database--a model, in part, espoused by rival Oracle--a federated scheme creates a virtual database linked to all the relevant data. In this model, data sources are queried in their native locations, and database management servers consolidate the results and make them available to users.
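The federated model can be illustrated with a small Python sketch. It assumes two toy sources: a relational sales table, held in an in-memory SQLite database, and a list of e-mail records standing in for a mail server. The names `VirtualDatabase`, `sales_wrapper` and `mail_wrapper` are hypothetical and do not reflect Xperanto's actual interfaces. The sketch shows only the general idea: each source is queried in its native form, where it lives, and a single layer consolidates the results.

```python
import sqlite3
from typing import Callable, Dict, List

# A "wrapper" queries one source in its native form and returns plain dicts.
# All names here are hypothetical, for illustration only.
Wrapper = Callable[[str], List[Dict[str, str]]]


def sales_wrapper(conn: sqlite3.Connection) -> Wrapper:
    """Query a relational source in place, using SQL."""
    def query(customer_id: str) -> List[Dict[str, str]]:
        rows = conn.execute(
            "SELECT order_id, amount FROM sales WHERE customer_id = ? ORDER BY order_id",
            (customer_id,),
        ).fetchall()
        return [{"source": "sales", "detail": f"order {o} for ${a}"} for o, a in rows]
    return query


def mail_wrapper(messages: List[Dict[str, str]]) -> Wrapper:
    """Scan a document source (here, a list of e-mail records) in place."""
    def query(customer_id: str) -> List[Dict[str, str]]:
        return [
            {"source": "mail", "detail": m["subject"]}
            for m in messages
            if m["customer_id"] == customer_id
        ]
    return query


class VirtualDatabase:
    """Fans one request out to every registered source and merges the answers."""

    def __init__(self) -> None:
        self.wrappers: List[Wrapper] = []

    def register(self, wrapper: Wrapper) -> None:
        self.wrappers.append(wrapper)

    def customer_view(self, customer_id: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for wrapper in self.wrappers:
            # Data is not copied into a central store; each source answers directly.
            results.extend(wrapper(customer_id))
        return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_id INTEGER, customer_id TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, "c42", 120), (2, "c7", 80)])
    mail = [{"customer_id": "c42", "subject": "Where is my order?"}]

    vdb = VirtualDatabase()
    vdb.register(sales_wrapper(conn))
    vdb.register(mail_wrapper(mail))
    for row in vdb.customer_view("c42"):
        print(row)
```

A data warehouse, by contrast, would first copy both sources into one store on a schedule. In this sketch nothing is copied ahead of time, so each answer reflects the sources' current contents.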
Proponents claim that the federated approach enables data to remain in the format and location where it best fits, allowing companies to avoid costly and error-prone data translations and new development.
For example, instead of building an entirely new database system for a customer support application, Xperanto-based servers would make it possible for customer support representatives to draw on several different, incompatible systems to answer a customer inquiry. That's possible, but not easy, using conventional methods.
IBM's competitors are also hard at work on technology to query varied data sources all at once. BEA Systems last year released a product for its WebLogic server software that relies on XML-based queries to cull data from multiple sources.
In the first half of this year, Microsoft is expected to release a beta-test version of its SQL Server database, code-named Yukon, which will make it easier for the database to manipulate XML data in different data sources, according to Microsoft.
Microsoft is also pursuing a larger effort to integrate Yukon-like data query technology into its Windows operating system, a project that has been in the works for more than a decade.
"We believe this is a very significant shift for the data management industry," said Nelson Mattos, director of information integration at IBM. "This is going to cause databases to move from the notion of managing data that is only physically stored within (them) to a federated approach."
A long-standing debate
With Xperanto, IBM stirs up a longtime industry debate over how best to manage enterprise data. On the one side are IBM, BEA and Microsoft, which favor a federated approach. On the other side, the leading advocate of a more centralized approach is Oracle, which argues that fewer large databases are less expensive to maintain than a larger number of smaller databases. But Oracle databases can also query multiple data sources and handle XML as a data format, said Benny Souder, vice president of distributed database technology at Oracle.
"We think that a smaller number of larger nodes (databases) gets you economies of scale," Souder said.
IBM argues that companies need integration at multiple levels--between information sources, applications and business processes--and it has invested in all three areas. Using IBM's programming tool, WebSphere Studio, a developer can create an application that exploits the capabilities of Xperanto, IBM's WebSphere MQ application integration middleware, and WebSphere Business Integrator.
"Customers are finding that Xperanto increases the productivity of application developers. If they are writing a J2EE (Java 2 Enterprise Edition) application and need to bring in data from three databases, they have to connect to each, issue a query, extract the data and join it at the application server level," explained IBM's Mattos. "With Xperanto, they connect and do one query and get the data merged the way they want."
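The contrast Mattos draws can be sketched in Python. The sketch assumes three SQLite database files stand in for the "three databases" (the table and column names are hypothetical). `manual_join` follows the steps he lists: connect to each database, query it, extract the rows and join them in application code. `single_query` uses SQLite's real `ATTACH DATABASE` statement so that one connection can issue one joined query. This is only a rough analogy for a federated query: it works across SQLite files only, not across the mix of sources Xperanto targets, and it is not IBM's API.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple

# Three separate databases; every name below is hypothetical.
SCHEMAS = {
    "crm.db": ("CREATE TABLE customers (id INTEGER, name TEXT)",
               "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]),
    "sales.db": ("CREATE TABLE orders (customer_id INTEGER, total INTEGER)",
                 "INSERT INTO orders VALUES (?, ?)", [(1, 500), (2, 75)]),
    "support.db": ("CREATE TABLE tickets (customer_id INTEGER, status TEXT)",
                   "INSERT INTO tickets VALUES (?, ?)", [(1, "open")]),
}


def create_databases(folder: str) -> None:
    for filename, (ddl, insert, rows) in SCHEMAS.items():
        conn = sqlite3.connect(os.path.join(folder, filename))
        with conn:  # commits the inserts
            conn.execute(ddl)
            conn.executemany(insert, rows)
        conn.close()


def manual_join(folder: str) -> List[Tuple[str, int, str]]:
    """Connect to each database, query it, then join the rows in the application."""
    conns = {name: sqlite3.connect(os.path.join(folder, name)) for name in SCHEMAS}
    try:
        customers = conns["crm.db"].execute(
            "SELECT id, name FROM customers ORDER BY id").fetchall()
        totals = dict(conns["sales.db"].execute(
            "SELECT customer_id, total FROM orders").fetchall())
        status = dict(conns["support.db"].execute(
            "SELECT customer_id, status FROM tickets").fetchall())
    finally:
        for conn in conns.values():
            conn.close()
    # The join happens here, in application code, not in any database.
    return [(name, totals.get(cid, 0), status.get(cid, "none")) for cid, name in customers]


def single_query(folder: str) -> List[Tuple[str, int, str]]:
    """Attach the other databases to one connection and issue one joined query."""
    conn = sqlite3.connect(os.path.join(folder, "crm.db"))
    try:
        conn.execute("ATTACH DATABASE ? AS sales", (os.path.join(folder, "sales.db"),))
        conn.execute("ATTACH DATABASE ? AS support", (os.path.join(folder, "support.db"),))
        return conn.execute(
            """SELECT c.name, COALESCE(o.total, 0), COALESCE(t.status, 'none')
               FROM customers AS c
               LEFT JOIN sales.orders AS o ON o.customer_id = c.id
               LEFT JOIN support.tickets AS t ON t.customer_id = c.id
               ORDER BY c.id"""
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        create_databases(folder)
        print(manual_join(folder))
        print(single_query(folder))
```

Both functions return the same merged rows. The difference lies in where the join logic lives: in hand-written application code, or in one declarative query that the data layer executes.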
IBM points to a handful of implementations in the life-sciences industry, where its customers used IBM's Data Joiner product, which was designed primarily for querying relational databases and mainframe-based flat-file systems. IBM is also trying to recruit software companies to exploit Xperanto within their own products. Palo Alto, Calif.-based Crystal Decisions, which sells software to create business reports, has signed on as an Xperanto partner.
IBM intends to manage information in relational databases, which are the cornerstone of most business applications, as well as in e-mail and content management systems that store documents. To handle both the structured data in relational databases and "unstructured" documents, IBM will rely on XML technology.
IBM will also include support for SQL (Structured Query Language), the language for querying relational databases that all database makers use.
"We don't believe in a revolutionary approach. Our customers back the idea of leveraging data in existing environments and looking to get quick returns. And there is a huge investment in SQL," Mattos said.
IBM said the first Xperanto-enabled product will be a dedicated information integration server built on IBM's flagship DB2 database. It will include IBM's WebSphere Studio development tool for building applications that rely on distributed data, said Mattos.
In an Xperanto release planned for 2004, IBM will add the ability to write queries in XQuery, a query language for XML data that is still under development. Other future releases will improve the ability to search and analyze text documents, Mattos said.