You see, Lynch, the United Kingdom's most successful Internet entrepreneur, owes much to the work of Thomas Bayes, the 18th century English minister. Among other things, the good reverend sought to prove God's existence through mathematics. But it was his writings on probability inference that provided the framework Lynch borrowed to create new ways to search text for concepts instead of key words.
In 1991, Lynch founded a company called Neurodynamics to develop pattern-recognition technology. Five years later, he created a spinout called Autonomy, which quickly outstripped its corporate parent to become one of the fastest-growing start-ups in the history of the United Kingdom. It also made Lynch the United Kingdom's richest Internet entrepreneur and its first software billionaire.
Autonomy's stock in trade is a technology based on Bayesian principles to help comb through and understand the meaning of rich content. Lynch says that's a particularly strong selling point in an age of mobile telephony, where corporate consumers regularly use e-mail, videoconferencing and phone calls.
"The issue for technology isn't the interconnection," Lynch says. "It's the switching and the understanding of content...When you think about the modern corporation, the value is in the intellectual assets held between people's ears. That's what matters."
Like most tech companies, Autonomy has had to grapple with the effects of the lingering industry slowdown. In the company's June quarter, sales remained flat with March. But gross profit margins inched up to an enviable 98 percent, even as overall net profits tumbled. CNET News.com recently caught up with Lynch, who had just arrived from the other side of the pond for a regular visit to the company's West Coast offices.
Q: Your second-quarter earnings didn't make for a four-star performance, but sales basically held steady, and gross profit margins actually increased from a year ago--to 98 percent. What is it specifically about the sector you're operating in that allowed for those sorts of margins, and why hasn't demand for your company's technology gotten whacked along with everything else these days?
A: Our business is to sell technology that automates a lot of tasks done by human beings. It's one of those sad ironies that at a time when people are cutting back and trying to cut head count, then automating things actually becomes a good seller. And that side of our business is doing very well. Now that companies have come out of the initial shell shock of the downturn, they're working out their strategies; automation is a strong story.
Meaning you can sell your technology as part of a larger cost-cutting story in troubled times?
If you examine it, 80 percent of the information that business processes is in a form that a human being likes, not the form the computer likes. In the industry, we get obsessed with the database--when in fact most business takes place outside of the database. We just point out to customers how much of their business is getting done that is still manual. So, really, you just give them an idea of those costs, show them what the technology can do, and then you've got an immediate ROI.
Let me ask the question I ask of everybody these days: When do you expect things to turn up?
Because of technical reasons of accounting, we recognize revenue very quickly. And we don't have consulting or anything like that--so when conditions change, we see it pretty much immediately. We're normally about a quarter ahead of everybody else in what we see. And what we're seeing is that the U.S. is depressed but stable. It's not getting any worse. Continental Europe, after an initial panic, is actually getting a little better and firming up, while the U.K. and Scandinavia, whose economies actually are a little bit more tied to the U.S. in general, suddenly got a lot worse. So, it's mixed.
Earlier in the quarter you issued a profit warning because of the downturn in European sales. Do you expect to continue to experience contract deferrals as part of the general slowdown in IT spending?
There's no drop in demand or in our order pipeline. The change that happened is that when the people are doing procurement, it goes off to the CFO, and that's where it will be held--and it will be held because the CFO says he will not spend more than $500,000 until he's reviewed it. And so over the next few weeks, they sit down and work out what it is they really have to do. What we've seen is that orders do get held at that point but typically, over the next three to eight weeks, they come through.
OK, enough financial stuff. You've been quoted as saying that in a few years' time, people are going to look back at the shift from structured information to unstructured information as being a far bigger change than client-server was, or even Web sites. On the hype meter, that's a 10. Why are you confident in making such a bold call?
I'll defend that (laughing). Let's go back to the basics of IT. Ten years ago, if you were in a company, you knew what your company did--let's say selling teddy bears. Every time a teddy bear was sold, somewhere in the database table was the number of teddy bears that you had in stock. You took minus one off of that, and that was your business. Now if you look at the Fortune 500, it's very difficult to say what will be the major revenue generator for a Fortune 500 company in ten years' time. It bears very little relationship to what it is now.
Well, the world has changed, obviously.
The world has changed! If you look at Lucent's fortunes, for example, what that means is your IT systems have to be flexible. The same IT system that you set up with a relational database and your teddy bear sales is not relevant when you're doing three different businesses. Now the one beauty of unstructured, human-friendly information is that it's inherently flexible. If I send you an e-mail, I could be talking about anything and anybody. I'm not just sending the number of teddy bears being sold; I can tell you about how they were sold, what my thoughts are on this, what we should be doing--all this sort of thing.
So basically, we are using structured information because the computers are stupid. It's all that the computer could handle. As a technology, it is very slowly catching up to what human beings could do. You've got the richness of unstructured information, and that's what's driving this incredible rise. Eighty percent of business is being done this way. If you look forward three or four years--another outrageous statement--I don't think you'll see a piece of significant enterprise software sold that can't handle unstructured information as effortlessly as it handles database information now.
Your technology has its roots in the work done by Thomas Bayes. I would hazard to guess that most folks on this side of the pond haven't heard of the good reverend. Can you briefly explain his theorem and how it influenced your work in launching your first company and then Autonomy?
I was actually doing my Ph.D. in pattern recognition and coming toward the end of it, sadly, and suddenly at that point there was an explosive interest at the very theoretical end in Bayesian inference because a couple of particular problems on how to use it had been solved. I was extremely skeptical, but it was one of these strange things at Cambridge, where I was doing research, that people would leave the night before--normal people--and come in the next morning as though religious conversion had happened. And I'm actually an engineer by trade, so I really had no interest in anything theoretical until I started using the method.
How does it fit the real world?
Most science is about some kind of idealized equation. You apply that and get an answer. You may remember at high school--when the answer didn't actually match what you got when you did the experiment--you might fiddle the results in order to get the right marks. Well, the Bayesian approach says, well, we know that there are things that are not quite right and we should be building that in. It's all about probability. So, rather than this obsession of true and false where one describes the world in computing, it's all about shades of gray. And the other big thing about this is it's data-driven. It learns by example. So if you're trying to use it to understand the documents in a company, it picks up the slang used by workers in writing to each other. That's the sort of thing that if you tried to program in advance, you'd never be able to do.
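To make the "shades of gray" and "learns by example" points concrete, here is a minimal sketch of a naive Bayes text classifier. It is an illustration of the general Bayesian idea only, not Autonomy's technology; the class name, the sample messages and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Toy text classifier: learns word statistics from labeled examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def learn(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def probabilities(self, text):
        """Return P(label | text) for every label seen so far."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.doc_counts:
            # Prior: how common the label is among the examples.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalize so the answer is a probability per label, not true/false.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


# Hypothetical internal messages; "widget" stands in for company slang.
model = NaiveBayesSketch()
model.learn("the widget rollout slipped again", "engineering")
model.learn("widget firmware build is broken", "engineering")
model.learn("quarterly invoice totals are late", "finance")
model.learn("invoice approval needs the cfo", "finance")

result = model.probabilities("widget build slipped")
```

Nothing in the code knows what "widget" means in advance. The word carries weight for the engineering label only because the examples used it that way, which is the sense in which such a system picks up in-house vocabulary.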
So to the degree that we become even more inundated by information in this Information Age, that should be good news for Autonomy?
Well, we shouldn't be becoming inundated with information. The issue is to get the computers to act like the telephone exchange and work out what we really need to see. A great paradox in my work is it's all about leveraging the human being; although on one end, it's about removing them from any mundane tasks. It's actually about saying that the skilled employees in an organization are the real value. You're trying to solve the problem in, say, London, and the system--because it can read all the content being generated in the company--can let you know you really need to talk to this person in San Francisco because they really know about this, or it will let you know there's a person in Stockholm who has already worked on that. Now, that is leveraging my brain massively in terms of productivity.
In brief, you're attempting to eliminate the drudgery associated with the task?
It's amazing. You go into a large oil company and you find some work being done over and over again. These kinds of systems can just stop that happening and say, look, you're writing this--as you literally write this into, say, Microsoft Word, the system reads it, understands it and says, "Hey, this work has already been done over here. And so, look at this." Now that's an example of leveraging the work that's going on so that person now can leverage the work that's already been done and get on with something else.
Did Microsoft use Bayesian inference in Microsoft Office for its pop-up cartoon paperclip?
They started off in very basic implementations. They're trying to do some other things. A lot of the help system apparently in XP--where it actually works out where to take you in the Help and how to interpret your problem--is Bayesian driven. We're much more specialists in using Bayesian methods to understand the meaning of rich content.
You've got a roster of clients--the U.S. Defense Department, General Motors, Procter & Gamble--big organizations that need to sort through unstructured data. To what degree has Bayesian theory moved beyond academia to the technology industry? Is it still at a relatively early stage?
Well, what's your opinion on jet engines when you fly to New York? Do you worry about the bypass ratios very much?
It's not at the top of my list.
That's the point. Autonomy has nearly all the top players as you go down each industry. They don't care (about Bayesian theory). What they care about is that when they try the stuff, it works. And it does things you cannot do using traditional technologies.
What's the pitch?
You say, "Look at what you're using people to do in your business. If you're using them to read e-mails or using them to link articles in an online publishing site, that's a waste. That's something that can be done automatically and as accurately." And they may look at you and may think, "Well, I don't believe you can link online news articles as accurately as the editors." And you say, "OK, we'll come back on Thursday. We'll put an editor in one room and a machine in the other, and we'll get two of your other editors to blind test and score the thing." That's all they're interested in. It works. They actually don't care whether it's little leprechauns or the computer that does it--as long as the results are there when they actually do the test. You can do a test in an afternoon and prove the point. Almost every one of our customers has done those tests before buying.
The bulk of your business is in text search and retrieval. Where are you in terms of moving into voice?
We see voice as key. Again, just look at how much business is done by phone calls. Speech recognition, for various technical reasons, is only on the edge of working. The approach we use is to treat speech as a phonetic string of symbols; you don't actually do a full recognition. It's cheating, really. But what it allows you to do is find phone calls talking about the same thing or link phone calls to e-mail. Say that you want to treat a phone call just like it's an e-mail; I want to be able to search it. I want to be able to link it, and I want to be able to reroute it. It's a big, new area and very, very powerful.
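One way to picture matching calls on phonetic symbols rather than recognized words is to compare overlapping runs of phonemes. The sketch below is an assumption-laden illustration, not a description of Autonomy's system: the phoneme strings are ARPAbet-style symbols written by hand, and the function names and the choice of trigram overlap are hypothetical.

```python
def ngrams(symbols, n=3):
    """Return the set of n-symbol windows in a phoneme sequence."""
    return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard overlap of phoneme n-grams: 0.0 is disjoint, 1.0 is identical."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical phoneme strings, written by hand for illustration.
call_1 = "T EH D IY B EH R S EY L Z".split()         # "teddy bear sales"
call_2 = "DH AH T EH D IY B EH R AO R D ER".split()  # "the teddy bear order"
call_3 = "D IH S K D R AY V".split()                 # "disk drive"

same_topic = similarity(call_1, call_2)
different_topic = similarity(call_1, call_3)
```

The two calls that both mention teddy bears share several phoneme runs and score higher than the unrelated call, even though no step ever turns the sounds into words.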
If that scenario works out, then you're suggesting a much more fundamental change in the way work gets done.
Let me give you an analogy. In the early days of computing, if you were writing an accounting package and had to get a file off one of those big disk drives the size of a filing cabinet, you would actually put the code to access the disk drive in your accounting program and have it say, "Move the disk to track 60..." And then some bright person thought, "That's really stupid. Each time I change the disk drive, I've got to rewrite the software. I've got to understand all that rubbish even though I'm an accounting software writer."
Let's say I'm running a CRM app. That app has probably only spoken to a relational database. But now, it's gotten more complicated. I've got to talk to the e-mail server. I've got to talk to Lotus Notes. I've probably got to talk to the Web--all sorts of other data sources. And so I'm building all the plumbing for my CRM app into those. With the latest CRM systems, I'm having to listen into incoming phone calls and put up stuff for the rep to deal with. And you're going to see very soon things like video and image.
It also means that you've got a rat's nest of plumbing to consider.
There's the other big change. The app used to go with data. Now it's not like that. The e-mail may be coming in from Lotus Notes, but the CRM system needs to see it, and other systems need to see it. So not only do you have this massive great plumbing nest, it's going across the enterprise. What you need is a layer to make everything data agnostic. So if I'm writing a CRM app, I want to be able to call something and have one pipe that talks to that thing--whether the stuff that comes up is a phone call or a .PDF file or whatever--it's just that one point.
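The "one pipe" idea can be sketched as a thin layer that hides where content comes from. This is a hypothetical illustration: the class names are invented, the sources are in-memory stand-ins for real connectors, and plain keyword matching stands in for the concept matching Lynch describes.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class ContentItem:
    source: str  # e.g. "email", "phone", "pdf"
    text: str    # text form of the content, however it was obtained


class ContentSource(Protocol):
    def items(self) -> Iterable[ContentItem]: ...


class ListSource:
    """Stand-in for a real connector (mail server, call recorder, file store)."""

    def __init__(self, source: str, texts: list[str]):
        self._items = [ContentItem(source, t) for t in texts]

    def items(self) -> Iterable[ContentItem]:
        return self._items


class ContentLayer:
    """Single entry point; the calling app never sees where the data lives."""

    def __init__(self, sources: list[ContentSource]):
        self._sources = sources

    def find(self, keyword: str) -> list[ContentItem]:
        keyword = keyword.lower()
        return [
            item
            for src in self._sources
            for item in src.items()
            if keyword in item.text.lower()
        ]


layer = ContentLayer([
    ListSource("email", ["Customer reports the back spring squeaks"]),
    ListSource("phone", ["caller asked about delivery dates"]),
    ListSource("pdf", ["Back spring tolerance specification"]),
])

hits = layer.find("back spring")
sources = [h.source for h in hits]
```

The CRM app in this sketch makes one call, `layer.find(...)`, and gets back matches from e-mail and a PDF alike. Adding a new kind of source means writing one more connector, not changing the app.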
That also suggests a lot of integration of multiple systems.
The computer industry has ducked the issue of integrating multiple systems. What you actually want is something I've termed integration for understanding. The idea there is that if I'm working at GM and I've got a problem with the back spring in some car design I got, I just want everything integrated by virtue of just talking about that fact. I don't care whether it was in the Notes database or the phone call or the e-mail. That's the package I want delivered to me to work on the problem. I cannot believe that people are going to write non-data-agnostic apps for much longer because it's a nightmare. The other thing is it shouldn't be the app's problem to do the integration for understanding. I want everything--I don't care where it comes from--to help me with this issue.
There's going to be a lot of code writing going on to get all this stuff working the way you envision.
I don't believe you're going to see people continually writing more and more complex apps talking to all these different types of data. It's just not going to happen. The other side of it is it needs to be automatic; you can't rely on people putting meta-data in to make this work. It's got to be generated as a by-product of the business or the company.
Are there any privacy considerations as Bayesian technology becomes more widely accepted by the corporate world?