IBM chief scientist seeks patterns in patterns
Big Blue's Jeff Jonas sits down to discuss how big data can help businesses reduce risk and increase opportunity.
Despite what is often considered to be a conservative approach to business, IBM has no shortage of big thinkers who use their skills both internally and externally to influence the way the company thinks about technology and how it applies to business processes.
This week I met with Jeff Jonas, chief scientist, IBM Entity Analytics, to talk about how predictive analytics is moving into new realms of big data and how companies are using software to deal with the deluge of information.
Jonas joined IBM in 2005 when Big Blue acquired SRD, a company he founded to develop so-called extraordinary systems for specific data analysis tasks, such as the facial recognition and analysis systems casinos use to catch cheating gamblers.
The main thrust of Jonas' research right now is finding ways to take advantage of as much data as possible, as fast as transactions happen, with an eye toward real-time predictive analytics. This is basically pattern detection in real time, based on patterns that may or may not already exist.
Jonas explained that you may not know of a pattern but want to find one, and that many patterns might be interesting without actually mattering. In the casino example, bad guys try to perform channel separation by mixing and matching people, places, and things, while the casino needs to do channel consolidation to aggregate information and determine an immediate course of action.
Another example is the interest in analyzing social media data. Jonas contends that if you can't count, you can't predict. For example, a government watching for a SARS outbreak needs to commingle channels to better ascertain geolocation as well as the gravity of the data. Counting and pattern matching the data lead to better decision making.
From a recent Jonas blog post:
The single most fundamental capability required to make a sensemaking system is the system's ability to recognize when multiple references to the same entity (often from different source systems) are in fact the same entity. For example, it is essential to understand the difference between three transactions carried out by three people versus one person who carried out all three transactions.
Without the ability to determine when entities are the same, it quickly becomes clear that sensemaking is all but impossible.
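To make the "three people versus one person" distinction concrete, here is a minimal Python sketch. It is not how IBM's Entity Analytics software works. It simply assumes, for illustration, that two records refer to the same entity whenever they share a phone number, email, or address. The field names, the matching rule, and the sample records are all hypothetical.

```python
from collections import defaultdict


def resolve_entities(records):
    """Group record indexes that share any identifying value.

    Assumption for illustration only: an exact match (ignoring case and
    surrounding whitespace) on phone, email, or address means two
    records describe the same entity.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the lowest index as the root so output is deterministic.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_seen = {}  # (field, normalized value) -> index of first record with it
    for idx, record in enumerate(records):
        for field in ("phone", "email", "address"):
            value = record.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical transactions from different source systems.
transactions = [
    {"name": "Bob Smith", "phone": "555-0100", "email": "bob@example.com"},
    {"name": "Robert Smith", "phone": "555-0100", "address": "12 Elm St"},
    {"name": "R. Smyth", "email": "rs@example.com", "address": "12 elm st"},
    {"name": "Alice Jones", "phone": "555-0199"},
]
entities = resolve_entities(transactions)
print(entities)  # [[0, 1, 2], [3]]
```

The first three records never share a single common field, yet they chain together: the first two share a phone number and the last two share an address. That chaining is why the sketch uses a union-find structure rather than a simple lookup. A real system would need fuzzier matching and a way to weigh conflicting evidence.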
And, according to Jonas, the more data the better. If you can reduce the number of puzzle pieces with solid blocks, you are able to eliminate noise. However, systems need to be smart enough to re-examine themselves and determine whether information that was discounted is now valuable.
Data visualization certainly helps with sensemaking, but what matters is the ability to consolidate the channels and maintain non-obvious relationship awareness to determine threats, both inside and outside.
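The idea of revisiting discounted information can be sketched in a few lines of Python. This is an illustrative toy, not Jonas' design: it assumes that an observation counts as noise until the same entity has been seen a set number of times, at which point everything previously set aside for that entity is promoted. The class name, the threshold, and the sample entity labels are hypothetical.

```python
from collections import defaultdict


class ObservationStore:
    """Keep discounted observations so they can be re-examined later."""

    def __init__(self, threshold=3):
        self.threshold = threshold            # hypothetical cutoff for "interesting"
        self.discounted = defaultdict(list)   # entity -> observations treated as noise
        self.promoted = defaultdict(list)     # entity -> observations now considered relevant

    def observe(self, entity, detail):
        """Record an observation; return True if the entity is now relevant."""
        if entity in self.promoted:
            self.promoted[entity].append(detail)
            return True
        self.discounted[entity].append(detail)
        if len(self.discounted[entity]) >= self.threshold:
            # New data changed the picture: promote everything set aside earlier.
            self.promoted[entity] = self.discounted.pop(entity)
            return True
        return False


store = ObservationStore(threshold=3)
results = [
    store.observe("player-17", detail)
    for detail in ("table 4", "table 9", "table 4 again")
]
print(results)                       # [False, False, True]
print(store.promoted["player-17"])   # ['table 4', 'table 9', 'table 4 again']
```

Nothing is thrown away when it is first judged uninteresting, so the two early sightings become useful the moment the third one arrives. Counting is the entire trigger here, which is a deliberately crude stand-in for the pattern matching the article describes.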
At the moment, the majority of this type of analytical software is on-premises, but it will move to the cloud as soon as IT staff and large corporations become totally comfortable with the privacy and data integrity issues. After all, you don't want your entire trail of GPS movements to be exposed to the entire Internet or to hackers who might use the data against you.
That said, the amount of data coming from cloud-based and mobile systems is growing exponentially, and processing that data at the edge is another way to get the best possible real-time analytics, while still being able to look back on past decisions and analysis to ensure accuracy and reliability.