'R' language bringing statistical analytics to the masses (Q&A)

You know you've been dying to perform complicated statistical analyses. The R programming language might just be your ticket to success.

Dave Rosenberg Co-founder, MuleSource
Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.

I recently had the chance to discuss the open-source 'R' programming language with Revolution Analytics CEO and founder Norman Nie.

Revolution is the commercial organization supporting the open-source project, and its ranks include a number of technology bigwigs. Among them is Nie himself, who co-founded analytics firm SPSS and led the company as CEO/chairman of the board for more than 40 years before its sale to IBM in 2009 for $1.2 billion. Revolution has enjoyed some outstanding press mentions, even though the product appeals to a very specific user base.

R coplot with interactions (credit: R Project)
R is similar to other programming languages such as Java and C, but it holds particular appeal for statisticians because it contains a number of built-in mechanisms for organizing data, running calculations, and creating graphical representations of data sets.

Considering predictive analytics is not on the tip of most people's tongues, I set up a Q&A with Nie to get a basic overview of why R matters and how Revolution plans to commercialize the software. The edited transcript follows:

Q: What exactly is 'R' and why does it matter?
Nie: Simply put, R is the most powerful statistical computing language on the planet; there is no statistical equation that cannot be calculated in R. This gives it unparalleled ability to sort through data sets and do predictive modeling. This is particularly relevant in today's business intelligence environment, given the explosion of big data and the increased emphasis organizations are putting on advanced analytic techniques.

R is also open source, so there is a community of more than 2 million users behind it. It is particularly well entrenched in academia, where today's university students (and tomorrow's statisticians) are being trained almost exclusively on R.

Predictive analytics is a pretty heady topic. Who uses this stuff and why?
Nie: Predictive analytics is statistical modeling by any other name. It's used by statisticians and data analysts across all verticals as a means to better understand their organization, both internally and externally. It helps identify trends and patterns, arming CIOs and other enterprise decision makers with the best possible information with which to make their decisions.

Predictive analytics has a proven track record in the business intelligence field. Wal-Mart, for one, has used predictive analytics to better understand the habits and tendencies of its customers. Predictive analytics was a driving force behind the pricing model that gave the company a competitive advantage in its market. Beyond retail, predictive analytics models are used extensively in financial services, pharmaceuticals, life sciences, and a host of other verticals.

Is this more than a niche?
Nie: As it currently stands, predictive analytics is more than a niche. Organizations recognize the importance of predictive analytics and the need to leverage their data for a competitive advantage. However, predictive analytics capabilities carry a high price tag today. Equally important, today's commercial packages are built on 40-year-old legacy technology and are therefore reaching their limits.

Predictive analytics is a complex field that is generally limited to PhD-level analysts. With Revolution R, we intend to bring predictive analytics to the masses. We are addressing ease-of-use and scalability issues that will make R more accessible to business users and more suitable for the high data throughput demands of an enterprise.

Many academic projects remain in academia. Why is R different?
Nie: R is still relatively young and is only just now being recognized outside of academic circles. Its practically limitless statistical capabilities make it attractive to anybody doing data analysis, be they a statistics professor or a financial analyst. As more and more students are trained on R as undergraduates, there is an increasing demand in the enterprise. All of the major analytics players have incorporated or are planning to incorporate R into their software packages because of its capability to sort data. At this point, the future of predictive analytics is pointing directly to R, something we are calling the "perfect storm of predictive analytics."

How does the company intend to monetize R, and what exactly is it offering?
Nie: We are commercializing R through an open-core business model. We have created proprietary IP that we bundle with the open-source R engine to enhance usability, productivity and scalability with R. Revolution R Enterprise is available under a subscription model, and we also provide a free community version, Revolution R Community. Revolution R is intended for enterprises looking to run a cutting-edge predictive analytics engine at a fraction of the price being charged by legacy vendors.