The intersection of open source and cloud computing

There has been much discussion about the relationship between open-source software and cloud computing. How will the two trends affect each other in the years to come?

James Urquhart
James Urquhart is a field technologist with almost 20 years of experience in distributed-systems development and deployment, focusing on service-oriented architectures, cloud computing, and virtualization. James is a market strategist for cloud computing at Cisco Systems and an adviser to EnStratus, though the opinions expressed here are strictly his own. He is a member of the CNET Blog Network and is not an employee of CNET.

Cloud computing and open-source software have been intertwined since the early days of the cloud. Vendors such as Amazon.com, SugarCRM, Rackspace, and many, many others built on open-source software for everything from virtualization to data stores to user interfaces.


Today, it is fair to say that much of the cloud was made possible by both the economics and malleability of open-source software. However, despite the widespread adoption of open-source software by the cloud community, the future of open-source software is greatly affected by the cloud operations model itself.

Take the interesting discussion about the future of the LAMP stack recently. LAMP--a software stack consisting of Linux, the Apache Web server (and/or Tomcat), MySQL (or another open-source database engine), and the Perl, Python, and/or PHP scripting languages--plays a critical role in the world of Web applications, but as I noted recently, it may not be as critical to the cloud.

The ways in which open source is most affected by cloud computing revolve largely around the changing roles in the IT operations sphere that I talked about in my series on DevOps. Briefly, I pointed out that cloud drives operations teams from a server focus to an application focus. This means that the infrastructure components an application depends on are now operated by different teams than the application itself.


Now, the definition of IT infrastructure has shifted significantly in the last 15 years or so, and the term now encompasses software as well as hardware. Operating systems have long been considered infrastructure in the client-server world. Middleware and databases, such as J2EE application servers and relational database management systems, have also largely been described as common infrastructure.

In fact, a very common practice in enterprise IT organizations is to define standard builds of key software infrastructure stacks, creating a common operations framework on which application code is the only variant--at least in theory.

As many of these infrastructure components shifted to open-source options, they received a tremendous amount of attention from application developers. The reasons were twofold. First, these projects were available for download free of charge--a characteristic the average developer loves in tools and infrastructure. Second, developers were free to modify the entire software infrastructure stack if they so chose--though most rarely, if ever, actually did so.

Here's the thing. Developers who wanted to play with infrastructure code were able to do so for two reasons:

  1. The source code and instructions for building the software were freely available for manipulation on the developer's own system.

  2. The developer could then build and deploy the software on said system to test and then utilize any changes.

What changes in cloud computing is that deployment of infrastructure software is strictly under the control of the cloud service provider. If I'm a user of Google App Engine, for example, I can't go into the source code for their management systems, change something to suit me, and push it out to the wider Google service environment.

Of course, we want it that way--it would be ridiculous to let anyone who wants to change the way App Engine works affect all other users of that environment. The security implications alone make that completely unreasonable, to say nothing of the other operational problems it would present.

This means that the only users of open-source infrastructure projects in the public cloud are the cloud providers themselves. They may see themselves as responsible users of open source and contribute back, or they may not. In any case, the incentive for the average application developer to delve into infrastructure code is weakened, if not removed outright.

The good news is that "infrastructure as a service" companies like Rackspace, Terremark, or Amazon leave so much of the software infrastructure up to their customers (such as operating systems and middleware) that it will be quite some time before most projects see this effect. In fact, it might accelerate interest in the short term.

However, as "platform as a service" offerings proliferate, and enterprise developers increasingly go for the path of least resistance, it may only be a matter of time until most cloud infrastructure is supported only by the professional operations teams behind cloud services.

Ultimately, I think one of two things will happen. Either the cloud community will find ways to ensure that open-source infrastructure projects are highly visible to end users and encourage innovation by the larger community, or most such projects will be supported only by cloud providers who--as competitive businesses--will seek opportunities for differentiation. That, in turn, may actually kill the advantages of open source for those organizations, and cause increased forking or even abandonment of open-source projects.

I'm not sure how all of this will work out, but I am fascinated by the possibility of increased competition between a shrinking pool of active contributors to common open-source infrastructure projects. Will open source suffer, or will the developer community find innovative new ways to keep open-source infrastructure accessible to application developers using cloud services?