What happens when clouds (inevitably) burst?

While system failure is inevitable, cloud computing providers must keep their users in the information loop.

Dave Rosenberg Co-founder, MuleSource

Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.

March 16, 2009 4:06 p.m. PT

Microsoft became a true cloud provider this past weekend as it experienced nearly 22 hours of downtime on its fledgling Azure Services Platform. The cause of the outage has not yet been disclosed to the general public or the Azure user community.

In contrast to on-premises systems, in which the user is responsible for dealing with infrastructure problems, a big part of the appeal of the cloud is that you don't have to manage your own systems or deal with the inevitable failures that occur.

It's easy to go off on a tangent about the necessity of monitoring the cloud, but the real issue is one of communication. If Microsoft wants to be taken seriously as a hosting provider--especially one defining a very nascent wave of technology--it needs to offer more information than what a single admin posts on an MSDN forum.

Of course, we should expect the same of other cloud providers like Amazon Web Services, Google App Engine, and Salesforce.com, all of which provide only the most basic uptime details (green=good, red=bad) with little to no explanation of what exactly is being monitored. The obvious argument is that users don't need to know...until something goes wrong and information is scarce.

Third-party services such as Hyperic's Cloudstatus.com provide additional insight, but cloud vendors themselves have to become much more forthcoming about system status and its implications. How can vendors help to ease the problems that outages cause?

• Visibility: Give customers immediate (real-time) visibility into the availability and performance of the services that you are delivering to them.

• Transparency: The performance and availability data needs to be freely available. Don't hide these metrics behind a login or some complex credentials-only mechanism. Companies that follow this rule will succeed, and they will set the standard and force the rest of the industry to follow.

• Trust: Above all else, report accurately. The most important asset a cloud services provider has is its reputation. Customers will forgive a service disruption--we all know computer systems have their periodic hiccups. Customers will not forgive anything that is less than honest and forthcoming.
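To make the three points above concrete, here is a minimal Python sketch of what a public status report richer than green/red might contain. Everything in it is a hypothetical illustration: the `ServiceStatus` class, its field names, the service name, and the sample values are assumptions, not any provider's actual status API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ServiceStatus:
    """Hypothetical public status record; all field names are illustrative."""
    service: str                 # which service this report covers
    state: str                   # "ok", "degraded", or "down"
    checked: str                 # plain-language note on what is being monitored
    latency_ms: Optional[float]  # last measured latency, None if unavailable
    explanation: str             # honest summary of what is known so far
    updated_at: str              # ISO 8601 timestamp (UTC) of the last update


def to_public_json(status: ServiceStatus) -> str:
    """Serialize the record so it can be published without any login."""
    return json.dumps(asdict(status), sort_keys=True)


# Placeholder values only; they do not describe any real incident.
example = ServiceStatus(
    service="example-compute",
    state="degraded",
    checked="HTTP probe of the deployment API every 60 seconds",
    latency_ms=None,
    explanation="Intermittent failures; cause under investigation.",
    updated_at="2000-01-01T00:00:00Z",
)

print(to_public_json(example))
```

The point of the sketch is the `checked` and `explanation` fields: they carry what a green or red light cannot, namely what is being measured and what the provider currently knows.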

This leads to one of the larger questions about cloud adoption: what happens when things go wrong? And are you prepared when things go bump in the night?

• As a user, what is your backup plan if your cloud provider fails?
• As a provider, what are you doing to communicate effectively with your users?
• As a provider, do you have a run-book in place for a large-scale outage?

In the case of Azure, there aren't yet many commercial applications running. Still, it's Microsoft's responsibility to be on top of the status of its services and to communicate constantly when things go wrong.

Availability outweighs any other perceived risk of using the cloud. Issues like security and latency have always been concerns, but nothing else matters if the cloud platform or application isn't available.

One interesting technical aside: Azure appears to have a required five-hour, full reboot of the system, which is probably fine now as the user base is fairly small. But just think about how long it would take to reboot all of Amazon Web Services. (An AWS total reboot is unlikely to happen as Amazon's service is built in zones. But hey, you never know.) Or how about the impact of 17 hours of intermittent availability plus 5 hours of reboot time in the context of AWS? Literally hundreds (thousands?) of businesses would wind up offline in some manner.
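For a rough sense of scale, the 17 hours of intermittent availability plus 5 hours of reboot time can be turned into an availability percentage. The short Python sketch below does that arithmetic. The 30-day month and 365-day year are assumed measurement windows, and the calculation pessimistically counts the intermittent hours as full downtime.

```python
def availability(downtime_hours: float, period_hours: float) -> float:
    """Return availability as a percentage of the given period."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


# Figures from the article: 17 hours intermittent plus 5 hours of reboot.
outage_hours = 17 + 5

# Assumed measurement windows: a 30-day month and a 365-day year.
monthly = availability(outage_hours, 30 * 24)
yearly = availability(outage_hours, 365 * 24)

print(f"{monthly:.2f}% over a 30-day month")   # 96.94% over a 30-day month
print(f"{yearly:.2f}% over a 365-day year")    # 99.75% over a 365-day year
```

Under these assumptions, a single outage of this length leaves a provider below 97 percent availability for the month.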

As Gavin Clarke wrote on The Register, "Microsoft wanted to offer people the full cloud experience. Well, now it has."

Follow me on Twitter @daveofdoom
