The future cloud should fend for itself

Automation stands at the heart of the economics and operation of cloud computing services. However, today's automation is primitive and requires much manual engineering and intervention.

It is fascinating how making the world of computing easier creates opportunities for new complexity--usually in the form of new computing technologies. It has happened with programming languages, software architectures, computer networks, data center design, and systems virtualization. However, nothing has raised the bar on that concept like IT automation.


You may have been expecting to hear the term "cloud computing," but cloud is just an outcome of good automation. It's an operations model--a business model to some--that was only made possible by a standardization of the core elements of computing and the automation of their operation. Without automation, the cloud cannot be self-service, and it cannot scale to very large numbers of customers or systems.

The best part is that we are only at an intermediate stage in the evolution of operations automation--the second of several evolutionary stages in the growing capability of systems to fend for themselves in a global computing marketplace.

These are the stages we understand to some extent today:

  1. Server provisioning automation--The first stage of automation that we all know and love is the automation of server provisioning and deployment, typically through scripting, boot-time provisioning (e.g. PXE booting), and the like.

    When the server is the unit of deployment, server automation makes a lot of sense. Each server (bare-metal box or virtual machine) can host one operating system, so laying down that OS and picking the applications to include in the image is the way to simplify the operation of a single server. (A minimal sketch of this pattern appears after this list.)

    The catch is that this method alone is difficult to do well at large scales, as it still requires the system administrator to make decisions on behalf of the application. How many servers should I deploy now? Which types of servers should I add instances to in order to meet new loads, and when should I do that? The result is still a very manual operations environment, and most organizations at this stage attempt capacity planning and build for expected peak. If they are wrong...oh, well.

  2. Application deployment automation--A significant upgrade to single-server deployment is the deployment of a "partitioned" distributed application, where the different executables and data sets of the application are "predestined" for a deployment location, and the automation simply makes sure each piece gets where it needs to go and is configured correctly.

    This is what Forte Software's 4GL tools did when deploying a distributed application, an approach that transferred responsibility for application deployment from systems administrators to developers. However, this method still requires manual capacity management, deployment for peak loads, and continued monitoring by human operators.

  3. Programmed application operations automation--Operations code adds a critical function to basic distributed-deployment automation: automatically adjusting capacity consumption based on application needs in real time. This is the magic "elasticity" automation that so many are excited about in the current cloud computing landscape. Basic scaling automation makes sure you pay only for what you use.

    However, today's scaling automation has one severe limitation: the way the "health" of the application is determined has to be engineered into application operations systems ahead of time. What conditions you monitor, what state requires an adjustment to scale, and what components of the application you scale in response all have to be determined by the developer well before the application is deployed. (A sketch of such a predefined scaling rule follows this list.)

  4. Self-configuring application operations automation--To me, the logical next step is to start leveraging the smarts of behavior-learning algorithms to enable cloud systems to take in a wide variety of monitoring data, pick through that data to distinguish "normal" from "abnormal" behaviors, and determine appropriate ways to react to any abnormalities. These kinds of learned behaviors turn the application system into more of an adaptive system, one that gets better at making the right choices the longer the application is in production. (A toy illustration of this idea also follows the list.)

    Though behavioral learning systems today, such as Netuitive's performance management products, are focused primarily on monitoring and raising alerts for abnormal behaviors, they can do some amazing things. According to CEO Nicola Sanna, Netuitive has three key calculations it applies to incoming data:

    1. It determines where the system should be, relative to its operational history.

    2. It performs end-to-end contextual analysis of existing conditions, determining what factors may be contributing to an operational abnormality.

    3. It forecasts likely conditions in the near future based on previous behavior trends, thus potentially averting abnormalities before they happen.

    There are other products making their way into this space, such as Integrien's Alive product, and I expect we'll see performance analytics become more intelligent in a variety of other traditional management and monitoring tools as well. The real excitement, however, will come as automation systems learn not only when to raise an alert but also what action to take when an alert is raised.

    This latter problem is a difficult one, make no mistake: a wrong choice might teach the system something, but it might also be detrimental to operations. Successful implementations, however, will be incredibly valuable, as they will constantly evolve tactics for dealing with application performance, security (at least some aspects, anyway), and cost management.
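
To make stage 1 concrete, here is a minimal Python sketch of scripted server provisioning. It is purely illustrative; every name in it (ServerSpec, PLANNED_SERVERS, provision) is hypothetical, and the point is simply that a human decides the server count and image contents up front while the automation only lays them down.

    # Illustrative sketch of stage 1: the operator decides the server count and
    # image contents up front; the script only lays the pieces down.
    # All names here (ServerSpec, PLANNED_SERVERS, provision) are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class ServerSpec:
        hostname: str
        os_image: str        # e.g. a PXE boot image or VM template name
        applications: list   # applications baked into or installed onto the image

    # Capacity planning happens in a human's head: "deploy four web servers."
    PLANNED_SERVERS = [
        ServerSpec(f"web-{i:02d}", "base-os-image", ["web-server", "app-runtime"])
        for i in range(1, 5)
    ]

    def provision(spec: ServerSpec) -> None:
        """Stand-in for the real work: network-boot the box, apply the OS image,
        then install and configure the listed applications."""
        print(f"booting {spec.hostname} with image {spec.os_image}")
        for app in spec.applications:
            print(f"  installing {app} on {spec.hostname}")

    if __name__ == "__main__":
        for spec in PLANNED_SERVERS:
            provision(spec)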
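
Stage 3's limitation is easiest to see in code. The hypothetical Python sketch below shows a predefined elasticity rule: the health metric (CPU), the thresholds, and the scaling bounds are all chosen by the developer before deployment. None of the names or numbers refer to a real infrastructure API.

    # Illustrative sketch of stage 3: the health signal (CPU), the thresholds,
    # and the scaling bounds are all engineered before deployment. Only the
    # arithmetic is automated. All names and numbers here are hypothetical.

    import random

    SCALE_OUT_CPU = 0.75                 # decided by the developer ahead of time
    SCALE_IN_CPU = 0.25                  # decided by the developer ahead of time
    MIN_INSTANCES, MAX_INSTANCES = 2, 20

    def average_cpu(instance_count: int) -> float:
        """Stand-in for a real metrics query against a monitoring system."""
        return random.uniform(0.1, 0.9)

    def desired_instance_count(current: int, cpu: float) -> int:
        """Apply the predefined rule: add or remove one instance at a time."""
        if cpu > SCALE_OUT_CPU:
            return min(current + 1, MAX_INSTANCES)
        if cpu < SCALE_IN_CPU:
            return max(current - 1, MIN_INSTANCES)
        return current

    if __name__ == "__main__":
        instances = 4
        for tick in range(5):
            cpu = average_cpu(instances)
            instances = desired_instance_count(instances, cpu)
            print(f"tick {tick}: cpu={cpu:.2f}, running {instances} instances")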
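
Finally, a toy Python illustration of the stage-4 idea. This is not Netuitive's algorithm; it is an assumed, simplified stand-in for the three calculations described above: a baseline learned from history, a check for deviation from that baseline, and a naive forecast from the recent trend.

    # Toy illustration of stage 4's learned-baseline monitoring. This is NOT
    # Netuitive's algorithm; it is a simplified, assumed stand-in for the three
    # calculations described above.

    from statistics import mean, stdev

    def baseline(history, window=30):
        """Expected value and spread, learned from recent history."""
        recent = history[-window:]
        return mean(recent), stdev(recent)

    def is_abnormal(value, expected, spread, k=3.0):
        """Flag values more than k standard deviations from the learned norm."""
        return abs(value - expected) > k * spread

    def forecast(history, steps_ahead=5):
        """Naive linear extrapolation of the most recent trend."""
        slope = history[-1] - history[-2]
        return history[-1] + slope * steps_ahead

    if __name__ == "__main__":
        latency_ms = [100 + 0.5 * i for i in range(60)]   # slowly rising latency
        expected, spread = baseline(latency_ms)
        print("abnormal now?", is_abnormal(latency_ms[-1] + 40, expected, spread))
        print("forecast five intervals out:", forecast(latency_ms))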

Crazy, you say? Why the heck would I want to give up control over the stability and operations of my key applications to a "mindless" automation system? For the same reason that--once you trust them--you will happily turn over your operating systems to virtual machines, your phone systems to managed service providers, or your elastic workloads to cloud environments: optimization, agility, and cost.

The companies that adopt one or more cloud models for a large percentage of their workloads will see some key advantages over those that don't. Cloud providers that adopt the best infrastructure and service automation systems will greatly improve their chances in the marketplace, as well.

In the future, companies and providers that go further and apply learning algorithms to operations automation will increase their advantage even further. We just need a few smart people to solve some hard problems so we can teach our cloud applications to fend for themselves.

About the author

    James Urquhart is a field technologist with almost 20 years of experience in distributed-systems development and deployment, focusing on service-oriented architectures, cloud computing, and virtualization. James is a market strategist for cloud computing at Cisco Systems and an adviser to EnStratus, though the opinions expressed here are strictly his own. He is a member of the CNET Blog Network and is not an employee of CNET.
