IT Ops automation: 3 things you can do today—and 2 you can't

David Linthicum Chief Cloud Strategy Officer, Deloitte Consulting
 

When it comes to monitoring and operations, how much can you automate? There is a lot of new tooling available today for AIOps and related areas. If you listen to the hype, it's easy to believe you can automate most operations and thus avoid the need to deal with complexity, security issues, governance, and so forth.

However, those who deal with IT Ops, and certainly those with multi-cloud operations, know that the reality is quite different from the picture painted in marketing brochures or by some of the tech press. There are limits to how far you can go right now. It's important to become aware of the emerging realities and best practices of hybrid IT management.

IT Ops automation is a growing need. When the use of cloud computing exploded, compounded by the growth of multi-cloud complexity, the demand for tools surged as well. Demand will continue to be high for some time.

It's a natural progression that IT Ops vendors will focus on tools that monitor both the on-premises and cloud environments of their customers. The hope is that these tools will work in any cloud or on-premises environment. That will require support for traditional technology stacks (including Microsoft and Oracle) and popular public cloud providers (e.g., Google, Amazon, and Microsoft).

To understand how to do your own automation evaluations, you need to understand several automation categories. Here's what works, and what does not—and what you can expect in the near future from the technology providers.

Emerging automation categories

Consider IT Ops and how the new tools can automate operations. Keep in mind that the capabilities of these tools vary widely. Therefore, it's considered a best practice to come up with your own definitions for this emerging market before you move forward with tool selections. Both artificial intelligence (AI) and machine learning (ML) will be integral to most tools you review, regardless of whether they are labeled AIOps or not.

Also understand that you will almost certainly need multiple tools, not a single tool. 

So, let's draw some IT Ops automation lines in the sand, with the caveat that a single tool may belong to more than one or two categories, which seems to be the trend. 

Root-cause analysis automation

These tools look at data transmitted from systems under management, find faults, determine the ultimate cause of those faults, and then kick off self-healing automation. 

The root-cause analytics category of IT Ops accounted for the largest market share in 2019, according to a report from Grand View Research. A new interest in fault diagnosis and Internet of Things (IoT) devices—which all leverage automation to take corrective actions—largely pushed forward this market segment. 
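As a rough illustration of the "determine the ultimate cause" step, here is a minimal sketch. It assumes a hypothetical dependency map (`DEPENDS_ON`) and a set of components that are currently reporting faults; both are invented for this example. Real tools work from far richer telemetry, so treat this as a toy model of the idea rather than how any product does it.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-frontend": ["order-api"],
    "order-api": ["orders-db", "auth-service"],
    "auth-service": [],
    "orders-db": ["storage-array"],
    "storage-array": [],
}


def find_root_causes(faulty: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the faulty components that have no faulty direct dependency.

    A faulty component with a faulty dependency is treated as a symptom.
    """
    return {
        component
        for component in faulty
        if not any(dep in faulty for dep in depends_on.get(component, []))
    }


faults = {"web-frontend", "order-api", "orders-db", "storage-array"}
print(find_root_causes(faults, DEPENDS_ON))  # {'storage-array'}
```

In this toy case, three of the four alarms are symptoms, and self-healing automation would only need to target the storage array.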

Analytics-oriented automation

IT operations analytics (ITOA) includes tools used to monitor and collect data from systems under management. They also process and analyze that data and infer actionable conclusions from it, and the issues they find are corrected using automated processes. In practice, this means supporting decision making and spotting threats that may interrupt operations.

Asset-oriented automation

This category includes IT Ops solutions that manage the performance of assets and reduce operating expenses. The tools, and associated automation, help plan strategies for asset maintenance. The goals are to avoid outages, lower maintenance costs, help those charged with IT Ops make informed decisions around the management of those assets, and help the user of the automation carry out direct asset management processes.

Other automation categories include:

  • Application-oriented automation, or the automation of application management processes
  • Security-oriented automation that automates security processing, such as blocking an attacking IP address
  • Network-oriented automation, or automating the management of the networking layers
  • Governance-oriented automation, or the ability to leverage automation to carry out company policies 
  • Special-purpose automation, which is automation custom-built for a specific purpose such as IoT management using proprietary interfaces

Three things you can do with IT Ops automation

First, it’s helpful to understand what automation means from tool provider to tool provider. You'll find a wide range of options here.

Let's start with the positive: What features of IT Ops automation can do the most for the majority of enterprises, and how are they handled? These include providing APIs and prebuilt automation scripts and integrating with other IT Ops tools.

The yin and yang of APIs

APIs allow you to build applications to leverage the tools, but the automation applications are really a custom build each and every time. The advantage is that you can automate almost anything, such as an application that starts a backup system in a public cloud provider when the primary on-premises system goes down, as reported by the IT Ops tool. 

If you're thinking, "Well, that sounds like a lot of work," you're right. Yes, this feature allows you to automate anything that can be triggered by an IT Ops tool. However, it's the IT Ops team that must create the actual automation. That involves teaching the IT Ops tools to be the entity that triggers automation by spotting an issue in real time, or by spotting an issue that's determined through analytics. 
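To give a sense of what that custom work looks like, here is a minimal sketch of the failover example above. The `Alert` shape, the system names, and the `start_backup` callable are all assumptions made for illustration; an actual IT Ops tool would define its own event format, and the backup would be started through your cloud provider's SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    """Hypothetical shape of an event delivered by an IT Ops tool's API."""
    system: str
    status: str  # e.g. "up" or "down"


def handle_alert(alert: Alert, primary: str, start_backup: Callable[[], None]) -> bool:
    """Start the backup system when the primary is reported down.

    Returns True if failover was triggered, False otherwise.
    """
    if alert.system == primary and alert.status == "down":
        start_backup()  # custom code: a real version would call a cloud SDK here
        return True
    return False


# Usage with a stand-in for the cloud call.
started = []
triggered = handle_alert(
    Alert(system="erp-onprem", status="down"),
    primary="erp-onprem",
    start_backup=lambda: started.append("erp-cloud-backup"),
)
print(triggered, started)  # True ['erp-cloud-backup']
```

Even in this small form, the team owns the trigger condition, the action, and the error handling, which is why this route demands development skills.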

Prebuilding automation scripts

Perhaps a better approach is for the tool to provide prebuilt automation scripts, and maybe prebuilt traditional programs (e.g., written in Python), that can do most of the IT Ops tasks based upon triggers from the IT Ops tools. This means that the API is leveraged for you, and the IT Ops tool provider has done the majority of work for tasks such as:

  • Restarting a server based on errors coming from the database
  • Blocking an IP address that appears to be involved in a DDoS attack
  • Moving processing from a primary server in one public cloud region to another region to work around an outage
  • Based upon a root cause, launching one of 100 corrective actions that are preprogrammed 

This approach is obviously much less work, and you'll need fewer development skills among the members of your IT Ops team. However, most enterprises quickly run out of prebuilt routines, or those routines don't fit their exact needs. That forces them back to the first category, where they define most of their automation using the IT Ops API.
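One way to picture this is a catalog of prebuilt actions keyed by root cause, with custom API-based automation as the fallback. The sketch below is only an illustration: the catalog entries, root-cause names, and return strings are hypothetical, and the actions merely describe what they would do instead of touching real systems.

```python
from typing import Callable, Dict

# Hypothetical catalog of vendor-supplied actions, keyed by root cause.
PREBUILT_ACTIONS: Dict[str, Callable[[str], str]] = {
    "database_errors": lambda target: f"restart server {target}",
    "ddos_suspected": lambda target: f"block IP {target}",
    "region_outage": lambda target: f"move processing to region {target}",
}


def run_corrective_action(
    root_cause: str,
    target: str,
    custom_actions: Dict[str, Callable[[str], str]],
) -> str:
    """Prefer a prebuilt action; fall back to custom, API-based automation."""
    if root_cause in PREBUILT_ACTIONS:
        return PREBUILT_ACTIONS[root_cause](target)
    if root_cause in custom_actions:
        return custom_actions[root_cause](target)
    return f"no automation for {root_cause}; escalate to on-call"


# A custom action the team had to write because no prebuilt routine fit.
custom = {"cert_expired": lambda target: f"renew certificate on {target}"}

print(run_corrective_action("ddos_suspected", "203.0.113.7", custom))
print(run_corrective_action("cert_expired", "web-01", custom))
```

The larger the `custom` dictionary grows relative to the prebuilt catalog, the closer you are to the API-only approach described earlier.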

Prebuilt scripts may also come with some prebuilt integrations with other tools, such as security and governance tools. However, these depend upon your IT Ops tool providing the connections and relationships for you. 

You'll find that most prebuilt scripts are either integration-rich or integration-poor. If poor, you can count on your team having to learn the IT Ops APIs as well as having to build integrations as one-offs.

Playing nice with others

The least amount of work comes with those few tools that have most of the integration completed with the other major IT Ops automation tools that you'll need to leverage as your IT Ops automation "tool team." Moreover, these tools will also include most automations required to provide common resolutions to common operational problems. 

This seems to be the trend right now, as large and small tool providers are consistently told by their user bases that this feature is needed. 

While these are emerging, not all vendors have stepped up to this approach, even as part of their product road maps. This is where it pays to ask questions. Many IT Ops teams discover these limitations only after they have already committed to an IT Ops tool.

What you can't do with IT Ops automation

The can't-do list could go a hundred items long, considering today's most-wanted features for IT Ops automation as contrasted with the current state-of-the-art as covered above. As part of your IT Ops automation selection process, it's helpful to review what's not available now.

The two most important can't-do's are complex automations and automations across tools in unified ways.

Complex automations

These types of automations can persist over days, weeks, and years. They provide the ability to deal with complex issues by using complex, pre-defined automations. If you think that as long as there's an API, then automations of any complexity are possible, you would be half-right. 

Even so, complex persistent automations require things from the API that are just not currently in place, such as the ability to maintain a persistent state or the ability to deal with one-off analytics that are invoked by the automation itself. For most tools, support for these types of automations just doesn't exist.
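When the tool's API cannot hold state for you, teams sometimes keep it themselves. The following is a minimal sketch of that workaround, assuming a hypothetical four-step automation and a local JSON file as the state store; a production version would more likely use a database and handle concurrent runs.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical multi-step automation that may span days.
STEPS: List[str] = ["drain_traffic", "patch_hosts", "verify_health", "restore_traffic"]


def load_state(path: Path) -> Dict[str, int]:
    """Load saved progress, or start from the first step."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0}


def run_next_step(path: Path) -> str:
    """Run one step and persist progress so a restart resumes where it left off."""
    state = load_state(path)
    index = state["next_step"]
    if index >= len(STEPS):
        return "done"
    step = STEPS[index]
    # ... the real work for `step` would happen here ...
    state["next_step"] = index + 1
    path.write_text(json.dumps(state))
    return step


# Each call could happen in a different process, hours or days apart.
state_file = Path(tempfile.mkdtemp()) / "automation_state.json"
print(run_next_step(state_file))  # drain_traffic
print(run_next_step(state_file))  # patch_hosts
```

This is extra plumbing that the team builds and maintains outside the IT Ops tool, which is the cost the limitation above imposes.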

Automation across tools in unified ways 

Examples here include being unable to reach out to the network IT Ops tool to switch routers in order to resolve a network issue that was identified by the analytics IT Ops tool, and having no way to be notified whether the switch succeeded, or failed and still needs to be resolved.

Indeed, some network IT Ops tools provide only rudimentary APIs that don't return a status. That means you're flying blind, in terms of your automation, or you will need to build more complicated and costly solutions. 
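One of those more complicated solutions is to fire the action and then confirm the outcome through an independent check. Here is a minimal sketch of that pattern; `trigger` and `is_resolved` are hypothetical callables standing in for the status-less API call and for whatever health check you have available.

```python
import time
from typing import Callable


def act_and_verify(
    trigger: Callable[[], None],
    is_resolved: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> bool:
    """Fire an action that returns no status, then poll an independent check."""
    trigger()
    for attempt in range(attempts):
        if is_resolved():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before checking again
    return False


# Usage with stand-ins: the issue clears on the third check.
checks = iter([False, False, True])
ok = act_and_verify(lambda: None, lambda: next(checks), delay_seconds=0)
print(ok)  # True
```

If the function returns False, the automation knows to escalate instead of assuming the switch worked.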

It's important to understand that there is no unified standard that covers how these APIs are developed. Without one, the kind of standardized automation that needs to occur across tools remains a limitation today.

Hang in there. Help is coming

The good news is that the amount of investment that’s pouring into the IT Ops space right now means the limitations listed above are likely to fall away over the next few years. Also, the capabilities will only get better for the same reasons. 

The purpose of covering what works and what does not is to allow IT Ops teams to select tools with realistic goals based upon the tools' true capabilities and limitations. 
