Automation first: How to get your enterprise ChatOps-ready

If you don't have automation for certain things in your IT Ops environment, but you want to control or "interrogate" your various IT systems through ChatOps, then that should be your motivation for bringing in ChatOps bot technologies. But you don't want the automation to reside solely inside the chat environment.

IT teams have been using various chat tools since the 1980s. The first, talk, allowed limited communications among users logged onto a UNIX-based system. Now there are many more, including Slack, HipChat, and Yahoo Messenger, and there are automation bots, such as Hubot, Lita, and Err, which offer a nice bidirectional push/pull for your ChatOps environment.

But for reasons of scalability, availability, and other factors, it's best to start with automation outside the chat environment, then integrate it into the chat environment. Here's why.

Access to automation within ChatOps 

Technically speaking, you can run automation from the command line or with other tooling. The question is: What's a good, general architecture for automation? There are many automation tools out there. With some, the only interface you have is the web user interface or command-line interface (CLI). And triggering the automation from the command line can be hard. 

If the only way you can click a virtual button is by navigating through the CLI, then the automation isn't truly accessible. Besides, you don’t want your automation to be only in service of a web interface or a ChatOps integration. It should serve as a general-purpose mechanism to accomplish many other things. Once the automation capabilities are established, you'll want to integrate them with the chat infrastructure, the reporting infrastructure, and other processes.

Daniel Perez, services tools engineer at GitHub, described some complexities of ChatOps automation integrations at a recent DevOps Enterprise Summit. To push people into the chat tool, or to allow them to trigger automations, you have to build code inside your bot configuration to trigger the automation at the right time, he said.

For example, if you want to reboot a server from inside the chat tool, you call the command "reboot server," which takes one argument (the name of the server), and all of that is now inside the chat configuration.

You can put in a lot of code that does the work of rebooting the server—whatever it takes. But there's a better way.

Advantages of more general automation 

Imagine you put that same automation inside a more general-purpose script, maybe inside your bin directory that gets loaded on various machines. Then you can call it from anywhere, not just from chat, which means you're sharing a best practice. As Perez explained it, you don't want to jam a lot of stuff into the chat tool's automation integration. Make sure your tool calls something else that's more general-purpose and shareable.
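
Here is a minimal sketch of that split, not anyone's production code. All of the names (reboot_server, handle_chat_message, main) are hypothetical, and it assumes the reboot itself would happen over ssh with sudo. It defaults to a dry run, so nothing is rebooted unless you ask for it explicitly.

```python
"""Hypothetical shared automation: reboot a server by name."""
import argparse
import shlex
import subprocess
from typing import List


def build_reboot_command(server: str) -> List[str]:
    """Build the reboot command (assumption: the host is reachable over ssh)."""
    # Accept only letters, digits, dots, and hyphens in the server name.
    if not server.replace("-", "").replace(".", "").isalnum():
        raise ValueError(f"invalid server name: {server!r}")
    return ["ssh", server, "sudo", "reboot"]


def reboot_server(server: str, dry_run: bool = True) -> str:
    """General-purpose entry point, callable from chat, cron, or a shell."""
    command = build_reboot_command(server)
    if dry_run:
        return "would run: " + shlex.join(command)
    subprocess.run(command, check=True)
    return f"reboot requested for {server}"


def handle_chat_message(text: str) -> str:
    """Thin chat adapter: parse the message, then delegate to the shared code."""
    words = text.split()
    if len(words) == 3 and words[:2] == ["reboot", "server"]:
        return reboot_server(words[2])
    return "usage: reboot server <name>"


def main(argv: List[str]) -> int:
    """Command-line adapter around the same shared function."""
    parser = argparse.ArgumentParser(description="Reboot a server by name")
    parser.add_argument("server")
    parser.add_argument("--execute", action="store_true",
                        help="actually reboot instead of doing a dry run")
    args = parser.parse_args(argv)
    print(reboot_server(args.server, dry_run=not args.execute))
    return 0
```

The chat adapter and the command-line adapter are each only a few lines long, and both call reboot_server. To install this as a script in a bin directory, you would add the usual guard that calls main(sys.argv[1:]) when the file is run directly.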

The reason: You’re going to have many mechanisms for alerting. Chat may be one of the ways to do alerts when there's a problem. But if you're not in front of your laptop, or if you're on a device that isn't connected to a particular graph or pane of glass, you might want to get an email, a page, or an IM instead. You want to implement the alerting part once and have different outlets for the alerting automation, including, but not limited to, your ChatOps tool.
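
One way to picture "implement once, many outlets" is a small dispatcher that hands the same alert to every registered outlet. This is a sketch under assumptions: the Alert and AlertDispatcher names are made up for illustration, and the outlets here are plain in-memory lists standing in for a real chat room and mailbox.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alert:
    system: str
    message: str
    severity: str = "warning"


# An outlet is any callable that delivers an alert somewhere.
Outlet = Callable[[Alert], None]


def format_alert(alert: Alert) -> str:
    return f"[{alert.severity.upper()}] {alert.system}: {alert.message}"


class AlertDispatcher:
    """Holds the outlets; the alerting logic itself lives in one place."""

    def __init__(self) -> None:
        self._outlets: Dict[str, Outlet] = {}

    def register(self, name: str, outlet: Outlet) -> None:
        self._outlets[name] = outlet

    def send(self, alert: Alert) -> List[str]:
        """Deliver to every outlet; return the names that succeeded."""
        delivered = []
        for name, outlet in self._outlets.items():
            try:
                outlet(alert)
                delivered.append(name)
            except Exception as error:
                # One broken outlet must not block the others.
                print(f"outlet {name} failed: {error}")
        return delivered


# Stand-ins for real integrations (assumptions for this sketch).
chat_room: List[str] = []
mailbox: List[str] = []

dispatcher = AlertDispatcher()
dispatcher.register("chat", lambda alert: chat_room.append(format_alert(alert)))
dispatcher.register("email", lambda alert: mailbox.append(format_alert(alert)))
delivered = dispatcher.send(Alert("db01", "disk 91% full", "critical"))
```

Adding a pager or IM outlet later means registering one more callable; the code that decides when to alert does not change.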

From within the chat environment itself, you want to be able to say, "Show me the time graph for a particular system," or "Show me the last 10 lines in a log for system x." 
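
A "last 10 lines" request is a good first candidate because it is read-only. A minimal sketch might look like the following; the function names are hypothetical, and how your bot maps "system x" to a log file path is left out.

```python
from collections import deque
from typing import Iterable, List


def last_lines(lines: Iterable[str], count: int = 10) -> List[str]:
    """Return the final `count` lines, reading the input only once."""
    # A bounded deque keeps only the most recent `count` items.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def tail_log(path: str, count: int = 10) -> List[str]:
    """Read the last lines of a log file without loading all of it into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_lines(handle, count)
```

Because last_lines accepts any iterable of strings, the same function can serve a chat command, a report, or a shell script.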

The most useful approaches to alerting start small. It's less important which chat tool infrastructure you choose; it's more important that you have your automation done properly, ideally in advance of your ChatOps implementation. Once you have these universally available automations, you can invoke them from anywhere within your network. The more "approachable" those automations are, i.e., the more accessible, the better.

Avoiding the tool-chain wars

You might anticipate that different IT teams—development, core IT, networking, etc.—have grown fond of different ChatOps communications tools over the past several years. By having more centralized automation in place, you don’t have to get into the "tool chain" wars, at least not heavily.

You simply want to make sure your shared automation is standalone and reachable through lots of different means. And if one particular team is motivated to use Slack, it can integrate using that tool. If another team prefers HipChat, no problem. The same automation works for both.

You'll get better results if you pay some attention to the automation itself, and less attention to how you interact with it. You'll be in a more powerful position if the automation is accessible.

Centralized authentication for chat room security

If you don't consider centralized security before you implement ChatOps, it's possible that people who can get into a chat room can issue commands to the infrastructure. You don't want just anyone to be able to log into a chat room and reboot a system or two.

So are you going to build security into the integration on the chat server side? No. Even if you wanted to do so, many chat tools don't have the facilities for that.

Instead, you want to pass the identity of any person typing commands into the chat tool to the back-end automation. That's where proper authentication should happen.
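
As a rough illustration, the chat adapter below does nothing except forward the user's identity, and the back end decides whether the action is allowed and records the attempt. Everything here is an assumption made for the sketch: the permission table, the user names, and the function names are invented, and a real deployment would look users up in a central directory and map the chat account to a verified enterprise identity first.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical permission table; a real system would query a central directory.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "alice": frozenset({"reboot", "deploy"}),
    "bob": frozenset({"deploy"}),
}

# Every attempt is recorded: (user, action, target, allowed).
AUDIT_LOG: List[Tuple[str, str, str, bool]] = []


class NotAuthorized(Exception):
    """Raised by the back end when a user may not perform an action."""


def run_automation(user: str, action: str, target: str) -> str:
    """Back-end entry point: the access decision is made here, not in chat."""
    allowed = action in PERMISSIONS.get(user, frozenset())
    AUDIT_LOG.append((user, action, target, allowed))
    if not allowed:
        raise NotAuthorized(f"{user} may not {action} {target}")
    return f"{action} accepted for {target} (requested by {user})"


def handle_chat_command(chat_user: str, text: str) -> str:
    """Chat adapter: forwards the caller's identity and makes no decisions."""
    words = text.split()
    if len(words) != 2:
        return "usage: <action> <target>"
    action, target = words
    try:
        return run_automation(chat_user, action, target)
    except NotAuthorized as error:
        return f"denied: {error}"
```

With this shape, "who can reboot a system?" is answered by the back end's permission data and audit log, regardless of who happens to be in the chat room.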

Auditors and regulators will be much happier with this approach. When they ask you: "Who has access to these four critical systems? Who is able to shut down a piece of infrastructure, or deploy to a new server?" you don’t want your answer to be, "Well, anyone in this chat room." That will not satisfy a regulator, especially after an incident.

Identity management is critically important in an enterprise. If you're just five people working in a garage, sure, you can trust each other. But once you're working for a large bank, a trading company, or a healthcare company, ID authentication is critical.

High availability of the ChatOps infrastructure

If you can't count on the ChatOps infrastructure to be there when you need it, people will stop using it.

If your ChatOps tool is the main way that your IT Ops team collaborates and solves problems under pressure, then the chat tool has to be highly available. You want to avoid a single point of failure, which means that the tool itself, as well as the connection to it, has to be highly available.

The automation that the chat tool will reach out to and trigger, and the monitoring that will push data into the chat tool, have to be highly available as well. If part of the infrastructure goes down and that incident takes your ChatOps tools with it, then you're back to tin cans and string.

Still, of utmost concern is high availability of the automation infrastructure itself. If it does happen that the database goes down and the chat tool goes down, too, you're not totally dead in the water if your automation is still there. At least you should be able to run any given machine from the command line. 

Making ChatOps work at enterprise scale

Robust DevOps teams ensure that the entire software path, from development to production, is automated. That means an engineer checking in a simple change to code kicks off the continuous integration pipeline, which does compiling and testing, security testing, verification, and so on.

If you don't have this path from "laptop to live," to where your customer is consuming the change, then the ChatOps parts are not going to help you.

You need to have monitoring and notification established first before you bring in an enterprise-wide ChatOps capability. In setting up your chat environment, you'll need to assess all the pieces of your IT Ops environment that you've automated.

What things will you want working for you when you solve a thorny problem?

Some "automation" could be as simple as a command-line prompt. Some can be as complex as a multi-part sequence of scripts, stepping through a set of conditions. Regardless of the automation's complexity, you need it to know which machines need attention and to ensure secure access. In a large company, you must be able to answer in detail how your chat is set up because someday, you'll be asked about it.

For more on establishing ChatOps in an enterprise-wide setting, see Anders Wallgren's presentation at DevOps Enterprise Summit London on June 25-26.