Rube Goldberg's DevOps legacy: Make your work visible

At its core, DevOps is all about speeding up delivery of value to your end users without compromising quality. The basic algorithm is pretty simple: Find a bottleneck, remove it, and repeat until you can't make any more noticeable improvements.

But you can identify a bottleneck only if you can see into your delivery pipeline. Without transparency, your DevOps transformation is doomed to fail. You need to make your work visible, across the whole of your delivery pipeline.

Rube Goldberg machines are really good at that. Here's why.


Rube Goldberg machines are surprisingly transparent

Rube Goldberg was a cartoonist who depicted machines that performed a simple task in a needlessly convoluted manner. That is, of course, the antithesis of DevOps, which abhors bottlenecks. So when you see Rube Goldberg machines mentioned in a DevOps article, it's not usually complimentary. Tools, processes, and entire deployment pipelines all suffer from negative comparisons to Rube Goldberg machines.

That's not surprising, because each step in a Rube Goldberg machine is designed to take longer than it should. Several steps in the machine can often be replaced with a single step that has the same result, or even be removed altogether.

Software delivery organizations looking to reduce waste also aim to replace unnecessary activities, or do without them completely where possible.

But the real, unsung beauty of the Rube Goldberg machine is that you can see everything that it does, in great detail. Imagine a Rube Goldberg machine with a cover on it. It would just be a puzzling contraption that takes an unfeasibly long time to complete its task and inexplicably consumes valuable resources as it does so.

DevOps craves transparency, too. Without it, you can't see what resources you're using, what processes are being performed, and what data is being moved through the system. And if you can't see these things, you can't fix them or improve them.

I can't tell you what the bottlenecks are in your system. But here are some ideas that might help you identify them.

Make your team's work visible

Individuals and teams have a finite capacity for work, and when they take on more work than they can handle, they end up being overloaded and frustrated, and tasks take longer to complete.

The work that is being done at any moment is usually termed "work in progress," or "WIP." The team's capacity, or the maximum amount of work that it can successfully undertake, is known as the "WIP limit." When the WIP exceeds the WIP limit, the team starts to get further and further behind.

Many teams have no idea of their WIP limit, and they take on too much WIP. The way to find out how much work they are undertaking is to make the work visible. A popular method is to use a Kanban board, which is a simple way of seeing what work is in progress and which stage each item has reached.

Whether you use a physical Kanban board or manage it with software, you'll be able to see exactly what items are waiting in the queue, which ones are currently being processed and what stage they're in, and which ones have been completed. You'll be able to prevent additional items from sneaking into your workload under the radar, and you'll see which items are stuck in one stage, not moving to the next.
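To make the idea concrete, here is a minimal sketch of a software Kanban board that enforces a WIP limit per stage. The class and method names (Stage, KanbanBoard, WipLimitExceeded) are hypothetical and invented for illustration; they are not taken from any particular tool, and a real board would also track owners, timestamps, and history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Stage:
    """One column on the board. wip_limit=None means the column is uncapped."""
    name: str
    wip_limit: Optional[int] = None
    items: List[str] = field(default_factory=list)


class WipLimitExceeded(Exception):
    """Raised when adding an item would push a stage past its WIP limit."""


class KanbanBoard:
    """Hypothetical in-memory board; names here are assumptions, not a real API."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages: Dict[str, Stage] = {s.name: s for s in stages}

    def add(self, item: str, stage_name: str) -> None:
        stage = self.stages[stage_name]
        if stage.wip_limit is not None and len(stage.items) >= stage.wip_limit:
            raise WipLimitExceeded(
                f"{stage_name} already holds {stage.wip_limit} items"
            )
        stage.items.append(item)

    def move(self, item: str, src: str, dst: str) -> None:
        source = self.stages[src]
        if item not in source.items:
            raise ValueError(f"{item!r} is not in {src}")
        # Add to the destination first: if it is full, the exception is
        # raised before the item is removed from its current stage.
        self.add(item, dst)
        source.items.remove(item)

    def summary(self) -> Dict[str, int]:
        """Number of items per stage, in board order."""
        return {name: len(stage.items) for name, stage in self.stages.items()}
```

Because a move into a full stage is rejected outright, extra work cannot slip in unnoticed, and the summary shows at a glance where items are piling up.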

Make your tests, test data, and test results visible

You can improve your test strategy significantly if your stakeholders know what you're testing and how the software is holding up against the tests. A visualization of test coverage can help you see what parts of your software you're testing and, just as importantly, which parts you're not testing.

Similarly, by visualizing the results of the tests, a simple glance is all you need to understand how the software is performing. This makes it easier for team members to consider how their changes are affecting the rest of the software. And if you can filter test results by specific functionality, team members can quickly understand how healthy that area of functionality is.
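As a rough illustration of filtering results by functionality, the sketch below groups test results by functional area and reports the pass rate for each. The record shape (test name, area, passed) and the function name health_by_area are assumptions made for this example; real test runners report results in their own formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (test name, functional area, passed?)
TestResult = Tuple[str, str, bool]


def health_by_area(results: Iterable[TestResult]) -> Dict[str, float]:
    """Return the fraction of passing tests for each functional area."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _name, area, ok in results:
        total[area] += 1
        if ok:
            passed[area] += 1
    return {area: passed[area] / total[area] for area in total}
```

A dashboard could color each area by its pass rate, so a team member about to change a given area can see its current health first.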

Much data is associated with tests, whether it's metadata about the test, data used to drive the test, or results that are generated during the course of a test's execution. Making that data visible across the development pipeline allows your team to leverage it to improve the quality of the product.

Create a window into operations

The Agile Manifesto stresses the value we place on working software, so it's imperative that your team understand how the software is performing in the wild, and not just in the lab. The development and testing environments must mirror the production environment as closely as possible, and that is achievable only if the team has visibility into production.

Once the product is deployed in the production environment, the team must capture metrics from it and act on them. These metrics give the team insight into the actual user experience and into how users are reacting to that experience, often called "user sentiment."

Production metrics to monitor for user experience include the amount of time it takes for your application to start up, the response time from the user interface, application errors, crashes, and the time to restore service.

On mobile devices, resource consumption such as battery and cellular data should be monitored. All of this information can and should be used to optimize the user experience and keep your users happy.
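Of the production metrics mentioned above, time to restore service is simple to compute once incidents are recorded. The sketch below assumes each incident is stored as a (detected, restored) pair of timestamps; that record shape and the function name mean_time_to_restore are assumptions for illustration, not the output of any specific monitoring product.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape: (time detected, time service was restored)
Incident = Tuple[datetime, datetime]


def mean_time_to_restore(incidents: List[Incident]) -> timedelta:
    """Average duration between detection and restoration across incidents."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),  # start value so the sum is a timedelta, not an int
    )
    return total / len(incidents)
```

Tracking this number over time shows whether changes to the pipeline are actually making recovery faster.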

Spread the word with ChatOps

Information might be visible, but that doesn't mean that someone is looking at it. Sometimes we know about an issue only when someone brings it to our attention, whether that's a user or a colleague. Many organizations now use ChatOps, which brings humans and automated agents together in a chat room.

These automated agents, or chatbots, are services that can interact with humans and other chatbots, using a chat-based interface within a chat room such as Slack. This allows everyone in the conversation to share the same context and information and to communicate with one another using natural language.

Chatbots are often associated with production-monitoring systems. When an incident is detected, the chatbot automatically creates a chat room, invites the relevant participants, whether human or automated, and starts the conversation with the data from the incident. Participants can request additional information from the chatbots and can run commands through them.

Because the conversation takes place in a chat room, the conversation is open and visible to all participants. This makes everyone aware of the issue at hand and provides tools that can be used to help gather more information about the issue and resolve it.
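The incident flow described above can be sketched with an in-memory stand-in for a chat service. Everything here is hypothetical: ChatRoom, open_incident_room, the "monitor-bot" sender, and the incident fields are invented for illustration, and no real chat platform's API is being shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChatRoom:
    """Hypothetical in-memory chat room; every member sees every message."""
    name: str
    members: List[str] = field(default_factory=list)
    messages: List[str] = field(default_factory=list)

    def post(self, sender: str, text: str) -> None:
        self.messages.append(f"{sender}: {text}")


def open_incident_room(incident: Dict[str, str], participants: List[str]) -> ChatRoom:
    """Create a room for the incident, invite participants, post the details."""
    room = ChatRoom(name=f"incident-{incident['id']}")
    room.members.extend(participants)  # humans and other bots alike
    room.post("monitor-bot", f"{incident['service']} reported: {incident['summary']}")
    return room
```

Because the bot posts the incident data into the shared room, everyone invited starts from the same context.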

Design your DevOps pipeline with Rube Goldberg in mind

An effective DevOps transformation requires visibility to succeed. So when you build your pipeline, don't hide the inner workings.

Expose as much as you can, so that you and your team can see what's going on at each stage. You'll see where the bottlenecks are, understand what you need to optimize to make the pipeline more efficient, and deliver higher-quality software to your users faster.

Image: Máquina de Rube Goldberg en la base del Alinghi (by Flickr user freshwater2006)
