How to improve your agile development? Question how it's always been done

Nick Fletcher, VP of Engineering, xMatters

When the xMatters development team transitioned a decade ago from the traditional waterfall software development process to the shiny new agile development methodology known as Scrum, our engineers had to question processes that had proved effective for years.

Major inefficiencies in the agile development process crept in during a shift to a service-oriented architecture and service-ownership model, so we began questioning old techniques and learned that things can (and should) be done differently—and better—than before.

Ten years ago, we were a small engineering team with a monolithic codebase compact enough that any engineer could work on any part of it. But as our customer base expanded, so did our development organization, growing from four teams to 15.

Here's how keeping our thinking fresh along the way helped us scale successfully.


Shared ownership is the same as no ownership

When you have a small number of teams, everyone works on everything. As systems get larger and more complicated, that sense of ownership begins to erode, and a developer becomes less and less likely to ever revisit any specific part of the codebase.

For this reason and many others, companies have chosen to move toward service-oriented architectures. One large codebase in a monolithic architecture becomes multiple smaller codebases that are easier to comprehend in isolation. Fairly organically, the concept of service ownership begins to take root, and teams claim ownership of one or more services.

Amazon captured this concept in a simple but powerful phrase: You build it, you own it.

At xMatters, we gravitated toward this concept, with teams building, testing, deploying, and monitoring their services from development through production.

Finishing what we started

Somewhere along the way, our team forgot "You build it, you own it" by not finishing what we started. Back in the days of our monolithic architecture, we had a shared codebase, bug backlog, and customer support queue. As bugs entered the backlog, they were triaged with a risk rating; developers picked up the highest-risk bugs first and worked their way down to the lowest-risk ones.

The way we always managed the bug backlog was to allocate a percentage of our time toward fixing bugs. If the size of the bug backlog grew too large, we'd crank up the time allocation and—sure enough—the size of the backlog would shrink. Once it got small enough, we'd turn the knob the other way and spend more time building new features.

But in the latter half of 2017, the size of our bug backlog had increased by a factor of four and the knob just stopped working. For the first time in many years, we were looking down the road and wondering if we'd be spending all of our time fixing bugs.


Diving into the problem

Over a 90-day period, we identified that:

  • Only 68% of bugs were fixed by a person on a team with the best subject-matter expertise. In other words, one-third of all bugs were worked on by people who had little or no idea what they were doing.
  • Just 42% of escalated customer support tickets were being investigated by the best subject-matter expert to solve the problem, meaning more than half had little or no idea what they were investigating.

In a team of around 100 engineers, that added up to a lot of wasted time.

The forced time allocation for fixing bugs and support tickets also incentivized engineers to spend a specified amount of time fixing bugs, rather than achieving the desired outcome of having a higher-quality product. An engineer might spend 50 hours fixing one bug, and that would be considered just as good as spending 50 hours fixing 50 bugs.

Another side effect of focusing on the overall size of the bug backlog was that unappealing (i.e., gnarly, hard-to-fix) bugs could exist in perpetuity. And they did: the average age of a bug in the backlog was just over 1,000 days!

A new process that auto-scales

We needed a new process that was simple to understand and made it easy for engineers to prioritize what was most important at any given point in time.

We amended our process by taking these steps.

Routing the bugs and support tickets to the best people

During triage, we classify and flag bugs and support tickets with the service or product area where they most likely originated. We have Jira automatically assign the tickets to the team that owns that service or product area.
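The routing idea can be sketched in a few lines. This is only an illustration of the mapping logic, not our actual Jira configuration; the service names and team names below are hypothetical:

```python
# Map each service or product area to its owning team.
# These names are illustrative, not real xMatters services or teams.
SERVICE_OWNERS = {
    "notifications": "team-alpha",
    "scheduling": "team-bravo",
    "integrations": "team-charlie",
}

def route_ticket(ticket: dict, default_team: str = "triage") -> str:
    """Return the team that should own a bug or support ticket,
    based on the service flagged during triage. Tickets flagged with
    an unrecognized service fall back to the triage queue."""
    service = ticket.get("service")
    return SERVICE_OWNERS.get(service, default_team)

# Example: a bug flagged against the scheduling service
# is assigned to the team that owns scheduling.
bug = {"id": "BUG-123", "service": "scheduling", "risk": "high"}
print(route_ticket(bug))  # -> team-bravo
```

In practice this kind of lookup lives inside a Jira automation rule keyed on the component or label set during triage, so no human has to decide who owns the ticket.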

Spending time needed to fix the bugs in their services

Rather than allocating blocks of time to work on these issues, we assign mean-time-to-resolution (MTTR) thresholds to each risk rating for bugs and support tickets. We now aim to fix bugs in anywhere between 11 and 180 days, depending on their risk rating.

This means we no longer have bugs sitting around in perpetuity, and engineers know when to direct effort to bugs based on their age.

The beauty of using MTTR for bugs is that it automatically scales the effort needed to keep the backlog under control. We no longer need to tune how much time teams spend fixing bugs and investigating support tickets; the effort adjusts on its own, based on the age and number of bugs in the services each team owns.
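The MTTR-driven prioritization can be sketched as follows. The per-rating targets here are assumptions for illustration; the article states only that targets range from 11 to 180 days depending on risk rating:

```python
from datetime import date, timedelta

# Illustrative MTTR targets (in days) per risk rating.
# Only the overall 11-to-180-day range comes from our actual process.
MTTR_TARGET_DAYS = {
    "critical": 11,
    "high": 30,
    "medium": 90,
    "low": 180,
}

def days_remaining(opened: date, risk: str, today: date) -> int:
    """Days left before a bug breaches its MTTR target.
    Negative values mean the bug is already overdue."""
    deadline = opened + timedelta(days=MTTR_TARGET_DAYS[risk])
    return (deadline - today).days

def next_bug(bugs: list, today: date) -> dict:
    """Pick the bug closest to (or furthest past) its MTTR deadline."""
    return min(bugs, key=lambda b: days_remaining(b["opened"], b["risk"], today))

today = date(2018, 6, 1)
bugs = [
    {"id": "BUG-1", "opened": date(2018, 5, 25), "risk": "critical"},  # 4 days left
    {"id": "BUG-2", "opened": date(2018, 1, 1), "risk": "low"},        # ~1 month left
]
print(next_bug(bugs, today)["id"])  # -> BUG-1
```

Because urgency is derived from age and risk rather than from a fixed time allocation, a team with an aging backlog automatically feels more pressure to fix bugs, and a team with a healthy backlog is free to build features.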

Monitor and reinvent

Most engineers prefer working on new features over fixing bugs or investigating support tickets, so our new process rewards teams for writing higher-quality code, which conveniently results in fewer bugs and more time to work on features.

The results have far exceeded our highest expectations. We halved the backlog size within six months, and after a year our feature velocity had increased by 50% (in terms of time spent as well as number of new features delivered).

We’re continuing to monitor the effects of this process change, but overall, the net effect has been immensely positive.

While the process works well today, that doesn't mean it will always be optimal. My hope is that one day someone will evaluate this process and invent a new approach that results in even greater levels of success.
