This post is adapted from a talk I gave at DevExecWorld in San Francisco, titled: “In a World of Data Science, Why is So Much of Engineering Still Guesswork?”
Finding a Common Language with the Business
You’re the leader of a software organization. Someone — maybe the CFO, maybe the CEO, maybe you yourself — asks what should be a simple question: How are we doing?
What’s your answer?
If you run sales, the answer is simple. You’ll talk deal pipeline and revenue growth. If you run marketing, you’ll talk lead acquisition and funnel conversion. If you run finance, you’ll talk P&L and balance sheet.
But what numbers do you use, what data do you point to, to show engineering performance?
To complicate the question a little, let’s assume that the teams across your organization use a variety of methodologies (Scrum, Kanban, etc.), as well as a variety of technologies. And remember: you’re explaining performance to a business audience. If your answer includes words like “story points” or “burndown,” you’ve already lost them.
I’ve spent most of my career building software companies, so this isn’t an idle question for me. The answer, I think, lies in reframing what software delivery is about. At its most essential, software engineering is about taking ideas, engaging talented developers, and turning those ideas into working product. Ideas come in, we move them through our build steps, and what comes out is (hopefully) high-quality, timely product.
Put another way, engineering is a pipeline: ideas enter, move through the pipeline of development, and exit as quality software.
So far, so basic. But this framing starts to unlock a way of both describing and measuring engineering performance, in language both business and technology leaders can understand. This is because we know how the performance of a pipeline should be measured:
- How much moves through the pipeline?
- How fast does it move?
- How good is what comes out?
- What does it cost?
There’s a lot of kinship between the way we measure the performance of a physical pipeline and the way we might measure the performance of a software pipeline. (It’s a big reason why lean manufacturing has had such an influence on the way we think about software delivery.) But the major difference historically has been this: physical pipelines can be instrumented for measurement. A software “pipeline” can’t, at least not without a lot of manual inputs and data gathering. We’re dealing with bits, not atoms.
This is what we set out to change. With Pinpoint we said, What if we could harness all the raw activity data that occurs inside the systems used to build software, and then use data science to turn that raw activity data into performance insights? Among other things, this meant deciding which measures (we call them signals, based on how we derive them) would best reflect pipeline performance.
Here’s what we came up with:
Taken together, these five signals give a fairly comprehensive understanding of how well engineering is performing:
- Backlog Change tells us how well we’re keeping up with demand;
- Cycle Time, which measures the average days from starting a piece of work to completing it, tells us our speed;
- Workload Balance evaluates how evenly work is distributed across people, which signals how efficiently the pipeline is operating;
- Throughput measures how much work we get done per person, per month;
- Defect Ratio tracks closed defects against created ones — i.e., are we squashing more bugs than we’re introducing?
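To make these definitions concrete, here is a minimal sketch of how the five signals might be computed from a flat list of issue records. It is not how Pinpoint derives them. The `Issue` record, the 30-day month, and in particular the Workload Balance formula (one minus the coefficient of variation of completed items per person) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev
from typing import Optional


@dataclass
class Issue:
    """Hypothetical, simplified issue record (not a Jira or GitHub schema)."""
    kind: str                          # e.g. "feature" or "bug"
    assignee: str
    created: date
    started: Optional[date] = None     # when work began, if it has
    completed: Optional[date] = None   # when work finished, if it has


def in_period(d, start, end):
    """True if date d falls in the half-open period [start, end)."""
    return d is not None and start <= d < end


def pipeline_signals(issues, start, end):
    """Compute illustrative versions of the five signals for [start, end)."""
    created = [i for i in issues if in_period(i.created, start, end)]
    completed = [i for i in issues if in_period(i.completed, start, end)]

    # Backlog Change: items opened minus items closed.
    # A negative number means the backlog shrank.
    backlog_change = len(created) - len(completed)

    # Cycle Time: average days from starting work to completing it.
    durations = [(i.completed - i.started).days for i in completed if i.started]
    cycle_time = mean(durations) if durations else None

    # Completed items per person, used by the next two signals.
    per_person = {}
    for i in completed:
        per_person[i.assignee] = per_person.get(i.assignee, 0) + 1

    # Throughput: completed items per person, per month (assumes 30-day months).
    months = (end - start).days / 30
    throughput = len(completed) / len(per_person) / months if per_person else None

    # Workload Balance (assumed formula): 1.0 is perfectly even, 0.0 is very uneven.
    counts = list(per_person.values())
    workload_balance = max(0.0, 1 - pstdev(counts) / mean(counts)) if counts else None

    # Defect Ratio: defects closed divided by defects created in the period.
    bugs_created = sum(1 for i in created if i.kind == "bug")
    bugs_closed = sum(1 for i in completed if i.kind == "bug")
    defect_ratio = bugs_closed / bugs_created if bugs_created else None

    return {
        "backlog_change": backlog_change,
        "cycle_time_days": cycle_time,
        "workload_balance": workload_balance,
        "throughput": throughput,
        "defect_ratio": defect_ratio,
    }
```

Under these assumptions, a Defect Ratio above 1.0 means more bugs were closed than created in the period, and a negative Backlog Change means the team closed more than it opened.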
To emphasize: we derive all of these instantly by harnessing the raw activity data in our Jira and GitHub and adding our own machine learning and statistical analysis. There’s no manual computation or people running around trying to collate information from different teams or systems.
How the Pipeline Drives Performance
Here’s a real example of how illuminating the software pipeline has helped us. When I looked at our pipeline performance over the last six months, I saw this:
Backlog-wise, we were closing more issues than we were opening, which is good. Our Workload Balance needed to be better, but it was trending in the right direction, having improved by 30% over the prior six-month period. Both our Throughput and our Defect Ratio were strong and getting better.
But our Cycle Time — yikes.
Looking across all work types (enhancements, features, bugs, etc.), it was taking us an average of 102 days from start to completion. Worse, that was more than four times as long as in the prior six months! In a startup, innovation and speed are your competitive strength — at least they should be.
Digging in a little further, here’s what the product showed:
This is Pinpoint’s breakdown of Cycle Time by work type. The paler blue measures number of days in development; the darker blue is days in verification. So the actual bottleneck, across almost all work types, wasn’t in building but in testing.
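The same kind of breakdown can be sketched in a few lines. The example below is only an illustration of the idea, not the product's implementation: it assumes each work item arrives as a hypothetical `(work_type, dev_days, verify_days)` tuple, averages each stage per work type, and names the slower stage.

```python
from collections import defaultdict
from statistics import mean


def cycle_time_breakdown(records):
    """Average days per stage for each work type.

    `records` is an iterable of (work_type, dev_days, verify_days) tuples,
    a hypothetical shape chosen for this sketch.
    """
    by_type = defaultdict(lambda: {"development": [], "verification": []})
    for work_type, dev_days, verify_days in records:
        by_type[work_type]["development"].append(dev_days)
        by_type[work_type]["verification"].append(verify_days)
    return {
        work_type: {stage: mean(days) for stage, days in stages.items()}
        for work_type, stages in by_type.items()
    }


def bottleneck(stage_averages):
    """Return the name of the stage with the largest average duration."""
    return max(stage_averages, key=stage_averages.get)
```

Calling `bottleneck` on each work type's averages gives the stage where that type of work spends most of its time, which is the question the chart answers visually.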
In speaking with the teams, I learned we were dealing with a high number of handoffs, especially between our engineering and data science teams, who work to train our models. This led to direct, specific action. For example, we moved from a legacy data architecture to a data lake. This meant any team could access the necessary data when and how it was needed, and it freed the data science teams to build their own models without engineering dependencies. This was further complemented by work we did to improve the flow of work through our pipeline.
The pipeline view let me see where we had room for improvement and, more importantly, what needed to be done. In the not-too-distant future, our data science will go one better: instead of me clicking on a diagnostic view to understand why a given signal might be underperforming, I’ll receive a specific, prescriptive action to take to improve whatever it is that’s trending in the wrong direction. But that’s a topic for another post...