Initiatives to measure software development are a source of all kinds of engineering misery. Measures, metrics, OKRs, KPIs...the work to define these things is almost as miserable as the work to implement them, which is still less miserable than the work of living under them.
Many (most?) measurement initiatives become vast efforts of over-engineering that collapse under their own weight. Some common pitfalls:
- Looking at the wrong thing: e.g. a metric that's too far removed from the business goal. ("Hours worked" was once a classic of the genre.)
- Looking at everything: just because we can measure something doesn't mean we should. Too much data is as bad as too little; it just becomes noise.
- Measures that don't suggest action: a measure should ideally point to a reasonable action or a specific concern. If, on reading a metric, the reaction is "Now what?", the metric isn't helping.
- Measures that are too difficult to derive: if calculating the measure requires too much in the way of extra calories (think: pivot tables upon pivot tables) or unnatural acts on the part of teams ("remember to record what you did over in this system here..."), it's dead on arrival.
Against these, the temptation is to say: Forget it. It's too hard. Engineering can’t be measured.
That’s an understandable reaction. It’s just not a winning one.
In developing Pinpoint, a first-order objective was this: we want an actionable understanding of engineering performance that doesn't require any new or unnatural acts on the part of teams, nor a separate army of data crunchers to assemble, manage, and interpret results.
In practice, this meant rather than interrupting busy engineers to ask them what they’d done, we’d let them work and instead look to the systems where all that work occurred. We’d use the actual raw activity data as our input, not secondary reports, or recollections, or opinions.
That still left the comparatively harder work of figuring out what to measure, and how to derive it. Since we'd all spent time in the KPI salt mines, we had some hard-earned ideas about the criteria of a good metric:
- It must be discoverable by machine intelligence. No manual reporting, no manual digging.
- Its value must be clear and intrinsic. If a metric sparks lots of debate over its worth, it flunks.
- It must suggest logical action. On seeing the metric, it should be plain what must be done.
Also, we decided not to call them “metrics” or “measures” or “KPIs.” These were something new.
We call them signals.
How signals become insights
We’ve talked about the value of framing software development as a pipeline, one that receives ideas and turns them into working product. Framed this way, what we want to know about engineering is: how much do we get through the pipeline, how quickly do we do it, how good is what comes out, and at what cost?
We started by deciding which signals would be most reflective of end-to-end (or top-to-bottom) pipeline performance.
But we didn't stop there. One of the powerful aspects of signals is that we can use data science to correlate them with one another, since all of them are derived from the raw activity data of source systems. Why does this matter? Because, to take one example, if your cycle time is trending in the wrong direction, you and your teams will want diagnostics on where and why.
In fact, the more signals we surface, the more powerful the insights we can deliver. We can make ever more fine-grained correlations between pipeline performance signals and diagnostic ones. In this way, signals are like colors on a palette: the more colors, the richer the picture you can paint.
And it doesn’t stop with correlation. With correlation comes the ability to make recommendations—and ultimately predictions—all by applying data science and machine intelligence to signals.
If that sounds vague or sci-fi, consider something specific like project forecasting. Today, teams must essentially guess how long it's going to take to finish their work. But with signals like Cycle Time and Throughput, we can estimate, based on historical data, a given team's likely time to finish. We can even assign a probability to various due dates.
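One common way to turn historical throughput into probabilistic due dates (a standard technique, not necessarily Pinpoint's exact implementation) is Monte Carlo simulation: repeatedly replay random samples of past weekly throughput until the remaining work is done, then read forecasts off the distribution of outcomes. The throughput numbers below are invented for illustration.

```python
# Sketch: Monte Carlo forecast of weeks-to-finish from historical
# weekly throughput. All numbers are illustrative.
import random

def forecast_weeks(throughput_history, items_remaining, trials=10_000, seed=42):
    """Simulate many futures; return weeks-to-finish at several confidence levels."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < items_remaining:
            done += rng.choice(throughput_history)  # sample a past week at random
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    # Percentiles read as confidence: "85% chance of finishing within N weeks."
    return {p: outcomes[int(trials * p / 100) - 1] for p in (50, 85, 95)}

weekly_throughput = [3, 5, 4, 6, 2, 5, 4]  # items completed in each of the last 7 weeks
print(forecast_weeks(weekly_throughput, items_remaining=30))
```

The 85th-percentile figure is what turns "how long will it take?" from a guess into a statement like "an 85% chance of finishing within N weeks," which is exactly the shape of answer a due-date conversation needs.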
Use cases like these show where the real power lies. Instead of being strictly an accounting of past results, measures like these become a means to suggest—to signal, yes—the actions we should take to shape the future.