Velocity Metrics - What Could Possibly Go Wrong?
Velocity is at best a measure of estimation accuracy
Are you tempted to use sprint velocity as a metric that you can report and set goals on? Don’t. Here’s why.
What is velocity anyway?
Velocity is the rate at which we check off “story points” as “done” - typically measured in points completed per sprint.
Story pointing is an estimation technique. Story points are dimensionless units of work. They are helpful for gauging the relative effort to complete tasks in a backlog. There is a good deal of evidence showing that developers are much better at estimating things relative to each other than at making standalone estimates. If the team looks at stories A, B, and C and assigns the most points to A, next most to B, and the least to C, you can be confident that A is going to take the longest. You might even be able to ask the team which ones will fit into a sprint. But you won’t be able to tell with any precision how long any of them will take unless you can compare them to other completed work whose duration you measured carefully.
So velocity is estimation units marked “done” per time period. Comparing the points the team thought would fit into a sprint with the points that actually got done reveals something about estimation accuracy and how well the team models its capacity. It cannot and does not tell you how productive the team is. Velocity is at best a measure of estimation accuracy, and mostly not a measure of anything.
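To make the arithmetic concrete, here’s a minimal sketch - the sprint data and field names are hypothetical - of what velocity actually computes, and what it leaves out:

```python
from dataclasses import dataclass

@dataclass
class Sprint:
    planned_points: int  # points committed at sprint planning
    done_points: int     # points that met the Definition of Done

# Hypothetical history for one team
history = [
    Sprint(planned_points=34, done_points=30),
    Sprint(planned_points=32, done_points=31),
    Sprint(planned_points=40, done_points=28),
]

for i, s in enumerate(history, start=1):
    velocity = s.done_points                     # "points done per sprint"
    accuracy = s.done_points / s.planned_points  # how well the team modeled its capacity
    print(f"Sprint {i}: velocity={velocity}, planned vs done={accuracy:.0%}")

# Notice what is absent: nothing about customer value, quality, or how big
# a "point" is - so nothing about productivity.
```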
Keep these ideas in mind:
Story points do not have units - of time or anything else. They are useful for relative sizing and little else.
Story points are not comparable with any accuracy from one team to another, or even within the same team on separate occasions.
Under very particular conditions, velocity may tell you something about how well your team estimates. It does not tell you how productive they are.
What could possibly go wrong with making Velocity a Key Performance Indicator?
I’m glad you asked! If you read the above, you can translate velocity as “how much of what we estimated actually got checked off in this sprint.”
In the best case, we may use this to find sustainable ways to be more productive. Some outcomes of careful discussion might be:
An improved Definition of Done (critical for estimation)
Better quality
Fewer distractions
Faster build and smoke test cycles
But more likely outcomes are:
Inflated estimates - creating the appearance of improvement by making a “point” smaller (see the sketch after this list)
Eroded “Definition of Done” - defects leak downstream in a misguided attempt to look good at velocity
Less test coverage - defects leak downstream for lack of test automation or manual testing
Death march - unsustainable working practices
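The “inflated estimates” failure mode is pure arithmetic. In this hypothetical sketch, the team delivers exactly the same stories in two quarters but quietly re-baselines what a “point” means, and velocity “improves” with no change in output:

```python
# The same five stories, the same actual work, two different point scales.
q1_points = [3, 5, 2, 8, 5]             # estimates under the old baseline
q2_points = [p * 2 for p in q1_points]  # a "point" quietly becomes half as big

print("Q1 velocity:", sum(q1_points))   # 23
print("Q2 velocity:", sum(q2_points))   # 46 - velocity "doubles"
# Nothing about the product, the customers, or the team has changed.
```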
You run a real risk of self-delusion - and of real harm - when people are evaluated on data they themselves report, because gaming those numbers degrades product quality. To paraphrase Deming and Scholtes: trustworthy data or performance metrics - choose one. Focus on customer outcomes, quality, and cycle times. Productivity will follow.
What should you measure?
Customer satisfaction (See Learning Loops and Customer Sat for details)
Customer churn rate, same-customer growth, and retention time stats
Field defect resolution time stats
Internal and external quality measures
Sprint and release timeliness
Build, smoke test, and deployment cycle time
Percentage of broken builds
Automated assessments of code smells, test coverage, and security vulnerabilities (assessed critically)
Automated test effectiveness: percent of defects found and fixed at the developer’s desk vs found in QA vs leaked to customers
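As a hypothetical illustration of how a few of the measures above might be computed - the data shapes and numbers here are assumptions, not a prescription:

```python
from statistics import median

# Hypothetical raw data pulled from CI and the defect tracker.
build_results = ["pass", "fail", "pass", "pass", "pass", "fail", "pass", "pass"]
field_defect_resolution_days = [0.5, 1.5, 2.0, 3.0, 6.0, 14.0]
defects_by_stage = {"desk": 42, "qa": 11, "customer": 3}  # where defects were caught

broken_build_pct = build_results.count("fail") / len(build_results)
total_defects = sum(defects_by_stage.values())

print(f"Broken builds: {broken_build_pct:.0%}")
print(f"Field defect resolution time (median / worst): "
      f"{median(field_defect_resolution_days):.1f} / {max(field_defect_resolution_days):.1f} days")
for stage, count in defects_by_stage.items():
    print(f"Defects caught at {stage}: {count / total_defects:.0%}")
```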
Trustworthy data or performance metrics - choose one
Copyright © 2023-2025 John W Sadler Jr - All Rights Reserved