Velocity Metrics - What Could Possibly Go Wrong?

Velocity is at best a measure of estimation accuracy

Are you tempted to use sprint velocity as a metric you can report on and set goals against? Don’t. Here’s why.

What is velocity anyway? 

Velocity is the rate at which we check off “story points” as “done” - typically, points completed per sprint.

Story pointing is an estimation technique. Story points are dimensionless units of work. They are helpful for gauging the relative effort to complete tasks in a backlog. There is a good deal of evidence showing that developers are much better at estimating things relative to each other than at making standalone estimates. If the team looks at stories A, B, and C and assigns the most points to A, next most to B, and the least to C, you can be confident that A is going to take the longest. You might even be able to ask the team which ones will fit into a sprint. But you won’t be able to tell with any precision how long any of them will take unless you can compare them to other completed work whose duration you measured carefully.

So velocity is estimation units marked “done” per time period. Comparing the points the team thought it could fit into a sprint with what actually fit reveals something about estimation accuracy and how well the team models its capacity. It cannot and does not tell you how productive the team is. Velocity is at best a measure of estimation accuracy, and mostly not a measure of anything.
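
As a minimal sketch, with made-up numbers, here is everything the velocity arithmetic actually contains:

    # Made-up numbers - the point is what this arithmetic can and cannot tell you.
    planned_points = 30     # what the team estimated would fit into the sprint
    completed_points = 24   # what actually met the Definition of Done

    velocity = completed_points                           # 24 "points per sprint" for this sprint
    planned_vs_done = completed_points / planned_points   # 0.8 - the team over-committed by about 20%

    print(f"velocity: {velocity} points/sprint")
    print(f"planned vs done: {planned_vs_done:.0%}")
    # Neither number says how much customer value shipped or how hard the team worked;
    # both are made entirely of estimates.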

Keep these ideas in mind:

  • Story points do not have units - of time or anything else. They are useful for relative sizing and little else.

  • Story points are not comparable with any accuracy from one team to another, or even within the same team on different occasions.

  • Under very particular conditions, velocity may tell you something about how well your team estimates. It does not tell you how productive they are.

What could possibly go wrong with making Velocity a Key Performance Indicator?

I’m glad you asked! If you’ve read the above, you can translate velocity as “how much of what we estimated actually got checked off in this sprint.”

In the best case, we may use this to find sustainable ways to be more productive. Some outcomes of careful discussion might be:

  • An improved Definition of Done (critical for estimation)

  • Better quality

  • Fewer distractions

  • Faster build and smoke test cycles

But more likely outcomes are:

  • Inflated estimates - creating the appearance of improvement by making a “point” smaller (see the sketch after this list)

  • Eroded “Definition of Done” - defects leak downstream in a misguided attempt to make the velocity number look good

  • Less test coverage - defects leak downstream for lack of test automation or manual testing

  • Death march - unsustainable working practices
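
To see how easily this KPI is gamed, here is a hypothetical before-and-after with invented numbers: the team ships slightly less, estimates slightly bigger, and velocity goes up.

    # Hypothetical before-and-after: the same kind of work in both sprints,
    # only the size of a "point" changes.
    sprints = {
        "sprint 1": {"features_shipped": 6, "points_per_feature": 5},   # velocity 30
        "sprint 2": {"features_shipped": 5, "points_per_feature": 8},   # velocity 40
    }

    for name, sprint in sprints.items():
        velocity = sprint["features_shipped"] * sprint["points_per_feature"]
        print(f"{name}: velocity {velocity}, features shipped {sprint['features_shipped']}")

    # Velocity "improved" from 30 to 40 while one fewer feature reached customers.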

You run a real risk of self-delusion and harm if people are evaluated on the basis of data that also affect product quality: the numbers get gamed, and quality pays for it. To paraphrase Deming and Scholtes: trustworthy data or performance metrics - choose one. Focus on customer outcomes, quality, and cycle times. Productivity will follow.


What should you measure?

  • Customer satisfaction (See Learning Loops and Customer Sat for details)

  • Customer churn rate, same-customer growth, and retention time stats

  • Field defect resolution time stats

  • Internal and external quality measures

  • Sprint and release timeliness

  • Build, smoke test, and deployment cycle time

  • Percentage of broken builds

  • Automated assessments of code smells, test coverage, and security vulnerabilities (interpreted critically)

  • Automated test effectiveness: percent of defects found and fixed at the desk vs. found in QA vs. leaked to customers (see the sketch below)
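
For that last item, a small sketch with invented defect counts shows how the split might be computed:

    # Invented counts for one release, grouped by where each defect was caught.
    defects_caught = {"at the desk": 48, "in QA": 9, "by customers": 3}

    total = sum(defects_caught.values())
    for stage, count in defects_caught.items():
        print(f"{stage}: {count / total:.0%} of defects")
    # -> at the desk: 80%, in QA: 15%, by customers: 5%. The trend of the
    # "by customers" share over time says more about quality than velocity ever will.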

Trustworthy data or performance metrics - choose one

Copyright © 2023-2025 John W Sadler Jr - All Rights Reserved