You Should Not Estimate Using Story Points

Do this instead…

May 16, 2022 · 7 min read

Modern software engineering best practice holds that you should not estimate the size of work in expected time, but in something arbitrary, like story points, instead. The reasoning is that parties outside of the project team, such as sales, marketing, and upper management, might interpret a time estimate as a promise of when something will be done. Or, heaven forbid, the estimates might even turn into deadlines.

Story points?

For those unfamiliar with story points, they are a set of numbers that can be used to estimate the size or complexity of a particular task. They are often a sequence of numbers like the Fibonacci sequence (1, 2, 3, 5, 8, 13, …) or the prime numbers preceded by 1 (1, 2, 3, 5, 7, 11, …). Sometimes t-shirt sizes (S, M, L, XL, XXL, …) are used instead.

So instead of saying I think this small task requires 4 hours and the large task requires 5 days until completion (which was the traditional way tasks were estimated), you say these tasks are 1 and 5 story points, respectively.

By intention, the scale is relative. An M task is always less work than an L task, but the absolute values of M and L are allowed to change. It’s actually very natural for them to change as the team becomes better at predicting or, on the other hand, becomes more productive and spends less time on tasks.

The idea is usually associated with the Scrum methodology, although story points themselves originated in Extreme Programming. In Scrum teams, story points are used during the planning ceremony at the beginning of a sprint to help decide how much work to commit to. In a practice called planning poker, team members each provide their estimate, then compare and discuss until they agree on the right number.

So why are we here?

Story points may sound like a sensible idea. But the problem is that they often end up being code words for an unwritten amount of time. Scrum recommends they should be based on “task complexity”. But because that’s impossible to measure, you settle for the duration of uninterrupted work instead. For example, a team might say less than half a day is 1 point, about a day is 2 points, two days is 3 points, a week is 8 points, or something like that. So in the end you are estimating with time durations after all!

In my experience, these story points only lead to confusion. They are just different names for the time required. This is especially true at the beginning, when the team still needs to find alignment on the relative value of the story points.

Story points often end up being just a different name for the time required.

Okay, this may sound a bit crazy, but what if we reversed Scrum’s advice and went back to estimating with time again? This certainly would make it easier for teams to provide estimates. Everybody understands how to measure time.

Also, isn’t time the only thing we should care about anyway? Let's look at how estimates might be used in a business context:

“How long will it take to fix that bug?”
“Which release date should we tell our customers?”
“What is a realistic target for our 6-month roadmap?”
“Which tasks can we pick up this sprint? Do we have time for this large ticket, or should we do two smaller tickets instead?”
“How many engineers should we assign to this task for it to be completed before the end of the month?”

All of these questions are centered around time. Not something arbitrary like story points.

If the sales team asks you when the next product update is ready for release, you do not say “in 80 story points”, but you say “in around 5 weeks”. If your customer asks you when the new feature is ready, you don’t tell them “we ship once 2 L and 2 M tasks are done”.

In any case, for story points to be useful, they have to be converted to time anyway!

Even sprints are in fact time-bounded. Usually, they are 2 weeks (though shorter or longer is also possible). When you are selecting the next sprint’s tasks during planning, you are really saying that your team of let's say 6 engineers each have 80 hours (minus meetings) to do work. So you need to pick a set of tasks that in total require no more time than that.
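The planning constraint above can be sketched in a few lines of Python. This is only an illustration: the task names and hour estimates are made up, and meeting time is ignored for simplicity.

```python
# Numbers from the example above: 6 engineers with 80 hours each per sprint.
ENGINEERS = 6
HOURS_PER_ENGINEER = 80


def fits_in_sprint(task_hours, engineers=ENGINEERS, hours_each=HOURS_PER_ENGINEER):
    """Return True if the summed task estimates fit in the team's raw hours."""
    return sum(task_hours.values()) <= engineers * hours_each


# Hypothetical candidate tasks with estimates in hours.
candidate_tasks = {"large ticket": 200, "small ticket A": 120, "small ticket B": 100}
print(fits_in_sprint(candidate_tasks))  # 420 <= 480, so True
```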

Some argue that the benefit of story points or t-shirt sizes is that they let you guesstimate the effort, complexity, and risk of the work. The promise is that those factors can provide an indication of the time it takes to complete.

But first of all, how do you quantify those factors? Second, how do you convert them to real time? And finally, what is the point?

Time is what we care about most. It’s what limits how much work we can do in a week. It’s what constrains us when setting a release date. Customers don’t want to wait too long. They don’t care how much labor it takes us.

If a task is simple but takes a long time to do, then it will push back the release date. Simple. It’s the duration that matters.

Think about it this way: when you estimate a task, the first thing you consider is how long a similar task took before, right?

So if estimates are fundamentally about time, why not predict the time required directly?

“But estimated time required is not time until completion”

I hear you thinking, “people are not able to work 100% of the time!”

You are correct. That’s why we need to apply a multiplier to the number of hours a task is predicted to take, to have an idea of when it will be done. The time a task requires and the time until it is completed sound the same, but they are not, for a number of reasons:

meetings
breaks
sick days
high priority bug fixes
context switching when multitasking
engineers are humans, not robots
humans get distracted
etc.

A multiplier of 2 may not be too far off. It’s what I use, especially when I’m uncertain of the complexity of a task. If I think that a task will take about 2 weeks of work, I tell my manager it will be done in 4. This accounts for all the other things that may get in the way.

What is the right multiplier?

For some, a number like 2, picked off the cuff, feels too crude. If that’s you, let me tell you about a method for calculating a more accurate multiplier.

Let's say you have a team of 6 engineers (other roles like product designers are usually not taken into account because they do other work). And the team works in sprints of two weeks. Normally that is 10 working days. But this sprint contains 1 bank holiday (UK term for a national holiday) so that’s actually only 9 days. Also, one team member is off for a whole week, thus 5 days, to go on holiday. And another needs to take 2 days off for moving house.

So in total, your team has a capacity of 6 * 9 * 8 - 5 * 8 - 2 * 8 = 376 working hours.

At the end of the sprint, you look at which tasks have been completed. You sum up all their time estimates. Let's say your engineers thought they would take 189 hours in total. Then the multiplier is 376 / 189 ≈ 1.99.

Knowledge of this calculated multiplier is quite powerful because now you can actually provide estimates externally as to when a task might be done. Ask the engineers to estimate the task and then multiply it by 1.99.
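For those who prefer code over prose, here is a minimal Python sketch of the arithmetic above. The function names are my own, and the 16-hour task at the end is a made-up example.

```python
HOURS_PER_DAY = 8


def sprint_capacity_hours(engineers, working_days, days_off, hours_per_day=HOURS_PER_DAY):
    """Total available hours: everyone's working days minus individual days off."""
    return (engineers * working_days - sum(days_off)) * hours_per_day


def multiplier(capacity_hours, completed_estimate_hours):
    """Hours available divided by the summed estimates of the completed tasks."""
    return capacity_hours / completed_estimate_hours


# 6 engineers, 9 working days, one person off for 5 days and another for 2.
capacity = sprint_capacity_hours(engineers=6, working_days=9, days_off=[5, 2])
m = multiplier(capacity, completed_estimate_hours=189)
print(capacity)      # 376
print(round(m, 2))   # 1.99

# Hypothetical new task that the engineers estimate at 16 hours of work.
print(round(16 * m, 1))  # 31.8 hours until it is likely done
```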

Though in order to do this confidently you first need to have finished a few sprints. The more data you have, the more certain you can be of your multiplier. Even better is if you take predictable time commitments like meetings into account. If you know how much time your engineers are usually sitting in meetings, you can subtract that from the hours they have to do heads-down work.
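One way to combine several sprints is sketched below. The sprint history is invented for illustration, and pooling the totals (rather than averaging each sprint's ratio) is my own assumption about how to weigh the data, not the only reasonable choice.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    capacity_hours: float            # hours available after holidays and leave
    meeting_hours: float             # predictable meeting time across the team
    completed_estimate_hours: float  # summed estimates of the finished tasks


def pooled_multiplier(sprints):
    """Total heads-down hours divided by total estimated hours over all sprints."""
    heads_down = sum(s.capacity_hours - s.meeting_hours for s in sprints)
    estimated = sum(s.completed_estimate_hours for s in sprints)
    return heads_down / estimated


# Hypothetical history of three sprints.
history = [
    Sprint(capacity_hours=376, meeting_hours=60, completed_estimate_hours=189),
    Sprint(capacity_hours=480, meeting_hours=72, completed_estimate_hours=230),
    Sprint(capacity_hours=440, meeting_hours=66, completed_estimate_hours=205),
]
print(round(pooled_multiplier(history), 2))  # 1.76
```

Note that once meetings are subtracted, the multiplier converts estimated hours into heads-down hours, so meeting time has to be added back when translating to calendar time.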

If you have used Scrum, this calculation might seem very similar to how you calculate the focus factor. That is because the focus factor is simply the inverse of the multiplier: a multiplier of 2 is a focus factor of 50%. (I don’t like the term “focus factor” though, because it sounds like the reason tasks do not get finished on time is that engineers are not focused enough.) Another difference is that in Scrum the focus factor is applied to story points, while in this case, we apply it to the time required.
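The relationship between the two is a one-liner in each direction. A tiny sketch, with function names of my own choosing:

```python
def focus_factor(multiplier):
    """Focus factor is the inverse of the multiplier."""
    return 1 / multiplier


def multiplier_from_focus_factor(factor):
    """And the multiplier is the inverse of the focus factor."""
    return 1 / factor


print(focus_factor(2))                    # 0.5, i.e. 50%
print(multiplier_from_focus_factor(0.5))  # 2.0
```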

As a side note, when I say that estimates should be given in time, it doesn’t mean that any number (e.g. 0.3 hours, 1.2 days, 2.7 weeks) is allowed. You can still limit the options to, say, less than half a day, 1 day, 2 days, 3 days, 5 days, etc. The point is that the unit of estimation is time duration. This avoids confusion and subjectivity on what story points or t-shirt sizes represent.
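Limiting the options could look something like the sketch below. The bucket list stops at 5 days only to keep the example short, and rounding up (rather than to the nearest bucket) is my own assumption.

```python
# Allowed estimate sizes in days; "less than half a day" is represented as 0.5.
ALLOWED_DAYS = [0.5, 1, 2, 3, 5]


def snap_estimate(days, allowed=ALLOWED_DAYS):
    """Round a raw estimate up to the next allowed bucket."""
    for bucket in allowed:
        if days <= bucket:
            return bucket
    raise ValueError(f"{days} days exceeds the largest bucket; consider splitting the task")


print(snap_estimate(1.2))  # 2
print(snap_estimate(0.3))  # 0.5
```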

Measuring performance

Another benefit of time estimates is that they are harder to abuse for measuring performance.

For teams using story points, their velocity, which is the number of completed story points in a sprint, might be used as a metric for how well the team is doing. This may lead to them inflating the story points of each task so that they look better from the outside. This then defeats the purpose of the metric.

The question, though, is whether teams should be evaluated at all on how hard they work. Ultimately what we should care about is the amount of value they bring to the business.

Final words

The often-repeated advice to externally communicate estimates sparingly is still very relevant. Do this only when necessary and emphasize that they are merely estimates.

Also, don’t get too emotionally attached to your estimates. Many things can delay the completion of a task: unforeseen issues, scope creep, dependencies on other people, infrastructure instability, or simply an underestimate. We are human after all. Be honest about the why. Don’t work yourself into the ground. Nobody will die if you miss an estimate (usually).

Take my advice. Start using time estimates again. But with a safe multiplier. Stakeholders will be grateful that you’ll be providing them with useful time estimates.

Written by Martin ter Haak
