Story Point Estimation: Could Your Team Do Better?

Aug 30, 2021

It can be rough to ask your development team to estimate work in abstract story points, especially when they are new to story points or to each other. I know this from firsthand experience.

So in this post, I am going to share an exercise that will give every member of your team the same frame of reference for estimating the size of their work.

I call this exercise Story Point Benchmarking. (If you’ve never heard of agile story points, check out this primer.)

This post was first published on the Keyhole Software blog by Rusty Divine on August 30, 2021.

What Are Story Point Benchmarks?

Story point benchmarking is a team meeting. You and your team review the work you’ve completed in the recent past, sort it into piles by relative effort, and then choose a story or two from each pile to serve as the benchmark for that size of work.

In later estimation meetings, the team compares upcoming work against those benchmarks. Even new team members can quickly become effective at estimating this way.

Performing the exercise I describe below should help both teams who are new to story pointing and teams who have gotten into a rut where every story is a 5 or an 8. Your team can repeat the exercise every year to re-baseline or whenever you start a new project.

Story Point Benchmarking Exercise

Plan on about an hour to prepare and about an hour for the meeting itself. Here are the steps.

1. Using index cards, sticky notes, or an online tool like Mural, create your point cards.

If you are using story points, these would be 0.5, 1, 2, 3, 5, 8, 13, 20.
You could also use days, t-shirt sizes (XS, S, M, L, XL), or any other set of categories.

2. Create short work cards (just a title or a brief description, like “Add contact us form”) for the last few months of work, and eliminate any that were one-off work items.

The exercise will work best with about 30–60 cards for the participants to sort through.
Leave any previous estimates off the work cards. I would also leave off the actual time spent, even if you have it.

3. During the meeting, everyone on the team should participate in sorting the work cards into the story point piles.

Ensure participation by dealing the work cards out to all members beforehand.
Encourage the team to discuss moving work cards between piles as they see what kinds of work are landing in each pile.
Hopefully, this sparks some good discussions for the team.
Throw out any work cards that are causing controversy and delay.

4. Now, divide the point piles among the team, and ask each person to pick two work cards that clearly belong in their pile and that they think the whole team will be familiar with. These are your potential benchmark cards.

If you started with enough work cards, you should have at least one work card per story point pile.

5. Take turns discussing each pile’s two potential benchmark cards, and see if the team can narrow them down to one. If not, keep both.

6. Record the benchmark cards in a document that can be printed off.

While recording, you can edit a card’s text a little to make it more broadly applicable (e.g., change “Add Terms Screen” to “Add a new screen with only words”).

7. Print off the benchmark doc for each team member and ask them to bring it to the next estimation meeting.

While estimating upcoming work, the team should refer to the benchmarks and use them to debate where each work item best fits.
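If you run the exercise on an online board instead of index cards, the mechanical parts of steps 3, 6, and 7 are easy to script. The Python below is only a sketch: the function names `deal_cards` and `benchmark_sheet` and the sample card titles and team members are hypothetical, and it assumes the story point scale from step 1. It deals work cards round-robin so everyone starts with a hand, then turns the team’s chosen benchmarks into plain text you can print.

```python
# Story point scale from step 1; swap in t-shirt sizes or days if you prefer.
POINT_SCALE = [0.5, 1, 2, 3, 5, 8, 13, 20]


def deal_cards(cards, members):
    """Deal work cards round-robin so every member starts with a hand (step 3)."""
    if not members:
        raise ValueError("need at least one team member to deal to")
    hands = {member: [] for member in members}
    for i, card in enumerate(cards):
        hands[members[i % len(members)]].append(card)
    return hands


def benchmark_sheet(piles, picks):
    """Render the chosen benchmarks as printable text (steps 6 and 7).

    piles: point value -> cards the team sorted into that pile.
    picks: point value -> the one or two cards chosen as benchmarks.
    Point values with no chosen benchmark are left off the sheet.
    """
    lines = ["Story Point Benchmarks"]
    for points in POINT_SCALE:
        chosen = picks.get(points, [])
        if not chosen:
            continue
        # A benchmark must come from the pile it represents.
        for card in chosen:
            if card not in piles.get(points, []):
                raise ValueError(f"{card!r} is not in the {points}-point pile")
        lines.append(f"{points:>4} pts: " + "; ".join(chosen))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cards and team members, for illustration only.
    cards = [
        "Add contact us form",
        "Add a new screen with only words",
        "Integrate with third-party service",
        "Fix typo in footer",
    ]
    print(deal_cards(cards, ["Ana", "Ben"]))

    piles = {
        1: ["Fix typo in footer"],
        3: ["Add contact us form", "Add a new screen with only words"],
        13: ["Integrate with third-party service"],
    }
    picks = {
        1: ["Fix typo in footer"],
        3: ["Add a new screen with only words"],
        13: ["Integrate with third-party service"],
    }
    print(benchmark_sheet(piles, picks))
```

The check in `benchmark_sheet` simply guards against a benchmark being recorded under the wrong point value; the sorting and the debate themselves stay with the team.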

Trust me, this exercise will save your team lots of time and frustration. Afterward, the entire team will share a frame of reference for estimating, grounded in past work they already know.

I know from experience that without doing this activity, it can take teams months to gain that common understanding.

FAQs, Tips, & Tricks

Why not just estimate in hours instead?

Estimates in hours tend to be overly optimistic. When someone asks me how many hours something will take, I reply with the best-case scenario. Story points, on the other hand, are designed to be relative to other work the team has completed. They don’t depend on who does the work, and they force us to shift our thinking toward comparing the new work with similar work we’ve done.

If you find your team saying something like, “It will be 2 points if the expert does it, and 5 points if I do it,” remind them that points are not based on who does the work. Points are based on comparison to work done in the past, and differences between individuals will average out over time.

Let me put it this way. Imagine two teams: one made up of seasoned developers and the other of more junior developers. Each team has an identical backlog, with identical points on every story.

If the two teams estimated in hours instead, both would work the same number of hours in two weeks; the junior team would simply complete fewer work items. With story points, though, we might find that the seasoned team completes 40 points in two weeks while the junior team completes only 20. Now imagine shuffling the team members so each team gets an even mix of seasoned and junior developers. Both teams still work the same number of hours, obviously, but now each would complete about 30 points.
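The arithmetic in this thought experiment can be checked with a few lines of Python. This is a minimal sketch under made-up assumptions: four developers per team, 80 hours each per two-week sprint, and per-developer rates of 10 points (seasoned) and 5 points (junior), chosen only so the totals match the 40 and 20 points above. The name `sprint_totals` is hypothetical.

```python
# Hypothetical per-developer throughput, chosen so the totals match the
# example: four seasoned developers at 10 points each (40 total) and four
# junior developers at 5 points each (20 total).
RATES = {"seasoned": 10, "junior": 5}
HOURS_PER_DEV = 80  # assumed two-week sprint at 40 hours per week


def sprint_totals(team):
    """Return (hours worked, points completed) for a list of 'seasoned'/'junior' labels."""
    hours = HOURS_PER_DEV * len(team)
    points = sum(RATES[member] for member in team)
    return hours, points


seasoned_team = ["seasoned"] * 4
junior_team = ["junior"] * 4
# After shuffling, each team has two seasoned and two junior developers.
mixed_team = ["seasoned", "seasoned", "junior", "junior"]

for name, team in [("seasoned", seasoned_team), ("junior", junior_team), ("mixed", mixed_team)]:
    hours, points = sprint_totals(team)
    print(f"{name:>8}: {hours} hours, {points} points")
```

Every team logs the same 320 hours, so hours alone don’t show how much got done; the point totals do, and both shuffled teams land at 30.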

In short, it’s hard to make truly accurate hour-based estimates because hours are based on personal experience; they’re a subjective unit. Points are based on team experience instead, which makes them a more objective, shared unit.

How does complexity affect estimates?

When a team discusses complexity, encourage them to discuss comparative complexity.

For example, if the team is embarking on something they’ve never tried before, the unknowns often drive the story points up artificially high. If you can, find examples of when the team tackled similar obstacles, and use that work as a point of comparison.

You can tell them, “When we integrated with that third-party service last time, we didn’t know what we were getting into exactly either, but it turned out to be about a 13. Do you think this new work item is really any more complex than that one was?”

What if the team has no past work items?

It’s going to be difficult for a team that hasn’t worked together to build a common context for estimating work. If you have a backlog of upcoming work, you can run this exercise on that instead of past work. The team still gets the experience of sorting all the work into piles instead of considering each new work item on its own.

You can also introduce the concept of story point benchmarking by making up a bunch of fake work cards. I have done this for user groups by using chores around the house. The participants take some work cards I’ve printed like “Sweep the floor” and “Replace garbage disposal” and sort those out into point piles. It gives them a good feel for what benchmarking entails.

If you’d like to try it, you can download and print some example cards here.

Should my sprint have 20 points or 100 points? How much should each member be assigned?

If your team is struggling to complete stories or you’re just starting out, a good rule of thumb is that each developer might complete around 8 or 13 points per sprint. To predict how many points your team can complete in a sprint (your velocity), multiply the number of developers by that figure.

After you have completed three sprints, calculate your velocity as the average number of points the team completed across the last three sprints.
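Both rules of thumb reduce to simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the default of 8 points per developer is just the lower end of the rule of thumb above, and the sprint history in the usage note is made up.

```python
def starting_velocity(num_developers, points_per_developer=8):
    """Rule-of-thumb velocity before the team has any history.

    Multiplies team size by an assumed per-developer figure (8 by default;
    pass 13 for the upper end of the rule of thumb).
    """
    return num_developers * points_per_developer


def rolling_velocity(completed_points, window=3):
    """Average points completed over the most recent `window` sprints.

    completed_points lists the points finished in each sprint, oldest first.
    Returns None until at least `window` sprints are done, so you keep using
    the rule-of-thumb figure until then.
    """
    if len(completed_points) < window:
        return None
    recent = completed_points[-window:]
    return sum(recent) / window
```

For example, a hypothetical five-developer team would start with a forecast of 40 points (or 65 at 13 points each). After sprints of 22, 30, 24, and 33 points, its velocity would be the average of the last three: (30 + 24 + 33) / 3 = 29 points.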

Normally, I don’t factor testing effort into the points unless it is abnormally large. Teams that estimate a work item by asking developers for one estimate and then testers for another are just asking for trouble.

Some teams assign points to bugs and research items (spikes), and others don’t. I think it makes sense to assign points to anything the developers are working on, but not to things that business analysts or possibly architects are working on.

The team’s goal should be to maintain a fairly consistent velocity and to complete all the work committed to in a sprint. With any extra time, they can document, research, fix bugs, work ahead, or address technical debt.

Conclusion

Your team will thank you for making their lives easier with story point benchmarking.

Everyone will quickly get on the same page when it comes to estimating work, with fewer arguments and shorter estimation meetings. It only takes an hour, and the dividends keep paying off, especially when you roll new members onto the team.

I hope you try it out, and if you do, please leave a comment here to let me know how it goes for you!

Originally published at https://keyholesoftware.com on August 30, 2021.