Velocity Is Not Your Goal. It’s a Symptom.
If you haven’t read Graham McNicoll’s recent post on experimentation velocity yet, go read it first.
It’s good. Graham gets it.
Graham McNicoll and Kelly, in front of “Sheer Cliff, Stay Back” sign. (We’re clearly rule-followers.)
One of the things I appreciated most about Graham’s post is that it tackles something our industry desperately needs to recognize: velocity has become a target on a dashboard. And once that happens, we run straight into Goodhart’s Law (when a measure becomes a target, it stops being a good measure).
That doesn’t mean velocity is bad. Of course, we should measure it. But it was never supposed to be the goal. It’s a signal. A symptom. A health indicator.
A resting heart rate of 45 might signal elite athlete or bradycardia, depending on the context. (I assure you, for me, it’s not the elite athlete signal… I only run if someone is chasing me. Please don’t chase me.) Experimentation velocity works the same way.
The “right velocity” depends on your company, your culture, your technical setup, your governance model, your product maturity, your staffing, leadership expectations (sorry - it’s true), and who knows how many other considerations we have tried to explain to leadership over the past few decades. The point is - there is no “best practice” here. It depends. And the answer isn’t always “more is better”.
You absolutely can increase velocity by lowering quality standards, prioritizing tiny low-risk tests, skipping strategy conversations, or eliminating governance entirely. Congratulations. You’re faster. But are you smarter? Is your product better? Did you make things better? Or only different? (Thank you, Erin. 💜)
Running faster with scissors doesn’t make things better. It just makes things messy - for you, and your customer.
Most Dashboards Suck Monkey Balls (sorry, not sorry)
Most dashboards today are basically operational exhaust. And that includes experimentation program dashboards - sometimes called “scorecards” or “health checks”. They tend to track the same handful of things:
Throughput (number of ideas submitted)
Velocity (number of tests run - once again, see Graham’s article for why this is a bad target)
Win rate (defined in various confusing ways that might make the team look better but get you nowhere closer to what really matters - learning - plus, it should be a range, not a target)
Financial impact (short-sighted, and getting you further away from the real goals of improving the end-user experience and the relationship between the product & the customer or the brand and the customer - remember, when the measure becomes the target…)
None of those are inherently bad things to measure - it’s just that none of them should be targets. The problem is that many organizations never stop to ask what those metrics are actually supposed to help accomplish: “What decision will you make with this information?”
Part of the problem is that organizations blur the line between outputs and outcomes. And we need both. But it’s important to understand the difference, and outputs should never be tied to targets.
Outputs are the measurable activities of the program: velocity, win rate, ideas submitted, readouts delivered. These things matter because they help us understand how the system is operating.
But outcomes are the thing we’re actually trying to influence: improving customer experiences, improving products, strengthening the relationship between the customer and the brand, increasing revenue, reducing the cost of being wrong.
Outputs measure the operational activities and should not have targets
Outcomes are what we’re trying to achieve and are tied to targets
Ideally, healthy outputs contribute to better outcomes. More quality experiments should create more opportunities for learning and growth. But when organizations stop treating outputs as measures and start treating them as goals, things break down fast. That’s the Goodhart’s Law problem. The measures became the targets.
If velocity increases while the product fails to improve, is that a success? If teams run more tests but none influence roadmap decisions, is that success? If you launch hundreds of experiments nobody learns from, is that success? If you have a high so-called “win rate”, but all you’re doing is running “safe” tests, validation tests, iterations, and you’re not pushing for bigger bets, is that a success? If you hit all your “targets” for these metrics, but you learn nothing and your customers are unhappy, and your product gets worse, is that a success? If short term revenue is going up, but retention is going down, and your highest value customers are leaving, is that a success? If lots of ideas are coming in, but those ideas aren’t tied to solid research or backed by data, or are never actually tested, is that a success?
Probably not. But it happens all the time because organizations optimize for activity instead of outcomes.
Outputs help us understand how the system is operating.
Outcomes help us understand whether the system is accomplishing something valuable.
This is where Graham’s post resonated with me the most. He’s not arguing velocity doesn’t matter. He’s arguing that chasing it blindly can create deeply unhealthy behavior.
Two Magic Questions
Some of the smartest thinking I’ve encountered around this comes from two very dear friends and former colleagues: Tim Wilson and Joe Sutherland, authors of Analytics the Right Way.
Tim is lovingly referred to as “The Quintessential Analyst,” which he absolutely hates. We all refer to Joe as “Dr. Joe,” which, I figure, he’s earned. At minimum, he’s never asked anyone to stop calling him that. He earned the PhD. We’ll allow it. Tim, meanwhile, would very much prefer to remain behind the scenes and asks that you please NOT hug him, thank you very much.
Kelly, hugging Tim. (I did get his permission first.) He did NOT love it. But he loves me, so he allowed it. Look how happy he is!
But - back to why we’re here. Tim and Dr. Joe simplify this whole process down to two questions they call the “Magic Questions”:
What are we trying to achieve?
How will we know if we’ve done that?
Simple questions. Shockingly difficult for many organizations to answer (unassisted anyway).
Because a lot of teams skip directly to measurement dashboards before they’ve aligned on either one. (Which is why the dashboards have next to no meaning to anyone! Have you ever tried turning off the data pipeline to a dashboard to see who notices that the dashboard isn’t updating? It’s very illuminating.)
Building a Program Health Scorecard
One of my clients recently framed their experimentation scorecard around two categories:
Doing the Right Experiments
Doing Experiments Right
I loved the framing because it separates two things that organizations often blur together.
First: Are we selecting experiments that actually matter to the company? Are the experiments aligned with the strategy? Are they influencing product direction? Are they tied to meaningful business or customer outcomes?
And second: Are we designing and running those experiments well? Are teams using strong hypotheses, decision plans, statistical rigor, and good learning practices? Are they avoiding things like HARKing? Are they sharing learnings and improving the organization’s ability to make decisions?
Those are connected questions, but they are not the same question.
A company can run technically flawless experiments that have almost no strategic impact. And a company can prioritize strategically important ideas while executing them poorly and producing results nobody trusts, or so full of statistical ghosts that everything is inconclusive noise.
Healthy programs need both.
Under each category, you start with the first magic question: What are we trying to achieve?
Maybe under “Doing the Right Experiments,” the goal is supporting product strategy, influencing roadmap decisions, or reducing feature risk earlier in the process.
Maybe under “Doing Experiments Right,” the goal is improving experiment quality, increasing rigor, improving learning dissemination, or increasing velocity (responsibly - again, read Graham’s article!).
Only then do you define metrics for each. Not because “everybody tracks them.” Not because they’re easy to measure. Not because your platform dashboard included them.
But because they help answer the thing you actually care about, the second magic question: How will we know if we’ve done that?
So What Are Those Metrics‽
(Don’t you love a good use of an interrobang‽ Oh boy! Again!)
If one of your goals is supporting product strategy, maybe your metrics could be:
% of experiments influencing planned features
% of feature rollouts informed by experiments
% of roadmap initiatives connected to validated learning
If one of your goals is improving experiment quality, maybe you should look at:
% of experiments with a clearly documented hypothesis
% of experiments with predefined success criteria
Ratio of insight-generating tests versus “ship confirmation” tests
If one of your goals is building culture, maybe you measure:
% company-wide usage of learning library
Executive engagement
Attendance at readouts
Number of teams participating
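If it helps to see the arithmetic, here is a minimal Python sketch of how a few of the percentages above could be computed from an experiment log. Everything in it is an assumption for illustration: the record fields (`documented_hypothesis`, `predefined_success_criteria`, `influenced_planned_feature`), the sample experiments, and the `percent` helper are all hypothetical, and your own tracker will look different.

```python
# Hypothetical experiment log. Field names and values are made up for
# illustration; map them to whatever your own tracker actually records.
experiments = [
    {"name": "checkout-copy", "documented_hypothesis": True,
     "predefined_success_criteria": True, "influenced_planned_feature": False},
    {"name": "pricing-page-layout", "documented_hypothesis": True,
     "predefined_success_criteria": False, "influenced_planned_feature": True},
    {"name": "onboarding-flow", "documented_hypothesis": False,
     "predefined_success_criteria": False, "influenced_planned_feature": True},
    {"name": "search-ranking", "documented_hypothesis": True,
     "predefined_success_criteria": True, "influenced_planned_feature": True},
]


def percent(records, field):
    """Percentage of records where `field` is truthy; None if there are no records."""
    if not records:
        return None
    hits = sum(1 for record in records if record.get(field))
    return 100.0 * hits / len(records)


# Each metric is grouped under the goal it is supposed to inform.
scorecard = {
    "supporting product strategy": {
        "% of experiments influencing planned features":
            percent(experiments, "influenced_planned_feature"),
    },
    "improving experiment quality": {
        "% of experiments with a clearly documented hypothesis":
            percent(experiments, "documented_hypothesis"),
        "% of experiments with predefined success criteria":
            percent(experiments, "predefined_success_criteria"),
    },
}

for goal, metrics in scorecard.items():
    print(goal)
    for label, value in metrics.items():
        print(f"  {label}: {value:.0f}%")
```

The only design choice worth noting is that each metric is stored under a goal. If you can't say which goal a number belongs to, that is a hint it may not belong on the scorecard.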
Metrics Alone Aren’t Enough
Here’s the other place where I think organizations have a really hard time. They might define metrics (even if they are questionable ones), but they truly struggle when it comes to targets.
Which means six months later, everyone is staring at a dashboard asking, “…okay, but is that good?”
Tim wrote a fantastic piece on this called, “Setting KPI Targets? Try a (Mini) Wisdom of Crowds Approach.”
The core idea is simple and incredibly practical: stakeholders independently propose targets before discussing them together.
No anchoring. No executive immediately hijacking the conversation. No HiPPO or loudest voice deciding success metrics for everyone else.
Just independent thinking followed by collaborative alignment.
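For the mechanically minded, here is a rough Python sketch of the "collect independently, then compare" step. To be clear, this is not Tim's procedure from his article. The stakeholder roles, the proposed numbers, and the choice to summarize with a median and a low-to-high spread are all my own assumptions, made up for illustration.

```python
from statistics import median

# Hypothetical proposals for one metric (say, "% of experiments with
# predefined success criteria"), each submitted privately BEFORE any
# group discussion. Roles and numbers are invented for this example.
proposals = {
    "product_lead": 60,
    "analytics_lead": 45,
    "engineering_lead": 50,
    "exec_sponsor": 80,
}


def summarize_proposals(proposals):
    """Summarize independent target proposals as conversation starters."""
    values = sorted(proposals.values())
    return {
        "low": values[0],
        "median": median(values),
        "high": values[-1],
        "spread": values[-1] - values[0],
    }


summary = summarize_proposals(proposals)
print(summary)
```

The numbers that come out are not the target. A wide spread is the interesting part, because it tells you people are carrying different assumptions into the room, and that is the conversation to have.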
I’ve worked with Tim and Val Kroll from facts&feelings using this exact approach with one of my clients, and the client was truly blown away by the process and the results. Not because the targets were magically perfect predictors of the future, but because the conversations themselves surfaced assumptions, expectations, tensions, priorities, and tradeoffs the organization had never explicitly discussed before.
That’s where the value is.
Target-setting is less about predicting the future perfectly and more about forcing alignment around what success actually looks like.
And once you do that, your scorecard becomes dramatically more useful.
Plus, working with Val and Tim is just a lot of fun. Ask anyone. 10 out of 10. Would recommend.
Healthy Programs Understand Tradeoffs
One of the reasons I worry about velocity becoming the dominant metric in experimentation is that it tends to flatten nuance. It encourages organizations to believe there’s a universally correct number everyone should aspire to.
There isn’t.
Sometimes the healthiest thing a program can do is increase velocity. Sometimes the healthiest thing it can do is slow down and improve quality. Sometimes the right answer is investing in instrumentation, or governance, or executive trust, or better documentation, or stronger partnerships with product teams. Very often, it’s an issue with culture, strategy, knowledge transfer, and dissemination (why don’t you people have learning libraries???)
Healthy experimentation programs are balancing systems. They require tradeoffs, intentionality, and clarity around what the organization is actually trying to achieve.
Which brings us right back to the magic questions.
What are we trying to achieve?
And how will we know if we’ve done that?
Everything else flows from there. What do you think? Do you agree? Join the conversation!
Disclosure: GrowthBook is a sponsor of the Test & Learn Community. This post reflects my own reading of their writing and does not represent a paid or coordinated placement.