A metric is only as useful as your ability to measure it
The most important attribute of a metric is that it tracks well with your goal. The second most important attribute is that you can effectively measure it. I want to share lessons from picking a metric that aligned well with our long-term strategy but was challenging to measure, track, and move. To summarize: Metrics need to be not only strategically valuable, but also feasible to use operationally. Consider metrics that:
Measure impact over a holdout group, rather than goaling on a topline number
Measure absolute numbers, rather than proportions
Measure cohort effects or conversion, rather than acquisition
What were the issues with our metric?
Our goal was to drive adoption of a product. Specifically, this meant that we tried to inspire each of our users to adopt a feature once in their lifetime. Our platform gained a lot of value when this happened, so this adoption represented something very meaningful. And yet:
Even if a metric is strategically valuable, if you can't operationalize it, it's a poor metric.
This was exactly our problem. Our metric was hard to measure, track, and move. Here were the issues:
It was vulnerable to factors outside our control: Our metric measured the number of users who adopted our product divided by the total number of users on our platform. This exposed us to changes in the user mix. New users are least likely to have adopted a specific feature, so as growth accelerated for our platform, our metric sagged. This was good in some ways (it encouraged us to focus on experiences for new users), but the effects were so large that our team's efforts were overshadowed.
Recommendation: Goal on the difference between a test and holdout group. While it's standard practice to measure the impact of tests using a holdout group, teams choose whether their goal will be the topline metric or the delta between test and holdout groups. Topline goals force teams to confront external factors, whereas goals focused on impact over a holdout isolate the team's impact. Had we goaled on the difference between a test and holdout group, the changes in user mix would have been irrelevant because they would have affected both groups equally, and we could have understood our team's contribution more clearly.
Recommendation: Measure using absolute numbers rather than proportions. Rather than proposing that we increase adoption by 10%, we could have proposed adding X adopters, where X is the number of adopters equivalent to +10% at the time the goal is set. The two targets represent the same thing at the outset, but the absolute one wouldn't have been muddled by changes in user growth.
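To make the user-mix problem concrete, here is a minimal sketch in Python. Every number in it is hypothetical (the adoption rates, the 3-point team effect, the 10% holdout), and `simulate_period` is an illustrative helper, not how we actually computed our metric. It assumes the team's work lifts adoption by the same amount for everyone outside the holdout.

```python
from dataclasses import dataclass


@dataclass
class PeriodResult:
    topline_rate: float       # adopters / all users (the proportion goal)
    total_adopters: float     # absolute number of adopters
    lift_over_holdout: float  # test adoption rate minus holdout adoption rate


def simulate_period(
    existing_users: int,
    new_users: int,
    existing_rate: float = 0.20,  # hypothetical: tenured users adopt at 20%
    new_user_rate: float = 0.02,  # hypothetical: new users adopt at 2%
    team_effect: float = 0.03,    # hypothetical: team adds 3 points of adoption
    holdout_share: float = 0.10,  # hypothetical: 10% of users are held out
) -> PeriodResult:
    total = existing_users + new_users
    # Adoption rate without the team's work; it depends on the user mix.
    baseline_rate = (
        existing_users * existing_rate + new_users * new_user_rate
    ) / total

    holdout_users = total * holdout_share
    test_users = total - holdout_users
    holdout_rate = baseline_rate               # holdout never sees the change
    test_rate = baseline_rate + team_effect    # everyone else does

    total_adopters = holdout_users * holdout_rate + test_users * test_rate
    return PeriodResult(
        topline_rate=total_adopters / total,
        total_adopters=total_adopters,
        lift_over_holdout=test_rate - holdout_rate,
    )


# Same team effect, two different growth scenarios.
slow_growth = simulate_period(existing_users=1_000_000, new_users=100_000)
fast_growth = simulate_period(existing_users=1_000_000, new_users=1_000_000)

for label, result in [("slow", slow_growth), ("fast", fast_growth)]:
    print(
        f"{label}: topline={result.topline_rate:.3f} "
        f"adopters={result.total_adopters:,.0f} "
        f"lift={result.lift_over_holdout:.3f}"
    )
```

In this toy model, faster growth drags the topline rate down even though the team's lift over the holdout is identical in both scenarios, and the absolute number of adopters still rises.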
It was a lagging indicator of progress: We needed to convince users to do something once. Adoption goals are often lagging indicators; in other words, adoption metrics move after a trend change has occurred. Our changes could bend the adoption curve, but yield very little benefit immediately. Consider a change made on the last day of a measurement period (e.g. a quarter or year) that makes new users 2x likelier to add your product on their first day: it would have no immediate impact on adoption even though it's very valuable in the long term. We can try to forecast long-term impact, but extrapolating trends is imprecise and any time bound on “long term” is arbitrary. There is no perfect replacement for adoption, but there are relevant alternatives depending on the team's strategy.
Recommendation: Measure cohort behavior. When trying to change new user behavior, you can measure cohort differences. If our goal was to ensure this week's new users were adopting our product at greater rates than last week's new users, we could use cohort analysis to track just this.
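As a rough illustration of cohort tracking, the sketch below groups users by the week they signed up and computes the share of each cohort that adopted within a fixed window. The data, the 7-day window, and the helper names (`week_start`, `cohort_adoption_rates`) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Tuple


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def cohort_adoption_rates(
    users: Iterable[Tuple[date, Optional[date]]],
    window_days: int = 7,  # hypothetical window; pick what fits the product
) -> Dict[date, float]:
    """Map each signup-week cohort to its adoption rate within the window.

    Each user is a (signup_date, adoption_date or None) pair.
    """
    totals = defaultdict(int)
    adopted = defaultdict(int)
    for signup, adoption in users:
        cohort = week_start(signup)
        totals[cohort] += 1
        # Count only adoption that happened inside the window, so that
        # older cohorts do not look better just because they had more time.
        if adoption is not None and (adoption - signup).days < window_days:
            adopted[cohort] += 1
    return {cohort: adopted[cohort] / totals[cohort] for cohort in sorted(totals)}


# Hypothetical signups: (signup_date, adoption_date or None).
users = [
    (date(2024, 1, 1), date(2024, 1, 3)),   # adopted inside the window
    (date(2024, 1, 2), date(2024, 1, 20)),  # adopted, but after the window
    (date(2024, 1, 3), None),
    (date(2024, 1, 4), None),
    (date(2024, 1, 8), date(2024, 1, 8)),
    (date(2024, 1, 9), date(2024, 1, 12)),
    (date(2024, 1, 10), None),
    (date(2024, 1, 11), None),
]

rates = cohort_adoption_rates(users)
for cohort, rate in rates.items():
    print(f"week of {cohort}: {rate:.0%} adopted within 7 days")
```

Fixing the window is the key design choice here: it lets this week's cohort be compared with last week's on equal terms, without waiting for lifetime adoption to play out.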
Recommendation: Measure conversion. If the goal is to fix a leaky funnel, you can measure conversion. For example, we know that some number of people start an adoption flow each month where we prompt them to use our product. We could measure what proportion of the people who start the flow take the desired action. Notably, this gives no incentive to drive top-of-funnel traffic (which is sometimes the most effective lever). For that, adoption is the only option.
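A small sketch shows both the appeal and the blind spot of a conversion goal. The monthly counts are made up, and `conversion_rate` is a hypothetical helper.

```python
def conversion_rate(started: int, completed: int) -> float:
    """Share of people who started the adoption flow and completed it."""
    if started <= 0:
        raise ValueError("started must be positive")
    return completed / started


# Hypothetical months: people who started the flow and people who finished it.
baseline = {"started": 10_000, "completed": 1_500}
funnel_fix = {"started": 10_000, "completed": 2_000}    # less leaky flow
more_traffic = {"started": 20_000, "completed": 3_000}  # more top-of-funnel

baseline_rate = conversion_rate(**baseline)
funnel_fix_rate = conversion_rate(**funnel_fix)
more_traffic_rate = conversion_rate(**more_traffic)

print(f"baseline:     {baseline_rate:.1%}")
print(f"funnel fix:   {funnel_fix_rate:.1%}")
print(f"more traffic: {more_traffic_rate:.1%}")
```

With these numbers, the funnel fix raises conversion, while doubling traffic leaves conversion unchanged even though it produces the most new adopters. That is the missing top-of-funnel incentive in practice.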
Development lessons for building a 0-1 product at big companies
Earlier in my Facebook career, I built a standalone app. Where many projects at big companies involve incremental changes or modifications to existing surfaces, we had the opportunity to build something from the ground up. Through our successes and tribulations, we learned three key lessons to share with teams building a 0 to 1 product:
Maximize input from research and other “Understand” functions
Embrace iteration, but avoid last minute thrash
Increase accuracy of the engineering timeline by incorporating variance
1. Maximize input from research and other “Understand” functions
One of the great assets of large companies, and one that often differentiates them from scrappier startups, is their depth of expertise. In shaping our app's strategy, we sought out as many diverse perspectives as we could. Strategies are best when developed in active consultation with people in “Understand” functions whose job it is to synthesize data.
Research and marketing teams often work day in and day out with our target users, so they are in constant conversation about user needs. Ops teams are on the receiving end of passionate feedback from users, so they can share pain points back to the product teams.
Research defined the app's purpose and core functionality, and helped improve our UX. Nearly all our product decisions had roots in a specific insight.
2. Embrace iteration, but avoid last minute thrash
There is a healthy push-pull relationship between embracing iterative work and minimizing thrash.
Facebook has a saying: “We're 1% finished.” Sometimes that's an overstatement! Since we were building something new, we made opinionated product decisions that were sometimes wrong. The team stayed adaptable as our product and designs evolved.
Product feedback came at various stages of the process, and sometimes it took having an actual build we could dogfood to understand a feature's implications. It's always easier to move a button in a design file than it is to rewrite code, but reverting a feature doesn't need to be “churn;” it's the iterative process at work.
On the other hand, eleventh hour changes are painful. With less time to address changes, late-breaking feedback forces compromises to work-life balance, code quality, or even the product itself. To reduce the likelihood of disruptive feedback, consider these lessons:
Set clear expectations with leadership: We did not get the scrutiny we needed from our product leadership early enough. We thoroughly reviewed product goals and strategy, but did not provide sufficient visibility into what we were building (and when we did, it was too much of an afterthought). To remedy this, we should have established a “one way door”: solicit prescriptive product feedback when there's sufficient time to pivot, rather than late (once code complete or “through the door”), when it'll ignite a fire drill. Another trick is to find a forcing function. Use milestones like public tests and betas or senior exec reviews to generate the internal attention that otherwise only a big launch attracts.
Budget time for dogfooding and soak: If I could speak with my past self, I'd say, “Finish roadmapping. Identify a launch date. Now push it back by.” Budgeting time for quality polish is critical. All the theorizing in the world can't replace the experience of testing the real thing. The challenge is to use dogfooding and soak time earnestly, rather than as an opportunity to accelerate last minute feature development that could on its own create new regressions and bugs.
3. Increase accuracy of the engineering timeline by incorporating variance
We committed externally to shipping the app before a specific date. As we conceptualized the app, we estimated build times so we knew how much we could fit in our MVP. Unfortunately, our timelines kept slipping. We identified several ways to increase accuracy of forecasts:
Be honest, and remove implicit pressures: Internal or external pressures beget unrealistic timelines. There may be a date we want to hit, but setting false expectations is counter-productive. Engineers or designers should be the ones generating estimates, and ought to prioritize honesty. Project leads such as PMs, EMs, and design managers should remove implicit pressures and frame questions that solicit genuine responses (“What is a realistic range for how long this will take?” vs. “Do you think you can get this done by Friday?”).
Create estimates as a range: High variance in build times is inevitable as unexpected obstacles arise...or don't. Engineers can consider providing an estimate as a range (“My best guess is 1 week. Best case I'll be done in 5 days, worst case it'll take 2 weeks.”) rather than as an absolute (“1 week”). Just as important, communicate risks early and often so we can narrow and shift timelines accordingly.
Vocalize risks and tradeoffs: As execs are keen to say, everything is a prioritization problem. The whole team has responsibility to identify tradeoffs. Could we get this done faster if we simplified a feature in a certain way? Is a bigger upfront investment worthwhile to reduce tech debt down the road? The more internally communicative we are, the more we can evaluate tradeoffs with intention.
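One way to turn ranged estimates into a project-level forecast is the classic three-point (PERT) formula. To be clear, this is a technique swapped in for illustration, not something our team used. The task names and day counts are hypothetical, and the combined spread assumes the tasks vary independently.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Estimate:
    name: str
    best: float    # best case, in days
    likely: float  # best guess, in days
    worst: float   # worst case, in days

    @property
    def expected(self) -> float:
        # Three-point (PERT) weighting: the best guess counts four times.
        return (self.best + 4 * self.likely + self.worst) / 6

    @property
    def std_dev(self) -> float:
        # PERT's rule of thumb for the spread of a single task.
        return (self.worst - self.best) / 6


def project_forecast(estimates: Iterable[Estimate]) -> Tuple[float, float]:
    """Return (expected total days, standard deviation of the total)."""
    estimates = list(estimates)
    total = sum(e.expected for e in estimates)
    # Variances add only if tasks are independent (an assumption).
    spread = sqrt(sum(e.std_dev ** 2 for e in estimates))
    return total, spread


# Hypothetical tasks for an MVP.
tasks = [
    Estimate("login flow", best=3, likely=5, worst=10),
    Estimate("feed", best=5, likely=8, worst=17),
    Estimate("settings", best=2, likely=3, worst=4),
]

naive_total = sum(t.likely for t in tasks)
expected_total, spread = project_forecast(tasks)
print(f"sum of best guesses: {naive_total:.1f} days")
print(f"expected total:      {expected_total:.1f} days (+/- {spread:.1f})")
```

Because worst cases tend to sit further from the best guess than best cases do, the expected total lands above the simple sum of best guesses, which is one reason timelines built from single-number estimates tend to slip.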