Expect 1-2 pp. changes and you won't be disappointed
'Replacement-level' meat-reduction interventions will be about that effective, in my experience.
My colleague Alex Coppock quipped earlier this year that if my lab were to run an op-ed experiment, he’d bet that we’d “move attitudes, oh, by about 0.15 to 0.20 standard deviations in the short term 😃”. The context: that’s the effect size you pretty much always find in political science for this kind of study. People update (rationally and in parallel) and then effects dissipate.
Related: statistician Andrew Gelman reads a meta-analysis of nudges that finds an overall “small to medium effect size of Cohen’s d = 0.45,” and comments [emphasis added]:
An effect size of 0.45 is not “small to medium”; it’s huge. Huge as in implausible that these little interventions would shift people, on average, by half a standard deviation…It’s important because it’s related to expectations and, from there, to the design and analysis of experiments. If you think that a half-standard-deviation effect size is “small to medium,” i.e. reasonable, then you might well design studies to detect effects of that size. Such studies will be super-noisy to the extent that they can pretty much only detect effects of that size or larger; then at the analysis stage researchers are expecting to find large effects, so th[r]ough forking paths they find them[.]
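To make Gelman’s point concrete, here’s a quick simulation (mine, not his, with made-up numbers): suppose the true effect is the replacement-level 0.15 SDs and you run a noisy study with 50 subjects per arm, which is roughly powered for 0.5 SD effects. The estimates that clear p < .05 are badly inflated:

```python
# Illustrative simulation (my sketch, not Gelman's): a true 0.15 SD effect,
# measured with 50 subjects per arm, gets inflated once you condition on
# statistical significance.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.15  # SDs
n = 50              # per arm; roughly powered for ~0.5 SD effects

significant = []
for _ in range(10_000):
    treated = rng.normal(true_effect, 1, n)
    control = rng.normal(0, 1, n)
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    if abs(diff / se) > 1.96:  # two-sided p < .05
        significant.append(diff)

print(f"studies reaching significance: {len(significant) / 10_000:.0%}")
print(f"mean estimate among those:     {np.mean(significant):.2f} SDs")
# Typically ~11% of runs are significant, and their average estimate is
# around 0.45 SDs: significance filtering alone can turn a 0.15 SD truth
# into a published "small to medium" effect.
```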
I have been thinking about these remarks in light of a forthcoming study I co-authored, which induced meat reductions of about 1-2 pp. even though it was designed (and powered) to detect a 5 pp. change. (I’ll summarize the paper when we post it.) In the language of frequentist statistics, we failed to detect an effect.
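To put numbers on the design problem, here’s a rough two-proportion power calculation using statsmodels; the 30% baseline rate is a placeholder I made up, not a figure from our paper:

```python
# Rough two-proportion power calculation (baseline rate is a made-up
# placeholder). Sample size per arm for 80% power, alpha = .05, two-sided.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.30  # hypothetical control-group rate
calc = NormalIndPower()

for delta in (0.05, 0.02, 0.01):  # detectable change, in proportions
    h = proportion_effectsize(baseline + delta, baseline)  # Cohen's h
    n = calc.solve_power(effect_size=h, alpha=0.05, power=0.80)
    print(f"{delta * 100:.0f} pp. change: ~{n:,.0f} per arm")

# Detecting 1-2 pp. takes roughly 6-25x the sample of a 5 pp. design, so a
# study powered for 5 pp. will usually "fail to detect" a real 1-2 pp. effect.
```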
Here’s another read. A “replacement-level player,” per Nate Silver, is someone “who’s right on the fringe between being a big leaguer and not, someone you can call up from AAA or pick up on the waiver wire.” Analogously, a replacement-level intervention is widely available, good enough to get deployed, but unlikely to change the game.
My theory: replacement-level behavioral science interventions change outcomes by about 1-2 percentage points¹ when you measure them properly.
A few examples:
1. DellaVigna and Linos (2022) report that academic articles on nudges show an average effect of 8.7 pp., but that when Nudge Units measure real-world impacts, the average change is about 1.4 pp.
2. Andersson and Nelander (2021) put the plant-based options at eye level in a university cafeteria and find a reduction in sales of (terrestrial) meat-based options of about 5 pp., but most of that shift is to fish, so the increase in vegetarian meals sold is about 1 pp.
3. Haile et al. (2021) distribute a pamphlet with an “animal welfare message to persuade individuals to reduce their meat consumption” and report a “2.4 percentage-point reduction in poultry and fish for men and a 1.6 percentage-point reduction in beef for women,” but all effects disappear within a few months.
A few implications:
1. At scale, 1-2 pp. can be a big deal. Costco apparently sells 106 million rotisserie chickens per year. Reducing that by a few pp. would avert billions of hours of chicken suffering per year (see the back-of-the-envelope sketch after this list).
2. For researchers: forget the effect sizes you learned in the 2010s. You were misled by fraudulent research. Power your studies to detect small changes and you just may find them.
3. If you think you have a superstar intervention, great! Unfortunately, heroes (and heroic interventions) typically don’t replicate. It still might be worth designing the perfect intervention for a particular setting, say, to develop theoretical insights. But when bridging to the real world, I’d personally expect reductions in effect sizes like those DellaVigna and Linos (2022) find.
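Back-of-the-envelope on the Costco point (implication 1): the ~42-day broiler lifespan is my assumption, and the 2 pp. reduction is illustrative:

```python
# Rough arithmetic for implication 1; the lifespan assumption is mine.
chickens_per_year = 106_000_000  # Costco rotisserie chickens sold annually
reduction = 0.02                 # an illustrative 2 pp. drop in sales
hours_per_life = 42 * 24         # ~42-day broiler lifespan ≈ 1,008 hours

chickens_averted = chickens_per_year * reduction   # ~2.1 million birds
hours_averted = chickens_averted * hours_per_life  # ~2.1 billion hours
print(f"~{hours_averted / 1e9:.1f} billion chicken-hours averted per year")
```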
¹ A percentage point change is the absolute difference between two percentages, whereas a percent change is relative to the baseline. A change from 20% to 25% is a 5 percentage point change and a 25 percent change relative to the baseline.
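Or, restating that example in code:

```python
baseline, new = 0.20, 0.25
pp_change = (new - baseline) * 100               # 5 percentage points
pct_change = (new - baseline) / baseline * 100   # 25 percent
print(f"{pp_change:.0f} pp. change; {pct_change:.0f}% change")
```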

