*Note: p‑values are criticized here for their implied use for
significance testing (NHST). Remember, however: they're different
things. NHST through confidence intervals is just as bad; and p‑values
have legitimate uses. The issue is NHST.*

*So the “p-bashing” below is mostly NHST-bashing by proxy.*

From Stephen Gorard 2016:

- “Anyone using significance tests, allowing them to pass peer-review for publication in their journals, teaching them to new researchers, or otherwise advocating them in any way, is part of a (hopefully) diminishing group causing untold real-life damage.”
- “It is a kind of magic-thinking — an erroneous superstitious belief.”

William M. Briggs, on his site:

- 2013: “‘Statistically significant’ does not imply true nor useful nor even interesting. ‘Significance’ is a fog which emanates from a computerized thurible, thick and pungent. It obscures and conceals. It woos and insinuates. It distracts. It is a mathematical sleight-of-hand, a trick. It takes the eye from the direct evidence at hand and refocuses it on the pyrotechnics of *p*‑values. So delighted is the audience at seeing wee *p*‑values that all memory of the point of a study vanishes.”
- 2017: “The threshold picked is mesmerizing. The number 0.04999 brings joy, 0.05001 tears. *This happens.*” *[note: evidence of such ignorance/despair]*

Charles Seife, on Edge, 2014:

- “It's a boon for the mediocre and for the credulous, for the dishonest and for the merely incompetent. It turns a meaningless result into something publishable, transforms a waste of time and effort into the raw fuel of scientific careers. It was designed to help researchers distinguish a real effect from a statistical fluke, but it has become a quantitative justification for dressing nonsense up in the mantle of respectability. And it's the single biggest reason that most of the scientific and medical literature isn't worth the paper it's written on.”

From Gigerenzer 2004:

- “If psychologists are so smart, why are they so confused? Why is statistics carried out like compulsive hand washing?”
- “The null ritual has each of these four characteristics: the same procedure is repeated again and again; the magical 5% number; fear of sanctions by editors or advisors; and wishful thinking about the outcome, the *p*‑value, which blocks researchers’ intelligence.”

From Gigerenzer & Marewski 2015:

- “Because no [universal method for scientific inference] has ever been found, surrogates have been created, most notably the quest for significant *p* values. This form of surrogate science fosters delusions and borderline cheating and has done much harm, creating, for one, a flood of irreproducible results. Proponents of the "Bayesian revolution" should be wary of chasing yet another chimera: an apparently universal inference procedure.”

Andrew Gelman, on his site, 2015:

- “Doing science using published *p*-values is like trying to paint a picture using salad tongs.”

Nassim Taleb, 2017:

- “If we used this ‘p value’ nonsense (used by academics) for air safety, the U.S. would be depopulated.”
- “The only solid use I found for p values is a test of robustness across jacknife (1‑p)”
- “Nonsignificant is 3x >likely to be "accepted" than rejected. 100 years of p-value bullshit.”

From John Ioannidis 2016:

- “Misleading use of *p*‑values is so easy and automated that, especially when rewarded with publication and funding, it can become addictive. Investigators generating these torrents of *p*‑values should be seen with sympathy as drug addicts in need of rehabilitation that will help them live a better, more meaningful scientific life in the future.”

David Trafimow, who in 2015 banned significance testing (and, controversially, *also* *p*‑values and CIs) from the journal *Basic and Applied Social Psychology*:

- “If scientists are depending on a process that’s blatantly invalid, we should get rid of it.”
- “I’d rather not have any inferential statistics at all than have some that we know aren’t valid.”

From Trafimow et al. 2018 (pdf):

- “None of the statistical tools should replace significance testing as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, or implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.”

Instructions for submissions to the journal *Epidemiology*:

- “Significance Testing: For estimates of causal effects, we strongly discourage the use of categorized P‑values and language referring to statistical significance. We prefer instead interval estimation, which conveys the precision of the estimate with respect to sampling variability.”

From Kenneth J. Rothman 2014:

- “The unfortunate consequence of the focus on statistical significance testing has been to foster a dichotomous view of relationships that are better assessed in quantitative terms. This distinction is more than a nicety. Every day there are important, regrettable and avoidable misinterpretations of data that result from the confusing fog of statistical significance testing. Most of these errors could be avoided if the focus were shifted from statistical testing to estimation.”

From Greenland et al. 2016:

- “Statistical significance is neither necessary nor sufficient for determining the scientific or practical significance of a set of observations. This view was affirmed unanimously by the U.S. Supreme Court.”
- “We join others in singling out the degradation of P values into ‘significant’ and ‘nonsignificant’ as an especially pernicious statistical practice.”

From Greenland 2017:

- “This null obsession is the most destructive pseudoscientific gift that conventional statistics (both frequentist and Bayesian) has given the modern world. One of its many damaging manifestations is nullism (also known as pseudo-skepticism): a religious faith that nature graces us with null associations in most settings. This faith should always be challenged within the applied context. Instead, it goes unnoticed in the vast majority of education and practice — often to great harm.”

A.W.F. Edwards (quoted in Greenland 2016):

- “What used to be called judgement is now called prejudice, and what used to be called prejudice is now called a null hypothesis. In the social sciences, particularly, it is dangerous nonsense (dressed up as ‘the scientific method’), and will cause much trouble before it is widely appreciated as such.”

From Thompson 1992:

- “Statistical significance testing can involve a tautological logic in which tired researchers, having collected data on hundreds of subjects, then conduct a statistical test to evaluate whether there were a lot of subjects, which the researchers already know, because they collected the data and know they are tired. This tautology has created considerable damage as regards the cumulation of knowledge.”

Robert Matthews:

- “Statistical significance is a mathematical machine for turning baloney into breakthroughs, and flukes into funding.”

From Beninger et al. 2012:

- “NHST is a very inappropriate tool used in very inappropriate ways, to achieve a misinterpreted result.”

From José Perezgonzalez 2015:

- “NHST is an incompatible amalgamation of the theories of Fisher and of Neyman and Pearson. Curiously, it is an amalgamation that is technically reassuring despite it being, philosophically, pseudoscience.”
- “NHST effectively negates the benefits that could be gained from Fisher's and from Neyman-Pearson's theories; it also slows scientific progress.”

From Jeff Gill 2004:

- “The null hypothesis significance test (NHST) should not even exist, much less thrive as the dominant method for presenting statistical evidence. . . It is intellectually bankrupt and deeply flawed on logical and practical grounds.”

From Donald Berry 2017:

- “We have saddled ourselves with perversions of logic — *p*‑values — and so we deserve our collective fate.”
- “I am witness to the collective ignorance regarding *p*‑values in medicine. And I also see the herd mentality that *p* < 0.05 means true and *p* > 0.05 means not true. This mentality leads to inappropriate clinical attitudes and guidelines, and consequently to poor treatment of patients.” *p*‑Values are life and death quantities.
- “We created a monster. And we keep feeding it, hoping that it will stop doing bad things. It is a forlorn hope. **No cage can confine this monster. The only reasonable route forward is to kill it.**”