The dead salmon problem: Multiple tests, minimality and data-driven alternatives
About this title
🪄 Created using NotebookLM, with all the benefits and blind spots of human editing.
In 2009, a deceased Atlantic salmon was placed inside a functional magnetic resonance imaging scanner as part of a routine calibration check. Although the subject was undeniably dead, an uncorrected statistical analysis identified brain voxels in which the fish appeared to be actively contemplating human emotions. This absurd outcome highlights a systemic fragility in modern science known as the multiple tests trap: conduct thousands of tests without adjustment, and random noise is all but guaranteed to look like a discovery. Just as flipping a coin enough times will inevitably produce a streak of ten heads, asking too many questions of a large dataset ensures that a researcher will find "significant" results purely by luck.
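The coin-flip intuition is easy to check with a quick simulation. This minimal sketch (not from the salmon study itself) runs 1,000 z-tests on pure noise, so every null hypothesis is true by construction, and counts how many come out "significant" at the conventional 0.05 threshold; by design, roughly 5% do.

```python
import math
import random

random.seed(42)

def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known sd of 1 (z-test)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Standard normal CDF via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 1,000 "experiments" on pure noise: every null hypothesis is true.
n_tests, alpha = 1000, 0.05
p_values = [z_test_p([random.gauss(0, 1) for _ in range(20)])
            for _ in range(n_tests)]
false_positives = sum(p < alpha for p in p_values)
print(f"{false_positives} of {n_tests} null tests were 'significant' at alpha={alpha}")
```

Even though nothing real is being measured, around 50 of the 1,000 tests cross the threshold, which is exactly the mechanism that let a dead fish "think".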
Escaping this trap requires rigorous pre-planning and methodological self-restraint, in particular avoiding the statistical cheating known as hypothesising after the results are known (HARKing). The classical Bonferroni correction acts as a sledgehammer, dividing the significance threshold by the total number of tests, whereas sequential procedures such as the Holm-Bonferroni method recover some power by relaxing that threshold step by step as hypotheses are rejected. Modern researchers often prefer data-driven strategies such as permutation testing, which shuffles experimental labels thousands of times to build a custom noise map specific to the dataset rather than relying on broad theoretical assumptions.
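The difference between the sledgehammer and the step-down procedure fits in a few lines. This is an illustrative sketch (the p-values are made up for demonstration): Bonferroni compares every p-value to alpha/m, while Holm compares the k-th smallest p-value to alpha/(m − k), stopping at the first failure.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i iff p_i < alpha / m (the 'sledgehammer')."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the k-th smallest p-value
    to alpha / (m - k), stopping at the first non-rejection."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] < alpha / (m - k):
            reject[i] = True
        else:
            break  # all larger p-values are retained too
    return reject

ps = [0.001, 0.011, 0.02, 0.04, 0.3]
print(bonferroni(ps))  # only 0.001 survives the flat threshold 0.05/5 = 0.01
print(holm(ps))        # 0.011 also survives its step-down threshold 0.05/4 = 0.0125
```

Both procedures control the family-wise error rate, but Holm is uniformly at least as powerful: here it rejects a second hypothesis that plain Bonferroni misses.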
Choosing between maximum t-statistic testing, which localises effects precisely but sacrifices sensitivity, and cluster-based methods, which are more sensitive but can only point to fuzzy regions rather than exact locations, reveals that statistical truth is often a philosophical judgement call. Ultimately, how to define a family of tests depends on the logical structure of the scientific claim and the intent of the investigator. By embracing the principle of test minimality, researchers can move beyond mere p-value adjustments toward a more robust, transparent and honest scientific practice.
References
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Bennett, C. M., Miller, M. B., & Wolford, G. L. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: An argument for multiple comparisons correction. NeuroImage, 47(Suppl 1), S125. https://doi.org/10.1016/S1053-8119(09)71202-9
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
Frane, A. V. (2021). Experiment-wise type I error control: A focus on 2 × 2 designs. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920985137. https://doi.org/10.1177/2515245920985137
García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, 100120. https://doi.org/10.1016/j.metip.2023.100120
Groppe, D. M., Urbach, T. P., & Kutas, M. (2011). Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review. Psychophysiology, 48(12), 1711–1725. https://doi.org/10.1111/j.1469-8986.2011.01273.x
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70. https://www.jstor.org/stable/4615733
Rubin, M. (2021). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 199(3–4), 10969–11000. https://doi.org/10.1007/s11229-021-03276-4
