Many Labs 2: Investigating Variation in Replicability Across Samples and Settings

Richard Klein, Michelangelo Vianello, Fred Hasselman, Bryon Adams, Reginald Adams, Sinan Alper, Mark Aveyard, Jordan Axt, Mayowa T. Babalola, Stepan Bahnik, Rishtee Batra, Mihaly Berkics, Michael Bernstein, Daniel Berry, Olga Bialobrzeska, Evans Binan, Konrad Bocian, Mark Brandt, Robert Busching, Anna Redei, Huajian Cai, Fanny Cambier, Katarzyna Cantarero, Cheryl Carmichael, Francisco Ceric, Jesse Chandler, Jen-Ho Chang, Armand Chatard, Eva Chen, Winnee Cheong, David C. Cicero, Sharon Coen, Jennifer Coleman, Brian Collisson, Morgan Conway, Katherine Corker, Paul Curran, Fiery Cushman, Zubairu Dagona, Ilker Dalgar, Anna Rosa, William Davis, Maaike de Bruijn, Leander De Schutter, Thierry Devos, Marieke de Vries, Canay Dogulu, Nerisa Dozo, Kristin Dukes, Yarrow Dunham, Kevin Durrheim, Charles Ebersole, John Edlund, Anja Eller, Alexander English, Carolyn Finck, Natalia Frankowska, Miguel-Angel Freyre, Mike Friedman, Elisa Galliani, Joshua Gandi, Tanuka Ghoshal, Steffen Giessner, Tripat Gill, Timo Gnambs, Angel Gomez, Roberto Gonzalez, Jesse Graham, Jon Grahe, Ivan Grahek, Eva Green, Kakul Hai, Matthew Haigh, Elizabeth Haines, Michael Hall, Marie Heffernan, Joshua Hicks, Petr Houdek, Jeffrey Huntsinger, Ho Phi Huynh, Hans IJzerman, Yoel Inbar, Ase Innes-Ker, William Jimenez-Leal, Melissa-Sue John, Jennifer Joy-Gaba, Roza Kamiloglu, Heather Kappes, Serdar Karabati, Haruna Karick, Victor Keller, Anna Kende, Nicolas Kervyn, Goran Knezevic, Carrie Kovacs, Lacy Krueger, German Kurapov, Jamie Kurtz, Daniel Lakens, Ljiljana Lazarevic, Carmel Levitan, Neil A. 
Lewis, Samuel Lins, Nikolette Lipsey, Joy Losee, Esther Maassen, Angela Maitner, Winfrida Malingumu, Robyn Mallett, Satia Marotta, Janko Mededovic, Fernando Pacheco, Taciano Milfont, Wendy Morris, Sean Murphy, Andriy Myachykov, Nick Neave, Koen Neijenhuijs, Anthony Nelson, Felix Neto, Austin Nichols, Aaron Ocampo, Susan O'Donnell, Haruka Oikawa, Masanori Oikawa, Elsie Ong, Gabor Orosz, Malgorzata Osowiecka, Grant Packard, Rolando Perez-Sanchez, Boban Petrovic, Ronaldo Pilati, Brad Pinter, Lysandra Podesta, Gabrielle Pogge, Monique Pollmann, Abraham Rutchick, Patricio Saavedra, Alexander Saeri, Erika Salomon, Kathleen Schmidt, Felix Schonbrodt, Maciej Sekerdej, David Sirlopu, Jeanine Skorinko, Michael Smith, Vanessa Smith-Castro, Karin Smolders, Agata Sobkow, Walter Sowden, Philipp Spachtholtz, Manini Srivastava, Troy Steiner, Jeroen Stouten, Chris N. H. Street, Oskar Sundfelt, Stephanie Szeto, Ewa Szumowska, Andrew Tang, Norbert Tanzer, Morgan Tear, Jordan Theriault, Manuela Thomae, David Torres, Jakub Traczyk, Joshua Tybur, Adrienn Ujhelyi, Robbie van Aert, Marcel van Assen, Marije van der Hulst, Paul van Lange, Anna van 't Veer, Alejandro Echeverria, Leigh Vaughn, Alexandra Vazquez, Luis Vega, Catherine Verniers, Mark Verschoor, Ingrid Voermans, Marek Vranka, Cheryl Welch, Aaron Wichman, Lisa Williams, Michael Wood, Julie Woodzicka, Marta Wronska, Liane Young, John Zelenski, Zeng Zhijia, Brian Nosek

Research output: Contribution to journal › Article

Abstract

We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 total participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen's d effect size for original findings was 0.60 and for replications was 0.15. Sixteen replications (57%) had small effect sizes (< .20) and 9 (32%) were in the opposite direction from the original finding. Across settings, 11 (39%) showed significant heterogeneity using the Q statistic, and most of those were among the findings with the largest overall effect sizes; only one effect that was near zero in the aggregate showed significant heterogeneity. Only one effect had a Tau value greater than 0.20, indicating moderate heterogeneity. Nine others had a Tau value near or slightly above 0.10, indicating slight heterogeneity. In moderation tests, very little heterogeneity was attributable to task order, administration in lab versus online, or exploratory comparisons of WEIRD versus less WEIRD cultures. Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
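The heterogeneity figures reported above (the Q statistic and Tau) are standard random-effects meta-analysis quantities. As an illustrative sketch only (this is not the authors' analysis code, and the per-lab effect sizes below are made up), Cochran's Q and the DerSimonian-Laird estimate of Tau for a set of lab-level effects can be computed like this:

```python
import math

def dl_heterogeneity(effects, variances):
    """Return Cochran's Q and the DerSimonian-Laird Tau for study effects.

    effects   : per-study effect sizes (e.g., Cohen's d per lab)
    variances : per-study sampling variances of those effects
    """
    # Inverse-variance weights and the fixed-effect pooled estimate.
    w = [1.0 / v for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird method-of-moments estimate of Tau^2 (floored at 0).
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, math.sqrt(tau2)

# Hypothetical effect sizes from four labs, each with sampling variance 0.01.
q, tau = dl_heterogeneity([0.1, 0.2, 0.15, 0.5], [0.01] * 4)
```

On the abstract's own scale of interpretation, a Tau near 0.10 would count as slight heterogeneity and a Tau above 0.20 as moderate heterogeneity.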
Original language: English
Pages (from-to): 443-490
Number of pages: 48
Journal: Advances in Methods and Practices in Psychological Science
Volume: 1
Issue number: 4
DOIs: https://doi.org/10.1177/2515245918810225
Publication status: Published - 1 Dec 2018

Cite this

Klein, R., Vianello, M., Hasselman, F., Adams, B., Adams, R., Alper, S., Aveyard, M., Axt, J., Babalola, M. T., Bahnik, S., Batra, R., Berkics, M., Bernstein, M., Berry, D., Bialobrzeska, O., Binan, E., Bocian, K., Brandt, M., Busching, R., ... Nosek, B. (2018). Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225