Reproducibility: Why and how?

Lars Vilhuber

2025-02-27

Disclaimer

The opinions expressed in this talk are solely the author's, and do not represent the views of CIQSS, the American Economic Association, or any of the funding agencies.

Who am I?

Lars Vilhuber

Executive Director of the Labor Dynamics Institute and Senior Research Associate in the Economics Department at Cornell University, and the American Economic Association’s Data Editor.

Journals

Data Editor of the AEA

2389 Manuscripts and 4440 Reports, approx. 4400 authors reached.

DCAP

Why reproducibility?

What are the benefits of reproducibility and transparency?

Credibility

Transparency

Efficiency of scholarly discourse?

  • Early publications (20th century) contained tables of data, and the math was simple (maybe)
  • Data became electronic, was no longer included or cited
  • Math was transcribed to code, and was no longer included

AER 1911

Progress

Increasing broad consensus in academia

  • FAIR principles
  • Data Citation Principles
  • Computational Reproducibility

FAIR Principles

FAIR:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

Data Citation Principles

To make it findable, [footnote 1]

Reaction to “Reproducibility crisis”

Progress in infrastructure

  • Replication archives and Data (Code) Availability policies
  • Shared open source software
  • Better public-use and shared confidential data

Studies in reproducibility

What is…

Reproducibility in Economics

AEA Journals, Canada, and others

What is a replication package?

A Replication Package is

  • Code
  • Data
  • Materials (for surveys, experiments, …)
  • Instructions on how to obtain data not included
  • Instructions on how to combine it all
  • Known issues documented

Complies with…

AEA policy

Is stored in…

Tenets of the Policy

  • Transparency
  • Completeness
  • Preservation

Transparency

  • Provenance of the data
  • Processing of the data, from raw data to results (code)

It is the policy of the American Economic Association to publish papers only if the data used in the analysis are clearly and precisely documented and access to the data and code is clearly and precisely documented and is non-exclusive to the authors.

Completeness

  • All data needs to be identified and access described
  • All code needs to be described and provided
  • All materials must be provided (survey forms, etc.)

Authors … must provide, prior to acceptance, the data, programs, and other details of the computations sufficient to permit replication

Preservation

  • All data needs to be preserved for future replicators
    • Ideally, within the replication package, subject to ToU, for convenience
    • Otherwise, in a trusted repository

Preservation

  • Code must be in a trusted repository
    • Usually, within the replication package
    • Websites and GitHub are not acceptable

Historically

AER 1911 (thanks to Stefano DellaVigna)

Modern preservation

Exceptions to the Policy

None

… there is a grey zone:

  • When data do not belong to the researcher, the researcher has no control over preservation or access!
  • Sometimes, ToU prevent the researcher from revealing metadata (name of company, location)

Transparency again

  • However:
    • No exception for need to describe access (own and other)
    • No exception for need to fully describe processing (possibly with redacted code)

Reproducibility in Economics and beyond

Data Editors

Common policies

https://social-science-data-editors.github.io/

Elsewhere: Political Science

APSR

AJPS

Elsewhere: Sociology

Sociological Science

But!

Elsewhere: Sociology

Sociological Science

Weeden (2023) [footnote 2]

Benefits of reproducible research

Benefits to you

  • Efficiency
  • Trust that the results hold up to scrutiny
  • Credibility

Benefits to you

  • More citations?

The jury is still out on this one: there are too many confounding factors, and the timeline is too short.

Benefits to you

  • Ability by others to re-use
    • in education (classes)
    • in research (new papers)

Benefits to others

  • Ability to re-use (more efficiently)

Building on the work of others

Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.” American Economic Review: Insights 4 (3): 305–22. DOI: 10.1257/aeri.20210236

Notes: “I exclude 43 papers for which data to replicate the main event-study plot were unavailable.”

Roth 2022

Building on the work of others: dCdH 2020

de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96. DOI: 10.1257/aer.20181169

The results from various other papers are recomputed to empirically demonstrate the relevance of the proposed methods.

dCdH 2020

There is a cost

Pain of transparency

Remember this guy?

Matt Gaetz

Transparency

Closer to home (Feb 2025)

i4r on Bluesky

i4r on replication packages

How?

How to create reproducible research?

Habits

  • Reproducibility from Day 1
  • Adopt reproducible habits
  • Take notes when you do things, not after
  • Use version control

Tools

  • Use Template README
  • Learn programming tricks that help streamline processing
  • Save files and tables programmatically

When to learn?

  • Undergraduate students
  • Graduate students
  • Junior faculty
  • Senior faculty
  • … anybody

Tips

  • Before finalizing, run everything from scratch
  • Computational empathy: think of the next person to run this
    • It could be you in 5 years!
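One way to make "run everything from scratch" routine is a single main script that deletes all generated output and then runs every step in order. The sketch below assumes a Python project; the step functions, file names, and the `output` folder are hypothetical stand-ins for real cleaning and analysis code.

```python
import shutil
from pathlib import Path


def run_from_scratch(steps, output_dir):
    """Delete all generated output, then run every step in order.

    `steps` is a list of (name, function) pairs; each function receives
    the output directory. Returns the names of the completed steps.
    """
    output_dir = Path(output_dir)
    if output_dir.exists():
        shutil.rmtree(output_dir)  # remove stale results from earlier runs
    output_dir.mkdir(parents=True)

    completed = []
    for name, step in steps:
        print(f"Running {name} ...")
        step(output_dir)  # any exception stops the run at the failing step
        completed.append(name)
    return completed


# Hypothetical steps, standing in for real data cleaning and analysis.
def clean_data(out):
    (out / "clean.txt").write_text("1\n2\n3\n")


def analyze(out):
    values = [int(v) for v in (out / "clean.txt").read_text().split()]
    (out / "result.txt").write_text(str(sum(values)))


if __name__ == "__main__":
    # Caution: this deletes the whole "output" folder before rerunning.
    run_from_scratch([("01_clean", clean_data), ("02_analyze", analyze)], "output")
```

If the results only appear when leftover files from an earlier run are present, this kind of script will reveal it before a replicator does.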

Advanced tools

  • Use environments
  • Use containers
  • Plan for failure
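A small first step toward "use environments" is to record which software versions produced the results. The sketch below assumes Python and uses only the standard library; the function name and the example package list are hypothetical. It is not a substitute for a real environment or container, only a record that can go next to the README.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return lines recording the interpreter, the OS, and package versions."""
    lines = [
        f"python=={platform.python_version()}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing silently.
            lines.append(f"{name}: NOT INSTALLED")
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with what the project actually imports.
    for line in describe_environment(["numpy", "pandas"]):
        print(line)
```

Saving this listing with each run also helps with "plan for failure": when results change later, the recorded versions are the first thing to compare.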

These are not barriers!

  • Use of Excel for tables
  • Manual processing (in limits)
  • “I am not a programmer”
  • Confidential data

Most of these have simple solutions!

Join us for the summer school/workshop

Thank you

Appendix

Resources

README

Vilhuber, L., Connolly, M., Koren, M., Llull, J., & Morrow, P. (2022). A template README for social science replication packages (v1.1). Social Science Data Editors. https://doi.org/10.5281/zenodo.7293838

You can download the Word, LaTeX, or Markdown version of the README with lots of examples.

Other guidance

Extra info

Sources

Footnotes

  1. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 https://www.force11.org/group/joint-declaration-data-citation-principles-final

  2. Weeden, K. A. (2023). Crisis? What Crisis? Sociology’s Slow Progress Toward Scientific Transparency. Harvard Data Science Review, 5(4). https://doi.org/10.1162/99608f92.151c41e3