2025-01-03
Vilhuber, L., Connolly, M., Koren, M., Llull, J., & Morrow, P. (2022). A template README for social science replication packages (v1.1). Social Science Data Editors. https://doi.org/10.5281/zenodo.7293838
You can download the Word, LaTeX, or Markdown version of the README with lots of examples.
You can also try Miklós Koren’s online generator https://www.templatereadme.org/.
Lavecchia, Adam, 2023, “Replication Data and Code for: Family-Level Responses to the Introduction of Tax-Free Savings Accounts”, https://doi.org/10.5683/SP3/M6HLUF, Borealis, V1.
INSTRUCTIONS: The typical README in social science journals guides a reader through the available material and lays out a route to replicating the results in the research paper. Start with a brief overview of the available material and a brief guide on how to proceed from beginning to end.
guides a reader through the available material and a route to replicating the results in the research paper, including
It contains information about the sources of data used in the replication package, in addition to or instead of such detailed description in the manuscript.
The information should describe ALL data used, regardless of whether they are provided as part of the replication archive or not, and regardless of size or scope!
These may include
When the data were generated by the authors in the course of conducting (lab or field) experiments, or were collected as part of surveys run by the authors, the description of the provenance should cover the data-generating process, i.e., the survey or experimental procedures.
If providing the data as part of the replication package, authors should state clearly whether they have the rights to distribute the data.
Example: if data are on a restricted server, and you have access:
Less obvious: You were able to download the data from a website that did not require a login.[1]
Data sources translate into datasets. Ideally, the README lists them:
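A first draft of such a list can be generated from the data directory itself. The sketch below is only an illustration: the function names and the output layout are hypothetical, and the source and redistribution columns are deliberately left as placeholders because only the authors can fill them in.

```python
from pathlib import Path


def list_data_files(root):
    """Return (relative path, size in bytes) for every file under `root`."""
    root = Path(root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def format_dataset_list(entries):
    """One line per file; source and provision status still need to be filled in."""
    return "\n".join(
        f"{path} | {size} bytes | source: TODO | provided: TODO"
        for path, size in entries
    )
```

Calling `format_dataset_list(list_data_files("data"))` (with `data` standing in for wherever the package keeps its data) yields one line per file, which can then be edited by hand into the README.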
For simple replication packages, the computational requirements may appear to be trivial (a laptop and some common software).[2]
What if the requirements are expensive commercial software and a supercomputer cluster?
In order to assess the complexity of the task of replicating, authors should specify each of the following elements:
INSTRUCTIONS: List all the software requirements, up to and including any operating system requirements, for the entire set of code.
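For packages whose code is written in Python, part of this list can be collected programmatically. The following is a minimal sketch using only the standard library; the function names and the example package names are assumptions made for illustration, and projects in Stata, R, or MATLAB would need the equivalent in their own tooling.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the OS, Python version, and installed version of each package."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = "not installed"
    return info


def format_requirements(info):
    """Format the environment description as README-style lines."""
    lines = [f"- Operating system: {info['os']}", f"- Python {info['python']}"]
    for name, version in sorted(info["packages"].items()):
        lines.append(f"  - {name} ({version})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical package list; replace with what the code actually imports.
    print(format_requirements(describe_environment(["numpy", "pandas"])))
```

The output still needs a manual check: it reports what is installed on the author's machine, not necessarily the minimum versions the code requires.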
INSTRUCTIONS: Some estimation code uses random numbers, almost always provided by pseudorandom number generators (PRNGs). For reproducibility purposes, these should be provided with a deterministic seed, so that the sequence of numbers provided is the same for the original author and any replicators.
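A small illustration of why the seed matters, written in Python with the standard library generator. The seed value and the function name are arbitrary choices for this sketch; the README should report the seed actually used and the program and line where it is set.

```python
import random

SEED = 20250103  # arbitrary illustrative value; record the one actually used


def simulate_draws(seed, n=5):
    """Draw n pseudorandom numbers from a generator initialised with `seed`."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [rng.random() for _ in range(n)]


first_run = simulate_draws(SEED)
second_run = simulate_draws(SEED)
print(first_run == second_run)  # prints True: same seed, same sequence
```

Using a local generator object rather than the module-level functions keeps the sequence from being disturbed by other code that also draws random numbers.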
INSTRUCTIONS: Give information on the machine you used, especially if specialized hardware is needed (e.g., high-powered servers).
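Basic machine details can be captured at run time and pasted into the README. This is a sketch under the assumption that the code runs in Python; the function name is hypothetical, and memory size is omitted because the standard library has no portable way to query it.

```python
import os
import platform


def describe_machine():
    """Collect basic facts about the machine the code is running on."""
    return {
        "system": platform.system(),         # e.g. "Linux", "Windows", "Darwin"
        "release": platform.release(),       # OS release string
        "architecture": platform.machine(),  # e.g. "x86_64", "arm64"
        "logical_cpus": os.cpu_count(),      # may be None if undetermined
    }


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```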
This should provide some details, but ideally:
Important: Remove any redundant code.
INSTRUCTIONS: Your programs should clearly identify the tables and figures as they appear in the manuscript, by number.
If program names already match the manuscript numbering (e.g., table1.do), this may not be necessary, but should be stated.
If program names are mnemonic (e.g., table_ols_rds_55.do), a mapping is necessary.
INSTRUCTIONS: As in any scientific manuscript, you should have proper references. Cite your data, the packages that you use, everything you refer to in the README, and generally everything you rely upon.
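Returning to the mapping between programs and exhibits: one way to keep it honest is to store it in a small script that also checks that each listed program exists. The sketch below is hypothetical throughout. The two file names come from the examples above, but which exhibit each one produces is invented for illustration.

```python
from pathlib import Path

# Hypothetical mapping; the exhibit labels are illustrative only.
EXHIBIT_TO_PROGRAM = {
    "Table 1": "table1.do",
    "Table 2": "table_ols_rds_55.do",
}


def missing_programs(mapping, code_dir):
    """Return the exhibits whose program file is absent from `code_dir`."""
    code_dir = Path(code_dir)
    return sorted(
        exhibit
        for exhibit, program in mapping.items()
        if not (code_dir / program).is_file()
    )


def format_mapping(mapping):
    """Render the mapping as one 'exhibit | program' line per entry."""
    return "\n".join(
        f"{exhibit} | {program}" for exhibit, program in mapping.items()
    )
```

An empty result from `missing_programs` means every exhibit in the mapping points at a file that is actually in the package.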
Good example 1, German data: https://fdz.iab.de/en/our-data-products/integrated-establishment-and-individual-data/liab/
Good example 2, French data: https://www.casd.eu/en/source/all-employees-databases-establishment-data/
Presentation on “Self Checking Reproducibility” and its associated website
Guidance when (some) data are confidential: https://labordynamicsinstitute.github.io/reproducibility-confidential/
Guidance for citations: https://social-science-data-editors.github.io/guidance/addtl-data-citation-guidance.html
[1] World Values Survey data are available for download without a login, but you have to agree to terms of use that prohibit redistribution.
[2] https://social-science-data-editors.github.io/template_README/#computational-requirements
Image by JoePhin, under Creative Commons Attribution-Share Alike 4.0 International license. https://commons.wikimedia.org/wiki/File:Numbers.gif