> What should a presentation on the reproducibility of AI-based research address?
2024-12-14
A PDF version of this presentation is available here.
AI introduces challenges for reproducibility.
Not unlike difficulties researchers face with
I will discuss
In this talk, we focus on computational reproducibility, though the ultimate goal remains replicability.
I will distinguish LLM (large language models) from AI (artificial intelligence):
> What should a presentation on the reproducibility of AI-based research address?
See full response as well as OpenAI’s response 2: claude.ai, Haiku, Normal, 2024-12-14 20:47
A presentation on the reproducibility of AI-based research should comprehensively address several critical aspects:
1. Methodology Transparency
2. Computational Environment
3. Data Considerations
4. Experimental Reproducibility
5. Code and Implementation
6. Ethical and Contextual Considerations
7. Validation Strategies
8. Reporting Challenges
By comprehensively addressing these areas, the presentation can provide a robust framework for understanding and potentially replicating AI-based research, ultimately contributing to the scientific integrity and advancement of the field.
Claude:
OpenAI:
We want to check that
In Economics,
Are models data or software? We will treat them as software here.
“Immigration Restrictions as Active Labor Market Policy: Evidence from the Mexican Bracero Exclusion, Replication files and raw data” (Michael Clemens)
Probably requires
Generically:
pre-trained LLM ▶️ tuned LLM ▶️ analysis data
tuned LLM = f(raw data, pre-trained LLM)
analysis data = f(tuned LLM, raw data)
Both should be preserved: the tuned LLM and the analysis data.
The very first test is that your code must run, beginning to end, top to bottom, without error, and ideally without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.
This is pretty much the most basic test of reproducibility.
This has nothing to do with LLM/AI!
If you cannot run your code, you cannot reproduce your results, nor can anybody else. So just re-run the code.
What happens when some of these re-runs are very long? See later in this chapter for how to handle this.
While the code, once set to run, can do so on its own, you might need to spend a lot of time getting all the various pieces to run.
This should be a warning sign:
If it takes you a long time to get it to run, or to manually reproduce the results, it might take others even longer.3
Furthermore, it may suggest that you haven’t been able to re-run your own code very often, which can indicate fragility or even a lack of reproducibility.
Automation and robustness checks, as well as efficiency.
Generating a log file means that you can inspect it, and you can share it with others. It also helps in debugging, for you and for others.
Running it again does not help:
Your code must run, beginning to end, top to bottom, without error, and without any user intervention.
This should in principle (re)create all figures, tables, and in-text numbers you include in your paper.
Out of 8280 replication packages in ~20 top econ journals, only 2594 (31.33%) had a main/controller script.4
To enable “hands-off running”, the main (controller) script is key.
Set the root directory (using here() or rprojroot()).
# main.R
## Set the root directory
# If you are using Rproj files or git
rootdir <- here::here()
# or if not
# rootdir <- getwd()
## Run the data preparation file
source(file.path(rootdir, "01_data_prep.R"),
echo = TRUE)
## Run the analysis file
source(file.path(rootdir, "02_analysis.R"),
echo = TRUE)
## Run the table file
source(file.path(rootdir, "03_tables.R"), echo = TRUE)
## Run the figure file
source(file.path(rootdir, "04_figures.R"), echo = TRUE)
## Run the appendix file
source(file.path(rootdir, "05_appendix.R"), echo = TRUE)
Call each of the component programs, using source().
The use of echo=TRUE is best, as it will show the code that is being run, and is thus more transparent to you and the future replicator.
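To launch main.R itself hands-off from the command line, either of the following works on a standard R installation (a sketch, assuming the controller file is named main.R as above):
Rscript main.R
# or, to also capture a log of the run in main.Rout:
R CMD BATCH main.R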
If using procedural Python code, you might use a bash script:
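A minimal sketch of such a controller script, with hypothetical script names that mirror the R example above:
#!/bin/bash
# main.sh -- run the full pipeline hands-off
set -e   # stop at the first error
python 01_data_prep.py
python 02_analysis.py
python 03_tables.py
python 04_figures.py
python 05_appendix.py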
What you do should remain transparent to other users!
Writing a scientific paper is different from writing a useful function on the internet. You are not writing mynumpy, you are writing a paper.
… though there are grey areas there.
Ensure you have the necessary Python libraries installed:
pip install openai pandas numpy
To execute the simplest example, run the script:
python simple_example_chat1.py
The results will be displayed on the screen.
requirements.txt
These are now systematically requested for replication packages!
I created a requirements.txt file (note: created using the pipreqs Python package, plus a hand-edit).
Create the environment (note: running on Linux, openSUSE, Python 3.11.10).
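One way of creating it (the commands are a sketch; the environment name matches the venv-311 that appears in the error messages below):
python3.11 -m venv venv-311
source venv-311/bin/activate
pip install -r requirements.txt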
Get the API key: on the OpenAI platform, choose API keys on the left side, then + Create new secret key, and store the key in a .env file.
Run the script.
Read the key from the .env file or environment (one way of doing it):
import os
import openai

openai.api_key = os.environ.get('OPENAI_API_KEY')
# If the API key isn't found in the environment variable, prompt the user for it
if not openai.api_key:
    openai.api_key = input("Please enter your OpenAI API key: ")
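The snippet above only finds the key if the .env file has been loaded into the environment first; one possible way of doing that from the shell (a sketch; file contents and commands are assumptions):
# .env -- keep this file out of version control
OPENAI_API_KEY=sk-...
# load it into the environment, then run the script
set -a; source .env; set +a
python simple_example_chat1.py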
NEVER RECORD YOUR API KEY IN SCRIPTS!
Traceback (most recent call last):
File "/path/korinek-2023/simple_example_chat1.py", line 37, in <module>
completion = openai.ChatCompletion.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/korinek-2023/venv-311/lib64/python3.11/site-packages/openai/lib/_old_api.py", line 39, in __call__
raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1:
You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.
You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface.
Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`
A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
Provide requirements.txt and pin versions!
(We will talk later about API issues!)
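A pinned requirements.txt for this example might look like the following (the openai pin follows the error message above; the pandas and numpy versions are purely illustrative):
openai==0.28
pandas==2.1.4
numpy==1.26.2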
Update requirements.txt, re-install, and run again:
> python simple_example_chat1.py
Traceback (most recent call last):
File "/path/korinek-2023/simple_example_chat1.py", line 37, in <module>
completion = openai.ChatCompletion.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<snip>
File "/path/korinek-2023/venv-311/lib64/python3.11/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: The model `gpt-4-0613` does not exist or you do not have access to it.
CoPilot response (note: github.copilot-chat 0.23.1, updated 2024-12-14, 15:31:36).
It was the “you do not have access to it” part that was crucial!
Original content:
1. Job loss due to automation in lower-skille
2. AI-driven wealth concentration in tech-sav
3. Digital literacy gap leading to economic d
4. Lack of universal access to AI technology.
5. AI-driven bias in hiring and selection pro
6. Imbalance in job market due to AI speciali
7. Data privacy issues affecting vulnerable p
8. AI-driven services predominantly targeting
9. Algorithms exacerbating social inequality
10. Inclusive AI product development lacking.
11. Higher prices due to AI-enhanced products
12. AI-fueled gentrification in tech-centered
13. Anticompetitive practices bolstered by AI
14. Lack of labor rights for jobs displaced b
15. Educational imbalance due to AI-learning
16. AI in healthcare excluding lower socioeco
17. Disproportionate influence of AI in polit
18. Undervaluing of human skills in favor of
19. Biased AI systems perpetuating discrimina
20. AI reinforcing societal hierarchies via d
Content as of 2024-12-14:
1. Job displacement due to automation.
2. Wealth concentration in tech industries.
3. Increased surveillance disproportionately
4. Unequal access to AI technology.
5. AI-driven discrimination in hiring.
6. AI bias in credit scoring.
7. Inequality in AI education and training.
8. AI in healthcare favoring wealthier patien
9. AI-driven gentrification in cities.
10. AI in law enforcement targeting minoritie
11. AI in marketing exploiting vulnerable con
12. AI in politics manipulating voters.
13. AI in insurance favoring privileged group
14. AI in social media amplifying hate speech
15. AI in education favoring affluent student
16. AI in agriculture favoring large-scale fa
17. AI in transportation favoring urban areas
18. AI in retail favoring wealthier consumers
19. AI in entertainment creating cultural div
20. AI in research favoring developed countri
One of the most frequently asked questions…
“but I have confidential data…”
Equivalently
“but I have a big LLM model…”
How can you show that you actually ran the code?
In order to document that you have actually run your code, a log file, a transcript, or some other evidence may be useful. It may even be required by certain journals.
In almost all cases, the generated log files are simple text files, without any formatting, and can be read by any text editor (e.g., Visual Studio Code, Notepad++).
If not, ensure that they are (avoid Stata SMCL files, for example, or iPython output).
Generically: see separate tutorial.
Create a wrapper that will capture the calls for any function
from datetime import datetime

def track_calls(func):
    # Wrap func so that every call is appended to a log file with a timestamp
    def wrapper(*args, **kwargs):
        with open('function_log.txt', 'a') as f:
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            f.write(f"[{timestamp}] Calling {func.__name__} with args: {args}, kwargs: {kwargs}\n")
        result = func(*args, **kwargs)
        return result
    return wrapper

# Usage
@track_calls
def my_function(x, y, default="TRUE"):
    return x + y

my_function(1, 2, default="false")
Activate the wrapper by decorating each function you want to track with @track_calls, as in the usage lines above.
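Each call then appends a line to function_log.txt; for the usage above, it would look roughly like this (timestamp illustrative):
[2024-12-14 20:47:00] Calling my_function with args: (1, 2), kwargs: {'default': 'false'}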
An alternative (or complement) to creating log files explicitly is to use native functionality of the software to create them. This usually is triggered when using the command line to run the software, and thus may be considered an advanced topic. The examples below are for Linux/macOS, but similar functionality exists for Windows.
In order to capture screen output in Python on a Unix-like system (Linux, macOS), the following can be run:
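For example (re-using the earlier script; the log file name is arbitrary):
python simple_example_chat1.py 2>&1 | tee run.log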
which will create a log file with everything that would normally appear on the console, using the tee command.
R: the renv package
Python: the venv or virtualenv module
Julia: the Pkg module
Generically, all “environments” simply modify where the specific software searches (the “search path”) for its components, and in particular any supplementary components (packages, libraries, etc.).5
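As a quick illustration of the modified search path: with a venv activated (such as the venv-311 assumed earlier), Python’s module search path points at the environment’s own site-packages directory:
python -c "import sys; print(sys.path)"
# with the venv activated, the output includes .../venv-311/lib/python3.11/site-packages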
Python allows for pinning exact versions of packages from the PyPI repository. This is done by creating a requirements.txt file that lists all the packages that are needed to run your code. In principle, this file can be used by others to recreate the environment you used. The problem is that it might contain TOO many packages, some of which are not relevant, even if you carefully constructed the environment, because it will contain dependencies that are specific to your platform (OS or version of Python).
Running pip list (or pip freeze) will output all the packages installed in your environment. These will include the packages you explicitly installed, but also the packages that were installed as dependencies. Some of those dependencies may be specific to your operating system or environment. In some cases, they include packages that you needed to develop the code, but that are not needed to run it.
Running pip freeze > requirements.txt will output all the packages installed in your environment into a file called requirements.txt. This file can be used to recreate the environment. Obviously, because of the above issue, it will likely contain too many packages.
Running pip install -r requirements.txt will install all the packages listed in requirements.txt. If you run this on your own computer, in a different environment, this will duplicate your environment, which is fine. But it probably will not work on somebody else’s Mac or Linux system, and may not even work on somebody else’s Windows computer.
The solution is to create a minimal environment, and document it. This is done in two steps:
1. Install only the packages that you import in your code, and nothing else. This is the minimal environment.
2. Trim the requirements.txt file to only include the packages that are needed to run your code. This will be the file you provide to replicators to recreate your necessary environment, and let the package installers solve all the other dependencies.
The resulting requirements.txt file will contain “pinned” versions of the packages you have, so it will be very precise. Possibly overly precise.
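A sketch of this two-step workflow, using the pipreqs package mentioned earlier (the exact options are an assumption):
pipreqs --force .                 # writes a requirements.txt based only on the imports found in the code
# hand-edit / pin the result, then verify that it resolves in a clean environment:
pip install -r requirements.txt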
… when it comes to reproducibility
… compared to the average difficulty in economics papers
A presentation on the reproducibility of AI-based research should address several key areas to ensure a comprehensive understanding of the topic. Here are some important points to consider:
1. **Introduction to Reproducibility**:
- Define reproducibility in the context of AI research.
- Explain why reproducibility is crucial for scientific integrity, validation, and progress.
2. **Challenges in Reproducibility**:
- Discuss common challenges such as lack of access to data, proprietary algorithms, and insufficient documentation.
- Highlight issues related to computational resources and environment dependencies.
3. **Data Accessibility**:
- Emphasize the importance of open datasets and the challenges of data privacy and security.
- Discuss strategies for sharing data, such as anonymization and data licensing.
4. **Algorithm and Model Transparency**:
- Address the need for sharing model architectures, hyperparameters, and training procedures.
- Discuss the role of open-source software and platforms in promoting transparency.
5. **Documentation and Reporting Standards**:
- Highlight the importance of thorough documentation, including code comments, readme files, and research papers.
- Discuss existing standards and guidelines, such as those from journals or conferences, for reporting AI research.
6. **Tools and Platforms for Reproducibility**:
- Introduce tools and platforms that facilitate reproducibility, such as version control systems (e.g., Git), containerization (e.g., Docker), and workflow management systems.
- Discuss the role of cloud computing in providing scalable and reproducible environments.
7. **Case Studies and Examples**:
- Present examples of successful reproducibility efforts in AI research.
- Discuss lessons learned from reproducibility failures and how they were addressed.
8. **Community and Collaboration**:
- Highlight the role of the research community in fostering a culture of reproducibility.
- Discuss collaborative efforts, such as reproducibility challenges and workshops.
9. **Ethical and Legal Considerations**:
- Address ethical concerns related to data sharing and model transparency.
- Discuss legal implications, such as intellectual property rights and compliance with regulations.
10. **Future Directions and Recommendations**:
- Suggest ways to improve reproducibility in AI research, such as developing better standards, incentives, and educational resources.
- Discuss the potential impact of emerging technologies and methodologies on reproducibility.
11. **Conclusion**:
- Summarize the key points discussed.
- Reinforce the importance of reproducibility for advancing AI research and its applications.
By covering these areas, the presentation can provide a well-rounded view of the challenges and solutions related to the reproducibility of AI-based research.
A presentation on the reproducibility of AI-based research should address a multifaceted issue, covering both the challenges and potential solutions. Here's a breakdown of key areas to consider:
**1. Defining Reproducibility in the Context of AI:**
* **What does reproducibility mean in AI?** Differentiate between:
* **Replication:** Re-running the same code on the same data and getting the same results.
* **Reproduction:** Re-implementing the method from scratch based on the paper and getting similar results.
* **Robustness:** Evaluating the model's performance on different datasets or under different conditions.
* **Why is reproducibility important?** Highlight the benefits:
* **Scientific validity:** Ensuring the reliability and trustworthiness of research findings.
* **Progress and innovation:** Building upon existing work and accelerating advancements.
* **Transparency and accountability:** Allowing for scrutiny and identification of potential biases or errors.
* **Practical applications:** Facilitating the deployment and adoption of AI models in real-world scenarios.
**2. Challenges to Reproducibility in AI:**
* **Code and Implementation:**
* **Lack of code availability:** Papers often lack publicly available code or provide incomplete/unusable code.
* **Poor code quality:** Unstructured, undocumented, or hard-to-understand code.
* **Dependency issues:** Incompatible library versions, operating systems, or hardware.
* **Hidden implementation details:** Crucial parameters or preprocessing steps not explicitly mentioned in the paper.
* **Data:**
* **Data unavailability:** Datasets are often proprietary, sensitive, or difficult to access.
* **Data preprocessing:** Inconsistent or undocumented data cleaning, transformation, or augmentation techniques.
* **Data versioning:** Lack of clarity on which version of the dataset was used.
* **Data bias:** Datasets may contain biases that affect the model's performance and generalizability.
* **Computational Environment:**
* **Hardware differences:** Variations in CPU, GPU, and memory can impact results.
* **Software environment:** Differences in operating systems, libraries, and drivers.
* **Randomness:** The use of random seeds and initialization can lead to variations in results.
* **Methodological Issues:**
* **Incomplete descriptions:** Lack of detail in the paper about the experimental setup, hyperparameters, or evaluation metrics.
* **Cherry-picking results:** Reporting only the best results and ignoring less favorable outcomes.
* **Lack of statistical rigor:** Insufficient statistical analysis or inappropriate use of statistical tests.
* **Human Factors:**
* **Time constraints:** Researchers may lack the time or resources to properly document their work.
* **Incentives:** The current academic system may not adequately reward reproducible research.
* **Lack of awareness:** Researchers may not be fully aware of the importance of reproducibility or best practices.
**3. Solutions and Best Practices for Improving Reproducibility:**
* **Code Sharing and Version Control:**
* **Public repositories:** Using platforms like GitHub, GitLab, or Bitbucket to share code.
* **Version control:** Employing Git to track changes and manage different versions of the code.
* **Clear documentation:** Providing comprehensive README files and comments within the code.
* **Data Management:**
* **Public datasets:** Utilizing publicly available datasets whenever possible.
* **Data sharing platforms:** Using platforms like Kaggle or Zenodo to share datasets.
* **Data provenance:** Documenting the origin, processing, and versioning of the data.
* **Data anonymization:** Ensuring the privacy and security of sensitive data.
* **Computational Environment Management:**
* **Containerization:** Using Docker or Singularity to create reproducible environments.
* **Virtual environments:** Using tools like conda or venv to manage dependencies.
* **Configuration files:** Storing all necessary parameters and settings in configuration files.
* **Methodological Transparency:**
* **Detailed descriptions:** Providing comprehensive descriptions of the experimental setup, hyperparameters, and evaluation metrics.
* **Open access publications:** Publishing research in open access journals or repositories.
* **Pre-registration:** Registering research plans before conducting experiments.
* **Statistical rigor:** Employing appropriate statistical methods and reporting confidence intervals.
* **Community Efforts and Tools:**
* **Reproducibility checklists:** Using checklists to ensure that all necessary steps are taken.
* **Reproducibility badges:** Awarding badges to papers that meet certain reproducibility criteria.
* **Reproducibility workshops and tutorials:** Providing training and resources to researchers.
* **Open-source tools and libraries:** Developing and using tools that facilitate reproducible research.
**4. Call to Action:**
* **Encourage researchers to adopt best practices.**
* **Promote the use of open-source tools and platforms.**
* **Advocate for changes in the academic system to incentivize reproducible research.**
* **Raise awareness about the importance of reproducibility in AI.**
**Presentation Structure Suggestions:**
* **Start with a compelling example** of a reproducibility failure in AI research.
* **Clearly define the problem** and its implications.
* **Present the challenges** in a structured and understandable way.
* **Offer practical solutions** and best practices.
* **Conclude with a call to action** and a positive outlook for the future of reproducible AI research.
**Visual Aids:**
* **Use clear and concise slides.**
* **Include diagrams and illustrations to explain complex concepts.**
* **Show examples of good and bad code documentation.**
* **Use charts and graphs to visualize data and results.**
By addressing these key areas, your presentation will provide a comprehensive overview of the challenges and opportunities surrounding the reproducibility of AI-based research, ultimately contributing to a more robust and trustworthy field. Remember to tailor your presentation to your specific audience and their level of understanding.
Claude, queried on 2024-12-16, see lars_query_claude.py and lars-prompt1.txt.
Claude, queried on 2024-12-16, see lars_query_claude.py and lars-prompt1.txt.
Source: Red Warning PNG Clipart, CC-BY.
Results computed on Nov 26, 2023, based on a scan of replication packages conducted by Sebastian Kranz. 2023. “Economic Articles with Data”. https://ejd.econ.mathematik.uni-ulm.de/, searching for the words main, master, makefile, dockerfile, apptainer, singularity in any of the program files in those replication packages. Code not yet integrated into this presentation.
Formally, this is true for operating systems as well, and in some cases, the operating system and the programming language interact (for instance, in Python).
Claude, queried on 2024-12-16, see lars_query_claude.py and lars-prompt1.txt.
OpenAI, queried on 2024-12-14, see lars_query.py and lars-prompt1.txt.
Gemini, queried on 2024-12-15, see lars_query_gemini.py and lars-prompt1.txt.