Workshop reproducibility - Berlin

Author: Lars Vilhuber

Published: October 15, 2016

Session 3

Reproducibility when some data are confidential

Session 4

One of the following:

Results

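library(readr)
library(dplyr)
library(ggplot2)
library(knitr)

# gurl is assumed to hold the URL of the exported survey responses (defined outside this excerpt)
# make sure the local data directory exists, then download the data into it
dir.create(here::here("data"), showWarnings = FALSE)
download.file(gurl, destfile = here::here("data", "berlin-google-responses.csv"))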
# read file in
berlin_google_responses <- read_csv(here::here("data","berlin-google-responses.csv"))
Rows: 17 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Timestamp, Topic, What is the name of the instructor?
dbl (1): How many browser tabs do you have open?

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
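The download step above fetches the file again every time the document is rendered, so a later render could silently pick up a changed file. The following is a minimal sketch, not the workshop's own code: the helper `fetch_once()` and the object `response_types` are hypothetical names. It downloads only when no local copy exists, and it declares the column types that `read_csv()` reported above, which also silences the column-specification message.

```r
library(readr)

# Hypothetical helper: download `url` to `path` only if no local copy exists yet,
# so re-rendering reuses the same data instead of fetching it again.
fetch_once <- function(url, path) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  invisible(path)
}

# Column types copied from the specification reported by read_csv() above
response_types <- cols(
  Timestamp = col_character(),
  Topic = col_character(),
  `What is the name of the instructor?` = col_character(),
  `How many browser tabs do you have open?` = col_double()
)

# Usage, assuming `gurl` holds the survey URL as in the code above:
# path <- fetch_once(gurl, here::here("data", "berlin-google-responses.csv"))
# berlin_google_responses <- read_csv(path, col_types = response_types)
```

To force a fresh download, delete the local file. Keeping the decision explicit in this way makes it visible when the underlying data change.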
# tabulate responses by topic, with each topic's share of all responses (in percent)
data <- berlin_google_responses |>
  count(Topic, name = "Frequency") |>
  mutate(Percent = round(Frequency / sum(Frequency) * 100, 2))

data |> kable()
|Topic                                                 | Frequency| Percent|
|:-----------------------------------------------------|---------:|-------:|
|Ethics and Privacy in Data Dissemination (discussion) |         3|   17.65|
|In-depth Tutorial on Use of Containers                |         8|   47.06|
|Just-in-time Preservation of Research Data (hands-on) |         6|   35.29|
# bar chart of the number of votes per topic
ggplot(data, aes(y = Frequency, x = Topic)) +
  geom_col()

Guidance

Some additional guidance can be found on the website of the Social Science Data Editors (URLs subject to change):

Examples of replication packages

With confidential data

Using containers:

Extra info