The goal of this project is to demonstrate the feasibility of creating replicable blog posts for national statistical agencies. We pick a single blog post from the United States Census Bureau, but the general principle could be applied to many countries’ national statistical agencies.
The post we replicate is a blog post by Jim Lawrence, U.S. Census Bureau [2016] (archived version, locally archived version).
Data to produce a graph like this can be found at https://www.census.gov/ces/dataproducts/bds/data_firm.html. Users can look at the economy-wide data by age of the firm, where startups are firms with zero age:
We will illustrate how to generate Figure 1 using R [2019]. Users who prefer JavaScript, SAS, Excel, or Python can achieve the same goal with the tool of their choice. Note that we use the full CSV file at http://www2.census.gov/ces/bds/firm/bds_f_age_release.csv, but users might also want to consult the BDS API.
bdsbase <- "http://www2.census.gov/ces/bds/"
type <- "f_age"
ltype <- "firm"
# for economy-wide data
ewtype <- "f_all"
fafile <- paste("bds_",type,"_release.csv",sep="")
ewfile <- paste("bds_",ewtype,"_release.csv",sep="")
# this changes whether we read live data or Zenodo data
bds.from.source <- TRUE
We are going to read in two files: the economy-wide file bds_f_all_release.csv and the by-firm-age file bds_f_age_release.csv:
# we need the particular type
if ( bds.from.source ) {
  # bdsbase already ends in "/", so join with paste0 to avoid a double slash
  conr <- gzcon(url(paste0(bdsbase,ltype,"/",fafile)))
  txt <- readLines(conr)
  bdstype <- read.csv(textConnection(txt))
  # the ew file
  ewcon <- gzcon(url(paste0(bdsbase,ltype,"/",ewfile)))
  ewtxt <- readLines(ewcon)
  bdsew <- read.csv(textConnection(ewtxt))
}
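The read pattern above (pull the file into memory as lines, then parse those lines as CSV) can be tried without network access. The sketch below is only an illustration: the helper name read_csv_lines and the sample text are made up for this example and are not BDS data.

```r
# Hypothetical helper: parse a character vector of CSV lines into a data frame.
# This mirrors the readLines() + textConnection() + read.csv() steps above.
read_csv_lines <- function(lines) {
  con <- textConnection(lines)
  on.exit(close(con))  # release the text connection when done
  read.csv(con)
}

# Made-up sample text standing in for a downloaded file (not real BDS values).
sample_lines <- c(
  "year2,fage4,Job_Creation",
  "2004,a) 0,10",
  "2004,b) 1,5"
)
sample_df <- read_csv_lines(sample_lines)
str(sample_df)
```

Wrapping the steps in a function is a design choice of this sketch; the same helper could be applied to the output of readLines() on either connection.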
We now compute the fraction of total U.S. employment (Emp) that is accounted for by job creation from startups (Job_Creation if fage4="a) 0"), expressed as a percentage:
analysis <- bdsew[,c("year2","emp")]
analysis <- merge(x = analysis, y=subset(bdstype,fage4=="a) 0")[,c("year2","Job_Creation")], by="year2")
analysis$JCR_startups <- analysis$Job_Creation * 100 / analysis$emp
# properly name everything
names(analysis) <- c("Year","Employment","Job Creation by Startups", "Job Creation Rate by Startups")
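To see what the merge and the rate calculation do, here is a minimal sketch on made-up numbers. The two toy data frames (toy_ew, toy_age) and their values are invented for illustration and are not BDS figures; only the column names follow the code above.

```r
# Toy stand-ins for bdsew and bdstype (values are invented, not BDS data).
toy_ew <- data.frame(year2 = c(2004, 2005), emp = c(1000, 2000))
toy_age <- data.frame(
  year2 = c(2004, 2004, 2005, 2005),
  fage4 = c("a) 0", "b) 1", "a) 0", "b) 1"),
  Job_Creation = c(30, 10, 50, 20)
)

# Keep only startups (age zero) and join to economy-wide employment by year.
toy_startups <- subset(toy_age, fage4 == "a) 0")[, c("year2", "Job_Creation")]
toy <- merge(x = toy_ew, y = toy_startups, by = "year2")

# Startup job creation as a percentage of total employment.
toy$JCR_startups <- toy$Job_Creation * 100 / toy$emp
print(toy)
```

With these toy numbers the rate is 3 percent for 2004 (30 of 1000) and 2.5 percent for 2005 (50 of 2000).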
Now we simply plot this for the time period 2004-2014:
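A minimal sketch of this plotting step follows. It is not the post's own plotting code: the reference list cites ggplot2 and ggthemes, whereas this sketch uses base R graphics to stay dependency-free. The function name plot_startup_rate and the placeholder data frame are assumptions for illustration, and the values are invented, not BDS figures.

```r
# Hypothetical helper: restrict to a year range and draw the rate as a line.
# Expects the column names assigned to `analysis` above.
plot_startup_rate <- function(df, from = 2004, to = 2014) {
  sel <- df[df$Year >= from & df$Year <= to, ]
  sel <- sel[order(sel$Year), ]
  plot(sel$Year, sel[["Job Creation Rate by Startups"]],
       type = "b",
       xlab = "Year",
       ylab = "Job creation by startups (% of employment)")
  invisible(sel)  # return the plotted rows for inspection
}

# Placeholder data with the same column names (invented values).
demo <- data.frame(Year = 2002:2015, check.names = FALSE)
demo[["Job Creation Rate by Startups"]] <- seq(3, 2, length.out = nrow(demo))

pdf(NULL)  # null device so the sketch also runs non-interactively
shown <- plot_startup_rate(demo)
dev.off()
```

With the real data, the call would be plot_startup_rate(analysis), without the pdf(NULL) and dev.off() wrapper in an interactive session.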
Allaire, J., Xie, Y., McPherson, J., et al. 2019. rmarkdown: Dynamic documents for R.
Arnold, J.B. 2018. ggthemes: Extra themes, scales and geoms for ’ggplot2’.
Lawrence, J. 2016. How Much Do Startups Impact Employment Growth in the U.S.? http://researchmatters.blogs.census.gov/2016/12/01/how-much-do-startups-impact-employment-growth-in-the-u-s/.
R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Wickham, H. 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Xie, Y. 2019. knitr: A general-purpose package for dynamic report generation in R.