The goal of this project is to demonstrate the feasibility of creating replicable blog posts for national statistical agencies. We pick a single blog post from the United States Census Bureau, but the general principle could be applied to many countries’ national statistical agencies.

Source document

A blog post by Jim Lawrence, U.S. Census Bureau [2016] (archived version, locally archived version).

Source data

Data to produce a graph like the one in the blog post can be found at https://www.census.gov/ces/dataproducts/bds/data_firm.html. Users can look at the economy-wide data by age of the firm, where startups are firms with zero age:

[Image: Select Firm Age]

Getting and manipulating the data

We will illustrate how to generate Figure 1 using R [2019]. Users who prefer JavaScript, SAS, Excel, or Python can achieve the same goal with the tool of their choice. Note that we use the full CSV file at http://www2.census.gov/ces/bds/firm/bds_f_age_release.csv, but users might also want to consult the BDS API.

bdsbase <- "http://www2.census.gov/ces/bds"
type <- "f_age"
ltype <- "firm"
# for economy-wide data
ewtype <- "f_all"

fafile <- paste0("bds_", type, "_release.csv")
ewfile <- paste0("bds_", ewtype, "_release.csv")

# this changes whether we read live data or Zenodo data
bds.from.source <- TRUE
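
As a quick check of how these pieces combine, the sketch below assembles a download URL from the base path, the file family, and the table type. The helper name bds_url is hypothetical and not part of the original code; its base path is written without a trailing slash so that joining with "/" yields a single separator.

```r
# Hypothetical helper (not in the original post): build the URL of a BDS
# release file from its family ("firm") and table type ("f_age", "f_all").
bds_url <- function(ltype, type, base = "http://www2.census.gov/ces/bds") {
  file <- paste0("bds_", type, "_release.csv")
  paste(base, ltype, file, sep = "/")
}

bds_url("firm", "f_age")
# "http://www2.census.gov/ces/bds/firm/bds_f_age_release.csv"
```

The result matches the CSV address quoted in the text above.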

We are going to read in two files: the economy-wide file bds_f_all_release.csv, and the by-firm-age file bds_f_age_release.csv:

# the by-firm-age file
if ( bds.from.source ) {
  conr <- gzcon(url(paste(bdsbase,ltype,fafile,sep="/")))
  txt <- readLines(conr)
  bdstype <- read.csv(textConnection(txt))
  # the economy-wide file
  ewcon <- gzcon(url(paste(bdsbase,ltype,ewfile,sep="/")))
  ewtxt <- readLines(ewcon)
  bdsew <- read.csv(textConnection(ewtxt))
}
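
Before merging, it can be worth confirming that the downloaded tables contain the columns used in the next step. A minimal sketch, assuming a hypothetical helper require_columns that is not part of the original code:

```r
# Hypothetical helper: stop with a readable message if any expected column
# is missing from a data frame; otherwise return the data frame invisibly.
require_columns <- function(df, cols) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    stop("missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(df)
}
```

With the objects read above, this could be called as require_columns(bdsew, c("year2", "emp")) and require_columns(bdstype, c("year2", "fage4", "Job_Creation")).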

We now compute job creation by startups (Job_Creation where fage4 == "a) 0") as a percentage of total U.S. employment (emp):

analysis <- bdsew[,c("year2","emp")]
analysis <- merge(x = analysis, y=subset(bdstype,fage4=="a) 0")[,c("year2","Job_Creation")], by="year2")
analysis$JCR_startups <- analysis$Job_Creation * 100 / analysis$emp
# properly name everything
names(analysis) <- c("Year","Employment","Job Creation by Startups", "Job Creation Rate by Startups")
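
To make the arithmetic concrete, here is the same merge and rate calculation on a tiny toy table. The numbers are invented for illustration and are not BDS values; only the column names and the "a) 0" label are taken from the code above.

```r
# Toy inputs with made-up numbers (NOT actual BDS data)
toy_ew  <- data.frame(year2 = c(2004, 2005), emp = c(1000, 2000))
toy_age <- data.frame(year2 = c(2004, 2004, 2005, 2005),
                      fage4 = c("a) 0", "b) 1", "a) 0", "b) 1"),
                      Job_Creation = c(25, 40, 30, 50))

# Keep only startups (firm age zero), then attach them to total employment
toy <- merge(x = toy_ew,
             y = subset(toy_age, fage4 == "a) 0")[, c("year2", "Job_Creation")],
             by = "year2")

# Startup job creation as a percentage of employment
toy$JCR_startups <- toy$Job_Creation * 100 / toy$emp
toy$JCR_startups
# 2.5 1.5
```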

Create Figure 1

Now we simply plot this for the time period 2004-2014:
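
The plotting code is not shown in this post, so the following is only a sketch of one way to draw such a figure, not the author's code. It uses base R graphics to stay dependency-free (the references cite ggplot2 and ggthemes, which the original may well have used). The function name plot_startup_jcr, its arguments, and the axis labels are assumptions; the column names are those assigned by the names() call above.

```r
# Hypothetical helper (not from the original post): restrict the merged table
# to a range of years and draw the startup job creation rate as a line chart.
plot_startup_jcr <- function(analysis, from = 2004, to = 2014) {
  rate <- "Job Creation Rate by Startups"
  # Keep only the requested years and sort them for a left-to-right line
  keep <- analysis$Year >= from & analysis$Year <= to
  fig <- analysis[keep, c("Year", rate)]
  fig <- fig[order(fig$Year), ]
  plot(fig$Year, fig[[rate]], type = "b",
       xlab = "Year",
       ylab = "Job creation by startups (% of employment)",
       main = rate)
  # Return the plotted data so it can be inspected or reused
  invisible(fig)
}
```

With the table built above, plot_startup_jcr(analysis) would draw the 2004-2014 series.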

Compare to original image:

[Image: original figure from the blog post]

References

Allaire, J., Xie, Y., McPherson, J., et al. 2019. rmarkdown: Dynamic documents for R.

Arnold, J.B. 2018. ggthemes: Extra themes, scales and geoms for 'ggplot2'.

Lawrence, J. 2016. How Much Do Startups Impact Employment Growth in the U.S.? http://researchmatters.blogs.census.gov/2016/12/01/how-much-do-startups-impact-employment-growth-in-the-u-s/.

R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Wickham, H. 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.

Xie, Y. 2019. knitr: A general-purpose package for dynamic report generation in R.

Replication for: How Much Do Startups Impact Employment Growth in the U.S.?

Lars Vilhuber

December 1, 2016