This R Markdown document walks through the steps for imputing a reporting delay distribution from linelist data with missing disease onset data, adapted in STAN from Li and White
There are two starting points for this vignette: either you have caseCount data, which are aggregated case counts by day, or you have Line-list data means you have a single row for each case, that has dates for: infection, symptom onset, positive test, and when this was reported to public health agencies.
Step 1. Load data
Step 2. Creating linelist
object
my_linelist <- create_linelist(report_dates = sample_report_dates,
onset_dates = sample_onset_dates)
#> Warning in create_linelist(report_dates = sample_report_dates, onset_dates =
#> sample_onset_dates): Some onset dates are NA
head(my_linelist)
#> report_date delay_int onset_date is_weekend report_int week_int
#> 1 2020-01-01 4 2019-12-28 0 1 1
#> 2 2020-01-01 4 2019-12-28 0 1 1
#> 3 2020-01-01 NA <NA> 0 1 1
#> 4 2020-01-01 NA <NA> 0 1 1
#> 5 2020-01-01 NA <NA> 0 1 1
#> 6 2020-01-01 8 2019-12-24 0 1 1
Step 3. Define the serial interval. The
si()
function creates a vector of length 14 with shape and
rate for a gamma distribution. Note, this has a leading
0 to indicate no infections on the day of disease onset.
Step 5. Run the back-calculation algorithm. The default is an R(t) sliding window of 7 days. Additional options to STAN can be specified in the last argument (e.g., chains, cores, control).
Plot outputs. The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.
You can also plot the R(t)
curve over time. In this
case, the red line (and shaded confidence interval) represent the
time-varying r(t). See Li and White for description.
You can also do the same from case count data, although at some point you will have to assume a reporting delay distribution, so this would be a little circular.
Step 1. Load data
data("sample_dates")
data("sample_location")
data("sample_cases")
head(sample_dates)
#> [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
#> [6] "2020-01-06"
head(sample_cases)
#> [1] 10 1 4 11 8 10
head(sample_location)
#> [1] "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine"
Step 2. Creating case-counts
caseCounts <- create_caseCounts(date_vec = sample_dates,
location_vec = sample_location,
cases_vec = sample_cases)
head(caseCounts)
#> date cases location
#> 1 2020-01-01 10 Tatooine
#> 2 2020-01-02 1 Tatooine
#> 3 2020-01-03 4 Tatooine
#> 4 2020-01-04 11 Tatooine
#> 5 2020-01-05 8 Tatooine
#> 6 2020-01-06 10 Tatooine
Step 3. Convert to linelist data. You can specify
the distribution for my_linelist
in
convert_to_linelist
. reportF
is the
distribution function, _args
lists the distribution params
that are not x
, and _missP
is the percent
missing. This must be between 0 < x < 1. Both ‘caseCounts’
and ‘caseCounts_line’ objects can be fed into run_backnow
.
The implied onset distribution is rnbinom()
with
size = 3
and mu = 9
, with
reportF_missP = 0.6
.
my_linelist <- convert_to_linelist(caseCounts,
reportF = rnbinom,
reportF_args = list(size = 3, mu = 9),
reportF_missP = 0.6)
head(my_linelist)
#> report_date delay_int onset_date is_weekend report_int week_int
#> 1 2020-01-01 14 2019-12-18 0 1 1
#> 2 2020-01-01 NA <NA> 0 1 1
#> 3 2020-01-01 NA <NA> 0 1 1
#> 4 2020-01-01 NA <NA> 0 1 1
#> 5 2020-01-01 NA <NA> 0 1 1
#> 6 2020-01-01 NA <NA> 0 1 1
Step 4. Define the serial interval. The
si()
function creates a vector of length 14 with alpha and
beta as defined in Li and White, for COVID-19.
Step 5. Run the back-calculation algorithm. The defaults are 2000 iterations and an R(t) sliding window of 7 days.