Backcalculating Reporting Delays Distributions from Linelist Data

This R Markdown document walks through the steps for imputing a reporting delay distribution from linelist data with missing disease onset data, adapted in STAN from Li and White

There are two starting points for this vignette: either you have caseCount data, which are aggregated case counts by day, or you have Line-list data means you have a single row for each case, that has dates for: infection, symptom onset, positive test, and when this was reported to public health agencies.

Example: lineList data

library(WhiteLabRt)

Step 1. Load data

data("sample_report_dates")
data("sample_onset_dates")

Step 2. Creating linelist object

my_linelist <- create_linelist(report_dates = sample_report_dates,
                                onset_dates = sample_onset_dates)
#> Warning in create_linelist(report_dates = sample_report_dates, onset_dates =
#> sample_onset_dates): Some onset dates are NA
head(my_linelist)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-01         4 2019-12-28          0          1        1
#> 2  2020-01-01         4 2019-12-28          0          1        1
#> 3  2020-01-01        NA       <NA>          0          1        1
#> 4  2020-01-01        NA       <NA>          0          1        1
#> 5  2020-01-01        NA       <NA>          0          1        1
#> 6  2020-01-01         8 2019-12-24          0          1        1

Step 3. Define the serial interval. The si() function creates a vector of length 14 with shape and rate for a gamma distribution. Note, this has a leading 0 to indicate no infections on the day of disease onset.

sip <- si(14, 4.29, 1.18)
plot(sip, type = 'l')

Step 5. Run the back-calculation algorithm. The default is an R(t) sliding window of 7 days. Additional options to STAN can be specified in the last argument (e.g., chains, cores, control).

out_list_demo <- run_backnow(my_linelist, sip = sip, chains = 1)

Plot outputs. The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.

data("out_list_demo")
plot(out_list_demo, "est")

You can also plot the R(t) curve over time. In this case, the red line (and shaded confidence interval) represent the time-varying r(t). See Li and White for description.

data("out_list_demo")
plot(out_list_demo, "rt")

Example: Case Count data

You can also do the same from case count data, although at some point you will have to assume a reporting delay distribution, so this would be a little circular.

Step 1. Load data

data("sample_dates")
data("sample_location")
data("sample_cases")

head(sample_dates)
#> [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
#> [6] "2020-01-06"
head(sample_cases)
#> [1] 10  1  4 11  8 10
head(sample_location)
#> [1] "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine"

Step 2. Creating case-counts

caseCounts <- create_caseCounts(date_vec = sample_dates,
                                location_vec = sample_location,
                                cases_vec = sample_cases)
head(caseCounts)
#>         date cases location
#> 1 2020-01-01    10 Tatooine
#> 2 2020-01-02     1 Tatooine
#> 3 2020-01-03     4 Tatooine
#> 4 2020-01-04    11 Tatooine
#> 5 2020-01-05     8 Tatooine
#> 6 2020-01-06    10 Tatooine

Step 3. Convert to linelist data. You can specify the distribution for my_linelist in convert_to_linelist. reportF is the distribution function, _args lists the distribution params that are not x, and _missP is the percent missing. This must be between 0 < x < 1. Both ‘caseCounts’ and ‘caseCounts_line’ objects can be fed into run_backnow. The implied onset distribution is rnbinom() with size = 3 and mu = 9, with reportF_missP = 0.6 .

my_linelist <- convert_to_linelist(caseCounts, 
                                   reportF = rnbinom, 
                                   reportF_args = list(size = 3, mu = 9),
                                   reportF_missP = 0.6)
head(my_linelist)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-01        14 2019-12-18          0          1        1
#> 2  2020-01-01        NA       <NA>          0          1        1
#> 3  2020-01-01        NA       <NA>          0          1        1
#> 4  2020-01-01        NA       <NA>          0          1        1
#> 5  2020-01-01        NA       <NA>          0          1        1
#> 6  2020-01-01        NA       <NA>          0          1        1

Step 4. Define the serial interval. The si() function creates a vector of length 14 with alpha and beta as defined in Li and White, for COVID-19.

sip <- si(14, 4.29, 1.18)

Step 5. Run the back-calculation algorithm. The defaults are 2000 iterations and an R(t) sliding window of 7 days.

options(mc.cores = 4)

out_list_demo <- run_backnow(my_linelist, sip = sip)