Title: | Novel Methods for Reproduction Number Estimation, Back-Calculation, and Forecasting |
---|---|
Description: | A collection of functions related to novel methods for estimating R(t), created by the lab of Professor Laura White. Currently implemented methods include two-step Bayesian back-calculation and now-casting for line-list data with missing reporting delays, adapted in 'STAN' from Li (2021) <doi:10.1371/journal.pcbi.1009210>, and calculation of time-varying reproduction number assuming a flux between various adjacent states, adapted into 'STAN' from Zhou (2021) <doi:10.1371/journal.pcbi.1010434>. |
Authors: | Chad Milando [aut, cre], Tenglong Li [ctb], Zhenwei Zhou [ctb], Laura White [ctb] |
Maintainer: | Chad Milando <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.01 |
Built: | 2025-01-13 04:52:34 UTC |
Source: | https://github.com/cmilando/whitelabrt |
A collection of functions related to novel methods for estimating reproduction number, R(t), created by the lab of Professor Laura White at Boston University School of Public Health.
Currently implemented methods include (1) Temporal R(t) estimation: Two-step Bayesian back and nowcasting for linelist data with missing reporting delays, adapted in 'STAN' from Li et. al. 2021, and (2) Spatial R(t) estimation: Calculating time-varying reproduction number, R(t), assuming a flux of infectors between various adjacent states, in 'STAN' from Zhou et. al. 2021.
Maintainer: Chad Milando [email protected]
Other contributors:
Tenglong Li [contributor]
Zhenwei Zhou [contributor]
Laura White [contributor]
Stan Development Team (NA). RStan: the R interface to Stan. R package version 2.32.6. https://mc-stan.org
This function takes a data frame of case counts and expands it into a line list format, which is often used for epidemiological analysis. The function validates input data, manages missingness, and assumes additional generation times based on the specified reporting function.
convert_to_linelist( caseCounts, reportF_missP, reportF = NULL, reportF_args = NULL )
convert_to_linelist( caseCounts, reportF_missP, reportF = NULL, reportF_args = NULL )
caseCounts |
A data frame with columns |
reportF_missP |
A numeric probability between 0 and 1 (exclusive) indicating the proportion of missing onset dates. It throws an error if it is out of bounds or not numeric. |
reportF |
A function used to simulate the delay from case reporting to case onset.
Defaults to a negative binomial distribution function ( |
reportF_args |
A list of additional arguments to pass to |
The function stops and sends error messages for various data integrity issues,
such as incorrect data types, negative cases, or missing required columns.
It also assumes that the input data is for only one location and handles
NA generation according to reportF_missP
.
A data frame in line list format, where each row corresponds to a case report.
The data frame includes columns for the report date, the delay from report to onset,
the onset date, weekend indicator, report interval in days from the first report,
and week interval.
The returned data frame has additional attributes set, including min_day
and the
class lineList
.
data("sample_dates") data("sample_location") data("sample_cases") case_Counts <- create_caseCounts(sample_dates, sample_location, sample_cases) line_list <- convert_to_linelist(case_Counts, reportF_missP = 0.5)
data("sample_dates") data("sample_location") data("sample_cases") case_Counts <- create_caseCounts(sample_dates, sample_location, sample_cases) line_list <- convert_to_linelist(case_Counts, reportF_missP = 0.5)
This function constructs a data frame from vectors representing dates, locations, and case numbers, ensuring that all input vectors meet specific data integrity requirements. It checks for the correct data types, non-negative case numbers, and uniformity in vector lengths. The function also ensures no missing values are present and that all data pertain to a single location.
create_caseCounts(date_vec, location_vec, cases_vec)
create_caseCounts(date_vec, location_vec, cases_vec)
date_vec |
A vector of dates corresponding to case reports; must be of type Date. |
location_vec |
A character vector representing the location of the case reports; all entries must refer to the same location. |
cases_vec |
A numeric vector representing the number of cases reported on each date; values must be non-negative integers. |
The function performs several checks to ensure the integrity of the input:
- It verifies that all vectors have the same length.
- It confirms that there are no negative numbers in cases_vec
.
- It checks for and disallows any missing values in the data frame.
It throws errors if any of these conditions are not met, indicating that
the input vectors are not appropriately formatted or contain invalid data.
A data frame named caseCounts
with columns date
, cases
, and location
.
Each row corresponds to a unique report of cases on a given date at a specified location.
The data frame is assigned a class attribute of caseCounts
.
data("sample_dates") data("sample_location") data("sample_cases") case_Counts = create_caseCounts(sample_dates, sample_location, sample_cases)
data("sample_dates") data("sample_location") data("sample_cases") case_Counts = create_caseCounts(sample_dates, sample_location, sample_cases)
This function constructs a line list data frame using vectors of report and onset dates.
create_linelist(report_dates, onset_dates)
create_linelist(report_dates, onset_dates)
report_dates |
A vector of dates representing when cases were reported; must be of type Date. |
onset_dates |
A vector of dates representing when symptoms onset occurred; must be of type Date. This vector can contain NA values, but not exclusively or none at all. |
The function ensures the following:
- The length of report_dates
and onset_dates
must be equal.
- There should be no NA values in report_dates
.
- onset_dates
must contain some but not all NA values.
- Each non-NA onset date must be earlier than or equal to its
corresponding report date.
If any of these conditions are violated, the function will
stop with an error message.
Additionally, the function calculates the delay in days between
onset and report dates,
identifies weekends, and calculates reporting and week intervals
based on the earliest date.
A data frame with the following columns:
report_dates, delay_int, onset_dates, is_weekend, report_int, and week_int.
This data frame is ordered by report_dates and assigned a class
attribute of lineList
.
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates)
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates)
an example backnow
output object from run_backnow()
out_list_demo
out_list_demo
An backnow object
list
This function plots estimates of case numbers or reproduction numbers (r(t)
) based on the
provided object. It can handle two types of plots: 'est' for estimated case numbers over time,
and 'rt' for estimated reproduction numbers over time.
## S3 method for class 'backnow' plot(x, plottype, ...)
## S3 method for class 'backnow' plot(x, plottype, ...)
x |
An object containing the necessary data for plotting. This object should have
specific structure depending on the |
plottype |
A character string specifying the type of plot to generate. Valid options are 'est' for case estimates and 'rt' for reproduction numbers. |
... |
Additional arguments passed to the plot function. |
Depending on the plottype
:
- 'est': Plots the reported cases over time with a polygon representing the
uncertainty interval and a line showing the central estimate.
- 'rt': Plots the reproduction number over time with a similar style.
a plot object for an object of class backnow
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates) sip <- si(14, 4.29, 1.18) results <- run_backnow( line_list, sip = sip, chains = 1) plot(results, "est")
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates) sip <- si(14, 4.29, 1.18) results <- run_backnow( line_list, sip = sip, chains = 1) plot(results, "est")
This function plots the number of cases over time from a data frame object. If the data frame contains multiple locations, a specific location must be specified. The plot displays the total number of cases against dates and annotates one of the earliest points with the location name.
## S3 method for class 'caseCounts' plot(x, loc = NULL, ...)
## S3 method for class 'caseCounts' plot(x, loc = NULL, ...)
x |
A data frame containing the case counts with at least two columns: |
loc |
An optional string specifying the location to filter the case counts by. If |
... |
Additional arguments passed to the |
If the location
column is present in x
and contains multiple unique values,
the loc
parameter must be specified to indicate which location's data to plot.
The function adds a text annotation to the plot, labeling one of the earliest points
with the specified location's name.
a plot object for an object of class caseCounts
data("sample_dates") data("sample_location") data("sample_cases") case_Counts = create_caseCounts(sample_dates, sample_location, sample_cases) plot(case_Counts)
data("sample_dates") data("sample_location") data("sample_cases") case_Counts = create_caseCounts(sample_dates, sample_location, sample_cases) plot(case_Counts)
This function performs a back-calculation based on provided epidemic case count data, estimating the time distribution of infections and reproduction numbers (r(t)). It utilizes extensive input checks and parameter validation to ensure robust model execution.
run_backnow( input, sip, NB_maxdelay = as.integer(20), window_size = as.integer(7), ... )
run_backnow( input, sip, NB_maxdelay = as.integer(20), window_size = as.integer(7), ... )
input |
A 'lineList' data.frame from |
sip |
Vector of numeric values specifying the serial interval probabilities. |
NB_maxdelay |
Integer, the maximum delay for the negative binomial distribution used in modeling. |
window_size |
Integer, the number of days of the R(t) averaging window. |
... |
Additional arguments passed to rstan::sampling() |
The function ensures input data is of the correct class and processes it accordingly.
It handles different input classes by either converting caseCounts
to lineList
or
directly using lineList
. The function stops with an error if the input doesn't meet expected standards.
It performs simulations to estimate both the back-calculation of initial infections and reproduction numbers
over time, while checking and adjusting for potential NA values and ensuring that all conditions for the
model parameters are met. Output includes estimates of initial infections and reproduction numbers along
with diagnostic statistics.
an object of class backnow
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates) sip <- si(14, 4.29, 1.18) results <- run_backnow( line_list, sip = sip, chains = 1)
data("sample_onset_dates") data("sample_report_dates") line_list <- create_linelist(sample_report_dates, sample_onset_dates) sip <- si(14, 4.29, 1.18) results <- run_backnow( line_list, sip = sip, chains = 1)
Sample of aggregated case counts from a single location.
sample_cases
sample_cases
A vector of length 80.
Numeric
Sample of case report dates from a single location.
sample_dates
sample_dates
A vector of length 80.
Dates of reported aggregated cases
An vector of a single location to accompany sample_dates
and sample_cases
.
This is to emphasize that linelist functions are for a single location
sample_location
sample_location
A vector of length 80.
Character
an example rstan
output object from spatialrt()
sample_m_hier
sample_m_hier
An rstan object
list
A matrix of daily aggregated cases from two sites (Hoth and Tatooine).
sample_multi_site
sample_multi_site
A data.table data frame with 80 rows and 2 variables:
Number of aggregated cases for Site 1
Number of aggregated cases for Site 2
Line-list data, onset dates, with some missing.
sample_onset_dates
sample_onset_dates
A vector of length 6380, one value per case.
Date of onset
Line-list data, case report dates.
sample_report_dates
sample_report_dates
A vector of length 6380, one value per case.
Date of report
This function computes the probability distribution function (PDF) of the serial interval using a gamma distribution with specified shape and rate parameters. The serial interval is defined as the time between successive cases in a chain of transmission. This implementation generates a discrete PDF at the daily level.
si(ndays, shape, rate, leading0 = TRUE)
si(ndays, shape, rate, leading0 = TRUE)
ndays |
Integer, the number of days over which to calculate the serial interval distribution. |
shape |
Numeric, the shape parameter of the gamma distribution. |
rate |
Numeric, the rate parameter of the gamma distribution. |
leading0 |
Logical, should a leading 0 be added to indicate t0? |
The function uses the pgamma
function to calculate cumulative
probabilities for each day up to ndays
and then differences these
to get daily probabilities. The resulting probabilities are normalized to
sum to 1, ensuring that they represent a valid probability distribution.
Numeric vector representing the serial interval probabilities
for each of the first ndays
days. The probabilities are normalized
so that their sum is 1.
si(ndays = 14, shape = 4.29, rate = 1.18)
si(ndays = 14, shape = 4.29, rate = 1.18)
This function calculates R(t) that arises from transfer of infectors between different states. There are different flavors of the model, but the base version calculates a weekly R(t) within each state.
spatialRt(report_dates, case_matrix, transfer_matrix, sip, v2 = FALSE, ...)
spatialRt(report_dates, case_matrix, transfer_matrix, sip, v2 = FALSE, ...)
report_dates |
A vector of reporting dates |
case_matrix |
A matrix of cases, defined by integers |
transfer_matrix |
A matrix that defines how infectors flow between states. Each row of the transfer matrix must sum to 1. |
sip |
Vector of numeric values specifying the serial interval probabilities. |
v2 |
a flag indicating FALSE if the base algorithm is to be used, and TRUE if the experimental algorithm is desired. The experimental version contains a non-centered parameterization, an AR1 process, and partial pooling across states. |
... |
Additional arguments passed to rstan::sampling() |
An rstan object.
data("sample_multi_site") data("transfer_matrix") Y <- matrix(integer(1), nrow = nrow(sample_multi_site), ncol = 2) for(i in 1:nrow(Y)) { for(j in c(2, 3)) { Y[i,j-1] <- as.integer(sample_multi_site[i,j]) } } all(is.integer(Y)) sip <- si(14, 4.29, 1.18, leading0 = FALSE) sample_m_hier <- spatialRt(report_dates = sample_multi_site$date, case_matrix = Y, transfer_matrix = transfer_matrix, v2 = FALSE, sip = sip, chains = 1)
data("sample_multi_site") data("transfer_matrix") Y <- matrix(integer(1), nrow = nrow(sample_multi_site), ncol = 2) for(i in 1:nrow(Y)) { for(j in c(2, 3)) { Y[i,j-1] <- as.integer(sample_multi_site[i,j]) } } all(is.integer(Y)) sip <- si(14, 4.29, 1.18, leading0 = FALSE) sample_m_hier <- spatialRt(report_dates = sample_multi_site$date, case_matrix = Y, transfer_matrix = transfer_matrix, v2 = FALSE, sip = sip, chains = 1)
A matrix of daily transfers between two sites (Hoth and Tatooine). Each row sums to 1.
transfer_matrix
transfer_matrix
A data.table data frame with 160 rows and 2 variables:
Fraction of transfer to Site 1
Fraction of transfer to Site 2