library(dplyr)
library(gt) # For tables
library(ggplot2)
library(readsdr)
library(tidyr)
This tutorial shows a workflow that integrates System Dynamics (SD) software and R using the readsdr package. This workflow combines the strengths of both tools: SD software for model building and R for data science.
In particular, I employ the SIR model with demography (see page 26 in Keeling and Rohani (2011)) to demonstrate the functionalities of the package. This model is useful for understanding the longer-term persistence and endemic dynamics of an infectious disease.
The model is built in Stella and can be downloaded here.
The dynamics of this system (Eq (2.1)) are governed by three compartments:
\(S_t\): The fraction of susceptible individuals at time t. These are individuals who can contract the disease.
\(I_t\): The fraction of infectious individuals at time t. These individuals can transmit the disease to others.
\(R_t\) : The fraction of recovered or immune individuals at time t. These individuals have recovered from the disease and are permanently immune.
\[\begin{equation} \begin{aligned} \dot{S}_t &= \nu -\beta S_t I_t - \mu S_t \\ \dot{I}_t &= \beta S_t I_t - \gamma I_t - \mu I_t \\ \dot{R}_t &= \gamma I_t - \mu R_t \\ \end{aligned} \tag{2.1} \end{equation}\]
In this model, \(\nu\) and \(\mu\) denote birth and death rates, respectively. These rates are configured identically (\(\nu = \mu\)) to ensure the population size remains constant over time. Furthermore, \(\beta\) corresponds to the effective contact rate and \(\gamma\) to the recovery rate. These parameters also determine the basic reproduction number (\(\Re_0 = \frac{\beta}{\mu +\gamma}\)).
Finally, compartment \(C_t\) (Eq (2.2)) is an accounting stock that keeps track of the number of cases over the simulation period, where \(N\) denotes the population size.
\[\begin{equation} \dot{C} = N \beta S_t I_t \tag{2.2} \end{equation}\]
After building the model, we need to convert it to a format compatible with R.
The function read_xmile
from the readsdr package accomplishes this task by
translating the XMILE code in the .stmx file into a list (mdl
) of various
objects.
model_path <- "https://jandraor.github.io/models/SIR_demography.stmx"
mdl <- read_xmile(model_path)
The sd_stocks
function helps us verify a model’s stocks and their initial
values. It outputs a data frame that can be used for further analysis. For
instance, we can use this data frame to create an HTML table showing the initial
state of the system. In this example, the table reveals that at the start of the
simulation, 10% of the population is susceptible to the disease and almost
90% are immune.
sd_stocks(mdl) |> gt()
name | init_value |
---|---|
S | 0.1000 |
I | 0.0001 |
R | 0.8999 |
C | 0.0000 |
Similarly, we can use the sd_constants
function to verify parameter values.
sd_constants(mdl) |> gt()
name | value |
---|---|
par_gamma | 1.000000e+00 |
par_beta | 1.000000e+01 |
par_nu | 2.747253e-04 |
par_mu | 2.747253e-04 |
N | 1.000000e+06 |
Of course, the purpose of building a model is to simulate it and understand its
dynamic behaviour. In R, the deSolve package is commonly employed to simulate
Ordinary Differential Equation (ODE) models. See Duggan (2016) for further
details. The sd_simulate
function conveniently simplifies this process by
acting as a wrapper. To use it, we need the deSolve_components
object in mdl
list.
sim_result <- sd_simulate(mdl$deSolve_components, start_time = 0,
stop_time = 1000, timestep = 1/64,
integ_method = "euler")
We can use the tidyverse to tidy and plot the results. See Duggan (2018) for further details.
tidy_df <- sim_result |>
select(time, S, I, R, C) |>
pivot_longer(-time) |>
mutate(name = factor(name, levels = c("S", "I", "R", "C")))
ggplot(tidy_df, aes(time, value)) +
geom_line(colour = "steelblue") +
facet_wrap(~name, scales = "free") +
labs(x = "Week", y = "Value") +
theme_classic() +
theme(axis.line = element_line(colour = "grey65"),
axis.text = element_text(colour = "grey65"))
This plot shows that the cumulative number of cases (\(C\)) grows over time. This
reflects the ongoing occurrence of new cases (incidence). While this model
captures the instantaneous incidence rate (denoted by \(\dot{C}\)), it is often
the case that we more interested in the periodic incidence rate, especially when
fitting a model to
incidence data. This periodic rate corresponds to the number of cases that
occur over a specific time span (e.g., 1 day, 1 week). In our model, the weekly
incidence rate at time \(t\) can be defined as \(C_t - C_{t-1}\). The function
sd_net_change
facilitates this calculation.
weekly_incidence <- sd_net_change(sim_result, "C")
ggplot(weekly_incidence, aes(time, value)) +
geom_line(colour = "steelblue") +
labs(y = "Cases per week", x = "Week") +
theme_classic() +
theme(axis.line = element_line(colour = "grey65"),
axis.text = element_text(colour = "grey65"))
Certainly, the main advantage of simulation models is their ability to test
different assumptions. For instance, we can evaluate the impact of a lower
contact rate on the incidence dynamics . To do so, we read the model again and
override the default value using the argument const_list
in the read_xmile
function. We can also override initial values with stock_list
. The blue line
in the graph indicates the simulation with the new, lower contact rate, whereas
the grey line represents the simulation with the original contact rate.
mdl2 <- read_xmile(model_path, const_list = list(par_beta = 8))
sim_alt1 <- sd_simulate(mdl2$deSolve_components, start_time = 0,
stop_time = 1000, timestep = 1/64,
integ_method = "euler")
wkl_inc2 <- sd_net_change(sim_alt1, "C")
ggplot(wkl_inc2, aes(time, value)) +
geom_line(colour = "steelblue") +
geom_line(data = weekly_incidence, colour = "grey50") +
labs(y = "Cases per week", x = "Week") +
theme_classic() +
theme(axis.line = element_line(colour = "grey65"),
axis.text = element_text(colour = "grey65"))
In practice, we often need to test more than one value and assess the impact on
multiple variables. The sd_sensitivity_run
function streamlines this process
by allowing us to perform sensitivity analysis efficiently. In this case, we
explore two levels of initial immunity and two contact rate values. This
combination leads to four different scenarios for analysis. The function can
handle either a data frame containing modified constants (using const_list
argument) or a data frame with initial value changes (using stock_list
argument). If you provide both, they must have the same number of rows. For
simplicity, we illustrate the dynamics using the instantaneous incidence
rate.
c_df <- data.frame(par_beta = c(8, 8, 10, 10))
s_df <- data.frame(S = c(0.1, 0.2, 0.1, 0.2),
I = 1e-4) |>
mutate(R = S - I)
bind_cols(c_df, s_df) |>
mutate(Scenario = 1:4, .before = everything()) |> gt()
Scenario | par_beta | S | I | R |
---|---|---|---|---|
1 | 8 | 0.1 | 1e-04 | 0.0999 |
2 | 8 | 0.2 | 1e-04 | 0.1999 |
3 | 10 | 0.1 | 1e-04 | 0.0999 |
4 | 10 | 0.2 | 1e-04 | 0.1999 |
sens_run <- sd_sensitivity_run(mdl2$deSolve_components,
consts_df = c_df,
stocks_df = s_df,
start_time = 0,
stop_time = 1000,
integ_method = "euler",
timestep = 1/128)
ggplot(sens_run, aes(time, C_in, group = as.factor(iter))) +
geom_line(colour = "steelblue") +
facet_wrap(~iter, ncol = 1, scales = "free_y") +
labs(y = "Instantaneous incidence rate", x = "Week") +
theme_classic() +
theme(axis.line = element_line(colour = "grey65"),
axis.text = element_text(colour = "grey65"))
Finally, we can finally test the effect of a simple intervention: reducing the
effective contact rate to zero for 12 weeks, starting at week 65. The function
sd_what_if_from_time
facilitates this exploration.
# Intervention
itv_df <- sd_what_if_from_time(65,
up_to_time = 77,
par_list = list(par_beta = 0),
ds_inputs = mdl$deSolve_components,
start_time = 0,
stop_time = 1000,
timestep = 1/128,
integ_method = "euler")
In this plot, the blue lines indicate the intervention scenario, whereas the grey lines denote the scenario without the intervention.
tidy_itv_df <- itv_df |> select(time, par_beta, C_in) |>
rename('Effective contact rate' = par_beta,
'Instantaneous incidence rate' = C_in) |>
pivot_longer(-time)
ref_sim <- sim_result |> select(time, par_beta, C_in) |>
rename('Effective contact rate' = par_beta,
'Instantaneous incidence rate' = C_in) |>
pivot_longer(-time)
ggplot(tidy_itv_df, aes(time, value)) +
geom_line(colour = "steelblue", alpha = 0.5) +
geom_line(data = ref_sim, colour = "grey60",
alpha = 0.5) +
facet_wrap(~name, scales = "free_y", ncol = 1) +
labs(y = "Value", x = "Week") +
theme_classic()
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Dublin
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.3.0 readsdr_0.3.0.9000 ggplot2_3.4.4 gt_0.9.0
## [5] dplyr_1.1.2
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.4 jsonlite_1.8.7 highr_0.10 compiler_4.3.1
## [5] tidyselect_1.2.0 stringr_1.5.0 xml2_1.3.5 jquerylib_0.1.4
## [9] png_0.1-8 scales_1.3.0 yaml_2.3.7 fastmap_1.1.1
## [13] R6_2.5.1 labeling_0.4.3 generics_0.1.3 curl_5.0.2
## [17] knitr_1.43 tibble_3.2.1 bookdown_0.35 munsell_0.5.0
## [21] bslib_0.5.1 pillar_1.9.0 rlang_1.1.2 utf8_1.2.4
## [25] stringi_1.7.12 cachem_1.0.8 deSolve_1.36 xfun_0.40
## [29] sass_0.4.7 cli_3.6.1 withr_2.5.2 magrittr_2.0.3
## [33] digest_0.6.33 grid_4.3.1 rstudioapi_0.15.0 lifecycle_1.0.4
## [37] vctrs_0.6.5 evaluate_0.21 glue_1.6.2 farver_2.1.1
## [41] fansi_1.0.6 colorspace_2.1-0 purrr_1.0.2 rmarkdown_2.24
## [45] tools_4.3.1 pkgconfig_2.0.3 htmltools_0.5.6