Wesley S. Burr$^\star$ and Glen Takahara$^\dagger$
$\star$ Trent University, Peterborough, ON, Canada; $\dagger$ Queen's University, Kingston, ON, Canada
The traditional approach in the field uses a log-linear Generalized Additive Model to obtain the estimated pollution log-relative-rate \(\beta\) for pollutant \(\mathbf{x}\),
\[\begin{split} \log (\mu) &= \beta \mathbf{x} + \gamma_1 \text{DOW} + S_1 (\text{time},7/\text{year}) \\ &+ S_2(\text{temp},3) + \cdots \end{split} \]
with various confounding terms such as mean daily temperature, day-of-week (DOW, as a factor), and time. In R, using the gam package, we might estimate such a model via
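library(gam)  # also attaches splines, which provides ns()
# df = 7 is the total for the time smooth; for multi-year data, 7/year means 7 * (number of years)
mod <- gam(y ~ x + dow + ns(time, df = 7) + ns(temp, df = 3), family = poisson,
           data = my_data)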
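To make the role of \(\beta\) concrete, the following is a minimal sketch on simulated data, not the authors' analysis. It swaps in base R's `glm` with natural splines from the `splines` package in place of `gam`, and every value in it (two years of daily counts, `beta_true = 0.05`, the seasonal shapes) is an assumption chosen purely for illustration.

```r
library(splines)  # ns() natural spline basis

set.seed(3)
n_days <- 730                                   # two simulated years of daily data
time   <- seq_len(n_days)
temp   <- 10 + 12 * sin(2 * pi * time / 365) + rnorm(n_days, sd = 3)
dow    <- factor((time - 1) %% 7)               # day-of-week as a factor
x      <- rnorm(n_days)                         # simulated, standardized pollutant

beta_true <- 0.05                               # illustrative log-relative-rate
mu <- exp(3 + beta_true * x + 0.1 * sin(2 * pi * time / 365) + 0.01 * (temp - 10))
y  <- rpois(n_days, mu)
my_data <- data.frame(y, x, dow, time, temp)

# 7 df/year over two years gives df = 14 for the smooth function of time
fit <- glm(y ~ x + dow + ns(time, df = 14) + ns(temp, df = 3),
           family = poisson, data = my_data)

beta_hat <- coef(fit)["x"]                      # estimated log-relative-rate
se_beta  <- sqrt(vcov(fit)["x", "x"])           # its standard error
rr       <- exp(beta_hat)                       # relative rate per unit increase in x
```

Exponentiating `beta_hat` gives the multiplicative change in expected daily counts per unit increase in the pollutant, which is how such estimates are usually reported.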
A second common approach to modeling the population health impacts of air pollution uses distributed-lag or distributed-lag non-linear models to estimate association over multiple lags.
Simple distributed lag model: \[ y_t = a + w_0 x_t + w_1 x_{t-1} + \cdots + w_n x_{t-n} + \epsilon_t. \]
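The simple distributed lag model above can be fit by ordinary least squares once the lagged copies of \(x_t\) are laid out as columns. The sketch below uses only base R on simulated data; the maximum lag, the weights in `w_true`, and the intercept are hypothetical values chosen for illustration, and this unconstrained fit is not the constrained or non-linear variant used in practice.

```r
set.seed(1)
n_obs  <- 500
n_lag  <- 3                               # hypothetical maximum lag n
w_true <- c(0.5, 0.3, 0.15, 0.05)         # illustrative weights w_0, ..., w_3
x <- rnorm(n_obs)

# embed() places x_t in column 1, x_{t-1} in column 2, ..., x_{t-n} in column n+1;
# the first n_lag time points are dropped because their lags are unavailable
X <- embed(x, n_lag + 1)
colnames(X) <- paste0("lag", 0:n_lag)

# Simulate y_t = a + sum_k w_k x_{t-k} + eps_t with a = 2
y <- as.vector(2 + X %*% w_true + rnorm(nrow(X), sd = 0.1))

fit   <- lm(y ~ X)
w_hat <- coef(fit)[-1]                    # estimated lag weights (intercept removed)
```

With real pollutant series the lagged columns are strongly correlated, which is one reason constrained versions of this model are commonly preferred over the unconstrained fit sketched here.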
Given a time series \(x_t\), indexed by time:
Given two series \(x_t, y_t\), we can also estimate the cross-spectral density, which again has two components: the amplitude (spectrum) and phase.
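As a rough illustration of the two cross-spectral components, the sketch below estimates squared coherence and phase for two simulated series that share a sinusoid. It swaps in a smoothed-periodogram estimate from base R's `spec.pgram`, which is not necessarily the estimator used for the results here; the frequency `f0`, the noise level, and the smoothing spans are all assumptions.

```r
set.seed(2)
n_obs <- 1024
t_idx <- seq_len(n_obs)
f0    <- 1 / 8                            # cycles per sample; a Fourier frequency for n_obs = 1024

# Both series contain the same line component; y is shifted by a quarter cycle
x <- cos(2 * pi * f0 * t_idx) + rnorm(n_obs, sd = 0.5)
y <- cos(2 * pi * f0 * t_idx - pi / 2) + rnorm(n_obs, sd = 0.5)

# Bivariate smoothed periodogram: returns squared coherence ($coh) and phase ($phase)
cs <- spec.pgram(cbind(x, y), spans = c(9, 9), taper = 0.1, plot = FALSE)

k        <- which.min(abs(cs$freq - f0))  # frequency bin closest to the shared line
coh_f0   <- cs$coh[k]                     # squared coherence at f0
phase_f0 <- cs$phase[k]                   # phase (radians) at f0
```

At the shared frequency the squared coherence should be close to 1 and the phase close to a quarter cycle in magnitude, while away from `f0` the two independent noise series give low coherence.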
A large portion of the association obtained by fitting the residual effective predictor and response (that is, after accounting for other confounders and the smooth function of time) is actually driven by coherence between short-timescale line components present in both the predictor and the response.
Instead of treating the association as a pointwise regression on discrete, day-sized chunks of data, consider a continuous relationship between the predictor (air pollutant) and the response (health effect). The data consists of:
for access to wonderful data.