library(readxl)temp <- tempfile()temp2 <- tempfile()download.file("https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2017/01/nhefs_excel.zip", temp)unzip(zipfile = temp, exdir = temp2)data <- read_xls(file.path(temp2, "NHEFS.xls"))unlink(c(temp, temp2))
library(tidyverse)library(magrittr)# define/recode variables data %<>% mutate( # censoring cens = if_else(is.na(wt82_71), 1, 0), # categorize the school variable (as in R code by Joy Shi and Sean McGrath) education = cut(school, breaks = c(0, 8, 11, 12, 15, 20), include.lowest = TRUE, labels = c('1. 8th Grage or Less', '2. HS Dropout', '3. HS', '4. College Dropout', '5. College or More')), active = factor(active), exercise = factor(exercise), survtime = if_else(death == 0, 120, (yrdth-83)*12 + modth) # yrdth ranges from 83 to 92 ) %>% filter(!is.na(education))
Agnostic about the shape of the hazard when all covariates in the model have value of zero–often referred to as the baseline hazard
Assume restrictions on the relation between the baseline hazard and the hazard under other combinations of covariate values
Discrete time HR
\(\frac{Pr[D_{k+1}=1 |D_k=0, A=1]}{Pr[D_{k+1}=1|D_k=0, A=0]}\) is \(exp(\alpha_1\)) at all times \(k+1\) in the hazards model \(Pr[D_{k+1}=1|D_k=0, A] = Pr[D_{k+1}|D_k=0, A=0] exp(\alpha_1A)\)
\(log(Pr[D_{k+1}=1|D_k=0, A])=\alpha_{0,k} + \alpha_1k\), where \(Pr[D_{k+1}=1|D_k=0, A=0]=\alpha_{0,k}\)
When \(k + 1\) is small ➡️ \(Pr[D_{k+1=1}|D_k=0, A=0]≈0\), the he hazard is ≈ to the odds
Odds: \(\frac{Pr[D_{k+1}=1|D_k=0, A ]}{Pr[D_{k+1}=0|D_k=0, A]}\)
\(log(\frac{Pr[D_{k+1}=1|D_k=0, A ]}{Pr[D_{k+1}=0|D_k=0, A]})\) = \(logit(Pr[D_{k+1}=1|D_k=0, A])\) = \(\alpha_{0,k} + \alpha_1k\)
If the hazard is close to zero at \(k + 1\), one can approximate the log hazard ratio by a coefficient from a logistic model
ipw_num <- glm(qsmk ~ 1, data = data, family = binomial(link = "logit"))# denominator (propensity score model for the treatment)ipw_denom <- glm(qsmk ~ sex + race + age + I(age*age) + as.factor(education) + smokeintensity + I(smokeintensity*smokeintensity) + smokeyrs + I(smokeyrs*smokeyrs) + as.factor(exercise) + as.factor(active) + wt71 + I(wt71*wt71), data = data, family = binomial(link = "logit"))data %<>% mutate( # predicted probability of the treatment p_x = predict(ipw_num, type = "response"), p_x = case_when( qsmk == 1 ~ p_x, is.na(qsmk) ~ NA_real_, TRUE ~ 1 - p_x ), # predicted probability of the treatment given confounders p_x_l = predict(ipw_denom, type = "response"), p_x_l = case_when( qsmk == 1 ~ p_x_l, is.na(qsmk) ~ NA_real_, TRUE ~ 1-p_x_l ), # stabilized IPTW iptw_stab = p_x / p_x_l)
The IP weighted model estimates the time-varying hazards that would have been observed if all individuals in the study population had been treated (a = 1) and the time-varying hazards if they had been untreated (a = 0)
Though the survival curve under treatment was lower than the curve under no treatment for most of the follow-up, the maximum difference never exceeded −1.4% with a 95% confidence interval from −3.4% to 0.7%.
Little evidence of an effect of smoking cessation on mortality at any time during the follow-up.
# expand original data set until the last observed month for everyonemonths_gform <- data %>% # using observed data and observed censoring times uncount(weights = survtime, .remove = F) %>% group_by(seqn) %>% mutate ( time = row_number() - 1, event = case_when( time == survtime -1 & death == 1 ~ 1, TRUE ~ 0 ), timesq = time*time ) %>% ungroup() %>% select(seqn, qsmk, time, timesq, age, sex, race, education, smokeintensity, smkintensity82_71, smokeyrs, exercise, active, wt71, event)# fitting Q-model (model for the outcome) with confounders and outcome predictors for better precision in observation-month dataq_model <- glm(event == 0 ~ qsmk + I(qsmk*time) + I(qsmk*timesq) + time + timesq + sex + race + age + I(age*age) + as.factor(education) + smokeintensity + I(smokeintensity*smokeintensity) + smkintensity82_71 + smokeyrs + I(smokeyrs*smokeyrs) + as.factor(exercise) + as.factor(active) + wt71 + I(wt71*wt71), data = months_gform, family = binomial(link = "logit"))
After adjustment for the covariates L via standardization, we found little evidence of an effect of smoking cessation on mortality at any time during the follow-up. Note that the survival curves estimated via IP weighting and the parametric g-formula are similar but not identical because they rely on different parametric assumptions.
The IP weighted estimates require no misspecification of a model for treatment and a model for the unconditional hazards.
The parametric g-formula estimates require no misspecification of a model for the conditional hazards.
Structural nested log-linear model to model the ratio of cumulative incidences (i.e., risks) under different treatment levels
Accelerated failure time (AFT) model
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |