Mark van der Laan (Division of Biostatistics, UC Berkeley)

Targeted Maximum Likelihood Estimation of Causal Effects

Friday, 26 March 2010, 9:00 - 9:55

Meeting room, espace Turing


Current statistical practice for assessing the effect of an intervention or exposure on an outcome of interest often involves either maximum likelihood estimation in a parametric regression model specified a priori, or manual and/or data-adaptive tuning of the choice of parametric model. In both cases, bias in the point estimates and in the estimated signal-to-noise ratio is rampant, causing an epidemic of false claims based on data analyses.

In this talk we present targeted maximum likelihood estimation of target parameters of the data-generating distribution in realistic semiparametric models, thereby relying only on realistic assumptions. The targeted maximum likelihood estimator is semiparametric efficient and is also collaboratively double robust with respect to misspecification of the nuisance parameters of the data-generating distribution, which makes it less biased than other proposed semiparametric estimators. Two fundamental concepts underlie this methodology: super learning, i.e., the very aggressive use of cross-validation to select optimal combinations of many model fits; and subsequent targeted maximum likelihood estimation along a hardest parametric submodel through the super learning fit, which targets that fit towards the causal effect or target parameter of interest.
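To make the targeting step more concrete, here is a minimal sketch, under stated assumptions, of the textbook point-treatment TMLE for the average treatment effect with a binary treatment A, binary outcome Y and covariates W. It is not the collaborative, time-to-event or censoring-aware estimator discussed in the talk. Assumptions: the initial fits are plain main-terms logistic regressions standing in for a super learner; the helper names `fit_logistic` and `tmle_ate` are hypothetical; the bounds (0.01, 0.99) on the estimated treatment probability are an illustrative choice; and the simulated data are invented for illustration only. The fluctuation uses the usual "clever covariate" H(A, W) = A/g(W) - (1 - A)/(1 - g(W)), so that logit Qbar + eps * H plays the role of the parametric submodel through the initial fit.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=50):
    """Hypothetical helper: logistic regression (optional fixed offset) fitted by Newton-Raphson."""
    off = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(off + X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def tmle_ate(W, A, Y, g_bounds=(0.01, 0.99)):
    """Hypothetical helper: TMLE of E[Y(1)] - E[Y(0)] for binary A and binary Y."""
    n = len(Y)
    one = np.ones(n)
    # Step 1: initial estimate of Qbar(A, W) = P(Y = 1 | A, W).
    # Assumption: a main-terms logistic model stands in for a super learner.
    bQ = fit_logistic(np.column_stack([one, A, W]), Y)
    Q_A = expit(np.column_stack([one, A, W]) @ bQ)
    Q_1 = expit(np.column_stack([one, one, W]) @ bQ)
    Q_0 = expit(np.column_stack([one, np.zeros(n), W]) @ bQ)
    # Step 2: treatment mechanism g(W) = P(A = 1 | W), bounded away from 0 and 1.
    Xg = np.column_stack([one, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), *g_bounds)
    # Step 3: clever covariate defining the submodel logit Qbar + eps * H.
    H_A = A / g - (1 - A) / (1 - g)
    H_1, H_0 = 1.0 / g, -1.0 / (1 - g)
    # Step 4: fit eps by logistic regression of Y on H_A, offset logit(Q_A), no intercept.
    eps = fit_logistic(H_A[:, None], Y, offset=logit(Q_A))[0]
    Q_As = expit(logit(Q_A) + eps * H_A)
    Q_1s = expit(logit(Q_1) + eps * H_1)
    Q_0s = expit(logit(Q_0) + eps * H_0)
    # Step 5: plug-in estimate from the targeted fit and influence-curve-based SE.
    psi = np.mean(Q_1s - Q_0s)
    ic = H_A * (Y - Q_As) + (Q_1s - Q_0s) - psi
    return psi, np.std(ic, ddof=1) / np.sqrt(n)


# Usage on simulated (invented) data with confounding of A by W.
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W))
psi, se = tmle_ate(W, A, Y)
print(f"ATE estimate {psi:.3f} (SE {se:.3f})")
```

The design choice worth noting is that the initial fit is not re-estimated from scratch: only the single fluctuation parameter eps is fitted, and the resulting update solves the efficient influence curve equation, which is what moves the plug-in estimate towards the target parameter.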

We illustrate this method in observational studies and randomized controlled trials: for assessing the causal effect of an intervention on a time-to-event outcome while dealing with confounding of treatment and (possibly informative) right-censoring; for discovering mutations in the HIV virus that cause resistance to a particular drug regimen; and for assessing the effects of single nucleotide polymorphisms on an outcome of interest.