Specifying and Implementing Nonparametric and Semiparametric Survival Estimators in Two-Stage (Nested) Cohort Studies with Missing Case Data

Steven D. Mark and Hormuzd A. Katki
Journal of the American Statistical Association
Vol. 101, No. 474 (Jun., 2006), pp. 460-471
Stable URL: http://www.jstor.org/stable/27590709
Page Count: 12

Abstract

Since 1986, we have been studying a cohort of individuals from a region in China with epidemic rates of gastric cardia cancer and have conducted numerous two-stage studies to assess the association of various exposures with this cancer. Two-stage studies are a commonly used statistical design. Stage one involves observing the outcomes and accessible baseline covariate information on all cohort members, and stage two involves using the stage one observations to select a subset of the cohort for measurements of exposures that are difficult to obtain. When the outcomes are censored failure times, such as in our studies, the most common designs used are the case-cohort and nested case-control designs. One limitation of both these designs is that the estimators of the cumulative hazards, and hence survivals and absolute risks, are biased when some cases are missing the stage two measurements. In our experience, such missingness is present in virtually all two-stage studies that (like ours) use biological specimens to obtain exposure measurements. In earlier work we derived and characterized the efficiency of a class of nonparametric and a class of semiparametric cumulative hazard estimators that are unbiased regardless of whether or not all cases are measured. In this article we limit the presentation of the mathematical derivation of these two classes to aspects important to study design and analysis. We analyze data from a two-stage study that we conducted on the association of "Helicobacter pylori" infection with incident gastric cardia cancers. We discuss the substantive reasons why we deliberately sampled only 25% of the available cancer cases. Through simulations, we demonstrate that substantial variation in precision exists between unbiased estimators within each class, and express the origin of these differences in terms of parameters familiar to investigators. 
We describe how preexistent knowledge about these parameters can be used to increase estimator precision, and detail specific strategies for constructing such estimators. Computer code in R that implements these estimators is available from the authors on request.
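
As a rough illustration of the two-stage idea described in the abstract, the sketch below computes a Nelson-Aalen cumulative hazard from stage-two subjects only, weighting each subject by the inverse of an assumed known sampling probability. This generic inverse-probability weighting is an assumption made for illustration; it is not taken from the article, and it is not the authors' estimator class or their R code. The function name `weighted_nelson_aalen` and the toy data are hypothetical.

```python
def weighted_nelson_aalen(times, events, weights):
    """Weighted Nelson-Aalen cumulative hazard (illustrative sketch only).

    times   : follow-up time for each stage-two subject
    events  : 1 if the subject failed at that time, 0 if censored
    weights : inverse of the (assumed known) stage-two sampling probability

    Returns a list of (event_time, cumulative_hazard) pairs.
    """
    # Distinct times at which at least one sampled subject failed.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    cumulative = 0.0
    result = []
    for t in event_times:
        # Weighted number of failures at time t.
        failures = sum(w for ti, e, w in zip(times, events, weights)
                       if e == 1 and ti == t)
        # Weighted number of subjects still at risk just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        cumulative += failures / at_risk
        result.append((t, cumulative))
    return result


if __name__ == "__main__":
    # Hypothetical stage-two sample: cases sampled with probability 0.25
    # (weight 4), non-cases with probability 0.5 (weight 2).
    times = [2.0, 5.0, 5.0, 8.0]
    events = [1, 0, 1, 0]
    weights = [4, 2, 4, 2]
    for t, h in weighted_nelson_aalen(times, events, weights):
        print(f"t = {t}: cumulative hazard = {h:.3f}")
```

With all weights equal to 1, the function reduces to the ordinary Nelson-Aalen estimator on a fully observed cohort. The weights here stand in for whatever stage-two selection probabilities a given design implies.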
