Analysis of Multivariate Disease Classification Data in the Presence of Partially Missing Disease Traits


  • In modern cancer epidemiology, diseases are classified based on pathologic and molecular traits, and different combinations of these traits give rise to many disease subtypes. The effect of predictor variables can be measured by fitting a polytomous logistic model to such data. The differences (heterogeneity) among the relative risk parameters associated with subtypes are of great interest for better understanding disease etiology. Because of this heterogeneity, when a risk factor changes, the prevalence of one subtype may change more than that of another. Estimation of the heterogeneity parameters is difficult when disease trait information is only partially observed and the number of disease subtypes is large. We consider a robust semiparametric approach based on the pseudo-conditional likelihood for estimating these heterogeneity parameters. Through simulation studies, we compare the robustness and efficiency of our approach with those of the maximum likelihood approach. The method is then applied to analyze the associations of weight gain with risk of breast cancer subtypes using data from the American Cancer Society Cancer Prevention Study II Nutrition Cohort.
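
To make the notion of heterogeneity concrete, the following is a minimal sketch of a polytomous logistic model with a control category and two disease subtypes. It is not the authors' pseudo-conditional likelihood estimator and does not handle missing traits. The function name `subtype_probabilities`, the single covariate `x`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def subtype_probabilities(x, intercepts, slopes):
    """Return [P(control), P(subtype 1), ..., P(subtype K)] at covariate value x.

    Polytomous logistic model with the control group as reference:
    P(subtype k | x) = exp(a_k + b_k * x) / (1 + sum_j exp(a_j + b_j * x)).
    """
    scores = [math.exp(a + b * x) for a, b in zip(intercepts, slopes)]
    denom = 1.0 + sum(scores)
    return [1.0 / denom] + [s / denom for s in scores]


# Hypothetical parameters (assumed for illustration, not estimates from the study).
intercepts = [-3.0, -3.0]  # a_1, a_2
slopes = [0.1, 0.6]        # b_1, b_2: subtype-specific log odds ratios

p0 = subtype_probabilities(0.0, intercepts, slopes)
p1 = subtype_probabilities(1.0, intercepts, slopes)

# Odds ratio of each subtype versus control for a one-unit increase in x.
odds_ratios = [(p1[k] / p1[0]) / (p0[k] / p0[0]) for k in (1, 2)]

# Heterogeneity between the two subtypes on the log odds ratio scale.
heterogeneity = slopes[1] - slopes[0]

print("odds ratios vs control:", odds_ratios)
print("heterogeneity (b_2 - b_1):", heterogeneity)
```

Under these assumed values, each odds ratio equals the exponential of the corresponding slope, and the second subtype's probability rises more than the first's when `x` increases, which is the pattern that a nonzero heterogeneity parameter describes.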

published proceedings

  • J Biom Biostat

author list (cited authors)

  • Miao, J., Sinha, S., Wang, S., Diver, W. R., & Gapstur, S. M.

citation count

  • 0

complete list of authors

  • Miao, Jingang; Sinha, Samiran; Wang, Suojin; Diver, W Ryan; Gapstur, Susan M

publication date

  • January 2014