Kravitz, Eli Samuel (2019-05). Scoring and Relative Risk Analysis in Nutrition and Physical Activity. Doctoral Dissertation.

abstract

  • This work presents three analyses of the NIH-AARP Study of Diet and Health. Each analysis develops recommendations for nutritional intake or physical behaviors, or alters existing recommendations. New statistical methodology for nonlinear and nonparametric regression is introduced. Each method yields consistent estimates of the relative risk of disease outcomes (cancer, mortality, etc.). Technical details and proofs are collected in separate appendices for each chapter. First, a major collaborative project to create a composite scoring system for physical activity is presented. A scoring system allows quick assessment of physical activity levels, which can then be used to estimate disease risk. This score, denoted the Physical Behavior Score (PBS), is verified to predict mortality using a subset of the NIH-AARP Study of Diet and Health withheld for validation. The Physical Behavior Score is highly predictive of mortality. Women in the highest quintile of scores had a 54% reduction in all-cause mortality risk, and men in the highest quintile had a 45% reduction in all-cause mortality risk. Next, the Healthy Eating Index is used as a case study to provide a general method for reevaluating composite scores. The Healthy Eating Index breaks nutritional intake into 12 components. A method is presented that can be used to reassess the relative importance of these components using a weighted logistic regression model applied across many populations and diseases. Variable selection is performed by taking an asymptotic approximation and adding an adaptive Lasso penalty. This approximation reduces variable selection to a least squares minimization. Oracle properties of this variable selection technique are established in this setting, which differs from the usual single-population, single-disease context. Finally, the problem from the first chapter, in which a physical activity score is created and then applied to the analysis of disease or mortality, is revisited. Sample splitting is used to partition the sample into two disjoint subsets, using the first subset to build the score and the remaining data to estimate the relative risk associated with this score. For parametric models, the limiting distribution of the risk estimates is derived. An obvious question is what happens if multiple sample splits are performed. It is shown that as the number of sample splits increases, the combination of multiple sample splits is effectively equivalent to performing no sample splits. This suggests there is no clear benefit to performing multiple splits.
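The sample-splitting idea described at the end of the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's method: the data are simulated, the score weights (a difference in component means between subjects without and with the event) are a toy stand-in for the actual score construction, and the crude top-versus-bottom quintile risk ratio is an assumed simplification of the relative risk models used in the thesis. All function names are hypothetical.

```python
import random

def split_sample(n, build_fraction=0.5, seed=0):
    """Partition indices 0..n-1 into two disjoint subsets (build, estimate)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * build_fraction)
    return idx[:cut], idx[cut:]

def build_score_weights(X, y, rows):
    """Toy score (an assumption, not the thesis method): weight each component
    by its mean among event-free subjects minus its mean among subjects with
    the event, computed on the build subset only."""
    weights = []
    for j in range(len(X[0])):
        with_event = [X[i][j] for i in rows if y[i] == 1]
        event_free = [X[i][j] for i in rows if y[i] == 0]
        weights.append(sum(event_free) / len(event_free)
                       - sum(with_event) / len(with_event))
    return weights

def relative_risk_top_vs_bottom(X, y, rows, weights, n_groups=5):
    """Crude risk ratio: event rate in the highest-scoring group divided by
    the event rate in the lowest-scoring group, on the estimation subset."""
    scored = sorted(
        (sum(w * x for w, x in zip(weights, X[i])), y[i]) for i in rows
    )
    k = len(scored) // n_groups
    risk_bottom = sum(event for _, event in scored[:k]) / k
    risk_top = sum(event for _, event in scored[-k:]) / k
    return risk_top / risk_bottom

# Simulated data (purely illustrative): three behavior components, the first
# two of which lower the event probability; the third is unrelated noise.
rng = random.Random(1)
n = 4000
X = [[rng.random() for _ in range(3)] for _ in range(n)]
y = [1 if rng.random() < 0.5 - 0.2 * x[0] - 0.2 * x[1] else 0 for x in X]

build_rows, estimate_rows = split_sample(n)
weights = build_score_weights(X, y, build_rows)
rr = relative_risk_top_vs_bottom(X, y, estimate_rows, weights)
print(f"relative risk, top vs bottom quintile of score: {rr:.2f}")
```

Because the weights are computed only from the build subset, the risk estimate on the held-out subset is not contaminated by the score-construction step, which is the motivation for splitting in the first place.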

publication date

  • August 2019
  • May 2019