Improving data capture of race and ethnicity for the Food and Drug Administration Sentinel database: a narrative review.


  • PURPOSE: The U.S. Food and Drug Administration's Sentinel System is a national medical product safety surveillance system consisting of a large multisite distributed database of administrative claims supplemented by electronic health-care record data. The program seeks to improve data capture of race and ethnicity for pharmacoepidemiology studies. METHODS: We conducted a narrative literature review of published research on data augmentation and imputation methods to improve race and ethnicity capture in U.S. health-care system databases. We focused on methods that use either limited patient identifiers (five-digit ZIP codes only) or full patient identifiers to link to external sources of self-reported data. We organized the literature by theme: (1) variation in data capture of self-reported data, (2) data augmentation from external sources of self-reported data, and (3) imputation methods, including Bayesian analysis and multiple regression. RESULTS: Researchers reduced data missingness with high validity for Asian, Black, White, and Pacific Islander racial groups and for Hispanic ethnicity. Native American and multiracial groups were difficult to validate because of relatively small sample sizes. CONCLUSIONS: The availability of self-reported data for validation will determine which methods can be used to improve race and ethnicity data capture. We recommend methods that leverage multiple sources and account for variation in geography, age, and sex.
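The abstract lists Bayesian analysis among the imputation methods, including for the case where only a five-digit ZIP code is available. As a minimal sketch, assuming a simple geography-only application of Bayes' rule (a common pattern in this literature, not necessarily any specific method the review evaluates), a patient's group probabilities could be computed as below. The function names, group labels, threshold, and all numbers are hypothetical and invented for illustration; they are not data from the review.

```python
from typing import Dict, Optional


def posterior_by_zip(
    prior: Dict[str, float],
    p_zip_given_group: Dict[str, float],
) -> Dict[str, float]:
    """Bayes' rule: P(group | ZIP) is proportional to P(ZIP | group) * P(group)."""
    unnormalized = {g: p * p_zip_given_group.get(g, 0.0) for g, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("ZIP code has zero probability under every group")
    # Normalize so the posterior probabilities sum to 1.
    return {g: v / total for g, v in unnormalized.items()}


def assign_group(posterior: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable group, or None (left missing) if it falls below threshold.

    The thresholding rule is a hypothetical design choice for this sketch.
    """
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None


# Hypothetical example values, invented for illustration (not data from the review).
# EXAMPLE_PRIOR: P(group), which could come from an age- and sex-specific reference table.
# EXAMPLE_P_ZIP_GIVEN_GROUP: P(ZIP | group), the share of each group's population in this ZIP.
EXAMPLE_PRIOR = {"group_a": 0.2, "group_b": 0.3, "group_c": 0.5}
EXAMPLE_P_ZIP_GIVEN_GROUP = {"group_a": 0.004, "group_b": 0.001, "group_c": 0.0005}

if __name__ == "__main__":
    post = posterior_by_zip(EXAMPLE_PRIOR, EXAMPLE_P_ZIP_GIVEN_GROUP)
    print({g: round(p, 3) for g, p in post.items()})
    print(assign_group(post, threshold=0.5))
```

Drawing the prior from an age- and sex-stratified table, rather than a single population-wide table, is one way such a sketch could reflect the abstract's recommendation to account for variation in geography, age, and sex. Leaving low-confidence records as missing, instead of forcing an assignment, keeps imputed values separate from uncertain guesses.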

published proceedings

  • Ann Epidemiol

author list (cited authors)

  • Ter-Minassian, M., DiNucci, A. J., Barrie, I. S., Schoeplein, R., Chakravarty, A., & Hernández-Muñoz, J. J.

complete list of authors

  • Ter-Minassian, Monica; DiNucci, Anna J; Barrie, Issmatu S; Schoeplein, Ryan; Chakravarty, Aloka; Hernández-Muñoz, José J

publication date

  • October 2023