Potential selection bias associated with using geocoded birth records for epidemiologic research.
PURPOSE: Geocoded birth registry data are increasingly used in environmental epidemiology research, and ungeocoded records are routinely excluded. METHODS: We used classification and regression tree analysis and logistic regression to investigate potential selection bias associated with this exclusion among all singleton Florida births in 2009 (n = 210,285). RESULTS: The rate of unsuccessful geocoding was 11.5% (n = 24,171) overall and ranged from 0% to 100% across zip codes. Living in a rural zip code was the strongest predictor of being ungeocoded. Other predictors of geocoding status varied with urbanity status. In urban areas, maternal race (adjusted odds ratio [aOR] ranging between 1.08 for Hispanic and 1.18 for black compared to white), maternal age [aOR: 1.16 (1.10-1.23) for ages 20-34 compared to <20], maternal nativity [aOR: 1.20 (1.15-1.25) for non-US versus US born], delivery at a birth center [aOR: 1.72 (1.49-2.00) compared to hospital delivery], multiparity [aOR: 0.91 (0.88-0.94)], maternal smoking [aOR: 0.82 (0.76-0.88)], and having nonprivate insurance [aOR: 1.25 (1.20-1.30) for Medicaid versus private insurance] were significantly associated with being ungeocoded. In rural areas, births delivered at a birth center [aOR: 2.91 (1.80-4.73)] or at home [aOR: 1.94 (1.28-2.95)] had increased odds of being ungeocoded compared to hospital births. The characteristics predictive of being ungeocoded were also significantly associated with adverse birth outcomes such as low birth weight and preterm delivery, and the association for maternal age differed depending on whether ungeocoded births were included or excluded. CONCLUSIONS: Geocoding status is not random. Women with certain exposure-outcome characteristics may be more likely to be ungeocoded and excluded, indicating potential selection bias.
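As a rough illustration of the quantities reported above, the following Python sketch recomputes the overall ungeocoded rate from the two counts given in the abstract and shows how an unadjusted odds ratio with a Wald 95% confidence interval can be derived from a 2x2 table. The function name `odds_ratio_with_ci` and the 2x2 counts are hypothetical and are not taken from the study; the aORs in the abstract come from multivariable logistic regression, which this simplified calculation does not reproduce.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and ungeocoded     b: exposed and geocoded
    c: unexposed and ungeocoded   d: unexposed and geocoded
    (Hypothetical helper for illustration; not the study's adjusted model.)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts reported in the abstract
total_births = 210_285
ungeocoded_births = 24_171
ungeocoded_rate = ungeocoded_births / total_births
print(f"Ungeocoded rate: {ungeocoded_rate:.1%}")

# Hypothetical 2x2 counts, invented purely to exercise the function
or_est, ci_low, ci_high = odds_ratio_with_ci(a=30, b=70, c=100, d=800)
print(f"Unadjusted OR: {or_est:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1 corresponds to an association that is statistically significant at roughly the 5% level, which is how the bracketed ranges after each aOR in the abstract can be read.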