Predicting the Estrogen Receptor Activity of Environmental Chemicals by Single-Cell Image Analysis and Data-driven Modeling.
Comprehensively evaluating toxic chemicals and understanding their potential harm to human physiology are vital to mitigating adverse effects following exposure during environmental emergencies. In this work, we develop data-driven classification models that predict the estrogenic activity of environmental toxicants, classifying them as estrogen receptor (ER) agonists or antagonists, to facilitate rapid decision-making in such catastrophic events. By combining high-content analysis, big-data analytics, and machine learning algorithms, we demonstrate that highly accurate classifiers can be constructed for evaluating the estrogenic potential of many chemicals. We follow a rigorous, high-throughput, microscopy-based high-content analysis pipeline to measure the single-cell-level response to benchmark compounds with known in vivo effects on the ER pathway. The resulting high-dimensional dataset is then pre-processed by fitting a non-central gamma probability distribution function to the single-cell data for each feature, compound, and concentration. The characteristic parameters of the fitted distribution, which represent its mean and shape, are used as features for classification with Random Forest (RF) and Support Vector Machine (SVM) algorithms. The results show that the SVM classifier predicts the estrogenic potential of the benchmark chemicals with higher accuracy than the RF classifier, which misclassifies two antagonist compounds.
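The pre-processing step described above, reducing each single-cell distribution to a mean and a shape parameter, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ordinary two-parameter gamma distribution estimated by the method of moments instead of the non-central gamma fit named in the abstract, and the function names, the compound label, the feature name, and the synthetic data are all hypothetical.

```python
import random
import statistics
from collections import defaultdict


def gamma_moments(values):
    """Method-of-moments estimate for a two-parameter gamma distribution.

    Returns (mean, shape), where shape = mean**2 / variance.
    Assumption: a plain gamma stands in for the non-central gamma fit.
    """
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance
    if mean <= 0 or var <= 0:
        raise ValueError("gamma moments need a positive mean and variance")
    return mean, mean * mean / var


def summarize(cells):
    """Collapse single-cell measurements into per-condition descriptors.

    cells: iterable of (compound, concentration, feature, value) tuples.
    Returns {(compound, concentration): {feature: (mean, shape)}}.
    """
    groups = defaultdict(list)
    for compound, concentration, feature, value in cells:
        groups[(compound, concentration, feature)].append(value)

    summary = defaultdict(dict)
    for (compound, concentration, feature), values in groups.items():
        summary[(compound, concentration)][feature] = gamma_moments(values)
    return dict(summary)


# Synthetic single-cell values (hypothetical): gamma with shape 4, scale 2.
rng = random.Random(0)
cells = [
    ("compound_A", 1e-8, "nuclear_intensity", rng.gammavariate(4.0, 2.0))
    for _ in range(5000)
]

summary = summarize(cells)
mean, shape = summary[("compound_A", 1e-8)]["nuclear_intensity"]
```

Each (compound, concentration) entry then holds one (mean, shape) pair per image feature. Flattened into a vector, these pairs are the kind of input an RF or SVM classifier would take, with the known agonist or antagonist status of each benchmark compound as the label.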