Valdez, Jr., Daniel (2018-08). Bias in Public Health Research: Ethical Implications and Objective Assessment Tools. Doctoral Dissertation. Thesis

abstract

  • In an environment where one article is published every 20 seconds, we cannot be certain that all studies are held to the same high standard of quality. Thus, there is growing speculation that much of what is published today may contain embedded biases that detract from the quality of science. Though aware of bias in research, we are ill-equipped to identify, address, and mitigate bias in published literature. Therefore, the purpose of this dissertation is to (1) explore the complexity and salience of bias in published work via two domains, bias in numeric data (numeric bias) and bias embedded in language patterns (language bias), and (2) test technological tools intended to detect bias more objectively, namely the Cochrane Institute's GRADEPro and topic modeling.
    Numeric bias was defined as bias within numeric data and was detected via the Cochrane Institute's GRADEPro software. To demonstrate the effectiveness of GRADEPro as a valid tool for detecting numeric bias, this study used a heuristic example drawn from currently published manuscripts on Pre-Exposure Prophylaxis (PrEP). Findings indicated, primarily, that evidence quality varied, ranging from Very High to Very Low. Further, the efficacy of the medication also varied across studies.
    Language bias was defined as bias within written language and was identified more objectively via topic modeling. To demonstrate the effectiveness of topic modeling, I compared corpora of text data across three bias-inducing variables: time, funding source, and nation of origin. For each corpus, language patterns varied across the bias-inducing variables, suggesting, among other considerations, that bias-inducing variables influence the direction of language even when studies test the same hypothesis.
    Overall, this dissertation sought to present tools from outside Public Health that could more objectively identify problematic issues within numeric and language data. For both types of bias, language and numeric, bias was identified and distilled in a more efficient and effective manner. Therefore, issues such as recurrent bias in Public Health should be addressed with the tools presented here, as well as potentially others, in the continued effort to uphold the integrity of science.

publication date

  • August 2018