EAGER: Collaborative Research: A Benchmark Data Linkage Repository (DLRep)
Abstract
This EAGER award provides pilot funding for a new kind of software repository that could help data scientists create better ways to combine existing data sets. The goal is to encourage collaboration among scientists who study data methodology, users and builders of integrated data sets, and data custodians. The result may be new high-quality methods for record linkage and better use of existing data. Common, transparent, and reproducible approaches to linkage methodology and privacy preservation will help data analysts meet the highest scientific standards. The project may narrow the gap between research and practice in data science, resulting in better evidence for U.S. policymakers, businesses, and citizens.

The PIs will establish a common benchmarking repository of linkage methodologies, the Data Link Repository (DLRep), at the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan. Participants in the repository will be able to add comments and ask questions, allowing for engagement between data custodians, providers, producers, holders, and data users, as well as among various groups of data users. In addition, data users will be able to upload and share code snippets related to the data, allowing for knowledge sharing and better reproducibility. Data users will also be able to link related publications and citations via DOI import or by manually entering citation information, which will give data providers and other data users feedback about how the data have been used. The repository will accelerate the development of new record linkage algorithms and evaluation methods by providing access to both methods and relevant data. This community effort will improve the reproducibility of analyses conducted on integrated data. The repository will facilitate comparisons of alternative linkage methodologies applied to the same and to different data, and will advance the provision of privacy-aware integrated data.