Research Data Curation Practices in Institutional Repositories and Data Identifiers Thesis


  • The access and sharing of research data have been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact, as well as the efficiency and effectiveness, of scientific activities and funding. Access, however, is facilitated not just by appropriate policies but also by the employment of effective infrastructure mechanisms, including enhancing data with effective metadata (Simmhan, Plale, & Gannon, 2005). Identifiers are important metadata that traditionally have been used for entity identification, linking, and referencing in various domains (Altman & King, 2007). To enable effective metadata creation support for research data, it is essential to gain a better understanding of the current uses of identifier systems with research data. As many research institutions plan to provide some types of research data services (Tenopir, Birch, & Allard, 2012), it is important to study the current practices of data curation in institutional repositories (IRs). In particular, to develop effective data management infrastructure configuration templates, it is essential to understand user needs and related activities for data curation in IRs, including the different roles played by IR staff and role-specific differences in needs for skills and infrastructure support (Foster, Jennings, & Kesselman, 2004). Furthermore, it is important to investigate both the current practices of identifier use and the requirements for quality and functionalities for identifier schemas in order to design effective metadata support for research data curation in IRs. Studying the practices of research data curation requires multifaceted contextual analysis (Borgman, Wallis, & Enyedy, 2007). Hence, this study also required a research design that could help examine and capture the various sociotechnical and cultural factors that may affect data curation, including the selection and uses of identifier schemas for data.
The study used Activity Theory (Engeström, 1987; Leontiev, 1978) and the Information Quality Assessment Framework (Stvilia, Gasser, Twidale, & Smith, 2007) to guide the design of a protocol for semi-structured interviews. This study reports on data collected from fifteen participants from thirteen different universities in the US. The selection of participants was guided by two criteria. To be eligible for participation in the study, participants had to work for an IR that stored and curated research data objects and that was housed by one of the 108 institutions classified as RU/VH (very high research activity) in the Carnegie Classification of Institutions of Higher Education. The study identified data curation activities and contexts (i.e., tools, norms, rules, and division of labor), perceived roles played by IR staff (e.g., data curator, IR manager, and metadata specialist), role-specific sets of activities and skills, and perception of quality identifiers in IRs. The findings of this study can inform the development of best practices and effective infrastructure support for data curation in the context of IRs, as well as the teaching of data curation in LIS schools.

author list (cited authors)

  • Lee, D. J.

complete list of authors

  • Lee, DJ

editor list (cited editors)

  • Stvilia, B.