Mapping genomic markers to closest feature using the R package Map2NCBI

abstract

  • The purpose of this technical note is to illustrate the use of functions available in the Map2NCBI R package using applied examples. Map2NCBI allows users to find genomic feature information for a particular species and to identify the feature in closest proximity to each marker of interest to the user. Although the package allows flexibility in how the functions are used, it was developed as a two-part process: (1) the "GetGeneList" function downloads the species genome build and filters the data as specified by the user, and (2) the "MapMarkers" function maps markers, based on the information provided, to the closest feature using the list produced in Step 1. Two examples were used to (1) demonstrate the system time used by both functions in Map2NCBI and (2) illustrate how the functions can be applied to a previous study to identify genes of interest using markers (e.g., associated markers). The system.time function available in R was used to time both functions. Markers were from the BovineSNP50 version 1 assay (n=34,566) and were randomly assigned to files containing different numbers of markers in order to identify the relationship between the number of markers and the time elapsed when running the "MapMarkers" function. In addition, genes and markers previously identified in a separate study were used in this note to identify markers and genes of interest using the Map2NCBI package. The "GetGeneList" function was relatively efficient, and its speed depended primarily on the internet connection. For the "MapMarkers" function, the time elapsed had a quadratic relationship with the number of markers in the file, and including all available markers took over an hour. This function does have the advantage that it needs to be run only once for a given set of markers, and the output can be saved for future use. In the second example, the functions provided 32 markers located in close proximity to 36 genes of interest and identified 60 unique genes based on 67 markers shown to be associated in a previous study of cannon bone length. These functions and the process involved are relatively quick and provide the user flexibility in using the information for association results or systematic approaches in understanding complex traits. © 2014 Elsevier B.V.
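
The closest-feature idea described in the abstract can be illustrated with a minimal sketch. This is not the Map2NCBI implementation and does not use its API; it is written in Python rather than R, and the function names (`distance_to_feature`, `map_markers_to_closest`), the tuple layouts, and the marker and gene records are all hypothetical. It assumes each feature has a chromosome, a start, and an end position, and that a marker falling inside a feature has distance zero.

```python
from collections import defaultdict


def distance_to_feature(pos, start, end):
    """Return 0 if pos lies within [start, end], else the gap to the nearer edge."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def map_markers_to_closest(markers, features):
    """Map each marker to its closest feature on the same chromosome.

    markers:  iterable of (marker_name, chromosome, position)
    features: iterable of (feature_name, chromosome, start, end)
    Returns {marker_name: (feature_name, distance)}, or None as the value
    when no feature exists on the marker's chromosome.
    """
    # Group features by chromosome so each marker is only compared
    # against features it could plausibly be near.
    by_chrom = defaultdict(list)
    for name, chrom, start, end in features:
        by_chrom[chrom].append((start, end, name))

    result = {}
    for marker, chrom, pos in markers:
        best = None
        # Simple linear scan over the chromosome's features.
        for start, end, name in by_chrom.get(chrom, []):
            d = distance_to_feature(pos, start, end)
            if best is None or d < best[1]:
                best = (name, d)
        result[marker] = best
    return result


# Hypothetical toy data, for illustration only.
features = [
    ("GENE_A", "1", 100, 200),
    ("GENE_B", "1", 500, 650),
    ("GENE_C", "2", 50, 80),
]
markers = [
    ("snp1", "1", 150),
    ("snp2", "1", 320),
    ("snp3", "2", 10),
    ("snp4", "3", 5),
]

closest = map_markers_to_closest(markers, features)
for marker, hit in closest.items():
    print(marker, hit)
```

Because every marker is compared against every feature on its chromosome, the work in this sketch grows with the product of the two counts. Whether the package follows the same strategy is not stated in the abstract.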

published proceedings

  • LIVESTOCK SCIENCE

author list (cited authors)

  • Hanna, L., & Riley, D. G.

citation count

  • 17

complete list of authors

  • Hanna, Lauren L Hulsman||Riley, David G

publication date

  • January 2014