Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows.

abstract

MOTIVATION: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies include removing the affected markers or samples, or imputing the missing data. On small marker sets, imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.
RESULTS: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K and 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.
AVAILABILITY: A free, open-source software package, NPUTE, is available for non-commercial use at http://compgen.unc.edu/software.
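
The KNN vote mentioned in the abstract can be illustrated with a minimal brute-force sketch. This is not the data structure the paper describes, and it is not taken from NPUTE. The function names, the use of "N" as the missing-call symbol, and the mismatch-count distance are all assumptions made for illustration. For one missing call, the sketch ranks the other haplotypes by how many known sites differ inside a window centered on that site, then takes a majority vote among the K closest.

```python
from collections import Counter

MISSING = "N"  # assumed symbol for a missing call (illustrative, not from the paper)


def window_distance(a, b, lo, hi):
    """Count mismatches between haplotypes a and b over sites [lo, hi).

    Sites where either haplotype has a missing call are skipped.
    """
    return sum(
        1
        for i in range(lo, hi)
        if a[i] != MISSING and b[i] != MISSING and a[i] != b[i]
    )


def impute_knn(haplotypes, sample, site, half_window, k):
    """Impute haplotypes[sample][site] by a vote of the k nearest haplotypes.

    Brute-force sketch: every other haplotype with a known call at `site`
    is compared over the window [site - half_window, site + half_window].
    """
    target = haplotypes[sample]
    lo = max(0, site - half_window)
    hi = min(len(target), site + half_window + 1)

    # Rank candidate neighbors by their distance to the target in the window.
    candidates = []
    for j, other in enumerate(haplotypes):
        if j == sample or other[site] == MISSING:
            continue
        candidates.append((window_distance(target, other, lo, hi), j))
    candidates.sort()

    # Majority vote among the k closest neighbors.
    votes = Counter(haplotypes[j][site] for _, j in candidates[:k])
    return votes.most_common(1)[0][0] if votes else MISSING


if __name__ == "__main__":
    panel = ["ACGTA", "ACGTA", "TGCAT", "ACNTA"]
    # Sample 3 has a missing call at site 2; its two nearest neighbors carry G.
    print(impute_knn(panel, sample=3, site=2, half_window=2, k=2))
```

Each query in this sketch scans every haplotype over the full window, so repeating it for every missing call and every window size becomes expensive on large panels. That cost is the practical limitation the abstract raises for the plain KNN approach.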

published proceedings

Bioinformatics

altmetric score

3.5

author list (cited authors)

Roberts, A., McMillan, L., Wang, W., Parker, J., Rusyn, I., & Threadgill, D.

citation count

71

complete list of authors

Roberts, Adam; McMillan, Leonard; Wang, Wei; Parker, Joel; Rusyn, Ivan; Threadgill, David

publication date

July 2007

publisher

Oxford University Press (OUP)

published in

Bioinformatics

keywords

Algorithms
Artifacts
Chromosome Mapping
DNA Mutational Analysis
Genetic Variation
Pattern Recognition, Automated
Polymorphism, Single Nucleotide
Sensitivity And Specificity
Sequence Alignment
Sequence Analysis, DNA

PubMed ID (PMID)

17646323

Digital Object Identifier (DOI)

10.1093/bioinformatics/btm220

start page

i401

end page

i407

volume

23

issue

13

URL

http://dx.doi.org/10.1093/bioinformatics/btm220
