A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data.


  • BACKGROUND: Recent advances in array CGH (aCGH) research have significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two of the major challenges in developing aCGH sample clustering methods are the high spatial correlation between aCGH markers and low computational efficiency. A mixture hidden Markov model-based algorithm was developed to address these two challenges. RESULTS: The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented, and real data analysis on glioma aCGH data showed that it converges to the optimal cluster rapidly and that the computation time is proportional to the sample size. Simulation results showed that this HMM-based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for glioma data were significantly associated with clinical outcomes. CONCLUSIONS: We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The software for HMMC in both R and C++ is available on the ND INBRE website: http://ndinbre.org/programs/bioinformatics.php.
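
The idea of using an HMM to capture spatial correlation between adjacent markers, and then grouping samples by which cluster-specific HMM explains them best, can be illustrated with a minimal sketch. This is not the authors' HMMC implementation (which is distributed in R and C++). The three copy-number states, the emission means in `STATE_MEANS`, the value of `SIGMA`, the example transition matrix, and the function names are all assumptions chosen for illustration, and the sketch only performs a hard assignment under fixed parameters rather than fitting a mixture model.

```python
import math

# Hypothetical three-state copy-number HMM: 0 = loss, 1 = neutral, 2 = gain.
STATE_MEANS = (-0.5, 0.0, 0.5)   # assumed mean log2 ratio per state
SIGMA = 0.2                      # assumed emission standard deviation


def log_gauss(x, mean, sigma=SIGMA):
    """Log density of a normal distribution with the given mean at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_loglik(obs, init, trans):
    """Forward algorithm in log space: log P(obs | init, trans)."""
    n = len(init)
    alpha = [math.log(init[s]) + log_gauss(obs[0], STATE_MEANS[s]) for s in range(n)]
    for x in obs[1:]:
        # Neighbouring markers are coupled through the transition matrix.
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
            + log_gauss(x, STATE_MEANS[s])
            for s in range(n)
        ]
    return logsumexp(alpha)


def assign_cluster(obs, cluster_params):
    """Index of the cluster HMM under which the sample is most likely."""
    scores = [hmm_loglik(obs, init, trans) for init, trans in cluster_params]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage with made-up parameters: "sticky" transitions favour long segments.
sticky = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
clusters = [
    ([0.1, 0.1, 0.8], sticky),   # hypothetical gain-dominated subtype
    ([0.8, 0.1, 0.1], sticky),   # hypothetical loss-dominated subtype
]
print(assign_cluster([0.50, 0.45, 0.55, 0.50], clusters))
```

A full mixture-HMM clustering would alternate this assignment step with re-estimating each cluster's parameters from its members; that loop is omitted here.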

published proceedings

  • BMC Genomics

altmetric score

  • 3

author list (cited authors)

  • Zhang, K., Yang, Y., Devanarayan, V., Xie, L., Deng, Y., & Sens, D.

citation count

  • 4

complete list of authors

  • Zhang, Ke||Yang, Yi||Devanarayan, Viswanath||Xie, Linglin||Deng, Youping||Sens, Donald

publication date

  • December 2011