Collaborative Research: Scalable Bayesian Methods for Complex Data with Optimality Guarantees
Spectacular advances in data acquisition, processing, and storage present the opportunity to analyze datasets of ever-increasing size and complexity in various applications, such as social and biological networks, epidemiology, genomics, and Internet recommender systems. Underlying the massive size and dimension of these data, there is often a parsimonious structure. The Bayesian approach to statistical inference is attractive in this context because it incorporates structural assumptions through prior distributions, enables probabilistic modeling of complex phenomena, and provides an automatic characterization of uncertainty. This research project aims to advance methods for eliciting and translating prior knowledge regarding the low-dimensional skeleton of big data to provide realistic uncertainty characterizations while maintaining computational efficiency. Bayesian computation poses substantial challenges in high-dimensional and big data problems. The research aims to develop cutting-edge computational strategies, along with publicly available software packages that implement them. The project involves graduate students in the research.
The research project focuses on theoretical foundations and computational strategies for Bayesian methods in high-dimensional and big data problems motivated by applications in social networks and epidemiology. Techniques for systematically developing and evaluating prior distributions in high-dimensional problems will be investigated with a special emphasis on the trade-off between statistical efficiency and computational scalability. Specific directions include efficient algorithms for posterior sampling with shrinkage priors, a theoretical framework for divide-and-conquer strategies in big data problems, fast algorithms for clustering nodes in large networks with an unknown number of communities, and methods for discovering structure in sparse contingency tables. The algorithms will be motivated by a rigorous theoretical understanding of the behavior of the posterior distribution with a particular emphasis on proper quantification of uncertainty in a distributed computing framework. Software will be developed for each application.