Collaborative Research: Statistical Analysis of Massive Spatio-Temporal Datasets Using Distributed Computing
Grant
Overview
abstract
Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of data indexed in space and time. If these kinds of datasets can be efficiently exploited, they can provide new insights on a wide variety of issues, such as greenhouse gas concentrations for climate, soil properties for precision agriculture, and atmospheric states for weather forecasting. However, traditional spatial-statistical techniques are not computationally feasible for big datasets. This project will develop fast and user-friendly software that can fill gaps, capture inhomogeneous spatial structure from very fine to very large scales, and properly quantify uncertainty. As an illustration, the software will be applied to millions of satellite measurements of hourly Total Precipitable Water fields, which are critical in severe weather forecasting.

The goal of this project is to develop methodology for the statistical analysis of massive, high-resolution spatial datasets. The employed model is specified in terms of spatial basis functions at multiple resolutions. The basis functions are chosen to optimally approximate a given covariance function. No restrictions on the covariance function are necessary, and observations can be irregularly spaced. It is crucial that the structure of the basis-function representation results in scalable, parallel inference algorithms that can take full advantage of the many nodes available in modern computing environments. The methodology will be extended to the analysis of spatio-temporal data, allowing real-time analysis of massive, streaming spatio-temporal datasets. All methods will be implemented in software suitable for both multi-core desktop computers and supercomputing environments.
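
As a rough illustration of what a multi-resolution basis-function representation can look like, the sketch below uses generic notation that is assumed here and does not come from the abstract; it is not necessarily the exact model developed in this project. The spatial process y(s) is written as a weighted sum of basis functions b_{m,j} at resolutions m = 0, ..., M, with r_m basis functions at resolution m. If the weight vectors are taken to be independent across resolutions, the representation implies an approximation to the target covariance function C:

```latex
% Assumed notation: y = spatial process, b_{m,j} = j-th basis function at
% resolution m, \eta_{m,j} = its random weight, K_m = weight covariance matrix.
\begin{align}
  y(\mathbf{s}) &\approx \sum_{m=0}^{M} \sum_{j=1}^{r_m}
      b_{m,j}(\mathbf{s})\, \eta_{m,j},
  \qquad
  \boldsymbol{\eta}_m \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_m)
  \ \text{independently for } m = 0, \dots, M, \\
  C(\mathbf{s}_1, \mathbf{s}_2) &\approx \sum_{m=0}^{M}
      \mathbf{b}_m(\mathbf{s}_1)^{\top}\, \mathbf{K}_m\, \mathbf{b}_m(\mathbf{s}_2),
  \qquad
  \mathbf{b}_m(\mathbf{s}) = \bigl(b_{m,1}(\mathbf{s}), \dots, b_{m,r_m}(\mathbf{s})\bigr)^{\top}.
\end{align}
```

Under these assumptions, low resolutions would capture large-scale structure with a few basis functions, and higher resolutions would add finer-scale corrections. Because each resolution contributes its own term to the sum, the terms are natural candidates for computation on separate nodes.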