Statistical Modeling and Computation of Extreme Values in Large Datasets
Many problems in the environmental, earth, and biological sciences now involve large amounts of spatial data obtained from remote ground sensors, satellite images, geographic information systems, public health records, and other sources. The analysis of extreme values is of particular interest in many such applications. For instance, natural hazards such as severe tides, heat waves, heavy rainfall, and extreme air pollution events can cause substantial damage to society. The goal of this project is to better understand spatially dependent extreme events for efficient quantitative risk management. The project has a broad impact on multiple interdisciplinary fields, including statistics, geoscience, environmental science, operations research, machine learning, and risk management. The modeling and computational approaches for extreme value analysis and prediction in big data can be applied to a wide range of practical and important problems, including studies of extreme events under climate change, analysis of environmental hazards, insurance risk assessment, and agricultural planning.
Extreme events are rare by definition. Only recently has the analysis of spatial extreme values become feasible, thanks to the availability of big spatial data, which provides great opportunities to accurately quantify the risk of extreme events, better understand the links among them, promptly monitor changes in their frequency and intensity, and reliably predict extreme values at unobserved locations. However, such large data sizes also pose challenges for statistical modeling and computation. The objective of this project is to combine theoretical methods and computational approaches to develop novel models, along with inference and prediction algorithms, to meet the increasing demand for efficient analytical tools for extreme values in big data. In particular, the project will focus on the following research thrusts.
First, a new class of nonstationary max-stable process models with flexible and desirable dependence structures will be developed for high-dimensional spatial extreme values. Second, new scalable and parallelizable inference tools will be proposed for estimating these models. Third, divide-and-conquer conditional sampling algorithms will be studied for predicting extremes over large spatial datasets; these algorithms provide both point estimates and uncertainty measures for the predicted values at unobserved locations. Finally, the developed methods will be applied to real-world problems.