Big Data refers to the complexity, high dimensionality, and high volume of information that are common features of many contemporary engineering applications. In this context, specific treatments are required to successfully apply and implement Gaussian processes. This dissertation develops new methodologies for solving three critical problems: analysis of spatial-temporal systems for wind energy applications; multi-fidelity analysis for nano-manufacturing systems; and predictive modeling for large datasets.
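To make the Big Data challenge concrete (this sketch is illustrative and not taken from the dissertation), exact Gaussian process regression requires factorizing an n-by-n covariance matrix, which costs O(n^3) time and O(n^2) memory in the number of training points. A minimal example with a squared-exponential kernel:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """Exact GP posterior mean and variance at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)          # the O(n^3) bottleneck for large n
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Noisy observations of sin(x)
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 50)
y = np.sin(X) + 0.1 * rng.standard_normal(50)
mu, var = gp_predict(X, y, np.array([2.5]))
```

The cubic cost of the Cholesky factorization is what makes exact GP inference impractical for large datasets and motivates the approximation strategies discussed later.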
First, we develop a spatial-temporal model for local wind fields in a wind farm with more than 200 wind turbines. Our framework utilizes the correlation among the derivatives of wind speeds to find a neighborhood of predictors. We extend the model by incorporating wind direction as a variable that defines regimes and fitting a separate model for each regime. We also incorporate other meteorological measurements, such as air pressure and temperature, by calculating a theoretical wind called the geostrophic wind, which enhances the model's predictive power. We cast the model as an optimization problem and solve it through numerical techniques. We compare the model's performance against several alternatives to demonstrate its prediction accuracy.
Second, we consider a multi-fidelity analysis for predicting the Young's modulus of buckypaper, a nano-manufactured product. The data for this problem derive from expensive, but accurate, physical experiments and an inexpensive, but less accurate, simulation model. The practice of integrating such data with different levels of accuracy is called multi-fidelity analysis. The challenge is that some of the input variables in the physical experiments are difficult to measure. We formulate the problem by introducing latent variables and then imputing the unobserved latent variables in a two-step process: defining the functional relationship between observed and latent variables, and finding the optimal relationship by minimizing the distance between them. We demonstrate that this problem can be understood as a case of non-isometric curve-to-surface matching.
Third, we apply Gaussian process regression to large datasets. We propose a Bayesian Site Selection (BSS) approach that approximates the likelihood of the Gaussian process by using unobserved variables called pseudo-inputs. The BSS framework enables us to learn both the number and the locations of the pseudo-inputs simultaneously through reversible jump Markov chain Monte Carlo methods. Testing the proposed method on both real and artificial datasets shows that the BSS approach provides a sensible trade-off between prediction accuracy and computation time.
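As a rough illustration of the pseudo-input idea (a sketch only, not the BSS method itself): with a fixed set of m pseudo-inputs Z, a subset-of-regressors-style approximation reduces prediction cost from O(n^3) to O(n m^2). The BSS approach would additionally infer the number and locations of Z via reversible jump MCMC rather than fixing them as done here:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def sor_predict(X, y, X_test, Z, noise=1e-2):
    """Subset-of-regressors predictive mean using m pseudo-inputs Z.

    Only m-by-m systems are solved, so the cost is O(n m^2)
    instead of the O(n^3) of exact GP regression.
    """
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for numerical stability
    Kxz = rbf(X, Z)
    Ksz = rbf(X_test, Z)
    A = noise * Kzz + Kxz.T @ Kxz              # m-by-m system
    return Ksz @ np.linalg.solve(A, Kxz.T @ y)

# 500 noisy observations of sin(x), summarized by 10 pseudo-inputs
rng = np.random.default_rng(1)
X = np.linspace(0, 5, 500)
y = np.sin(X) + 0.1 * rng.standard_normal(500)
Z = np.linspace(0, 5, 10)                      # fixed grid; BSS would infer this
mu = sor_predict(X, y, np.array([2.5]), Z)
```

The trade-off mentioned in the abstract is visible here: fewer pseudo-inputs mean faster computation but a coarser approximation of the full covariance.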