Robust Data-Driven Machine-Learning Models for Subsurface Applications: Are We There Yet? Academic Article


  • Algorithms are taking over the world, or so we are led to believe, given their growing pervasiveness in multiple fields of human endeavor such as consumer marketing, finance, design and manufacturing, health care, politics, sports, etc. The focus of this article is to examine where things stand in regard to the application of these techniques for managing subsurface energy resources in domains such as conventional and unconventional oil and gas, geologic carbon sequestration, and geothermal energy.

It is useful to start with some definitions to establish a common vocabulary.

Data analytics (DA): Sophisticated data collection and analysis to understand and model hidden patterns and relationships in complex, multivariate data sets.
Machine learning (ML): Building a model between predictors and response, where an algorithm (often a black box) is used to infer the underlying input/output relationship from the data.
Artificial intelligence (AI): Applying a predictive model with new data to make decisions without human intervention (and with the possibility of feedback for model updating).

Thus, DA can be thought of as a broad framework that helps determine what happened (descriptive analytics), why it happened (diagnostic analytics), what will happen (predictive analytics), or how can we make something happen (prescriptive analytics) (Sankaran et al. 2019). Although DA is built upon a foundation of classical statistics and optimization, it has increasingly come to rely upon ML, especially for predictive and prescriptive analytics (Donoho 2017). While the terms DA, ML, and AI are often used interchangeably, it is important to recognize that ML is basically a subset of DA and a core enabling element of the broader application for the decision-making construct that is AI.

In recent years, there has been a proliferation in studies using ML for predictive analytics in the context of subsurface energy resources.
Consider how the number of papers on ML in the OnePetro database has been increasing exponentially since 1990 (Fig. 1). These trends are also reflected in the number of technical sessions devoted to ML/AI topics in conferences organized by SPE, AAPG, and SEG, among others, as well as in books targeted to practitioners in these professions (Holdaway 2014; Mishra and Datta-Gupta 2017; Mohaghegh 2017; Misra et al. 2019). Given these high levels of activity, our goal is to provide some observations and recommendations on the practice of data-driven model building using ML techniques. The observations are motivated by our belief that some geoscientists and petroleum engineers may be jumping the gun by applying these techniques in an ad hoc manner without any foundational understanding, whereas others may be holding off on using these methods because they do not have any formal ML training and could benefit from some concrete advice on the subject. The recommendations are conditioned by our experience in applying both conventional statistical modeling and data analytics approaches to practical problems.

published proceedings

  • Journal of Petroleum Technology

author list (cited authors)

  • Mishra, S., Schuetter, J., Datta-Gupta, A., & Bromhal, G.

citation count

  • 4

complete list of authors

  • Mishra, Srikanta; Schuetter, Jared; Datta-Gupta, Akhil; Bromhal, Grant

publication date

  • March 2021