Accurate prediction of higher-level electronic structure energies for large databases using neural networks, Hartree–Fock energies, and small subsets of the database


  • A novel method is presented that significantly reduces the computational bottleneck of executing high-level electronic structure calculations of energies and their gradients for a large database. The database must adequately sample the important regions of configuration space for systems that contain more than four atoms and undergo multiple, simultaneous reactions in several energetically open channels. The method rests on the high degree of correlation that generally exists between Hartree–Fock (HF) and higher-level electronic structure energies. It is shown that if the input vector to a neural network (NN) includes both the configuration coordinates and the HF energy, MP4(SDQ) energies computed with the same basis set can be predicted for the entire database. This requires only the HF and MP4(SDQ) energies of a small subset of the database and the HF energies of the remainder. The predictive error is shown to be less than or equal to the fitting error obtained when a NN is fitted to the higher-level energies of the entire database. The method is applied to the computation of MP4(SDQ) energies for the 68,308 configurations that make up the database for the simultaneous unimolecular decomposition of vinyl bromide into six different reaction channels. Its predictive accuracy is investigated by training the NN on successively smaller subsets of the database and using it to predict the MP4(SDQ) energies of the remaining configurations. The results indicate that, for this system, the subset can be as small as 8% of the configurations in the database without loss of accuracy beyond that expected when a NN is fitted to the higher-level energies of the entire database. Using this procedure is shown to save about 78% of the total computational time required for the MP4(SDQ) calculations.
The sampling error associated with selecting the subset is shown to be about 10% of the predictive error for the higher-level energies. A practical procedure for applying the method is outlined. It is suggested that the method should be equally applicable to predicting energies computed with electronic structure methods of even higher level than MP4(SDQ).
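The subset-training idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a synthetic toy database in which the "HF-like" and "high-level" energies are invented analytic functions of three coordinates. It also uses a one-hidden-layer tanh network trained by full-batch Adam in NumPy, which may differ from the architecture and training algorithm used in the paper. The helper names `make_toy_database`, `train_mlp`, `predict` and `predict_high_level` are hypothetical. The network input is the configuration coordinates plus the HF energy. Only a random 8% subset (mirroring the fraction reported above) supplies high-level training targets.

```python
import numpy as np


def make_toy_database(n, rng):
    """Hypothetical stand-in for a configuration database (illustration only).

    Returns coordinates, a cheap "HF-like" energy, and an expensive
    "high-level" energy that is strongly correlated with it.
    """
    x = rng.uniform(-1.0, 1.0, size=(n, 3))
    e_hf = np.sin(2.0 * x).sum(axis=1) + 0.5 * x[:, 0] * x[:, 1]
    e_high = 0.9 * e_hf - 0.2 * x[:, 2] ** 2 + 0.1 * np.cos(3.0 * x[:, 0])
    return x, e_hf, e_high


def train_mlp(inputs, targets, hidden=16, steps=2000, lr=0.01, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch Adam on MSE loss."""
    rng = np.random.default_rng(seed)
    n, n_in = inputs.shape
    params = {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1)),
        "b2": np.zeros(1),
    }
    m_t = {k: np.zeros_like(p) for k, p in params.items()}
    v_t = {k: np.zeros_like(p) for k, p in params.items()}
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    y = targets.reshape(-1, 1)
    for t in range(1, steps + 1):
        # Forward pass.
        h = np.tanh(inputs @ params["W1"] + params["b1"])
        err = h @ params["W2"] + params["b2"] - y
        # Backward pass for loss = mean(err ** 2).
        d_pred = 2.0 * err / n
        d_h = (d_pred @ params["W2"].T) * (1.0 - h ** 2)
        grads = {"W2": h.T @ d_pred, "b2": d_pred.sum(axis=0),
                 "W1": inputs.T @ d_h, "b1": d_h.sum(axis=0)}
        # Adam update with bias correction.
        for k in params:
            m_t[k] = beta1 * m_t[k] + (1 - beta1) * grads[k]
            v_t[k] = beta2 * v_t[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m_t[k] / (1 - beta1 ** t)
            v_hat = v_t[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params


def predict(params, inputs):
    h = np.tanh(inputs @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).ravel()


def predict_high_level(coords, e_hf, e_high_subset, subset_idx, **kw):
    """Predict high-level energies for every configuration.

    The NN input is [coordinates, E_HF]; high-level energies are needed
    only for the configurations in subset_idx.
    """
    feats = np.column_stack([coords, e_hf])
    mu, sd = feats[subset_idx].mean(axis=0), feats[subset_idx].std(axis=0)
    y_mu, y_sd = e_high_subset.mean(), e_high_subset.std()
    params = train_mlp((feats[subset_idx] - mu) / sd,
                       (e_high_subset - y_mu) / y_sd, **kw)
    return predict(params, (feats - mu) / sd) * y_sd + y_mu


rng = np.random.default_rng(42)
coords, e_hf, e_high = make_toy_database(2000, rng)
n_sub = int(0.08 * len(e_hf))  # 8% training subset
perm = rng.permutation(len(e_hf))
subset_idx, rest_idx = perm[:n_sub], perm[n_sub:]
e_pred = predict_high_level(coords, e_hf, e_high[subset_idx], subset_idx)
rmse = np.sqrt(np.mean((e_pred[rest_idx] - e_high[rest_idx]) ** 2))
print(f"held-out RMSE: {rmse:.4f} (target std {e_high[rest_idx].std():.4f})")
```

In this toy setting the HF energy already carries most of the variation in the high-level energy. The network therefore mainly has to learn a small correction, which is why a small subset can suffice. Rerunning with different seeds for the subset permutation gives a rough feel for the sampling error. How well this transfers to real HF and MP4(SDQ) data is an assumption of the sketch, not a result.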

author list (cited authors)

  • Malshe, M., Pukrittayakamee, A., Raff, L. M., Hagan, M., Bukkapatnam, S., & Komanduri, R.

publication date

  • January 1, 2009