Intrinsic Dimensionality of Molecular Properties
Abstract
Chemical space, which encompasses all stable compounds, is unfathomably large, and its dimension scales linearly with the number of atoms considered. The success of machine learning methods suggests that many physical quantities exhibit substantial redundancy in that space, lowering their effective dimensionality. A low dimensionality is favorable for machine learning applications, as it reduces the required number of data points. It is unknown, however, how far the dimensionality of physical properties can be reduced, how this depends on the exact physical property considered, and how accepting a model error can help reduce the dimensionality further. We show that accepting a modest, nearly negligible error leads to a drastic reduction in the number of independent degrees of freedom. This applies to several properties, such as the total energy and frontier orbital energies, for a wide range of neutral molecules with up to 20 atoms. We provide a method to quantify an upper bound for the intrinsic dimensionality given a desired accuracy threshold by including all continuous variables in the molecular Hamiltonian, including the nuclear charges. We find the intrinsic dimensionality to be remarkably stable across molecules, i.e., it is a property of the underlying physical quantity and the number of atoms rather than of an individual molecular configuration, and is therefore highly transferable between molecules. The results suggest that the feature space of state-of-the-art molecular representations can be compressed further, leaving room for more data-efficient and transferable models.
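The trade-off between accepted error and the number of independent degrees of freedom can be illustrated with a toy model. The sketch below is not the method of this work; it assumes a purely quadratic property f(x) = 0.5 xᵀHx around a reference point, a synthetic Hessian H with a geometrically decaying spectrum, and displacements bounded by a hypothetical radius. Under these assumptions, dropping an eigen-direction with eigenvalue λ changes f by at most 0.5 · radius² · |λ|, so counting the directions whose bound exceeds the tolerance gives an upper bound on the effective dimension. The function name `effective_dimension` and all numerical values are illustrative.

```python
import numpy as np


def effective_dimension(hessian, radius, tol):
    """Count eigen-directions of a quadratic toy property that cannot be
    dropped without risking an error above `tol` for |x| <= radius."""
    eigvals = np.linalg.eigvalsh(hessian)  # symmetric matrix assumed
    # Worst-case error from discarding one direction (toy quadratic model).
    worst_case = 0.5 * radius**2 * np.abs(eigvals)
    return int(np.sum(worst_case > tol))


# Synthetic example: 10 variables, eigenvalues 1, 0.1, ..., 1e-9 (assumed).
rng = np.random.default_rng(0)
n = 10
q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal basis
spectrum = 10.0 ** -np.arange(n)
hessian = q @ np.diag(spectrum) @ q.T

dim_tight = effective_dimension(hessian, radius=1.0, tol=1e-6)
dim_loose = effective_dimension(hessian, radius=1.0, tol=1e-3)
print(dim_tight, dim_loose)
```

In this toy setting, loosening the tolerance from 1e-6 to 1e-3 lowers the count of required directions even though the nominal number of variables stays at 10, which mirrors the qualitative statement above without reproducing any of the reported results.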