Scientists at IBM, Cambridge, Mass., the University of Liverpool, U.K., and the University of Southampton, U.K., have developed an algorithm that could vastly reduce the time and cost of running engineering, materials chemistry and new drug discovery simulations.
The algorithm works by intelligently selecting which simulations are worth running and then focusing resources on them.
A report in Science Advances explains how the researchers used their method to quickly identify novel materials for gas storage. The approach saved 500,000 central processing unit hours (CPUh) compared with traditional simulation methods.
On the face of it, combining machine learning and virtual computational screening should produce a powerful means for discovering new functional organic materials.
However, while this is true for calculating thermodynamic stability and the associated functional properties of candidate materials, tackling a broader range of problems remains difficult, caution the authors. A big hurdle, they say, is the prohibitive computational expense of accurately calculating energies and properties for every candidate material to be screened.
Especially difficult is the a priori design of functional molecular organic crystals with desirable materials properties. Unlike framework-based crystals such as zeolites and metal organic frameworks, molecular crystals rarely obey the kind of simple geometric principles that can be exploited for rational design.
“Even very small changes to molecular structure can have marked effects on crystal packing and, hence, the resultant solid-state properties. Molecular crystal packing is often dictated by weak, competing intermolecular interactions. Hence, the a priori design of materials with predetermined, desirable properties requires a more subtle approach than for materials where structure (and hence function) can be ‘built-in’ through the use of intuitive bonding rules, such as adherence to known framework topologies or other geometric bonding principles,” they write.
One tool for virtually screening candidate organic molecules for desirable properties, such as natural gas storage capacity, is an energy-structure-function (ESF) map. Such maps pair lattice energy with function to show whether a particular molecule has the desired properties. This information can help guide an experimental campaign.
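As a rough illustration of the idea, an ESF map can be thought of as a list of predicted crystal structures, each carrying a lattice energy and a computed property. The sketch below is only a toy: the class name, the function name, the energy window and every number in it are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PredictedStructure:
    """One point on a toy ESF map (all fields hypothetical)."""
    label: str
    lattice_energy: float  # assumed units: kJ/mol, more negative = more stable
    capacity: float        # assumed: a gas storage figure of merit, arbitrary units


def esf_shortlist(structures, energy_window):
    """Keep structures within energy_window of the lowest lattice energy,
    then rank them by the property of interest (highest first)."""
    e_min = min(s.lattice_energy for s in structures)
    accessible = [s for s in structures
                  if s.lattice_energy - e_min <= energy_window]
    return sorted(accessible, key=lambda s: s.capacity, reverse=True)


# Made-up values, purely to exercise the function.
toy_map = [
    PredictedStructure("A", -120.0, 50.0),
    PredictedStructure("B", -118.0, 140.0),
    PredictedStructure("C", -100.0, 200.0),
    PredictedStructure("D", -119.5, 90.0),
]
shortlist = esf_shortlist(toy_map, energy_window=5.0)
```

In this toy case, structure C has the highest capacity but lies far above the lowest-energy structure, so it drops out of the shortlist. Pairing energy with function is meant to expose exactly this kind of trade-off.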
However, the authors note that while this is an effective strategy, ESF map generation can be computationally intensive. In this project, for example, computing the ESF map of methane storage predictions for just one molecule under investigation took around 800,000 CPUh.
Of course, the cost of creating ESF maps grows as the molecules of interest become more computationally expensive and the number of candidate structures increases. Porous materials pose even more challenges because the energy range that includes all observable crystal structures is extended by solvent templating and sometimes by multiple components such as co-crystals.
“To overcome these, we have packed all of our algorithmic advances into a simple service called IBM Bayesian Optimization (IBO), which means that users can easily get the value from using these algorithms without themselves having to become Bayesian optimization experts,” the authors explain. In general, IBO answers the question: “With what I know now, what should I do next so that I get the best overall result in the future?”
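The "what should I do next?" question can be sketched as a generic Bayesian optimization loop. The code below is not IBO or its interface. It is a minimal, standard-library-only illustration that assumes a Gaussian process surrogate with a squared-exponential kernel and an expected-improvement rule, both common textbook choices. All function names, the one-dimensional descriptor and the toy objective are hypothetical.

```python
import math


def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalar descriptors."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, jitter=1e-6):
    """Factor the kernel matrix once for the points evaluated so far."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, ys))


def predict(xs, L, alpha, x_new):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected amount by which a candidate beats the best value so far."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf


def toy_simulation(x):
    """Stand-in for an expensive property calculation (hypothetical)."""
    return x * math.sin(6 * x)


def bayes_opt(candidates, objective, n_init=3, budget=8):
    """Spend at most `budget` evaluations, choosing each new one by EI."""
    step = max(1, (len(candidates) - 1) // (n_init - 1))
    xs = list(candidates[::step][:n_init])  # evenly spread starting points
    ys = [objective(x) for x in xs]
    while len(xs) < budget:
        L, alpha = fit_gp(xs, ys)
        best = max(ys)
        remaining = [c for c in candidates if c not in xs]
        scores = []
        for c in remaining:
            mean, std = predict(xs, L, alpha, c)
            scores.append(expected_improvement(mean, std, best))
        nxt = remaining[scores.index(max(scores))]
        xs.append(nxt)
        ys.append(objective(nxt))
    i = ys.index(max(ys))
    return xs[i], ys[i], xs


candidates = [i / 20 for i in range(21)]
best_x, best_y, evaluated = bayes_opt(candidates, toy_simulation)
```

Here only 8 of the 21 candidates are ever evaluated. In a real screening campaign each evaluation would be a costly simulation, which is where the savings described in the article would come from.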
Two main challenges remain, however.
“Being able to work in complex situations without passing that complexity to the user is a key facet of IBO’s application programming interface design. One such example is explainable AI [artificial intelligence] techniques to help the user understand why the algorithm is asking them to do certain actions or experiments,” says Kirk E. Jordan, an IBM engineer and executive in the data-centric solutions center of the company’s research division.
Another is handling data from multiple sources.
“Being able to take in many different pieces of information is still a challenge. … the authors address one part of this by allowing each property to have its own model. They are now working on a method which allows them to choose between different ways of testing the same hypothesis (which might have different costs or accuracies) in real time and [fusing it] into a single model, which means maximal usage of the available information. Initial results … are promising, so keep your eyes peeled for a publication soon,” adds Edward O. Pyzer-Knapp, research lead, AI enriched modelling and simulation and visiting professor of industrially applied AI at the University of Liverpool.