Published 15 June 2017 by Melissae Fellet

Big Data Analytics Deliver Materials Science Insights

Finding patterns and structure in big data of materials science remains challenging, so researchers are working on new ways to mine the data to uncover hidden relationships. Credit: Hamster3d/



Developing new materials can be a lengthy, difficult process, and innovations in the field come through a combination of serendipity and methodical hard work. Researchers perform many rounds of synthesising new materials and testing their properties, using their chemical knowledge and intuition to relate a material’s structure to its function. The result is materials for tough body armour, thin yet powerful batteries or lightweight aircraft components, among many other applications.

To speed materials discovery, researchers are now asking computers to help. Algorithms similar to those that organise our email, photos and online banking can also be used to find patterns in chemical data that relate to a material’s structure and composition.

Walter Kohn, Nobel Laureate in Chemistry 1998. Photo: R. Schultes/Lindau Nobel Laureate Meetings

Traditional computer modelling of materials uses methods recognised with the Nobel Prize in Chemistry 1998. Walter Kohn and John Pople shared the prize that year for developing algorithms that modelled molecules using quantum mechanics, improving the accuracy of molecular structure and chemical reactivity calculations. The techniques that Kohn and Pople each developed revolutionised computational chemistry and have continued to be improved to give highly accurate results.

These methods typically work well to predict structural and electronic properties of crystalline metals and metal oxides. But these predictions do not always match measured properties of complex bulk materials and their surfaces under experimental conditions. Predicting properties of bulk materials and their surfaces using current quantum mechanical methods requires lengthy calculations using supercomputers.

To speed up these calculations, chemists are analysing public databases of atomic, chemical and physical properties to find combinations that predict materials properties. They use big-data analytics tools to search for meaningful patterns in the large amounts of data. Algorithms like these already influence our daily lives by filtering spam email, suggesting other items for online shoppers, detecting faces in digital photos and identifying fraudulent credit card transactions. Materials scientists have much less data than email providers or online stores. Still, enough data is publicly available for the same analysis tools to be useful, covering atomic properties such as electronegativity, atomic radius and bonding geometry as well as the geometric and electronic structures of various materials. Materials databases include the Materials Project in the United States and the Novel Materials Discovery Laboratory in Europe, among others.

Computational materials discovery often involves making predictions for an entire class of materials, such as metals, metal oxides or semiconductors. However, a global prediction may not apply to certain subgroups of materials within that class.
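The caveat above can be illustrated with a minimal sketch in Python. The numbers are made up for illustration and do not come from any materials database: a property rises with a descriptor in one subgroup and falls in another, so a single straight-line fit to all the data reports no trend at all.

```python
def slope(points):
    """Ordinary least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance

# Hypothetical data: y rises with x in subgroup A and falls with x in subgroup B.
subgroup_a = [(x, 2.0 * x) for x in range(1, 6)]
subgroup_b = [(x, 12.0 - 2.0 * x) for x in range(1, 6)]

global_slope = slope(subgroup_a + subgroup_b)  # one model for the whole class
slope_a = slope(subgroup_a)                    # model for subgroup A only
slope_b = slope(subgroup_b)                    # model for subgroup B only

print(global_slope, slope_a, slope_b)
```

In this toy case the two opposite trends cancel exactly in the global fit, while each subgroup on its own shows a clear relationship.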

Bryan Goldsmith, a Humboldt postdoctoral fellow at the Fritz Haber Institute of the Max Planck Society in Berlin and a young scientist attending the 67th Lindau Nobel Laureate Meeting, and his colleagues recently applied a data analytics tool called subgroup discovery to see how physical and chemical properties relate to the structure of gold nanoclusters containing varying numbers of atoms. Gold clusters are a model example of how materials properties change from the bulk to the nanoscale. Bulk gold is shiny, inert and yellow in colour. Gold nanoparticles, however, are red, catalytic and have dynamic structures.


The Novel Materials Discovery Laboratory, a European Center of Excellence established in the fall of 2015, has the world’s largest collection of computational materials science data.


Using molecular dynamics simulations, the researchers calculated 24,400 independent configurations of neutral, gas-phase gold clusters containing 5 to 14 atoms at temperatures from -173 to 541 °C (100 to 814 K). Next, they predicted the ionisation potential, electron affinity and van der Waals forces between atoms in a cluster, among other properties.

Then the researchers generated various mathematical combinations of the predicted chemical data to produce a large number of possible relationships between different subgroups of gold clusters. Finally, they used subgroup discovery to find the relationships that best predicted the clusters’ structure and electronic properties.
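The two steps described above, combining raw descriptors and then searching for the subgroup that stands out most, can be sketched in a few lines of Python. This is a simplified illustration and not the researchers' implementation: the data values, the derived features (`inv_n`, `is_even`), the single-threshold selectors and the quality function (coverage multiplied by the shift in the mean of the target) are all assumptions chosen to keep the example small.

```python
from statistics import mean

def add_derived_features(row):
    """Return a copy of the row with simple combinations of its raw values (illustrative)."""
    n = row["n_atoms"]
    return {**row, "inv_n": 1.0 / n, "is_even": 1 if n % 2 == 0 else 0}

def candidate_selectors(rows, features):
    """Yield (description, predicate) pairs of the form 'feature <= t' or 'feature > t'."""
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # threshold halfway between neighbouring values
            yield f"{f} <= {t:g}", (lambda r, f=f, t=t: r[f] <= t)
            yield f"{f} > {t:g}", (lambda r, f=f, t=t: r[f] > t)

def quality(subgroup, rows, target):
    """Coverage times the increase of the target mean over the whole data set."""
    if not subgroup:
        return 0.0
    coverage = len(subgroup) / len(rows)
    return coverage * (mean(r[target] for r in subgroup) - mean(r[target] for r in rows))

def best_subgroup(rows, features, target):
    """Exhaustively score every candidate selector and keep the best one."""
    best = (0.0, None, [])
    for description, predicate in candidate_selectors(rows, features):
        members = [r for r in rows if predicate(r)]
        score = quality(members, rows, target)
        if score > best[0]:
            best = (score, description, members)
    return best

# Made-up toy data: a larger "gap" value for clusters with an even number of atoms.
raw = [{"n_atoms": n, "gap": 1.5 if n % 2 == 0 else 0.3} for n in range(5, 15)]
rows = [add_derived_features(r) for r in raw]

score, description, members = best_subgroup(rows, ["n_atoms", "inv_n", "is_even"], "gap")
print(description, round(score, 3), len(members))
```

On this toy data the search selects the even-numbered clusters through the derived `is_even` feature, because no threshold on cluster size alone separates the two groups as well. Practical subgroup discovery tools search over conjunctions of several selectors and use more refined quality functions.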

The algorithm rediscovered the known property that gold nanoclusters with an even number of atoms are semiconducting, whereas those with an odd number of atoms are metallic. It also revealed something new about the forces that stabilise nonplanar gold clusters: van der Waals forces, typically thought to stabilise interactions between molecules, contributed more to the stability of nonplanar clusters than of planar clusters.


A computational prediction for a group of gold nanoclusters (global model) could miss patterns unique to nonplanar clusters (subgroup 1) or planar clusters (subgroup 2). Credit: Goldsmith et al. Uncovering structure-property relationships of materials by subgroup discovery. New J. Phys. 19 (2017) 013031 (CC BY 3.0)

By starting their data analytics with known properties, the researchers hope to develop predictive models that retain physical and chemical information that is easy for other scientists to interpret, Goldsmith says. “We believe that if you can find these simple equations, they can help guide you to deeper understanding, and hopefully lead to new chemistry and materials insights.”

With more powerful computers, larger databases and novel ways to use the data being developed, data analytics could become increasingly important to researchers synthesising new materials. A database of failed reactions could guide the direction of future experiments, and data analytics tools could speed the interpretation of spectra used to characterise molecules and materials. And in time, researchers hope to predict the outcome of a catalytic reaction or materials synthesis. “Data analytics should be an indispensable part of every chemist and materials scientist’s toolkit,” Goldsmith says.

Melissae Fellet

Melissae Fellet, PhD and Lindau Alumna 2009, is a freelance science writer based in Missoula, MT. She completed her doctoral work at Washington University in St. Louis, and writes regularly about chemistry, materials science, and engineering. Her work has been published in New Scientist, Chemical & Engineering News, and Chemistry World.