New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationshipsAbstract
Data mining has revolutionized sectors as diverse as pharmaceutical drug discovery, finance, medicine, and marketing, and has the potential to similarly advance materials science. In this paper, we describe advances in simulation-based materials databases, open-source software tools, and machine learning algorithms that are converging to create new opportunities for materials informatics. We discuss the data mining techniques of exploratory data analysis, clustering, linear models, kernel ridge regression, tree-based regression, and recommendation engines. We present these techniques in the context of several materials application areas, including compound prediction, Li-ion battery design, piezoelectric materials, photocatalysts, and thermoelectric materials. Finally, we demonstrate how new data and tools are making it easier and more accessible than ever to perform data mining through a new analysis that learns trends in the valence and conduction band character of compounds in the Materials Project database using data on over 2500 compounds.