Abstract

The rapidly growing interest in machine learning (ML) for materials discovery has resulted in a large body of published work. However, only a small fraction of these publications includes confirmation of ML predictions, either via experiment or via physics-based simulations. In this review, we first identify the core components common to materials informatics discovery pipelines, such as training data, choice of ML algorithm, and measurement of model performance. Then we discuss some prominent examples of validated ML-driven materials discovery across a wide variety of materials classes, with special attention to methodological considerations and advances. Across these case studies, we identify several common themes, such as the use of domain knowledge to inform ML models.

/content/journals/10.1146/annurev-matsci-090319-010954
2020-07-01
2024-09-16
  • Article Type: Review Article