The field of materials science and engineering is on the cusp of a digital data revolution. After reviewing the nature of data science and Big Data, we discuss the features of materials data that distinguish them from data in other fields. We introduce the concept of process-structure-property (PSP) linkages and illustrate how determining these linkages is one of the main objectives of materials data science. We then review a selection of materials databases, as well as important aspects of materials data management, such as storage hardware, archiving strategies, and data access strategies. We introduce the emerging field of materials data analytics, which focuses on data-driven approaches to extracting and curating materials knowledge from available data sets. We highlight the critical need for materials e-collaboration platforms and conclude with a number of suggestions regarding the near-term future of the materials data science field.



  • Article Type: Review Article