Abstract
The term data deluge is used widely to describe the rapidly accelerating growth of information in the technical literature, in scientific databases, and in informal sources such as the Internet and social media. The massive volume and increased complexity of information challenge traditional methods of data analysis but at the same time provide unprecedented opportunities to test hypotheses or uncover new relationships via mining of existing databases and literature. In this review, we discuss analytical approaches that are beginning to be applied to help synthesize the vast amount of information generated by the data deluge and thus accelerate the pace of discovery in plant pathology. We begin with a review of meta-analysis as an established approach for summarizing standardized (structured) data across the literature. We then turn to examples of synthesizing more complex, unstructured data sets through a range of data-mining approaches, including the incorporation of 'omics data in epidemiological analyses. We conclude with a discussion of methodologies for leveraging information contained in novel, open-source data sets through web crawling, text mining, and social media analytics, primarily in the context of digital disease surveillance. Rapidly evolving computational resources provide platforms for integrating large and complex data sets, motivating research that will draw on new types and scales of information to address big questions.