The Challenge of Big Data and Data Science

Annual Review of Political Science

Vol. 22:297-323 (Volume publication date May 2019)
First published as a Review in Advance on January 21, 2019
https://doi.org/10.1146/annurev-polisci-090216-023229

Henry E. Brady

Department of Political Science and Goldman School of Public Policy, University of California, Berkeley, California 94720, USA


Abstract

Big data and data science are transforming the world in ways that spawn new concerns for social scientists, such as the impacts of the internet on citizens and the media, the repercussions of smart cities, the possibilities of cyber-warfare and cyber-terrorism, the implications of precision medicine, and the consequences of artificial intelligence and automation. Along with these changes in society, powerful new data science methods support research using administrative, internet, textual, and sensor-audio-video data. Burgeoning data and innovative methods facilitate answering previously hard-to-tackle questions about society by offering new ways to form concepts from data, to do descriptive inference, to make causal inferences, and to generate predictions. They also pose challenges as social scientists must grasp the meaning of concepts and predictions generated by convoluted algorithms, weigh the relative value of prediction versus causal inference, and cope with ethical challenges as their methods, such as algorithms for mobilizing voters or determining bail, are adopted by policy makers.

Keywords

big data, data science, artificial intelligence, cyberinfrastructure, causality, prediction, text analysis, internet, smart cities, cyber-warfare, automation

BIG DATA AND DATA SCIENCE

“Big data and data science are being used as buzzwords and are composites of many concepts,” says the US National Institute of Standards and Technology (NIST) in a 2015 “framework” report on “big data” (NIST 2015, p. 2). The phrase “big data” appears frequently in the press and in academic journals, and “data science” programs have sprouted in academia over the last five years. On March 29, 2012, the White House Office of Science and Technology Policy announced the “Big Data Research and Development Initiative” (Kalil 2012) that builds upon federal initiatives “ranging from computer architecture and networking technologies to algorithms, data management, artificial intelligence, machine learning, and development and deployment of advanced cyberinfrastructure” (NITRD 2016, p. 6). “Big data” appeared about 560 times per year in JSTOR from 2014 through 2017 even though it was mentioned less than once a year in the century before 2000 and only an average of about eight times a year between 2001 and 2010. In the last five years, at least 17 Data Science programs have started at major American research universities (http://msdse.org/environments/), and the internet is replete with advertisements for data science books and courses, often with the come-on of “Become a Data Scientist.” The phrases have certainly caught on, but they mean different things to different people, and some even doubt that they identify something very new or useful (e.g., boyd & Crawford 2012, Donoho 2017, Smith 2018).

Despite the imperfection of these terms and the hyperbole that often surrounds them, they point to real changes that are important for political science. Big data, data science, and the related ideas of artificial intelligence, cyberinfrastructure, and machine learning contribute to the following developments and trends discussed in this article:

▪

Societal and political change from big data and data science. The volume, velocity, variety, and veracity of data being generated by and available to governments, armies, businesses, nonprofits, and people have combined with the enormous increases in computing power and improvements in data science methods to change society in fundamental ways. Big data and data science are creating new phenomena and raising basic questions about the control and manipulation of people and populations, the future of privacy, the veracity of information, the future of work, and many other topics that matter for political scientists.

▪

Increasing amounts of data available to all scientists, including political scientists. All the sciences are being affected by these changes. The Thirty Meter Telescope coming online in 2022 will generate 90 terabytes every night; genomic data are doubling in quantity every nine months and are currently being produced at approximately 10 terabytes per day; the Large Hadron Collider at CERN generates 140 terabytes per day. The World Wide Web produces about 1,500,000 terabytes every day, and this flow of data offers social scientists a chance to study the “sinews of society” (Weil 2012) and the “nerves of government” (Deutsch 1963) in a way that could not be done in the past. Now political scientists can observe and analyze (sometimes in real time) the information that people choose to consume, the information produced by political actors, the environment in which they live, and many other aspects of people's lives.

▪

New ways political scientists organize their work. With this onslaught of data, political scientists can rethink how they do political science by becoming conversant with new technologies that facilitate accessing, managing, cleaning, analyzing, and archiving data.

▪

New kinds of questions asked by political scientists. Political scientists must ask what they are trying to accomplish with concept formation, description, causal inference, prediction, and projection into the future. In the process, new methods and insights will be developed about political behavior, and new designs will be put forth for political institutions.

▪

Dealing with ethical issues regarding political science research. Finally, political scientists must think about complicated ethical issues regarding access, use, and broadcasting of information, and the possible misuse of their models and results.

Before considering these five changes and their implications for political science, I describe the exponential growth in data and computing power that has led to the prominence of so-called big data and data science, followed by definitions of these untidy phrases.

INCREASING VOLUME, VELOCITY, AND VARIETY OF BIG DATA

Social scientists must come to grips with the current dramatic transformations in the communication of information, which parallel the striking changes in transportation in the nineteenth century. In 1816, using horse-driven stagecoaches, mule-driven canal boats, or sailing packets, a trip between Philadelphia and Quebec took more than four days. By 1860, with the advent of steam-driven trains and steamboats, the time and cost for travel dropped by over two-thirds, and the same trip took just over one day (estimated from Taylor 1951, p. 141). These changes created new trading networks, new opportunities for migration, new kinds of cities with commuter suburbs, and new understandings of the world, with enormous implications for politics, economics, and society.

Changes every 20 years in information technologies punctuated the history of the late nineteenth, twentieth, and early twenty-first centuries: telephones (1870–1890s), phonographs (1870–1890s), cinema (1890–1920s), radio (1900–1920s), television (1940–1950s), mainframe computers (1940–1950s), personal computers (1970–1980s), the internet and World Wide Web (1980–2000s), cell phones (1980–2000s), and smart phones (2000s–present). The most fundamental innovation came with the move from analog devices to digital ones, starting in the 1950s and proceeding dramatically in the 1990s and thereafter. These changes brought (a) extensive digital datafication, in which myriad events are now digitally recorded; (b) widespread connectedness, in which events and people are identified so that they can be linked up with one another; (c) pervasive networking, such that people are embedded in a community of interacting users who become nodes in larger networks; and (d) ubiquitous computer authoring, where computers create new information that becomes part of the social system and its culture.

Political scientists led the way in studying these changes. Harold Lasswell and Karl Deutsch were early students of communications and their impacts on societies. In 1983, MIT political scientist Ithiel de Sola Pool looked at the production of words in the American mass media (e.g., radio, television, records, movies, newspapers, books) and point-to-point media (telephone, first-class mail, telegrams, facsimile, and data communication) from 1960 to 1977. Pool found that words in these media doubled every eight years, growing at about 9% per year. He also found that “print media are becoming increasingly expensive per word delivered while electronic media are becoming cheaper,” so that “growth in both mass and point-to-point media has been greatest in the electronic ones.” Furthermore, “although the largest flow of words in modern society is through the mass media, the rate of growth is now fastest in media that provide information to individuals, that is, point-to-point media.” Finally, “the words actually attended to from those media grew at just 2.9% per year” so that “each item of information produced faces a more competitive market and a smaller audience on average” (Pool 1983, p. 609). Pool predicted much of what we know about modern communications: They are growing fast, they are increasingly electronic and point-to-point, and people experience information overload and fragmented information flows. Perhaps most presciently, Pool (1983, p. 611) also said, “Computer networking is for the first time bringing the costs of a point-to-point medium, data communication, down to the range of costs characteristic of mass media.”
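
The link between Pool's two figures (growth of about 9% per year and doubling every eight years) can be checked with a short calculation. The sketch below is illustrative only: it assumes constant compound growth, and the function names are hypothetical, not taken from Pool (1983).

```python
import math

def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

def relative_attention(supply_growth: float, attention_growth: float, years: float) -> float:
    """Ratio of words attended to over words supplied, relative to its starting value."""
    return ((1 + attention_growth) / (1 + supply_growth)) ** years

# Growth rates quoted in the text from Pool (1983).
SUPPLY_GROWTH = 0.09      # words supplied: about 9% per year
ATTENTION_GROWTH = 0.029  # words attended to: 2.9% per year

print(f"Supply doubles every {doubling_time(SUPPLY_GROWTH):.1f} years")
print(f"Attention doubles every {doubling_time(ATTENTION_GROWTH):.1f} years")

# Over the 17 years Pool studied (1960 to 1977), the share of supplied
# words that is actually attended to shrinks geometrically.
share = relative_attention(SUPPLY_GROWTH, ATTENTION_GROWTH, 17)
print(f"Attended share after 17 years: {share:.2f} of its 1960 value")
```

Under these assumptions, supply doubles in roughly eight years while attention takes about three times as long, which is the arithmetic behind Pool's "smaller audience on average."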

Subsequent studies by political scientists and others (Lyman & Varian 2003, Bohn & Short 2012) focused on the volume or stocks of information (e.g., the number of books in a bookstore) as well as on the flows or velocity (the daily sales of books) and the variety of information (subjects of books). They also measured information in digital bytes instead of words so that the measures reflect the proliferation of images, which communicate many more bytes per second than do words through text or speech (Bohn & Short 2012, p. 986). Hilbert & López (2011, p. 63, table 1) found that the world's storage capacity in bytes per capita doubled every 40 months between 1986 and 2007. The bulk of the world's flow of communications was still in broadcast communications, which grew at the rate of 6% per year per capita, but (point-to-point) telecommunications grew at the rate of 28% and could conceivably exceed broadcast communications within 10–15 years. Finally, they computed a new quantity—the growth in the world's computational power measured in millions of instructions per second (MIPS)—and they found that human-guided general-purpose computation grew at an impressive compound annual rate of 58% per capita between 1986 and 2007. Embedded applications-specific computation grew even faster, at 83%.
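
To compare the figures above on one scale, a doubling period can be converted into a compound annual rate, and an annual rate into a cumulative growth factor. This is a minimal sketch that assumes constant compound growth over 1986 to 2007; the helper names are hypothetical.

```python
def annual_rate_from_doubling(months: float) -> float:
    """Compound annual growth rate implied by a doubling period given in months."""
    return 2 ** (12 / months) - 1

def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplication after compounding annual_rate for the given years."""
    return (1 + annual_rate) ** years

YEARS = 2007 - 1986  # the 21-year window studied by Hilbert & López (2011)

# Storage capacity per capita: doubling every 40 months.
storage_rate = annual_rate_from_doubling(40)
print(f"Storage: {storage_rate:.1%} per year")

# Per-capita annual growth rates quoted in the text.
rates = {
    "storage": storage_rate,
    "broadcast": 0.06,
    "telecommunications": 0.28,
    "general-purpose computation": 0.58,
    "embedded computation": 0.83,
}
for name, rate in rates.items():
    print(f"{name}: x{growth_factor(rate, YEARS):,.0f} over {YEARS} years")
```

On this scale, storage doubling every 40 months corresponds to about 23% per year, well below the 58% and 83% rates reported for computation.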

This research identifies four notable trends, briefly mentioned above, that have produced the big-data revolution: extensive digital datafication, widespread connectedness, networking, and computer authoring. First, there is a tsunami of data about societal events, and digital communications are overtaking analog. This extensive digital datafication (Cukier & Mayer-Schoenberger 2013, p. 29) creates data in a format that can be readily stored and processed by computers. “Recording” or “digitalization” might be used instead of the ugly neologism “datafication,” but either term seems too passive for processes that are transmogrifying human interactions into data. Even though some of these data are relatively unstructured (text, audio, networks, or images), data scientists are figuring out ways to analyze them. Second, there is widespread connectedness because point-to-point telecommunications can be, in principle, more easily tracked than broadcasting. For example, whereas broadcasters traditionally required elaborate survey operations (such as Nielsen's media-use diaries) to track their audience, Netflix has immediate data on the download of its movies. More generally, we can now record and connect data on individual postings, purchases, police encounters, and even perambulations. Datafication and connectedness mean that once-ephemeral events can now be identified and studied.

A third feature of the changing information environment, networking, is especially important for social scientists. Whereas once communications were classified as either person-to-person (e.g., conversation, letters, or telephone) or mass communications from one source to many people (e.g., books, newspapers, cinema, radio, or television), modern communications involve mediated social networks that combine features of both modes (Neumann 2016, Schroeder 2018). Twitter involves individual communications sent to many followers using hashtags that define self-mediated areas of concern. Facebook involves individuals with customized profiles who have networks of “friends” and who have affiliations with common-interest user groups that share information. Google involves a query by an individual who is provided with a list of relevant websites. Amazon involves a search for a particular product that results in suggestions about other relevant products that can be bought online. In all these media, knowledge about people's characteristics and their search behaviors is used to suggest and sometimes impose particular actions or relationships. The implications of these new modes of communication are not clear, but they probably operate differently in the three important spheres of politics, markets, and culture (Schroeder 2018). They may also have important impacts such as increasing the chance for political polarization through the creation of networks that are closed to dissenting opinions (Neumann 2016).

Finally, whereas the communication of information traditionally involved sending messages in the most verisimilar fashion possible even when the message was transformed along the way (e.g., from voice into electrical signals in a telephone), an increasing fraction of information is partly, if not entirely, computer authored. Computers use programs to produce new outputs that combine inputs in novel ways: A Google search takes a request and delivers plausible “answers” to that search; a computer game produces a fantasy virtual environment for entertainment; a computer-aided design program produces a design that meets certain specifications; and so forth. Nature and humans no longer have a monopoly on authoring. We now live in an era when computers can author, publish, and supply new forms of information. Another job of social science is to improve and understand these processes.

DEFINITIONS OF BIG DATA AND DATA SCIENCE

The growth of data and the creation of large databases in business, government, daily life, and scientific research launched many efforts to understand and utilize data. Data mining, knowledge discovery (Maimon & Roach 2005), and business intelligence and analytics (Chen et al. 2012) became popular terms in business describing statistical and logical rule-based efforts to extract knowledge from large databases. Within engineering, a 70-year tradition continues of building computers and robots with artificial intelligence (Russell & Norvig 2009) that can perform human-like tasks such as playing chess or driving cars. Some of the methods developed by artificial intelligence researchers have been combined with traditional methods of statistics to produce methods for pattern recognition (Ripley 1995), machine learning (Bishop 2011), and statistical learning (Hastie et al. 2016). During the first decade of the twenty-first century, the need for better ways to process and use data, especially in the sciences, was discussed under the rubric of cyberinfrastructure (Atkins et al. 2003, Berman & Brady 2005), but more recently big data and data science have become popular phrases.

Big Data

For those of us who remember when computer memories were measured in kilobytes instead of terabytes (a factor of a billion more), “big data” seems like a moving target, but the term has arisen despite the advances in computer power because data seem to be growing faster than our ability to process them. The total volume in bytes, the variety (text, images, audio, video, sensor, social media, and other forms), and the daily velocity (Laney 2001) of data are growing even faster than computing power. The large volume leads to problems of storing and managing data. The growth in variety adds the difficulties of translating data from one form to another, and the growth in velocity leads to the need to edit data on the run and to choose what is important. More recently a fourth concern, the veracity of data, adds another layer of complexity on top of volume, variety, and velocity.

Size, complexity, and technological challenges provide one definition of big data (National Research Council 2013, Ward & Barker 2013), but they do not seem a sufficient basis for heralding a sea-change in our data environment, since the race between data set size and computer capabilities goes back to the advent of computing. The National Institute of Standards and Technology has more usefully proposed that “fundamentally, the Big Data paradigm is a shift in data system architectures from monolithic systems with vertical scaling (i.e., adding more power, such as faster processors or disks, to existing machines) into a parallelized, ‘horizontally scaled’, system (i.e., adding more machines to the available collection in order to deal with volume, variety, and velocity) that uses a loosely coupled set of resources in parallel” (NIST 2015, p. 5). But the statistician David Donoho (2017, p. 747) objects that “the new skills attracting so much media attention are not skills for better solving the real problems of inference from data; they are coping skills for dealing with organizational artifacts of large-scale cluster computing.” We also do not know whether this new architecture is permanent or transient.
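
The contrast NIST draws between vertical and horizontal scaling can be illustrated with a toy word count that splits records across workers, lets each worker process only its own partition, and then merges the partial results. This is a sketch under stated assumptions, not NIST's reference architecture: threads stand in for separate machines, and the documents and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def shard(records, n_workers):
    """Split records into n_workers partitions by striding through the list."""
    return [records[i::n_workers] for i in range(n_workers)]

def count_words(partition):
    """Map step: one worker counts words in its own partition only."""
    counts = Counter()
    for text in partition:
        counts.update(text.lower().split())
    return counts

def horizontal_word_count(records, n_workers=4):
    """Run the map step on every partition in parallel, then merge (reduce)."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, shard(records, n_workers)))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

# Hypothetical input; in a real cluster each partition would live on its own machine.
documents = [
    "big data and data science",
    "data science programs",
    "big data buzzwords",
]
totals = horizontal_word_count(documents, n_workers=2)
print(totals.most_common(3))
```

Adding capacity here means raising `n_workers` (more machines) rather than making a single worker faster, which is the shift the NIST definition describes. The bookkeeping in `shard` and the merge loop is also an example of the coordination overhead that Donoho calls "coping skills."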

Beyond the sheer amount of data, the truly distinguishing features of the big-data revolution are the new technologies for recording, connecting, networking, and creating information. Human interactions through phone calls, email, texts, tweets, social media posts, and other technological methods are now digitally recorded, time- and location-stamped, and attributable to nodes in networks in ways that go far beyond the much more ephemeral media of the past. Many business, governmental, social, and scientific tasks now have digital trails, such as FedEx tracking services, Web searches and purchases, parking meter payments, automobile trips, tax payments, photographs of social gatherings, weather and environmental measurements, digital images from microscopes and telescopes, and much more. When these are combined with the facts that the World Wide Web is an excellent site for social networks and accessing information and that computers can now author information and interact with us (perhaps even producing artificial intelligence and autonomous robot-like entities and virtual realities), the impression is not merely of big data but of immersive data that surround us in our daily lives. The “decentralization of data” identified by NIST may also be more than just a set of techniques for dealing with large computing problems, but the future shape of computing and the internet is still not clear. Consequently, the real impact of the big-data revolution is not so much the amount of data as a change in our cognitive environment (Lugmayr et al. 2016, Neumann 2016, Schroeder 2018) that requires new perspectives to deal with datafication, connectedness, networking, and computer authoring. These phenomena stem from the invention of new technologies including innovative methods in data science.

Data Science

Big data's companion idea, data science, relies less on the scale of the data than on a definition of a way to discover new knowledge in an age when data have proliferated and cry out for analysis. In 2001, the statistician William S. Cleveland put forth a plan to “enlarge the major areas of technical work in the field of statistics” by providing more resources for “computing with data” (Cleveland 2001, pp. 21, 22) and to call the new field “data science.” In an address to the Computer Science and Telecommunications Board of the National Research Council in 2007, computer scientist Jim Gray advocated for “data-driven science” as a new scientific paradigm that uses large collections of data to make scientific discoveries. Gray (2009, p. xxv) proposed that there was a “need for tools to help scientists capture their data, curate it, and then visualize it,” and that the goal was to “unify all the scientific data with all the literature to create a world in which the data and the literature interoperate with each other.”

Starting from these ideas, NIST (2015, p. 7) describes data science as “the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formulation and hypothesis testing.” One well-known Venn diagram (Conway 2013) places data science at the intersection of three areas: computer programming skills, mathematics and statistics knowledge, and substantive expertise in a field of research. The diagram includes machine learning as an important aspect of data science because machine learning deals directly with data and discovers patterns within it. No doubt the surprising success of machine learning (especially deep learning) in making predictions is one reason for the popularity of data science, but we do not know why deep learning works so well (Knight 2017). This raises a question confronted later in this article: How much do we have to understand about the model underlying a method's predictions to feel comfortable with that method? The question reflects long-standing concerns with causality versus correlation, experimental versus observational data, structural equation models versus reduced forms, and explanation versus prediction.

But these characterizations of data science are not entirely new either. In a famous article in 1962, the statistician John Tukey averred that perhaps he was not a statistician because “I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data” (Tukey 1962, p. 2). Tukey's impact on statistics has been immense (Statistical Science 2003), and his concept of data analysis covers much of the same ground as data science.

Statistician David Donoho (2017, p. 748) argues that “today's popular media tropes about data science do not withstand even basic scrutiny,” but building upon Tukey's work “there is a solid case for some entity called ‘data science’ to be created….” Donoho (2017, p. 755) proposes that data science should encompass six activities to which I've added one more in Table 1, where I have also added examples.

Table 1

The seven activities of data science

Judging from Table 1, data science borrows methods and techniques that go beyond the traditional core of statistics, which is largely encompassed in item 4, “data modeling.” Techniques of data gathering and preparation are typically taught in subject matter disciplines even though statistics started as an endeavor to collect data on the state and its people through censuses and surveys. Computer science and other academic departments deal with data representation and transformation and with computation. Data visualization and presentation often involve media laboratories and psychology departments. Data archiving, indexing, and availability form the core of work in schools of library science and their modern incarnations as schools of information. In one subject matter area, bioinformatics, more than 100 colleges and universities now offer programs that focus on these tasks, and there are a few digital humanities, social sciences, and environmental science programs. But at the moment it seems that the most popular way to move forward in this area is to create “data science” programs, including computer science, information, and statistics, which allow for relationships with subject matter disciplines. The unsolved problem is how the applied data science being done in these disciplines can be incorporated into data science programs. For example, in addition to benefiting from using data science and big data, the social sciences can provide fundamental help in understanding the social construction and meaning of data, the causal impact of new information technologies, the ethical issues of privacy and data ownership, and the best ways for social institutions to use cyberinfrastructure (Berman & Brady 2005). Data science must encompass these issues.

However universities organize themselves to deal with these seven tasks, the following seems clear to me. The explosion in the number of methods and techniques for undertaking the tasks means that universities need to bring together the people working on them to learn from one another and to teach the next generation of students and scholars how to use them. There must also be some way to help scholars, either through collaboration with other scholars or by having specialists akin to collections specialists in libraries or museums, to use the many kinds of data, software, and techniques that are now available. Gone are the days when someone could learn, as I did, about a few kinds of data collection (e.g., surveys, content analysis, and administrative data), some FORTRAN and subroutine libraries such as NAG and IMSL, a bit about dBase and SQL, some statistics through econometrics and psychometrics, some statistics packages such as BMDP-SPSS-SAS-STATA-GAUSS, a computational program such as MATLAB, and a few other things and be at the forefront of data science in their discipline. There is just too much to be learned.

Real Phenomena, Inadequate Language

Many of the developments related to big data and data science are not new, but they have achieved a scale and level of impact that require new ways of describing them. The right language is hard to find.

The nineteenth century's transportation revolution was not just about the steam engine—it also involved the discovery of new forms of energy (oil and electricity); the invention of new kinds of motors (internal combustion and electrical); the creation of networks composed of rails, roads, and rivers; and even the development of new social norms such as standard time zones. Similarly, the information revolution is more than just computers or any other single thing. It also involves sensors, databases, programming languages, artificial intelligence, telecommunications, machine learning, social media, the internet, and many other inventions. Neither “big data” nor “data science” nor any other labels encompass all these innovations. The term cyberinfrastructure might have been useful, but it has not caught on. One leading data science scholar (Jordan 2018) argues for the use of the term “intelligent infrastructure,” which is broader than “artificial intelligence,” but it also has its limitations. We are left with real phenomena but inadequate language.

SOCIETAL AND POLITICAL CHANGE FROM BIG DATA AND DATA SCIENCE

Many authors have provided overviews of areas affected by big data (Chen et al. 2012, Cukier & Mayer-Schoenberger 2013, Mayer-Schönberger & Cukier 2014, Mosco 2014, Evans 2018). This article cannot provide an exhaustive review of the possible societal impacts of big data and data science, but I list a few prominent examples to show that they deserve more scrutiny by political scientists. I have chosen cyberwarfare and homeland security, smart cities, medicine, the media, and robotics.

Several recent books propose that cyberwarfare exists and that it threatens international security (e.g., Clarke & Knake 2011, Kaplan 2017), but skeptics (Rid 2012, Libicki 2014) argue that while cyber disruptions may be a problem, they do not constitute classical warfare like the Japanese attack on Pearl Harbor, which involved a purposeful and publicly claimed act of violence for political advantage. Some leading examples of cyberwarfare—such as the Stuxnet virus's introduction into Iranian centrifuges, which destroyed an essential part of Iran's nuclear fuels enrichment program, or the massive denial-of-service attack (presumably by Russian hackers) on Estonia in April 2007—were almost surely purposeful, but at most they caused lost productivity and perhaps property damage. Most importantly, no state claimed responsibility in order to achieve direct political advantage. Although the case for cyberwarfare may be weak, the Web has certainly been used for “sabotage, espionage, and subversion” (Rid 2012, p. 5), as recent events involving Russia and the 2016 American election make clear (Sanger 2018, Jamieson 2018). Moreover, the American military is collecting and processing a flood of sensor and digital information (Porche et al. 2014), which could change the face of conflict (Dunlap 2014). Obviously, these developments get at the heart of political science studies of international relations and security.

“Smart Cities” is a popular book title with subtitles such as “Big Data, Civic Hackers, and the Quest for a New Utopia,” “A Spatialised Intelligence” and “The Internet of Things, People, and Systems” (Townsend 2013, Picon 2015, Dustdar et al. 2017). Three streams of big-data work come together in this area. First, there are large, digitized administrative data sets on people and their relationship to schools, social welfare agencies (Brady et al. 2001), medical care, and police, and there are similar data sets on physical structures and their relationship to streets, services, land use, and zoning. Second, the reduced costs of sensors, wireless networks, and video cameras, combined with the ability to connect them with an “internet of things,” make it possible to monitor and sometimes remotely control air pollution, traffic, parking, usage of electricity and water, utilities, safety, police and firefighter deployments, and many other aspects of a modern city. Third, internet data such as Google Street View, Zillow, Airbnb, or Yelp can provide information about businesses, real estate, and the physical condition of the city (Glaeser et al. 2018). These data can be linked by geo-coding the location of each person's house (or place of work), each structure or business, and each sensor. Increasingly, we can go farther and link data through recognition of vehicles, faces, or radio frequency identification tags, which makes it possible to track movements throughout the city (Hashem et al. 2016).

Using these data, the city and its operations can be described, managed, and evaluated. Maps of traffic, air pollution, or poverty can provide useful descriptions for those trying to understand where to live, where to travel, or what to do. Conditions can be managed and improved in real time by involving citizens in constant feedback on services, changing the timing of traffic lights, deploying police to areas with disturbances, asking industries to “spare the air” by reducing some activities, and so forth. Finally, evaluation results can indicate what is working and what is not so that processes can be improved.

Because the decisions about what data are collected, how they are processed, and how they are used all involve choices, often influenced by who has power and who does not, these systems are inherently political. They can easily become technocratic, overly influenced by corporate interests, and perhaps most alarmingly, the basis for the “panoptic” city—the urban counterpart of Jeremy Bentham's circular Panopticon, a prison in which all inmates were constantly visible to a centrally located guard station (Kitchin 2014).

Precision medicine, according to a 2011 report by the National Research Council of the National Academy of Sciences, is “the tailoring of medical treatment to the individual characteristics of each patient” (National Research Council 2011, p. 125). To practice precision medicine, a physician would combine information about the individual with medical knowledge about how people vary in their response to illnesses and treatments (Dzau & Ginsburg 2016). Individual information would come from electronic medical records and genomic data. The 2011 report suggested creating a new taxonomy of human disease based on molecular biology that would serve as the basis for classifying diseases and people's reactions to them. To do this, an “information commons” would be created that linked molecular data, medical histories, and health outcomes (Beachy et al. 2015), and these data would be used to explore clinical associations (Hanauer et al. 2009, Miller 2012). These data could be a great boon to medical researchers, but they raise significant questions about privacy, ownership of data, and their relationship to issues such as race in America (Hochschild & Sen 2015) that could become high-profile political issues.

Changes in the media from the rise of the internet are now manifestly important for politics, but political scientists have lagged in their awareness of them. In 2002, in the first examination of the mass media in the Annual Review of Political Science, Schudson (2002, p. 249) quite properly takes political science to task because it “has never extended to the news media the lovingly detailed attention it has lavished on legislatures, parties, presidents, and prime ministers.” Yet he does not even mention the internet or World Wide Web. He focuses on the relative merits of state- versus commercial-controlled media, journalism as “the story of the interaction of reporters and government officials” (Schudson 2002, p. 255), and the cultural norms that shape coverage of topics such as homosexuality and crime. He concludes, “The news media have always been a more important forum for communication among elites (and some elites more than others) than with the general population” (Schudson 2002, p. 263), with never a hint of the anarchy of uncontrolled news sources and direct leader–follower communications now bedeviling a world with Facebook, Google, and Twitter.

Ten years later, Farrell's (2012) Annual Review of Political Science article recognizes the potential importance of the internet for exacerbating political polarization or facilitating the Arab Spring, and he argues that the internet could sort citizens into homogeneous groups seeking information to confirm their ideological biases, discourage preference falsification in authoritarian regimes by making available a broader array of opinions, and overcome the costs of collective action by allowing like-minded and politically intense people to find one another. Although Prior still concludes, in his 2013 Annual Review of Political Science article titled “Media and Political Polarization,” that “[i]nternet use shows few signs of ideological segregation” (Prior 2013, p. 122), he takes the internet seriously. And communications theorists such as Bennett & Segerberg (2012), Neumann (2016), and Schroeder (2018) argue for developing new models to understand the new media on the internet. Among other things, these theories must explain how people seek out and obtain information, since this is such a big part of what people have been enabled to do on the internet.

These four examples illustrate the kinds of questions that political scientists might ask about the impacts of big data and data science. In Seeing Like a State, Scott (1999) chronicled how states have misused census and other information. What will it mean when societies, businesses, and governments have access to large data sets about their populations that go far beyond a census? Who will own these data? Who will define what data get collected and used? What happens when news and information (e.g., blogs, cellphone videos) can be authored and disseminated without the editing power of peer reviews, journalistic norms, and a concern for their context and veracity? What new problems are created when information can be hacked and digital systems are vulnerable to viruses? When medical diagnoses or city operations depend on algorithms that sometimes fail? What biases will be baked into the algorithms? How can people be brought into the systems at the right places to ensure their participation, their rights, and their welfare?

One final example is worth exploring, although it seems the work of science fiction. As robots get better at sensing the world, as they learn the rudiments of pattern recognition if not full cognition, as they become adept at speech recognition and talking, as they can communicate with each other and with us through wireless networks and the cloud, and as they become embodied in autonomous machines with their own lightweight power sources, to what degree do they acquire rights and responsibilities (Pratt 2015)? If robots replace people at their jobs, what is left for people to do? And if a great deal of wealth is embodied in robots, who owns the robots and who gets the return to their effort (Albus 1984)? Already some authors are proposing universal basic incomes (Manjoo 2016) and guaranteed jobs (Tankersley 2018) to deal with the possibility of job loss due to robots. What kinds of political problems does this raise, or was a 1962 article right to conclude, “Artificial intelligence is neither a myth nor a threat to man” (Samuel 1962)?

INCREASING AMOUNTS OF DATA AVAILABLE TO ALL SCIENTISTS, INCLUDING POLITICAL SCIENTISTS

In a 2015 report, NIST surveyed 51 cases of uses of big data involving government and commercial operations, defense, health care and life sciences, social media, astronomy and physics, earth and environmental science, and energy. Every area involved producing or analyzing many terabytes of data and about one-third of them involved petabytes of data (NIST 2015, pp. 6–45, Appendix B)—sometimes petabytes per year. Scientists are now generating data at a prodigious rate in research involving every physical scale from the subatomic to the cosmic: analyzing the subatomic structure of matter in CERN's Large Hadron Collider, investigating the atomic and chemical structure of materials through intense X-ray and other light sources and through mathematical simulations that start from basic physical principles, sequencing DNA and mapping proteins rapidly and completely, using real-time three-dimensional microscopy of cells at many different wavelengths to understand their operations, scanning animal and human brains and bodies using functional magnetic resonance imaging (fMRI), monitoring the environmental conditions of cities and regions using multiple methods (fixed sensors, radar, and satellite imaging), and undertaking telescopic surveys of the solar system and the universe at multiple wavelengths and in real time. Some of these data sets could be useful to political scientists, such as fMRI data for those studying political psychology (Theodoridis & Nelson 2012) or satellite sensor data for those studying the impacts of climate change on politics (Hsiang et al. 2013).

Social scientists have benefited from many new data sources as well. As of roughly 1980, political scientists had available a limited number of data sets, mostly about the United States but also about other countries: historical election statistics, usually by county but in a few cases by precinct; surveys from the 1930s onwards; census data; Federal Election Commission (FEC) data on political contributions; roll-call data from legislatures and the United Nations; and data from the Correlates of War Project, the World Handbook of Political and Social Indicators, and a few other sources. In the past 30 years, the volume and variety of data have increased enormously beyond these areas, especially thanks to administrative data, internet data, textual data, and sensor-audio-video data.

Administrative Data

Before surveys, political scientists interested in voting used turnout and voting data aggregated by precincts, counties, and states. Recently there has been a return to this kind of data, but often disaggregated in the form of voter registration lists from administrative data. These lists do not report election choices, but they are the official record of turnout and in some states they include political party registration. Brady & McNulty (2011) geo-code the addresses and precinct locations of millions of registered voters in Los Angeles to take advantage of a natural experiment in 2003 where the number of precincts was reduced by two-thirds for the state-wide recall election. They show that changes in polling place location alone had a significant impact on turnout (a few percentage points) and that increased distance to polling place further decreased voting. Using voting records over time (from 1998 to 2012) and data on the residential addresses of 9/11 victims, Hersh (2013) shows that the families and neighbors of these victims voted at significantly higher rates (a few percentage points) after the event than carefully constructed control groups, and they changed their party identification toward the Republican party. Using voter registration files for the city of Chicago, Enos (2016) examines the impact of perceived racial threat on voter turnout by using a natural experiment in which public housing buildings with over 25,000 African American residents were demolished. He categorizes each voter's race using a Bayesian classifier based on the voter's name, location, and related census data. He finds that white voters' turnout decreased by 10 percentage points after the exit of their African American neighbors presumably reduced their perceived sense of threat. Ansolabehere & Hersh (2012) use 50-state voter registration records from a commercial firm, Catalist, LLC, to match individuals interviewed in the 2008 Cooperative Congressional Election Survey to their voting records to determine the correlates of vote misreporting. They describe methods for ensuring the quality of matches and the quality of registration lists, and they find that the correlation between basic socioeconomic characteristics and voting is lower for validated voters than for self-reported voters.

The role of ideology and money in politics has been a long-standing concern of political scientists. Bonica (2013) starts with the classic FEC political-contributions data for the 1980–2010 congressional election cycles and develops a generalized item-response theory count model to estimate an ideal point model of the ideology of candidates and Political Action Committees that contribute money. In order to obtain usable results, he restricts the sample “to candidates who received money from 30 or more unique contributors and contributors that give to 30 or more unique candidates” (Bonica 2013, p. 298). The technique provides estimates for first-time candidates who have no roll-call records from which to estimate their political positions, and the author shows that using his ideological estimates for candidates provides only “a negligible reduction in predictive power of legislative voting behavior” (Bonica 2013, p. 308) compared to roll-call votes. In other papers he connects these data with contributions by doctors (Bonica et al. 2014) and lawyers (Bonica et al. 2016) by linking the contributions data set to listings of these professionals. He describes a massive database that uses candidate names as a key to combine campaign contribution data, legislative voting and bill sponsorship data, election data, and text “from bills and amendments, floor debates, candidate websites, and social media” (Bonica 2016, p. 14). This information is combined to get candidate ideology scores, and it can be used to study the impact of money in politics. In addition, Bonica (2016, p. 18) develops a three-stage process “for measuring preferences and expressed priorities across issue dimensions that combines topic modeling, ideal point estimation, and machine learning methods.” The topic model organizes the text into issue categories by using automated statistical methods described in more detail below.

Using lobbying reports available under the Lobbying Disclosure Act of 1995, Kim (2017) identifies firms that lobby on trade policy, and he links this information, using the names of firms, with databases such as Compustat and Orbis on the characteristics of firms. He adds to this all bills in Congress that had been lobbied, and information about tariffs and trade (Kim 2017, p. 10). By focusing on firms instead of industries, Kim shows that lobbying is firm-specific. In a related paper, lobbying data are combined with sponsorship data on congressional bills to show that, unlike electoral politics networks structured according to ideology, there are distinct “political communities in the lobbying network, which is organized according to industry interests and jurisdictional committee memberships” (Kim & Kunisky 2018).

Recent controversies over police behavior have led to major efforts to collect data on police stops (Pierson et al. 2017) and police use of force (Goff et al. 2016). Each study involves substantial linking across jurisdictions with idiosyncratic formats and definitions of variables. Both conclude that there are substantial racial disparities even after controlling for many relevant features of police encounters.

These examples illustrate several important features of studies using administrative data. Large-scale administrative data sets on voting, lobbying, campaign contributions, trade, tax, welfare, police reports, 311 calls, and many other areas often provide the (legally) definitive data on these activities, but the data sets can contain errors (Luks & Brady 2003). Moreover, in order to get a data set that represents different areas and that has enough cases for analysis, studies often require extensive linking of more people, organizations, or events across jurisdictions. Extensive linking often requires dealing with the problems of combining data with different formats and variables.

These administrative data studies also benefit from intensive linking, in which more data about individual people, organizations, or events are added, as in the work by Bonica and Kim. Brady et al. (2001, p. 226) show how state governments have greatly increased the value of their social program databases by linking across eight programmatic areas including Medicaid, foster care, food stamps, welfare, and other areas. Even with this linking, however, these data often lack useful ancillary information—unlike surveys, they do not automatically collect lists of socioeconomic characteristics such as education, income, age, and so forth on people or financial and historical information on firms or organizations. Moreover, even when this information is collected, it may be of low quality unless it is an essential part of the business purpose of the program (e.g., for welfare programs, income data are reliable because they are part of the application process, but education data are not). Intensive linking to other data sets can often expand their utility tremendously, but these matches are often precarious given the complexity of names, places, and other identifying information. Linkages using probabilistic matching techniques or geo-coding can help facilitate this process, but they still involve elements of uncertainty and incompleteness.
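To make the idea of uncertain name-based linkage concrete, the following is a minimal sketch in Python of fuzzy matching between two lists of names. It is not the procedure used in any of the studies cited above. The names, the normalization rules, and the 0.85 threshold are hypothetical choices made for illustration; the similarity score comes from the standard library's difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and sort the name tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(records_a, records_b, threshold=0.85):
    """For each name in records_a, return its best candidate in records_b,
    or None when the best score falls below the (hypothetical) threshold."""
    links = []
    for a in records_a:
        best = max(records_b, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        links.append((a, best if score >= threshold else None, score))
    return links


# Hypothetical records from two administrative files.
voters = ["Smith, John A.", "Maria Garcia-Lopez", "Pat O'Neil"]
donors = ["john a smith", "Maria Garcia Lopez", "Patricia Oneal", "Jon Smyth"]

links = link(voters, donors)
for voter, match, score in links:
    print(f"{voter!r} -> {match!r} (score {score:.2f})")
```

The third record is left unmatched because its best candidate scores below the threshold. Raising or lowering the threshold trades missed links against false links, which is the element of uncertainty and incompleteness noted above.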

Administrative databases are also often better at providing samples of people who do or encounter things than at portraying the complete universe of those who might have done things. For example, data on police traffic stops tell us who was stopped but not who should have been stopped. Campaign contribution data tell us who gave money, but we know only the value of the numerator in the ratio of those who gave to those who could have given. One approach is to link these data to population data, such as census data or motor vehicle license data, but these linkages can present legal and practical problems (Brady et al. 2001), and they also may not give the best denominator data; for example, in the police-stops example, we want the number of people in each group who should have been stopped given their behavior, not the number of people in each group who drive.
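A small numerical sketch shows why the choice of denominator matters. All counts below are invented for illustration and are not drawn from any study. The third dictionary stands for the quantity that is usually unobserved: the number of people in each group whose behavior warranted a stop.

```python
# Hypothetical counts for two groups; none of these numbers are real data.
stops = {"group_a": 300, "group_b": 300}
licensed_drivers = {"group_a": 10_000, "group_b": 5_000}
warranted_stops = {"group_a": 1_500, "group_b": 1_500}  # usually unobserved


def rate_ratio(numerator, denominator, group, reference):
    """Ratio of the group's rate to the reference group's rate."""
    group_rate = numerator[group] / denominator[group]
    reference_rate = numerator[reference] / denominator[reference]
    return group_rate / reference_rate


per_driver = rate_ratio(stops, licensed_drivers, "group_b", "group_a")
per_warranted = rate_ratio(stops, warranted_stops, "group_b", "group_a")

print(f"Relative stop rate per licensed driver: {per_driver:.2f}")
print(f"Relative stop rate per warranted stop:  {per_warranted:.2f}")
```

With the same numerator, the two denominators give different answers about disparity. In practice the second denominator is not available, so the conclusion rests on how well the available denominator approximates it.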

Internet Data

Using proprietary data on over six million Facebook users who had two or more “likes” for 1,223 official political pages representing political candidates, Bond & Messing (2015) estimate candidate and individual ideologies. Because the average number of likes is slightly over three, the matrix of candidates by people is very sparse except for some rows (e.g., Barack Obama's and Mitt Romney's), necessitating steps to adjust for different base frequencies for liking candidates. For those candidates for whom there is an independent measure of ideology from Congress's roll-call data, the correlation between the two measures of ideology was 0.47 for Democrats and 0.42 for Republicans (Bond & Messing 2015, p. 68). Similarly, with a data set of Twitter users from six countries, Barberá (2015) identifies Twitter followers of three or more political actors and uses ideal-point estimation methods to recover the ideologies of the politicians and the Twitter users. Employing various sources of baseline data for each group, he finds evidence that validates these measures. He also finds evidence for political polarization among these Twitter users.
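The logic of recovering a latent dimension from a matrix of likes or follows can be sketched with a much simpler technique than the ones these authors use. The example below is an assumption-laden stand-in, not the method of Bond & Messing (2015) or Barberá (2015): it double-centers a tiny hypothetical users-by-pages matrix to remove base frequencies and then extracts the leading singular dimension by power iteration.

```python
def double_center(matrix):
    """Subtract row and column means (adding back the grand mean) so that
    popular pages and very active users do not dominate the first dimension."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def leading_dimension(matrix, iterations=100):
    """Power iteration for the first singular dimension.
    Returns (row_scores, column_scores); the overall sign is arbitrary."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] + [0.0] * (n_cols - 1)
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return u, v


# Hypothetical data: rows are users, columns are four political pages.
# Pages 0 and 1 belong to one camp and pages 2 and 3 to the other.
likes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

user_scores, page_scores = leading_dimension(double_center(likes))
print("page scores:", [round(s, 2) for s in page_scores])
print("user scores:", [round(s, 2) for s in user_scores])
```

In this toy case the two camps of pages fall on opposite sides of zero, and users are placed near the pages they like. With real data the matrix is far sparser, which is why the published studies need more careful models and validation against independent measures.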

The Web makes it possible to follow events through time. Tinati et al. (2014) develop a tool for following Twitter information flows and network formation over time, and they apply it to a protest of university tuition fees in England in November 2011. They show how networks grow through retweets and that a small number of people are key players. Gomez-Rodriguez et al. (2012) show how information diffuses in 170 million blogs and news articles over a one-year period by developing an algorithm to infer networks of influence and diffusion. They show that the algorithm recovers the structure of simulated data, and it appears to work well with real data. News topics and memes can also be tracked on the Web to characterize a news cycle. By tracking 1.6 million media sites with 90 million articles over three months in 2008 (August–October), Leskovec et al. (2009) find that phrases come and go over 24 hours and that blogs pick up phrases with an average lag of 2.5 hours. Two mechanisms explain much of the up-and-down dynamics: imitation, in which memes persist because sources imitate other sources, and recency, in which older memes are extinguished because new phrases are preferred.

Using Facebook data, Bond et al. (2012) study whether social networks can affect behavior. They randomly assigned encouragements to vote and information about the person's polling place to millions of people on the day of the 2010 midterm election. The “social message group” of 60 million people were also shown up to six faces of their friends who had reported on Facebook that they had voted that day. The “informational message group” of over 600,000 people received only the encouragement to vote and information about their polling place. The “control group” did not receive any message. Those in the social message group were two percentage points more likely to say that they had voted than those in the informational message group, and other significant effects were found.
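The comparison between randomly assigned groups reduces to a difference in proportions. The sketch below uses made-up counts, chosen only so that the difference is two percentage points; they are not the figures from Bond et al. (2012), and the normal-approximation interval is a textbook choice rather than the authors' analysis.

```python
from math import sqrt


def diff_in_proportions(yes_treat, n_treat, yes_control, n_control, z=1.96):
    """Difference in proportions with a normal-approximation 95% interval."""
    p_treat = yes_treat / n_treat
    p_control = yes_control / n_control
    diff = p_treat - p_control
    se = sqrt(p_treat * (1 - p_treat) / n_treat
              + p_control * (1 - p_control) / n_control)
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical counts of people reporting that they voted.
diff, se, interval = diff_in_proportions(
    yes_treat=12_000, n_treat=60_000,
    yes_control=1_080, n_control=6_000,
)
print(f"difference = {diff:.3f}, se = {se:.4f}")
print(f"95% interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Because assignment is random, the difference can be read as the effect of the message. The precision of the estimate is governed mostly by the smaller group, which is why even very large experiments keep their comparison groups sizable.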

King et al. (2013) study the motivation of Chinese internet censorship by following the fate of blog posts over time. By comparing the content of those that were censored versus those that were not, they conclude that “the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content” and that “posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored” (King et al. 2013, p. 326). The study is notable for its real-time effort to locate blogs before they were censored (which typically occurred within one day) and its use of automated content analysis methods to analyze the blogs.

To estimate how racial animus affected the vote for Barack Obama in 2008, Stephens-Davidowitz (2014) calculates for media markets the fraction of Google searches that use a well-known derogatory term for African Americans. He finds that racial animus cost Obama roughly 4% of the national popular vote. His paper provides numerous checks on the validity and reliability of his measures.

In addition to sharing many of the same problems as administrative data, internet data are typically highly selective in terms of socioeconomic characteristics (especially by having more young people, although older people are catching up), and they often depend on people's involvement with platforms such as Facebook, Twitter, or Google. Moreover, this involvement is enmeshed with constant efforts by the companies running these platforms to encourage participation, which can lead to subtle selection effects that may mislead the researcher (Lazer et al. 2014). Absence of data is also a problem, as in the studies estimating ideology using Facebook and Twitter data. The compensating merits are that internet data often provide fascinating network data that would otherwise be unavailable; events can be studied as they unfold in real time; and hidden information on behaviors (such as searches about culturally disapproved themes) can be revealed. Nagler & Tucker (2015) discuss what can be learned from Twitter.

Textual Data

Those of us who have put together teams of students to do content analysis of texts know how time consuming and error prone the process can be. Automated methods promise greater efficiency, increased replicability, and perhaps less error-prone coding. Textual data provide an element often missing in our analysis of politics: the words of citizens and politicians. For example, political scientists study the personal vote, in which citizens support politicians in exchange for government money spent in their districts. But how do citizens know about these expenditures? Grimmer et al. (2012) identify the missing ingredient, which is legislators' statements to their constituents. By analyzing all 170,000 US House of Representatives press releases issued between 2005 and 2010 and coding them into five categories that measure two kinds of credit-claiming and three kinds of non-credit-claiming behavior, they find that constituents are more responsive to the total number of messages they receive than to the amount of in-district expenditure claimed. To analyze this large corpus of material, they used a supervised learning algorithm (Hopkins & King 2010) that requires a set of hand-coded documents that can be used to “train” the method.
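The role of hand-coded training documents can be illustrated with a small supervised classifier. The sketch below is a multinomial naive Bayes classifier with add-one smoothing, which is a different and simpler technique than the Hopkins & King (2010) method used by Grimmer et al. (2012). The six training sentences and the two labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from hand-coded (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior; words never seen
    in training are ignored."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:
                score += log((word_counts[label][word] + 1)
                             / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded press-release snippets.
training = [
    ("secured funding for the new bridge", "credit"),
    ("announced a grant for local schools", "credit"),
    ("delivered federal money to repair the highway", "credit"),
    ("statement on the budget vote", "other"),
    ("criticized the administration on foreign policy", "other"),
    ("honored veterans at a memorial ceremony", "other"),
]
model = train(training)

print(classify("announced funding to repair local roads", model))
print(classify("statement on foreign policy vote", model))
```

A real application would need thousands of hand-coded documents, held-out data to check accuracy, and attention to whether the goal is to label individual documents or to estimate the share of documents in each category.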

Wilkerson & Casas (2017) and Grimmer & Stewart (2013) provide excellent overviews of the profusion of content analysis methods developed in the last 15 years. Two other articles explore how these methods can be used to study culture (Bail 2014) and to improve the practice of qualitative research (Wiedemann 2013). The methods include the search for particular words or phrases (e.g., Stephens-Davidowitz 2014, Leskovec et al. 2009); the determination of what fractions of text fit into predetermined categories (e.g., King et al. 2013, Grimmer et al. 2012); the classification of each text into predetermined categories using supervised learning; the classification of text into unknown categories using unsupervised clustering methods; and the ideological scaling of political texts such as party platforms (Laver et al. 2003).

These methods require careful use. Grimmer & Stewart (2013) advise, “all quantitative models of language are wrong—but some are useful” (p. 269), and “quantitative methods augment humans, not replace them” (p. 270), so “validate, validate, validate” (p. 271). In addition, the more the methods are automated or unsupervised, the more they typically use complex statistical methods: mixture models with many local minima, in which one cannot guarantee a globally correct solution; lasso or ridge regression, which strive for simplicity that might underfit the data; and models with many parameters that often try to estimate values for each document with small amounts of data. To perform these tasks, they often use estimation methods such as the expectation maximization (EM) algorithm or Bayesian Markov chain Monte Carlo (MCMC) that take a long time to converge and can be tricky to use (see Roberts et al. 2014). Despite all these complexities, the methods can accomplish tasks that could not be done with typical budgets and research teams. Text reduction and analysis have progressed to a point where quantifying large bodies of text is possible. Arguably, these methods improve on human coding if suitable precautions are taken to check the results with human coders and to recognize the limitations of the analysis.

Sensor, Audio, Video, and Other Data

Hsiang et al. (2011) connect sensor data (from gauges and satellite observations) on temperature and rainfall with information on conflict from the “Onset and Duration of Intrastate Conflict” data set to study the impact of weather on civil conflicts. They use the El Niño/Southern Oscillation (ENSO) in weather to identify their model, and they find that the probability of new civil conflicts doubles during El Niño years. The supplementary materials describe the complexities of linking geo-coded sensor data to the boundaries of individual countries over time.

Jennifer Eberhardt and her colleagues use body camera data from stops by police officers in Oakland, California, to uncover racial disparities in officer respect. Starting from human transcriptions and coding of the audio portion of these data, they develop machine learning methods for studying the degree of respect exhibited in the text of police utterances toward people they have stopped. They note: “Future research could expand body camera analysis beyond text to include information from the audio such as speech intonation and emotional prosody, and video, such as the citizen's facial expressions and body movement, offering even more insight into how interactions progress and can sometimes go awry” (Voigt et al. 2017, p. 6525).

These examples demonstrate the power of linking sensor, audio, video, and other kinds of data to events, but they also reveal the substantial processing that must be done to use them correctly. Moreover, they suggest that we still need to improve our ability to transform these data into usable forms for our research given, for example, the complexities of facial expressions or body language in a video and the modifiable areal unit problem in geography, which stems from the difficulty of matching geo-coded point-based measures from sensors to different geographic entities such as cities, counties, states, or nations.
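The basic operation of assigning a geo-coded point to a geographic entity can be sketched with the standard ray-casting test for whether a point lies inside a polygon. The square "counties" and sensor coordinates below are hypothetical, and real boundaries would come from shapefiles handled by a GIS library rather than hand-typed vertex lists.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from
    (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign(points, regions):
    """Map each point id to the first region containing it, or None."""
    assignments = {}
    for point_id, (x, y) in points.items():
        assignments[point_id] = next(
            (name for name, polygon in regions.items()
             if point_in_polygon(x, y, polygon)),
            None,
        )
    return assignments


# Hypothetical regions (lists of vertices) and sensor locations.
regions = {
    "county_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "county_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
sensors = {"s1": (1.0, 1.0), "s2": (6.0, 2.5), "s3": (9.0, 1.0)}

assignments = assign(sensors, regions)
print(assignments)
```

This step only answers which unit a point falls in. It does not address the modifiable areal unit problem itself, since aggregating the same points to cities, counties, or states can yield different summaries.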

NEW WAYS POLITICAL SCIENTISTS ORGANIZE THEIR WORK

New Courses

Political science professors must develop new courses and become conversant with the new technologies developed by data scientists. New courses should go in two directions. One course should deal with the societal challenges of big data and what they mean for politics. Mergel (2016) has developed a curriculum for schools of public affairs which contains some pertinent elements, including sections on big data in politics, government, public health, and smart cities, but it does not have a section on the media, and it does not directly focus on the political issues such as data ownership and use, privacy, and loss of jobs that stem from big data.

A second course must teach students data science methods. A check of methods courses taught in political science departments at major universities suggests that this is well under way. These courses include programming in R or Python, an emphasis on resampling approaches to understanding statistics, an overview of the data sources described above, and careful discussions of methods for making predictions and those for inferring causality. Moreover, at least one edited book (Alvarez 2016) summarizes a good selection of relevant topics.
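As one illustration of the resampling approach, the sketch below computes a percentile bootstrap interval for a mean using only Python's standard library. The 40 binary turnout values, the number of resamples, and the seed are hypothetical choices.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement,
    recompute the statistic, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical sample: 1 = voted, 0 = did not vote.
turnout = [1] * 24 + [0] * 16

lower, upper = bootstrap_ci(turnout)
print(f"sample mean = {statistics.mean(turnout):.2f}")
print(f"95% bootstrap interval = ({lower:.2f}, {upper:.2f})")
```

The appeal for teaching is that the interval comes from repeating the analysis on resampled data rather than from a formula, so the same code works for medians, ratios, or other statistics by changing the stat argument.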

Neither of these courses deals with deeper theoretical issues such as how our epistemological and ontological presuppositions might be affected by new methods, the new forms of connectedness in society, and the rise of artificial intelligence. One should be properly skeptical of such grand possibilities, but Rogers (2013), Mayer-Schönberger & Cukier (2014), Mosco (2014), Boullier (2015), and Salganik (2017) provide some food for thought about what will happen when we make “the world self-aware and self-describing” (Evans 2018, p. 141).

New Research Management Methods

A few political scientists working with Google, Facebook, or very large data sets might have to learn about big-data architecture and the new decentralized methods of processing large sets of data such as Hadoop, Hive, NoSQL, and Spark (Varian 2014, Oussous et al. 2018), but for most it would be a waste of time. Instead, political scientists might better focus on new software for data cleaning, data management, reproducible science, life-cycle management of data, and data visualization. Here I briefly discuss data cleaning and reproducible science.

A tweet (@BigDataBorat) parodies the common belief that data cleaning takes up most of the time in research by saying “In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.” Certainly data preparation is tedious and time-consuming (Kandel et al. 2012). DataWrangler (Kandel et al. 2011) displays data in an interactive, spreadsheet-like interface and allows the researcher to make changes to one line of the data that are then reproduced in all other lines, based on the program's inferences about the general transformations that are desired. As the user interacts with the system, it improves its inferences and even suggests further improvements. The system keeps track of what has been done to the data so that the researcher can verify that the transformations were successful. A free version is available as Trifacta Wrangler. Another approach to cleaning data is the Tidyverse, a free collection of R packages that can be used to create a tidy dataset (Wickham 2014).

Reproducible science aims to make it possible for a second investigator to “recreate the final reported results of the project, including key quantitative findings, tables, and figures, given only a set of files and written instructions” (Kitzes et al. 2017, p. 13). Kitzes et al. (2017) exemplify reproducibility through 31 case studies in different scientific areas, including social science, with a focus on data acquisition, data processing, and data analysis. Most of the studies use tools from either Python (17 studies) or R (13) to create a reproducible workflow. Because these tools make it easier to obtain and to recreate research results, because journals are increasingly requiring reproducibility, and because the federal government has been moving toward requiring it for grantees, learning these methods is very worthwhile.

NEW KINDS OF QUESTIONS ASKED BY POLITICAL SCIENTISTS

Where Does Data Science Come From?

Data science methods primarily come from computer science, statistics, and library or information sciences with some roots in the efforts of biologists to model the connections among neurons in the human brain and the work of cognitive scientists (such as polymathic political scientist Herbert Simon) to develop artificial intelligence. The blending of these streams produces confusion because similar methods (e.g., neural nets and logistic regression) have been called by different names in these disciplinary areas, and the use of names such as artificial intelligence or neural nets can lead to the mistaken belief that these methods actually mimic the way the human brain works. In fact, most of the methods can be straightforwardly translated into the language of statistics (Sarle 1994, Warner & Misra 1996), and the connection with human intelligence is more metaphorical than exact. Some of this confusion also comes from the fact that until recently computer scientists were trying to solve pattern recognition problems and to advance predictive machine learning with the fewest errors without much knowledge of or concern with statistical models, while statisticians (especially econometricians and political methodologists) focused on unbiased or consistent estimators of models and hypothesis testing for causal impacts with little concern for prediction or learning. Information scientists were also trying to produce quick and efficient ways to index and access documents and knowledge with an emphasis on prediction and little concern for statistical methods or models.

Because of their emphasis on pattern recognition, computer scientists typically speak of assigning cases to classes based on their features (e.g., predicting whether someone could be classed as a diabetic based on body mass, age, serum insulin), whereas statisticians talk about predicting the value of a dependent variable based on independent variables or predictors, even though they are often dealing with the same problems. Computer scientists talk about activation functions, training sets, and learning, whereas statisticians talk about functional forms, samples, and estimation. In addition, computer scientists talk about supervised and unsupervised learning problems; the former refers to problems where there is information on the relevant classes (e.g., specimens already classified into separate species) and the latter refers to problems without this information. Supervised learning uses methods with a dependent variable such as discriminant analysis or logistic regression, whereas unsupervised learning uses clustering, factor analysis, or multidimensional scaling. Once the newcomer to the field of data science recognizes these differences in nomenclature, books on pattern recognition (Ripley 1995), artificial intelligence (Russell & Norvig 2009), machine learning (Bishop 2011), and statistical learning (Hastie et al. 2016) seem less arcane and more approachable. Newcomers can also benefit from articles that bridge the gaps (Nickerson & Rogers 2014, Varian 2014, Mullainathan & Spiess 2017, Yarkoni & Westfall 2017, Athey 2018).

Increased computing power has also accelerated the development of five innovations. First, the Bayesian paradigm is no longer an outcast in American statistics since the realization that many intractable classical models can be considered Bayesian models with vague priors and that these models can be estimated effectively and efficiently using MCMC and other methods. Second, smoothing or regularizing approaches that require the estimation of nonlinear ridge or lasso regressions or the repeated application of complicated kernel estimation methods have become feasible, providing greater flexibility in model specification. Third, resampling and averaging methods that improve predictions, such as the bootstrap, bagging, boosting, Bayesian model averaging, and random forests, have become commonplace because of computing power that allows repeated estimation using slightly different models or samples. Fourth, the Akaike, Bayesian, and Schwarz information criteria (AIC, BIC, SIC) and methods such as cross-validation are now commonly used to select a parsimonious model. Fifth, computational methods have been developed (e.g., EM and genetic algorithms, MCMC methods, back-propagation) to estimate models with complicated density mixtures, large numbers of parameters, multiple local maxima, and knotty nonlinearities and constraints. These innovations have greatly increased the flexibility and predictive power of statistical models.
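To make the fourth innovation concrete, here is a minimal sketch of model selection by k-fold cross-validation. It is an illustration under stated assumptions, not a method taken from the works cited: the simulated quadratic data, the helper name `cv_mse`, and the choice of polynomial degree as the "model" being selected are all hypothetical.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Average out-of-sample squared error of a polynomial fit, by k-fold CV."""
    rng = np.random.default_rng(seed)          # same seed -> same folds for every degree
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        coefs = np.polyfit(x[train], y[train], degree)   # estimate on k-1 folds
        pred = np.polyval(coefs, x[held_out])            # predict the held-out fold
        errors.append(np.mean((y[held_out] - pred) ** 2))
    return float(np.mean(errors))

# Simulated data (an assumption for illustration): the true curve is quadratic.
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=200)
y = 1 + 2 * x - 3 * x**2 + rng.normal(scale=0.5, size=200)

# Compare candidate models of increasing complexity on held-out data.
scores = {degree: cv_mse(x, y, degree) for degree in range(6)}
best_degree = min(scores, key=scores.get)
for degree, mse in scores.items():
    print(f"degree {degree}: cross-validated MSE = {mse:.3f}")
print("selected degree:", best_degree)
```

Because every candidate is scored on data it was not fitted to, models that are too simple (degrees 0 and 1 here) are penalized for missing the curvature, while needlessly complex ones gain little or nothing.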

One reason data science has become so popular is that one variant of machine learning, called deep learning, has succeeded at difficult pattern recognition tasks such as speech and image recognition, natural language processing, and bioinformatics (LeCun et al. 2015). Deep learning is a variant of the canonical feed-forward neural network, which involves multilayer classifiers that use stacks of logistic or similar regressions (Sarle 1994, Schmidhuber 2015) where the inputs are features of the items that are to be classified. For example, for animals being classified as either dogs or cats, the features might be large or not-large, bark or no-bark, meow or no-meow, docile or not-docile, white or not-white, and tail or no-tail. These features are coded with a one if present and a minus one if not present. Some of these features are more useful for distinguishing between dogs and cats than others. For each animal for which we have data, M weighted linear combinations of these L features (here, L is six) are calculated where the weights reflect the diagnostic value of the features. After each of these combinations is transformed by a sigmoid activation function such as a logistic, it constitutes a hidden-layer variable, also called a neuron. The first hidden layer contains M of these hidden-layer variables employing different weighted linear combinations of the input variables. The results of the hidden-layer variables in this first hidden layer are then either combined into another weighted linear combination and transformed according to the sigmoid function to decide whether the animal is a dog or a cat (with, for example, values near one indicating a dog and values near zero indicating a cat), or a second hidden layer of N variables is created that takes weighted linear combinations of the M hidden-layer variables in the first hidden layer. This process can continue with more and more hidden layers until the final sigmoid function is reached that predicts whether the animal is a dog or cat. The model is evaluated on whether it gets the right answer most of the time.
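The forward pass just described can be sketched in a few lines. This is an illustrative sketch only: the layer sizes M = 4 and N = 3, the random (untrained) weights, and the bias terms are assumptions introduced here, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, hidden_layers, output_layer):
    """Pass one animal's feature vector through the network.

    hidden_layers: list of (W, b) pairs; each row of W is one weighted
    linear combination, i.e., one hidden-layer variable (neuron).
    output_layer: (w, b) for the final sigmoid that scores "dog".
    """
    h = x
    for W, b in hidden_layers:
        h = sigmoid(W @ h + b)        # weighted combinations, then sigmoid
    w_out, b_out = output_layer
    return sigmoid(w_out @ h + b_out)

FEATURES = ["large", "bark", "meow", "docile", "white", "tail"]
L, M, N = len(FEATURES), 4, 3         # M and N are arbitrary illustrative sizes

rng = np.random.default_rng(0)
hidden_layers = [
    (rng.normal(size=(M, L)), np.zeros(M)),   # first hidden layer: M neurons
    (rng.normal(size=(N, M)), np.zeros(N)),   # second hidden layer: N neurons
]
output_layer = (rng.normal(size=N), 0.0)

# A large, barking, docile, non-white animal with a tail: +1 present, -1 absent.
animal = np.array([1, 1, -1, 1, -1, 1], dtype=float)
p_dog = forward(animal, hidden_layers, output_layer)
print(f"score with untrained weights: {p_dog:.3f}")
```

With random weights the output is meaningless; learning consists of adjusting the entries of each `W` until the scores are near one for dogs and near zero for cats.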

The model is successful when it has the right weights so that it correctly separates the dogs from the cats. For example, a large, docile creature that barks is almost certainly not a cat, so the weights on those characteristics should be large and positive to produce a value near one (indicating a dog) in the sigmoid function, but the weights on having a tail or being white should be near zero since they are not very diagnostic features. The weight on having a meow should be negative. To make the models work, there must be enough hidden layers and hidden variables to provide the flexibility needed to fit all possible permutations of dog and cat features, and there must be efficient learning algorithms to identify the right weights so that the difficult cases are correctly classified. Shallow machine learning models have just a few hidden layers, and those with no hidden layers are called perceptrons. Deep machine learning models have many hidden layers. The overall complexity of the model depends on the number of hidden layers and the number of hidden variables or neurons.
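The role of the weights can be seen in the simplest case, a perceptron with no hidden layers. In the sketch below the weights are hand-picked to match the verbal description (positive for large, docile, and bark; negative for meow; zero for white and tail). They are illustrative assumptions, not estimates from data, and the two example animals are hypothetical.

```python
import numpy as np

FEATURES = ["large", "bark", "meow", "docile", "white", "tail"]
# Hand-picked illustrative weights, one per feature, in the order above.
WEIGHTS = np.array([1.5, 3.0, -3.0, 1.0, 0.0, 0.0])

def p_dog(features):
    """Perceptron score: sigmoid of one weighted sum of the +1/-1 features."""
    x = np.array([1.0 if features[name] else -1.0 for name in FEATURES])
    return float(1.0 / (1.0 + np.exp(-(WEIGHTS @ x))))

big_barker = dict(large=True, bark=True, meow=False,
                  docile=True, white=False, tail=True)
small_meower = dict(large=False, bark=False, meow=True,
                    docile=False, white=True, tail=True)

print(f"large, docile, barks: {p_dog(big_barker):.4f}")    # close to one (dog)
print(f"small, aloof, meows:  {p_dog(small_meower):.4f}")  # close to zero (cat)

# Features with zero weight are not diagnostic: changing them changes nothing.
white_barker = dict(big_barker, white=True)
print(p_dog(white_barker) == p_dog(big_barker))
```

Hidden layers become necessary when no single weighted sum separates the classes, for instance when a feature is diagnostic only in combination with another.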

We have known for over 25 years that systems with at least one hidden layer are universal approximators (White 1992) that can, with relatively arbitrary activation functions, approximate nonlinear continuous functions to any desired degree of accuracy as long as there are enough neurons (hidden independent variables) in the model. Once it is clear that machine learning is simply a novel method for fitting (complicated) curves, it becomes less magical, but some mysteries remain. Why does deep learning work with a total number of weights and variables that seems far short of what would be necessary to approximate all of the possible curves? Why do models with many hidden layers sometimes do so much better than those with just one, especially since only one layer is needed for a universal approximator? How can we interpret the complex pattern of weights yielded by deep learning models? These questions have led to speculations that deep learning works because its layers can match the kinds of physical constraints that exist in the real world (Lin et al. 2017), and this speculation evokes a famous paper by the physicist Eugene Wigner (1960) titled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” Whatever the reason, deep learning methods seem to work remarkably well for pattern recognition problems, but their interpretation is often difficult given their arcane complexity. They are better at yielding predictions than explanatory insights.
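In symbols, the universal approximation property can be stated roughly as follows. The notation is introduced here for illustration and is not taken from the works cited: x is the vector of input features, M is the number of neurons in the single hidden layer, σ is a sigmoid activation function, and w_j, b_j, v_j, and c are the weights and intercepts. For any continuous function g on a compact set K and any tolerance ε > 0, there exist an M and a set of weights such that the one-hidden-layer network f stays within ε of g everywhere on K:

```latex
f(x) \;=\; c + \sum_{j=1}^{M} v_j \,\sigma\!\left(w_j^{\top} x + b_j\right),
\qquad
\sup_{x \in K} \bigl| f(x) - g(x) \bigr| \;<\; \varepsilon .
```

The statement guarantees only that suitable weights exist. It says nothing about how large M must be or whether a learning algorithm will find those weights, which is where the remaining mysteries lie.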

What Kinds of Problems Can Data Science Solve?

There is so much hyperbole about big data and data science that one might think that we have either solved or obviated four of the most basic problems of empirical research: (a) forming concepts and providing measures of them; (b) providing reliable descriptive inferences; (c) making causal inferences from past experience; and (d) making predictions about the future. Data science has, in fact, made some contributions to solving each of them, especially forming concepts and making predictions about the future, but they continue to be fundamental and difficult problems (Smith 2018). Let us consider each in turn.

Artificial intelligence researchers have used unsupervised machine learning methods so that computers learn concepts in much the same way as political scientists have historically used factor or cluster analysis to identify concepts, as in the study of texts described above. One of the most informative studies of concept formation (Thagard 1992) used artificial intelligence models to understand “conceptual revolutions” in science. Machine learning excels at finding patterns, so it can be helpful in concept formation, but the basic problems of the interplay between defining concepts inductively or deductively, phenomenologically or ontologically, and pragmatically or theoretically remain. We do have some better tools to deal with them, such as model-based clustering techniques (e.g., Ahlquist & Breunig 2012) that allow for the evaluation of uncertainty in typologies, but concepts such as an atom, species, democracy, or topic are still very deep ideas based on a complicated interplay between theory and data that goes beyond mere pattern detection—and that is why conceptual revolutions in science (e.g., quantum theory, plate tectonics, evolution, relativity theory, or topic analysis) are such a big deal. They reflect a gestalt change in the way we see the world. It is also why users of these methods must proceed carefully, as pointed out in the discussion about analyzing texts and topics.

Data science methods can help us to explore and describe data, to find interesting patterns in them, and to display them effectively. The use of big data helps us with descriptive inferences because it often provides a complete list of arrests, registered voters, food stamps recipients, etc., but the problem of defining the proper universe remains, since we may care about crimes, potential voters, or those eligible for food stamps, respectively. Moreover, internet samples are especially problematic because it is hard to define what universe they represent and how they were sampled from that universe. Having a lot of data does not ensure that they represent in a statistically reliable way (e.g., a random sample) an interesting and definable universe.

Perhaps most interesting, and perhaps worrisome, is the degree to which some advocates of data science have ignored or even rejected the need for causal inferences and fastened upon a narrow notion of statistical prediction. There are three sources of this inclination. The first is the idea that the availability of lots of data (either many cases or many variables) automatically solves the inference problem, which is, of course, false. Inference requires that we choose cases in the right way (e.g., a random sample) and that available variables include the actual cause and allow us to control for the right things to avoid spurious correlations (see Lazer et al. 2014, Titiunik 2015). The second source is the idea that machine learning, perhaps especially deep learning, yields insights that would otherwise be buried. That idea founders on questions about whether deep learning is actually providing insights or just fitting curves. Cukier & Mayer-Schoenberger (2013, pp. 32, 39) seem to capture both of these naïve ideas when they say that “[a] worldview built on the importance of causation is being challenged by a preponderance of correlations” and “[w]e can learn from a large body of information things we could not comprehend when we used only smaller amounts.” The third and more defensible notion is that making reliable causal inferences is so hard that we should focus on prediction. This idea led to vector autoregression methods in macroeconomics (Sims 1980, Christiano 2012) 40 years ago, and it is at the core of many textbooks on machine learning. Breiman (2001) presents an elegant, early argument for this approach; Berk (2008) provides a thoughtful book-length treatment; and Shmueli (2010) discusses the trade-offs.

There are certainly practical and technical problems for which achieving a good prediction using machine or statistical learning is a satisfactory, and perhaps optimal, solution. Kleinberg et al. (2015) give an example involving decisions about hip or knee surgery where the surgeries only make sense if the patients live long enough to get through their typically lengthy rehabilitation periods. Yarkoni & Westfall (2017) provide examples from psychology, such as inferring the “big five” personality traits from the “likes” on Facebook pages and inferring the accuracy of people's memories about faces from fMRI data. Nickerson & Rogers (2014) show how predictive scores regarding campaign contributions or voting turnout can be used to increase the efficiency of campaigns. In research problems, good predictive methods can assure acceptable covariate balance in matching methods, high-quality classification of documents according to some characteristic, accurate imputation for missing values, good fits for curves in regression discontinuity designs, powerful instruments for instrumental variables estimation, and so forth.

These methods rely on situations where, in the language of econometrics, reduced form equations solve a problem either because there are no (or only small) structural changes in the mechanism producing outcomes or because the best fit is really the ultimate goal. But social scientists have known at least since the classic work on supply and demand that getting at causal mechanisms requires that statistical methods take into account the identification of structural or behavioral models. The positive correlations between police presence and crime, between higher quantities of a good and higher prices, and between greater education and higher income do not necessarily mean that more police cause more crime, greater quantities of a good create higher prices, or even that more education produces more income. The current emphasis on experiments and quasi-experiments attempts to ensure better identification of these causal effects, and Athey (2018, pp. 21, 22), in a paper that predicts many ways in which machine learning can help improve causal estimation in economics, unequivocally predicts “no fundamental changes to theory of identification of causal effects” and “no obvious benefit from ML in terms [of] thinking about identification issues.” That is the conclusion of a political science symposium on big data (Clark & Golder 2015), and I concur based on my understanding of causality (Brady 2009).
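A small simulation shows how a positive correlation can coexist with a negative causal effect. Everything in it is a hypothetical assumption made for illustration: an unobserved neighborhood "risk" factor drives both police deployment and crime, and the true effect of police on crime is set to -0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical data-generating process (an assumption, not an empirical claim).
risk = rng.normal(size=n)                       # underlying crime risk
police = risk + rng.normal(size=n)              # more police go where risk is high
TRUE_EFFECT = -0.5                              # police reduce crime by construction
crime = 2.0 * risk + TRUE_EFFECT * police + rng.normal(size=n)

def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive_slope = ols(crime, police)[1]             # ignores risk: reduced-form fit
adjusted_slope = ols(crime, police, risk)[1]    # controls for the common cause

print(f"correlation(police, crime):      {np.corrcoef(police, crime)[0, 1]:+.2f}")
print(f"slope ignoring risk:             {naive_slope:+.2f}")
print(f"slope controlling for risk:      {adjusted_slope:+.2f}")
print(f"true causal effect (by design):  {TRUE_EFFECT:+.2f}")
```

The naive fit predicts crime well as long as the deployment rule stays fixed, but it gets the sign of the policy effect wrong. In real research the common cause is rarely observed, which is why identification relies on experiments, quasi-experiments, or structural models rather than on more data or a better curve fit.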

At the same time, political scientists need to think harder about how to combine information about causal mechanisms from strongly identified research designs (such as experiments or quasi-experiments) with sophisticated prediction methods and formal modeling to improve our ability to make projections about the future. These projections should take into account behavioral responses, heterogeneity in causal impacts, and general equilibrium effects that occur when policies are scaled up from a small experiment. This requires combining models, causal estimates, and predictions in ways envisioned by the Empirical Implications of Theoretical Models movement (Granato & Scioli 2004) and in ways undertaken by economists who joined vector autoregressions with concerns about causal mechanisms and macroeconomic models (Christiano 2012). Athey (2018) discusses some ways to do this, and perhaps her most important claim is that data science methods make it possible to develop better systematic model selection methods based on the data instead of specification searches that often involve multiple estimations and repetitive parsing of models until one model is presented, somewhat disingenuously, as “the model.” Data scientists and statisticians are also considering trading off model complexity versus parsimony as both the sample size and the number of available variables increase (Powell 2017). Data science methods now make possible data-driven model selection using cross-validation and other approaches, estimation and averaging over many models, and accounting for model uncertainty as well as data uncertainty.

Data science currently provides many useful tools for political scientists, but their primary contribution is to provide for automated pattern recognition and better methods for prediction. Much more work has to be done before we can confidently use models to project into the future.

DEALING WITH ETHICAL ISSUES REGARDING POLITICAL SCIENCE RESEARCH

A separate article could be written about the ethical issues related to big data and data science. One contentious issue is the possibility of algorithmic injustice (Noble 2018), especially in the field of criminal justice. A number of writers (Harcourt 2007, Mbadiwe 2018, Williams et al. 2018) have worried that algorithms used to assign bail, decide on sentences, or place prisoners in various levels of detention rely on predictions that are not causal, that reproduce stereotypes, and that exacerbate racial biases. The result will be the reinforcement of existing forms of discrimination. But the problem is not easy, and “there is tension between improving public safety and satisfying the prevailing notions of algorithmic fairness” (Corbett-Davies et al. 2017, p. 797). To take another area, political campaign algorithms try to mobilize those voters who can be brought to the polls at least cost per vote, but this typically means that underrepresented voters become even more underrepresented because it costs more to mobilize them (Brady et al. 1999).

Athey (2018) notes that predictive algorithms can not only be unfair but may also be manipulable. For example, if someone knows that credit scores are improved when people shop at certain stores, they may shop at those stores to increase their scores. The political and normative implications of these ethical issues must be studied by political scientists and taken into account when designing algorithms.

CONCLUSIONS

Big data and data science provide extraordinary new sources of data and methods for doing research. They are also changing the world in ways that spawn new kinds of political issues. They broaden the kind of quantitative work that can be done, and they bring political scientists into the middle of societal events in new ways through work on political campaigns, on the impacts of the media, on the operation of cities, on terrorism and cyberwarfare, on the design of voting and political systems, and many other areas. As this happens, political scientists will certainly do more and better research, but they will also have to think about the intellectual and practical value of their role as system designers when they find themselves or their work used to create new policies or social mechanisms. Just as engineers, lawyers, and increasingly economists use their knowledge about society to design social institutions, political scientists are now developing the tools to redesign political systems. How will this role be valued in the academy? What ethical and intellectual issues does it raise? From my perspective, becoming involved in developing new policies and social mechanisms would be a useful turn back toward the “policy sciences” advocated by Harold Lasswell (1951; see also Turnbull 2008), but political scientists will undoubtedly find themselves taking on new roles that will require debate and discussion within the profession.

DISCLOSURE STATEMENT

The author is not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

My thanks to Karen Chapple, Avi Feller, and Anno Saxenian for very helpful comments.

LITERATURE CITED

  • 1.
    Ahlquist JA, Breunig C. 2012. Model-based clustering and typologies in the social sciences. Political Anal. 20(1): 92–112
  • 2.
    Albus JS. 1984. Robots and the economy. Futurist 18(6): 38–44
  • 3.
    Alvarez RM, ed. 2016. Computational Social Science: Discovery and Prediction (Analytical Methods for Social Research). Cambridge, UK: Cambridge Univ. Press
  • 4.
    Ansolabehere S, Hersh E. 2012. Validation: what big data reveal about survey misreporting and the real electorate. Political Anal. 20(4): 437–59
  • 5.
    Athey S. 2018. The impact of machine learning on economics. Draft chapter, Natl. Bur. Econ. Res., Cambridge, MA. http://www.nber.org/chapters/c14009.pdf
  • 6.
    Atkins DE, Droegemeier KK, Feldman SI, Garcia-Molina H, Klein M, et al. 2003. Revolutionizing science and engineering through cyberinfrastructure: report of the National Science Foundation blue-ribbon advisory panel on cyberinfrastructure. Rep. Natl. Sci. Found., Washington, DC. https://stewardshipgap.net/node/17
  • 7.
    Bail CA. 2014. The cultural environment: measuring culture with big data. Theory Soc. 43(3/4): 465–82
  • 8.
    Barberá P. 2015. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Anal. 23: 76–91
  • 9.
    Beachy SH, Olson S, Berger AC. 2015. Genomics-Enabled Learning Health Care Systems: Gathering and Using Genomic Information to Improve Patient Care and Research: Workshop Summary. Washington, DC: Natl. Acad. Press
  • 10.
    Bennett WL, Segerberg A. 2012. The logic of connective action. Inf. Commun. Soc. 15(5): 739–68
  • 11.
    Berk RA. 2008. Statistical Learning from a Regression Perspective. New York: Springer
  • 12.
    Berman F, Brady H. 2005. Workshop on cyberinfrastructure for the social and behavioral sciences: final report. Rep., Natl. Sci. Found., Alexandria, VA. https://www.sdsc.edu/assets/docs/SBE-CISE-FINAL.pdf. Accessed Dec. 2, 2018
  • 13.
    Bishop CM. 2011. Pattern Recognition and Machine Learning. New York: Springer
  • 14.
    Bohn R, Short J. 2012. Measuring consumer information. Int. J. Commun. 6: 980–1000
  • 15.
    Bond RM, Fariss CJ, Jones JJ, Kramer AD, Marlow C, et al. 2012. A 61-million-person experiment in social influence and political mobilization. Nature 489(7415): 295–98
  • 16.
    Bond R, Messing S. 2015. Quantifying social media's political space: estimating ideology from publicly revealed preferences on Facebook. Am. Political Sci. Rev. 109(1): 62–78
  • 17.
    Bonica A. 2013. Ideology and interests in the political marketplace. Am. J. Political Sci. 57(2): 294–311
  • 18.
    Bonica A. 2016. A data-driven voter guide for U.S. elections: adapting quantitative measures of the preferences and priorities of political elites to help voters learn about candidates. RSF Russell Sage Found. J. Soc. Sci. 2(7): 11–32
  • 19.
    Bonica A, Chilton A, Sen M. 2016. The political ideologies of American lawyers. J. Legal Analysis 8(2): 277–335
  • 20.
    Bonica A, Rosenthal H, Rothman DJ. 2014. The political polarization of physicians in the United States: an analysis of campaign contributions to federal elections, 1991 through 2012. JAMA Intern. Med. 174(8): 1308–17
  • 21.
    Boullier D. 2015. The social sciences and traces of big data: society, opinion, or vibrations? Rev. Française Sci. Politique 65(5–6): 71–93
  • 22.
    boyd D, Crawford K. 2012. Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 15(5): 662–79
  • 23.
    Brady HE. 2009. Causation and explanation in political science. In The Oxford Handbook of Political Science, ed. R Goodin, pp. 217–70. Oxford, UK: Oxford Univ. Press
  • 24.
    Brady HE, Grand SA, Powell MA, Schink W. 2001. Access and confidentiality issues with administrative data. In Studies of Welfare Populations: Data Collection and Research Issues, ed. Natl. Res. Counc., pp. 220–74. Washington, DC: Natl. Acad. Press
  • 25.
    Brady HE, McNulty JE. 2011. Turning out to vote: the costs of finding and getting to the polling place. Am. Political Sci. Rev. 105(1): 115–34
  • 26.
    Brady HE, Schlozman KL, Verba S. 1999. Prospecting for participants: rational expectations and the recruitment of political activists. Am. Political Sci. Rev. 93(1): 153–68
  • 27.
    Breiman L. 2001. Statistical modeling: the two cultures. Stat. Sci. 16(3): 199–231
    More AR articles citing this reference

    • Recent Challenges in Actuarial Science

      Paul Embrechts and Mario V. WüthrichRiskLab, Department of Mathematics, ETH Zurich, Zurich, Switzerland, CH-8092; email: [email protected], [email protected]
      Annual Review of Statistics and Its Application Vol. 9: 119 - 140
      • ...Here we are reminded of the discussion of Breiman (2001), which lies at the heart of the conflict of choosing the best predictive algorithm versus the requirement of explainability....
    • Machine Learning for Sustainable Energy Systems

      Priya L. Donti1,2 and J. Zico Kolter1,31Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA; email: [email protected]2Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA3Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, USA
      Annual Review of Environment and Resources Vol. 46:
      • ...The difference between these two fields is largely one of perspective (16), ...
    • Governance by Data

      Fleur JohnsFaculty of Law & Justice, University of New South Wales (UNSW) Sydney, New South Wales 2052, Australia; email: [email protected]
      Annual Review of Law and Social Science Vol. 17: 53 - 71
      • ...Statistics typically assumes that data are generated by a given stochastic data model and seeks to fill out the parameters of that model (Breiman 2001)....
    • Algorithms and Decision-Making in the Public Sector

      Karen Levy, Kyla E. Chasalow, and Sarah RileyDepartment of Information Science, Cornell University, Ithaca, New York 14853, USA; email: [email protected]
      Annual Review of Law and Social Science Vol. 17: 309 - 334
      • ...Those data are also processed in new ways: Machine learning techniques are characterized by a focus on developing models for prediction rather than explanation (Breiman 2001, Hofman et al. 2017, Kleinberg et al. 2015) and, ...
    • Machine Learning for Social Science: An Agnostic Approach

      Justin Grimmer,1 Margaret E. Roberts,2 and Brandon M. Stewart31Department of Political Science and Hoover Institution, Stanford University, Stanford, California 94305, USA; email: [email protected]2Department of Political Science and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California 92093, USA; email: [email protected]3Department of Sociology and Office of Population Research, Princeton University, Princeton, New Jersey 08540, USA; email: [email protected]
      Annual Review of Political Science Vol. 24: 395 - 419
      • ...We argue that machine learning is as much a culture defined by a distinct set of values and tools as it is a set of algorithms. Breiman (2001) made a similar point 20 years ago in his seminal piece “Statistical Modeling: The Two Cultures,” which drew a contrast between stochastic data-generating process modeling and algorithmic modeling cultures....
    • Flexible Models for Complex Data with Applications

      Christophe Ley,1 Slađana Babić,1,2 and Domien Craens11Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, B-9000 Belgium; email: [email protected]2Vlerick Business School, B-1210 Brussels, Belgium
      Annual Review of Statistics and Its Application Vol. 8: 369 - 391
      • ...An influential paper reflecting generally on the culture of data modeling via a stochastic model versus algorithmic modeling is that of Breiman (2001)....
    • Artificial Intelligence, Predictive Policing, and Risk Assessment for Law Enforcement

      Richard A. BerkDepartments of Statistics and Criminology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Criminology Vol. 4: 209 - 237
      • ...Algorithms should not be confused with models because a range of errors can easily follow (Breiman 2001b)....
    • Machine Learning in Epidemiology and Health Outcomes Research

      Timothy L. Wiemken1 and Robert R. Kelley21Center for Health Outcomes Research, Saint Louis University, Saint Louis, Missouri 63104, USA; email: [email protected]2Department of Computer Science, Bellarmine University, Louisville, Kentucky 40205, USA; email: [email protected]
      Annual Review of Public Health Vol. 41: 21 - 36
      • ...These traditional statistical approaches are used in what is coined as the “data culture” (12)....
      • ...Most traditional modeling approaches are data focused and make various assumptions about the data used within the model (12)....
      • ...genetic epidemiology may require smaller data sets (e.g., <100 rows) (12)...
    • Big Data in Industrial-Organizational Psychology and Human Resource Management: Forward Progress for Organizational Research and Practice

      Frederick L. Oswald,1 Tara S. Behrend,2 Dan J. Putka,3 and Evan Sinar41Department of Psychological Sciences, Rice University, Houston, Texas 77005, USA; email: [email protected]2Department of Organizational Sciences and Communication, George Washington University, Washington, DC 20052, USA3Human Resources Research Organization, Alexandria, Virginia 22314, USA4BetterUp, Pittsburgh, Pennsylvania 15243, USA
      Annual Review of Organizational Psychology and Organizational Behavior Vol. 7: 505 - 533
      • ...This is exactly what distinguishes big data's algorithm-driven approach from the traditional model-driven approach (Breiman 2001b)....
    • Machine Learning Methods That Economists Should Know About

      Susan Athey1,2,3 and Guido W. Imbens1,2,3,41Graduate School of Business, Stanford University, Stanford, California 94305, USA; email: [email protected], [email protected]2Stanford Institute for Economic Policy Research, Stanford University, Stanford, California 94305, USA3National Bureau of Economic Research, Cambridge, Massachusetts 02138, USA4Department of Economics, Stanford University, Stanford, California 94305, USA
      Annual Review of Economics Vol. 11: 685 - 725
      • ...In the abstract of his provocative 2001 paper in Statistical Science, the Berkeley statistician Leo Breiman (2001b, ...
      • ...Breiman (2001b, p. 199) goes on to claim that,...
      • ...Breiman's (2001b) characterization no longer applies to the field of statistics....
    • Machine Learning for Sociology

      Mario Molina and Filiz GaripDepartment of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...Breiman (2001b) describes two cultures of statistical analysis: data modeling and algorithmic modeling....
    • Statistical Models of Key Components of Wildfire Risk

      Dexen D.Z. Xi,1 Stephen W. Taylor,2 Douglas G. Woolford,1 and C.B. Dean31Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario N6A 5B7, Canada; email: [email protected], [email protected]2Pacific Forestry Centre, Natural Resources Canada, Victoria, British Columbia V8Z 1M5, Canada; email: [email protected]3Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 6: 197 - 222
      • ...Breiman (2001) noted that the objective of a statistical analysis is to use data to make inferences, ...
    • Approximate Bayesian Computation

      Mark A. Beaumont
      School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, United Kingdom; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 6: 379 - 403
      • ...Simulations from implementations of generative models have increasingly been used to give training data sets for supervised machine learning purposes, potentially bridging the two cultures of Breiman (2001)....
    • Curriculum Guidelines for Undergraduate Programs in Data Science

      Richard D. De Veaux,1 Mahesh Agarwal,2 Maia Averett,3 Benjamin S. Baumer,4 Andrew Bray,5 Thomas C. Bressoud,6 Lance Bryant,7 Lei Z. Cheng,8 Amanda Francis,9 Robert Gould,10 Albert Y. Kim,11 Matt Kretchmar,12 Qin Lu,13 Ann Moskol,14 Deborah Nolan,15 Roberto Pelayo,16 Sean Raleigh,17 Ricky J. Sethi,18 Mutiara Sondjaja,19 Neelesh Tiruviluamala,20 Paul X. Uhlig,21 Talitha M. Washington,22 Curtis L. Wesley,23 David White,24 and Ping Ye25
      1 Department of Mathematics and Statistics, Williams College, Williamstown, Massachusetts 01267; 2 Department of Mathematics and Statistics, University of Michigan, Dearborn, Michigan 48128-2406; 3 Department of Mathematics and Computer Science, Mills College, Oakland, California 94613; 4 Department of Statistical & Data Sciences, Smith College, Northampton, Massachusetts 01063; 5 Department of Mathematics, Reed College, Portland, Oregon 97202; 6 Department of Mathematics and Computer Science, Denison University, Granville, Ohio 43023; 7 Department of Mathematics, Shippensburg University, Shippensburg, Pennsylvania 17257; 8 Department of Mathematics, Olivet Nazarene University, Bourbonnais, Illinois 60914; 9 Department of Mathematics, Brigham Young University, Provo, Utah 84601; 10 Department of Statistics, University of California, Los Angeles, Los Angeles, California 90095-1554; 11 Department of Mathematics, Middlebury College, Middlebury, Vermont 05753; 12 Department of Mathematics and Computer Science, Denison University, Granville, Ohio 43023; 13 Department of Mathematics, Lafayette College, Easton, Pennsylvania 18042-1780; 14 Department of Mathematics and Computer Science, Rhode Island College, Providence, Rhode Island 02908; 15 Department of Statistics, University of California, Berkeley, California 94720; 16 Department of Mathematics, University of Hawaii, Hilo, Hawaii 96720-4091; 17 Department of Mathematics, Westminster College, Salt Lake City, Utah 84105; 18 Department of Computer Science, Fitchburg State University, Fitchburg, Massachusetts 01420; 19 Department of Mathematics, New York University, New York, New York 10012; 20 Department of Mathematics, University of Southern California, Los Angeles, California 90089; 21 Department of Mathematics, St. Mary's University, San Antonio, Texas 78228; 22 Department of Mathematics, Howard University, Washington, DC 20059; 23 Department of Mathematics, LeTourneau University, Longview, Texas 75602; 24 Department of Mathematics and Computer Science, Denison University, Granville, Ohio 43023; 25 Department of Mathematics, University of North Georgia, Oakwood, Georgia 30566
      Annual Review of Statistics and Its Application Vol. 4: 15 - 30
      • ...Breiman (2001) spoke of the two cultures of algorithmic (computational) and data (statistical) models (renamed “predictive” and “inferential,” respectively, ...
    • There Is Individualized Treatment. Why Not Individualized Inference?

      Keli Liu1 and Xiao-Li Meng2
      1 Department of Statistics, Stanford University, Stanford, California 94305; email: [email protected]; 2 Department of Statistics, Harvard University, Cambridge, Massachusetts 02138; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 3: 79 - 111
      • ...Readers are referred to Lehmann (1990), Breiman (2001), and Hansen & Yu (2001)...
    • Methods and Global Environmental Governance

      Kate O'Neill,1 Erika Weinthal,2 Kimberly R. Marion Suiseeya,2 Steven Bernstein,3 Avery Cohn,4 Michael W. Stone,5 and Benjamin Cashore5
      1 Department of Environmental Science, Policy, and Management, University of California at Berkeley, California 94720; email: [email protected]; 2 Nicholas School of the Environment, Duke University, Durham, North Carolina 27708; email: [email protected], [email protected]; 3 Department of Political Science, University of Toronto, Toronto, Canada M5S 3G3; email: [email protected]; 4 National Center for Atmospheric Research, Boulder, Colorado 80385; email: [email protected]; 5 School of Forestry and Environmental Studies, Yale University, New Haven, Connecticut, 06511; email: [email protected], [email protected]
      Annual Review of Environment and Resources Vol. 38: 441 - 471
      • ...Modeling activities may be fully inductive, fully deductive, and anywhere in between (152)....

  • 28.
    Chen H, Chiang RHL, Storey VC. 2012. Business intelligence and analytics: from big data to big impact. MIS Q. 36(4): 1165–88
    More AR articles citing this reference

    • Cyber-Dependent Crimes: An Interdisciplinary Review

      David Maimon1 and Eric R. Louderback2
      1 Department of Criminal Justice and Criminology, Georgia State University, Atlanta, Georgia 30303, USA; email: [email protected]; 2 Department of Sociology, University of Miami, Coral Gables, Florida 33146, USA
      Annual Review of Criminology Vol. 2: 191 - 216
      • ...in some instances, eventually shut them down (Chen et al. 2012, Glenny 2011)....
    • ESM 2.0: State of the Art and Future Potential of Experience Sampling Methods in Organizational Research

      Daniel J. Beal
      Department of Management, Pamplin College of Business, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061; email: [email protected]
      Annual Review of Organizational Psychology and Organizational Behavior Vol. 2: 383 - 407
      • ...; and a wealth of detailed financial information (e.g., Chen et al. 2012)....

  • 29.
    Christiano LJ. 2012. Christopher A. Sims and vector autoregressions. Scand. J. Econ. 114(4): 1082–104
  • 30.
    Clark WR, Golder M. 2015. Big data, causal inference, and formal theory: contradictory trends in political science. PS Political Sci. Politics 48(1): 65–70
  • 31.
    Clarke RA, Knake R. 2011. Cyber War: The Next Threat to National Security and What to Do About It. New York: HarperCollins
  • 32.
    Cleveland WS. 2001. Data science: an action plan for expanding the technical areas of the field of statistics. Int. Stat. Rev. 69(1): 21–26
  • 33.
    Conway D. 2013. The data science Venn diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 34.
    Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Canada. New York: ACM. https://arxiv.org/abs/1701.08230
    More AR articles citing this reference

    • Algorithmic Fairness: Choices, Assumptions, and Definitions

      Shira Mitchell,1 Eric Potash,2 Solon Barocas,3,4 Alexander D'Amour,5 and Kristian Lum6
      1 Port Jefferson, New York 11777, USA; 2 Harris School of Public Policy, University of Chicago, Chicago, Illinois 60637, USA; email: [email protected]; 3 Microsoft Research, New York, NY 10012, USA; 4 Department of Information Science, Cornell University, Ithaca, New York 14853, USA; email: [email protected]; 5 Google Research, Cambridge, Massachusetts 02124, USA; email: [email protected]; 6 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 8: 141 - 163
      • ...the utility-maximizing decision rule δ is necessarily a single-threshold rule (Karlin & Rubin 1956, Berger 1985, Corbett-Davies et al. 2017, Lipton et al. 2018)....
      • ...Finally, Corbett-Davies et al. (2017) and Lipton et al. (2018) both note that a decision rule δ that maximizes utility under a demographic parity constraint (in general) uses the sensitive variables a both in estimating the conditional probabilities and for determining their thresholds....
    • A Perspective on Incentive Design: Challenges and Opportunities

      Lillian J. Ratliff,1 Roy Dong,2 Shreyas Sekar,1 and Tanner Fiez1
      1 Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA; email: [email protected]; 2 Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Urbana, Illinois 61801, USA
      Annual Review of Control, Robotics, and Autonomous Systems Vol. 2: 305 - 338
      • ...healthy discussions by a diverse range of academic communities and industry practitioners provide an encouraging sign that fairness-based constraints will play a key role in developing learning policies in the future (113, 121...

  • 35.
    Cukier K, Mayer-Schoenberger V. 2013. The rise of big data: how it's changing the way we think about the world. Foreign Aff. 92(3): 28–40
    More AR articles citing this reference

    • Issues and Challenges in Census Taking

      Chris Skinner
      Department of Statistics, London School of Economics and Political Science, London WC2A 2AE, United Kingdom; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 5: 49 - 63
      • ...Cukier & Mayer-Schoenberger 2013) and does not apply to sample surveys....

  • 36.
    Deutsch KW. 1963. The Nerves of Government: Models of Political Communication and Control. New York: Free Press
  • 37.
    Donoho D. 2017. 50 years of data science. J. Comput. Graphical Stat. 26(4): 745–66
    More AR articles citing this reference

    • Perspective on Data Science

      Roger D. Peng1 and Hilary S. Parker2
      1 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA; email: [email protected]; 2 Independent Consultant, San Francisco, California 94102, USA
      Annual Review of Statistics and Its Application Vol. 9: 1 - 20
      • ...In a recent paper titled “50 Years of Data Science” David Donoho (2017) cites a number of definitions that ultimately could be interpreted to include essentially any scientific activity....
      • ...with some characterizing it narrowly as the application of statistical methods to clean datasets and others broadening its definition to the point that there is little distinction between data analysis and data science (Chatfield 1995, Donoho 2017, Wing et al. 2018, Wing 2020)....
    • Governance by Data

      Fleur Johns
      Faculty of Law & Justice, University of New South Wales (UNSW) Sydney, New South Wales 2052, Australia; email: [email protected]
      Annual Review of Law and Social Science Vol. 17: 53 - 71
      • ...quite banally, as “the science of learning from data” (Donoho 2017, ...
      • ...Accordingly, Donoho (2017, p. 745) has suggested that data science comprises “a superset of the fields of statistics and machine learning.” The “super” in that superset reflects the impact of exponential increases in computational storage and processing capacity and associated growth in data's profusion, ...
    • Machine Learning for Social Science: An Agnostic Approach

      Justin Grimmer,1 Margaret E. Roberts,2 and Brandon M. Stewart3
      1 Department of Political Science and Hoover Institution, Stanford University, Stanford, California 94305, USA; email: [email protected]; 2 Department of Political Science and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California 92093, USA; email: [email protected]; 3 Department of Sociology and Office of Population Research, Princeton University, Princeton, New Jersey 08540, USA; email: [email protected]
      Annual Review of Political Science Vol. 24: 395 - 419
      • ...However, a key aspect is what Donoho [(2017), crediting Liberman (2010)] calls “the secret sauce” of predictive culture, ...
    • Graduate Education in Statistics and Data Science: The Why, When, Where, Who, and What

      Marc Aerts,1 Geert Molenberghs,1,2 and Olivier Thas1,3,4
      1 Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), Hasselt University, BE3590 Hasselt, Belgium; email: [email protected]; 2 Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), KU Leuven, 3000 Leuven, Belgium; 3 National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, Keiraville, New South Wales 2500, Australia; 4 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
      Annual Review of Statistics and Its Application Vol. 8: 25 - 39
      • ...Definitions vary from exhaustive lists of subjects of knowledge and skills to just defining data science as the science of learning from data, with everything that this entails (e.g., Donoho 2017), ...
    • Machine Learning for Sociology

      Mario Molina and Filiz Garip
      Department of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...Donoho (2017) updates the terms as generative modeling and predictive modeling....
      • ...and they allow for a common-task framework where different teams can compete on the same question (Donoho 2017)....
      • ...and specify a parametric (typically linear) model to relate the inputs to an output (Breiman 2001a, Donoho 2017)....
      • ...Sociologists can identify pure prediction () problems where different research teams can potentially compete in a common-task framework (Donoho 2017)....
    • Sentiment Analysis

      Robert A. Stine
      Department of Statistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 6: 287 - 308
      • ...This narrow perspective allows comparison of a variety of techniques applied to the same data with a common objective, the common task framework (Donoho 2017)....

  • 38.
    Dunlap CJ. 2014. The hyper-personalization of war: cyber, big data, and the changing face of conflict. Georgetown J. Int. Aff. 15: 108–18
  • 39.
    Dustdar S, Nastić S, Šćekić O. 2017. Smart Cities: The Internet of Things, People, and Systems. New York: Springer Int. Publ.
  • 40.
    Dzau VJ, Ginsburg GS. 2016. Realizing the full potential of precision medicine in health and health care. JAMA 316(16): 1659–60
  • 41.
    Enos RD. 2016. What the demolition of public housing teaches us about the impact of racial threat on political behavior. Am. J. Political Sci. 60(1): 123–42
    More AR articles citing this reference

    • The Politics of Housing

      Ben W. Ansell
      Department of Politics and International Relations, University of Oxford and Nuffield College, New Road, Oxford, OX1 1NF, United Kingdom; email: [email protected]
      Annual Review of Political Science Vol. 22: 165 - 185
      • ...in terms of longstanding racial divisions beyond immigration, Enos (2016) finds that when public housing is demolished, ...

  • 42.
    Evans P. 2018. Harnessing big data: a tsunami of transformation. In Opening Government, pp. 137–44. Acton, ACT, Aust.: ANU Press
  • 43.
    Farrell H. 2012. The consequences of the internet for politics. Annu. Rev. Political Sci. 15: 35–52
  • 44.
    Glaeser EL, Kominers SD, Luca M, Naik N. 2018. Big data and big cities: the promises and limitations of improved measures of urban life. Econ. Inq. 56(1): 114–37
    More AR articles citing this reference

    • What Shapes the Quality and Behavior of Government Officials? Institutional Variation in Selection and Retention Methods

      Claire S.H. Lim1,2 and James M. Snyder, Jr.3,4
      1 School of Economics and Finance, Queen Mary University of London, London E1 4NS, United Kingdom; email: [email protected]; 2 Centre for Economic Policy Research, London EC1V 0DX, United Kingdom; 3 Department of Government, Harvard University, Cambridge, Massachusetts 02138, USA; email: [email protected]; 4 National Bureau of Economic Research, Cambridge, Massachusetts 02138, USA
      Annual Review of Economics Vol. 13: 87 - 109
      • ... provide excellent overviews of machine learning methods essential to economists. Glaeser et al. (2018)...

  • 45.
    Goff PA, Lloyd T, Geller A. 2016. The science of justice: race, arrests, and police use of force. Rep. Cent. Policing Equity, New York, NY
  • 46.
    Gomez-Rodriguez M, Leskovec J, Krause A. 2012. Inferring networks of diffusion and influence. ACM Trans. Knowledge Discov. Data 5(4): 21
  • 47.
    Granato J, Scioli F. 2004. Puzzles, proverbs, and omega matrices: the scientific and social significance of Empirical Implications of Theoretical Models (EITM). Perspect. Politics 2(2): 313–23
  • 48.
    Gray J. 2009. Jim Gray on eScience: a transformed scientific method. In The Fourth Paradigm: Data-Intensive Scientific Discovery, ed. T Hey, S Tansley, K Tolle, pp. xvii–xxxi. Redmond, WA: Microsoft Res.
  • 49.
    Grimmer J, Messing S, Westwood SJ. 2012. How words and money cultivate a personal vote: the effect of legislator credit claiming on constituent credit allocation. Am. Political Sci. Rev. 106(4): 703–19
    More AR articles citing this reference

    • Who Enters Politics and Why?

      Saad Gulzar
      Department of Political Science, Stanford University, Stanford, California 94305, USA; email: [email protected]
      Annual Review of Political Science Vol. 24: 253 - 275
      • ...; speaking in legislative session and writing bills (Grimmer et al. 2012, Parthasarathy et al. 2019)...

  • 50.
    Grimmer J, Stewart BM. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Anal. 21(3): 267–97
    More AR articles citing this reference

    • Testing Causal Theories with Learned Proxies

      Dean Knox,1 Christopher Lucas,2 and Wendy K. Tam Cho3
      1 Operations, Information, and Decisions Department and Analytics at Wharton, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania, USA; email: [email protected]; 2 Department of Political Science and Division of Computational and Data Sciences, Washington University in St. Louis, St. Louis, Missouri, USA; email: [email protected]; 3 Departments of Political Science, Statistics, Mathematics, Computer Science, and Asian American Studies; College of Law; and National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA; email: [email protected]
      Annual Review of Political Science Vol. 25: 419 - 441
      • ...have been the subject of much work (Adcock & Collier 2001, Grimmer & Stewart 2013)....
      • ...then training a model that attempts to reconstruct the resulting labels based on some reduced feature set, such as word frequencies (Grimmer & Stewart 2013)....
    • Computational Methods in Legal Analysis

      Jens Frankenreiter1 and Michael A. Livermore2
      1 Ira M. Millstein Center for Global Markets and Corporate Ownership, Columbia Law School, Columbia University, New York, NY 10027, USA; 2 School of Law, University of Virginia, Charlottesville, Virginia 22903, USA; email: [email protected]
      Annual Review of Law and Social Science Vol. 16: 39 - 57
      • ...a trend that has been described as using text as data (Grimmer & Stewart 2013)....
      • ...These choices can have consequences for the information that is ultimately analyzed and must be made carefully (Grimmer & Stewart 2013)....
    • Machine Learning for Sociology

      Mario Molina and Filiz Garip
      Department of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...readers are directed to articles by Bail (2014), Blei (2012), Evans & Aceves (2016), Grimmer & Stewart (2013), ...
      • ...puts their domain knowledge at the center (Grimmer & Stewart 2013)....
    • Qualitative Methods

      John Gerring
      Department of Government, University of Texas, Austin, Texas 78712; email: [email protected]
      Annual Review of Political Science Vol. 20: 15 - 36
      • ...either through judgment exercised by coders or through mathematical algorithms (Grimmer & Stewart 2013)....
    • Coding the Ideological Direction and Content of Policies

      Joshua D. Clinton
      Department of Political Science, Vanderbilt University, Nashville, Tennessee 37235; email: [email protected]
      Annual Review of Political Science Vol. 20: 433 - 450
      • ...Is it possible to use finer-grained information about the statutes being enacted to develop a more nuanced measure of policy content and scope—perhaps by applying text analysis to statutory language (see Proksch & Slapin 2012, Grimmer & Stewart 2013, Eggers & Spirling 2014)? Text analysis methods have been usefully applied to the task of analyzing party platforms in comparative politics (e.g., ...
    • Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges

      John Wilkerson and Andreu Casas
      Department of Political Science, University of Washington, Seattle, Washington 98195; email: [email protected]
      Annual Review of Political Science Vol. 20: 529 - 544
      • ...Validation is a critical component of every text-as-data project (Saldana 2009, Grimmer & Stewart 2013)....
      • ...Two (classification and scaling) will be familiar to many readers (Grimmer & Stewart 2013)....
    • Nationalism in Settled Times

      Bart Bonikowski
      Department of Sociology, Harvard University, Cambridge, Massachusetts 02138; email: [email protected]
      Annual Review of Sociology Vol. 42: 427 - 449
      • ...Although these methods are not without their limitations and their validation can be time-consuming (Grimmer & Stewart 2013), ...
    • Machine Translation: Mining Text for Social Theory

      James A. Evans and Pedro Aceves
      Department of Sociology, University of Chicago, Chicago, Illinois 60637; email: [email protected]
      Annual Review of Sociology Vol. 42: 21 - 50
      • ...Many general and text-specific ML techniques have now proven powerful for translating text and related communicative traces into sociologically valuable data (Grimmer & Stewart 2013)....
      • ...media organizations, consumers, and constituents (Grimmer & King 2011, Grimmer & Stewart 2013)....
    • Re-imagining the Cambridge School in the Age of Digital Humanities

      Jennifer A. London
      Department of Political Science, University of California, Los Angeles, California 90095; email: [email protected]
      Annual Review of Political Science Vol. 19: 351 - 373
      • ...The role of the text's author and the view of a text as a representative whole are not part of this analysis—which aims to reduce the human labor involved in classification (Grimmer & Stewart 2013)....
      • ...that text-as-data scholars do not suggest such quantitative analysis should replace traditional textual interpretation (Grimmer & Stewart 2013, Blaydes et al. 2015, ...
      • ...they believe that such methods can “amplify and augment careful reading and thoughtful analysis” (Grimmer & Stewart 2013, ...
      • ...Text-as-data methods have been used in many other areas of political science for this purpose (Grimmer & Stewart 2013), ...
      • ...as a requisite scientific balance on computer findings (Grimmer & Stewart 2013)....
      • ...17Grimmer & Stewart (2013) introduce alternative tools for studying texts as data in political science, ...
    • Measuring Policy Positions in Political Space

      Michael Laver
      Department of Political Science, New York University, New York, New York 10003; email: [email protected]
      Annual Review of Political Science Vol. 17: 207 - 223
      • ...but there is now a large and complex technical literature on “text as data”—the use of automated methods to measure policy positions using political texts. Grimmer & Stewart (2013) provide an excellent and authoritative review of this literature that should be a starting point for scholars thinking of moving into this area....

  • 51.
    Hanauer DA, Rhodes DR, Chinnaiyan AM. 2009. Exploring clinical associations using ‘-omics’ based enrichment analyses. PLOS ONE 4(4): e5203
  • 52.
    Harcourt BE. 2007. Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age. Chicago: Univ. Chicago Press
    More AR articles citing this reference

    • The Society of Algorithms

      Jenna Burrell and Marion Fourcade
      School of Information and Department of Sociology, University of California, Berkeley, California 94720, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 47: 213 - 237
      • ...finance and insurance, labor management, and education (Bouk 2015, Gandy 1993, Harcourt 2007, Lauer 2017)....
      • ...fueling a movement away from group assessment of risk and, increasingly, toward its individualization (Harcourt 2007, Lauer 2017)....
      • ...still more scrutiny. Harcourt (2007) names this reinforcing cycle of scrutiny the ratcheting effect....
      • ...But as Harcourt (2007) points out, reducing justice (and punishment) to a computational model is fundamentally problematic....
      • ...This amounts to an “epistemic distortion” of criminal justice (Harcourt 2007, ...
    • Parole Release and Supervision: Critical Drivers of American Prison Policy

      Kevin R. Reitz1 and Edward E. Rhine2
      1 University of Minnesota Law School, Minneapolis, Minnesota 55455, USA; email: [email protected]; 2 Robina Institute of Criminal Law and Criminal Justice, University of Minnesota Law School, Minneapolis, Minnesota 55455, USA; email: [email protected]
      Annual Review of Criminology Vol. 3: 281 - 298
      • ...including worries that they will exacerbate racial disparities in punishment and expand the overall reach of penal control (Eaglin 2013, Harcourt 2007, Klingele 2016)....
    • Conceptualizing Policing and Security: New Harmscapes, the Anthropocene, and Technology

      Cameron Holley,1 Tariro Mutongwizo,1 and Clifford Shearing2,3,4
      1 UNSW Law, UNSW Sydney, Sydney, New South Wales 2052, Australia; 2 Griffith Criminology Institute and School of Criminology and Criminal Justice, Griffith University, Brisbane, Queensland 4122, Australia; 3 Faculty of Law, University of Cape Town, Cape Town, Western Cape 7700, South Africa; 4 School of Criminology, University of Montreal, Montreal, Quebec H3C 3J7, Canada; email: [email protected]
      Annual Review of Criminology Vol. 3: 341 - 358
      • ...contemporary studies of police profiling of potential offenders (Harcourt 2007, Neild 2009)....
    • Racial Innocence: Law, Social Science, and the Unknowing of Racism in the US Carceral State

      Naomi Murakawa
      Department of African American Studies, Princeton University, Princeton, New Jersey 08544, USA; email: [email protected]
      Annual Review of Law and Social Science Vol. 15: 473 - 493
      • ...raising concerns that discrimination is woven into predictions (Eaglin 2017, Ferguson 2017, Goddard & Myers 2017, Hannah-Moffat 2013, Harcourt 2007, Schwalbe et al. 2007)....
      • ...risk-assessment tools are likely to exacerbate but sanitize existing racial inequality (Harcourt 2007, 2015)....
    • Machine Learning for Sociology

      Mario Molina and Filiz Garip
      Department of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...There are legitimate concerns that SML predictions (and the data on which they are based) can perpetuate social inequalities (Barocas & Selbst 2016, Harcourt 2007, Starr 2014)....
    • The Rise and Restraint of the Preventive State

      Lucia Zedner1,2,3 and Andrew Ashworth1,4
      1 All Souls College, University of Oxford, Oxford OX1 4AL, United Kingdom; 2 Faculty of Law, University of Oxford, Oxford OX1 3UL, United Kingdom; email: [email protected]; 3 Faculty of Law, University of New South Wales, Sydney, New South Wales 2052, Australia; 4 Faculty of Law, University of Tasmania, Hobart, Tasmania 7001, Australia
      Annual Review of Criminology Vol. 2: 429 - 450
      • ...this raises issues of principles and priorities that call for further debate (Ferguson 2017, Harcourt 2007)....
      • ...Those who took up this call are too numerous to recount in full but just a few prominent examples suffice to convey the breadth and import of their contributions to charting the burdens entailed by the Preventive State. Harcourt's (2007) masterful attack on the fundamental claims of actuarial justice in Against Prediction called into question the underlying premises of many preventive endeavors by challenging the validity of actuarial assessment and, ...
      • ...or group-based measures, often along unwarranted racial or religious lines (Harcourt 2007)....

  • 53.
    Hashem IAT, Chang V, Anuar NB, Adewole K, Yaqoob I, et al. 2016. The role of big data in Smart City. Int. J. Inf. Manag. 36: 748–58
  • 54.
    Hastie T, Tibshirani R, Friedman J. 2016. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Stanford, CA: Stanford Univ. Press. 2nd ed.
  • 55.
    Hersh ED. 2013. Long-term effect of September 11 on the political behavior of victims' families and neighbors. PNAS 110(52): 20959–63
  • 56.
    Hilbert M, López P. 2011. The world's technological capacity to store, communicate, and compute information. Science 332: 60–65
    More AR articles citing this reference

    • The Society of Algorithms

      Jenna Burrell and Marion FourcadeSchool of Information and Department of Sociology, University of California, Berkeley, California 94720, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 47: 213 - 237
      • ...and compute information” in the early 2000s (Hilbert & López 2011)....
    • Smart Everything: Will Intelligent Systems Reduce Resource Use?

      Jonathan G. Koomey,1 H. Scott Matthews,2 and Eric Williams31Steyer-Taylor Center for Energy Policy and Finance, Stanford University, Stanford, California 94305; email: [email protected]2Department of Civil and Environmental Engineering, Department of Engineering and Public Policy, and Green Design Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890; email: [email protected]3Golisano Institute for Sustainability, Rochester Institute of Technology, Rochester, New York 14623-5603; email: [email protected]
      Annual Review of Environment and Resources Vol. 38: 311 - 343
      • ...with important implications for the structure of businesses and the ways consumers access information (4)....

  • 57.
    Hochschild J, Sen M. 2015. Genetic determinism, technology optimism, and race: views of the American public. Ann. AAPSS 661: 160–80
  • 58.
    Hopkins D, King G. 2010. A method of automated nonparametric content analysis for social science. Am. J. Political Sci. 54(1): 229–47
  • 59.
    Hsiang SM, Burke M, Miguel E. 2013. Quantifying the influence of climate on human conflict. Science 341: 1235367
  • 60.
    Hsiang SM, Meng KC, Cane MA. 2011. Civil conflicts are associated with the global climate. Nature 476: 438–41
  • 61.
    Jamieson K. 2018. Cyber-War: How Russian Hackers and Trolls Helped Elect a President. New York: Oxford Univ. Press
  • 62.
    Jordan M. 2018. Artificial intelligence—the revolution hasn't happened yet. Medium. https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7
  • 63.
    Kalil T. 2012. Big data is a big deal. Press release, The White House, Mar. 29. https://obamawhitehouse.archives.gov/blog/2012/03/29/big-data-big-deal
  • 64.
    Kandel S, Paepcke A, Hellerstein J, Heer J. 2011. Wrangler: interactive visual specification of data transformation scripts. Paper presented at CHI Conference on Human Factors in Computing Systems, May 7–12, Vancouver, BC
  • 65.
    Kandel S, Paepcke A, Hellerstein J, Heer J. 2012. Enterprise data analysis and visualization: an interview study. IEEE Trans. Vis. Comput. Graph. 18(12): 2917–26
  • 66.
    Kaplan F. 2017. Dark Territory: The Secret History of Cyber War. New York: Simon & Schuster
  • 67.
    Kim IS. 2017. Political cleavages within industry: firm-level lobbying for trade liberalization. Am. Political Sci. Rev. 111(1): 1–20
  • 68.
    Kim IS, Kunisky D. 2018. Mapping political communities: a statistical analysis of lobbying networks in legislative politics. Work. Pap., Mass. Inst. Technol., http://web.mit.edu/insong/www/pdf/network.pdf. Accessed Dec. 2, 2018
  • 69.
    King G, Pan J, Roberts ME. 2013. How censorship in China allows government criticism but silences collective expression. Am. Political Sci. Rev. 107(2): 326–43
  • 70.
    Kitchin R. 2014. The real-time city? Big data and smart urbanism. GeoJournal 79(1): 1–14
  • 71.
    Kitzes J, Turek D, Deniz F. 2017. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland: Univ. Calif. Press
  • 72.
    Kleinberg J, Ludwig J, Mullainathan S, Obermeyer Z. 2015. Prediction policy problems. Am. Econ. Rev. Pap. Proc. 105(5): 491–95
  • 73.
    Knight W. 2017. The dark secret at the heart of AI. MIT Technol. Rev., May/June. https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/
  • 74.
    Laney D. 2001. 3D data management: controlling data volume, velocity, and variety. Application Delivery Strategies File 949, Feb. 6, META Group. https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
  • 75.
    Lasswell HD. 1951. The policy orientation. In The Policy Sciences: Recent Developments in Scope and Method, ed. D Lerner, H Lasswell, pp. 3–15. Stanford, CA: Stanford Univ. Press
  • 76.
    Laver M, Benoit K, Garry J. 2003. Extracting policy positions from political texts using words as data. Am. Political Sci. Rev. 97(2): 311–31
  • 77.
    Lazer D, Kennedy R, King G, Vespignani A. 2014. The parable of Google flu: traps in big data analysis. Science 343(6176): 1203–4
  • 78.
    LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521: 436–44
      Annual Review of Neuroscience Vol. 41: 163 - 183
      • ...and recent advances in coupling artificial HCNNs with more efficient learning algorithms have given rise to the revolution of machines that are almost on par with humans in their ability to recognize objects (Hassabis et al. 2017, LeCun et al. 2015). ...
    • Computational Principles of Supervised Learning in the Cerebellum

      Jennifer L. Raymond1 and Javier F. Medina21Department of Neurobiology, Stanford University School of Medicine, Stanford, California 94305, USA; email: [email protected]2Department of Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA; email: [email protected]
      Annual Review of Neuroscience Vol. 41: 233 - 253
      • ...the process of finding a suitable representation of the input data is called feature engineering and is a critical step that often determines whether the algorithm will succeed or fail (Bengio et al. 2013, LeCun et al. 2015). (c) Instructive signals compose the third element....
    • Machine Learning Approaches for Clinical Psychology and Psychiatry

      Dominic B. Dwyer, Peter Falkai, and Nikolaos KoutsoulerisDepartment of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany; email: [email protected], [email protected], [email protected]
      Annual Review of Clinical Psychology Vol. 14: 91 - 118
      • ...The idea of meta-learning is an important concept in fields such as deep learning (LeCun et al. 2015), ...
    • Big Data in Public Health: Terminology, Machine Learning, and Privacy

      Stephen J. Mooney1 and Vikas Pejaver21Harborview Injury Prevention and Research Center, University of Washington, Seattle, Washington 98122, USA; email: [email protected]2Department of Biomedical Informatics and Medical Education and the eScience Institute, University of Washington, Seattle, Washington 98109, USA; email: [email protected]
      Annual Review of Public Health Vol. 39: 95 - 112
      • ...have been used extensively in image classification and natural language processing (68)....
    • Computational Neuroscience: Mathematical and Statistical Perspectives

      Robert E. Kass,1 Shun-Ichi Amari,2 Kensuke Arai,3 Emery N. Brown,4,5 Casey O. Diekman,6 Markus Diesmann,7,8 Brent Doiron,9 Uri T. Eden,3 Adrienne L. Fairhall,10 Grant M. Fiddyment,3 Tomoki Fukai,2 Sonja Grün,7,8 Matthew T. Harrison,11 Moritz Helias,7,8 Hiroyuki Nakahara,2 Jun-nosuke Teramae,12 Peter J. Thomas,13 Mark Reimers,14 Jordan Rodu,15 Horacio G. Rotstein,16,17 Eric Shea-Brown,10 Hideaki Shimazaki,18,19 Shigeru Shinomoto,19 Byron M. Yu,20 and Mark A. Kramer31Department of Statistics, Machine Learning Department, and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA; email: [email protected]2Mathematical Neuroscience Laboratory, RIKEN Brain Science Institute, Wako, Saitama Prefecture 351-0198, Japan3Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA4Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA5Department of Anesthesia, Harvard Medical School, Boston, Massachusetts 02115, USA6Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, New Jersey 07102, USA7Institute of Neuroscience and Medicine, Jülich Research Centre, 52428 Jülich, Germany8Department of Theoretical Systems Neurobiology, Institute of Biology, RWTH Aachen University, 52062 Aachen, Germany9Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA10Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98105, USA11Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA12Department of Integrated Theoretical Neuroscience, Osaka University, Suita, Osaka Prefecture 565-0871, Japan13Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, Cleveland, Ohio 44106, USA14Department of Neuroscience, Michigan State University, East Lansing, Michigan 48824, USA15Department of 
Statistics, University of Virginia, Charlottesville, Virginia 22904, USA16Federated Department of Biological Sciences, Rutgers University/New Jersey Institute of Technology, Newark, New Jersey 07102, USA17Institute for Brain and Neuroscience Research, New Jersey Institute of Technology, Newark, New Jersey 07102, USA18Honda Research Institute Japan, Wako, Saitama Prefecture 351-0188, Japan19Department of Physics, Kyoto University, Kyoto, Kyoto Prefecture 606-8502, Japan20Department of Electrical and Computer Engineering and Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
      Annual Review of Statistics and Its Application Vol. 5: 183 - 214
      • ...3.4.5. Deep learning.Deep learning (le Cun et al. 2015) is an outgrowth of PDP modeling (see Section 1.4)....
      • ...receptive fields (le Cun et al. 2015) identify a very specific input pattern, ...
    • Neural Circuitry of Reward Prediction Error

      Mitsuko Watabe-Uchida,1, Neir Eshel,1,2, and Naoshige Uchida11Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138; email: [email protected], [email protected]2Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305; email: [email protected]
      Annual Review of Neuroscience Vol. 40: 373 - 394
      • ...versus simple box-and-arrow computations? As is the case in modern artificial neural networks (LeCun et al. 2015), ...
    • Toward a Rational and Mechanistic Account of Mental Effort

      Amitai Shenhav,1,2 Sebastian Musslick,3 Falk Lieder,4 Wouter Kool,5 Thomas L. Griffiths,6 Jonathan D. Cohen,3,7 and Matthew M. Botvinick8,91Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, Rhode Island 02912; email: [email protected]2Brown Institute for Brain Science, Brown University, Providence, Rhode Island 029123Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 085444Helen Wills Neuroscience Institute, University of California, Berkeley, California 947205Department of Psychology, Harvard University, Cambridge, Massachusetts 021386Department of Psychology, University of California, Berkeley, California 947207Department of Psychology, Princeton University, Princeton, New Jersey 085408Google DeepMind, London M1C 4AG, United Kingdom9Gatsby Computational Neuroscience Unit, University College London, London W1T 4JG, United Kingdom
      Annual Review of Neuroscience Vol. 40: 99 - 124
      • ... and is driving the current explosion of interest in deep learning networks within the machine learning community (Bengio et al. 2013, Caruana 1998, LeCun et al. 2015)....
    • The Role of Variability in Motor Learning

      Ashesh K. Dhawale,1,2 Maurice A. Smith,2,3 and Bence P. Ölveczky1,21Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; email: [email protected]2Center for Brain Science, Harvard University, Cambridge, Massachusetts 021383John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138
      Annual Review of Neuroscience Vol. 40: 479 - 498
      • ...been due to the use of convolutional network architectures that reduce dramatically the dimensionality of the solution space by enforcing highly symmetric patterns in the weights to be learned (LeCun et al. 1998, 2015...
      • ...Another key to the success of deep learning networks has been the use of unsupervised methods to pretrain networks based on the statistics of the input data (Hinton et al. 2006, LeCun et al. 2015, Lee et al. 2009)....
    • Deep Learning in Medical Image Analysis

      Dinggang Shen,1,2 Guorong Wu,1 and Heung-Il Suk21Department of Radiology, University of North Carolina, Chapel Hill, North Carolina 27599; email: [email protected]2Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea; email: [email protected]
      Annual Review of Biomedical Engineering Vol. 19: 221 - 248
      • ...and then discovers the informative representations in a self-taught manner (8, 9)....
      • ...Deep neural networks can discover hierarchical feature representations such that higher-level features can be derived from lower-level features (9)....
    • Personal Sensing: Understanding Mental Health Using Ubiquitous Sensors and Machine Learning

      David C. Mohr,1 Mi Zhang,2 and Stephen M. Schueller11Center for Behavioral Intervention Technologies and Department of Preventive Medicine, Northwestern University, Chicago, Illinois 60611; email: [email protected], [email protected]2Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824; email: [email protected]
      Annual Review of Clinical Psychology Vol. 13: 23 - 47
      • ...they do not generalize well to challenging problems involving large-scale datasets (LeCun et al. 2015)....
    • Visual Object Recognition: Do We (Finally) Know More Now Than We Did?

      Isabel Gauthier1 and Michael J. Tarr21Department of Psychology, Vanderbilt University, Nashville, Tennessee 37240-7817; email: [email protected]2Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
      Annual Review of Vision Science Vol. 2: 377 - 396
      • ...most typically embodied—as illustrated in Figure 3—in convolutional neural networks (CNNs) (LeCun et al. 2015). (By one estimate, ...
      • ...Figure and caption adapted, with permission, from LeCun et al. (2015)....
    • Early Visual Cortex as a Multiscale Cognitive Blackboard

      Pieter R. Roelfsema1,2,3 and Floris P. de Lange41Netherlands Institute for Neuroscience, 1105 BA Amsterdam, The Netherlands; email: [email protected]2Department of Integrative Neurophysiology, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands3Psychiatry Department, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands4Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 EN Nijmegen, The Netherlands
      Annual Review of Vision Science Vol. 2: 131 - 151
      • ...Recent progress in deep learning has been made in the recognition of semantic categories in photographs by using neural networks consisting of many layers (LeCun et al. 2015)....
    • Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing

      Nikolaus KriegeskorteMedical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom; email: [email protected]
      Annual Review of Vision Science Vol. 1: 417 - 446
      • ...I argue that recent advances in neural network models (LeCun et al. 2015) will usher in a new era of computational neuroscience, ...

  • 79.
    Leskovec J, Backstrom L, Kleinberg J. 2009. Meme-tracking and the dynamics of the news cycle. Paper presented at 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 28–July 1, Paris, France
  • 80.
    Libicki MC. 2014. Why cyber war will not and should not have its grand strategist. Strateg. Stud. Q. 8(1): 23–39
  • 81.
    Lin H, Tegmark M, Rolnick D. 2017. Why does deep and cheap learning work so well? J. Stat. Phys. 168(6): 1223–47
  • 82.
    Lugmayr A, Stockleben B, Scheib C. 2016. A comprehensive survey on big-data research and its implications—What is really ‘new’ in big data?—It's cognitive big data! In PACIS 2016 Proceedings, Abstr. 248. https://aisel.aisnet.org/pacis2016/248
  • 83.
    Luks S, Brady HE. 2003. Defining welfare spells: coping with problems of survey responses and administrative data. Eval. Rev. 27(4): 395–420
  • 84.
    Lyman P, Varian HR. 2003. How much information? Executive summary. Rep. School Inf. Manag. Syst., Univ. Calif., Berkeley, CA. http://groups.ischool.berkeley.edu/archive/how-much-info-2003/execsum.htm
  • 85.
    Maimon O, Rokach L. 2005. The Data Mining and Knowledge Discovery Handbook. New York: Springer
  • 86.
    Manjoo F. 2016. A plan in case robots take the jobs: give everyone a paycheck. New York Times, Mar. 2. https://www.nytimes.com/2016/03/03/technology/plan-to-fight-robot-invasion-at-work-give-everyone-a-paycheck.html
  • 87.
    Mayer-Schönberger V, Cukier K. 2014. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt
  • 88.
    Mbadiwe T. 2018. Algorithmic injustice. New Atlantis 54: 3–28
  • 89.
    Mergel I. 2016. Big data in public affairs education. J. Public Aff. Educ. 22(2): 231–48
  • 90.
    Miller K. 2012. Big data analytics in biomedical research. Biomed. Comput. Rev. Winter 2011/2012:14–21. http://biomedicalcomputationreview.org/content/big-data-analytics-biomedical-research
  • 91.
    Mosco V. 2014. To the Cloud: Big Data in a Turbulent World. New York: Paradigm
  • 92.
    Mullainathan S, Spiess J. 2017. Machine learning: an applied econometric approach. J. Econ. Perspect. 31(2): 87–106
  • 93.
    Nagler J, Tucker JA. 2015. Drawing inferences and testing theories with big data. PS Political Sci. Politics 48(1): 84–88
  • 94.
    National Research Council. 2011. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. Washington, DC: Natl. Acad. Press
  • 95.
    National Research Council. 2013. Frontiers in Massive Data Analysis. Washington, DC: Natl. Acad. Press
  • 96.
    Neumann R. 2016. The Digital Difference: Media Technology and the Theory of Communication Effects. Cambridge, MA: Harvard Univ. Press
  • 97.
    Nickerson DW, Rogers T. 2014. Political campaigns and big data. J. Econ. Perspect. 28(2): 51–73
  • 98.
    NIST (Natl. Inst. Standards Technol.). 2015. Big data interoperability framework: Volume 1, definitions. NIST Spec. Publ. 1500-1. https://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf
  • 99.
    NITRD (Netw. Inf. Technol. Res. Dev.). 2016. The federal big data research and development strategic plan. Rep. Big Data Senior Steering Group, Subcomm. NITRD, Washington, DC. https://www.nitrd.gov/PUBS/bigdatardstrategicplan.pdf
  • 100.
    Noble S. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York Univ. Press
  • 101.
    Oussous A, Benjelloun FZ, Lahcen AA, Belfkih S. 2018. Big data technologies: a survey. J. King Saud Univ.—Comput. Inf. Sci. 30(4): 431–48
  • 102.
    Picon A. 2015. Smart Cities: A Spatialised Intelligence. New York: Wiley
  • 103.
    Pierson E, Simoiu C, Overgoor J, Corbett-Davies S, et al. 2017. A large-scale analysis of racial disparities in police stops across the United States. arXiv:1706.05678 [stat.AP]
  • 104.
    Pool IS. 1983. Tracking the flow of information. Science 221(4611): 609–13
  • 105.
    Porche IR, Wilson B, Johnson EE, Tierney S, Saltzman E. 2014. Barriers to benefiting from big data. In Data Flood: Helping the Navy Address the Rising Tide of Sensor Information, pp. 13–21. Santa Monica, CA: RAND Corp.
  • 106.
    Powell J. 2017. Identification and asymptotic approximations: three examples of progress in econometric theory. J. Econ. Perspect. 31(2): 107–24
  • 107.
    Pratt GA. 2015. Is a Cambrian explosion coming for robotics? J. Econ. Perspect. 29: 51–60
  • 108.
    Prior M. 2013. Media and political polarization. Annu. Rev. Political Sci. 16: 101–27
  • 109.
    Rid T. 2012. Cyber war will not take place. J. Strateg. Stud. 35(1): 5–32
  • 110.
    Ripley BD. 1995. Pattern Recognition and Neural Networks. New York: Cambridge Univ. Press
  • 111.
    Roberts M, Stewart B, Tingley D, Lucas C, Leder-Luis J, et al. 2014. Structural topic models for open-ended survey responses. Am. J. Political Sci. 58(4): 1064–82
  • 112.
    Rogers R. 2013. Digital Methods. Cambridge, MA: MIT Press
  • 113.
    Russell S, Norvig P. 2009. Artificial Intelligence: A Modern Approach. New York: Pearson. 3rd ed.
  • 114.
    Salganik MJ. 2017. Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton Univ. Press
  • 115.
    Samuel A. 1962. Artificial intelligence: a frontier of automation. Ann. Am. Acad. Political Social Sci. 340: 10–20
  • 116.
    Sanger DE. 2018. The Perfect Weapon: War, Sabotage, and Fear in the Cyber Age. New York: Crown
  • 117.
    Sarle W. 1994. Neural networks and statistical models. In Proceedings of the Nineteenth Annual SAS Users Group International Conference, Dallas, Texas, April 10–13. Cary, NC: SAS Inst. http://www.sascommunity.org/sugi/SUGI94/Sugi-94-255%20Sarle.pdf
  • 118.
    Schmidhuber J. 2015. Deep learning in neural networks: an overview. Neural Netw. 61: 85–117
  • 119.
    Schroeder R. 2018. Social Theory after the Internet: Media, Technology, and Globalization. London: UCL Press
  • 120.
    Schudson M. 2002. The news media as political institutions. Annu. Rev. Political Sci. 5: 249–69
  • 121.
    Scott JC. 1999. Seeing Like a State. London: Yale Univ. Press
  • 122.
    Shmueli G. 2010. To explain or to predict? Stat. Sci. 25(3): 289–310
  • 123.
    Sims CA. 1980. Macroeconomics and reality. Econometrica 48(1): 1–48
  • 124.
    Smith G. 2018. The AI Delusion. New York: Oxford Univ. Press
  • 125.
    Statistical Science. 2003. Tribute to John W. Tukey. Stat. Sci. 18(3)
  • 126.
    Stephens-Davidowitz S. 2014. The cost of racial animus on a black candidate: evidence using Google search data. J. Public Econ. 118: 26–40
  • 127.
    Tankersley J. 2018. Democrats' next big thing: government-guaranteed jobs. New York Times, May 22. https://www.nytimes.com/2018/05/22/us/politics/democrats-guaranteed-jobs.html
  • 128.
    Taylor GR. 1951. The Transportation Revolution 1815–1860. New York: Rinehart
  • 129.
    Thagard P. 1992. Conceptual Revolutions. Princeton, NJ: Princeton Univ. Press
  • 130.
    Theodoridis AG, Nelson AJ. 2012. Of BOLD claims and excessive fears: a call for caution and patience regarding political neuroscience. Political Psychol. 33(1): 27–28
  • 131.
    Tinati R, Halford S, Carr L, et al. 2014. Big data: methodological challenges and approaches for sociological analysis. Sociology 48(4): 663–81
  • 132.
    Titiunik R. 2015. Can big data solve the fundamental problem of causal inference? PS Political Sci. Politics 48(1): 75–79
  • 133.
    Townsend AM. 2013. Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. New York/London: W.W. Norton
  • 134.
    Tukey J. 1962. The future of data analysis. Ann. Math. Stat. 33(1): 1–67
  • 135.
    Turnbull N. 2008. Harold Lasswell's “problem orientation” for the policy sciences. Crit. Policy Anal. 2(2): 72–91
  • 136.
    Varian HR. 2014. Big data: new tricks for econometrics. J. Econ. Perspect. 28(2): 3–27
  • 137.
    Voigt R, Camp NP, Prabhakaran V, et al. 2017. Language from police body camera footage shows racial disparities in officer respect. PNAS 114(25): 6521–26
  • 138.
    Ward JS, Barker A. 2013. Undefined by data: a survey of big data definitions. arXiv:1309.5821 [cs.DB]
  • 139.
    Warner B, Misra M. 1996. Understanding neural networks as statistical tools. Am. Statistician 50(4): 284–93
  • 140.
    Weil F. 2012. The sinews of society are changing. Huffington Post, Apr. 17. https://www.huffingtonpost.com/frank-a-weil/the-sinews-of-society-are_b_1277241.html
  • 141.
    White H. 1992. Artificial Neural Networks: Approximation and Learning Theory. Cambridge, MA: Blackwell
  • 142.
    Wickham H. 2014. Tidy data. J. Stat. Softw. 59(10): 1–24
  • 143.
    Wiedemann G. 2013. Opening up to big data: computer-assisted analysis of textual data in social sciences. Forum Qual. Soc. Res. 14(2): 13. http://www.qualitative-research.net/index.php/fqs/article/view/1949
  • 144.
    Wigner E. 1960. The unreasonable effectiveness of mathematics in the natural sciences. Commun. Pure Appl. Math. 13(1): 1–14
  • 145.
    Wilkerson J, Casas A. 2017. Large-scale computerized text analysis in political science: opportunities and challenges. Annu. Rev. Political Sci. 20: 529–44
  • 146.
    Williams BA, Brooks CF, Shmargad Y. 2018. How algorithms discriminate based on data they lack: challenges, solutions, and policy implications. J. Inf. Policy 8: 78–115
  • 147.
    Yarkoni T, Westfall J. 2017. Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12(6): 1100–22

Table 1  The seven activities of data science [a]

Activities, each followed by its examples:

  • Data gathering, preparation, and exploration
    • Survey data, experimental data, genomic data, textual data, administrative data, image data, web data, and sensor data
    • Data cleaning and exploratory data analysis methods for checking on outliers and data quality
  • Data representation and transformation
    • Relational and nonrelational databases
    • Networks and graphs
    • Other mathematical structures for data
  • Computing with data
    • R and Python
    • Programming packages, text manipulation languages
    • Cluster and cloud computing
    • Reproducible workflows
  • Data modeling
    • Determining or hypothesizing data generating probability functions, structural and predictive modeling
  • Data visualization and presentation
    • Types of visualizations and graphs
    • Rules for labeling and presenting data
    • Psychological impacts of various displays
  • Data archiving, indexing, and search and data governance
    • Standards for open data and reproducibility
    • Determining rules for access and privacy protection where necessary
  • Science about data science
    • How people do data science
    • Impacts of data science and big data on society

[a] The activities are quoted from Donoho (2017, p. 755) except for “Data archiving, indexing, and search and data governance,” which is my addition. The examples are my own.
