Measuring the Impact of Human Rights: Conceptual and Methodological Debates

Fifty years ago, the world had very few human rights laws and very little information on human rights violations. Today, the situation could not be more different. The world is awash in laws and indicators of legal violations.


INTRODUCTION
Binding human rights laws and human rights indicators emerged simultaneously over the last 50 years. As of December 16, 1966, a total of three multilateral human rights treaties had been opened for adoption, and no comprehensive effort existed to measure rights-based practices. Observers struggled to count and catalog atrocities in European dictatorships, war-torn Vietnam, and apartheid South Africa (Anti-Apartheid Mov. 1966, Benenson 1961, Norden 1966). By 2016, 9 multilateral treaties, 9 optional protocols, 3 regional bodies of law, and 99 additional international instruments had protected over 60 human rights.1 These laws are now accompanied by an ever-increasing number of indicators. To date, roughly 400 quantitative and qualitative measures of human rights, democracy, and rule of law have been coded by projects making use of standardized human rights documents, journalistic accounts, the historical record, and, most recently, thousands of collaborating experts.2 Observing these historical developments, one may justifiably assume that human rights law caused indicators to come into being. After all, law demands the standardized collection of hard evidence, and at a deeper level, both lawmaking and statistics owe their existence to the enduring human desire for a rationally ordered society (Dezalay & Garth 2002, Scott 1998).
However, human rights law and human rights indicators were directly connected neither at their origins nor during the first three decades of their emergence. Efforts to analyze the relationship between human rights law and indicators of states' compliance did not begin in earnest until the turn of the twenty-first century. Since then, two main approaches have come to dominate a quickly expanding body of research. One we term the factualist approach,3 which studies the relationship between law and human rights violations using indicators that are accepted as accurate representations of social facts. The other is the constructivist approach, which treats indicators not as facts but as socially produced technologies of knowledge and power. Though these perspectives diverge radically, they agree on a central notion: that international human rights law has contributed very little to social progress. As we explain in the first two sections, these approaches have reached their apotheosis, converging on several irresolvable challenges.
In this article, we synthesize an emerging constitutive approach to law and indicators that means to address shortcomings in the literature by connecting theories of human rights with theories of measurement. The constitutive approach argues that human rights law possesses a politically motivated normative dimension that fundamentally changes the way humans process and understand data. The law did not cause indicators, but it has changed them over time. The constitutive approach addresses constructivists' central criticism: that the production of statistical indicators is not a neutral or independent social process. In so doing, it also challenges the conclusions of the factualist approach by questioning the assumed validity of indicators. Instead of rejecting indicators outright, as critical constructivists often do, the constitutive approach points to a set of research design and statistical tools that allow for the linking of theory and measurement.

1 These counts are themselves imperfect indicators. For the number of protected rights, see Green (2001). For numbers of treaties and instruments, see the Office of the UN High Commissioner for Human Rights (http://www.ohchr.org/EN/ProfessionalInterest/Pages/UniversalHumanRightsInstruments.aspx). Elliott (2011) places the number of total human rights instruments at 779, though his definition is looser and extends between the years 1863 and 2003.
The constitutive approach encourages scholars to treat measurement itself as an object of theorizing and inquiry. Fortunately, we now stand on the verge of a revolution in human rights theory and measurement, whereby social scientists can more accurately estimate the nexus between law, indicators, and state behaviors. Researchers are now able to use new, computationally intensive, but theoretically appropriate tools for understanding the flaws and biases of the information produced to understand human rights. When these new theoretically motivated tools are applied, it becomes clear not only that there is a relationship between law and data collection but also, once this relationship is taken into account, that law has promoted improvements in the human condition. This is the promise of the constitutive approach.

A GENEALOGY OF HUMAN RIGHTS INDICATORS
In 1976, the year that the two main international human rights covenants took legal effect, the US State Department already had an institutional architecture in place for reporting on human rights practices and democracy in other countries; so did Amnesty International (AI) and Freedom House. However, none of these organizations had yet initiated standardized yearly reporting to monitor compliance with human rights laws. Embittered by Vietnam and the coup in Chile, and intent on linking foreign aid disbursements to improvements in rights protections, the US Congress established the State Department's Bureau of Human Rights and Humanitarian Affairs in 1975. The bureau would eventually develop into a fully functional human rights reporting organization (Cingranelli & Pasquarello 1985, De Neufville 1986).4 AI developed its own reporting methods a decade prior, in response to a lack of effective legal treaties (Clark 2001, Grant 2011). If states would not take concrete measures to enforce human rights, AI argued, it would begin monitoring compliance on its own. AI published its first annual report in 1962 and performed a successful monitoring mission in authoritarian Greece in 1968. The organization followed up in 1973, the same year Donald Fraser initiated first-of-a-kind congressional hearings on human rights in US foreign policy, by starting a very influential campaign against torture, complete with more extensive reporting that set a standard for human rights documentation (Rodley 2011). This campaign set in motion a process that ended in the creation of the Convention Against Torture over a decade later (UN Gen. Assem. 1984). AI's expansion and notoriety coincided with the release of Freedom House's first Comparative Survey of Freedom report in 1972. An anti-communist outfit funded in part by the State Department, Freedom House had scored countries according to freedom debits and credits since 1955, but in 1972 it switched to categorical seven-point scaled rankings of civil and political rights (Bradley 2015, Giannone 2010, Satterthwaite 2016). All of these early human rights reports, which relied on foreign service officer documentation, witness testimony, and expert rankings, were inspired in some way by the concept of human rights, but none purported to measure directly compliance with legal treaties. The law was too underdeveloped by the time the reports started for them to accomplish this goal.
Scholarship using data on human rights violations also did not focus much on the question of compliance or implementation of international law. One exception is Matthew Lippman's (1979, p. 55) article in the first volume of Human Rights Quarterly, which notes that 61 countries use torture and that "it is futile to attempt to protect individuals from torture through treaties and legal instruments." The article counted binary data from AI on whether torture is used at all in a country. The extent of the methodology was to report any instance of torture as a failure measured against an ideal of zero reported incidents. Other, more sophisticated studies from this early period focused on whether military regimes were more likely to suspend or violently suppress civil and political rights (Claude 1976, Henderson 1982, Hibbs 1973, McKinlay & Cohan 1976) or whether the US government's efforts to slow foreign aid to rights-violating countries worked (Cingranelli & Pasquarello 1985, Poe 1990, Rubin & Newberg 1980, Schoultz 1981, Stohl et al. 1984). As of 1984, the year that the Convention Against Torture was adopted, the two strands of research using human rights indicators were comparative studies of repressive regimes and analyses of US foreign policy.
Some seminal attempts to measure human rights, like Charles Humana's World Human Rights Guide, came and went during this early phase (Humana 1983; see also Rummel 1976, Taylor & Jodice 1983). In the process, Freedom House's civil and political rights indices gained the most cachet by the early 1980s. These were the data of first resort for two main reasons. One is that Freedom House produced two simple ordered categorical scores for each country, which allowed for longitudinal comparisons back to the early 1970s (Bollen 1986). The second is that it employed a checklist developed and published by a social scientist, Raymond Gastil (1980). This checklist provided some transparency. Perhaps in response to Freedom House's popularity, many scholars voiced concern that Gastil's scores, which he coded single-handedly, were not reliable because they could not be replicated (Scoble & Weisberg 1981). Other researchers showed that, because they were so highly correlated, the civil and political rights scales were largely redundant, and that the coding was biased toward Western understandings of liberal democracy (Banks 1986, Bollen 1986).
The years 1984-1986 marked a turning point for the use of human rights indicators. Two things happened. First, scholars began to devote a significant amount of attention to the subject of measurement itself. A special issue of Human Rights Quarterly in 1986 focused on statistical methods and human rights, as did edited books in 1986 and 1988 (Cingranelli 1988, Stohl & Lopez 1986). In the 1986 Human Rights Quarterly symposium, two points of view started to solidify. One held that the study of human rights was plagued by measurement issues, but that these could be addressed with greater methodological rigor and theoretical attention to the process by which the information was generated (Stohl et al. 1986). The other held that human rights scholarship was blindly following the dictum that "if data exist, they will be used" (Goldstein 1986, p. 621). From this perspective, the tendency to fetishize numbers means preferring bad data to no data, risks a loss of interpretive nuance, and fundamentally alters the production of knowledge. These themes would continue to resonate in future work and remain present today.
A second big change that came about in the mid-1980s was the displacement of Freedom House as the most relied-upon measure of human rights. Freedom House was repeatedly challenged on methodological grounds, and by 1989 Raymond Gastil had been replaced by a panel of coders. He admitted during this period that he had sometimes coded countries based on unsystematic impressions (Barsh 1993, Spirer 1990). Interestingly, toward the end of the Cold War, the Freedom House indices shifted from being the human rights measures of choice to being the democracy measures of choice. In so doing, they moved from use primarily as a dependent variable to use as an independent variable in statistical models of human rights. Though the Freedom House indices have also been challenged as poor indicators of democracy (Munck & Verkuilen 2002), they continue to be favored among government officials, as well as policy-oriented scholars who publish in outlets like the Journal of Democracy or Foreign Affairs (Diamond 2011, Giannone 2010).
As Freedom House left the human rights scene, two other global databases arose to fill the gap. One is called the Political Terror Scale (PTS). The first iteration of this project was associated with researchers at Purdue University.5

INDICES AND SCALES

Both indices and scales are models that combine multiple pieces of data into a composite measure. Generally, though, indices and scales combine different forms of information: respectively, ratio- or interval-level data and categorical data that are ordered or unordered (nominal). For example, the Dow Jones Industrial Average is an index that combines the stock prices (ratio-level data) of 30 publicly traded companies. A scale tends to combine categorical responses, such as "strongly agree," "moderately agree," "moderately disagree," and "strongly disagree." In some cases, these categories map directly onto interval-level numerical information. For example, take "hot," "warm," and "cold." This is an English language-based measure of the concept of temperature with three constituent parts. If one were to substitute 1, 2, and 3 for "hot," "warm," and "cold," the following statement would make perfect sense: "Boiled water is very 1, and ice cubes are very 3." In this respect, both the alphabetical and numerical indicators are exactly the same, containing no loss of meaning. However, interval or ratio expressions are often preferred for their precision. If one wants to know the temperature outside, one most likely consults a thermometer, which ranges from −50°F to 150°F. Thermometers do not possess an inherent advantage over categorical indicators, but they capture more exacting variations in temperature than the English language allows. It would be quite difficult to memorize and apply 200 different words describing the temperature outside. The greater precision that thermometers provide allows for comparison of temperatures across locations and the ability to know with certainty how bad a fever is. Generally, in the human rights field, we have only categorical information to combine into variables, so the term scale is preferred here.

Under the directorship of Mark Gibney since 1984, PTS adapted a coding scheme from the Freedom House checklist and applied it to information systematically captured in the Country Reports on Human Rights Practices published annually by the US State Department and the State of the World's Human Rights report published annually by AI (Wood & Gibney 2010). The resulting five-point categorical scales are based on the content of each set of reports. These two scales measure the extent to which every state in the world commits abuses of physical integrity rights, dating back to 1976.6 The other project, the Cingranelli-Richards Human Rights Data Project (CIRI), came out of David Cingranelli's early work on US foreign policy (Cingranelli 1988, Cingranelli & Pasquarello 1985), but it did not become publicly available until 1999 (Cingranelli & Richards 1999a). These variables, ranging from 1981 to 2011, are also drawn from State Department and AI reports, but CIRI introduced a new scheme to disaggregate physical integrity violations, combining information from both sets of reports. The set of variables contains four categorical scales (each 0-2) representing the extremity of political imprisonment, torture, extrajudicial killing, and disappearance in a country. Combined, these subscales produce an additive nine-point categorical scale: the Physical Integrity Index.7 This scale, somewhat confusingly labeled an index, is the summation of four three-point categorical scales (see sidebar titled Indices and Scales).
5 These include Michael Stohl, David Carleton, and Steven Johnson. These researchers originally collected data on 59 foreign aid recipient countries. This was expanded to the entire world under the directorship of Mark Gibney.

6 Fewer countries were reported on by the US State Department before 1981.

7 The project also used information in the reports to code an expanded set of categorical scales (each 0-2) representing the extremity of assembly and association, foreign movement, domestic movement, speech, electoral self-determination, and religion. These subscales are also used to produce an additive 15-point categorical scale: the Empowerment Rights Index. The CIRI project contains two versions of some of these variables and therefore two versions of the additive empowerment scale. Finally, the project also coded several categorical scales measuring women's rights.
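The arithmetic behind additive scales of this kind is simple to illustrate. The following is a minimal Python sketch, not CIRI's actual code: the function name, dictionary keys, and sample values are our own hypothetical illustration of summing four 0-2 subscales (where higher values indicate better practices) into a nine-point (0-8) composite.

```python
# Hypothetical illustration of a CIRI-style additive scale.
# Each subscale runs 0 (frequent violations) to 2 (no reported violations);
# the coding rules and names here are assumptions for illustration only.

SUBSCALES = ["imprisonment", "torture", "killing", "disappearance"]

def physical_integrity_index(scores: dict) -> int:
    """Sum four 0-2 subscales into a 0-8 additive scale."""
    for name in SUBSCALES:
        value = scores[name]
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be coded 0, 1, or 2, got {value}")
    return sum(scores[name] for name in SUBSCALES)

# Example country-year: moderate imprisonment and torture,
# no reported killings or disappearances.
example = {"imprisonment": 1, "torture": 1, "killing": 2, "disappearance": 2}
print(physical_integrity_index(example))  # → 6
```

Note that summation treats the subscales as interchangeable: a country-year coded (0, 2, 2, 2) and one coded (2, 2, 2, 0) receive the same total, which is one reason the aggregation choice itself deserves scrutiny.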
PTS and CIRI would come to dominate the statistical study of human rights. Since 1982, PTS alone has been used in just over 500 scholarly publications.8 However, the conquest of PTS and CIRI did not happen immediately. It followed a period during which categorization and statistical analysis of human rights and repression went mainstream. From 1987 to 1995, authors sought to establish patterns linking structural determinants to rights violations, using many different combinations of indicators. Park (1987) showed that Islamic states were less likely to protect civil and social rights. Mitchell & McCormick (1988) found that developed states were more likely to protect physical integrity. Harff & Gurr (1988; see also Gurr & Scarritt 1989) collected data on genocides, politicides, and minorities at risk and argued that fewer of these are found in Europe and Latin America.
New statistical discoveries dovetailed with the end-of-history mentality in the last throes of the Cold War (Fukuyama 2006). In 1991, amid the dissolution of the Soviet Empire, the United Nations Development Programme released a report ranking all countries according to their adherence to the Universal Declaration of Human Rights. The critical tone of this report sent lesser-developed and former-communist countries into an uproar but resonated with human rights statistical research (Barsh 1993). For example, based on the most comprehensive longitudinal categorical data from PTS, and using the most sophisticated statistical techniques available at the time, one prominent study showed that democracy was definitively better than communist dictatorship (Poe & Tate 1994). Government in the West was superior, and the data verified it. With these developments, human rights indicators went from a specialist's field to a part of the post-Cold War zeitgeist.
Though the mainstreaming of human rights indicators coincided with the "decade of international law" in the 1990s,9 scholarship still did not draw a direct connection between international law and indicators of rights protection. Studies in the late 1990s focused largely on the multivariate structural correlates of PTS and CIRI scores, combined with a theory concerning leaders' rational decisions to repress (Mason & Krane 1989). The main findings were that democracies outperform other regimes (Armstrong & Davenport 2004, Cingranelli & Richards 1999b, Davenport 1996), that transitional or middling democracies are actually more violent than stable democracies and autocracies (Fein 1995, Regan & Henderson 2002), that regimes respond to dissent and other threats with repressive violence (Carey 2009, Davenport 1995), and that civil war is consistently associated with more violations of physical integrity (Poe et al. 1999). Combined, these factors formed a "standard model" of repression (Keith 2012). However, this model included nothing about states' obligations under international human rights law. By the turn of the century, human rights lawyers had managed to push through several human rights instruments, and scholars of social movements were high on the potential for legal activism to address several rights (Keck & Sikkink 1998). However, statistical human rights research to this point engaged sparingly with law.10 Comparative human rights studies possessed an advanced and evidence-based sense of what structures threaten a narrow set of "survival rights," but repression studies remained disconnected from the broader international legal context (Donnelly & Howard 1988).

TWO APPROACHES TO LAW AND INDICATORS
In some respects, the disconnect between comparative and legal research is to be expected. Political scientists are not lawyers. However, by 2000, there was a growing sense that political science should more deeply engage law in international and comparative politics. In that year, International Organization printed a symposium devoted to the concept of legalization in international relations. Others lamented the great scholarly expanse between international law and political science (Raustalia & Slaughter 2002). The effort to combine legal and comparative scholarship using indicators began in earnest after the initiation of the War on Terror.11 In 2002, Oona Hathaway (2002) published an article demonstrating that countries that ratify the International Covenant on Civil and Political Rights (ICCPR) are, counterintuitively, more likely to commit rights violations.12 This was followed by several compliance studies that test the effects of international law by correlating treaty ratifications with PTS and CIRI scores (Conrad 2012; Conrad & Ritter 2013; Hafner-Burton & Tsutsui 2005, 2007; Hill 2010; Lupu 2013a,b, 2015; Neumayer 2005; Powell & Staton 2009; Simmons 2009; Simmons & Danner 2010). One study claimed human rights legalization and measured rights practices are "radically decoupling" (Hafner-Burton & Tsutsui 2005). Though this study did find a positive correlation with NGO presence, others find, conditionally, that treaty ratifications are associated with declining repressive violence in countries with an active NGO sector or in countries transitioning to democracy (Powell & Staton 2009, Simmons 2009). Still others argue, on the basis of this correlational evidence, that autocratic leaders ramp up abuses in the wake of treaty ratification, or use human rights treaties to signal their lack of willingness to leave office (Hollyer & Rosendorff 2012, Vreeland 2008).

Factualist Accounts
The style of these compliance studies, in which state treaty ratification is treated as a predictor and changes in that state's average physical integrity scores are treated as an outcome, embodies what we call the factualist approach to international law and indicators. The factualist approach usually starts by deducing a causal theory relating an independent variable (causal input) to a dependent variable (social outcome). For the scholar studying human rights law using the factualist approach, the aim is normally to evaluate whether, or under what conditions, multilateral treaties, which have monitoring bodies but little direct enforcement, can improve the recognition of rights at the country-year level. Thus, the independent variable is typically a legal treaty like the ICCPR or the Convention Against Torture (CAT), and the dependent variable is the categorical level of human rights abuse from the PTS or CIRI projects.
Most factualist models make two assumptions. The first is that legal treaties and repression variables are exogenously related (or can be modeled as such), meaning that they belong to distinct social processes. Researchers test theory by studying the correlation between two data series normally retrieved from already-existing projects. Independent indicators of law are usually binary categories (1 for treaty ratification, 0 for no ratification), and indicators of human rights abuse are the PTS or CIRI categorical scales. Because these data are available across time in many countries, it is possible to estimate a value representing the average categorical physical integrity scores in all countries and years in which laws have been ratified and compare those averages to all country-years in which laws have not been ratified. A second assumption is that these data series are measured without error or bias. That is, no omitted variable causes change in these variables either directly or through the process by which they are measured. Implicit to this approach is trust that the procedures used to generate the information and measure these concepts are not systematically related to the development of human rights law or the actualization of state human rights practices.
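Under these two assumptions, the core factualist estimate reduces to a comparison of group means across country-years. The Python sketch below is schematic: the country labels and scores are fabricated toy data, not drawn from PTS or CIRI.

```python
# Schematic factualist comparison: mean physical integrity scores for
# country-years with and without treaty ratification.
# All data below are invented for illustration.

from statistics import mean

# (country, year, ratified_treaty, physical_integrity_score 0-8, higher = better)
country_years = [
    ("A", 1990, 0, 3), ("A", 1995, 1, 4),
    ("B", 1990, 0, 5), ("B", 1995, 1, 5),
    ("C", 1990, 0, 2), ("C", 1995, 0, 2),
]

ratified = [score for *_, r, score in country_years if r == 1]
not_ratified = [score for *_, r, score in country_years if r == 0]

diff = mean(ratified) - mean(not_ratified)
print(f"difference in means: {diff:+.2f}")  # prints "difference in means: +1.50"
```

In practice, published studies use regression models with controls rather than raw group means, but the logic is the same, and the comparison is only as informative as the assumption that the scores are measured without treaty-related error or bias.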
The factualist approach to human rights law uses statistical models as an analytic tool but takes for granted the process by which the data are generated. Because the approach takes measurement for granted, it has the advantage of being relatively simple to implement, and its contributions are powerful because they seem both elegant and systematic. It is difficult to argue with averages when the precision of the underlying data is presumed to be perfect. For example, Eric Posner (2014, p. 7) contends in The Twilight of Human Rights Law that "there is little evidence that human rights treaties, on the whole, have improved the well-being of people, or even resulted in respect for the rights in those treaties." As evidence, Posner shows charts indicating that increasing numbers of treaty ratifications over time are not accompanied by concomitant upward-sloping averages in CIRI scores. A similar contribution comes from Rod Abouharb, Mikhail Fillipov, and David Cingranelli. In a 2015 volume, they find with regressions that ratification of the International Covenant on Economic, Social and Cultural Rights is correlated with higher scores on the Physical Quality of Life Index. However, "ratifying more international covenants worsens ESR [economic and social rights]" because too many laws means too many demands on leaders, who experience "agency loss" (Abouharb et al. 2015, p. 43). The statistical models used in these studies provide insight into why rational leaders seemingly fail to live up to the expectations embedded in the human rights treaties ratified by their countries.
In many ways, these two studies represent the apogee of the factualist approach. Statistical work has left us with a sea of correlations, negative and positive, between ratifications, domestic institutions, and human rights indicators (Hafner-Burton 2014). But no matter how one cuts it, law's impact is on average marginal, highly conditional, and causally underwhelming. Thus, the common interpretation is that law has not contributed significantly to progress. However, this is not the whole story. It is a version of reality heavily influenced by the way indicators are created. There is little reference to or theory about the production process that originally generates human rights language.
The factualist account is narrow and process-weak. It is narrow because compliance is operationalized as changes in aggregate state practices that follow in short order after ratification. However, this definition of compliance conflates the regulatory concept of rule-following with a consequentialist concept of effectiveness.13 Outcomes may improve following treaty enactment, but this might owe nothing to the intent of leaders to follow legally mandated policy changes. Official actors who in principle endorse the use of numerical indicators contend that they should not only capture outcomes but also measure legal processes that intercede between treaties and aggregate state practices (Landman 2004, Landman & Carvalho 2009). In a 2008 report, the Office of the UN High Commissioner for Human Rights embraced the use of indicators but suggested a three-part framework for measuring "structure, process, and outcomes." Factualist studies are weak on process because most do not collect or analyze data on the laws and behaviors that connect specific treaty provisions to domestic legal practices. One reason is that these data would require painstaking collection efforts and might not lend themselves to cross-sectional time-series analysis.

The Constructivist Approach
Another reason the factualist approach is limited is that categorical indicators might be fundamentally unable to capture the true process of compliance. This position is central to the constructivist approach to law and indicators. Constructivists hold that indicators are not a neutral or objective way of measuring reality. Instead, they produce a particular version of reality. They are a form of power-knowledge that reduces dynamic legal processes, political contestation, heterogeneous local contexts, and cultural symbols to static numbers (Davis et al. 2015).14 Further, the practice of making indicators shifts legal-political matters to the field of accountancy. It therefore reinforces a rationalist "audit culture" that assigns worth only to those interventions that have a quantifiable impact (Rosga & Satterthwaite 2009). At a maximum, this is abusive because indicators are produced by the powerful to legitimize hegemony, albeit behind a veil of objectivity. The Freedom House indices are one case in point (Bradley 2015, Giannone 2010). At a minimum, law is given short shrift because its normativity, or its ability to set standards and animate influential political interactions, is almost completely dismissed by the factualist approach. As a technology, numbers simply cannot represent norms the way words and symbols can. Categorical data are therefore too crude to capture the conceptual richness of law and human rights and the relationship between them.

13 "Compliance needs to be distinguished from the concepts of implementation and effectiveness. Unlike those two concepts, compliance focuses neither on the effort to administer authoritatively public policy directives and the changes they undergo during this administrative process (implementation) nor on the efficacy of a given regulation to solve the political problem that preceded its formulation (effectiveness)" (Neyer & Wolf 2005, pp. 41-42). See also Finnemore & Toope (2001) and Howse & Teitel (2010).
Critical constructivist accounts of human rights law are not the first to lament that statistics are a language of control that does violence to variation in local context. After all, we are reminded that the word statistics itself derives from the German Statistik, which was used to describe the practice of collecting and rationalizing information for the purpose of state-building. These practices were damaging, harming nature and killing people (Scott 1998). Yet the factualist approach to data often promotes the use of statistics precisely because it allows one to see through the noise of local context (Hafner-Burton & Ron 2009). However, for constructivists, adopting this quantitative "language of distance" when considering human rights will cause one to miss diverse factors that require political judgment (Rosga & Satterthwaite 2009). The founding creators of human rights indicators understood this,15 but the message got lost as social scientists were trained to study law and politics by downloading data sets and crunching the numbers without reference to the underlying processes those data actually represent (Adcock & Collier 2001, Schrodt 2014).
In addition to loss of context, critical constructivists identify three concerns regarding the use of indicators in the study of human rights law. The first is that preference will be given to questions that lend themselves to quantifiable answers. Research on law will become data driven especially when the data are easy to collect. One may observe this phenomenon regarding ESR. The literature on human rights law contains a tremendous number of studies of physical integrity rights. Well-designed and well-implemented studies of civil and political rights are also available, though less plentiful (Davenport 1996, 2007). But studies of ESR are very few. Contrary to many accounts, this is not because human rights scholars from the West were traditionally uninterested in these rights. In fact, many scholars in the early period of research wrote thoughtfully on ESR (Donnelly & Howard 1988, Howard 1983). Others thought that valid indicators of ESR already existed, meaning that more attention should go to collecting data on other sets of rights.16 However, recent research has shown that conclusion to be exactly wrong. Legal protections like the right to health or the right to education, which are subject to "progressive realization" under international law, are tremendously difficult to measure (Kalantry et al. 2010). Doing so requires the creation of benchmarks, careful legal thinking, and complex frameworks. Because this does not translate well into cross-national and quantitative indicators (for example, percentage of primary school enrollment is a poor indicator of compliance with international law), scholars over time stopped trying to collect much data for or analyze these kinds of rights (Rosga & Satterthwaite 2009).17

A second concern for constructivists is Goodhart's law, or the "tendency of a measure to become a target" (Rosga & Satterthwaite 2009, p. 285). When human subjects are measured and held accountable on the basis of indicators, they adapt. Part of this adaptation is to improve on the measured item, without necessarily changing the structures that affect performance. If it is policy to allot funds to schools on the basis of test scores, then the schools will teach to the test to improve the scores. The indicator being used to hold schools accountable ceases to be a valid measure of overall education because it is now systematically related to test preparation instead of educational attainment. A similar process occurred in the advocacy and study of physical integrity rights. If a government knows its repressive tactics are being scrutinized, it will shift to hide its abuses. DeMars (2005) argues that the innovation of disappearances in Argentina and Chile during the 1970s was actually a response to human rights reporting. Because AI criticized these countries so heavily for torture, the regimes made their enemies disappear, rendering torture undetectable.18 If Goodhart's law holds, and we think that it does, it means that indicators themselves change practice. Some quantitative human rights researchers have long recognized this possibility, but until recently the standard approach of the factualist data user was mostly to ignore it (Barsh 1993, Goldstein 1986).
A third and related concern is indicator endogeneity. This problem refers to the possibility that the construction of indicators is itself a function of the law and that indicators change as the law changes. Clark & Sikkink (2013, p. 539) argue that "good news about increased human rights information [is] bad news for human rights measures." Because both digital media and the diffusion of transnational information networks have expanded since the origin of human rights research, there is a distinct possibility that researchers producing indicators see more evidence of abuse than they did 30 years ago. More evidence means worse scores, and worse scores over time mean deceptively little change in estimates of average human rights practices. Because international legal bodies engage in monitoring and information production (Simmons 2009), law and the legal processes that accompany it are indirectly implicated in the production of indicators.
The constructivist approach alerts us to these possibilities, but it falls short on solutions. Newer work in the critical constructivist vein tends to strike a chord of despair. Stephen Hopgood's (2013) The Endtimes of Human Rights is a powerful critique of the culture of law-giving, standardization, and imperious universality that pervades the human rights legal regime. Though he rejects the econometric method wholesale, Hopgood shares with the factualist approach the conviction that human rights law contributes little to progress. The book ends with a prediction that international criminal law will soon crumble. Another critical volume devoted to indicators ends not with a way forward but with an open-ended question: "Why would anyone want to use an indicator that put itself radically in doubt?" (Nelsen 2015, p. 333).

Moving Forward
What we are left with, then, are two approaches that are pessimistic about the prospects for law but that diverge significantly about what to do and how to evaluate the efficacy of rights-protecting institutions. The factualist approach continues to use human rights indicators to produce findings but is only beginning to confront their shortcomings. The constructivist approach seems content to persistently sound an alarm about indicators based on the following problems: (a) inattention to context, (b) ignorance of rights other than protections of physical integrity, (c) the tendency of measured subjects to adapt to measurement, and (d) the endogenous relationship between law and the production of indicators. Next, we outline a third, constitutive approach to human rights law and indicators. With this approach, we attempt to address these four problems while also demonstrating that human rights law is associated with measurable improvements in the world.

A CONSTITUTIVE APPROACH TO INFORMATION, LAW, AND INDICATORS
Law does not independently cause the production of indicators. However, law directly influences how observers and academics use information to generate, develop, and change indicators. One pathway for the direct influence of human rights law is through the reporting requirements for UN treaties (Creamer & Simmons 2015). Once states ratify a human rights treaty, they must develop new domestic institutions, or make use of existing ones, to monitor, document, and report on the implementation of the treaty's provisions (Creamer & Simmons 2015). All UN human rights treaties contain such provisions. Information gathering and reporting, that is, measurement, is therefore an important type of compliance behavior that varies between states and over time. International and domestic monitors are also able to look harder and in more places for abuse because of an increasingly dense network of civil society organizations (Greenhill 2010, Keck & Sikkink 1998, Murdie & Bhasin 2011). Sometimes these civil society organizations connect and work with the state institutions charged with collecting information about compliance with domestic and international law. What are we to make of the complex relationship between law, measurement, and shifting human rights practices?
We could follow the factualist approach and continue to use human rights indicators to produce statistical findings while downplaying the biases of the data (Hafner-Burton 2014). Or we could follow the suggestions of constructivists and, cognizant of the failings inherent in the data, reject the use of indicators entirely. Neither of these pathways offers a way forward.
Fortunately, a third way is now coming into form. The constitutive approach suggests that human rights law possesses a normative political dimension that changes the way human beings process and understand information (Dancy & Fariss 2017). The constitutive approach is built on two theoretical assumptions. The first is that law, human rights compliance, and the production of indicators are not independent processes. They are endogenous and mutually constitutive, in part because binding human rights law and human rights information gathering emerged as part of the same transnational campaigns (Brysk 1994; Keck & Sikkink 1998; Sikkink 2011, 2017). Information is never neutral, apolitical, or objectively separate from legal mobilization. This idea compels one to theorize not only about the relationship between law and human rights compliance as concepts of theoretical interest but also about the monitoring and observational processes that have developed over time to produce information and indicators (see, e.g., Kapiszewski & Taylor 2013).
In reference to the science of indicators, the constitutive approach assumes first that the context in which information is produced affects the translation of the theoretical concept into an operational protocol. Translation error is a mismatch between the theoretical concept being represented and the pieces of information chosen by the analyst for inclusion in the construction of that concept's indicator. It occurs when the operational protocol does not match the theory of the concept. If one were measuring judicial independence, it would be a translation error to include information on the popularity of Supreme Court decisions. Though popularity might make a court more effective, it does not make the court more independent, which refers instead to its judicial autonomy. Translation errors are often more subtle and difficult to detect than measurement errors for three reasons. First, the legal concepts that scholars operationalize are complex and multifaceted. Civil and political rights like freedom of expression, or ESR like the right to education, look very different in case law than they do as indicators. Second, the monitoring agencies or institutions creating the information used to construct indicators are often not the same entities that are building theoretical concepts. Researchers at AI and the US State Department are often surprised to find how extensively academics use their reports for data analysis. And third, as agencies change strategies over time, they also change the operational protocols they use for monitoring. If the analyst accepts the possibility of translation error, she will need to assess the original operational protocols used to collect existing information alongside the theoretical concept it purports to represent.
The second assumption of the constitutive approach is that, even once translation errors have been addressed, measurement error exists because both systematic processes and random chance affect the collection of the information used to create data. Measurement errors occur when the data collection procedures used to gather information and then generate indicators miss the true value of a concept. The true value of the theoretical concept is what we call the latent trait because, though we believe it exists, it is difficult if not impossible to observe objectively and precisely. This holds for both quantitative and qualitative data. For example, only one true value of yearly police killings in the United States exists. Likewise, there is only one value of police accountability for those killings. Estimating the value of police killings requires a sophisticated approach that accounts for measurement error (Ball 2016). It is difficult to ascertain the latent value of police killings with certainty because killings go unreported, data on lethal violence are not collected in a uniform way by local law enforcement, and the FBI makes mistakes when aggregating information from every locality into a database. Police accountability is an even more difficult concept to measure. Not only does it require counting events like trials of police officers, it also requires translating those events into a coherent and justifiable concept of accountability.
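The intuition behind estimating an unobservable total from incomplete records can be illustrated with a toy simulation. Work in this tradition uses multiple systems estimation; the two-list Lincoln-Petersen estimator below is only a minimal sketch of that family of techniques, and every number in it is hypothetical.

```python
import random

def lincoln_petersen(list_a, list_b):
    """Two-list capture-recapture estimate of a hidden total.

    N_hat = |A| * |B| / |A intersect B|. Valid only under strong
    independence assumptions that real documentation lists often violate.
    """
    overlap = len(set(list_a) & set(list_b))
    if overlap == 0:
        raise ValueError("no overlap between lists; estimator undefined")
    return len(list_a) * len(list_b) / overlap

# Simulate 1,000 true (hypothetical) incidents; two monitors each
# independently document roughly 40% of them.
random.seed(0)
true_incidents = range(1000)
list_a = [i for i in true_incidents if random.random() < 0.4]
list_b = [i for i in true_incidents if random.random() < 0.4]

estimate = lincoln_petersen(list_a, list_b)
# Either raw list alone badly undercounts the true total of 1,000;
# the capture-recapture estimate recovers it approximately.
print(len(list_a), len(list_b), round(estimate))
```

The point of the sketch is that naive counting of documented cases systematically understates the latent value, while a model of the documentation process itself can correct for some of that error.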
Most indicators of human rights tend to base categorization on available information about a specific rights practice. If information relevant to the operational protocol is missing, either systematically or at random, then measurement errors will be present in the resulting indicators. Measurement errors are often addressed by collecting new data with special attention paid to why information might be missing systematically or at random. Valid comparisons require the careful and tedious application of theoretically appropriate operational definitions of human rights abuses from incident to incident. The amount of work involved is often mind-boggling, especially if data must be gathered from scores of newspapers, archives, witness testimonies, or police files (Goldstein 1986). Even with new theoretically appropriate operational protocols, the collection of new data, however studious the data collector may be, will still contain at least some random error and possibly systematic error as well.
The sheer weight of concern over error inspires some scholars to counsel against systematic comparison in favor of practical judgment based on deep knowledge of cases (Zunino 2011). However, one may accept the likelihood of error and still attempt valid comparison. One pragmatic approach is to use models that can estimate and correct for translation and measurement errors using existing data. A measurement model (e.g., a latent variable model) is a statistical tool that brings together diverse sources of data (both quantitative and qualitative) and links them together in a theoretically informative way. This is accomplished within the model either (a) by assessing translation errors between the theoretical concept and the operational definition or (b) by addressing measurement errors that occur when the operational protocol is imperfectly employed to generate or gather the data. In both cases, these discrepancies can be adjusted for within the structure of the model and then evaluated empirically.
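The core logic of a measurement model, pooling several noisy indicators of one latent trait, can be sketched in a few lines. The simulation below is a stand-in for a fitted latent variable model: the noise levels of the three hypothetical indicators are assumed known rather than estimated, which real models would infer from the data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # hypothetical country-year units

# Latent trait (e.g., respect for physical integrity) -- unobserved.
latent = rng.normal(0.0, 1.0, n)

# Three observed indicators = latent signal + source-specific noise.
# The noise scales are assumptions standing in for different monitors.
noise_sd = [0.5, 1.0, 2.0]
indicators = [latent + rng.normal(0.0, sd, n) for sd in noise_sd]

# Precision-weighted average: a minimal stand-in for the pooling a
# latent variable model performs (weights here are known, not fitted).
weights = np.array([1.0 / sd**2 for sd in noise_sd])
combined = sum(w * x for w, x in zip(weights, indicators)) / weights.sum()

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# The combined score tracks the latent trait more closely than the
# noisier individual indicators do.
print(corr(combined, latent), corr(indicators[1], latent))
```

The design choice this illustrates is that pooling does not require the sources to agree; it requires a theory of how each source relates to the latent trait, which is exactly what the constitutive approach asks the analyst to supply.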
One example of a productive measurement model is the Human Rights Protection Scores Project (Fariss 2014). Using a latent variable analysis, Fariss addresses translation error, arguing that the operational definition used to generate human rights data from human rights reports has changed over time. Despite this change, existing data sets like PTS and CIRI treat all cases as though they are directly comparable. Though the operational procedures of the CIRI and PTS projects reliably capture information from the human rights documents, these projects do not reflect the fact that the source material itself evolves. For one, human rights monitors have widened the scope and depth of their search for abuses worldwide. The result has been a more extensive catalog of abuses over time.
The latent variable model links observed indicators derived from the content of the human rights country reports to the underlying latent trait, repressive violence. It treats each categorical indicator as an observed manifestation that is caused by the underlying theoretical human rights process. The observed categorical indicators taken from the reports are manifestations of a complex data-generation process that begins with the human rights abuses themselves, then follows with the observation and collection of allegations about those abuses, the organization of those allegations into a structured narrative account contained within the country reports, and finally the coding and categorization of that content. With sufficient information about this process, the latent variable model provides estimates of the relative level of human rights respect for each country-year unit in the sample, based on the categorical values of the reports and theoretical knowledge about the rest of the underlying process (see sidebar titled The Latent Variable Model). Fariss's (2014) scores take into account the argument that the reports-based indicators are direct manifestations of the available information contained in AI and State Department human rights country reports, which are produced with an increasing standard of accountability. This research, which goes to great pains to address translation and measurement errors, finds that progress has occurred in specific issue areas like physical integrity rights, and likely some other areas as well (Figure 1). When the standard of accountability is modeled, aggregate data on the protection of human rights are clearly trending upward. Moreover, improvements in physical

THE LATENT VARIABLE MODEL
Mechanically, the latent variable model simply places each of the country-year units relative to one another along a single interval-level dimension, with a score of 0 meaning that the units are average relative to one another and unit standard deviations above or below this value meaning that the units are more or less distinct from the average. The model produces a range of values from approximately −3 to 3. Substantively, these scores or relative placements of units correspond to the values of the original manifest variables, so a country-year unit placed at the average value of 0, depending on the year, is probabilistically likely to commit acts of ill treatment and torture and to arbitrarily imprison individuals. However, this hypothetical average country is not probabilistically likely to commit acts of extrajudicial killing, though evidence of such acts is sometimes contained within the country-year reports. Country-year units that score close to −3 are being categorized as the worst abusers on nearly all of the human rights indicators contained in the model. Country-year units that score close to 3 are being categorized as committing few if any human rights abuses on the available indicators. For a more detailed introduction to this modeling process, see Fariss (2017).

Figure 1  Yearly mean and credible intervals for latent physical integrity estimates from the changing standard latent variable model and the constant standard latent variable model (based on updated data from Fariss 2014). Only the latent variable estimates that assume a changing standard of accountability show improvement for either type of country-year.
integrity over time are correlated with the increasing density of international laws being ratified and implemented by states (Fariss 2017).
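The changing-standard-of-accountability argument can be illustrated with a toy simulation (all numbers are hypothetical): if coders' thresholds tighten at roughly the same rate that true practices improve, the coded series looks flat even though the latent trait is rising.

```python
import numpy as np

rng = np.random.default_rng(7)
years = np.arange(1980, 2011)

# Assume the true latent respect for rights improves steadily.
true_latent = np.linspace(-0.5, 0.5, len(years))

# The monitors' standard of accountability tightens over time:
# the cutoff for a "good" score rises at roughly the same rate.
threshold = np.linspace(-0.5, 0.5, len(years))

n_countries = 200
coded_good = []
for mu, cut in zip(true_latent, threshold):
    latent = rng.normal(mu, 1.0, n_countries)
    # Binary coding against the year-specific (changing) standard.
    coded_good.append(float(np.mean(latent > cut)))

# Naively read, the coded series shows no improvement even though
# the latent trait rose by a full standard deviation.
first_decade = np.mean(coded_good[:10])
last_decade = np.mean(coded_good[-10:])
print(first_decade, last_decade)
```

A model that represents the year-specific threshold explicitly, as the changing standard model does, can separate the movement of the standard from the movement of the underlying practice.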
Other measurement projects with theoretical insights about the way information is produced provide a counterpoint to both the factualist and constructivist approaches (Crabtree & Fariss 2015). Data projects adopting a constitutive approach to understanding the relationship between new and existing sets of indicators and theoretical concepts are in development. New human rights indicators were recently published as part of the Varieties of Democracy (V-DEM) Project (Coppedge et al. 2014). The expert-coded data include several hundred new indicators of human rights, democracy, the rule of law, and other institutional features from 1900 to 2015. The measurement models used to construct these latent variables rely on multiple country expert coders who answer categorical questions for each country-year unit. This project includes many new human rights variables, two of which are physical integrity variables: (a) freedom from political killing and (b) freedom from torture. Like those of Fariss (2014), the V-DEM data show a stark improvement over time (Figure 2). Though the V-DEM measurement model attempts to address disagreement between coders, bias might still persist if the expert coders use the same historical source material or share common biases. Yet these pitfalls actually point to the value of this measurement approach, because exploring these potential biases, and how they relate to the biases in other human rights data, is an important area of future research.
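The basic task of expert-coded projects, turning several coders' categorical answers into one score per country-year while tracking disagreement, can be sketched simply. Real projects like V-DEM fit ordinal item-response models that also estimate each coder's reliability and thresholds; the median-and-spread pair below is only a minimal stand-in, and all names and codes are hypothetical.

```python
import statistics

# Hypothetical ordinal codes (0 = systematic abuse ... 4 = free of
# abuse) from five experts for three fictional country-years.
expert_codes = {
    ("Atlantis", 2010): [3, 3, 4, 3, 2],
    ("Atlantis", 2011): [4, 4, 4, 3, 4],
    ("Borduria", 2010): [0, 1, 0, 2, 1],
}

def aggregate(codes):
    """Point estimate plus a crude coder-disagreement measure."""
    return statistics.median(codes), statistics.stdev(codes)

for unit, codes in expert_codes.items():
    point, disagreement = aggregate(codes)
    print(unit, point, round(disagreement, 2))
```

Retaining the disagreement term rather than discarding it is what allows downstream users to treat the score as an estimate with uncertainty instead of a fact.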
Measurement models are now being deployed to create unified measures of other concepts like judicial efficacy (Figure 3) and women's rights protections (D.W. Hill Jr. & J. Inglett, unpublished manuscript; Linzer & Staton 2015). Thus far, these projects are capable of generating a fuller account of historical trends based on information captured by many data sources.

Figure 2  The yearly average for the two expert-coded Varieties of Democracy (V-DEM) physical integrity variables, (a) state-sanctioned killing and (b) state-sanctioned torture, from 1949 to 2013 (Coppedge et al. 2014; Pemstein et al. 2015), which is the same time period available for the most recent update of the latent human rights variables. What should be clear from this visualization is a very similar upward trend in respect for human rights after the end of the Cold War. These positive trends coincide with the pattern of increasing respect for human rights found with the new data from Fariss (2014).

In this sense,
measurement modelers are moving beyond the presentism of many singular studies, which tend to capture only temporally limited amounts of information with a great deal of bias and error. The new measurement modelers are thus escaping the narrow islands of empirical analysis produced by factualist accounts. But they are also moving beyond the constructivist discontents, who see a complex world composed of human rights violations with ever-denser legal regulations, made

Figure 3  Mean judicial independence (in standard deviations; higher values indicate more independence). The yearly average for Linzer & Staton's (2015) judicial independence data, 1949-2012. This positive trend coincides with the pattern of increasing respect for human rights found with the new data from Fariss (2014).
even more complex by the hard-to-manage ecology of indicators diffusing in fits and starts across a chaotic global information network. Proponents of the constitutive model take this complexity seriously, and they are beginning to cut through it.

CONCLUSION
Fifty years ago, the world had very few human rights laws, very little information on human rights violations, and very few indicators of those violations. Today, the situation could not be more different. We are awash in all three. The factualist response has been to measure what one can, however imperfectly, and to establish as many correlations between variables as possible. The constructivist response is to remind us that these processes are not independent and that our efforts as a science of law and human rights are fallible. The constitutive approach accepts the constructivist critique and offers a way forward. The constitutive approach to data analysis supports macrohistorical work that finds evidence of legal-normative progress (Sikkink 2011), lessening cruelty (Pinker 2011), and declining war over time (Goldstein 2012). However, its great promise is that it also provides for microlevel comparative analyses of singular cases of social decay amid this larger backdrop of positive change. This approach can thus put to rest long-standing arguments over the relative utility of indicators, while also enabling a more concrete science of law and human rights (Schnakenberg & Fariss 2014).
But this is only the tip of the iceberg. Much work remains for adherents to the constitutive approach. Measurement theorists should work with treaty bodies to establish more reliable indicators of ESR, such as the right to education and freedom from racial discrimination. Furthermore, although macrolevel relationships between expanding international legal regulations and improved practices are becoming evident, more research must be devoted to the mechanisms linking those processes together. For example, one research program demonstrates that state commitment to international rights law is correlated with greater activism in those states (Simmons 2009), as well as with greater efforts at domestic litigation and transitional justice (Dancy & Michel 2016, Dancy & Montal 2017, Dancy & Sikkink 2012). There is a good chance that these trends toward greater legalization (Figure 4) are associated with improvements in human rights protections over time (Dancy & Fariss 2017). However, the same measurement techniques currently being applied to indicators of human rights conditions need to be applied to these mechanisms of remedy. Studies of what is to be done must keep pace with studies of what is wrong. Only if scholars move toward more comprehensive and theoretically sound indicators of mechanisms, in addition to indicators of causes and outcomes, can a science of law and human rights be advanced.

Figure 4  (a) Yearly counts of the number of domestic criminal prosecutions for human rights abuses. (b) Yearly number of country ratifications of UN human rights treaties (1970-2010). These positive trends coincide with the new view of increasing respect for human rights found with the new data from Fariss (2014) (data taken from Dancy et al. 2014).