Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable

Felix Elwert; Christopher Winship

doi:10.1146/annurev-soc-071913-043455

Felix Elwert¹ and Christopher Winship²

Affiliations:
¹Department of Sociology, University of Wisconsin, Madison, Wisconsin 53706; email: [email protected]
²Department of Sociology, Harvard University, Cambridge, Massachusetts 02138; email: [email protected]
Vol. 40:31-53 (Volume publication date July 2014) https://doi.org/10.1146/annurev-soc-071913-043455
First published as a Review in Advance on June 02, 2014
© Annual Reviews

Abstract

Endogenous selection bias is a central problem for causal inference. Recognizing the problem, however, can be difficult in practice. This article introduces a purely graphical way of characterizing endogenous selection bias and of understanding its consequences (Hernán et al. 2004). We use causal graphs (directed acyclic graphs, or DAGs) to highlight that endogenous selection bias stems from conditioning (e.g., controlling, stratifying, or selecting) on a so-called collider variable, i.e., a variable that is itself caused by two other variables, one that is (or is associated with) the treatment and another that is (or is associated with) the outcome. Endogenous selection bias can result from direct conditioning on the outcome variable, a post-outcome variable, a post-treatment variable, and even a pre-treatment variable. We highlight the difference between endogenous selection bias, common-cause confounding, and overcontrol bias and discuss numerous examples from social stratification, cultural sociology, social network analysis, political sociology, social demography, and the sociology of education.

Keywords: causality, confounding, directed acyclic graphs, identification, selection
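
The collider mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the linear-Gaussian data-generating process, the variable names (`t` for treatment, `y` for outcome, `c` for the collider), and the selection threshold of 1.0 are all assumptions chosen for illustration. Treatment and outcome are generated independently, so any association that appears after selecting on the collider is induced purely by the conditioning.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=20000, threshold=1.0, seed=0):
    """Return (correlation in full sample, correlation in selected sample).

    Hypothetical setup: t and y are independent standard normals, and the
    collider c is caused by both (c = t + y + noise).
    """
    rng = random.Random(seed)
    t = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # independent of t: no causal effect
    c = [ti + yi + rng.gauss(0, 1) for ti, yi in zip(t, y)]

    # Conditioning on the collider: keep only units with c above the threshold.
    selected = [(ti, yi) for ti, yi, ci in zip(t, y, c) if ci > threshold]
    t_sel = [pair[0] for pair in selected]
    y_sel = [pair[1] for pair in selected]

    return pearson(t, y), pearson(t_sel, y_sel)


r_full, r_sel = simulate()
print(f"full sample correlation:     {r_full:+.3f}")
print(f"selected sample correlation: {r_sel:+.3f}")
```

Under these assumptions the full-sample correlation should be close to zero, while the selected-sample correlation should be clearly negative even though treatment has no effect on the outcome. The same pattern would be expected if `c` were entered as a control variable rather than used as a selection rule.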


Literature Cited

Alderman H, Behrman J, Kohler H, Maluccio JA, Watkins SC. 2001. Attrition in longitudinal household survey data: some tests for three developing-country samples. Demogr. Res. 5:79–124
Allen MP, Lincoln A. 2004. Critical discourse and the cultural consecration of American films. Soc. Forces 82(3):871–94
Alwin DH, Hauser RM. 1975. The decomposition of effects in path analysis. Am. Sociol. Rev. 40:37–47
Amin V. 2011. Returns to education: evidence from UK twins: comment. Am. Econ. Rev. 101(4):1629–35
Angrist JD, Imbens GW, Rubin DB. 1996. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 8:328–36
Angrist JD, Krueger AB. 1999. Empirical strategies in labor economics. In Handbook of Labor Economics, Vol. 3, ed. O Ashenfelter, D Card, pp. 1277–366. Amsterdam: Elsevier
Bareinboim E, Pearl J. 2012. Controlling selection bias in causal inference. UCLA Cogn. Syst. Lab., Tech. Rep. R-381. In Proc. 15th Int. Conf. Artif. Intell. Stat. (AISTATS), April 21–23, 2012, La Palma, Canary Islands, ed. N Lawrence, M Girolami, 22:100–8. Brookline, MA: Microtome. http://ftp.cs.ucla.edu/pub/stat_ser/r381.pdf
Baron RM, Kenny DA. 1986. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Personal. Soc. Psychol. 51:1173–82
Behr A, Bellgardt E, Rendtel U. 2005. Extent and determinants of panel attrition in the European Community Household Panel. Eur. Sociol. Rev. 21(5):489–512
Berk RA. 1983. An introduction to sample selection bias in sociological data. Am. Sociol. Rev. 48(3):386–98
Berkson J. 1946. Limitations of the application of fourfold tables to hospital data. Biometr. Bull. 2(3):47–53
Blalock H. 1964. Causal Inferences in Nonexperimental Research Chapel Hill: Univ. N.C. Press
Bollen KA. 1989. Structural Equations with Latent Variables New York: Wiley
Bollen KA, Pearl J. 2013. Eight myths about causality and structural equation models. In Handbook of Causal Analysis for Social Research, ed. SL Morgan, pp. 301–28. Dordrecht, Neth.: Springer
Christofides LN, Li Q, Liu Z, Min I. 2003. Recent two-stage sample selection procedures with an application to the gender wage gap. J. Bus. Econ. Stat. 21:396–405
Cole SR, Hernán MA. 2002. Fallibility in the estimation of direct effects. Int. J. Epidemiol. 31:163–65
Coleman JS, Hoffer T, Kilgore S. 1982. High School Achievement: Public, Catholic, and Private Schools Compared New York: Basic Books
Duncan OD. 1966. Path analysis: sociological examples. Am. J. Sociol. 72(1):1–16
Elwert F. 2013. Graphical causal models. In Handbook of Causal Analysis for Social Research, ed. SL Morgan, pp. 245–73. Dordrecht, Neth.: Springer
Elwert F, Christakis NA. 2008. Wives and ex-wives: a new test for homogamy bias in the widowhood effect. Demography 45(4):851–73
Farr W. 1858. Influence of marriage on the mortality of the French people. In Transaction National Association Promotion Social Science, ed. GW Hastings, pp. 504–13. London: John W. Park & Son
Finn JD, Gerber SB, Boyd-Zaharias J. 2005. Small classes in the early grades, academic achievement, and graduating from high school. J. Educ. Psychol. 97(2):214–23
Frangakis CE, Rubin DB. 2002. Principal stratification in causal inference. Biometrics 58:21–29
Fu V, Winship C, Mare R. 2004. Sample selection bias models. In Handbook of Data Analysis, ed. M Hardy, A Bryman, pp. 409–30. London: Sage
Gangl M, Ziefle A. 2009. Motherhood, labor force behavior, and women's careers: an empirical assessment of the wage penalty for motherhood in Britain, Germany, and the United States. Demography 46(2):341–69
Glymour MM, Greenland S. 2008. Causal diagrams. In Modern Epidemiology, ed. KJ Rothman, S Greenland, T Lash, pp. 183–209. Philadelphia: Lippincott. 3rd ed.
Grasdal A. 2001. The performance of sample selection estimators to control for attrition bias. Health Econ. 10(5):385–98
Greenland S. 2003. Quantifying biases in causal models: classical confounding versus collider-stratification bias. Epidemiology 14:300–6
Greenland S, Pearl J, Robins JM. 1999a. Causal diagrams for epidemiologic research. Epidemiology 10:37–48
Greenland S, Robins JM. 1986. Identifiability, exchangeability and epidemiological confounding. Int. J. Epidemiol. 15:413–19
Greenland S, Robins JM, Pearl J. 1999b. Confounding and collapsibility in causal inference. Stat. Sci. 14:29–46
Griliches Z, Mason WM. 1972. Education, income, and ability. J. Polit. Econ. 80(3):S74–103
Gronau R. 1974. Wage comparisons: a selectivity bias. J. Polit. Econ. 82:1119–44
Gullickson A. 2006. Education and black-white interracial marriage. Demography 43(4):673–89
Hausman JA, Wise DA. 1977. Social experimentation, truncated distributions and efficient estimation. Econometrica 45:919–38
Hausman JA, Wise DA. 1981. Stratification on endogenous variables and estimation. In The Econometrics of Discrete Data, ed. C Manski, D McFadden, pp. 365–91. Cambridge, MA: MIT Press
Heckman JJ. 1974. Shadow prices, market wages and labor supply. Econometrica 42(4):679–94
Heckman JJ. 1976. The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Ann. Econ. Soc. Meas. 5:475–92
Heckman JJ. 1979. Selection bias as a specification error. Econometrica 47:153–61
Hernán MA, Hernández-Diaz S, Robins JM. 2004. A structural approach to selection bias. Epidemiology 15:615–25
Hernán MA, Hernández-Diaz S, Werler MM, Robins JM, Mitchell AA. 2002. Causal knowledge as a prerequisite of confounding evaluation: an application to birth defects epidemiology. Am. J. Epidemiol. 155(2):176–84
Hill DH. 1997. Adjusting for attrition in event-history analysis. Sociol. Methodol. 27:393–416
Holland PW. 1986. Statistics and causal inference (with discussion). J. Am. Stat. Assoc. 81:945–70
Holland PW. 1988. Causal inference, path analysis, and recursive structural equation models. Sociol. Methodol. 18:449–84
Hudson JI, Javaras KN, Laird NM, VanderWeele TJ, Pope HG, et al. 2008. A structural approach to the familial coaggregation of disorders. Epidemiology 19:431–39
Imai K, Keele L, Yamamoto T. 2010. Identification, inference, and sensitivity analysis for causal mediation effects. Stat. Sci. 25(1):51–71
Kaufman S, Kaufman J, MacLehose R. 2009. Analytic bounds on causal risk differences in directed acyclic graphs involving three observed binary variables. J. Stat. Plan. Inference 139:3473–87
Kim JH, Pearl J. 1983. A computational model for combined causal and diagnostic reasoning in inference systems. In Proc. 8th Int. Jt. Conf. Artif. Intell. (IJCAI-83), Karlsruhe, FRG, Aug. 8–12, 1983, ed. A Bundy, pp. 190–93. San Francisco: Morgan Kaufmann
Leigh A, Ryan C. 2008. Estimating returns to education using different natural experiment techniques. Econ. Educ. Rev. 27(2):149–60
Lin I, Schaeffer NC, Seltzer JA. 1999. Causes and effects of nonparticipation in a child support survey. J. Off. Stat. 15(2):143–66
Manski C. 2003. Partial Identification of Probability Distributions New York: Springer
Morgan SL, Winship C. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research New York: Cambridge Univ. Press
O'Malley AJ, Elwert F, Rosenquist JN, Zaslavsky AM, Christakis NA. 2014. Estimating peer effects in longitudinal dyadic data using instrumental variables. Biometrics. In press. doi:10.1111/biom.12172
Pagan A, Ullah A. 1997. Nonparametric Econometrics Cambridge, UK: Cambridge Univ. Press
Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems San Mateo, CA: Morgan Kaufmann
Pearl J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669–710
Pearl J. 1998. Graphs, causality, and structural equation models. Sociol. Methods Res. 27(2):226–84
Pearl J. 2001. Direct and indirect effects. In Proc. 17th Conf. Uncertain. Artif. Intell., Aug. 2–5, 2001, Seattle, WA, ed. J Breese, D Koller, pp. 411–20. San Francisco: Morgan Kaufmann
Pearl J. 2009. Causality: Models, Reasoning, and Inference New York: Cambridge Univ. Press, 2nd ed.
Pearl J. 2010. The foundations of causal inference. Sociol. Methodol. 40:75–149
Pearl J. 2012. The causal mediation formula: a guide to the assessment of pathways and mechanisms. Prev. Sci. 13(4):426–36
Pearl J, Robins JM. 1995. Probabilistic evaluation of sequential plans from causal models with hidden variables. In Uncertainty in Artificial Intelligence 11, ed. P Besnard, S Hanks, pp. 444–53. San Francisco: Morgan Kaufmann
Raymo JM, Iwasawa M. 2005. Marriage market mismatches in Japan: an alternative view of the relationship between women's education and marriage. Am. Sociol. Rev. 70(5):801–22
Robins JM. 1986. A new approach to causal inference in mortality studies with a sustained exposure period: application to the health worker survivor effect. Math. Model. 7:1393–512
Robins JM. 1989. The control of confounding by intermediate variables. Stat. Med. 8:679–701
Robins JM. 1994. Correcting for non-compliance in randomized trials using structural nested mean models. Commun. Stat.-Theory Methods 23:2379–412
Robins JM. 1999. Association, causation, and marginal structural models. Synthese 121:151–79
Robins JM. 2001. Data, design, and background knowledge in etiologic inference. Epidemiology 23(3):313–20
Robins JM. 2003. Semantics of causal DAG models and the identification of direct and indirect effects. In Highly Structured Stochastic Systems, ed. P Green, NL Hjort, S Richardson, pp. 70–81. New York: Oxford Univ. Press
Robins JM, Greenland S. 1992. Identifiability and exchangeability for direct and indirect effects. Epidemiology 3:143–55
Robins JM, Wasserman L. 1999. On the impossibility of inferring causation from association without background knowledge. In Computation, Causation, and Discovery, ed. CN Glymour, GG Cooper, pp. 305–21. Cambridge, MA: AAAI/MIT Press
Rosenbaum PR. 1984. The consequences of adjustment for a concomitant variable that has been affected by the treatment. J. R. Stat. Soc. Ser. A 147(5):656–66
Rothman KJ, Greenland S, Lash TL. 2008. Case-control studies. In Modern Epidemiology, ed. KJ Rothman, S Greenland, TL Lash, pp. 111–27. Philadelphia, PA: Lippincott. 3rd ed.
Rubin DB. 1974. Estimating causal effects of treatments in randomized and non-randomized studies. J. Educ. Psychol. 66:688–701
Schmutz V. 2005. Retrospective cultural consecration in popular music. Am. Behav. Sci. 48(11):1510–23
Schmutz V, Faupel A. 2010. Gender and cultural consecration in popular music. Soc. Forces 89(2):685–708
Shalizi CR, Thomas AC. 2011. Homophily and contagion are generically confounded in observational social network studies. Sociol. Methods Res. 40:211–39
Sharkey P, Elwert F. 2011. The legacy of disadvantage: multigenerational neighborhood effects on cognitive ability. Am. J. Sociol. 116(6):1934–81
Shpitser I, VanderWeele TJ. 2011. A complete graphical criterion for the adjustment formula in mediation analysis. Int. J. Biostat. 7:16
Smith HL. 1990. Specification problems in experimental and nonexperimental social research. Sociol. Methodol. 20:59–91
Sobel ME. 2008. Identification of causal parameters in randomized studies with mediating variables. J. Educ. Behav. Stat. 33(2):230–51
Spirtes P, Glymour CN, Scheines R. 2000. Causation, Prediction, and Search New York: Springer, 2nd ed.
Steiner PM, Cook TD, Shadish WR, Clark MH. 2010. The importance of covariate selection in controlling for selection bias in observational studies. Psychol. Methods 15(3):250–67
Stolzenberg RM, Relles DA. 1997. Tools for intuition about sample selection bias and its correction. Am. Sociol. Rev. 62(3):494–507
VanderWeele TJ. 2008a. Simple relations between principal stratification and direct and indirect effects. Stat. Probab. Lett. 78:2957–62
VanderWeele TJ. 2008b. The sign of the bias of unmeasured confounding. Biometrics 64:702–6
VanderWeele TJ. 2009a. Marginal structural models for the estimation of direct and indirect effects. Epidemiology 20:18–26
VanderWeele TJ. 2009b. Mediation and mechanism. Eur. J. Epidemiol. 24:217–24
VanderWeele TJ. 2010. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology 21:540–51
VanderWeele TJ. 2011a. Causal mediation analysis with survival data. Epidemiology 22:582–85
VanderWeele TJ. 2011b. Sensitivity analysis for contagion effects in social networks. Sociol. Methods Res. 40:240–55
VanderWeele TJ, An W. 2013. Social networks and causal inference. In Handbook of Causal Analysis for Social Research, ed. SL Morgan, pp. 353–74. Dordrecht, Neth.: Springer
VanderWeele TJ, Hernán MA, Robins JM. 2008. Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 19:720–28
VanderWeele TJ, Robins JM. 2007a. Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. Am. J. Epidemiol. 166(9):1096–104
VanderWeele TJ, Robins JM. 2007b. Four types of effect modification: a classification based on directed acyclic graphs. Epidemiology 18(5):561–68
VanderWeele TJ, Robins JM. 2009a. Minimal sufficient causation and directed acyclic graphs. Ann. Stat. 37:1437–65
VanderWeele TJ, Robins JM. 2009b. Properties of monotonic effects on directed acyclic graphs. J. Mach. Learn. Res. 10:699–718
VanderWeele TJ, Shpitser I. 2011. A new criterion for confounder selection. Biometrics 67:1406–13
Vella F. 1998. Estimating models with sample selection bias: a survey. J. Hum. Resour. 33:127–69
Ver Steeg G, Galstyan A. 2011. A sequence of relaxations constraining hidden variable models. In Proc. 27th Conf. Uncertain. Artif. Intell. (UAI2011), July 14–17, 2011, Barcelona, Spain, ed. FG Cozman, A Pfeffer, pp. 717–26. Corvallis, OR: AUAI Press
Verma T, Pearl J. 1988. Causal networks: semantics and expressiveness. In Proc. 4th Workshop Uncertain. Artif. Intell., pp. 352–59. Minneapolis, MN/Mountain View, CA: AUAI Press
Weinberg CR. 1993. Towards a clearer definition of confounding. Am. J. Epidemiol. 137:1–8
Winship C, Korenman S. 1997. Does staying in school make you smarter? The effect of education on IQ in The Bell Curve. In Intelligence, Genes, and Success: Scientists Respond to The Bell Curve, ed. B Devlin, SE Fienberg, DP Resnick, K Roeder, pp. 215–34. New York: Springer
Winship C, Mare RD. 1992. Models for sample selection bias. Annu. Rev. Sociol. 18:327–50
Wodtke G, Harding D, Elwert F. 2011. Neighborhood effects in temporal perspective: the impact of long-term exposure to concentrated disadvantage on high school graduation. Am. Sociol. Rev. 76:713–36
Wooldridge J. 2002. Econometric Analysis of Cross Section and Panel Data Cambridge, MA: MIT Press
Wooldridge J. 2005. Violating ignorability of treatment by controlling for too many factors. Econ. Theory 21:1026–28
Wright S. 1934. The method of path coefficients. Ann. Math. Stat. 5(3):161–215

Annual Review of Sociology 40, 31 (2014); https://doi.org/10.1146/annurev-soc-071913-043455


Article Type: Review Article
