1932

Abstract

The increasingly tight coupling between humans and system operations in domains ranging from intelligent infrastructure to e-commerce has led to a challenging new class of problems founded on a well-established area of research: incentive design. There is a clear need for a new tool kit for designing mechanisms that help coordinate self-interested parties while avoiding unexpected outcomes in the face of information asymmetries, exogenous uncertainties from dynamic environments, and resource constraints. This article provides a perspective on the current state of the art in incentive design from three core communities—economics, control theory, and machine learning—and highlights interesting avenues for future research at the interface of these domains.

Loading

Article metrics loading...

/content/journals/10.1146/annurev-control-053018-023634
2019-05-03
2024-04-16
Loading full text...

Full text loading...

/deliver/fulltext/control/2/1/annurev-control-053018-023634.html?itemId=/content/journals/10.1146/annurev-control-053018-023634&mimeType=html&fmt=ahah

Literature Cited

  1. 1.  Campaigne C, Balandat M, Ratliff LJ 2016. Welfare effects of dynamic electricity pricing Work. Pap., Univ. Calif., Berkeley
  2. 2.  Ge Y, Knittel CR, MacKenzie D, Zoepf S 2016. Racial and gender discrimination in transportation network companies NBER Work. Pap. 22776
  3. 3.  Yuan C, Thai J, Bayen AM 2016. ZUbers against ZLyfts apocalypse: an analysis framework for DoS attacks on mobility-as-a-service systems. 2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS) New York: IEEE https://doi.org/10.1109/ICCPS.2016.7479132
    [Crossref] [Google Scholar]
  4. 4.  Westenbroek T, Dong R, Ratliff LJ, Sastry SS 2017. Statistical estimation in competitive settings with strategic data sources. 2017 IEEE 56th Annual Conference on Decision and Control4994–99 New York: IEEE
    [Google Scholar]
  5. 5.  Ho CJ, Slivkins A, Vaughan JW 2016. Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems. J. Artif. Intell. Res. 55:31759
    [Google Scholar]
  6. 6.  Singla A, Krause A 2013. Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. Proceedings of the 22nd International Conference on World Wide Web116778 New York: ACM
    [Google Scholar]
  7. 7.  Cai Y, Daskalakis C, Papadimitriou CH 2015. Optimum statistical estimation with strategic data sources. Proceedings of the 28th Conference on Learning Theory P Grünwald, E Hazan, S Kale28096 Proc. Mach. Learn. Res. 40 N.p.: PMLR
    [Google Scholar]
  8. 8.  Goodfellow IJ, Shlens J, Szegedy C 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572 [stat.ML]
  9. 9.  Laffont JJ, Martimort D 2002. The Theory of Incentives: The Principal-Agent Model Princeton, NJ: Princeton Univ. Press
  10. 10.  Bolton P, Dewatripont M 2005. Contract Theory Cambridge, MA: MIT Press
  11. 11.  Weber T 2011. Optimal Control Theory with Applications in Economics Cambridge, MA: MIT Press
  12. 12.  Antmann P 2009. Reducing technical and non-technical losses in the power sector Rep. 92639, World Bank, Washington, DC
  13. 13.  Amin S, Schwartz G, Cardenas A, Sastry S 2015. Game-theoretic models of electricity theft detection in smart utility networks. IEEE Control Syst. Mag. 35:16681
    [Google Scholar]
  14. 14.  Akerlof GA 1970. The market for “lemons”: quality uncertainty and the market mechanism. Q. J. Econ. 84:488500
    [Google Scholar]
  15. 15.  Mussa M, Rosen S 1978. Monopoly and product quality. J. Econ. Theory 18:30117
    [Google Scholar]
  16. 16.  Maskin E, Riley J 1984. Monopoly with incomplete information. RAND J. Econ. 15:17196
    [Google Scholar]
  17. 17.  Rothschild M, Stiglitz J 1976. Equilibrium in competitive insurance markets: an essay on the economics of imperfect information. Q. J. Econ. 90:62949
    [Google Scholar]
  18. 18.  Spence A 1974. Market Signaling: Informational Transfer in Hiring and Related Screening Processes Cambridge, MA: Harvard Univ. Press
  19. 19.  Alchian AA, Demsetz H 1972. Production, information costs, and economic organization. Am. Econ. Rev. 62:77795
    [Google Scholar]
  20. 20.  Hölmstrom B 1979. Moral hazard and observability. Bell J. Econ. 10:7491
    [Google Scholar]
  21. 21.  Dirk B, Juuso V 2010. The dynamic pivot mechanism. Econometrica 78:77189
    [Google Scholar]
  22. 22.  Susan A, Ilya S 2013. An efficient dynamic mechanism. Econometrica 81:246385
    [Google Scholar]
  23. 23.  Courty P, Hao L 2000. Sequential screening. Rev. Econ. Stud. 67:697717
    [Google Scholar]
  24. 24.  Battaglini M 2005. Long-term contracting with Markovian consumers. Am. Econ. Rev. 95:63758
    [Google Scholar]
  25. 25.  Esö P, Szentes B 2007. Optimal information disclosure in auctions and the handicap auction. Rev. Econ. Stud. 74:70531
    [Google Scholar]
  26. 26.  Board S 2007. Selling options. J. Econ. Theory 136:32440
    [Google Scholar]
  27. 27.  Kakade SM, Lobel I, Nazerzadeh H 2013. Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res. 61:83754
    [Google Scholar]
  28. 28.  Alessandro P, Ilya S, Juuso T 2014. Dynamic mechanism design: a Myersonian approach. Econometrica 82:60153
    [Google Scholar]
  29. 29.  Borenstein S 2005. The long-run efficiency of real-time electricity pricing. Energy J. 26:93116
    [Google Scholar]
  30. 30.  Jónsson T, Pinson P, Madsen H 2010. On the market impact of wind energy forecasts. Energy Econ. 32:31320
    [Google Scholar]
  31. 31.  Başar T, Olsder GJ 1995. Dynamic Noncooperative Game Theory Philadelphia: Soc. Ind. Appl. Math.
  32. 32.  Ho YCH, Luh PB, Olsder GJ 1980. A control-theoretic view on incentives. 1980 19th IEEE Conference on Decision and Control Including the Symposium on Adaptive Processes116070 New York: IEEE
    [Google Scholar]
  33. 33.  Groot N, Schutter BD, Hellendoorn H 2012. Reverse Stackelberg games, part I: basic framework. 2012 IEEE International Conference on Control Applications42126 New York: IEEE
    [Google Scholar]
  34. 34.  Zheng YP, Başar T, Cruz JB 1984. Stackelberg strategies and incentives in multiperson deterministic decision problems. IEEE Trans. Syst. Man Cybernet. SMC-14:1024
    [Google Scholar]
  35. 35.  Ho YC, Luh P, Muralidharan R 1981. Information structure, Stackelberg games, and incentive controllability. IEEE Trans. Autom. Control 26:45460
    [Google Scholar]
  36. 36.  Zheng YP, Başar T 1982. Existence and derivation of optimal affine incentive schemes for Stackelberg games with partial information: a geometric approach. Int. J. Control 35:9971011
    [Google Scholar]
  37. 37.  Ratliff LJ, Coogan S, Calderone D, Sastry SS 2012. Pricing in linear-quadratic dynamic games. 2012 50th Annual Allerton Conference on Communication, Control, and Computing1798805 New York: IEEE
    [Google Scholar]
  38. 38.  Olsder GJ 2009. Phenomena in inverse Stackelberg games, part 1: static problems. J. Optim. Theory Appl. 143:589
    [Google Scholar]
  39. 39.  Olsder GJ 2009. Phenomena in inverse Stackelberg games, part 2: dynamic problems. J. Optim. Theory Appl. 143:601
    [Google Scholar]
  40. 40.  Liu X, Zhang S 1992. Optimal incentive strategy for leader-follower games. IEEE Trans. Autom. Control 37:195761
    [Google Scholar]
  41. 41.  Ho YC 1983. On incentive problems. Syst. Control Lett. 3:6268
    [Google Scholar]
  42. 42.  Cruz J 1978. Leader-follower strategies for multilevel systems. IEEE Trans. Autom. Control 23:24455
    [Google Scholar]
  43. 43.  Başar T, Selbuz H 1979. Closed-loop Stackelberg strategies with applications in optimal control or multilevel systems. IEEE Trans. Autom. Control 24:16679
    [Google Scholar]
  44. 44.  Tolwinski B 1981. Closed-loop Stackelberg solution to a multistage linear-quadratic game. J. Optim. Theory Appl. 34:485501
    [Google Scholar]
  45. 45.  Banerjee S, Johari R, Riquelme C 2015. Pricing in ride-sharing platforms: a queueing-theoretic approach. Proceedings of the Sixteenth ACM Conference on Economics and Computationp 639 New York: ACM
    [Google Scholar]
  46. 46.  Calderone D, Ratliff LJ, Sastry SS 2014. Pricing for coordination in open-loop differential games. IFAC Proc. Vol. 47:90016
    [Google Scholar]
  47. 47.  Jing YW, Zhang SY 1988. The solution to a kind of Stackelberg game systems with multi-follower: coordinative and incentive. Analysis and Optimization of Systems A Bensoussan, JL Lions pp. 593–602. Berlin: Springer
    [Google Scholar]
  48. 48.  Zhang SY 1987. A nonlinear incentive strategy for multi-stage Stackelberg games with partial information. 1986 25th IEEE Conference on Decision and Control135257 New York: IEEE
    [Google Scholar]
  49. 49.  Dobakhshari DG, Gupta V 2016. A contract design approach for phantom demand response. arXiv:1611.09788 [math.OC]
  50. 50.  Vamvoudakis KG, Lewis FL, Dixon WE 2019. Open-loop Stackelberg learning solution for hierarchical control problems. Int. J. Adapt. Control Signal Process. 33:285–99
    [Google Scholar]
  51. 51.  Ratliff LJ 2015. Incentivizing efficiency in societal-scale cyber-physical systems PhD Thesis, Univ. Calif., Berkeley
  52. 52.  Ratliff LJ, Fiez T 2018. Adaptive incentive design. arXiv:1806.05749 [cs.GT]
  53. 53.  Auer P, Cesa-Bianchi N, Fischer P 2002. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47:23556
    [Google Scholar]
  54. 54.  Arora S, Hazan E, Kale S 2012. The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8:12164
    [Google Scholar]
  55. 55.  Auer P, Cesa-Bianchi N, Freund Y, Schapire RE 2002. The nonstochastic multiarmed bandit problem. J. Comput. 32:4877
    [Google Scholar]
  56. 56.  Bubeck S, Cesa-Bianchi N 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5:1122
    [Google Scholar]
  57. 57.  Bubeck S 2011. Introduction to online optimization Lect. Notes, Dep. Oper. Res. Financ. Eng., Princeton Univ., Princeton, NJ http://sbubeck.com/BubeckLectureNotes.pdf
  58. 58.  Hazan E 2016. Introduction to online convex optimization. Found. Trends Optim. 2:157325
    [Google Scholar]
  59. 59.  Ratliff LJ, Sekar S, Zheng L, Fiez T 2018. Incentives in the dark: multi-armed bandits for evolving users with unknown type. arXiv:1803.04008 [cs.LG]
  60. 60.  Fiez T, Sekar S, Zheng L, Ratliff LJ 2018. Combinatorial bandits for incentivizing agents with dynamic preferences. Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference A Globerson, R Silva693703 Corvallis, OR: AUAI Press
    [Google Scholar]
  61. 61.  Jain S, Gujar S, Bhat S, Zoeter O, Narahari Y 2018. A quality assuring, cost optimal multi-armed bandit mechanism for expert sourcing. Artif. Intell. 254:4463
    [Google Scholar]
  62. 62.  Jain S, Narayanaswamy B, Narahari Y 2014. A multiarmed bandit incentive mechanism for crowdsourcing demand response in smart grids. Twenty-Eighth AAAI Conference on Artificial Intelligence72127 Menlo Park, CA: AAAI Press
    [Google Scholar]
  63. 63.  Mansour Y, Slivkins A, Syrgkanis V, Wu ZS 2016. Bayesian exploration: incentivizing exploration in Bayesian games. Proceedings of the 2016 ACM Conference on Economics and Computationp 661 New York: ACM
    [Google Scholar]
  64. 64.  Li L, Chu W, Langford J, Schapire RE 2010. A contextual-bandit approach to personalized news article recommendation. Proceedings of the 19th International Conference on World Wide Web66770 New York: ACM
    [Google Scholar]
  65. 65.  Wais P, Lingamneni S, Cook D, Fennell J, Goldenberg B 2010. Towards building a high-quality workforce with Mechanical Turk Paper presented at the Computational Social Science and the Wisdom of Crowds Workshop, Neural Information Processing Systems Conference, Whistler, Can., Dec. 10 https://people.cs.umass.edu/∼wallach/workshops/nips2010css/papers/wais.pdf
  66. 66.  Huang JL, Liu M, Bowling NA 2015. Insufficient effort responding: examining an insidious confound in survey data. J. Appl. Psychol. 100:82845
    [Google Scholar]
  67. 67.  Lovett M, Bajaba S, Lovett M, Simmering MJ 2018. Data quality from crowdsourced surveys: a mixed method inquiry into perceptions of Amazon's Mechanical Turk Masters. Appl. Psychol. 67:33966
    [Google Scholar]
  68. 68.  Guha S, Munagala K 2007. Approximation algorithms for budgeted learning problems. Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing10413 New York: ACM
    [Google Scholar]
  69. 69.  Babaioff M, Sharma Y, Slivkins A 2014. Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43:194230
    [Google Scholar]
  70. 70.  Einav L, Farronato C, Levin J, Sundaresan N 2018. Auctions versus posted prices in online markets. J. Political Econ. 126:178215
    [Google Scholar]
  71. 71.  Blum A, Hartline JD 2005. Near-optimal online auctions. Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms115663 Philadelphia: Soc. Ind. Appl. Math.
    [Google Scholar]
  72. 72.  Hajiaghayi MT, Kleinberg RD, Parkes DC 2004. Adaptive limited-supply online auctions. Proceedings of the 5th ACM Conference on Electronic Commerce7180 New York: ACM
    [Google Scholar]
  73. 73.  Cesa-Bianchi N, Gentile C, Mansour Y 2015. Regret minimization for reserve prices in second-price auctions. IEEE Trans. Inf. Theory 61:54964
    [Google Scholar]
  74. 74.  Babaioff M, Dughmi S, Kleinberg R, Slivkins A 2015. Dynamic pricing with limited supply. ACM Trans. Econ. Comput. 3:4
    [Google Scholar]
  75. 75.  Badanidiyuru A, Kleinberg R, Slivkins A 2018. Bandits with knapsacks. J. ACM 65:13
    [Google Scholar]
  76. 76.  Roth A, Ullman J, Wu ZS 2016. Watch and learn: optimizing from revealed preferences feedback. Proceedings of the 48th Annual ACM Symposium on Theory of Computing94962 New York: ACM
    [Google Scholar]
  77. 77.  Frazier PI, Kempe D, Kleinberg JM, Kleinberg R 2014. Incentivizing exploration. Proceedings of the Fifteenth ACM Conference on Economics and Computation522 New York: ACM
    [Google Scholar]
  78. 78.  Han L, Kempe D, Qiang R 2015. Incentivizing exploration with heterogeneous value of money. Proceedings of the 11th International Conference on Web and Internet Economics37083 Berlin: Springer
    [Google Scholar]
  79. 79.  Singla A, Santoni M, Bartók G, Mukerji P, Meenen M, Krause A 2015. Incentivizing users for balancing bike sharing systems. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence72329 Menlo Park, CA: AAAI Press
    [Google Scholar]
  80. 80.  Papanastasiou Y, Bimpikis K, Savva N 2017. Crowdsourcing exploration. Manag. Sci. 64:172746
    [Google Scholar]
  81. 81.  Liu Y, Ho C 2018. Incentivizing high quality user contributions: new arm generation in bandit learning. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence114653 Menlo Park, CA: AAAI Press
    [Google Scholar]
  82. 82.  Liu Y, Chen Y 2016. A bandit framework for strategic regression. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett181321 Red Hook, NY: Curran
    [Google Scholar]
  83. 83.  Roughgarden T, Schrijvers O 2017.Online prediction with selfish experts Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus130010 Red Hook, NY: Curran:
  84. 84.  Amin K, Rostamizadeh A, Syed U 2013. Learning prices for repeated auctions with strategic buyers. Advances in Neural Information Processing Systems 26 CJC Burges, L Bottou, M Welling, Z Ghahramani, KQ Weinberger116977 Red Hook, NY: Curran:
    [Google Scholar]
  85. 85.  Braverman M, Mao J, Schneider J, Weinberg SM 2017. Multi-armed bandit problems with strategic arms. arXiv:1706.09060 [cs.GT]
  86. 86.  Tversky A, Kahneman D 1974. Judgment under uncertainty: heuristics and biases. Science 185:112431
    [Google Scholar]
  87. 87.  Brown PN, Marden JR 2017. Studies on robust social influence mechanisms: incentives for efficient network routing in uncertain settings. IEEE Control Syst. 37:98115
    [Google Scholar]
  88. 88.  Croci E 2016. Urban road pricing: a comparative study on the experiences of London, Stockholm and Milan. Transp. Res. Procedia 14:25362
    [Google Scholar]
  89. 89.  Kahneman D, Tversky A 1979. Prospect theory: an analysis of decision under risk. Econometrica 47:26391
    [Google Scholar]
  90. 90.  Tang CK 2016. Traffic externalities and housing prices: evidence from the London congestion charge Discuss. Pap. 205, Spatial Econ. Res. Cent., London
  91. 91.  Tversky A, Kahneman D 1992. Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 5:297323
    [Google Scholar]
  92. 92.  Simon HA 1955. A behavioral model of rational choice. Q. J. Econ. 69:99118
    [Google Scholar]
  93. 93.  Cohen A, Einav L 2007. Estimating risk preferences from deductible choice. Am. Econ. Rev. 97:74588
    [Google Scholar]
  94. 94.  Gershman SJ, Horvitz EJ, Tenenbaum JB 2015. Computational rationality: a converging paradigm for intelligence in brains, minds, and machines. Science 349:27378
    [Google Scholar]
  95. 95.  Ratliff LJ, Mazumdar E 2017. Inverse risk-sensitive reinforcement learning. arXiv:1703.09842v3 [cs.LG]
  96. 96.  Mazumdar E, Ratliff LJ, Fiez T, Sastry SS 2017. Gradient-based inverse risk-sensitive reinforcement learning. 2017 IEEE 56th Annual Conference on Decision and Control5796801 New York: IEEE
    [Google Scholar]
  97. 97.  Shen Y, Tobia MJ, Sommer T, Obermayer K 2014. Risk-sensitive reinforcement learning. Neural Comput. 26:1298328
    [Google Scholar]
  98. 98.  Majumdar A, Singh S, Mandlekar A, Provone M 2017. Risk-sensitive inverse reinforcement learning via coherent risk models. Robotics: Science and Systems XIII N Amato, S Srinivasa, N Ayanian, S Kuindersma chap. 69. N.p.: Robot. Sci. Syst. Found.
  99. 99.  Acemoglu D, Makhdoumi A, Malekian A, Ozdaglar A 2018. Informational Braess’ paradox: the effect of information on traffic congestion. Oper. Res. 66:893917
    [Google Scholar]
  100. 100.  Sekar S, Zheng L, Ratliff LJ, Zhang B 2018. Uncertainty in multi-commodity routing networks: When does it help. 2018 Annual American Control Conference655358 New York: IEEE:
    [Google Scholar]
  101. 101.  Wu M, Liu J, Amin S 2017. Informational aspects in a class of Bayesian congestion games. 2017 American Control Conference365057 New York: IEEE
    [Google Scholar]
  102. 102.  Braess D, Nagurney A, Wakolbinger T 2005. On a paradox of traffic planning. Transp. Sci. 39:44650
    [Google Scholar]
  103. 103.  Lo C 2012. Game theory: introducing randomness to airport security. Airport Technology July 25 http://www.airport-technology.com/features/featuregame-theory-airport-security-teamcore-Stackelberg
  104. 104.  Li P, Sekar S, Zhang B 2018. A capacity-price game for uncertain renewables resources. Proceedings of the Ninth International Conference on Future Energy Systems11933 New York: ACM
    [Google Scholar]
  105. 105.  Kamenica E, Gentzkow M 2011. Bayesian persuasion. Am. Econ. Rev. 101:2590615
    [Google Scholar]
  106. 106.  Bergemann D, Morris S 2018. Information design: a unified perspective. J. Econ. Lit. In press
  107. 107.  Vakili S, Zhao Q 2016. Risk-averse multi-armed bandit problems under mean-variance measure. J. Sel. Top. Signal Process. 10:1093111
    [Google Scholar]
  108. 108.  Even-Dar E, Kearns MJ, Wortman J 2006. Risk-sensitive online learning. Algorithmic Learning Theory: 17th International Conference, ALT 2006 JL Balcázar, PM Long, F Stephan199213 Berlin: Springer:
    [Google Scholar]
  109. 109.  Haws KL, Bearden WO 2006. Dynamic pricing and consumer fairness perceptions. J. Consum. Res. 33:30411
    [Google Scholar]
  110. 110.  Irwin N 2017. Why surge prices make us so mad: what Springsteen, Home Depot and a Nobel winner know. New York Times Oct. 14 https://www.nytimes.com/2017/10/14/upshot/why-surge-prices-make-us-so-mad-what-springsteen-home-depot-and-a-nobel-winner-know.html
  111. 111.  Varian HR 1974. Equity, envy, and efficiency. J. Econ. Theory 9:6391
    [Google Scholar]
  112. 112.  Berliant M, Thomson W, Dunz K 1992. On the fair division of a heterogeneous commodity. J. Math. Econ. 21:20116
    [Google Scholar]
  113. 113.  Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R 2012. Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference21426 New York: ACM
    [Google Scholar]
  114. 114.  Joseph M, Kearns M, Morgenstern JH, Roth A 2016.Fairness in learning: classic and contextual bandits Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett32533 Red Hook, NY: Curran
  115. 115.  Nace D, Pioro M 2008. Max-min fairness and its applications to routing and load-balancing in communication networks: a tutorial. IEEE Commun. Surv. Tutor. 10:517
    [Google Scholar]
  116. 116.  Gillen S, Jung C, Kearns M, Roth A 2018. Online learning with an unknown fairness metric. arXiv:1802.06936 [cs.LG]
  117. 117.  Bertsimas D, Farias VF, Trichakis N 2012. On the efficiency-fairness trade-off. Manag. Sci. 58:223450
    [Google Scholar]
  118. 118.  Hu L, Chen Y 2018. A short-term intervention for long-term fairness in the labor market. Proceedings of the 2018 World Wide Web Conference138998 Geneva: Int. World Wide Web Conf. Comm.
    [Google Scholar]
  119. 119.  Benade G, Kazachkov AM, Procaccia AD, Psomas CA 2018. How to make envy vanish over time. Proceedings of the 2018 ACM Conference on Economics and Computation593610 New York: ACM
    [Google Scholar]
  120. 120.  Lorini E, Mühlenbernd R 2015. The long-term benefits of following fairness norms: a game-theoretic analysis. PRIMA 2015: Principles and Practice of Multi-Agent Systems Q Chen, P Torroni, S Villata, J Hsu, A Omicini30118 Cham, Switz.: Springer
    [Google Scholar]
  121. 121.  Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A 2017. Algorithmic decision making and the cost of fairness. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining797806 New York: ACM
    [Google Scholar]
  122. 122.  Friedler SA, Scheidegger C, Venkatasubramanian S 2016. On the (im)possibility of fairness. arXiv:1609.07236 [cs.CY]
  123. 123.  Smith M, Patil D, Munoz C 2016. Big data: a report on algorithmic systems, opportunity, and civil rights Rep., Exec. Off. Pres., Washington, DC
  124. 124.  Aghion P, Bloom N, Blundell R, Griffith R, Howitt P 2005. Competition and innovation: an inverted-U relationship. Q. J. Econ. 120:70128
    [Google Scholar]
  125. 125.  Anandkumar A, Michael N, Tang A 2010. Opportunistic spectrum access with multiple users: learning under competition. 2010 Proceedings IEEE INFOCOM New York: IEEE: https://doi.org/10.1109/INFCOM.2010.5462144
    [Crossref] [Google Scholar]
  126. 126.  Mansour Y, Slivkins A, Wu ZS 2018. Competing bandits: learning under competition. Proceedings of the 9th Innovations in Theoretical Computer Science Conference AR Karlin48 Saarbrücken, Ger.: Dagstuhl
    [Google Scholar]
  127. 127.  Ben-Porat O, Tennenholtz M 2018. Competing prediction algorithms. arXiv:1806.01703 [cs.GT]
  128. 128. Fed. Trade Comm 2017. Algorithms and collusion - note by the United States Doc. DAF/COMP/WD(2017)41, Fed. Trade Comm., Washington, DC
  129. 129.  Mehra SK 2015. Antitrust and the robo-seller: competition in the time of algorithms. Minn. Law Rev. 100:132375
    [Google Scholar]
  130. 130.  Dani V, Hayes TP, Kakade SM 2008. Stochastic linear optimization under bandit feedback. Proceedings of the 21st Annual Conference on Learning Theory35566 Madison, WI: Omnipress
    [Google Scholar]
  131. 131.  Abbasi-Yadkori Y, Pál D, Szepesvári C 2011. Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems 24 J Shawe-Taylor, RS Zemel, PL Bartlett, F Pereira, KQ Weinberger231220 Red Hook, NY: Curran
    [Google Scholar]
  132. 132.  Slivkins A 2011. Multi-armed bandits on implicit metric spaces. Advances in Neural Information Processing Systems 24 J Shawe-Taylor, RS Zemel, PL Bartlett, F Pereira, KQ Weinberger160210 Red Hook, NY: Curran
    [Google Scholar]
  133. 133.  Kleinberg R, Slivkins A, Upfal E 2013. Bandits and experts in metric spaces. arXiv:1312.1277 [cs.DS]
  134. 134.  Slivkins A, Radlinski F, Gollapudi S 2013. Ranked bandits in metric spaces: learning diverse rankings over large document collections. J. Mach. Learn. Res. 14:399436
    [Google Scholar]
  135. 135.  Slivkins A 2014. Contextual bandits with similarity information. J. Mach. Learn. Res. 15:253368
    [Google Scholar]
  136. 136.  Chapelle O, Li L 2011. An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems 24 J Shawe-Taylor, RS Zemel, PL Bartlett, F Pereira, KQ Weinberger224957 Red Hook, NY:Curran
    [Google Scholar]
  137. 137.  Agrawal S, Goyal N 2012. Analysis of Thompson sampling for the multi-armed bandit problem. Proceedings of the 25th Annual Conference on Learning Theory S Mannor, N Srebro, RC Williamson39.1–26 Proc. Mach. Learn. Res. 23. N.p.: PMLR
  138. 138.  Srinivas N, Krause A, Kakade SM, Seeger M 2009. Gaussian process optimization in the bandit setting: no regret and experimental design. Proceedings of the 27th International Conference on International Conference on Machine Learning101522 Madison, WI: Omnipress
    [Google Scholar]
  139. 139.  Srinivas N, Krause A, Kakade SM, Seeger MW 2012. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theory 58:325065
    [Google Scholar]
  140. 140.  Langford J, Zhang T 2008. The epoch-greedy algorithm for multi-armed bandits with side information. Advances in Neural Information Processing Systems 20 JC Platt, D Koller, Y Singer, ST Roweis81724 Red Hook, NY: Curran
    [Google Scholar]
  141. 141.  Chu W, Li L, Reyzin L, Schapire R 2011. Contextual bandits with linear payoff functions. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics G Gordon, D Dunson, M Dudík20814 Proc. Mach. Learn. Res. 15. N.p.: PMLR
  142. 142.  Agrawal S, Goyal N 2013. Thompson sampling for contextual bandits with linear payoffs. Proceedings of the 30th International Conference on Machine Learning S Dasgupta, D McAllester12735 Proc. Mach. Learn. Res. 28(3). N.p.: PMLR
  143. 143.  Eghbali R, Fazel M 2016. Designing smoothing functions for improved worst-case competitive ratio in online optimization. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett328795 Red Hook, NY: Curran
    [Google Scholar]
  144. 144.  Eghbali R, Fazel M, Mesbahi M 2016. Worst case competitive analysis for online conic optimization. 2016 IEEE 55th Conference on Decision and Control194550 New York: IEEE
    [Google Scholar]
  145. 145.  Eghbali R, Saunderson J, Fazel M 2018. Competitive online algorithms for resource allocation over the positive semidefinite cone. Math. Program. 170:26792
    [Google Scholar]
  146. 146.  Mansour Y, Slivkins A, Syrgkanis V 2015. Bayesian incentive-compatible bandit exploration. Proceedings of the 16th ACM Conference on Economics and Computation56582 New York: ACM
    [Google Scholar]
  147. 147.  Kannan S, Kearns M, Morgenstern J, Pai M, Roth A 2017. Fairness incentives for myopic agents. Proceedings of the 2017 ACM Conference on Economics and Computation36986 New York: ACM
    [Google Scholar]
  148. 148.  Ghalme G, Jain S, Gujar S, Narahari Y 2017. Thompson sampling based mechanisms for stochastic multi-armed bandit problems. Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems8795 Richland, SC: Int. Found. Auton. Agents Multiagent Syst.
    [Google Scholar]
  149. 149.  Kohavi R, Longbotham R, Sommerfield D, Henne RM 2009. Controlled experiments on the web: survey and practical guide. Data Min. Knowl. Discov. 18:14081
    [Google Scholar]
  150. 150.  Swaminathan A, Krishnamurthy A, Agarwal A, Dudík M, Langford J 2017. Off-policy evaluation for slate recommendation. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus363545 Red Hook, NY: Curran
    [Google Scholar]
  151. 151.  Strehl AL, Langford J, Li L, Kakade S 2010.Learning from logged implicit exploration data Advances in Neural Information Processing Systems 23 JD Lafferty, CKI Williams, J Shawe-Taylor, RS Zemel, A Culottapp221725 Red Hook, NY: Curran
  152. 152.  Bottou L, Peters J, Candela JQ, Charles DX, Chickering M 2013. Counterfactual reasoning and learning systems: the example of computational advertising. J. Mach. Learn. Res. 14:320760
    [Google Scholar]
  153. 153.  Bareinboim E, Forney A, Pearl J 2015. Bandits with unobserved confounders: a causal approach. Advances in Neural Information Processing Systems 28 C Cortes, ND Lawrence, DD Lee, M Sugiyama, R Garnett134250 Red Hook, NY: Curran
    [Google Scholar]
  154. 154.  Alon N, Cesa-Bianchi N, Dekel O, Koren T 2015. Online learning with feedback graphs: beyond bandits. Proceedings of the 28th Conference on Learning Theory P Grünwald, E Hazan, S Kalepp2335 Proc. Mach. Learn. Res. 40. N.p.: PMLR
  155. 155.  Hu H, Li Z, Vetta AR 2014. Randomized experimental design for causal graph discovery. Advances in Neural Information Processing Systems 27 Z Ghahramani, M Welling, C Cortes, ND Lawrence, KQ Weinberger233947 Red Hook, NY: Curran
    [Google Scholar]
  156. 156.  Lattimore F, Lattimore T, Reid MD 2016. Causal bandits: learning good interventions via causal inference. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett118189 Red Hook, NY: Curran
    [Google Scholar]
/content/journals/10.1146/annurev-control-053018-023634
Loading
  • Article Type: Review Article
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error