Acemoglu, Daron, Ali Makhdoumi, Azarakhsh Malekian, Asu Ozdaglar. 2022. Too much data: Prices and inefficiencies in data markets. American Economic Journal: Microeconomics 14(4) 218–256.
Adida, Elodie, Fernanda Bravo. 2019. Contracts for healthcare referral services: Coordination via outcome-based penalty contracts. Management Science 65(3) 1322–1341.
- Agarwal, Anish, Munther Dahleh, Tuhin Sarkar. 2019. A marketplace for data: An algorithmic solution.
- Alon, Tal, Paul Dütting, Yingkai Li, Inbal Talgam-Cohen. 2022. Bayesian analysis of linear contracts.
- Ananthakrishnan, Nivasini, Nika Haghtalab, Chara Podimata, Kunhe Yang. 2024b. Is knowledge power? On the (im)possibility of learning from strategic interactions. The Thirty-eighth Annual Conference on Neural Information Processing Systems.
- Ananthakrishnan, Nivasini, Stephen Bates, Michael Jordan, Nika Haghtalab. 2024a. Delegating data collection in decentralized machine learning. International Conference on Artificial Intelligence and Statistics. PMLR, 478–486.
- Artstein, Ron, Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Computational linguistics 34(4) 555–596.
- Askell, Amanda, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. 2021. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861.
- Author’s sentiment prediction. arXiv preprint arXiv:2011.06128.
- Bacon, David F, Yiling Chen, Ian Kash, David C Parkes, Malvika Rao, Manu Sridharan. 2012. Predicting your own effort. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 2 (AAMAS). 695–702.
- Bareket, Dan, Reut Tsarfaty. 2021. Neural modeling for named entities and morphology (nemo^2). Transactions of the Association for Computational Linguistics 9 909–928.
Barron, Daniel, George Georgiadis, Jeroen Swinkels. 2020. Optimal contracts with a risk-taking agent. Theoretical Economics 15(2) 715–761.
- Bastan, Mohaddeseh, Mahnaz Koupaee, Youngseo Son, Richard Sicoli, Niranjan Balasubramanian. 2020. Author’s sentiment prediction. arXiv preprint arXiv:2011.06128.
Bergemann, Dirk, Alessandro Bonatti. 2019. Markets for information: An introduction. Annual Review of Economics 11(1) 85–107.
- Boyd, Stephen. 2004. Convex optimization. Cambridge UP.
- Bradley, Ralph Allan, Milton E Terry. 1952. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika 39(3/4) 324–345.
- Bretagnolle, Jean, Catherine Huber. 1978. Estimation des densités: risque minimax. Séminaire de probabilités de Strasbourg 12 342–363.
- Cai, Yang, Constantinos Daskalakis, Christos Papadimitriou. 2015. Optimum statistical estimation with strategic data sources. Conference on Learning Theory. PMLR, 280–296.
- Callison-Burch, Chris, Mark Dredze. 2010. Creating speech and language data with amazon’s mechanical turk. Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk. 1–12.
Carroll, Gabriel. 2015. Robustness and linear contracts. American Economic Review 105(2) 536–563.
- Chen, Junjie, Minming Li, Haifeng Xu. 2022. Selling data to a machine learner: Pricing via costly signaling. International Conference on Machine Learning. PMLR, 3336–3359.
- Chowdhury, Sayak Ray, Anush Kini, Nagarajan Natarajan. 2024. Provably robust dpo: Aligning language models with noisy feedback. arXiv preprint arXiv:2403.00409.
- Collina, Natalie, Varun Gupta, Aaron Roth. 2024. Repeated contracting with multiple non-myopic agents: Policy regret and limited liability. Proceedings of the 25th ACM Conference on Economics and Computation. EC ’24, Association for Computing Machinery, New York, NY, USA, 640–668.
- Corbett, Charles J, Christopher S Tang. 1999. Designing supply contracts: Contract type and information asymmetry. Quantitative models for supply chain management 269–297.
Corbett, Charles J, Gregory A DeCroix, Albert Y Ha. 2005. Optimal shared-savings contracts in supply chains: Linear contracts and double moral hazard. European journal of operational research 163(3) 653–667.
- Cui, Ganqu, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun. 2023. Ultrafeedback: Boosting language models with high-quality feedback. arXiv preprint arXiv:2310.01377.
- Dütting, Paul, Michal Feldman, Inbal Talgam-Cohen, et al. 2024. Algorithmic contract theory: A survey. Foundations and Trends in Theoretical Computer Science 16(3-4) 211–412.
- Dütting, Paul, Tim Roughgarden, Inbal Talgam-Cohen. 2019. Simple versus optimal contracts. Proceedings of the 2019 ACM Conference on Economics and Computation. 369–387.
- Dai, Josef, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang. 2024. Safe RLHF: Safe reinforcement learning from human feedback. The Twelfth International Conference on Learning Representations. URL https://openreview.net/forum?id=TyFrPOKYXw.
- Dasgupta, Anirban, Arpita Ghosh. 2013. Crowdsourced judgement elicitation with endogenous proficiency. Proceedings of the 22nd international conference on World Wide Web. 319–330.
de Zegher, Joann F, Dan A Iancu, Hau L Lee. 2019. Designing contracts and sourcing channels to create shared value. Manufacturing & Service Operations Management 21(2) 271–289.
- Duetting, Paul, Vahab Mirrokni, Renato Paes Leme, Haifeng Xu, Song Zuo. 2024. Mechanism design for large language models. Proceedings of the ACM on Web Conference 2024. 144–155.
- Dutting, Paul, Tim Roughgarden, Inbal Talgam-Cohen. 2021. The complexity of contracts. SIAM Journal on Computing 50(1) 211–254.
- Frick, Mira, Ryota Iijima, Yuhta Ishii. 2023. Monitoring with rich data. arXiv preprint arXiv:2312.16789.
- Gao, Yang, Dana Alon, Donald Metzler. 2024. Impact of preference noise on the alignment performance of generative language models. arXiv preprint arXiv:2404.09824.
- Georgiadis, George, Balazs Szentes. 2020. Optimal monitoring design. Econometrica 88(5) 2075–2107.
- Ghosal, Deepanway, Siqi Shen, Navonil Majumder, Rada Mihalcea, Soujanya Poria. 2022. Cicero: A dataset for contextualized commonsense inference in dialogues. arXiv preprint arXiv:2203.13926.
- Goldwasser, Shafi, Guy N Rothblum, Jonathan Shafer, Amir Yehudayoff. 2021. Interactive proofs for verifying machine learning. 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Schloss-Dagstuhl-Leibniz Zentrum für Informatik.
- Grossman, Sanford J, Oliver D Hart. 1992. An analysis of the principal-agent problem. Foundations of Insurance Economics: Readings in Economics and Finance. Springer, 302–340.
- Guo, Chuan, Geoff Pleiss, Yu Sun, Kilian Q Weinberger. 2017. On calibration of modern neural networks.
- Hao, Shugang, Lingjie Duan. 2024. Online learning from strategic human feedback in llm fine-tuning.
- Harris, Keegan, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins. 2023. Algorithmic persuasion through simulation: Information design in the age of generative ai. arXiv preprint arXiv:2311.18138.
Harris, Milton, Artur Raviv. 1979. Optimal incentive contracts with imperfect information. Journal of economic theory 20(2) 231–259.
Herweg, Fabian, Daniel Müller, Philipp Weinschenk. 2010. Binary payment schemes: Moral hazard and loss aversion. American Economic Review 100(5) 2451–2477.
- Ho, Chien-Ju, Aleksandrs Slivkins, Jennifer Wortman Vaughan. 2014. Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. Proceedings of the fifteenth ACM conference on Economics and computation. 359–376.
- Holmström, Bengt. 1979. Moral hazard and observability. The Bell journal of economics 74–91.
Holmstrom, Bengt, Paul Milgrom. 1987. Aggregation and linearity in the provision of intertemporal incentives. Econometrica: Journal of the Econometric Society 303–328.
- Ivanov, Dima, Paul Dütting, Inbal Talgam-Cohen, Tonghan Wang, David C Parkes. 2024. Principal-agent reinforcement learning: Orchestrating ai agents with contracts. arXiv preprint arXiv:2407.18074.
Jain, Nitish, Sameer Hasija, Dana G Popescu. 2013. Optimal contracts for outsourcing of repair and restoration services. Operations Research 61(6) 1295–1311.
- Jewitt, Ian. 2006. Information order in decision and agency problems.
- Ji, Jiaming, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang. 2024. Pku-saferlhf: Towards multi-level safety alignment for llms with human preference. arXiv preprint arXiv:2406.15513.
- Karlin, Samuel, Herman Rubin. 1956. The theory of decision procedures for distributions with monotone likelihood ratio. The Annals of Mathematical Statistics 272–299.
- Kaufmann, Timo, Paul Weng, Viktor Bengs, Eyke Hüllermeier. 2023. A survey of reinforcement learning from human feedback. arXiv preprint arXiv:2312.14925.
- Kim, Son Ku. 1995. Efficiency of an information system in an agency model. Econometrica: Journal of the Econometric Society 89–102.
- Klie, Jan-Christoph, Juan Haladjian, Marc Kirchner, Rahul Nair. 2024b. On efficient and statistical quality estimation for data annotation. arXiv preprint arXiv:2405.11919.
- Klie, Jan-Christoph, Richard Eckart de Castilho, Iryna Gurevych. 2024a. Analyzing dataset annotation quality management in the wild. Computational Linguistics 50(3) 817–866.
- Krippendorff, Klaus, et al. 1989. Content analysis. International encyclopedia of communication 1(1) 403–407.
- Krippendorff, Klaus. 2004. Reliability in content analysis: Some common misconceptions and recommendations. Human communication research 30(3) 411–433.
- Laffont, Jean-Jacques, David Martimort. 2009. The theory of incentives: the principal-agent model. The theory of incentives. Princeton university press.
- Lazear, Edward P, Paul Oyer. 2007. Personnel economics. Working Paper 13480, National Bureau of Economic Research. doi:10.3386/w13480. URL http://www.nber.org/papers/w13480.
- Le Cam, Lucien. 2012. Asymptotic methods in statistical decision theory. Springer Science & Business Media.
- Lemma A.2 (Le Cam’s Lemma (Le Cam, 2012)). For any two distributions $Q$ and $P$ over the space $(\Omega, \mathcal{F})$, let $\Psi$ denote a measurable function from $\Omega$ to $\{0, 1\}$. Then $\inf_{\Psi} \{ Q(\Psi(\omega) = 0) + P(\Psi(\omega) = 1) \} = 1 - \mathrm{TV}(Q, P)$. Furthermore, the infimum is attained by the function $\Psi^{*}(s) := \mathbb{1}\left\{ \frac{dQ}{dP}(s) \ge 1 \right\}$.
- Liang, Xize, Chao Chen, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye. 2024. Robust preference optimization with provable noise tolerance for llms. arXiv preprint arXiv:2404.04102.
- Liao, JG, Arthur Berg. 2019. Sharpening jensen’s inequality. The American Statistician.
- Liu, Chris Yuhao, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou. 2024a. Skywork-reward: Bag of tricks for reward modeling in llms. arXiv preprint arXiv:2410.18451.
- Liu, Jinsong, Dongdong Ge, Ruihao Zhu. 2024b. Reward learning from preference with ties. arXiv preprint arXiv:2410.05328.
Lopomo, Giuseppe, Luca Rigotti, Chris Shannon. 2011. Knightian uncertainty and moral hazard. Journal of Economic Theory 146(3) 1148–1172.
Miller, Nolan, Paul Resnick, Richard Zeckhauser. 2005. Eliciting informative feedback: The peer-prediction method. Management Science 51(9) 1359–1373.
- Monarch, Robert Munro. 2021. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Simon and Schuster.
Moscarini, Giuseppe, Lones Smith. 2002. The law of large demand for information. Econometrica 70(6) 2351–2366.
- Northcutt, Curtis, Lu Jiang, Isaac Chuang. 2021. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research 70 1373–1411.
- Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 27730–27744.
- Polyanskiy, Yury, Yihong Wu. 2025. Information Theory: From Coding to Learning. Cambridge University Press.
- Proceedings of the 2019 ACM Conference on Economics and Computation. 701–726.
- Pustejovsky, James, Amber Stubbs. 2012. Natural Language Annotation for Machine Learning: A guide to corpus-building for applications. O’Reilly Media, Inc.
- Qian, Kun, Ahmad Beirami, Zhouhan Lin, Ankita De, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar. 2021. Annotation inconsistency and entity bias in multiwoz. arXiv preprint arXiv:2105.14150.
- Rafailov, Rafael, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36.
- Saig, Eden, Inbal Talgam-Cohen, Nir Rosenfeld. 2024b. Delegated classification. Advances in Neural Information Processing Systems 36.
- Saig, Eden, Ohad Einav, Inbal Talgam-Cohen. 2024a. Incentivizing quality text generation via statistical contracts. The Thirty-eighth Annual Conference on Neural Information Processing Systems. URL https://openreview.net/forum?id=wZgw4CrxwK.
- Silva Filho, Telmo, Hao Song, Miquel Perello-Nieto, Raul Santos-Rodriguez, Meelis Kull, Peter Flach. 2023. Classifier calibration: a survey on how to assess and improve predicted class probabilities. Machine Learning 112(9) 3211–3260.
Singh, Nirvikar. 1985. Monitoring and hierarchies: The marginal value of information in a principal-agent model. Journal of Political Economy 93(3) 599–609.
- Sun, Hao, Yunyi Shen, Jean-Francois Ton. 2024a. Rethinking bradley-terry models in preference-based reward modeling: Foundations, theory, and alternatives. arXiv preprint arXiv:2411.04991.
- Sun, Haoran, Yurong Chen, Siwei Wang, Wei Chen, Xiaotie Deng. 2024b. Mechanism design for llm fine-tuning with multiple reward models. arXiv preprint arXiv:2405.16276.
- This implies that the test based on AÌ is uniformly most powerful by the Karlin–Rubin theorem (Karlin and Rubin, 1956). A.1.4 Proof of Proposition 3.3 Proof. The first two inequalities are direct consequences of the following Le Cam’s two-point method and the Bretagnolle–Huber inequality (Lemma A.3) by taking â to be the L1 distance. The proof of Le Cam’s two-point method is standard and can be found in textbooks.
- Touvron, Hugo, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Tseng, Yu-Min, Yu-Chao Huang, Teng-Yun Hsiao, Yu-Ching Hsu, Jia-Yin Foo, Chao-Wei Huang, Yun-Nung Chen. 2024. Two tales of persona in llms: A survey of role-playing and personalization. arXiv preprint arXiv:2406.01171.
Walton, Daniel, Gabriel Carroll. 2022. A general framework for robust contracting models. Econometrica 90(5) 2129–2159.
- Wang, Binghai, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, et al. 2024. Secrets of rlhf in large language models part ii: Reward modeling.
- Wang, Zhilin, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, et al. 2023. Helpsteer: Multi-attribute helpfulness dataset for steerlm. arXiv preprint arXiv:2311.09528.
- Zadrozny, Bianca, Charles Elkan. 2001. Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers. Icml, vol. 1. 609–616.
- Zhao, Yao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J Liu. 2023. Slic-hf: Sequence likelihood calibration with human feedback. arXiv preprint arXiv:2305.10425.