Mapping social choice theory to RLHF

J Dai, E Fleisig - arXiv preprint arXiv:2404.13038, 2024 - arxiv.org
Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
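A minimal sketch (not from the paper) of the kind of aggregation problem the abstract alludes to: with as few as three annotators ranking three candidate responses, pairwise majority voting can produce a Condorcet cycle, the classic social-choice obstruction behind results such as Arrow's theorem. The annotator rankings below are hypothetical and chosen to exhibit the cycle.

```python
# Illustrative example (hedged, not from the paper): three annotators rank
# three candidate model responses A, B, C. Aggregating by pairwise majority
# vote yields a cycle, so no single consistent ranking satisfies every
# majority preference.
from itertools import combinations

# Each ranking lists responses from most to least preferred (hypothetical data).
rankings = [
    ["A", "B", "C"],  # annotator 1
    ["B", "C", "A"],  # annotator 2
    ["C", "A", "B"],  # annotator 3
]

def majority_prefers(x, y, rankings):
    """True if a strict majority of rankings place x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

for x, y in combinations(["A", "B", "C"], 2):
    winner, loser = (x, y) if majority_prefers(x, y, rankings) else (y, x)
    print(f"majority prefers {winner} over {loser}")

# Output: A over B, B over C, and C over A. Any single aggregate ranking
# (e.g., one a reward model is trained to fit) must overrule some majority,
# which is the disagreement-aggregation difficulty the abstract references.
```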