Computer Science > Artificial Intelligence

arXiv:2404.13038 (cs)

[Submitted on 19 Apr 2024]

Title: Mapping Social Choice Theory to RLHF

Abstract: Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.

Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2404.13038 [cs.AI]
	(or arXiv:2404.13038v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2404.13038

Submission history

From: Eve Fleisig
[v1] Fri, 19 Apr 2024 17:49:56 UTC (62 KB)

