Whistleblowers can contain the unethical externalities of human-AI delegation

Abstract

Prior work using controlled principal-agent experiments suggests two risks of delegating tasks to AI systems. Human principals are more likely to request profit-maximizing misconduct from AI agents than from human agents, and AI agents are more likely to comply. Here we test whether third-party observers can contain the resulting harm. In an incentivized die-reporting paradigm, principals instructed either a human or an AI agent how strongly to prioritize profit over accuracy, creating potential financial harm to a charity. We first confirm, with human principals and three large language models as AI agents, that delegation to AI produces larger negative externalities than delegation to humans. We then study observers who could pay a personal cost to flag a principal's instruction, cancelling the principal's gain in favor of the charity, as a laboratory analogue of whistleblowing. In this observer study, the probability of flagging increased with how unethical the principal's request was, but did not depend on whether the request was directed to a human or an AI agent. Because principals made more unethical requests under AI delegation, flagging was more frequent under AI delegation. When combined with agent behavior, this increase in flagging fully neutralized the negative externalities of AI delegation in our experimental setting. These findings support institutional protections for whistleblowers as one potential organizational safeguard against the harms of human-AI delegation.

Publication
PNAS
JF Bonnefon
Research Psychologist
