The Institution for Social and Policy Studies and the Yale School of Management present
Fabrizio Gilardi, Professor of Political Science, University of Zurich:
“Indirect Effects of Content Moderation Errors: A Chatroom Experiment with AI Agents”
Content moderation on social media platforms inevitably entails trade-offs. Tighter rules remove more harmful speech but also silence benign, even socially valuable, expression; looser rules leave more harmful content visible but better protect legitimate speech. While scholars have documented unintended effects on users who are themselves moderated, far less is known about the indirect "third-party" effects that may arise when bystanders witness others being wrongly moderated. We address this question with a preregistered experiment conducted in a bespoke chatroom environment. Each participant enters a live chatroom inhabited by several AI agents and debates a moderately controversial topic linked to climate change. All participants receive clear discussion guidelines and are then assigned to one of three conditions: over-moderation (false positives), under-moderation (false negatives), or correct moderation (neither false positives nor false negatives). We measure behavioral effects inside the chat (participation volume, range of ideas, semantic distortion) and attitudinal effects in a post-chat survey (psychological safety, procedural fairness, willingness to endorse strict moderation policies). Because the design fully controls both the conversational environment and the moderation errors, it isolates the causal effects of witnessing over- and under-moderation in a realistic setting without exposing human subjects to harmful content. The findings inform platform governance debates by quantifying the hidden social costs of erroneous moderation.
Fabrizio Gilardi is a professor of political science at the University of Zurich, where he researches how artificial intelligence and digital technology are transforming politics and democracy. He led the ERC Advanced Grant project "Problem Definition in the Digital Democracy" (PRODIGI, 2021–2025), currently leads the SNF-funded project "Improving the Quality of Online Public Discourse" (2024–2028), and co-leads "Virtual Arena for Research, Education, and Democratic Innovation (VIRENA)". His work has been published in leading academic journals such as the Proceedings of the National Academy of Sciences (PNAS), the American Journal of Political Science, the British Journal of Political Science, Political Science Research and Methods, and Political Communication.