Mitigating Age-Related Bias in Large Language Models: Strategies for Responsible Artificial Intelligence Development

INFORMS Journal on Computing · 2025
Cited by 6 · Top 1% in the same journal and year
Renmin University B · UTD24 · ABS 3

Overview

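The paper proposes a two-stage bias mitigation method that combines the empathetic ability of large language models, reinforcement learning, and a human-in-the-loop mechanism to identify and correct age bias without modifying model parameters. Experiments show that the trained FairLLM model significantly reduces age bias.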

Abstract

The increasing popularity of large language models (LLMs) on digital platforms makes it more urgent to address their inherent biases, particularly age-related biases, which can significantly skew a model’s fairness and performance. This paper introduces a novel two-stage bias mitigation approach that uses LLMs’ empathetic ability, reinforcement learning, and human-in-the-loop mechanisms to identify and correct age-related biases without altering model parameters. The bias mitigation strategy has two modes. Self-bias mitigation in the loop lets an LLM assess and adjust its own outputs autonomously, promoting inherent bias awareness and correction. Alternatively, cooperative bias mitigation in the loop uses collaborative filtering among multiple LLMs, which debate and mitigate biases through consensus. We further introduce an empathetic perspective exchange strategy that refines answers by changing the perspective in the context information given to the LLM, so that the generated responses better suit users of different ages. Our comprehensive evaluation across several data sets shows that our trained model, FairLLM, significantly reduces age bias and outperforms existing techniques on fairness metrics. These findings underscore the effectiveness of the proposed framework in fostering more equitable artificial intelligence systems, potentially benefiting a broader demographic spectrum by reducing digital ageism.
History: This paper has been accepted by Kaushik Dutta for the Special Issue on Responsible AI and Data Science for Social Good.
Funding: This work was supported by the National Natural Science Foundation of China [Grants 71971046, 72172029, 72403033, 72272028, and 72442025].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0645) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2024.0645). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
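The abstract describes its two loop modes only at a high level. The following is a minimal, hypothetical Python sketch of how such loops might be wired together. It is not the authors' FairLLM implementation, whose code is in the IJOC repository listed above. Several assumptions are made. The LLM is reduced to a plain prompt-to-text function, and `stub_llm` is a toy rule-based stand-in for a real model. The prompt wording and the `perspective_exchange` template are illustrative. Majority voting over bias verdicts stands in for the paper's consensus step. The reinforcement-learning and human-in-the-loop components are omitted.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: an "LLM" is any function from a prompt string to a reply string.
LLM = Callable[[str], str]

# Illustrative prompt prefixes (assumptions, not the paper's prompts).
CHECK = "Does this answer contain age bias? Reply BIASED or FAIR.\n"
REWRITE = "Rewrite the answer to remove age bias.\n"


def perspective_exchange(question: str, age: int) -> str:
    """Reframe the context so the model answers from a given age's viewpoint (illustrative template)."""
    return f"Take the perspective of a {age}-year-old user. {question}"


def self_bias_mitigation(llm: LLM, question: str, max_rounds: int = 3) -> str:
    """One model answers, judges its own answer for age bias, and revises until it judges it fair."""
    answer = llm(f"Answer: {question}")
    for _ in range(max_rounds):
        verdict = llm(CHECK + answer).strip().upper()
        if verdict.startswith("FAIR"):
            break
        answer = llm(REWRITE + answer)
    return answer


def cooperative_bias_mitigation(llms: List[LLM], question: str, max_rounds: int = 3) -> str:
    """Several models vote on bias; while a strict majority flags it, adopt the most common revision."""
    answer = llms[0](f"Answer: {question}")
    for _ in range(max_rounds):
        votes = [m(CHECK + answer).strip().upper() for m in llms]
        flagged = sum(v.startswith("BIASED") for v in votes)
        if 2 * flagged <= len(llms):  # no strict majority calls it biased: consensus reached
            break
        proposals = [m(REWRITE + answer) for m in llms]
        # Counter.most_common orders equal counts by first occurrence, so ties go to the earliest proposal.
        answer = Counter(proposals).most_common(1)[0][0]
    return answer


def stub_llm(prompt: str) -> str:
    """Toy rule-based stand-in for a real model, for illustration only."""
    if prompt.startswith(CHECK):
        return "BIASED" if "too old" in prompt else "FAIR"
    if prompt.startswith(REWRITE):
        return prompt[len(REWRITE):].replace("too old", "able")
    return "Older adults are too old to learn new apps."


if __name__ == "__main__":
    q = perspective_exchange("Can older adults learn new apps?", 70)
    print(self_bias_mitigation(stub_llm, q))
    print(cooperative_bias_mitigation([stub_llm] * 3, q))
```

Both loops stop as soon as the current answer is judged fair, and `max_rounds` caps the number of critique-and-revise cycles. The strict-majority threshold and the first-occurrence tie-break in the cooperative loop are arbitrary choices for this sketch.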

Computer Science · Artificial Intelligence · Machine Learning · Fairness · Bias Mitigation