Mining novel customer needs for product design from user-generated content through large language model-enabled data augmentation and ensemble learning

International Journal of Production Research · 2026

Cited by 1 · Top 7% in the same journal and year

ABS 3

Shaoqin Huang
Hu Qin
Y. Wang (corresponding author)

Summary

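Novel product needs in online reviews are rare and buried in noise. To address this, the paper uses LLaMA to generate additional samples for data augmentation, then trains multiple BERT classifiers and combines them by ensemble voting. On Amazon laptop review data, the approach improves recall and F1, helping product development teams avoid missing valuable customer needs.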
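The augmentation step can be sketched as below, assuming a binary labelling (1 = the review expresses a novel need). This is a minimal illustration, not the authors' code: `generate` is a hypothetical hook where an LLM such as LLaMA would be prompted to write a new positive sample from a real one, `toy_generate` only prepends a framing phrase so the sketch runs offline, and the balancing target `target_ratio` is an assumption, since the summary does not say how many synthetic samples were produced.

```python
import math
import random

def augment_positives(texts, labels, generate, target_ratio=0.5, seed=0):
    """Append synthetic positives (label 1) until positives make up at least
    `target_ratio` of the dataset. `generate(seed_text, rng)` is a hypothetical
    hook standing in for an LLM call (e.g. prompting LLaMA)."""
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be in (0, 1)")
    rng = random.Random(seed)
    texts, labels = list(texts), list(labels)  # copy; inputs stay untouched
    positives = [t for t, y in zip(texts, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one real positive to seed generation")
    n_pos, n_neg = len(positives), len(texts) - len(positives)
    # Smallest positive count p with p / (p + n_neg) >= target_ratio.
    needed = math.ceil(target_ratio * n_neg / (1 - target_ratio)) - n_pos
    for _ in range(max(0, needed)):
        seed_text = rng.choice(positives)  # only real positives seed new samples
        texts.append(generate(seed_text, rng))
        labels.append(1)
    return texts, labels

def toy_generate(seed_text, rng):
    # Hypothetical placeholder for an LLM rewrite: add a random framing phrase.
    frames = ["I really wish ", "It would be great if ", "Please consider: "]
    return rng.choice(frames) + seed_text

reviews = ["battery drains fast", "screen is sharp", "wish the lid had a webcam shutter",
           "keyboard feels good", "shipping was slow", "great price",
           "fans are loud", "ok overall", "decent speakers", "nice color"]
flags = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
aug_texts, aug_labels = augment_positives(reviews, flags, toy_generate)
print(sum(aug_labels), "positives out of", len(aug_labels))  # 9 positives out of 18
```

Seeding generation only from real positives keeps synthetic samples anchored to needs that customers actually expressed; a real LLM prompt would aim for more diverse rewrites than this placeholder.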

Abstract

Detecting novel customer needs from user-generated content such as online product reviews is essential for supporting product innovation and improving customer satisfaction. However, the task is challenging because of severe class imbalance, as only a small portion of the content reflects truly novel needs, and because of noisy or irrelevant information. In response, this paper presents a new framework that combines generative artificial intelligence with ensemble learning. We use LLaMA, a powerful open-source large language model, to generate diverse, meaningful positive samples that reduce data sparsity. With this augmented dataset, we train multiple BERT-based classifiers, each on a different subset of the data, and integrate their predictions using a hard voting strategy to improve accuracy and robustness. Experiments on a dataset of Amazon laptop reviews show that the proposed method achieves higher recall and F1 score than standard baselines. The higher recall helps ensure that valuable insights are not missed. We offer a practical approach for applying generative models to support product development, laying the foundation for future studies in artificial intelligence for business innovation.
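The training, voting, and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy keyword model stands in for each fine-tuned BERT classifier so the sketch runs with the standard library only, each member is trained on a random subset drawn without replacement (the abstract says only "different subsets"), and the names `train_keyword_model`, `train_ensemble`, `hard_vote`, and `recall_f1`, along with the `n_models` and `subset_frac` defaults, are hypothetical.

```python
import random
from collections import Counter

def train_keyword_model(texts, labels):
    """Toy stand-in for fine-tuning one BERT classifier (assumption): flags a
    text as a novel need if it contains a word seen more often in positives."""
    pos = Counter(w for t, y in zip(texts, labels) if y == 1 for w in t.lower().split())
    neg = Counter(w for t, y in zip(texts, labels) if y == 0 for w in t.lower().split())
    cues = {w for w, c in pos.items() if c > neg[w]}
    return lambda text: int(any(w in cues for w in text.lower().split()))

def train_ensemble(texts, labels, n_models=3, subset_frac=0.7, seed=0):
    """Train each member on its own random subset, sampled without replacement."""
    rng = random.Random(seed)
    data = list(zip(texts, labels))
    k = max(1, int(subset_frac * len(data)))
    models = []
    for _ in range(n_models):
        xs, ys = zip(*rng.sample(data, k))
        models.append(train_keyword_model(xs, ys))
    return models

def hard_vote(models, text):
    # Majority of 0/1 votes; with an even member count a tie resolves to 0.
    votes = sum(m(text) for m in models)
    return int(votes > len(models) / 2)

def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive (novel-need) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1

train_x = ["wish it had a webcam shutter", "please add usb c charging",
           "battery drains fast", "screen is sharp", "keyboard feels good",
           "would love a backlit keyboard option", "great price", "ok overall"]
train_y = [1, 1, 0, 0, 0, 1, 0, 0]
models = train_ensemble(train_x, train_y)
test_x = ["wish for a privacy shutter", "screen is dim"]
preds = [hard_vote(models, t) for t in test_x]
print(preds, recall_f1([1, 0], preds))
```

Replacing `train_keyword_model` with a function that fine-tunes and wraps a BERT classifier would leave `train_ensemble` and `hard_vote` unchanged, since hard voting only needs each member to return 0 or 1.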

Keywords: product design · customer needs mining · ensemble learning · large language models · product innovation
