The Value of Blending—Managing Ameliorating Inventory Using Deep Reinforcement Learning

Production and Operations Management · 2025

Citations: 0

Journal rankings: Renmin University of China (人大) A · FT50 · UTD24 · ABS 4

Alexander Pahr · Technical University of Munich (corresponding author)
Martin Grunow · Technical University of Munich

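Summary

This study examines inventory management for foods that improve with age, such as whiskey and cheese. It uses a deep reinforcement learning algorithm to optimize purchasing and blending decisions and finds that blending substantially raises profits. In particular, labeling by average age yields 8.7% higher profits than labeling by minimum age.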

Abstract

Stocks of some food products, such as whiskey, cheese, or port wine, ameliorate during storage, facilitating product differentiation according to age. This induces a trade-off between immediate revenues and further maturation. Inventory management decisions include purchasing volumes of agricultural produce and production volumes for age-differentiated products. Because products can be blended from stocks of different ages, issuance decisions offer operational flexibility. However, whereas some industries (port wine, sherry) only request that the product labels refer to the average age of issued stocks, others (whiskey, rum) have stricter blending regulations, requiring that the product labels represent the minimum age of all components. Further, producers must deal with multiple uncertainties. Purchase prices of agricultural commodities depend on volatile climate-dependent harvest seasons, stocks decay during maturation, and sales market conditions fluctuate. We solve this inventory management problem using a deep reinforcement learning algorithm with three key innovations: (i) A novel actor pipeline that decomposes the action space and flexibly partitions decision dimensions between a neural network and a lookahead optimization model, (ii) an algorithm explicitly maximizing average rewards, and (iii) reward-handling techniques that exploit structural problem insights. Our approach yields near-optimal policies that consistently outperform benchmark heuristics. Beyond the algorithmic contributions, our results offer new managerial insights into the value of blending under uncertainty. Minimum-age blending substantially enhances the profits of firms as compared to no blending because companies can adjust their purchasing policy in response to price fluctuations. The more flexible average-age regime further improves profits by 8.7% on average, suggesting that whiskey and rum regulators may wish to reconsider their strict blending rules. We mine black-box policies from deep reinforcement learning using supervised machine learning and Shapley values to analyze near-optimal decision drivers. Exploiting the value of blending requires producers to install sufficient processing capacity, especially when dealing with large variations in harvest seasons. Additionally, blending entails increased planning complexity because the inventory management decisions are driven by a large number of factors.
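
To make the two labeling regimes in the abstract concrete, the following minimal Python sketch computes the label age of a single blend under each rule. It is an illustration only, not the authors' model: the function names, the example volumes, and the choice of a volume-weighted mean for the "average age" are assumptions that do not come from the paper.

```python
from typing import Sequence


def minimum_age_label(ages: Sequence[float], volumes: Sequence[float]) -> float:
    """Label age under a minimum-age rule: the youngest stock actually used."""
    used = [age for age, vol in zip(ages, volumes) if vol > 0]
    if not used:
        raise ValueError("blend contains no stock")
    return min(used)


def average_age_label(ages: Sequence[float], volumes: Sequence[float]) -> float:
    """Label age under an average-age rule.

    Assumption: the average is weighted by the volume of each component.
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("blend contains no stock")
    return sum(age * vol for age, vol in zip(ages, volumes)) / total


# Hypothetical blend: 60 litres of 8-year-old and 40 litres of 13-year-old stock.
ages = [8, 13]
volumes = [60, 40]

print(minimum_age_label(ages, volumes))  # 8
print(average_age_label(ages, volumes))  # 10.0
```

Under these assumptions the same blend could carry a 10-year label in an average-age regime but only an 8-year label in a minimum-age regime, which illustrates why the average-age rule leaves the producer more flexibility.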

Keywords: inventory management · deep reinforcement learning · food supply chain · revenue management