A New Learning Automaton for Selecting an Arbitrary Subset of Actions

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2023

Cited by 1

ABS 3

Junqi Zhang
Peng Zu
PengZhan Qiu
MengChu Zhou

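Summary

The paper proposes a discretized equal pursuit reward-inaction algorithm (DEP_RI-AS) that can select an arbitrary target subset of actions (for example, the best and worst actions), proves its ε-optimality, and validates its effectiveness through simulations and a real-world application.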

Abstract

As a powerful reinforcement learning method, the learning automaton (LA) has been studied, analyzed, and applied to various engineering systems for decades. However, the state-of-the-art LA-based methods can select only the optimal action or optimal subset and cannot select an arbitrary target subset like selecting the best and worst actions or the ones in a given rank range. In order to solve the problem of selecting a given arbitrary subset of actions, this work proposes a novel pursuit learning scheme, called a discretized equal pursuit reward-inaction algorithm for arbitrary subset selection ($\text{DEP}_{RI}$-AS). The proposed scheme pursues the currently estimated arbitrary action subset and makes the probabilities of selecting each action in the subset equal, so as to increase the convergence speed. The proof of its $\epsilon$-optimality property is presented. Simulation results of comparison experiments, parameter analysis, and a real-world application demonstrate its power in selecting a given subset of user-desired actions.
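The abstract describes the scheme only at a high level, so the following is a minimal sketch of the idea rather than the paper's actual update rule. It assumes Bernoulli rewards, maximum-likelihood reward estimates, a step size of 1/(r × resolution), and an update that, on each reward, lowers the probability of every action outside the currently estimated target subset by one step and shares the remaining mass equally among the actions inside it. The function name `dep_ri_as_sketch` and the parameters `target_ranks`, `resolution`, and `init_pulls` are hypothetical names introduced for illustration.

```python
import random


def dep_ri_as_sketch(reward_probs, target_ranks, resolution=200,
                     steps=20000, init_pulls=30, seed=0):
    """Illustrative discretized pursuit, reward-inaction loop (not the paper's exact rule).

    reward_probs: hidden Bernoulli reward probability of each action (the environment).
    target_ranks: ranks of the wanted actions, 0 = highest estimated reward.
    Returns the final action-selection probability vector.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    k = len(target_ranks)
    delta = 1.0 / (r * resolution)   # smallest probability step (assumed form)
    p = [1.0 / r] * r                # start from a uniform selection vector
    wins = [0] * r
    pulls = [0] * r

    # Initialise the reward estimates by trying every action a few times.
    for i in range(r):
        for _ in range(init_pulls):
            pulls[i] += 1
            wins[i] += rng.random() < reward_probs[i]

    for _ in range(steps):
        a = rng.choices(range(r), weights=p)[0]
        rewarded = rng.random() < reward_probs[a]
        pulls[a] += 1
        wins[a] += rewarded
        if not rewarded:
            continue                 # inaction: a penalty leaves p unchanged

        # Rank actions by estimated reward and read off the target subset.
        order = sorted(range(r), key=lambda i: wins[i] / pulls[i], reverse=True)
        subset = {order[rank] for rank in target_ranks}

        # Pursue the estimated subset: shrink everything outside it by one step ...
        for i in range(r):
            if i not in subset:
                p[i] = max(p[i] - delta, 0.0)
        # ... and give the actions inside it equal shares of the remaining mass.
        outside = sum(p[i] for i in range(r) if i not in subset)
        share = max(1.0 - outside, 0.0) / k
        for i in subset:
            p[i] = share
    return p


if __name__ == "__main__":
    # Five actions; ask for the best (rank 0) and the worst (rank 4).
    print(dep_ri_as_sketch([0.9, 0.7, 0.5, 0.3, 0.1], target_ranks=(0, 4)))
```

In this toy setting the selection vector is expected to concentrate on the first and last actions with roughly equal probability. A larger `resolution` means smaller steps, which slows convergence but leaves more time for the estimates to settle before any action's probability reaches zero.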

Reinforcement learning · Learning automaton · Action selection · Algorithm design
