Learning While Experimenting
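Studies information acquisition strategies in risky experimentation: when pessimistic, the agent seeks positive information to confirm the value of experimentation; when optimistic, it seeks negative information to avoid wasteful experimentation. Also analyses how the reward from experimentation affects the optimal strategy.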
Abstract. An agent performing risky experimentation can benefit from suspending it to learn directly about the state. ‘Positive’ information acquisition seeks news that would confirm the state that favours experimentation. The agent uses it as a last-ditch effort, when pessimistic about the risky arm, before abandoning that arm. ‘Negative’ information acquisition seeks news that would demonstrate that experimentation is futile. The agent uses it as an insurance strategy, to avoid wasteful experimentation, while still optimistic. A higher reward from risky experimentation expands the region of beliefs in which the agent optimally chooses information acquisition rather than experimentation.