The invention discloses an offline reinforcement learning method and system for spatial fine operation. The method comprises the following steps: step 1, collecting offline multi-task interaction data and segmenting it; step 2, performing offline multi-task actor-critic optimization on the segmented data to obtain a global policy network; and step 3, using the global policy network as a controller and transferring it to a real physical environment. Interaction data for spatial fine operation is thus collected offline once and reused many times across multiple tasks, which improves the efficiency of both sample collection and sample utilization.
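Step 1 of the abstract (collect once, segment, reuse across tasks) can be illustrated with a minimal Python sketch. The abstract does not say how the data is segmented, so the sketch assumes that each logged transition carries a task label and that segmentation means grouping by that label. The `Transition` fields, the `segment_by_task` helper, and the task names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Transition(NamedTuple):
    """One logged interaction step (all field names are illustrative)."""
    task_id: str       # assumed per-transition task label
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    done: bool


def segment_by_task(transitions):
    """Group a flat offline log into per-task lists, preserving log order."""
    segments = defaultdict(list)
    for t in transitions:
        segments[t.task_id].append(t)
    return dict(segments)


# A tiny made-up log that mixes two hypothetical tasks.
log = [
    Transition("grasp", (0.0,), (0.1,), 0.0, (0.1,), False),
    Transition("insert", (1.0,), (-0.2,), 0.0, (0.8,), False),
    Transition("grasp", (0.1,), (0.1,), 1.0, (0.2,), True),
]

segments = segment_by_task(log)
print({task: len(items) for task, items in segments.items()})
# prints {'grasp': 2, 'insert': 1}
```

Under this assumption, each per-task segment could be sampled repeatedly during the offline actor-critic optimization of step 2 without any further interaction with the environment.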
Off-line reinforcement learning method and system for spatial fine operation
一种空间精细操作的离线强化学习方法及系统
2022-07-29
Patent
Electronic Resource
Chinese
Visual touch fusion fine operation method based on reinforcement learning
European Patent Office | 2020
Train operation scheduling optimization method based on deep reinforcement learning
European Patent Office | 2023
Controlling underestimation bias in reinforcement learning via minmax operation
Elsevier | 2024