The invention discloses an off-line reinforcement learning method and system for spatial fine operation. The method comprises the following steps: step 1, collecting off-line multi-task interaction data and segmenting it; step 2, performing off-line multi-task actor-critic optimization on the segmented data to obtain a global policy network; and step 3, using the global policy network as the controller and transferring it to the real physical environment. With this method, interaction data for spatial fine operation is collected off-line once and reused many times across multiple tasks, which improves the efficiency of both sample collection and sample utilization.
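The abstract does not say how the off-line data is segmented in step 1. As a minimal sketch only, the block below assumes one plausible reading: a flat interaction log is split into per-task episodes using a task label and an episode-termination flag. The `Transition` fields and the `segment_by_task` helper are hypothetical names introduced for illustration and are not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Transition(NamedTuple):
    """One logged interaction step (field names are illustrative assumptions)."""
    task_id: str   # hypothetical label naming the task this step belongs to
    state: tuple
    action: tuple
    reward: float
    done: bool     # True on the last step of an episode


def segment_by_task(log):
    """Split a flat, possibly interleaved log into per-task lists of episodes."""
    episodes = defaultdict(list)  # task_id -> list of completed episodes
    current = {}                  # task_id -> episode currently being built
    for step in log:
        current.setdefault(step.task_id, []).append(step)
        if step.done:
            episodes[step.task_id].append(current.pop(step.task_id))
    # Keep trailing episodes that were cut off before a terminal step.
    for task_id, partial in current.items():
        episodes[task_id].append(partial)
    return dict(episodes)


# Example: two hypothetical tasks interleaved in one log.
log = [
    Transition("grasp", (0.0,), (1.0,), 0.0, False),
    Transition("insert", (0.0,), (0.5,), 0.0, True),
    Transition("grasp", (1.0,), (0.0,), 1.0, True),
    Transition("grasp", (0.0,), (1.0,), 0.0, False),
]
segments = segment_by_task(log)
```

Under this assumption, each task's episode list could then serve as a separate replay buffer for the multi-task actor-critic optimization in step 2, so that one collected log is reused for every task.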



    Title: Off-line reinforcement learning method and system for spatial fine operation

    Additional title: 一种空间精细操作的离线强化学习方法及系统

    Contributors: XIE YONGCHUN (author) / LI LINFENG (author) / WANG YONG (author) / CHEN AO (author)

    Publication date: 2022-07-29

    Type of media: Patent

    Type of material: Electronic Resource

    Language: Chinese

    Classification: IPC: G06N Computer systems based on specific computational models / B64G Cosmonautics / G06K Recognition of data



    Visual touch fusion fine operation method based on reinforcement learning

    SUN JUN / WU HAILEI / SUN YUE et al. | European Patent Office | 2020

    Reinforcement and model learning for vehicle operation

    WRAY KYLE HOLLINS / WITWICKI STEFAN / ZILBERSTEIN SHLOMO | European Patent Office | 2021

    Reinforcement and Model Learning for Vehicle Operation

    WRAY KYLE HOLLINS / WITWICKI STEFAN / ZILBERSTEIN SHLOMO | European Patent Office | 2020

    Train operation scheduling optimization method based on deep reinforcement learning

    LI LIJUAN / YANG XUE / WANG HUAN et al. | European Patent Office | 2023

    Controlling underestimation bias in reinforcement learning via minmax operation

    HUANG, Fanghui / HE, Yixin / ZHANG, Yu et al. | Elsevier | 2024
