Task decomposition is effective in many applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictability and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA lightweight arm and on a pick-and-delivery task with a Pioneer robot.
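The abstract does not spell out how UCT chooses actions during search. As a rough illustration only, the sketch below shows the standard UCB1 selection rule that UCT is generally built on: each action is scored by its estimated value plus an exploration bonus that shrinks as the action is tried more often. The function name `ucb1_select`, its parameters, and the exploration constant are hypothetical and not taken from the paper; using learned action values as `q_values` is an assumption about how such a rule might be combined with HOP.

```python
import math


def ucb1_select(q_values, visit_counts, exploration=1.4):
    """Return the index of the action maximizing Q + c * sqrt(ln N / n).

    q_values: estimated action values (assumed here to come from a learned
        value function; this pairing is an illustration, not the paper's method).
    visit_counts: number of times each action has been tried at this node.
    exploration: the constant c; 1.4 is an arbitrary illustrative choice.
    """
    total_visits = sum(visit_counts)
    best_action, best_score = None, -math.inf
    for action, (q, n) in enumerate(zip(q_values, visit_counts)):
        if n == 0:
            # Try every action once before comparing scores.
            return action
        score = q + exploration * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With `exploration=0` the rule reduces to greedy selection on the value estimates; larger constants favor rarely visited actions.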





    Title: Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation

    Publication date: 2019-01-01

    Type of media: Conference paper

    Format: Electronic resource

    Language: English

    Classification: DDC 629


