This thesis investigates artificial agents that learn to make strategic decisions in imperfect-information games. In particular, we introduce a novel approach to reinforcement learning from self-play. Our first contribution, Smooth UCT, combines the game-theoretic notion of fictitious play with Monte Carlo Tree Search (MCTS). Smooth UCT outperformed a classic MCTS method in several imperfect-information poker games and won three silver medals in the 2014 Annual Computer Poker Competition. We then develop Extensive-Form Fictitious Play (XFP), which is implemented entirely in sequential strategies, thereby extending this prominent game-theoretic model of learning to sequential games. XFP provides a principled foundation for self-play reinforcement learning in imperfect-information games. Building on it, we introduce Fictitious Self-Play (FSP), a class of sample-based reinforcement learning algorithms that approximate XFP, and instantiate FSP with neural-network function approximation and deep learning techniques, producing Neural FSP (NFSP). We demonstrate that NFSP can learn (approximate) Nash equilibria and their representations (abstractions) end to end, i.e. by interfacing directly with the raw inputs and outputs of the domain. NFSP approached the performance of state-of-the-art, superhuman algorithms in Limit Texas Hold’em, an imperfect-information game at the absolute limit of what is tractable even with massive computational resources. This is the first time that any reinforcement learning algorithm, learning solely from game outcomes without prior domain knowledge, has achieved such a feat.
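The abstract builds on classic fictitious play, in which each player repeatedly best-responds to the opponent's empirical average strategy; in two-player zero-sum games these average strategies converge to a Nash equilibrium. As a minimal sketch only, and not the thesis's XFP or FSP (which operate on sequential, extensive-form games), the code below runs simultaneous fictitious play on rock-paper-scissors. The choice of game, the iteration count, and all function names (`fictitious_play`, `exploitability`, and the helpers) are illustrative assumptions. Exploitability, the total amount both players could gain by switching to a best response, is zero exactly at a Nash equilibrium.

```python
# Row player's payoffs for rock-paper-scissors (an illustrative game chosen here);
# the game is zero-sum, so the column player receives the negation.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_row_payoffs(payoff, col_strategy):
    # Row player's expected payoff for each pure row action against a mixed column strategy.
    return [sum(p * q for p, q in zip(row, col_strategy)) for row in payoff]


def expected_col_payoffs(payoff, row_strategy):
    # Row player's expected payoff for each pure column action against a mixed row strategy.
    n_cols = len(payoff[0])
    return [sum(row_strategy[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(n_cols)]


def normalize(counts):
    # Turn action counts into an empirical (average) mixed strategy.
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(payoff, iterations):
    # Hypothetical helper: simultaneous fictitious play in a two-player zero-sum matrix game.
    row_counts = [0.0] * len(payoff)
    col_counts = [0.0] * len(payoff[0])
    # Arbitrary initial pure strategies: both players start with action 0.
    row_counts[0] = 1.0
    col_counts[0] = 1.0
    for _ in range(iterations):
        row_avg = normalize(row_counts)
        col_avg = normalize(col_counts)
        # Each player best-responds to the opponent's empirical average strategy.
        row_values = expected_row_payoffs(payoff, col_avg)
        col_values = expected_col_payoffs(payoff, row_avg)
        row_counts[row_values.index(max(row_values))] += 1
        # The column player minimizes the row player's payoff.
        col_counts[col_values.index(min(col_values))] += 1
    return normalize(row_counts), normalize(col_counts)


def exploitability(payoff, row_avg, col_avg):
    # Gain available to both players from best-responding; zero exactly at a Nash equilibrium.
    return (max(expected_row_payoffs(payoff, col_avg))
            - min(expected_col_payoffs(payoff, row_avg)))


row_avg, col_avg = fictitious_play(PAYOFF, iterations=10_000)
print([round(p, 3) for p in row_avg], exploitability(PAYOFF, row_avg, col_avg))
```

The individual best responses keep cycling (rock, paper, scissors, ...), but the averaged strategies approach the uniform equilibrium, which is why fictitious play tracks average rather than current strategies.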



    Title:

    Reinforcement Learning from Self-Play in Imperfect-Information Games


    Contributors:
    Heinrich, J (Author)

    Publication date:

    2017-04-28


    Notes:

    Doctoral thesis, UCL (University College London).


    Media type:

    Thesis


    Format:

    Electronic resource


    Language:

    English


    Classification:

    DDC:    629




    Pursuit Evasion Games with Imperfect Information Revisited

    Or, Barak / Ben-Asher, Joseph / Yaesh, Isaac | British Library Conference Proceedings | 2018


    Multi-Agent Games of Imperfect Information: Algorithms for Strategy Synthesis

    Åkerblom Jonsson, Viktor / Berisha, David | BASE | 2021

    Open access


    Learning to design games: Strategic environments in reinforcement learning

    Zhang, H / Wang, J / Zhou, Z et al. | BASE | 2018

    Open access