Reinforcement Learning from Self-Play in Imperfect-Information Games

Open access

Heinrich, J

This thesis investigates artificial agents that learn to make strategic decisions in imperfect-information games. In particular, we introduce a novel approach to reinforcement learning from self-play. We first present Smooth UCT, which combines the game-theoretic notion of fictitious play with Monte Carlo Tree Search (MCTS). Smooth UCT outperformed a classic MCTS method in several imperfect-information poker games and won three silver medals in the 2014 Annual Computer Poker Competition. We then develop Extensive-Form Fictitious Play (XFP), which is implemented entirely in sequential strategies and thus extends this prominent game-theoretic model of learning to sequential games. XFP provides a principled foundation for self-play reinforcement learning in imperfect-information games. We introduce Fictitious Self-Play (FSP), a class of sample-based reinforcement learning algorithms that approximate XFP. We instantiate FSP with neural network function approximation and deep learning techniques, producing Neural FSP (NFSP). We demonstrate that (approximate) Nash equilibria and their representations (abstractions) can be learned end to end using NFSP, i.e. by interfacing directly with the raw inputs and outputs of the domain. NFSP approached the performance of state-of-the-art, superhuman algorithms in Limit Texas Hold’em, an imperfect-information game at the limit of what is tractable even with massive computational resources. This is the first time that any reinforcement learning algorithm, learning solely from game outcomes without prior domain knowledge, has achieved such a feat.
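The abstract builds on fictitious play, in which each player repeatedly best-responds to the opponent's empirical average strategy. As a rough illustration only, the sketch below runs classical fictitious play on a small zero-sum matrix game (rock-paper-scissors). It is not XFP, FSP, or NFSP from the thesis. The function names, the payoff matrix, and the iteration count are assumptions chosen for this example.

```python
def best_response(payoff_rows, opponent_counts):
    """Return the action with the highest payoff against the opponent's action counts."""
    values = [sum(p * c for p, c in zip(row, opponent_counts)) for row in payoff_rows]
    # Ties are broken in favour of the lowest action index.
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoff, iterations):
    """Classical fictitious play in a two-player zero-sum matrix game (illustrative).

    payoff[i][j] is the row player's payoff; the column player receives its negation.
    Returns the empirical average strategies of the row and column players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Column player's payoffs, indexed by [own action][row player's action].
    col_payoff = [[-payoff[i][j] for i in range(n_rows)] for j in range(n_cols)]
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations - 1):
        # Both players best-respond simultaneously to the other's history so far.
        a = best_response(payoff, col_counts)
        b = best_response(col_payoff, row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts], [c / total for c in col_counts]


# Rock-paper-scissors payoffs for the row player (rows and columns: rock, paper, scissors).
ROCK_PAPER_SCISSORS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

row_avg, col_avg = fictitious_play(ROCK_PAPER_SCISSORS, 20000)
print(row_avg, col_avg)
```

In two-player zero-sum games the empirical averages of fictitious play are known to converge to a Nash equilibrium, so both printed strategies should lie close to the uniform strategy. The actions played at any single step remain deterministic best responses; only the averages converge.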

Document information

Title: Reinforcement Learning from Self-Play in Imperfect-Information Games

Contributors: Heinrich, J (Author)

Publication date: 2017-04-28

Notes: Doctoral thesis, UCL (University College London).

Media type: Thesis

Format: Electronic resource

Language: English

Classification: DDC: 629
