Poor data efficiency and instability are among the main challenges of end-to-end, model-free reinforcement learning (RL) methods. Recent research addresses these problems by resorting to supervised learning methods that exploit human expert demonstrations, e.g. imitation learning. In this paper we present a novel framework that builds a self-improving process on top of a policy improvement operator. The operator is treated as a black box, so it admits multiple implementations for different applications. An agent is trained to iteratively imitate the behaviors generated by the operator; hence the agent can learn by itself, without domain knowledge from humans. We employ generative adversarial networks (GANs) to implement the imitation module in the new framework. We evaluate the framework's performance over multiple application domains and provide comparison results in support. © 2019 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
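The loop described in the abstract (apply a black-box improvement operator, then imitate its output, and repeat) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a toy three-armed bandit with known action values, a hypothetical operator `improve` that reweights the current policy toward higher-valued actions, and a plain cross-entropy imitation step `imitate` standing in for the GAN-based imitation module. All function names, the temperature, and the learning rate are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def improve(policy, action_values, temperature=0.5):
    """Hypothetical black-box improvement operator (an assumption, not the
    paper's operator): reweight the policy toward higher-valued actions."""
    weights = [p * math.exp(q / temperature)
               for p, q in zip(policy, action_values)]
    total = sum(weights)
    return [w / total for w in weights]


def imitate(logits, target, lr=1.0, steps=50):
    """Supervised stand-in for the imitation module (the paper uses a GAN):
    gradient descent on the cross-entropy between target and softmax(logits)."""
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the cross-entropy w.r.t. the logits is (probs - target).
        logits = [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]
    return logits


def self_improve(action_values, iterations=10):
    """Alternate between the improvement operator and imitation."""
    logits = [0.0] * len(action_values)  # start from a uniform policy
    for _ in range(iterations):
        policy = softmax(logits)
        target = improve(policy, action_values)  # behavior to imitate
        logits = imitate(logits, target)         # agent copies that behavior
    return softmax(logits)


# Toy problem (assumed for illustration): the third action is the best one.
ACTION_VALUES = [0.1, 0.5, 0.9]
final_policy = self_improve(ACTION_VALUES)
```

Because `self_improve` only calls `improve` through its inputs and outputs, a different operator (for example, one based on search or on policy-gradient updates) could be swapped in without changing the imitation step, which is the sense in which the operator acts as a black box here.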
|Title of host publication||Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS|
|Publication status||Published - 15 May 2019|
|Event||AAMAS 2019 - Concordia University, Montreal, Canada (Duration: 13 May 2019 → 17 May 2019)|
|Name||ACM International Conference on Autonomous Agents and Multiagent Systems. Proceedings|
|Publisher||Association for Computing Machinery|
|Period||13/05/19 → 17/05/19|