Learning Behaviours in Multi-Agent Systems

  • Ross Conroy

Student thesis: Doctoral Thesis

Abstract

Interactive Dynamic Influence Diagrams (I-DIDs) are a well-recognised
graphical decision model that explicitly models the behaviour of a subject
agent whilst also explicitly modelling the behaviour of other agents. This
allows the expected behaviour of other agents to influence the subject
agent, allowing the agent to maximise its utility not only from its belief
about the state-space but also its belief about the behaviour of other
agents. I-DIDs require models to be manually constructed ahead of time,
which can present a barrier to applying I-DIDs in some applications,
such as interaction with a human agent, because it is
difficult, if not impossible, to construct models for human agents. Each
human agent that may be encountered may have their own unique thought
process, which may change over time; human agents also may not always
execute an optimal policy, taking shortcuts in
their thought process. I-DIDs also become difficult to construct for large
and complex domains where constructing I-DIDs and associated models
of other agents may become too time consuming to be considered reasonable.
I-DIDs also experience the problem of exponential growth within
their candidate models of other agents. The more candidate models there
are, the more computational resources are required to solve the I-DID. This can
cause I-DIDs to become impossible to solve for complex domains or larger
time horizons. An example of such a domain which exhibits these properties
is that of real-time strategy (RTS) games such as StarCraft. StarCraft
has been growing in popularity in the field of artificial intelligence (AI)
research thanks to the abundance of historical data freely available from
on-line sources, as well as the ability to be software controlled allowing
for the development of intelligent agents to control the game. StarCraft
is a complex game requiring players to account for many aspects such as
combat, resource gathering and scouting whilst also constrained by the
partially observable nature of the game.
Initially this work provides a software framework towards learning the
behaviour of agents within an I-DID. This framework allows for different
behaviour learning techniques to be implemented such as those from
data sources and candidate models. The framework also allows the implementation
of multiple model reduction techniques such as the already
established Behavioural Equivalence (BE) and Action Equivalence (AE)
techniques along with the new Value Equivalent (VE) and its approximation.
The goal of this framework is to simplify the process of implementing
different learning and reduction techniques towards the I-DID framework.
This framework is used throughout this thesis to implement proposed
learning and model reduction methods.
This thesis aims to tackle the problem of constructing models of human
players by learning the behaviour of other agents by way of automatic
learning techniques, then applying this learning to the behaviour of human
players from StarCraft replay files. The learning also allows for learning incomplete
behaviour of opposing agents using a Behavioural Compatibility
Test (BCT) to complete partial behaviour from existing learnt behaviour
where compatible. This has been evaluated for performance and solution
quality in two problem domains, one simulated (the Tiger Problem) and one
real-world (StarCraft).
The problem of exponential growth within the model space of I-DIDs is
also tackled, building on the ideas of BE and AE with a new concept of VE.
VE aims to reduce the model space further than BE and AE by not only
reducing models based on their matching behaviour, but also based on
expected utility for the subject agent given the expected behaviour. The
expected utility for each behaviour is learnt from past interaction data
and behaviours grouped where expected utility is the same or similar.
To apply VE to problems where utility data may not be available, this
work also proposes a new framework for determining VE approximately
based on behavioural coverage. Provided with this framework are a set
of techniques towards determining a subset of candidate models given a
larger set. The goal of this behaviour reduction is to reduce the model
space within I-DIDs whilst still maintaining sufficient behavioural coverage
to maintain reasonable solutions with reduced computational costs.
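
The grouping idea behind VE can be illustrated with a minimal sketch. It
is not the method from the thesis: the function name, the tolerance
parameter, the greedy one-pass grouping and the utility values below are
all assumptions chosen only to show how candidate models with the same or
similar expected utility might be merged, leaving one representative per
group.

```python
from typing import Dict, List, Optional


def group_by_expected_utility(
    utilities: Dict[str, float], tolerance: float = 0.5
) -> List[List[str]]:
    """Greedy one-pass grouping (hypothetical helper, not from the thesis).

    Models are sorted by expected utility; a model joins the current group
    if its utility is within `tolerance` of the group's first member,
    otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    anchor: Optional[float] = None  # utility of the current group's first member
    for name, value in sorted(utilities.items(), key=lambda kv: kv[1]):
        if anchor is not None and value - anchor <= tolerance:
            groups[-1].append(name)
        else:
            groups.append([name])
            anchor = value
    return groups


# Made-up expected utilities for five candidate models of another agent.
candidate_utilities = {"m1": 10.0, "m2": 10.2, "m3": 4.0, "m4": 4.4, "m5": -3.0}

groups = group_by_expected_utility(candidate_utilities, tolerance=0.5)
# Keep a single representative per group to shrink the model space.
representatives = [group[0] for group in groups]
print(groups)
print(representatives)
```

With these illustrative values, five candidate models collapse to three
groups, so only three representatives would need to be carried forward.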
Date of Award: 15 Dec 2017
Original language: English
Awarding Institution
  • Teesside University
Supervisor: Yifeng Zeng
