Incentives for Safety Agreement Compliance in AI Race

Project: Research

Description

An AI race for technological advantage towards powerful AI systems could lead to serious negative consequences, especially when ethical and safety procedures are underestimated or even ignored. For all to enjoy the benefits of safe, ethical and trustworthy AI, it is crucial to enact appropriate incentive strategies that ensure mutually beneficial, normative behaviour and safety compliance from all parties involved. Using methods from Evolutionary Game Theory, this project will develop computational models (both analytic and simulated) that capture key factors of an AI race, revealing which strategic behaviours are likely to emerge under different conditions and hypothetical scenarios of the race. Moreover, applying methods from incentive and agreement modelling, we will systematically analyse how different types of incentives (namely, positive vs. negative, peer vs. institutional, and their combinations) influence safety-compliance behaviours over time, and how such behaviours should be configured to ensure desired global outcomes without undue restrictions that would slow down development. The project will thus provide foundations for determining which incentives will stimulate such outcomes, and how they need to be employed and deployed, within incentive boundaries suited to the types of players, in order to achieve a high level of compliance with a cooperative safety agreement and avoid AI disasters.

Status: Active
Effective start/end date: 30/11/18 → 31/10/20

Funding

  • Future of Life Institute