# Course:CPSC522/Multi-Agent Systems

## Multi-Agent Systems

Multi-Agent Systems is an area of artificial intelligence concerned with the study of multiple autonomous agents acting in an environment.

Principal Author: Gudbrand Tandberg

## Abstract

The main components in the study of multi-agent systems are intelligent agents and environments. Due to the generality of the setting, multi-agent systems are used and studied in many disciplines, including economics, game theory, robotics, decision theory, and cognitive science. From the viewpoint of each agent, the primary objective is to choose actions that maximize the agent's future utility. This process is commonly formalized in the language of game theory, but other approaches such as reinforcement learning and distributed optimization can be applied. The settings in which multi-agent systems are commonly studied vary widely in complexity and abstraction, ranging from two agents playing a game of rock-paper-scissors to a fleet of autonomous vehicles navigating real-world road systems.

## Content

### Agents and Environments

*The agent system.*

In artificial intelligence, an agent is something that acts in an environment. An agent can be physical, such as a dog, a person or a lamp, or it can be virtual, such as a computer program or an organization. An agent has the ability to partially perceive its environment through sensors, and to act in the environment through actuators. An agent is typically divided into a body and a controller: the controller receives percepts from the body and sends commands to the body, while the body receives stimuli from the environment and performs actions in the environment. The controller can be classified as either reactive or cognitive: reactive agents simply have reflexes to stimuli, whereas cognitive agents can reason about and form plans for their behaviors.

Agents are situated in time: they receive sensory data in time, and act in time [1]. Given a design goal, an agent should be able to qualitatively or quantitatively identify how its sensory stimuli relate to its goal. This is usually formalized in terms of the outcome of an action, and is realized to the agent as a reward (or feedback) from the environment.

This feedback can be evaluated in terms of the agent's private utility. Agents are typically designed for a specific task in a specific domain, and different agents vary widely in their abilities and complexity. Poole and Mackworth[1] recognize ten dimensions along which the complexity of designing artificial agents varies. These are:

• Modularity: the extent to which a system can be decomposed into interacting modules that can be understood separately. The values are flat, modular and hierarchical.
• Planning Horizon: how far ahead in time does the agent plan? Values for this dimension are non-planning, finite horizon, indefinite horizon and infinite horizon.
• Representation: how does the agent describe the world? Agents can reason in terms of states, features, or individuals and relations.
• Computational Limits: what kind of rationality does the agent have: perfect rationality or bounded rationality?
• Learning: does the agent learn from data? Knowledge for the agent can either be given at design time, or it can be learned from experience.
• Sensing Uncertainty: how much can the agent determine the state of the world from its observations? The world can be fully observable or partially observable.
• Effect Uncertainty: how does the world change as it is affected by the agent's actions? Effects can be deterministic or stochastic.
• Preference: does the agent have simple goals, or more complex preferences?
• Number of Agents: are there other agents acting in the same environment?
• Interaction: does the agent have to reason and act simultaneously, or can the agent reason before acting? Interaction with the environment can be either online or offline.

Although these dimensions do not encompass every aspect of intelligent agent design, they serve as a useful overview of different kinds of agents. In particular, one might imagine that each of the above dimensions could take on values that are not listed here. Collectively, the agent-environment pair makes up an agent system.

Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of intelligent agents are studied in cognitive science, ethics and the philosophy of practical reason. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of a user)[2]. When studying environments involving human agents, the dimension of rationality becomes particularly important, with models having to take into account effects due to bounded rationality.

#### Example 1: Robot

*1983 KUKA industrial robots. Robots making machines.*

A first example of an intelligent agent is a robot. A robot is a physical agent with actuators that allow it to act in physical space. Robots typically consist of a body with moving parts, motors, cameras, sensors, and a centralized or distributed computing unit. The set of tasks that can be solved by a robot includes product manufacturing, disaster rescue, subsea exploration, extra-terrestrial exploration, transportation, manual labour and personal healthcare. Together, the environment and the task determine where a robotic agent falls along the dimensions of complexity described above.

#### Example 2: Trading Agent

A second example of an intelligent agent is a trading agent. A trading agent is a software agent that observes the stock markets and executes trades based on its observations. A trading agent primarily seeks to maximize the cumulative return of its investments, but can also have secondary objectives such as minimizing risk or selecting only from a certain set of stocks. Trading agents vary from simple feature-based representations and if-then rules, as traditionally used in fundamental analysis, to more complex representations that incorporate learning and economic modeling. Trading agents can be used online for real-time portfolio allocation as well as offline as decision support tools for human investors.

### Multi-agent systems

*A multi-agent system.*

Multi-agent systems are agent systems with multiple interacting agents in a single environment. An environment can consist of objects and relations between these objects, and must provide stimulus to the agents. In some cases, the strict distinction between agent and environment can be modified or relaxed: an agent in a multi-agent system can collectively view the other agents together with the environment as a single agent, or even as a single environment wherein the actions of the other agents are encoded in the feedback the agent receives. Sometimes nature itself is treated as an agent, where nature is defined as being a special agent that "does not have preferences and does not act strategically; it just acts, perhaps stochastically"[1].

#### Example 1: Autonomous Vehicles

*Navlab models 1 (farthest) through 5 (front). Navlab 5 completed the first coast-to-coast drive with 98.2% autonomy. All five vehicles were developed at Carnegie Mellon University, from 1984 through 1995.*

Interacting autonomous vehicles are an example of a multi-agent system. Each agent has its own goals, but because the agents have to share resources (the roadway system), they need to communicate and coordinate in order to jointly maximize each agent's utility. In the context of autonomous vehicles, it is reasonable to let the agent's utility reflect the safety of the passengers and the degree to which the passengers are being transported to their destination. In addition, it would be reasonable to take into account the social welfare of the entire fleet of online vehicles, the total energy consumption, and other factors of interest. Since this multi-agent system includes humans, there are ethical concerns involved. The philosophy of utilitarianism can be used to reason about how an agent should act in difficult situations.

#### Example 2: The Economy

The global economic system can be considered a multi-agent system. In such a system, nations, corporations and individuals act in the real world of goods, value, information, and services. The economy, together with human society, is probably one of the most complex multi-agent systems under study. From a practical point of view, the only way of making any progress towards understanding or modeling the system is through simplification and restriction. Only by making simplifying assumptions about the environment, and by appropriately restricting the complexity of the agents being modeled can this problem be satisfactorily approached.

### Game Theory

*The basic ingredients of game theory.*

One of the most successful approaches to modeling the interactions of multiple agents is through game theory. Game theory is the study of how strategic agents act in rule-constrained environments. Different environments can be modeled as different kinds of games. Because of the flexibility offered by the formalism of game theory, games are ubiquitous in AI, and have since the early history of AI been integral to its study.

#### Types of games

Types of games vary widely, from tic-tac-toe to life itself. The simplest representation of a game is a normal form game.

##### Normal Form

A normal form game consists of the following:

• A set of ${\displaystyle n}$ agents (or players), typically identified with the set of natural numbers ${\displaystyle \{1,2,\dots ,n\}}$.
• For each agent ${\displaystyle i}$, a set of actions ${\displaystyle A_{i}}$.
• For each agent ${\displaystyle i}$, a utility function ${\displaystyle u_{i}:\prod _{j=1}^{n}A_{j}\rightarrow \mathbb {R} }$.

An element ${\displaystyle {\vec {s}}\in \prod _{i=1}^{n}A_{i}}$ is an ${\displaystyle n}$-tuple of actions, and is referred to as a strategy profile. A strategy profile represents the actions chosen by each agent. Each agent can choose to play a single action in its action set ${\displaystyle A_{i}}$, or to select actions probabilistically according to a distribution over ${\displaystyle A_{i}}$. In the first case, we say the agent is playing a pure strategy; in the second case, the agent is playing a mixed strategy. Given a strategy profile ${\displaystyle {\vec {s}}}$ (of either pure or mixed strategies), the rules of the game produce an outcome, which each agent evaluates according to its utility function. The sum of all agents' utilities in a given outcome is called the social welfare of the outcome.
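The definition above can be encoded directly. The sketch below represents a two-player normal form game as a payoff table; the action names and prisoner's-dilemma-style payoff numbers are made up for illustration, not taken from the text.

```python
from itertools import product

# A normal form game for n = 2 agents: action sets and utility functions.
# The payoff numbers are an illustrative prisoner's-dilemma-style example.
actions = {
    0: ("cooperate", "defect"),  # A_0
    1: ("cooperate", "defect"),  # A_1
}

# payoffs[(s_0, s_1)] = (u_0, u_1): one utility per agent per strategy profile
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def social_welfare(profile):
    """Sum of all agents' utilities for a given pure strategy profile."""
    return sum(payoffs[profile])

# Enumerate every pure strategy profile with its payoffs and social welfare.
for profile in product(actions[0], actions[1]):
    print(profile, payoffs[profile], "welfare =", social_welfare(profile))
```

A mixed strategy would replace each agent's single action with a probability distribution over its action set; the table itself stays the same.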

Normal form games can represent several important features of a multi-agent system, including the number of agents, the available actions, and the stochasticity of the environment and of the agents. However, they do not represent other important features, such as the passage of time, non-stationary environments, repeated moves or turns, and imperfect information. For these, richer representations are necessary.

##### Extensive Form

The extensive form of a game is an extension of a single-agent decision tree. A game in extensive form is represented as a finite tree, where nodes are states and edges correspond to actions. Each node is labelled by an agent, and this agent is said to control the node. A strategy for agent ${\displaystyle i}$ in this setting is a function from the nodes controlled by ${\displaystyle i}$ into actions.

The extensive form of a game is the natural setting to study turn-based board games such as chess and go, and is also used in the context of reinforcement learning and Markov decision processes.
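For two-player zero-sum games in extensive form, optimal play can be computed by backward induction over the game tree. The sketch below assumes a toy tree with made-up leaf utilities, where agent 0 maximizes and agent 1 minimizes.

```python
# Backward induction (minimax) over a toy extensive form game tree.
# Leaves hold the utility to agent 0; in this zero-sum setting agent 1's
# utility is the negation, so agent 0 maximizes and agent 1 minimizes.

def minimax(node, player):
    """Value of `node` to agent 0 under optimal play by both agents."""
    if isinstance(node, (int, float)):  # leaf node: a terminal utility
        return node
    values = [minimax(child, 1 - player) for child in node]
    return max(values) if player == 0 else min(values)

# Depth-2 tree with made-up leaf values: agent 0 moves, then agent 1 replies.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree, player=0))  # 3: agent 0 picks the subtree worth min(3, 12)
```

Alpha-beta pruning, mentioned later in the History section, is an optimization of exactly this recursion that skips subtrees which cannot affect the result.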

##### Bayesian Games

In the above forms of games, it has been assumed that the agents have complete information about the world, indeed that the world is completely observable. In practice this is often an unrealistic and undesirable assumption. For example, in an online auction, a bidder does not know the private information held by the other bidders. All it knows is its own private valuation of the object, perhaps in addition to some prior beliefs about the other bidders' valuations. In a Bayesian game, this uncertainty about the other agents' private information is formalized using Bayesian priors.

#### Other Notions

Types of games vary along several other dimensions, including

• Competitive / cooperative: are agents competing or are they cooperating? Most games lie somewhere in-between these extremes, with good strategies incorporating elements of both cooperation and competition.
• Zero-sum / general sum: in a zero-sum game, the sum of all agents' utilities is equal to zero for every outcome; in a general-sum game, no such constraint holds.
• Symmetric / asymmetric: are the action sets available to each player identical or do they differ? Other kinds of asymmetry can exist; informational asymmetry being an important example.
• Complete information / incomplete information: do the agents have complete information regarding the other agents and the environment?

#### Solution Concepts

One of the basic tools for studying how agents interact with a game is a solution concept. A solution concept is a formal rule for predicting how a game will be played. These predictions are called solutions, and describe which strategies will be adopted by agents and, therefore, the result of the game.[3]

*Hierarchy of equilibria in games.*

The simplest solution concept is that of a Dominant Strategy Equilibrium (DSE). In a DSE, every agent plays a strategy that maximizes its utility regardless of the strategies chosen by the other agents. Such equilibria rarely exist in anything but the simplest games. The next natural solution concept is the Nash equilibrium (NE): a strategy profile in which no agent can improve its utility by unilaterally deviating, conditioning on the other agents playing their respective equilibrium strategies. Versions of Nash equilibria exist for pure and mixed strategies, where in the mixed case the equilibrium constraints must hold in expectation. Every DSE is a NE, but the converse is not true. The [[Course:522/Game_Theory#Prisoner_dilemma | prisoner's dilemma]] is an example of a game with a dominant strategy equilibrium (mutual defection); a coordination game such as the stag hunt has pure Nash equilibria but no dominant strategy equilibrium, and rock-paper-scissors has a mixed Nash equilibrium but no pure Nash equilibrium.
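Pure-strategy equilibria of small normal form games can be found by brute force, checking the Nash condition at every strategy profile. The payoff table below is a made-up coordination game: coordinating on either action is a Nash equilibrium, yet neither action dominates the other.

```python
from itertools import product

# Brute-force search for pure Nash equilibria in a two-player normal form
# game, using an illustrative coordination-game payoff table.
A0 = ("left", "right")
A1 = ("left", "right")
u = {  # u[(s0, s1)] = (utility to agent 0, utility to agent 1)
    ("left",  "left"):  (2, 2),
    ("left",  "right"): (0, 0),
    ("right", "left"):  (0, 0),
    ("right", "right"): (1, 1),
}

def is_pure_nash(s0, s1):
    """Neither agent can improve by deviating unilaterally."""
    best0 = all(u[(s0, s1)][0] >= u[(d, s1)][0] for d in A0)
    best1 = all(u[(s0, s1)][1] >= u[(s0, d)][1] for d in A1)
    return best0 and best1

equilibria = [s for s in product(A0, A1) if is_pure_nash(*s)]
print(equilibria)  # [('left', 'left'), ('right', 'right')]
```

Since both profiles survive the check but no single action is best against every opponent action, this game has two pure Nash equilibria and no dominant strategy equilibrium.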

A further relaxation of the equilibrium condition is the Correlated Equilibrium (CE). A CE is a joint distribution over strategies for all the agents. In a CE, each agent maximizes its expected utility by acting according to the equilibrium distribution, conditioned on the other agents also acting according to the equilibrium distribution. The coordination present in a CE is akin to a central correlating device such as a traffic light: if all other agents respect traffic lights, it is in your best interest to do so too.
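The traffic-light analogy can be made concrete. The sketch below checks obedience to a correlating device in a made-up two-player "intersection" game: given its recommendation, neither agent can gain in expectation by deviating.

```python
# A correlated equilibrium sketch for an illustrative "traffic intersection"
# game: a correlating device (the traffic light) recommends (go, stop) or
# (stop, go) with equal probability. Payoff numbers are made up.
u = {
    ("go",   "go"):   (-10, -10),  # crash
    ("go",   "stop"): (5, 0),
    ("stop", "go"):   (0, 5),
    ("stop", "stop"): (-1, -1),    # both wait
}
device = {("go", "stop"): 0.5, ("stop", "go"): 0.5}

def obedient(agent):
    """Can this agent ever gain by ignoring its recommendation,
    conditioning on the device's distribution?"""
    for rec in ("go", "stop"):
        # Profiles consistent with receiving this recommendation
        consistent = {s: p for s, p in device.items() if s[agent] == rec}
        total = sum(consistent.values())
        if total == 0:
            continue

        def expected(action):
            value = 0.0
            for s, p in consistent.items():
                profile = list(s)
                profile[agent] = action  # unilateral deviation
                value += (p / total) * u[tuple(profile)][agent]
            return value

        if any(expected(a) > expected(rec) for a in ("go", "stop")):
            return False
    return True

print(obedient(0) and obedient(1))  # True: following the light is stable
```

Note that the device's distribution is not a product of independent per-agent distributions, which is exactly what separates a correlated equilibrium from a mixed Nash equilibrium.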

The solution concepts discussed above fit into a larger hierarchy of solution concepts. Each solution concept applies to certain kinds of games, and each also entails a certain computational complexity; computing a correlated equilibrium, for example, is computationally easier than computing a Nash equilibrium.

#### Algorithmic Game Theory

While classical game theory is most often concerned with the existence of solutions, the relatively recent branch of algorithmic game theory (AGT) is more concerned with issues of computation. Even when a game provably has an equilibrium, it does not follow that the equilibrium can be computed efficiently. In fact, computing a Nash equilibrium is PPAD-complete even for general-sum two-player games, and is therefore believed to be intractable in general, while many related questions, such as whether a game has more than one Nash equilibrium, are ${\displaystyle NP}$-hard. An important exception is the two-player zero-sum game, for which von Neumann's famous Minimax Theorem shows that a mixed Nash equilibrium corresponds to the solution of a feasible linear program, and can therefore be computed in polynomial time.

The computational view taken in AGT makes it a suitable tool for studying multi-agent systems. There are many approaches to computing equilibria, including learning through sequential play, distributed optimization, and no-regret dynamics.
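As a minimal sketch of learning through sequential play, fictitious play has each agent best-respond to the empirical frequencies of its opponent's past actions. In rock-paper-scissors, a zero-sum game, the empirical play frequencies are known to converge to the uniform mixed Nash equilibrium.

```python
import random

# Fictitious play on rock-paper-scissors: in each round, every agent
# best-responds to the empirical frequencies of its opponent's past actions.
ACTIONS = ("rock", "paper", "scissors")
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_response(opponent_counts):
    # Play the action that beats the opponent's most frequent action so far.
    likely = max(ACTIONS, key=lambda a: opponent_counts[a])
    return COUNTER[likely]

def fictitious_play(rounds, seed=0):
    rng = random.Random(seed)
    counts = [dict.fromkeys(ACTIONS, 0), dict.fromkeys(ACTIONS, 0)]
    for t in range(rounds):
        moves = [
            best_response(counts[1]) if t else rng.choice(ACTIONS),
            best_response(counts[0]) if t else rng.choice(ACTIONS),
        ]
        counts[0][moves[0]] += 1
        counts[1][moves[1]] += 1
    # Agent 0's empirical play frequencies
    return {a: counts[0][a] / rounds for a in ACTIONS}

freqs = fictitious_play(30000)
print(freqs)  # each frequency is close to 1/3
```

The actual sequence of play keeps cycling; it is only the long-run frequencies that settle at the equilibrium, which is typical of fictitious play in zero-sum games.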

Another concern in algorithmic game theory is that even when efficient equilibria exist, agents might still choose not to act according to the equilibrium distribution, to the detriment of the other agents. As tools to study this phenomenon, Roughgarden[4] presents the concepts of Price of Anarchy and Price of Stability. The price of anarchy of a game is defined as the ratio of the maximal attainable social welfare across all strategies to the minimal attainable social welfare across all equilibrium strategies. The price of stability is the ratio of the maximal attainable social welfare across all strategies to the maximal attainable social welfare across all equilibrium strategies. The price of anarchy measures how the efficiency of a system degrades due to the selfish behaviour of its agents, and is therefore a useful analytical tool for reasoning about outcomes in different multi-agent system settings.
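These ratios are easy to compute for small games. The sketch below evaluates the price of anarchy and price of stability of a made-up prisoner's-dilemma-style game, whose unique pure equilibrium (mutual defection) attains only a third of the optimal social welfare.

```python
from itertools import product

# Price of anarchy / stability for a two-player game with illustrative
# prisoner's-dilemma payoffs (restricted to pure strategy equilibria).
A = ("cooperate", "defect")
u = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def welfare(s):
    return sum(u[s])

def is_pure_nash(s):
    s0, s1 = s
    return (all(u[(s0, s1)][0] >= u[(d, s1)][0] for d in A)
            and all(u[(s0, s1)][1] >= u[(s0, d)][1] for d in A))

profiles = list(product(A, A))
equilibria = [s for s in profiles if is_pure_nash(s)]
opt = max(welfare(s) for s in profiles)
poa = opt / min(welfare(s) for s in equilibria)  # price of anarchy
pos = opt / max(welfare(s) for s in equilibria)  # price of stability
print(poa, pos)  # 3.0 3.0: the only equilibrium is mutual defection
```

Since this game has a single equilibrium, anarchy and stability coincide; in games with several equilibria the two ratios can differ.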

### Mechanism Design

Another central idea from economics and game theory that has shown itself useful for studying multi-agent systems is mechanism design. While game theory is concerned with studying how agents act in game environments, mechanism design is concerned with designing the rules of the games in such a way that the players are likely to act in certain desirable ways. In other words, mechanism design is concerned with designing protocols for strategic agents to act according to. In many cases, designing the rules of the game itself is as important as deciding how an agent should act, given the game. In other cases, the rules of the game are given and immutable.

In mechanism design, each agent has a type ${\displaystyle t_{i}\in T_{i}}$. Here ${\displaystyle t_{i}}$ denotes the private information only available to agent ${\displaystyle i}$, while the type space ${\displaystyle T_{i}}$ is available to the mechanism and reflects the possible values ${\displaystyle t_{i}}$ could take. Since mechanisms allow for strategic interaction, each agent also has an associated strategy space ${\displaystyle S_{i}}$ of available actions. A strategy in this context is a mapping ${\displaystyle \sigma _{i}:T_{i}\rightarrow S_{i}}$. Formally, a mechanism is a function that maps strategy profiles to outcomes.

#### Example 1: Elections

The first example of a mechanism is a voting mechanism. The set of outcomes is the set of all candidates up for election, and each agent's type represents its preference over the candidates. When designing a voting mechanism, certain characteristics are desirable: for example, one might want the mechanism to discourage strategic voting, i.e. voting for someone other than your preferred candidate. One might also ask that the mechanism be non-dictatorial, i.e. that no single agent determines the outcome of the election. A mechanism that takes in a preference ordering from each agent and outputs a ranking of the candidates is called a social welfare function. Arrow's impossibility theorem[5] states that with three or more candidates, no social welfare function simultaneously satisfies unanimity, independence of irrelevant alternatives, and non-dictatorship; the closely related Gibbard–Satterthwaite theorem shows that every non-dictatorial voting rule over three or more candidates is vulnerable to strategic voting.
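A minimal sketch of why plurality voting invites strategic voting, using a made-up three-voter, three-candidate profile: by misreporting its ranking, the first voter changes the winner from its last choice to its second choice.

```python
from collections import Counter

# Plurality voting as a simple social choice rule, with an illustrative
# profile showing that it is not strategy-proof.
CANDIDATES = ("A", "B", "C")

def plurality(ballots):
    """Each ballot is a preference order, most preferred first; the
    candidate with the most first-place votes wins (alphabetical
    tie-break)."""
    tally = Counter(ballot[0] for ballot in ballots)
    return max(sorted(CANDIDATES), key=lambda c: tally[c])

# Voter 0 truly prefers C > B > A; voters 1 and 2 have other rankings.
truthful = [["C", "B", "A"], ["A", "B", "C"], ["B", "A", "C"]]
print(plurality(truthful))  # A: voter 0's least preferred candidate wins

# By insincerely ranking B first, voter 0 gets a better outcome for itself.
strategic = [["B", "C", "A"]] + truthful[1:]
print(plurality(strategic))  # B
```

This is exactly the manipulability that the Gibbard–Satterthwaite theorem says no reasonable voting rule over three or more candidates can fully avoid.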

#### Example 2: Auctions

The second example, an auction, is an important example of a mechanism with money. The output of the mechanism is described by an allocation rule (who gets what?) and a payment rule (who gives money to or receives money from the mechanism?). Money is a special kind of transferable utility available to the mechanism. In a simple auction setting, each agent's type is a real number ${\displaystyle v}$ called the agent's value. Auctions can be sequential, such as the commonly used English (increasing) and Dutch (decreasing) auctions. In a first-price auction, the highest bidder wins and pays its own bid. In a second-price auction, the highest bidder wins and pays the second-highest bid. An auction can also be implemented as a direct revelation mechanism, where each agent immediately reveals its type to the mechanism. Ideally, an auction mechanism should discourage strategic bidding, i.e. overbidding or underbidding.
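The incentive difference between the two formats can be checked by simulation. The sketch below uses made-up values and implements a second-price auction; bidder 0's truthful utility matches the best achievable over a range of unilateral deviations, illustrating (though not proving) the dominant-strategy property discussed below.

```python
# Sealed-bid second-price auction: the highest bidder wins and pays the
# second-highest bid. Values and bids are made-up numbers for illustration.

def second_price(bids):
    """Return (winner index, payment)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]

def utility(value, bids, bidder):
    winner, payment = second_price(bids)
    return value - payment if winner == bidder else 0

values = [10, 7, 4]        # private values (the agents' types)
truthful = list(values)    # everyone bids their true value

# Bidder 0's utility from truthful bidding...
u_truth = utility(values[0], truthful, 0)
# ...is at least as high as from any unilateral integer deviation.
deviations = [utility(values[0], [b] + truthful[1:], 0) for b in range(21)]
print(u_truth, max(deviations))  # 3 3
```

In a first-price auction the same experiment fails: bidding one's true value yields zero utility, so rational bidders shade their bids downward.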

#### Example 3: Markets

Another example of a mechanism is a market. Designing "good" markets is important for many different practical global objectives. The function of a market typically involves distributing resources, virtual or physical, between participating agents, each agent with its private incentives and resources, interacting with other agents in order to maximize its private utility. One of the first thinkers to analytically expound the idea of a market was Adam Smith. In his 1776 treatise "The Wealth of Nations", he argues that through a market mechanism ("the invisible hand") each participant's self-interested motivation will be directed towards the common goal of greater social welfare. In Smith's conception of the market, there is very little room for design. Prices will be determined according to the rules of supply and demand, and successful businesses will drive out the outdated, old, and unnecessary. This is perhaps one of the most widely debated ideas in all of economics, and in some sense, it is a question of mechanism design. For the modern electronic market designer (which could be a national institution, or a private company), there are many parameters available for manipulation, such as transactional fees, taxation, bidding protocols, network implementation, verification (smart contracts), and security.

#### Mechanism Design Goals

The emergence of mechanism design is the result of decades of earlier work in economic theory and mathematical economics. Fundamental to economics is the concept of an equilibrium. In the ubiquitous resource allocation problem, a simple variant of an economic equilibrium concept is the Walrasian equilibrium, named after the French economist Léon Walras. A Walrasian equilibrium is a payment/allocation-rule pair in which each agent derives maximal utility from its allocated package, and every item to be allocated is allocated at a positive price. The First Welfare Theorem states that the allocation in any Walrasian equilibrium maximizes social welfare. Another early positive result from mechanism design is the solution to the single-item incomplete-information auction presented in Myerson '81[6].

A mechanism in which bidding truthfully is a dominant strategy for each agent is called a dominant strategy incentive compatible (DSIC) mechanism. The second-price auction is DSIC; the first-price auction is not. The revelation principle is a fundamental principle in classical mechanism design; it states that if a social choice function can be implemented by an arbitrary mechanism, then the same function can be implemented by an incentive-compatible direct-revelation mechanism with the same equilibrium outcome[7].

#### Computational Mechanism Design

Classical mechanism design has mostly concerned itself with designing good (i.e. DSIC) mechanisms under the common assumptions that agents can reveal their complete preferences among outcomes (the revelation principle), and that the mechanism can solve an optimization problem to select the best outcome[8]. In practice this is rarely possible in all but the simplest cases. For example, any auction setting in which there is more than a single item for sale will suffer from limiting computational concerns due to combinatorial scaling. Computational mechanism design is an approach to tackling these real-world implementation concerns through theoretical and empirical algorithm design. Examples of topics in computational mechanism design include welfare-approximating allocation algorithms, approximately incentive compatible mechanisms, dynamic mechanisms, mechanism simulation, empirical mechanisms, and market design. Recently, tools from AI such as deep learning and support vector machines have been successfully applied to problems such as estimating agents' types and computing approximately efficient allocations[9][10]. A fundamental paradigm that has emerged from computational mechanism design is the view of "markets-as-computation", in which markets are modeled as self-organizing, self-motivated multi-agent systems. This view unifies the game-theoretic and distributed-AI approaches to mechanism design, which is at heart an economic, even political, problem.

### Other Approaches

Although fruitful and general, game theory is far from the only approach used in the study of multi-agent systems. Historically, the game-theoretic approach has been paralleled by a control-theoretic approach, in which a multi-agent system is modelled as a networked dynamical system. The foundations of this approach are based largely on the theoretical framework of distributed computing. Important problems in the field include consensus dynamics, distributed constraint satisfaction, sensor fusion and autonomic computing[11]. In the field of distributed AI, an important building block of modern multi-agent systems, the study of distributed systems is coupled with the study of intelligent agents and goal-based interactions to arrive at a unified framework for analyzing networked multi-agent systems. This can again be combined with other areas of classical AI, such as reasoning, knowledge representation, models and inference, to model even more complex real-world systems.

Computational sociology is a branch of sociology that uses computational methods to analyze and model social phenomena[12]. With the advent of global social media platforms, methods from computational sociology have been successfully combined with big data algorithms and computational graph theory to shed light on many difficult problems involving social networks between human agents, and the dynamics of these networks. Problems that have been studied include team formation with communication cost, information propagation, collective dynamics, political recruitment, and population modelling. Overall, online social networks make up an interesting environment to study and apply ideas from classical multi-agent systems.

### History

Tales of human-created autonomous agents are at least as old as the self-navigating three-legged tables in the Iliad. From the early Jewish legend of the golem created like Adam from clay to Mary Shelley's Frankenstein and modern day science fiction, imaginings of embodied intelligent agents have long been an active area of speculation. It was not until the relatively recent advent of the electronic computer that serious scientific research into the problem became relevant. Early investigations include Alan Turing's 1950 paper "Computing Machinery and Intelligence"[13], in which he proposes his famous "Imitation Game" for deciding a class of artificial intelligence (AI), also known as the "Turing Test".

*Deep Blue, the supercomputer that defeated world chess champion Garry Kasparov in 1997.*

As computing capabilities grew during the 1950s and 60s, so did the domain and impact of AI research. An important thread in the early development of computerized agent systems is the work on computer checkers undertaken by Arthur Samuel. By 1959, Samuel's checkers program had learned to outplay its creator, using his alpha-beta pruning algorithm to search through the extensive form representation of checkers. Samuel believed that teaching computers to play games would be very fruitful for developing tactics appropriate to general problems[14]. This view has been commonly held by the AI community until the present day, and the study of computational games remains a driving force in AI research. A recent example is the AlphaZero algorithm developed by DeepMind, which combines deep reinforcement learning with Monte Carlo tree search to reach super-human levels of play in games such as chess and go.

The foundations of modern game theory were presented in von Neumann and Morgenstern's seminal 1944 book "Theory of Games and Economic Behavior". The second edition of this book provided an axiomatic theory of utility, which reincarnated Daniel Bernoulli's old theory of utility as an independent discipline[15]. The book also contains one of the first treatments of equilibrium (minimax) solutions, albeit in the relatively narrow context of two-player zero-sum games. John Nash went on to prove, in the 1951 article "Non-Cooperative Games"[16], that every ${\displaystyle n}$-player, general-sum non-cooperative game has a mixed Nash equilibrium.

With the invention of integrated circuits and semiconductor technology, a new era of interdisciplinary computer science research was ushered in. Economic modelling and simulation quickly became one of the most important applications of computer programming, which in turn instigated further research into economic theory and social choice theory. During the 70s and 80s, early work "laying the foundations of mechanism design theory"[17] was carried out by Leonid Hurwicz, Eric Maskin, and Roger Myerson, for which they received the 2007 Nobel Memorial Prize in Economics. In 2001, the prize was awarded to George Akerlof, Michael Spence, and Joseph E. Stiglitz for their "analyses of markets with asymmetric information"[18].

By the 1980s, the study of multi-agent systems had matured into its own well-defined and wide-reaching subfield of AI. This partly coincided with the first generation of electronic, programmable robots emerging in the early 80s. In recent times (2012–), multi-agent systems, as well as the broader areas of machine learning and artificial intelligence, have been going through a surge of renewed interest and media attention, due partly to examples such as self-driving cars, deep learning-based game-playing agents, crypto-currency mining networks, and unmanned aerial vehicles (drones). The debate on the future of agent-based AI research has, due to its far-reaching societal implications, become part of the public debate. It is predicted that our civilization stands at the cusp of a new industrial revolution, whose impact on human life on earth may, depending on the sector, be up to a thousand times that of previous industrial revolutions[19][20]. At its core, much of this disruption will be due to progress and development in the field of multi-agent systems.