# Course:CPSC522/Predicting Human Behavior in Normal-Form Games

## Predicting Human Behavior in Normal-Form Games

### Referenced Papers

1) Beyond Equilibrium: Predicting Human Behavior in Normal Form Games. J. Wright, K. Leyton-Brown. Conference of the Association for the Advancement of Artificial Intelligence (AAAI-10), 2010.[1]

2) Camerer, C.; Ho, T.; and Chong, J. 2001. Behavioral game theory: Thinking, learning, and teaching. Nobel Symposium on Behavioral and Experimental Economics. [2]

## Abstract

In multi-agent settings, the convention is to assume that agents will adopt Nash equilibrium strategies. However, studies in experimental economics demonstrate that Nash equilibrium is a poor predictor of human players’ initial behavior in normal-form games. This page first considers a range of widely studied models from behavioral game theory. Each of these models is then evaluated in a meta-analysis, using large-scale, publicly available experimental data from the literature. Finally, modifications to the best-performing model are proposed and analysed; these make it more suitable for practical prediction of initial play by humans in normal-form games.

## Content

### Introduction

Game theory is a mathematical system for analyzing and predicting how idealized rational agents behave in strategic situations.
Behavioral game theory aims to extend game theory to modelling human agents.

Figure 1a: Chicken Game

In game theory, the normal-form representation of a game includes all conceivable strategies, and their corresponding payoffs, for each player.
In a normal form game:

• Each agent simultaneously chooses an action from a finite action set.
• Each combination of actions yields a known utility to each agent.
• The agents may choose actions either deterministically or stochastically.

In a Nash equilibrium, each agent best responds to the others. An agent best responds to other agents’ actions by choosing a strategy that maximizes utility, conditional on the other agents’ strategies.
${\displaystyle BR_{i}(s_{-i})=\operatorname {arg\,max} _{s_{i}}u_{i}(s_{i},s_{-i})}$

Figure 1b: The Game of Chicken: Bi-Matrix

#### A Worked Example: Game of Chicken

For “simple” games, a convenient way to represent a normal-form game is in bi-matrix form. We can look at a simple game called "the chicken game" to understand how a game is represented in normal form and how its Nash equilibria are determined. Figure 1a shows that the game has two players (you can think of them as two cars driving toward each other). Figure 1b shows the bi-matrix representation for the game of chicken:

• There are two players: Player 1 (Row Player) and Player 2 (Column Player).
• Each player can either “dare” ${\displaystyle (D)}$ or “chicken out” ${\displaystyle (C)}$.
• If both players “Dare”, they collide and each receives a payoff of ${\displaystyle 0}$, i.e. the payoffs are ${\displaystyle (0,0)}$.
• If player 1 “Dares” and player 2 “Chickens out”, then player 1 receives a payoff of ${\displaystyle 5}$ and player 2 receives only ${\displaystyle 1}$.
• If player 1 “Chickens out” and player 2 “Dares”, then player 1 receives a payoff of only ${\displaystyle 1}$ and player 2 receives ${\displaystyle 5}$.
• If players 1 and 2 both “Chicken out”, then each receives a payoff of ${\displaystyle 4}$.
• The pure-strategy Nash equilibria for this game are ${\displaystyle (C,D)}$ and ${\displaystyle (D,C)}$: in each of them, neither player has an incentive to deviate. In other words, these strategy pairs are "self-enforcing", i.e. each player's strategy is an optimal (best) response to the other player's strategy.
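The equilibrium claim in the last bullet can be checked mechanically. The following is a minimal sketch that enumerates all action profiles of the game of chicken and keeps those from which neither player can profitably deviate; the names `PAYOFFS` and `is_pure_nash` are illustrative and do not come from the referenced papers.

```python
from itertools import product

ACTIONS = ["D", "C"]  # D = dare, C = chicken out

# (row action, column action) -> (row payoff, column payoff), as in the bullets above.
PAYOFFS = {
    ("D", "D"): (0, 0),
    ("D", "C"): (5, 1),
    ("C", "D"): (1, 5),
    ("C", "C"): (4, 4),
}

def is_pure_nash(a1, a2):
    """True if neither player gains by unilaterally switching actions."""
    u1, u2 = PAYOFFS[(a1, a2)]
    row_ok = all(PAYOFFS[(d, a2)][0] <= u1 for d in ACTIONS)
    col_ok = all(PAYOFFS[(a1, d)][1] <= u2 for d in ACTIONS)
    return row_ok and col_ok

pure_nash = [p for p in product(ACTIONS, ACTIONS) if is_pure_nash(*p)]
print(pure_nash)  # [('D', 'C'), ('C', 'D')]
```

Note that ${\displaystyle (C,C)}$ is not an equilibrium: either player can raise their payoff from 4 to 5 by switching to “Dare”.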

#### Poor Predictive Performance of Nash equilibrium

However, even though Nash equilibrium is an intuitive and appealing solution concept, it is often a poor predictor of human behavior. The vast majority of human players choose ${\displaystyle (C,C)}$ in the game of chicken. Moreover, modifications to a game that do not change the Nash equilibrium predictions at all can cause large changes in how human subjects play the game. For example, in the game of chicken, people play much closer to Nash equilibrium when the penalty is large. Clearly, Nash equilibrium is not the whole story. Behavioral game theory proposes a number of models to better explain human behavior.

### Behavioral Game Theory Models

Themes:

1. Quantal response: Agents best respond with high probability, rather than best responding deterministically.
2. Iterative strategic reasoning: Agents can only perform limited steps of strategic “look-ahead”.

One model is based on quantal response, two models are based on iterative strategic reasoning, and one model incorporates both.

#### Quantal Response Equilibrium (QRE)

One prominent behavioral theory asserts that agents become more likely to make errors as those errors become less costly. We refer to this property as cost-proportional errors. This can be modeled by assuming that agents best respond quantally, rather than via strict maximization.

QRE model [4]

• Agents quantally best respond to each other.
• A (logit) quantal best response by agent ${\displaystyle i}$ to a strategy profile ${\displaystyle s_{-i}}$ is a mixed strategy ${\displaystyle s_{i}}$ such that

${\displaystyle QBR_{i}(s_{-i})(a_{i})=s_{i}(a_{i})={\frac {\exp(\lambda \,u_{i}(a_{i},s_{-i}))}{\sum _{a'_{i}}\exp(\lambda \,u_{i}(a'_{i},s_{-i}))}}}$
where λ (the precision parameter) indicates how sensitive agents are to utility differences. Note that unlike regular best response, which is a set-valued function, quantal best response returns a single mixed strategy.
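As a minimal sketch of the logit formula above, the function below turns a vector of expected utilities into a mixed strategy. The function name and the example numbers (a column player assumed to dare with probability 0.2) are illustrative assumptions, not values from the paper.

```python
import math

def quantal_best_response(expected_utils, lam):
    """Logit response: P(action) is proportional to exp(lam * expected utility)."""
    top = max(expected_utils)  # subtracting the max avoids overflow; it cancels out
    weights = [math.exp(lam * (u - top)) for u in expected_utils]
    total = sum(weights)
    return [w / total for w in weights]

# Row player's expected utilities in the game of chicken against a column
# player who is assumed (arbitrarily) to dare with probability 0.2.
p_dare = 0.2
eu_dare = p_dare * 0 + (1 - p_dare) * 5
eu_chicken = p_dare * 1 + (1 - p_dare) * 4

for lam in (0.0, 1.0, 50.0):
    print(lam, quantal_best_response([eu_dare, eu_chicken], lam))
```

With λ = 0 the agent ignores utilities and mixes uniformly; as λ grows, the mixed strategy concentrates on the ordinary best response.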

#### Level-k

Another key idea from behavioral game theory is that humans can perform only a bounded number of iterations of strategic reasoning. The level-k model [5] captures this idea by associating each agent ${\displaystyle i}$ with a level ${\displaystyle k\in \{0,1,2,\ldots \}}$, corresponding to the number of iterations of reasoning the agent is able to perform.

• A level-0 agent plays randomly, choosing uniformly at random from his possible actions.
• A level-k agent, for ${\displaystyle k\geq 1}$, best responds to the strategy played by level-${\displaystyle (k-1)}$ agents. If a level-k agent has more than one best response, he mixes uniformly over them.

Here we consider a particular level-k model, dubbed ${\displaystyle Lk}$, which assumes that all agents belong to levels ${\displaystyle 0}$, ${\displaystyle 1}$, or ${\displaystyle 2}$. Each agent with level ${\displaystyle k>0}$ has an associated probability ${\displaystyle \epsilon _{k}}$ of making an “error”, i.e., of playing an action that is not a best response to the level-${\displaystyle (k-1)}$ strategy. However, the agents do not account for these errors when forming their beliefs about how lower-level agents will act.
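The Lk prediction for one player can be sketched as a mixture over the three levels. The sketch below makes several assumptions that are not stated on this page: errors are spread uniformly over the non-best-response actions, the level proportions are passed in as `alphas`, and the 2×2 payoff matrices are a made-up example chosen so that best responses are unique.

```python
def best_responses(payoff, opp_strategy):
    """Indices of own actions with maximal expected utility; payoff[a][b] is own utility."""
    eu = [sum(p * u for p, u in zip(opp_strategy, row)) for row in payoff]
    best = max(eu)
    return [a for a, v in enumerate(eu) if abs(v - best) < 1e-12]

def uniform_over(indices, n):
    """Mixed strategy that is uniform on the given action indices."""
    return [1.0 / len(indices) if a in indices else 0.0 for a in range(n)]

def noisy(br, n, eps):
    """Best respond w.p. 1 - eps; otherwise err uniformly over the other actions (assumed)."""
    others = n - len(br)
    if others == 0:  # every action is a best response, so no error is possible
        return uniform_over(br, n)
    return [(1 - eps) / len(br) if a in br else eps / others for a in range(n)]

def lk_prediction(own, opp, alphas, eps1, eps2):
    """own[a][b] / opp[b][a]: payoffs indexed by (own action, other's action)."""
    n, m = len(own), len(opp)
    level0_own, level0_opp = [1.0 / n] * n, [1.0 / m] * m
    br1_own = best_responses(own, level0_opp)
    br1_opp = best_responses(opp, level0_own)
    # Level-2 agents believe that level-1 opponents best respond without error.
    br2_own = best_responses(own, uniform_over(br1_opp, m))
    levels = [level0_own, noisy(br1_own, n, eps1), noisy(br2_own, n, eps2)]
    return [sum(al * s[a] for al, s in zip(alphas, levels)) for a in range(n)]

# Hypothetical game: row actions (T, B), column actions (L, R).
ROW = [[5, 0], [1, 3]]   # ROW[row action][column action]
COL = [[2, 1], [0, 4]]   # COL[column action][row action]
prediction = lk_prediction(ROW, COL, alphas=(0.2, 0.5, 0.3), eps1=0.1, eps2=0.2)
print(prediction)
```

In this example the level-1 row player prefers T against a uniform opponent, while the level-2 row player prefers B because it expects a level-1 column player to choose R.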

#### Cognitive Hierarchy

The cognitive hierarchy model [6], like level-k, aims to model agents with heterogeneous bounds on iterated reasoning. It differs from the level-k model in two ways:

• First, agent types do not have associated error rates; each agent best responds perfectly to its beliefs.
• Second, agents best respond to the full distribution of lower-level types, rather than only to the strategy one level below.

More formally, every agent again has an associated level ${\displaystyle m\in \{0,1,2,\ldots \}}$. Let ${\displaystyle F}$ be the cumulative distribution of the levels in the population. Level-0 agents play (typically uniformly) at random. Level-m agents ${\displaystyle (m\geq 1)}$ best respond to the strategies that would be played in a population described by the cumulative distribution ${\displaystyle F(j\mid j<m)}$. Camerer, Ho, and Chong (2004)[6] advocate a single-parameter restriction of the cognitive hierarchy model called Poisson-CH, in which the levels of agents in the population are distributed according to a Poisson distribution.
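A minimal sketch of Poisson-CH for a two-player game follows. It assumes that level 0 plays uniformly, that ties are broken by mixing uniformly over best responses, and that the Poisson distribution is truncated at `max_level` and renormalized; the truncation, the function names, and the example payoffs are illustrative choices, not details from the referenced papers.

```python
import math

def poisson_pmf(k, tau):
    """Probability that an agent has level k under a Poisson(tau) distribution."""
    return math.exp(-tau) * tau ** k / math.factorial(k)

def best_response(payoff, belief):
    """Uniform mix over the actions that maximize expected utility against belief."""
    eu = [sum(p * u for p, u in zip(belief, row)) for row in payoff]
    best = max(eu)
    br = [a for a, v in enumerate(eu) if abs(v - best) < 1e-12]
    return [1.0 / len(br) if a in br else 0.0 for a in range(len(payoff))]

def poisson_ch(payoffs, tau, max_level=10):
    """payoffs[i][a][b]: player i's payoff for own action a and opponent action b."""
    f = [poisson_pmf(m, tau) for m in range(max_level + 1)]
    sizes = [len(payoffs[0]), len(payoffs[1])]
    # strategies[i][m] is the strategy of a level-m agent in the role of player i.
    strategies = [[[1.0 / n] * n] for n in sizes]
    for m in range(1, max_level + 1):
        norm = sum(f[:m])  # mass of levels 0..m-1, used to truncate the distribution
        new = []
        for i in (0, 1):
            j = 1 - i
            belief = [sum(f[l] * strategies[j][l][b] for l in range(m)) / norm
                      for b in range(sizes[j])]
            new.append(best_response(payoffs[i], belief))
        for i in (0, 1):
            strategies[i].append(new[i])
    total = sum(f)  # renormalize because levels above max_level are dropped
    return [[sum(f[m] * strategies[i][m][a] for m in range(max_level + 1)) / total
             for a in range(sizes[i])] for i in (0, 1)]

# Hypothetical 2x2 game (not from the paper).
GAME = [[[5, 0], [1, 3]], [[2, 1], [0, 4]]]
row_pred, col_pred = poisson_ch(GAME, tau=1.5)
print(row_pred, col_pred)
```

Unlike the Lk sketch, every level here responds to a weighted mixture of all lower levels rather than only to the level directly below it.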

#### Quantal Level-k

The quantal level-k (QLk) model combines elements of the QRE and level-k models. In QLk, agents have one of three levels, as in Lk. Each agent responds to its beliefs quantally, as in QRE. As in Lk, agents believe that the rest of the population has the next-lower type. The main difference between QLk and Lk is in the error structure. In Lk, higher-level agents believe that all lower-level agents best respond perfectly, although in fact every agent has some probability of making an error. In contrast, in QLk, agents are aware of the quantal nature of the lower-level agents’ responses, and have a (possibly incorrect) belief about the lower-level agents’ precision.
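The structure described above can be sketched as follows for one player of a two-player game. The parameter names (`lam1`, `lam2`, `lam1_belief`, `alphas`) and the example payoffs are assumptions made for illustration; `lam1_belief` stands for the level-2 agents' belief about the precision of level-1 agents, which need not equal the true `lam1`.

```python
import math

def logit(expected_utils, lam):
    """Logit quantal response to a vector of expected utilities."""
    top = max(expected_utils)
    weights = [math.exp(lam * (u - top)) for u in expected_utils]
    total = sum(weights)
    return [w / total for w in weights]

def expected_utils(payoff, opp_strategy):
    """payoff[a][b] is own utility for own action a and opponent action b."""
    return [sum(p * u for p, u in zip(opp_strategy, row)) for row in payoff]

def qlk_prediction(own, opp, alphas, lam1, lam2, lam1_belief):
    """Mixture of level-0, level-1 and level-2 play; alphas are the level proportions."""
    n, m = len(own), len(opp)
    uniform_own, uniform_opp = [1.0 / n] * n, [1.0 / m] * m
    # Level 1 responds quantally (precision lam1) to a uniformly random opponent.
    level1 = logit(expected_utils(own, uniform_opp), lam1)
    # Level 2 believes the opponent is level 1 with precision lam1_belief ...
    believed_opp = logit(expected_utils(opp, uniform_own), lam1_belief)
    # ... and responds quantally to that belief with its own precision lam2.
    level2 = logit(expected_utils(own, believed_opp), lam2)
    levels = [uniform_own, level1, level2]
    return [sum(al * s[a] for al, s in zip(alphas, levels)) for a in range(n)]

# Hypothetical 2x2 game (not from the paper).
ROW = [[5, 0], [1, 3]]
COL = [[2, 1], [0, 4]]
qlk = qlk_prediction(ROW, COL, alphas=(0.2, 0.5, 0.3),
                     lam1=1.0, lam2=2.0, lam1_belief=0.5)
print(qlk)
```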

### Model Analysis

#### Data

The authors identified six large-scale, publicly available sets of human-subject experimental data. Each observation of an action by an experimental subject is represented as a pair ${\displaystyle (a_{i},G)}$, where ${\displaystyle a_{i}}$ is the action that the subject took when playing as player ${\displaystyle i}$ in game ${\displaystyle G}$. All games were two-player, so each single play of a game generated two observations. Data from the six experimental studies, plus a combined dataset, were used:

• SW94: 400 observations from [Stahl & Wilson 1994][7]
• SW95: 576 observations from [Stahl & Wilson 1995][8]
• CGCB98: 1296 observations from [Costa-Gomes et al. 1998][9]
• GH01: 500 observations from [Goeree & Holt 2001][10]
• CVH03: 2992 observations from [Cooper & Van Huyck 2003][11]
• RPC09: 1210 observations from [Rogers et al. 2009][12]
• ALL6: All 6974 observations

Subjects played 2-player normal form games once each. Each action by an individual player is a single observation.

#### Model comparisons

The authors compare the behavioral game theory (BGT) models’ prediction performance to Nash equilibrium. Unmodified Nash equilibrium is not suitable for predictions since games often have multiple Nash equilibria and a Nash equilibrium will often assign probability 0 to some actions. Two different Nash-based models were constructed to deal with multiple equilibria:

1. UNEE: Take the average of all Nash equilibria.
2. NNEE: Predict using the post-hoc “best” Nash equilibrium.

Both models avoid probability 0 predictions via a tunable error probability.

#### Methods

To evaluate a given model on a given dataset, the authors performed 10 rounds of 10-fold cross-validation. Specifically, for each round, each dataset was randomly divided into 10 parts. For each of the 10 ways of selecting 9 parts from the 10, the maximum likelihood estimate of the model’s parameters was computed from those 9 parts, using the Nelder-Mead simplex algorithm.[13] Then the log likelihood of the remaining part was computed under the fitted model’s predictions. We call the average of this quantity across all 10 parts the cross-validated log likelihood.
The predictive power of different behavioral models on a given dataset was evaluated by comparing the average cross-validated log likelihood of the dataset under each model. We say that one model predicted significantly better than another when the 95% confidence intervals for the average cross-validated log likelihoods do not overlap.
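The evaluation procedure can be sketched in a few lines. To keep the example dependency-free, the sketch swaps the Nelder-Mead simplex algorithm for a plain grid search over a single parameter; the toy model (a logit response with precision λ to fixed expected utilities) and the synthetic observations are illustrative assumptions, not data from the paper.

```python
import math
import random

def logit(expected_utils, lam):
    """Toy one-parameter model: logit response to fixed expected utilities."""
    top = max(expected_utils)
    weights = [math.exp(lam * (u - top)) for u in expected_utils]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(predict, param, observations):
    """Sum of log probabilities that the model assigns to the observed actions."""
    return sum(math.log(predict(game, param)[action]) for action, game in observations)

def fit_mle(predict, observations, grid):
    """Grid-search stand-in for Nelder-Mead: best training likelihood on the grid."""
    return max(grid, key=lambda p: log_likelihood(predict, p, observations))

def cross_validated_ll(predict, observations, grid, folds=10, rounds=10, seed=0):
    """Average held-out log likelihood per fold, averaged again over the rounds."""
    rng = random.Random(seed)
    per_round = []
    for _ in range(rounds):
        data = observations[:]
        rng.shuffle(data)
        parts = [data[i::folds] for i in range(folds)]
        held_out = []
        for i, test_part in enumerate(parts):
            train = [obs for j, part in enumerate(parts) if j != i for obs in part]
            fitted = fit_mle(predict, train, grid)
            held_out.append(log_likelihood(predict, fitted, test_part))
        per_round.append(sum(held_out) / folds)
    return sum(per_round) / rounds

# Synthetic observations (action index, expected utilities): 70 plays of action 0, 30 of action 1.
utils = [2.5, 2.0]
observations = [(0, utils)] * 70 + [(1, utils)] * 30
grid = [i / 10 for i in range(51)]  # candidate precisions 0.0, 0.1, ..., 5.0

lam_hat = fit_mle(logit, observations, grid)
cv_ll = cross_validated_ll(logit, observations, grid)
print(lam_hat, cv_ll)
```

Because the parameters are always fitted on data that excludes the held-out part, a model with more parameters is not automatically rewarded, which is what makes the comparison a test of prediction rather than of fit.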

Figure 2: Nash equilibrium vs. BGT[1]

Figure 2 shows that

• UNEE performed worse than every BGT model (except in datasets GH01 and SW95).
• Even NNEE performed worse than QLk and QRE in most datasets.
• This indicates that BGT models typically predict human behavior better than Nash equilibrium-based models.

In most datasets, the model based on cost-proportional errors (QRE) predicted significantly better than the two models based on bounded iterated reasoning (Lk and Poisson-CH). However, in three datasets, including the aggregated dataset, the situation was reversed, with Lk and Poisson-CH outperforming QRE. This mixed result is consistent with earlier comparisons of QRE with these two models (Chong, Camerer, and Ho 2005[14]; Crawford and Iriberri 2007[15]; Rogers, Palfrey, and Camerer 2009[12]), and suggests that bounded iterated reasoning and cost-proportional errors capture distinct underlying phenomena. If so, the remaining model (QLk), which incorporates both components, should predict better than models that incorporate only one component. This was indeed the case, as QLk generally outperformed the single-component models. Overall, QLk was the strongest of the behavioral models, predicting significantly better than all other models in all datasets except CVH03 and SW95.

Figure 3: Lk and CH vs. QRE[1]

Figure 3 shows that

• ${\displaystyle Lk}$ and Poisson-CH performed roughly similarly.
• Iterative models and quantal response appear to capture distinct phenomena.

Figure 4: Model comparisons: QLk[1]

#### The Modified Model: Quantal Cognitive Hierarchy

The authors of Paper 1 [1] constructed a model in which non-random (i.e., non-level-0) agents were constrained to have identical precisions. Further, the agents were constrained to have correct beliefs about the precisions and the relative proportions of lower-level types. This model can also be viewed as an extension of cognitive hierarchy that adds quantal response; hence it is called quantal cognitive hierarchy, or QCH. In the quantal cognitive hierarchy (QCH) model, all agent levels:

• respond quantally (as in QLk).
• respond to the truncated, true distribution of lower levels (as in cognitive hierarchy).
• have the same precision λ.
• are aware of the true precision of lower levels.
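The four properties above can be sketched by taking a cognitive hierarchy recursion and replacing the exact best response with a logit response at a shared precision. In this sketch the level proportions are passed in directly as `level_probs` (with a non-zero share of level-0 agents), and the function names and example payoffs are illustrative assumptions rather than the authors' implementation.

```python
import math

def logit(expected_utils, lam):
    """Logit quantal response with precision lam."""
    top = max(expected_utils)
    weights = [math.exp(lam * (u - top)) for u in expected_utils]
    total = sum(weights)
    return [w / total for w in weights]

def qch(payoffs, level_probs, lam):
    """payoffs[i][a][b]: player i's payoff; level_probs[m]: share of level-m agents.

    Assumes level_probs[0] > 0 so that the truncated distribution is well defined.
    """
    sizes = [len(payoffs[0]), len(payoffs[1])]
    # strategies[i][m] is the strategy of a level-m agent in the role of player i.
    strategies = [[[1.0 / n] * n] for n in sizes]  # level 0 plays uniformly
    for m in range(1, len(level_probs)):
        norm = sum(level_probs[:m])  # truncate to levels 0..m-1 and renormalize
        new = []
        for i in (0, 1):
            j = 1 - i
            # Belief built from the true strategies of lower levels, each of which
            # was computed with the same (true) precision lam.
            belief = [sum(level_probs[l] * strategies[j][l][b] for l in range(m)) / norm
                      for b in range(sizes[j])]
            eu = [sum(p * u for p, u in zip(belief, row)) for row in payoffs[i]]
            new.append(logit(eu, lam))
        for i in (0, 1):
            strategies[i].append(new[i])
    return [[sum(level_probs[m] * strategies[i][m][a] for m in range(len(level_probs)))
             for a in range(sizes[i])] for i in (0, 1)]

# Hypothetical 2x2 game and level proportions (not from the paper).
GAME = [[[5, 0], [1, 3]], [[2, 1], [0, 4]]]
row_pred, col_pred = qch(GAME, level_probs=(0.3, 0.4, 0.3), lam=2.0)
print(row_pred, col_pred)
```

Compared with the QLk sketch, there is only one precision parameter and no separate belief parameter, which is what makes QCH conceptually simpler.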

Figure 5 shows the comparison between the prediction performance of QCH and QLk. QCH actually performed considerably better than QLk on the ALL6 and SW95 datasets. Otherwise its performance was similar to QLk’s, and was never worse by more than a factor of 10. This suggests that QLk’s added flexibility in terms of heterogeneous beliefs and precisions did not lead to substantially better predictions.

Figure 5: Average likelihood ratios between predictions of modified and initial models, with 95% confidence intervals.[1]

### Incremental Contribution of Paper 1[1] over Paper 2[2]

While the existing behavioral game theory literature (including paper 2[2]) broadly focuses on explaining (fitting) in-sample behavior rather than predicting out-of-sample behavior, paper 1 tries to answer the question of which model to use to predict human behavior. It is the first study to address the question of which of the QRE, level-k, cognitive hierarchy, and quantal level-k behavioral models is best suited to predicting unseen human play of normal-form games. The study explored the prediction performance of these models, along with several modifications. Overall, the results indicate that the QLk model had substantially better prediction performance than any other model from the literature. The authors also designed a hybrid model called QCH, which is a novel and conceptually simpler modification of QLk. QCH performed better than all of the behavioral models, including QLk, in most of the datasets.

### My thoughts and Future Directions

It is important to note that the meta-analysis performed on these datasets covers only simultaneous-move normal-form games. The study therefore excludes two important classes of games: extensive-form games and repeated games. An interesting direction for future work would be to evaluate models that have been extended to account for learning and non-initial play, including repeated-game and extensive-form game settings.

## Annotated Bibliography

1. Beyond Equilibrium: Predicting Human Behavior in Normal Form Games. J. Wright, K. Leyton-Brown. Conference of the Association for the Advancement of Artificial Intelligence (AAAI-10), 2010. From http://www.cs.ubc.ca/~kevinlb/pub.php?u=2010-AAAI-Beyond-Equilibrium.pdf
2. Camerer, C.; Ho, T.; and Chong, J. 2001. Behavioral game theory: Thinking, learning, and teaching. Nobel Symposium on Behavioral and Experimental Economics. From http://people.hss.caltech.edu/~camerer/Ch08Pg_119-179.pdf
3. Game Theory CPSC522 wiki from https://en.wikipedia.org/wiki/Game_Theory
4. McKelvey, R., and Palfrey, T. 1995. Quantal response equilibria for normal form games. GEB 10(1):6–38.
5. Costa-Gomes, M.; Crawford, V.; and Broseta, B. 2001. Cognition and behavior in normal-form games: An experimental study. Econometrica 69(5):1193–1235.
6. Camerer, C.; Ho, T.; and Chong, J. 2004. A cognitive hierarchy model of games. QJE 119(3):861–898.
7. Stahl, D., and Wilson, P. 1994. Experimental evidence on players’ models of other players. JEBO 25(3):309–327.
8. Stahl, D., and Wilson, P. 1995. On players’ models of other players: Theory and experimental evidence. GEB 10(1):218–254.
9. Costa-Gomes, M.; Crawford, V.; and Broseta, B. 1998. Cognition and behavior in normal form games: an experimental study. Discussion paper 98-22, UCSD
10. Goeree, J. K., and Holt, C. A. 2001. Ten little treasures of game theory and ten intuitive contradictions. AER 91(5):1402–1422.
11. Cooper, D., and Van Huyck, J. 2003. Evidence on the equivalence of the strategic and extensive form representation of games. JET 110(2):290–308.
12. Rogers, B. W.; Palfrey, T. R.; and Camerer, C. F. 2009. Heterogeneous quantal response equilibrium and cognitive hierarchies. JET 144(4):1440–1467.
13. Nelder, J. A., and Mead, R. 1965. A simplex method for function minimization. Computer Journal 7(4):308–313.
14. Chong, J.; Camerer, C.; and Ho, T. 2005. Cognitive hierarchy: A limited thinking theory in games. Experimental Business Research, Vol. III: Marketing, accounting and cognitive perspectives 203–228.
15. Crawford, V., and Iriberri, N. 2007. Fatal attraction: Salience, naivete, and sophistication in experimental “hide-and-seek” games. AER 97(5):1731–1750.