Course:CPSC522/Minimax Regret Preference Elicitation for Risky Prospects

From UBC Wiki
Jump to: navigation, search

Title

In this page, we introduce how Hines and Larson[1] try to use cumulative prospect theory[2] to do preference elicitation with the minimax regret decision criterion which is introduced by Wang and Boutilier[3]. Indeed, the incremental papers which this page mainly discusses are the following:

Paper 1: Incremental utility elicitation with the minimax regret decision criterion. By T. Wang and C. Boutilier. 2003.[3]

Paper 2: Preference elicitation for risky prospects. By Greg Hines and Kate Larson. 2010.[1]

Principal Author: Ali Mohammad Mehr
Collaborators:

Abstract

Due to the size and complexity of utility functions, it is often not feasible to try to make full elicitation, requiring that decisions be made without full utility information. Adopting the minimax regret criterion for decision making with incomplete utility information, we will try to use cumulative prospect theory as an alternative to expected utility theory to make better decisions for humans in risky settings. Using the space of standard gambling queries, we will use myopically optimal queries to make decisions.

Builds on

Basic understanding of probabilities; basic understanding of utility theory, preferences, and lotteries; basic understanding of criteria in elicitation of factored utilities.

Related Pages

An overview of elicitation of factored utilities can be found in here. The page includes an overview of criteria used in elicitation of utilities.

An overview of utility theory, preferences, and lotteries can be found in the slides of the book Artificial Intelligence: Foundations of Computational Agents, second edition, Cambridge University Press 2017 by David Poole and Alan Mackworth in here.

Introduction

In many areas of artificial intelligence, we are interested in making decisions on behalf of users. Tailoring decisions to a specific user requires knowledge of the user’s utility function, information that generally cannot be built into software in advance. In many different areas, e.g. travel planning and resource allocation, assessing a user's utility function is an integral part of interactive decision making. To do this assessment, we need to get information about the user’s utility values.

A common method of querying a user is a standard gamble query (SGQ) which asks the user to decide between two outcomes. From the user’s response, we are able to infer a constraint about their utility function. Such constraints give the set of all possible utility functions. If we have access to a probability distribution of utility functions for a population, we should choose a decision that maximizes the expected expected utility ; i.e. the expected utility of the decision according to each possible utility function multiplied by the probability of the user having that utility function[4].

In making decision with incomplete information, especially when a probability distribution over users' utility functions, the principle of maximum expected utility cannot be used directly. Therefore, different criteria have been devised to make decisions with the gathered information about the user's utility function. One such criterion is the minimax regret decision criterion, which is adopted by the two papers that we will discuss. Indeed, the papers adopt a distribution-free model, working with linear constraints on utility functions. Using the minimax regret criterion, the papers generate decisions whose quality (difference from optimal) can be bounded in the face of incomplete utility information. These bounds can be traded off against query cost or minimum error requirements to guide the query process.

In making decisions of behalf of users, We are often specifically interested in cases where these decisions involve a degree of risk. SGQs assume that users follow the predictions of expected utility theory (EUT). There is strong empirical evidence, however, that people systematically break such predictions[5], especially in risky situations. Cumulative prospect theory (CPT) is a theory that is better able to explain preferences between risky choices[2].

In the first paper, Wang and Boutilier[3] show how decisions of minimax regret can be computed using simple linear programs (LPs) if utility constraints are linear. They also discuss incremental elicitation, focusing on standard gamble queries (SGQs), the responses to which impose one-dimensional, linear utility constraints that can be easily handled using LPs. In the end, they develop several myopic elicitation strategies. Having claimed that humans actually do not follow the predictions of expected utility theory, in the second paper, Hines and Larson[1] try to incorporate cumulative prospect theory in the elicitation strategies developed by Wang and Boutilier[3], especially the myopic elicitation strategies for picking which SGQ to ask next, to make better decisions for humans in risky situations.

Before diving into the contributions of each of the papers, we will first introduce the problem definition to define our notation and introduce cumulative prospect theory (CPT).

Problem Definition

In a preference elicitation problem, we have a decision scenario with a set of possible outcomes and a user with a private utility function . We will assume that the user's utility function is always scaled so that and . There exists a finite set of decisions which we can make on behalf of the user. Each decision is a prospect over . The prospect is the probability distribution over the set of outcomes .

Expected utility theory states that the overall expected utility of a decision is

,

and the best decision, ,is the one that maximizes the expected utility of the user. The notation used on this page is based on the paper by Hines and Larson[1].

Cumulative Prospect Theory

Figure 1. An example of a probability weighting function which is commonly used in cumulative prospect theory.

There has been considerable empirical evidence found that people may systematically violate the axioms of EUT[5]. While most people prefer the guaranteed outcome of $3,000 over the prospect [0.2, $0; 0.8, $4, 000], people tend to prefer the prospect [0.8, $0, 0.2, $4, 000] over the prospect [0.75, $0; 0.25, $3, 000]. This violates EUT because the ratio of the probabilities for the non-zero outcomes are the same in both prospects. One of the famous alternative models that has been able to better explain and predict human decisions is cumulative prospect theory (CPT)[2]. Two key features of human behaviour that CPT captures are loss aversion and probability weighting. Probability weighting distorts the probabilities people consider when judging the utility of a risky prospect. CPT models probability weighting using a weighting function, w(p), an example of which is shown in Figure 1. In CPT, the weight of a probability is also dependent on the rank of the respective outcome. For the prospect where , the overall weight, , for probability is

and the overall utility of a prospect is, .

For example, the utility of a prospect is .

Many preference elicitation approaches have been proposed to be used with CPT. Hines and Larson[1] are specifically interested in approaches that do not choose specific weighting functions for the users.

Contributions of the papers

Minimax Criterion by Paper 1[3]

After defining a decision scenario, expected utility, and maximum expected utility, Wang and Boutilier introduce how we model the uncertainty of the decision system about the utility function of the user. The authors say that this modeling is done by assuming a set of linear constraints over the set of possible utility functions . Indeed, they are assuming that constraints over unknown utility values are linear. We use to denote the subset of satisfying . Adopting the minimax regret decision criterion, Wang and Boutilier define the optimal decision with respect to utility vector to be

.

If the utility function were known, would be the correct decision. The regret of the decision with respect to is

.

Then, Wang and Boutilier define maximum regret of decision with respect to as:

.

Then, the decision with minimax regret with respect to will be:

.

Finally, the minimax regret level of feasible utility set will be defined as,

.

Considering that we only know that the user's utility function is in the set , is a reasonable decision because choosing minimizes the worst case loss with respect to any possible realizations of the utility function in the set . If is defined by a set of linear constraints, then as well as can be computed using a set of linear programs[6]. In the final step, Wang and Boutilier introduce the pairwise max regret for any pair of decisions and as:

.

The pairwise max regret can be computed with an LP (i.e., maximizing a linear function of the unknown outcome utilities subject to ). Solving such LPs, one for each ordered pair of actions, allows us to identify the decision that achieves minimax regret and to determine the minimax regret level .

Incremental Elicitation

Wang and Boutilier propose heuristics, known as myopic elicitation strategies, for picking which SGQ to ask at the next step. Their most successful strategy is the maximum expected improvement (MEI) strategy. MEI estimates the expected improvement of a query as

,

where is the resulting minimax regret if the user responds yes to and is similarly defined. The formula, basically, computes the difference of current minimax regret level, i.e. , and the expected minimax regret level after receiving the answer to , which is computed using .

Preference Elicitation for Risky Prospects by Paper 2[1]

Following the works of Wang and Boutilier, Hines and Larson first introduce to be the set of outcomes and redefine to be a finite subset of outcomes of . They introduce as a set of constraints for as follows:

.

where is the minimum possible value for and is similarly defined. Initially and for all . Hines and Larson redefine the set of all constraints, , to include all constraint to ensure monotonicity of . Monotonicity of is important because in CPT, the weight of a probability is also dependent on the rank of the respective outcome.

The set is the set of outcomes in for which we know the exact utilities. Initially, . For , it is convenient to define as the outcome with the utility and is defined in the same manner.

Queries

Hines and Larson define two types of queries: configuration queries and outcome queries. Configuration queries provide information about the user’s probability weighting function. Outcome queries obtain information about the user’s utility function. The preference elicitation process works by initially asking only configuration queries. After enough information has been gathered about the user’s probability weighting function, the elicitation system starts to ask only outcome queries. Only outcome queries can reduce the regret.

Configuration Queries

Configuration queries are used to solve

.

To find that makes the above equation hold, Hines and Larson introduce a query, which we will not go in details here. Each time the elicitation system makes the mentioned query, it's estimate of the , the solution to the above equation, gets more accurate. The authors claim that in their experiments repeating the query 10 times gave an accurate enough value for for their approach to always work.

Outcome Queries

Following Wang and Boutilier's work that introduced a method for the elicitation system to make a reasonable decision when a set is known to include the user's utility function, Hines and Larson introduce outcome queries to update the constraints in to make decisions with less maximum regret. The queries are designed so that the user’s probability weighting can be factored out of the user’s response.

For any two outcomes and in with known utilities, i.e., , such that , the outcome query asks the user to pick an outcome in such that

This indifference implies[1],

.

Since we know and , we can add into and update the constraints in as applicable. e.g., for the outcome ,, if , then we update to be .Finally, the new minimax regret is calculated. If the regret is below some desired threshold, the process is terminated.

Incremental Elicitation

Hines and Larson introduce expected minimax regret (EMR) as a heuristic to choose be best query to ask in the next step. Again by following Wang and Boutilier's work, the authors use the pairwise max regret (PMR) to decide which question is better to ask. The authors approximate PMR between decisions and as

.

Then, they show that for any two decisions and , the expected change to the PMR between those two decisions is[1]

.

For each possible query, the authors calculate the overall . This allows to estimate the expected PMR resulting from any query. Then, the authors choose the query which gives the lowest expected PMR.

Experiments by paper 2

First, Hines and Larson study the performance of previous preference elicitation models in a CPT setting by implementing Wang and Boutilier’s minimax regret model. The experiment includes a simple decision system with 4 possible outcomes. They simulate a user that has a specific utility function and a specific weighting function, both derived from commonly used utility functions and weighting functions in the literature. The elicitation process was run until the minimax regret was at most 0.01, at which point the minimax decision was selected and evaluated with the user's actual utility function; thus determining the actual regret of the decision. The results show that there is a high possibility that the decision made by the Wang and Boutilier’s minimax regret model for a user with a weighting function is has a high regret.

Secondly, the authors generated decision scenarios with 8 outcomes and 27 decisions. In the incremental elicitation, for each new decision, 50 candidate decisions were chosen uniformly at random. The first candidate decision to achieve a minimax regret of 0.5 with respect to all the decisions already picked was chosen as the next decision. They plot the minimax regret for different incremental elicitation methods including EMR and random decision selection as baseline. They show that with the same number of queries, the minimax regret for the EMR method is much less that the other baseline methods and previous methods in the literature.

My thoughts on the contributions

Based on Hines and Larson experiment on the performance of previous preference elicitation models in a CPT setting, I believe their model is actually useful for the CPT setting and the previous models cannot be used to make decisions with low minimax regret.

I believe there are a few problems with Hines and Larson's contribution. Firstly, I think that standard gamble queries are not a query type that can be used on a wide range of people. The amount of energy and contemplation needed to decide on a standard gambling question is quite high. Secondly, Hines and Larson's approach includes asking the user to pick an outcome in such that . My thought is that this is quite a difficult question to be asked from a user because the user needs to come up with new outcomes, which is a very difficult task. Therefore, it seems like these queries are not very useful in real world situations where the outcome space is not covered by ordered numbers. This brings me to another issue in Hines and Larson's paper which is the fact that the authors do not actually use this system in a real world environment or give their thoughts about where their decision system can be used. It seems like the authors just try to combine two different aspects of preference elicitation without any real world incentive.

Annotated Bibliography

  1. 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 Greg Hines and Kate Larson. 2010. Preference elicitation for risky prospects. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1 (AAMAS '10), Vol. 1. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 889-896.
  2. 2.0 2.1 2.2 A. Tversky and D. Kahneman. Advances in prospecttheory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4):297–323, October 1992.
  3. 3.0 3.1 3.2 3.3 3.4 T. Wang and C. Boutilier. Incremental utility elicitation with the minimax regret decision criterion. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pages 309–318, Acapulco, Mexico, 2003
  4. U. Chajewwska, D. Koller, and R. Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 363–369, Austin, TX, 2000.
  5. 5.0 5.1 C. Starmer. Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature, 38:332–382, 2000.
  6. Craig Boutilier, Fahiem Bacchus, and Ronen I. Brafman.UCP-Networks: A directed graphical representationof conditional utilities. In Proceedings of the SeventeenthConference on Uncertainty in Artificial Intelligence,pages 56–64, Seattle, 2001.

To Add

Put links and content here to be added. This does not need to be organized, and will not be graded as part of the page. If you find something that might be useful for a page, feel free to put it here.


Some rights reserved
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0
By-nc-sa-small-transparent.png