# Game Theory: Story of Thinking (Part 2)

## Pay-off Function, Stochastic Outcomes and Decision Making

# Introduction

In this blog, we will discuss thinking, the inevitable process that precedes any decision. Every decision problem involves a player, alternatives to choose from, consequences of those choices, and preferences over the consequences.

- *Actions*: the alternatives from which the player can choose, {a, b}
- *Outcomes*: the consequences of the player’s actions, {x, y}
- *Preferences*: how the player ranks the set of possible outcomes

## Example !

Let’s say you have two choices for your dessert: milkshake and ice cream. We write the action set as A = {milkshake, ice cream} and denote the set of outcomes by X = {x, y}, where x denotes drinking the milkshake and y denotes eating the ice cream.

Suppose you prefer drinking the milkshake to eating the ice cream. Then we write x ≿ y, which should be read as “x is at least as good as y.” The symbol “≿” is called a preference relation; note that the “∼” part denotes indifference and must not be confused with negation.

Here the actions are simple and easy to evaluate, but in the real world we face complex stochastic or continuous choices. We will discuss these in the coming sections.

# Assumptions

We will make two important assumptions about the player’s ability to think through this decision problem.

- Given any two outcomes, the player can rank one over the other, so that either x ≿ y or y ≿ x. This is called the “Completeness Axiom”.
- The preference relation ≿ is transitive: for any three outcomes x, y, z ∈ X, if x ≿ y and y ≿ z then x ≿ z. Simply put, if you prefer *ice cream over milkshake* and *milkshake over donut*, then you prefer *ice cream over donut*. This is called the “Transitivity Axiom”.

Together, these axioms guarantee that the player can always single out a most-preferred outcome among all possible outcomes. This way we can ensure that the player behaves consistently.
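As a quick sanity check, here is a minimal Python sketch of the two axioms. The outcomes and scores are hypothetical; any relation built from a numeric score satisfies both axioms by construction:

```python
from itertools import product

# Hypothetical outcomes ranked by a numeric score.
score = {"ice cream": 3, "milkshake": 2, "donut": 1}
outcomes = list(score)
prefers = lambda x, y: score[x] >= score[y]   # "x is at least as good as y"

def is_complete(outcomes, prefers):
    """Completeness: every pair of outcomes is ranked at least one way."""
    return all(prefers(x, y) or prefers(y, x)
               for x, y in product(outcomes, repeat=2))

def is_transitive(outcomes, prefers):
    """Transitivity: whenever x beats y and y beats z, x must beat z."""
    return all(prefers(x, z)
               for x, y, z in product(outcomes, repeat=3)
               if prefers(x, y) and prefers(y, z))

print(is_complete(outcomes, prefers))   # True
print(is_transitive(outcomes, prefers)) # True
```

Relations that violate transitivity (for example, rock-paper-scissors style cycles) would fail the second check, which is exactly why the axiom rules out inconsistent behavior.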

# Payoff Function

We saw how to represent preferences over outcomes with the relation ≿. But imagine having thousands of actions and outcomes; we cannot enumerate pairwise preferences for all of them. Hence the need for a payoff function: it quantifies the outcomes and serves as a perfect proxy for preferences.

If A is the action set, every action a ∈ A yields a profit π(a). Then we can simply look at the profit from each action and choose the one that maximizes profit.

*Payoff Function*: **V**: X → R represents the preference relation ≿ if, for any pair x, y ∈ X, **V**(x) ≥ **V**(y) if and only if x ≿ y.
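To make the definition concrete, here is a small sketch for the milkshake/ice-cream example; the payoff numbers are made up, and any V with V(milkshake) ≥ V(ice cream) would represent the same preferences:

```python
# Preferences from the example: milkshake is at least as good as ice cream.
prefers = {("milkshake", "milkshake"): True,
           ("milkshake", "ice cream"): True,
           ("ice cream", "ice cream"): True,
           ("ice cream", "milkshake"): False}

# Hypothetical payoff values standing in for V.
V = {"milkshake": 2.0, "ice cream": 1.0}

# V represents the relation iff V(x) >= V(y) exactly when x is preferred to y.
outcomes = ["milkshake", "ice cream"]
assert all((V[x] >= V[y]) == prefers[(x, y)]
           for x in outcomes for y in outcomes)

# With a payoff function, choosing reduces to taking an argmax.
print(max(V, key=V.get))  # milkshake
```

This is the whole point of the proxy: instead of consulting a table of pairwise rankings, the player just maximizes a number.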

## Example !

Let’s discuss an example with a continuous action space: how much cake to eat. There is a one-kg cake, so your action set is A = [0, 1], where a ∈ A is the fraction of the cake you eat. Your preferences are represented by the payoff function v(a) = 2a − 4a² over actions.

To maximize your payoff, take the derivative and set it to zero. We obtain

v′(a) = 2 − 8a = 0, so a = 0.25.

This implies that in order to maximize your payoff, you should eat 250 grams of the cake.
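The calculus can be double-checked with a brute-force search over the action set, assuming the payoff v(a) = 2a − 4a² (the function whose derivative is the 2 − 8a used above):

```python
# Payoff over the continuous action set A = [0, 1].
def v(a):
    return 2 * a - 4 * a ** 2

# Grid search: evaluate v on a fine grid and take the argmax.
grid = [i / 10000 for i in range(10001)]
best = max(grid, key=v)
print(best)  # 0.25  -> eat 250 g of the 1 kg cake
```

The grid maximizer agrees with the first-order condition, which is reassuring: for a smooth concave payoff, setting the derivative to zero and brute-force search find the same answer.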

# Stochastic Outcomes

The outcomes of a player’s actions are not always certain; they can be random (stochastic).

Probabilities help in this context: they let the player compare uncertain consequences in a meaningful way. We can use a decision tree to describe a decision problem that involves uncertainty.

## Example !

Take a decision problem:

Let’s say the player has two choices, ‘g’ and ‘s’, so the action set is {g, s}. If the player chooses action **g**, they get a payoff of 10 units with probability 0.75 and 0 units with probability 0.25. If the player chooses action **s**, the same payoffs of 10 and 0 arrive with probabilities 0.5 and 0.5.

A simple and intuitive decision tree captures this problem:

As the example shows, a simple decision tree can completely define a decision problem.
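The tree can also be written as plain data, which becomes handy once trees grow large. Here each action maps to a lottery, a list of (probability, payoff) pairs, using the numbers from the g/s example:

```python
# The g/s decision problem as data: action -> lottery of (prob, payoff) pairs.
tree = {
    "g": [(0.75, 10), (0.25, 0)],
    "s": [(0.50, 10), (0.50, 0)],
}

# Sanity check: each lottery's probabilities must sum to 1.
for action, lottery in tree.items():
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9

print(sorted(tree))  # ['g', 's']
```

A nested version of the same structure (lotteries whose payoffs are themselves lotteries) would describe the multi-stage tree discussed next.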

Next we will see how to make decisions when outcomes are stochastic or uncertain. Before that, let’s get introduced to a similar but more complex decision problem, shown below.

Here the randomness unfolds over time: for a given action, the distribution of payoffs can change as time passes. This decision tree depicts exactly that.

## Continuous Stochastic Outcomes

We discussed a cake example above in which the action set is continuous. Once we combine stochasticity with continuous actions, decision trees become impractical as a depiction.

Instead, we borrow concepts from statistics: *random variables*, the *PDF* (probability density function) and the *CDF* (cumulative distribution function). We will discuss these in the next section.

# Decision Making

If the outcomes are certain, decision making is simple. So let’s revisit the decision problem above, which involves random outcomes.

Intuitively, the two lotteries that follow g and s are easy to compare. Both have the same set of outcomes: a profit of 10 or a profit of 0. Choice g gives a higher chance of the profit of 10, so we would expect a rational player to choose g.

Here the comparison is obvious, but it will not always be; in general we must quantify the payoff that can be expected from an action. So let’s introduce the concept of expected payoff.

*Definition*: Let u(x) be the player’s payoff function over outcomes in X = {x₁, x₂, . . . , xₙ}, and let p = (p₁, p₂, . . . , pₙ) be a lottery over X such that pⱼ = Pr{x = xⱼ}. Then we define the player’s expected payoff from the lottery p as E[u(x)|p] = Σⱼ pⱼ·u(xⱼ) = p₁·u(x₁) + p₂·u(x₂) + . . . + pₙ·u(xₙ).
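The definition translates directly into code. In this sketch a lottery is a list of (pⱼ, xⱼ) pairs, and u defaults to the identity so payoffs equal outcomes, as in the g/s example:

```python
def expected_payoff(lottery, u=lambda x: x):
    """E[u(x)|p] = sum_j p_j * u(x_j) over a lottery of (p_j, x_j) pairs."""
    return sum(p * u(x) for p, x in lottery)

# Lotteries from the g/s decision problem above.
g = [(0.75, 10), (0.25, 0)]
s = [(0.50, 10), (0.50, 0)]

print(expected_payoff(g))  # 7.5
print(expected_payoff(s))  # 5.0
```

Passing a different u would capture attitudes toward risk, but for this example the raw payoffs are all we need.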

## Example 1

Using the above definition, let’s solve the decision problem stated earlier.

By choosing ‘**g**’, the expected payoff is

v(g) = E[u(x)|g] = 0.75·(10) + 0.25·(0) = 7.5

By choosing ‘**s**’, the expected payoff is

v(s) = E[u(x)|s] = 0.5·(10) + 0.5·(0) = 5.

The expected payoff from ‘**s**’ is 5, while the expected payoff from ‘**g**’ is 7.5, so ‘**g**’ is the best choice.

## Continuous Case

Let’s extend this evaluation to the continuous case by using the cumulative distribution function (CDF) introduced above.

*Definition*: Let *u(x)* be the player’s payoff function over outcomes in the interval *X*, with a lottery given by the cumulative distribution *F(x)* with density *f(x)*. Then we define the player’s expected payoff as E[u(x)] = ∫ₓ u(x)·f(x) dx.

Also keep in mind that the density function *f(x)* is simply the derivative of the CDF *F(x)*.
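The integral can be approximated numerically; in the sketch below the uniform density f(x) = 1 on [0, 1] and the payoff u(x) = x² are assumptions purely for illustration, chosen so the exact answer ∫ x² dx = 1/3 is easy to verify:

```python
def expected_payoff(u, f, lo, hi, n=100_000):
    """Approximate E[u(x)] = integral of u(x) * f(x) dx via the midpoint rule."""
    dx = (hi - lo) / n
    return sum(u(lo + (i + 0.5) * dx) * f(lo + (i + 0.5) * dx) * dx
               for i in range(n))

# Assumed example: x uniform on [0, 1] (f = 1), payoff u(x) = x^2.
est = expected_payoff(lambda x: x ** 2, lambda x: 1.0, 0.0, 1.0)
print(round(est, 4))  # 0.3333
```

Any density would work the same way, as long as it integrates to 1 over the interval.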

A player who understands the stochastic consequences of each of his actions will choose the action that offers the highest expected payoff.

# MBA vs No-MBA Example

Let’s illustrate maximizing expected payoff with another example, this time with a finite set of actions and outcomes. Imagine you have been working for a company and are deciding whether or not to pursue an MBA. The MBA fees and coaching cost you 10L (opportunity costs included).

- If the labor market is strong and the economy is bullish, your income from having an MBA is 32L, while your income from your current job is 12L.
- If the labor market is average and the economy is flat, your income from having an MBA is 16L, while your income from your current job is 8L.
- If the labor market is weak and the economy is bearish, your income from having an MBA is 12L, while your income from your current job is 4L.

Let’s assume the labor market will be strong with probability 0.25, average with probability 0.5, and weak with probability 0.25.

*The decision is : Should you pursue the MBA?*

Let’s illustrate this decision problem with a decision tree:

Note that if the player chooses to pursue the MBA, we subtract the cost of the degree from the income in each of the three states of nature.

Let’s calculate the expected payoff from each action: v(MBA) = 0.25·(32 − 10) + 0.5·(16 − 10) + 0.25·(12 − 10) = 9L, while v(No MBA) = 0.25·12 + 0.5·8 + 0.25·4 = 8L. Comparing these values tells us which action is preferred.
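The same calculation as a short script, using the numbers from the text (incomes in lakhs, with the 10L cost subtracted when pursuing the MBA):

```python
# State probabilities and incomes (in lakhs) from the example.
p = {"strong": 0.25, "average": 0.50, "weak": 0.25}
income_mba = {"strong": 32, "average": 16, "weak": 12}
income_job = {"strong": 12, "average": 8, "weak": 4}
cost = 10  # MBA fees plus opportunity cost

# Expected payoff of each action.
v_mba = sum(p[s] * (income_mba[s] - cost) for s in p)
v_job = sum(p[s] * income_job[s] for s in p)

print(v_mba, v_job)                        # 9.0 8.0
print("MBA" if v_mba > v_job else "No MBA")  # MBA
```

Laying the problem out as data like this makes it easy to re-run the comparison under different probability assumptions.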

By looking at the expected payoff values, a rational player would choose to pursue the MBA.

So far, only a single player has been involved in the decision problem. In the next blogs in this series, we will discuss multi-player scenarios.

Thanks for your time