Counterfactual regret minimization is a technique used in imperfect-information game AI. In perfect-information games like chess and go, each player knows the entire state of the game at all times. In imperfect-information games, each player is missing information. Imperfect-information games are important since real-life strategic applications are often more easily modeled as imperfect-information games. Applications include security models, economic models, auctions, negotiations, and more.

However, imperfect-information games are tougher to solve than perfect-information games. In a perfect-information game, each subtree’s solution is independent of the rest. You can solve the game by traversing the game tree once. In an imperfect-information game, there are many dependencies. An imperfect-information game with the equivalent state-complexity as a perfect-information game will be more difficult to solve.

Poker is the classic example of an imperfect-information game. Most recent research on imperfect-information games has focused on poker. It’s the perfect proving ground. To humans, poker is so difficult to reason about that it is most often categorized as a game of complete luck. This is wrong. In fact, heads-up limit hold’em (HU LHE) has been solved [1]. In 2017, two AIs defeated professional poker players over a statistically significant sample for the first time in heads-up no-limit hold’em (HU NLHE). This was a significant event. The version of HU NLHE played in the ACPC has 10^{164} game states [2]. To put that into perspective, there are estimated to be 10^{82} atoms in the universe. If you had one trillion computers each crunching one trillion states per second for the next one trillion years you’d still have only reached 10^{43} states. We may never truly solve a game with a state-complexity of 10^{164}.

Instead, recent AIs create strategies with approximately near-zero exploitability. We assume that poker is a two-person zero-sum game where players alternate positions. Given these assumptions, this means that a near-zero exploitability strategy can do no worse than break-even. Even if an opponent can see your cards and see the future, this is still true. If you have a set of near-zero exploitability strategies this is called a Nash equilibrium.

One minor note is that a Nash equilibrium in one sense is not an optimal strategy in poker. This is because current poker AIs treat each hand as a separate game. This is similar to a poker cash game as opposed to a tournament. This form of poker does not have a binary outcome. Instead, a Nash equilibrium sets a lower-bound on how much you can lose. It is a defensive strategy. Two players following Nash equilibrium strategies would lose money over time since poker is not truly a zero-sum game. That being said, neural networks and deep reinforcement learning approaches have had limited success while Nash equilibrium approximations have been shown to perform very well in practice.

Counterfactual regret minimization (CFR) is an algorithm that approximates a Nash equilibrium. All modern poker AIs use a variant of CFR. The simplest variant and the one discussed here is Vanilla CFR. Intuitively, CFR works by repeatedly playing against itself while minimizing regret. Regret is a numeric value describing how much you regret a decision. It can be positive, negative, or zero. A positive regret indicates that you would have rather taken a different decision. A negative regret indicates that you are happy with your decision and a zero regret indicates that you are indifferent. CFR minimizes regret over many iterations until the average strategy over all iterations converges. The average strategy is the approximated Nash equilibrium. If the average overall regret of two players is , then the strategy can be exploited by at most [3].

Regret minimization algorithms have existed for a long time. However, regret minimization algorithms work best when the game can be represented as a matrix, also known as normal-form. Imperfect-information games like poker are not easily represented this way. And when they are, regret minimization algorithms are too slow. Poker is best represented as an imperfect-information extensive-form game. This means that the game is represented as a tree structure. Each node in the tree represents either a player’s decision, a chance event, or a terminal outcome. Edges represent actions taken.

CFR was created to minimize regret for extensive-form games. It does this by introducing a new value called counterfactual regret. The algorithm minimizes a separate counterfactual regret value for every information set, where an information set is the set of nodes in the game-tree that are indistinguishable for a given player. In poker, an information set for player contains all the nodes representing the different private card combinations an opponent can hold. Minimizing counterfactual regret for each information set works because it has been proven that the average overall regret is less than or equal to the sum of the counterfactual regrets [3]. Therefore, minimizing the counterfactual regret minimizes overall regret.

It’s easiest to see CFR in action by implementing CFR on a simpler form of poker. The canonical form of simple poker is Kuhn Poker. In this form, there are three cards. Each player is dealt a single card and antes 1 chip (an ante is a forced bet at the start of every hand). There is a single round of betting. Players can check, bet 1 chip, call, or fold. Below is an image of the complete game tree. The dotted lines connect nodes that are in the same information set.

The following implementation is written in Python 3. It is not written for performance. A practical CFR implementation is resource-intensive. For that reason, CFR implementations to be used on NLHE poker variants are often written in C++. Instead, this implementation is optimized for readability. We use the NumPy library to make vector operations easier to read, although it will make the algorithm slower in this case.

```
1import numpy as np
2
3# Number of actions a player can take at a decision node.
4_N_ACTIONS = 2
5_N_CARDS = 3
```

First, we import NumPy and define constants for the number of cards and number of possible actions. In our algorithm each player only has two possible actions: check and bet. This simplifies our code. This is possible because a fold can be treated as a check, and a call can be treated as a bet without loss of generality.

```
8def main():
9 """
10 Run iterations of counterfactual regret minimization algorithm.
11 """
12 i_map = {} # map of information sets
13 n_iterations = 10000
14 expected_game_value = 0
15
16 for _ in range(n_iterations):
17 expected_game_value += cfr(i_map)
18 for _, v in i_map.items():
19 v.next_strategy()
20
21 expected_game_value /= n_iterations
22
23 display_results(expected_game_value, i_map)
```

`main()`

is our entry point to the program. `i_map`

defined on line 12 is a dictionary, or hash map data structure, that contains information set objects keyed by a string. Each iteration of `cfr()`

returns the expected game value for that iteration. We are able to approximate the true game value by averaging over all iterations. The true game value is the expected amount that a player will win while following the average strategy. On line 19 we calculate the strategy for the next iteration for each information set.

```
26def cfr(i_map, history="", card_1=-1, card_2=-1, pr_1=1, pr_2=1, pr_c=1):
27 """
28 Counterfactual regret minimization algorithm.
29
30 Parameters
31 ----------
32
33 i_map: dict
34 Dictionary of all information sets.
35 history : [{'r', 'c', 'b'}], str
36 A string representation of the game tree path we have taken.
37 Each character of the string represents a single action:
38
39 'r': random chance action
40 'c': check action
41 'b': bet action
42 card_1 : (0, 2), int
43 player A's card
44 card_2 : (0, 2), int
45 player B's card
46 pr_1 : (0, 1.0), float
47 The probability that player A reaches `history`.
48 pr_2 : (0, 1.0), float
49 The probability that player B reaches `history`.
50 pr_c: (0, 1.0), float
51 The probability contribution of chance events to reach `history`.
52 """
53 if is_chance_node(history):
54 return chance_util(i_map)
55
56 if is_terminal(history):
57 return terminal_util(history, card_1, card_2)
58
59 n = len(history)
60 is_player_1 = n % 2 == 0
61 info_set = get_info_set(i_map, card_1 if is_player_1 else card_2, history)
62
63 strategy = info_set.strategy
64 if is_player_1:
65 info_set.reach_pr += pr_1
66 else:
67 info_set.reach_pr += pr_2
68
69 # Counterfactual utility per action.
70 action_utils = np.zeros(_N_ACTIONS)
71
72 for i, action in enumerate(["c", "b"]):
73 next_history = history + action
74 if is_player_1:
75 action_utils[i] = -1 * cfr(i_map, next_history,
76 card_1, card_2,
77 pr_1 * strategy[i], pr_2, pr_c)
78 else:
79 action_utils[i] = -1 * cfr(i_map, next_history,
80 card_1, card_2,
81 pr_1, pr_2 * strategy[i], pr_c)
82
83 # Utility of information set.
84 util = sum(action_utils * strategy)
85 regrets = action_utils - util
86 if is_player_1:
87 info_set.regret_sum += pr_2 * pr_c * regrets
88 else:
89 info_set.regret_sum += pr_1 * pr_c * regrets
90
91 return util
```

`cfr()`

contains the bulk of the logic. The function recursively performs a depth-first traverse across the game tree. In Vanilla CFR we traverse the entire game tree every iteration. `cfr()`

performs different tasks depending on if the node is a chance node, a terminal node, or a decision node. It takes seven parameters.

`i_map`

is the hash map of information sets.`history`

is a string that represents our current location in the game tree. Each character represents an action we have taken and an edge we have followed.- ‘r’ is a random chance event.
- ‘c’ is a check action.
- ‘b’ is a bet action.

`card_1`

is the private card of player one.`card_2`

is the private card of player two.`pr_1`

is player one’s contribution to the reach probability—the probability that we’ve reached`history`

.`pr_2`

is player two’s contribution to the reach probability.`pr_c`

is the contribution of chance events to the reach probability.

At line 53, we check if we are at a chance node with `is_chance_node()`

. If so, we call `chance_util()`

.

```
94def is_chance_node(history):
95 """
96 Determine if we are at a chance node based on tree history.
97 """
98 return history == ""
99
100
101def chance_util(i_map):
102 expected_value = 0
103 n_possibilities = 6
104 for i in range(_N_CARDS):
105 for j in range(_N_CARDS):
106 if i != j:
107 expected_value += cfr(i_map, "rr", i, j, 1, 1, 1/n_possibilities)
108 return expected_value/n_possibilities
```

`is_chance_node()`

determines if we are at a chance node by checking for an empty history. This works because in Kuhn Poker chance events only occur at the start of the game. If `is_chance_node()`

returns true then `chance_util()`

enumerates all possible combinations of chance nodes. There are six total possible combinations
. For each possibility, we recursively call `cfr()`

. The next history is `"rr"`

to represent two random chance actions. Neither player has taken any actions, so their reach probabilities are `1`

. Each chance event has a uniformly random probability of occurring of `1/n_possibilities`

. The expected value over all possibilities is just the sum of the utility of each possibility divided by the number of possibilities.

At line 56 we check if we have reached a terminal node of the game tree with `is_terminal()`

.

```
112def is_terminal(history):
113 """
114 Returns true if the history is a terminal history.
115 """
116 possibilities = {"rrcc": True, "rrcbc": True,
117 "rrcbb": True, "rrbc": True, "rrbb": True}
118 return history in possibilities
119
120
121def terminal_util(history, card_1, card_2):
122 """
123 Returns the utility of a terminal history.
124 """
125 n = len(history)
126 card_player = card_1 if n % 2 == 0 else card_2
127 card_opponent = card_2 if n % 2 == 0 else card_1
128
129 if history == "rrcbc" or history == "rrbc":
130 # Last player folded. The current player wins.
131 return 1
132 elif history == "rrcc":
133 # Showdown with no bets
134 return 1 if card_player > card_opponent else -1
135
136 # Showdown with 1 bet
137 assert(history == "rrcbb" or history == "rrbb")
138 return 2 if card_player > card_opponent else -2
```

`is_terminal()`

checks if `history`

is in the predefined set of known terminal histories. If `True`

, `terminal_util()`

returns the utility of the current player. Since players alternate turns, the current player is the player one if the number of actions in `history`

is even and player two otherwise. To calculate the utility of a terminal history there are three cases.

- If the last player folded, the current player wins 1 chip.
- If both players check during the betting round then players showdown with a pot of 2. The player with the higher card wins 1 chip.
- The players showdown with a pot of 4. The player with the higher card wins 2 chips.

If we reach line 59 we are in an information set. On line 60 we check who’s turn it is. On line 61 we retrieve the information set from `i_map`

using `get_info_set()`

.

```
141def card_str(card):
142 if card == 0:
143 return "J"
144 elif card == 1:
145 return "Q"
146 return "K"
147
148
149def get_info_set(i_map, card, history):
150 """
151 Retrieve information set from dictionary.
152 """
153 key = card_str(card) + " " + history
154 info_set = None
155
156 if key not in i_map:
157 info_set = InformationSet(key)
158 i_map[key] = info_set
159 return info_set
160
161 return i_map[key]
162
163
164class InformationSet():
165 def __init__(self, key):
166 self.key = key
167 self.regret_sum = np.zeros(_N_ACTIONS)
168 self.strategy_sum = np.zeros(_N_ACTIONS)
169 self.strategy = np.repeat(1/_N_ACTIONS, _N_ACTIONS)
170 self.reach_pr = 0
171
172 def next_strategy(self):
173 self.strategy_sum += self.reach_pr * self.strategy
174 self.strategy = self.calc_strategy(self.reach_pr)
175 self.reach_pr = 0
176
177 def calc_strategy(self, pr):
178 """
179 Calculate current strategy from the sum of regret.
180
181 ---
182 Parameters
183
184 pr: (0.0, 1.0), float
185 The probability that this information set has been reached.
186 """
187 strategy = self.make_positive(self.regret_sum)
188 total = sum(strategy)
189 if total > 0:
190 strategy = strategy / total
191 else:
192 n = _N_ACTIONS
193 strategy = np.repeat(1/n, n)
194
195 return strategy
196
197 def get_average_strategy(self):
198 """
199 Calculate average strategy over all iterations. This is the
200 Nash equilibrium strategy.
201 """
202 total = sum(self.strategy_sum)
203 if total > 0:
204 strategy = self.strategy_sum / total
205
206 # Purify
207 strategy = np.where(strategy < 0.001, 0, strategy)
208
209 # Re-normalize
210 total = sum(strategy)
211 strategy /= total
212
213 return strategy
214
215 n = _N_ACTIONS
216 return np.repeat(1/n, n)
217
218 def make_positive(self, x):
219 return np.where(x > 0, x, 0)
220
221 def __str__(self):
222 strategies = ['{:03.2f}'.format(x)
223 for x in self.get_average_strategy()]
224 return '{} {}'.format(self.key.ljust(6), strategies)
```

`i_map`

’s keys are strings created by the concatenation of a player’s private card and the game tree history. We represent information sets as an instance of the `InformationSet`

class. `get_info_set()`

generates the key and retrieves the information set. If the information set is not already in the map then we create a new instance and insert it.

In Vanilla CFR each information set is visited multiple times each iteration. `InformationSet`

stores `self.regret_sum`

, `self.strategy_sum`

, `self.strategy`

, and `self.reach_pr`

. `self.regret_sum`

, `self.strategy_sum`

, and `self.strategy`

are arrays indexed by an action. In our algorithm check is index `0`

and bet is index `1`

. `self.regret_sum`

is the sum of counterfactual regrets for each action over all visits. `self.strategy_sum`

is the sum of each visit’s strategy multiplied by the information set player’s reach probability. `self.regret_sum`

is used to calculate the next strategy to try. `self.strategy_sum`

is used to calculate the average strategy. `self.strategy`

is the strategy for the current iteration and `self.reach_pr`

accumulates the probability of reaching an information set. As more iterations are performed, average regret will minimize and the average strategy will converge to a Nash equilibrium.

At the end of every iteration the `self.next_strategy()`

updates `self.strategy_sum`

, `self.strategy`

, and resets `self.reach_pr`

. The next strategy is generated in the `calc_strategy()`

method. `calc_strategy()`

takes the current information set player’s reach probability as an argument. Intuitively, `calc_strategy()`

chooses a strategy proportional to the positive elements of `self.regret_sum`

. Any action with negative regret is ignored in the next strategy. If the sum of all elements of `self.regret_sum`

is non-positive then the next strategy is to uniformly choose any action at random.

On line 63 we retrieve the information set’s current strategy. This strategy was calculated at the end of the previous iteration. On lines 64-67 we add the player’s reach probabililty to the information set’s reach probability. The information set’s reach probability is the sum of the reach probabilities of all histories in the information set. From lines 72-81 we recursively call `cfr()`

for each possible action in the information set. We store the result of each call in an array of expected utilities. The history for the next recursive call is created by appending the next action to `history`

. Each player chooses action
with probability `strategy[i]`

. This changes the current player’s reach probability. We multiply the expected utility from `cfr()`

by `-1`

because `cfr()`

returns the utility for the next turn. In a zero-sum two-player game a player’s utility is the opposite of the other player.

On line 84 we calculate the expected utility over all actions. On line 85 we calculate the regret for each action by subtracting `util`

from each element of `action_utils`

. On lines 87 and 89 we calculate counterfactual regret. The counterfactual regret is added to the information set’s `self.regret_sum`

. Finally, we return the history’s expected utility.

After all iterations are finished `main()`

will call `display_results()`

.

```
227def display_results(ev, i_map):
228 print('player 1 expected value: {}'.format(ev))
229 print('player 2 expected value: {}'.format(-1 * ev))
230
231 print()
232 print('player 1 strategies:')
233 sorted_items = sorted(i_map.items(), key=lambda x: x[0])
234 for _, v in filter(lambda x: len(x[0]) % 2 == 0, sorted_items):
235 print(v)
236 print()
237 print('player 2 strategies:')
238 for _, v in filter(lambda x: len(x[0]) % 2 == 1, sorted_items):
239 print(v)
```

`display_results()`

prints each player’s information sets using the `__str__()`

method from the `InformationSet`

class. `__str__()`

prints the information set key followed by using `self.get_average_strategy()`

to print the average strategy. `self.average_strategy()`

calculates the average strategy from `self.strategy_sum`

and sets near-zero probabilities to zero.

`$ python cfr_vanilla.py`

Running the program for 10,000 iterations produces the output:

```
1player 1 expected value: -0.05666979368427769
2player 2 expected value: 0.05666979368427769
3
4player 1 strategies:
5J rr ['0.79', '0.21']
6J rrcb ['1.00', '0.00']
7K rr ['0.39', '0.61']
8K rrcb ['0.00', '1.00']
9Q rr ['1.00', '0.00']
10Q rrcb ['0.45', '0.55']
11
12player 2 strategies:
13J rrb ['1.00', '0.00']
14J rrc ['0.67', '0.33']
15K rrb ['0.00', '1.00']
16K rrc ['0.00', '1.00']
17Q rrb ['0.66', '0.34']
18Q rrc ['1.00', '0.00']
```

If you run the program for 100,000 iterations it will produce a slightly better approximation. Some CFR implementations have run for up to 6 million core hours [6]. As you approach an infinite number of iterations the overall average regret will approach zero and the average strategy will approach a Nash equilibrium.

```
1player 1 expected value: -0.055439677199254814
2player 2 expected value: 0.055439677199254814
3
4player 1 strategies:
5J rr ['0.79', '0.21']
6J rrcb ['1.00', '0.00']
7K rr ['0.38', '0.62']
8K rrcb ['0.00', '1.00']
9Q rr ['1.00', '0.00']
10Q rrcb ['0.46', '0.54']
11
12player 2 strategies:
13J rrb ['1.00', '0.00']
14J rrc ['0.66', '0.34']
15K rrb ['0.00', '1.00']
16K rrc ['0.00', '1.00']
17Q rrb ['0.67', '0.33']
18Q rrc ['1.00', '0.00']
```

The game value of player one in Kuhn Poker is . After 100,000 iterations the algorithm approximates the game value to be -0.055439677199254814.

The strategies indicate that if player one has a J her first move should be to check 79% of the time and bet 21% of the time. If player two has a K and player one checked, he should bet 100% of the time. This is just one Nash equilibrium. Kuhn Poker has infinitely many. Vanilla CFR is a deterministic algorithm, but many variants are non-deterministic. A non-deterministic CFR may find a different Nash equilibrium.

That’s Vanilla CFR. The algorithm is relatively simple. I hope that this post has helped you to understand the basics of the algorithm.

### Resources

- [1] Heads-up Limit Hold’em Poker is Solved
- [2] Measuring the Size of Large No-Limit Poker Games
- [3] Regret Minimization in Games with Incomplete Information
- [4] Effecient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
- [5] An Introduction to Counterfactual Regret Minimization
- [6] Libratus: The Superhuman AI for No-Limit Poker
- [7] Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games

### Code

```
1import numpy as np
2
3# Number of actions a player can take at a decision node.
4_N_ACTIONS = 2
5_N_CARDS = 3
6
7
8def main():
9 """
10 Run iterations of counterfactual regret minimization algorithm.
11 """
12 i_map = {} # map of information sets
13 n_iterations = 100000
14 expected_game_value = 0
15
16 for _ in range(n_iterations):
17 expected_game_value += cfr(i_map)
18 for _, v in i_map.items():
19 v.next_strategy()
20
21 expected_game_value /= n_iterations
22
23 display_results(expected_game_value, i_map)
24
25
26def cfr(i_map, history="", card_1=-1, card_2=-1, pr_1=1, pr_2=1, pr_c=1):
27 """
28 Counterfactual regret minimization algorithm.
29
30 Parameters
31 ----------
32
33 i_map: dict
34 Dictionary of all information sets.
35 history : [{'r', 'c', 'b'}], str
36 A string representation of the game tree path we have taken.
37 Each character of the string represents a single action:
38
39 'r': random chance action
40 'c': check action
41 'b': bet action
42 card_1 : (0, 2), int
43 player A's card
44 card_2 : (0, 2), int
45 player B's card
46 pr_1 : (0, 1.0), float
47 The probability that player A reaches `history`.
48 pr_2 : (0, 1.0), float
49 The probability that player B reaches `history`.
50 pr_c: (0, 1.0), float
51 The probability contribution of chance events to reach `history`.
52 """
53 if is_chance_node(history):
54 return chance_util(i_map)
55
56 if is_terminal(history):
57 return terminal_util(history, card_1, card_2)
58
59 n = len(history)
60 is_player_1 = n % 2 == 0
61 info_set = get_info_set(i_map, card_1 if is_player_1 else card_2, history)
62
63 strategy = info_set.strategy
64 if is_player_1:
65 info_set.reach_pr += pr_1
66 else:
67 info_set.reach_pr += pr_2
68
69 # Counterfactual utility per action.
70 action_utils = np.zeros(_N_ACTIONS)
71
72 for i, action in enumerate(["c", "b"]):
73 next_history = history + action
74 if is_player_1:
75 action_utils[i] = -1 * cfr(i_map, next_history,
76 card_1, card_2,
77 pr_1 * strategy[i], pr_2, pr_c)
78 else:
79 action_utils[i] = -1 * cfr(i_map, next_history,
80 card_1, card_2,
81 pr_1, pr_2 * strategy[i], pr_c)
82
83 # Utility of information set.
84 util = sum(action_utils * strategy)
85 regrets = action_utils - util
86 if is_player_1:
87 info_set.regret_sum += pr_2 * pr_c * regrets
88 else:
89 info_set.regret_sum += pr_1 * pr_c * regrets
90
91 return util
92
93
94def is_chance_node(history):
95 """
96 Determine if we are at a chance node based on tree history.
97 """
98 return history == ""
99
100
101def chance_util(i_map):
102 expected_value = 0
103 n_possibilities = 6
104 for i in range(_N_CARDS):
105 for j in range(_N_CARDS):
106 if i != j:
107 expected_value += cfr(i_map, "rr", i, j,
108 1, 1, 1/n_possibilities)
109 return expected_value/n_possibilities
110
111
112def is_terminal(history):
113 """
114 Returns true if the history is a terminal history.
115 """
116 possibilities = {"rrcc": True, "rrcbc": True,
117 "rrcbb": True, "rrbc": True, "rrbb": True}
118 return history in possibilities
119
120
121def terminal_util(history, card_1, card_2):
122 """
123 Returns the utility of a terminal history.
124 """
125 n = len(history)
126 card_player = card_1 if n % 2 == 0 else card_2
127 card_opponent = card_2 if n % 2 == 0 else card_1
128
129 if history == "rrcbc" or history == "rrbc":
130 # Last player folded. The current player wins.
131 return 1
132 elif history == "rrcc":
133 # Showdown with no bets
134 return 1 if card_player > card_opponent else -1
135
136 # Showdown with 1 bet
137 assert(history == "rrcbb" or history == "rrbb")
138 return 2 if card_player > card_opponent else -2
139
140
141def card_str(card):
142 if card == 0:
143 return "J"
144 elif card == 1:
145 return "Q"
146 return "K"
147
148
149def get_info_set(i_map, card, history):
150 """
151 Retrieve information set from dictionary.
152 """
153 key = card_str(card) + " " + history
154 info_set = None
155
156 if key not in i_map:
157 info_set = InformationSet(key)
158 i_map[key] = info_set
159 return info_set
160
161 return i_map[key]
162
163
164class InformationSet():
165 def __init__(self, key):
166 self.key = key
167 self.regret_sum = np.zeros(_N_ACTIONS)
168 self.strategy_sum = np.zeros(_N_ACTIONS)
169 self.strategy = np.repeat(1/_N_ACTIONS, _N_ACTIONS)
170 self.reach_pr = 0
171
172 def next_strategy(self):
173 self.strategy_sum += self.reach_pr * self.strategy
174 self.strategy = self.calc_strategy(self.reach_pr)
175 self.reach_pr = 0
176
177 def calc_strategy(self, pr):
178 """
179 Calculate current strategy from the sum of regret.
180
181 ---
182 Parameters
183
184 pr: (0.0, 1.0), float
185 The probability that this information set has been reached.
186 """
187 strategy = self.make_positive(self.regret_sum)
188 total = sum(strategy)
189 if total > 0:
190 strategy = strategy / total
191 else:
192 n = _N_ACTIONS
193 strategy = np.repeat(1/n, n)
194
195 return strategy
196
197 def get_average_strategy(self):
198 """
199 Calculate average strategy over all iterations. This is the
200 Nash equilibrium strategy.
201 """
202 total = sum(self.strategy_sum)
203 if total > 0:
204 strategy = self.strategy_sum / total
205
206 # Purify
207 strategy = np.where(strategy < 0.001, 0, strategy)
208
209 # Re-normalize
210 total = sum(strategy)
211 strategy /= total
212
213 return strategy
214
215 n = _N_ACTIONS
216 return np.repeat(1/n, n)
217
218 def make_positive(self, x):
219 return np.where(x > 0, x, 0)
220
221 def __str__(self):
222 strategies = ['{:03.2f}'.format(x)
223 for x in self.get_average_strategy()]
224 return '{} {}'.format(self.key.ljust(6), strategies)
225
226
227def display_results(ev, i_map):
228 print('player 1 expected value: {}'.format(ev))
229 print('player 2 expected value: {}'.format(-1 * ev))
230
231 print()
232 print('player 1 strategies:')
233 sorted_items = sorted(i_map.items(), key=lambda x: x[0])
234 for _, v in filter(lambda x: len(x[0]) % 2 == 0, sorted_items):
235 print(v)
236 print()
237 print('player 2 strategies:')
238 for _, v in filter(lambda x: len(x[0]) % 2 == 1, sorted_items):
239 print(v)
240
241
242if __name__ == "__main__":
243 main()
```