Game Playing Using MiniMax with Alpha-Beta Cutoff
Perfect Information Games
- We are considering two-player, perfect information games.
- The two players take turns and try respectively to maximize and
minimize an Evaluation function (also called utility function).
- The two players are called respectively MAX and MIN.
We assume that the MAX player makes the first move.
Evaluation function
- An Evaluation function is used to evaluate the "goodness" of a
configuration of the game.
- Unlike in A* search, where the evaluation function was a non-negative
estimate of the cost of a path from the start node to a goal passing
through the given node, here the evaluation function estimates how
good a board position is at leading to a win for one player.
- Since, in general, we have no information about how our opponent
plays, we do not model the two players separately; instead we use a
single evaluation function to describe the goodness of a board with
respect to BOTH players.
- That is, f(n) = large positive value means the board associated
with node n is good for MAX and bad for MIN.
- f(n) = large negative value means the board is bad for MAX
and good for MIN.
- f(n) near 0 means the board is a neutral position.
- f(n) = +infinity means a winning position for MAX.
- f(n) = -infinity means a winning position for MIN.
- Example of an Evaluation Function for Tic-Tac-Toe:
f(n) = [number of 3-lengths open for MAX] - [number of 3-lengths
open for MIN]
where a 3-length is a complete row, column, or diagonal, and it is
open for a player if it contains none of the opponent's marks.
- Most evaluation functions are specified as a weighted sum of
"features:" (w1 * feat1) + (w2 * feat2) + ... + (wn * featn).
- For example,
in chess some features evaluate piece placement on the board and
other features describe configurations of several pieces.
- Deep Blue
has about 6000 features in its evaluation function.
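The tic-tac-toe function above can be sketched in a few lines of Python. The board encoding (a 3x3 grid holding 'X' for MAX, 'O' for MIN, or None) is an assumption for illustration; the sketch also shows the weighted-sum scheme in its simplest form, a single feature with weight 1.

```python
# Sketch of the tic-tac-toe evaluation f(n). A 3-length is "open"
# for a player when it contains none of the opponent's marks.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]                  # rows
    + [[(r, c) for r in range(3)] for c in range(3)]                # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)

def open_lines(board, player):
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[r][c] != opponent for r, c in line))

def evaluate(board):
    # f(n) = open 3-lengths for MAX minus open 3-lengths for MIN;
    # in the weighted-sum view, a single feature with weight 1.
    return open_lines(board, 'X') - open_lines(board, 'O')
```

On an empty board f(n) = 8 - 8 = 0; after MAX takes the center, f(n) = 8 - 4 = 4, since the four 3-lengths through the center are no longer open for MIN.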
MiniMax
- The game is represented as a tree where the nodes represent the
current position and the arcs represent moves.
- Since players take turns, successive nodes represent positions where
different players must move.
- We call the nodes MAX or MIN nodes depending on which player must
move at that node.
- A game tree could be infinite.
- The leaves represent terminal positions, i.e. positions where
MAX wins (score: +infinity) or MIN wins (score: -infinity).
- The ply of a node is the number of moves needed to reach that node
(i.e. the number of arcs on the path from the root of the tree).
- The ply of a tree is the maximum of the plies of its nodes.
Imperfect Information Game
- An imperfect information game is one where we do not fully expand
the game tree, so some of the leaves represent positions that could
be analyzed further.
- For these positions, in place of a utility function value, we will have a
scoring function value. Otherwise the game trees for perfect and imperfect
information games will be treated alike.
MINIMAX Game Strategy
- The MINIMAX GAME STRATEGY for the MAX (MIN) player is to select the move
that leads to the successor node with the highest (lowest) score.
- The scores are computed starting from the leaves of the tree and
backing their values up to their predecessors in accordance with the
Minimax strategy.
- The problem with this strategy is that it explores each node in the tree.
Function MINIMAX
function MINIMAX(N) is
begin
    if N is a leaf then
        return the estimated score of this leaf
    else
        Let N1, N2, .., Nm be the successors of N;
        if N is a Min node then
            return min{MINIMAX(N1), .., MINIMAX(Nm)}
        else
            return max{MINIMAX(N1), .., MINIMAX(Nm)}
end MINIMAX;
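The pseudocode translates almost directly into executable form. Here is a minimal Python sketch, assuming a tree encoded as nested lists in which a bare number is a leaf's estimated score:

```python
def minimax(node, is_max):
    # A leaf is a bare number: its estimated score.
    if isinstance(node, (int, float)):
        return node
    # Otherwise the node is a list of successors, and the player
    # to move alternates at each level of the tree.
    scores = [minimax(child, not is_max) for child in node]
    return max(scores) if is_max else min(scores)
```

For example, with MAX to move at the root, minimax([[3, 12], [2, 8]], True) evaluates to 3: the two MIN nodes back up 3 and 2, and MAX takes the larger.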
ALPHA-BETA Pruning
ALPHA-BETA cutoff is a method for reducing the number of nodes
explored in the Minimax strategy. For the nodes it explores it
computes, in addition to the score, an alpha value and a
beta value.
ALPHA value of a node
- It is a value never greater than the true score of this node.
- Initially it is the score of that node, if the node is a leaf, otherwise
it is -infinity.
- Then at a MAX node it is set to the largest of the scores
of its successors explored up to now, and at a MIN node to
the alpha value of its predecessor.
BETA value of a node
- It is a value never smaller than the true score of this node.
- Initially it is the score of that node, if the node is a leaf, otherwise
it is +infinity.
- Then at a MIN node it is set to the smallest of the scores
of its successors explored up to now, and at a MAX node to
the beta value of its predecessor.
It Is Guaranteed That:
- The score of a node will always be no less than the alpha value
and no greater than the beta value of that node.
- As the algorithm evolves, the alpha and beta values of a node may change,
but the alpha value will never decrease, and the beta value will
never increase.
- When a node has been fully explored, its score is set to its alpha
value if it is a MAX node, and to its beta value otherwise.
Function MINIMAX-AB
function MINIMAX-AB(N, A, B) is    ;; Here A is always less than B
begin
    if N is a leaf then
        return the estimated score of this leaf
    else
        Set Alpha value of N to -infinity and
        Beta value of N to +infinity;
        if N is a Min node then
            For each successor Ni of N loop
                Let Val be MINIMAX-AB(Ni, A, Min{B, Beta value of N});
                Set Beta value of N to Min{Beta value of N, Val};
                When A >= Beta value of N then
                    Return Beta value of N
            endloop;
            Return Beta value of N;
        else
            For each successor Ni of N loop
                Let Val be MINIMAX-AB(Ni, Max{A, Alpha value of N}, B);
                Set Alpha value of N to Max{Alpha value of N, Val};
                When Alpha value of N >= B then
                    Return Alpha value of N
            endloop;
            Return Alpha value of N;
end MINIMAX-AB;
Starting the Game
- At the start of the game, the MINIMAX-AB function is called with
the following parameters:
- the root of the game tree
- -infinity (-I) as alpha value
- +infinity (+I) as beta value
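Putting MINIMAX-AB and this initial call together, here is a minimal Python sketch. It assumes the tree is encoded as nested lists in which a bare number is a leaf's estimated score:

```python
import math

def minimax_ab(node, a, b, is_max):
    # A leaf is a bare number: its estimated score.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        alpha = -math.inf
        for child in node:
            val = minimax_ab(child, max(a, alpha), b, False)
            alpha = max(alpha, val)
            if alpha >= b:       # cutoff: MIN above will never allow this line
                return alpha
        return alpha
    else:
        beta = math.inf
        for child in node:
            val = minimax_ab(child, a, min(b, beta), True)
            beta = min(beta, val)
            if a >= beta:        # cutoff: MAX above already has a better line
                return beta
        return beta

# Starting the game: root of the tree, alpha = -infinity,
# beta = +infinity, with MAX to move.
# value = minimax_ab(tree, -math.inf, math.inf, True)
```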
An Example of MiniMax with Alpha Beta Cutoff
In the game tree that you see, the root is a Max node. The scores
of the leaf nodes are presented immediately below them.
Trace of the Execution
Here is a trace of the execution of the Minimax strategy with Alpha
Beta Cutoff
NODE  TYPE  A    B    ALPHA  BETA  SCORE
A     Max   -I   +I   -I     +I
B     Min   -I   +I   -I     +I
C     Max   -I   +I   -I     +I
D     Min   -I   +I   -I     +I
E     Max   -I   +I                10
D     Min   -I   +I   -I     10
F     Max   -I   10                11
D     Min   -I   +I   -I     10    10
C     Max   -I   +I   10     +I
G     Min   10   +I   -I     +I
H     Max   10   +I                9
G     Min   10   +I   -I     9     9
C     Max   -I   +I   10     +I    10
B     Min   -I   +I   -I     10
J     Max   -I   10   -I     +I
K     Min   -I   10   -I     +I
L     Max   -I   10                14
K     Min   -I   10   -I     14
M     Max   -I   10                15
K     Min   -I   10   -I     14    14
J     Max   -I   10   14     +I    14
B     Min   -I   +I   -I     10    10
A     Max   -I   +I   10     +I
Q     Min   10   +I   -I     +I
R     Max   10   +I   -I     +I
S     Min   10   +I   -I     +I
T     Max   10   +I                15
S     Min   10   +I   -I     15
V     Max   10   +I                2
S     Min   10   +I   -I     2     2
R     Max   10   +I   2      +I
Y     Min   10   +I   -I     +I
W     Max   10   +I                4
Y     Min   10   +I   -I     4     4
R     Max   10   +I   4      +I    4
Q     Min   10   +I   -I     4     4
A     Max   -I   +I   10     4     10
Efficiency of the Alpha-Beta Procedure
- Alpha-Beta is guaranteed to compute the same minimax value for
the root node as computed by Minimax.
- In the worst case Alpha-Beta does NO pruning, examining
b^d leaf nodes, where each node has b children
and a d-ply search is performed.
- The efficiency of the Alpha-Beta procedure depends on the order in
which successors of a node are examined.
- If we were lucky, at a MIN node we would
always consider the nodes in order from low to high score and at a MAX node
the nodes in order from high to low score.
- For example, a "lucky" game tree (binary, of depth 4, complete) has
leaf scores, from left
to right, 4,3,8,7,2,1,6,5. Of these leaves only 5 will be examined.
- In the best case, Alpha-Beta will examine only about 2b^(d/2) leaf
nodes.
- In general it can be shown that, in the most favorable
circumstances, alpha-beta can search a game tree of double the depth
while opening about as many leaves as minimax.
- In the chess program Deep Blue, it was found empirically that
Alpha-Beta pruning reduced the average branching factor at each
node to about 6, from about 35-40.
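The "lucky" tree above makes a convenient check. The sketch below runs alpha-beta over that tree (leaves as bare numbers, tree shape taken from the example) and counts how many leaves are actually evaluated:

```python
import math

def minimax_ab_counting(node, a, b, is_max, counter):
    if isinstance(node, (int, float)):   # a leaf: count it and return its score
        counter[0] += 1
        return node
    best = -math.inf if is_max else math.inf
    for child in node:
        if is_max:
            val = minimax_ab_counting(child, max(a, best), b, False, counter)
            best = max(best, val)
            if best >= b:                # beta cutoff: prune remaining children
                break
        else:
            val = minimax_ab_counting(child, a, min(b, best), True, counter)
            best = min(best, val)
            if a >= best:                # alpha cutoff: prune remaining children
                break
    return best

# The "lucky" tree: leaves 4,3,8,7,2,1,6,5 from left to right, MAX at the root.
lucky = [[[4, 3], [8, 7]], [[2, 1], [6, 5]]]
counter = [0]
value = minimax_ab_counting(lucky, -math.inf, math.inf, True, counter)
```

Running this yields a root value of 4 with only 5 of the 8 leaves examined, as claimed.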
Games that Involve Chance
- Minimax and alpha-beta must be modified when we deal with games
that involve chance.
- For example, in backgammon the moves of each
player take place after a throw of the dice.
- One modifies the game tree by adding, after each regular node,
chance nodes that represent the possible outcomes of the throw.
- The player chooses moves from these chance nodes.
- The score of the chance nodes is computed as usual.
- The score of the original node is the weighted sum of the scores of its
successor chance nodes weighted by their probabilities.
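A minimal sketch of this computation, with an assumed node encoding: a bare number is a leaf, ('max', ...) and ('min', ...) are player nodes, and a chance node lists its outcomes paired with their probabilities:

```python
def expectiminimax(node):
    # A leaf is a bare number: its score.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: the score is the sum of the successors' scores
    # weighted by their probabilities.
    return sum(p * expectiminimax(c) for p, c in children)
```

For instance, a 50/50 throw leading to positions worth 3 and 5 scores 0.5*3 + 0.5*5 = 4.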
Cutting off Search
- So far we have assumed a fixed depth d where the search is stopped
and the static evaluation function is applied. But there are variations
on this that are important to note:
- Don't stop at non-quiescent nodes.
- Iterative Deepening is frequently used with Alpha-Beta
to allow searching to successively deeper plies if there is time.
Non-Quiescent Nodes
- If a node represents a state in the middle of an exchange of
pieces, then the node is not quiescent and therefore the evaluation
function may not give a reliable estimate of board quality.
- A definition for chess: "a state is
non-quiescent if any piece is attacked by one of lower value, or
by more pieces than defenses, or if any check exists on a square
controlled by the opponent."
- In this case, expand more nodes and
only apply the evaluation function at quiescent nodes.
- The identification of non-quiescent nodes partially deals
with the horizon effect.
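This idea can be sketched with a toy node encoding (static value, quiescent flag, list of successors), all assumed for illustration:

```python
def quiescent_minimax(node, depth, is_max):
    value, quiet, children = node
    if not children:                 # terminal position: exact score
        return value
    if depth <= 0 and quiet:
        return value                 # static evaluation is reliable here
    # At the depth limit but non-quiescent: keep expanding, so the
    # evaluation function is only ever applied at quiescent positions.
    scores = [quiescent_minimax(c, depth - 1, not is_max) for c in children]
    return max(scores) if is_max else min(scores)
```

In the toy test below, a mid-exchange position whose static value looks like 9 actually resolves to 1, so a depth-1 search that trusted the static value would pick the overestimated line; extending the search past the non-quiescent node avoids that.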
Horizon Effect
- A negative horizon is where the state
seen by the evaluation function is evaluated as better than it really
is because an undesirable effect is just beyond this node (i.e., the
search horizon).
- A positive horizon is where the evaluation function
wrongly underestimates the value of a state when positive actions just
over the search horizon indicate otherwise.
Iterative Deepening
- Iterative Deepening is frequently used with Alpha-Beta
so that searches to successively deeper plies can be attempted if
there is time.
- The move selected is the one computed by the deepest search
completed when the time limit is reached.
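That loop can be sketched as follows. The depth-limited search and the heuristic applied at the cutoff are assumptions for illustration, and a real engine would also check the clock inside the search rather than only between iterations:

```python
import time

def limited_minimax(node, depth, is_max, heuristic):
    if isinstance(node, (int, float)):       # terminal leaf: exact score
        return node
    if depth == 0:
        return heuristic(node)               # static evaluation at the cutoff
    scores = [limited_minimax(c, depth - 1, not is_max, heuristic)
              for c in node]
    return max(scores) if is_max else min(scores)

def iterative_deepening(root, time_limit, max_depth, heuristic):
    deadline = time.monotonic() + time_limit
    value = heuristic(root)                  # fallback if no search completes
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        # Keep the value from the deepest search completed in time.
        value = limited_minimax(root, depth, True, heuristic)
    return value
```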