Game Playing Using MiniMax with Alpha-Beta Cutoff
Perfect Information Games
- We are considering two-player, perfect information games.
- The two players take turns and try respectively to maximize and
minimize an Evaluation function (also called utility function).
- The two players are called respectively MAX and MIN.
We assume that the MAX player makes the first move.
Evaluation function
- An Evaluation function is used to evaluate the "goodness" of a
configuration of the game.
- Unlike in A* search, where the evaluation function was a non-negative
estimate of the cost of a path from the start node to a goal passing
through the given node, here the evaluation function estimates how
good a board position is at leading to a win for one player.
- Since, in general, we have no information about how our opponent
plays, we do not model the two players separately; instead we use a
single evaluation function to describe the goodness of a board with
respect to BOTH players.
- That is, f(n) = large positive value means the board associated
with node n is good for MAX and bad for MIN.
- f(n) = large negative value means the board is bad for MAX
and good for MIN.
- f(n) near 0 means the board is a neutral position.
- f(n) = +infinity means a winning position for MAX.
- f(n) = -infinity means a winning position for MIN.
- Example of an Evaluation Function for Tic-Tac-Toe:
f(n) = [number of 3-lengths open for MAX] - [number of 3-lengths
open for MIN]
where a 3-length is a complete row, column, or diagonal, and it is
open for a player if it contains none of the opponent's marks.
- Most evaluation functions are specified as a weighted sum of
"features:" (w1 * feat1) + (w2 * feat2) + ... + (wn * featn).
- For example,
in chess some features evaluate piece placement on the board and
other features describe configurations of several pieces.
- Deep Blue
has about 6000 features in its evaluation function.
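The tic-tac-toe function above can be sketched in a few lines of Python. The board encoding (a 3x3 grid holding 'X' for MAX, 'O' for MIN, or None) is an assumption for illustration; the sketch also shows the weighted-sum scheme in its simplest form, a single feature with weight 1.

```python
# Sketch of the tic-tac-toe evaluation f(n). A 3-length is "open"
# for a player when it contains none of the opponent's marks.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]                  # rows
    + [[(r, c) for r in range(3)] for c in range(3)]                # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)

def open_lines(board, player):
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[r][c] != opponent for r, c in line))

def evaluate(board):
    # f(n) = open 3-lengths for MAX minus open 3-lengths for MIN;
    # in the weighted-sum view, a single feature with weight 1.
    return open_lines(board, 'X') - open_lines(board, 'O')
```

On an empty board f(n) = 8 - 8 = 0; after MAX takes the center, f(n) = 8 - 4 = 4, since the four 3-lengths through the center are no longer open for MIN.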
MiniMax
- The game is represented as a tree where the nodes represent the
current position and the arcs represent moves.
- Since players take turns, successive nodes represent positions where
different players must move.
- We call the nodes MAX or MIN nodes depending on which player must
move at that node.
- A game tree could be infinite.
- The leaves represent terminal positions, i.e. positions where
MAX wins (score: +infinity) or MIN wins (score: -infinity).
- The ply of a node is the number of moves needed to reach that node
(i.e. the number of arcs on the path from the root of the tree).
- The ply of a tree is the maximum of the plies of its nodes.
Imperfect Information Game
- An imperfect information game is one where we do not fully expand
the game tree, so some of the leaves represent positions that could
be analyzed further.
- For these positions, in place of a utility function value, we will have a
scoring function value. Otherwise the game trees for perfect and imperfect
information games will be treated alike.
MINIMAX Game Strategy
- The MINIMAX GAME STRATEGY for the MAX (MIN) player is to select the move
that leads to the successor node with the highest (lowest) score.
- The scores are computed starting from the leaves of the tree and
backing their values up to their predecessors in accordance with the
Minimax strategy.
- The problem with this strategy is that it explores each node in the tree.
Function MINIMAX
function MINIMAX(N) is
begin
    if N is a leaf then
        return the estimated score of this leaf
    else
        Let N1, N2, .., Nm be the successors of N;
        if N is a Min node then
            return min{MINIMAX(N1), .., MINIMAX(Nm)}
        else
            return max{MINIMAX(N1), .., MINIMAX(Nm)}
end MINIMAX;
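The pseudocode translates almost directly into executable form. Here is a minimal Python sketch, assuming a tree encoded as nested lists in which a bare number is a leaf's estimated score:

```python
def minimax(node, is_max):
    # A leaf is a bare number: its estimated score.
    if isinstance(node, (int, float)):
        return node
    # Otherwise the node is a list of successors, and the player
    # to move alternates at each level of the tree.
    scores = [minimax(child, not is_max) for child in node]
    return max(scores) if is_max else min(scores)
```

For example, with MAX to move at the root, minimax([[3, 12], [2, 8]], True) evaluates to 3: the two MIN nodes back up 3 and 2, and MAX takes the larger.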
ALPHA-BETA Pruning
ALPHA-BETA cutoff is a method for reducing the number of nodes
explored in the Minimax strategy. For the nodes it explores it
computes, in addition to the score, an alpha value and a
beta value.
ALPHA value of a node
- It is a value never greater than the true score of this node.
- Initially it is the score of that node, if the node is a leaf, otherwise
it is -infinity.
- Then at a MAX node it is set to the largest of the scores
of its successors explored up to now, and at a MIN node to
the alpha value of its predecessor.
BETA value of a node
- It is a value never smaller than the true score of this node.
- Initially it is the score of that node, if the node is a leaf, otherwise
it is +infinity.
- Then at a MIN node it is set to the smallest of the scores
of its successors explored up to now, and at a MAX node to
the beta value of its predecessor.
It Is Guaranteed That:
- The score of a node will always be no less than the alpha value
and no greater than the beta value of that node.
- As the algorithm evolves, the alpha and beta values of a node may change,
but the alpha value will never decrease, and the beta value will
never increase.
- When a node has been fully explored, its score is set to its alpha
value if it is a MAX node, and to its beta value otherwise.
Function MINIMAX-AB
function MINIMAX-AB(N, A, B) is    ;; Here A is always less than B
begin
    if N is a leaf then
        return the estimated score of this leaf
    else
        Set Alpha value of N to -infinity and
        Beta value of N to +infinity;
        if N is a Min node then
            For each successor Ni of N loop
                Let Val be MINIMAX-AB(Ni, A, Min{B, Beta value of N});
                Set Beta value of N to Min{Beta value of N, Val};
                When A >= Beta value of N then
                    Return Beta value of N
            endloop;
            Return Beta value of N;
        else
            For each successor Ni of N loop
                Let Val be MINIMAX-AB(Ni, Max{A, Alpha value of N}, B);
                Set Alpha value of N to Max{Alpha value of N, Val};
                When Alpha value of N >= B then
                    Return Alpha value of N
            endloop;
            Return Alpha value of N;
end MINIMAX-AB;
Starting the Game
- At the start of the game, the MINIMAX-AB function is called with
the following parameters:
- the root of the game tree
- -infinity (-I) as alpha value
- +infinity (+I) as beta value
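Putting MINIMAX-AB and this initial call together, here is a minimal Python sketch. It assumes the tree is encoded as nested lists in which a bare number is a leaf's estimated score:

```python
import math

def minimax_ab(node, a, b, is_max):
    # A leaf is a bare number: its estimated score.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        alpha = -math.inf
        for child in node:
            val = minimax_ab(child, max(a, alpha), b, False)
            alpha = max(alpha, val)
            if alpha >= b:       # cutoff: MIN above will never allow this line
                return alpha
        return alpha
    else:
        beta = math.inf
        for child in node:
            val = minimax_ab(child, a, min(b, beta), True)
            beta = min(beta, val)
            if a >= beta:        # cutoff: MAX above already has a better line
                return beta
        return beta

# Starting the game: root of the tree, alpha = -infinity,
# beta = +infinity, with MAX to move.
# value = minimax_ab(tree, -math.inf, math.inf, True)
```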
An Example of MiniMax with Alpha Beta Cutoff
In the game tree that you see, the root is a Max node. The scores
of the leaf nodes are presented immediately below them.
Trace of the Execution
Here is a trace of the execution of the Minimax strategy with Alpha
Beta Cutoff
NODE  TYPE  A    B    ALPHA  BETA  SCORE
A     Max   -I   +I   -I     +I
B     Min   -I   +I   -I     +I
C     Max   -I   +I   -I     +I
D     Min   -I   +I   -I     +I
E     Max   -I   +I                10
D     Min   -I   +I   -I     10
F     Max   -I   10                11
D     Min   -I   +I   -I     10    10
C     Max   -I   +I   10     +I
G     Min   10   +I   -I     +I
H     Max   10   +I                9
G     Min   10   +I   -I     9     9
C     Max   -I   +I   10     +I    10
B     Min   -I   +I   -I     10
J     Max   -I   10   -I     +I
K     Min   -I   10   -I     +I
L     Max   -I   10                14
K     Min   -I   10   -I     14
M     Max   -I   10                15
K     Min   -I   10   -I     14    14
J     Max   -I   10   14     +I    14
B     Min   -I   +I   -I     10    10
A     Max   -I   +I   10     +I
Q     Min   10   +I   -I     +I
R     Max   10   +I   -I     +I
S     Min   10   +I   -I     +I
T     Max   10   +I                15
S     Min   10   +I   -I     15
V     Max   10   +I                2
S     Min   10   +I   -I     2     2
R     Max   10   +I   2      +I
Y     Min   10   +I   -I     +I
W     Max   10   +I                4
Y     Min   10   +I   -I     4     4
R     Max   10   +I   4      +I    4
Q     Min   10   +I   -I     4     4
A     Max   -I   +I   10     4     10
Efficiency of the Alpha-Beta Procedure
- Alpha-Beta is guaranteed to compute the same minimax value for
the root node as computed by Minimax.
- In the worst case Alpha-Beta does NO pruning, examining
b^d leaf nodes, where each node has b children
and a d-ply search is performed.
- The efficiency of the Alpha-Beta procedure depends on the order in
which successors of a node are examined.
- If we were lucky, at a MIN node we would
always consider the nodes in order from low to high score and at a MAX node
the nodes in order from high to low score.
- For example, a "lucky" game tree (binary, of depth 4, complete) has
leaf scores, from left
to right, 4,3,8,7,2,1,6,5. Of these leaves only 5 will be examined.
- In the best case, Alpha-Beta will examine only about 2b^(d/2) leaf
nodes.
- In general it can be shown that, in the most favorable
circumstances, alpha-beta can search a game tree of double the depth
while opening about as many leaves as minimax.
- In the chess program Deep Blue, it was found empirically that
Alpha-Beta pruning reduced the average branching factor at each
node to about 6, from about 35-40.
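The "lucky" tree above makes a convenient check. The sketch below runs alpha-beta over that tree (leaves as bare numbers, tree shape taken from the example) and counts how many leaves are actually evaluated:

```python
import math

def minimax_ab_counting(node, a, b, is_max, counter):
    if isinstance(node, (int, float)):   # a leaf: count it and return its score
        counter[0] += 1
        return node
    best = -math.inf if is_max else math.inf
    for child in node:
        if is_max:
            val = minimax_ab_counting(child, max(a, best), b, False, counter)
            best = max(best, val)
            if best >= b:                # beta cutoff: prune remaining children
                break
        else:
            val = minimax_ab_counting(child, a, min(b, best), True, counter)
            best = min(best, val)
            if a >= best:                # alpha cutoff: prune remaining children
                break
    return best

# The "lucky" tree: leaves 4,3,8,7,2,1,6,5 from left to right, MAX at the root.
lucky = [[[4, 3], [8, 7]], [[2, 1], [6, 5]]]
counter = [0]
value = minimax_ab_counting(lucky, -math.inf, math.inf, True, counter)
```

Running this yields a root value of 4 with only 5 of the 8 leaves examined, as claimed.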
Games that Involve Chance
- Minimax and alpha-beta must be modified when we deal with games
that involve chance.
- For example, in backgammon the moves of each
player take place after a throw of the dice.
- One modifies the game tree by adding, after each regular node,
chance nodes that represent the possible outcomes of the throw.
- The player chooses moves from these chance nodes.
- The score of the chance nodes is computed as usual.
- The score of the original node is the weighted sum of the scores of its
successor chance nodes weighted by their probabilities.
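A minimal sketch of this computation, with an assumed node encoding: a bare number is a leaf, ('max', ...) and ('min', ...) are player nodes, and a chance node lists its outcomes paired with their probabilities:

```python
def expectiminimax(node):
    # A leaf is a bare number: its score.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: the score is the sum of the successors' scores
    # weighted by their probabilities.
    return sum(p * expectiminimax(c) for p, c in children)
```

For instance, a 50/50 throw leading to positions worth 3 and 5 scores 0.5*3 + 0.5*5 = 4.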
Cutting off Search
- So far we have assumed a fixed depth d where the search is stopped
and the static evaluation function is applied. But there are variations
on this that are important to note:
- Don't stop at non-quiescent nodes.
- Iterative Deepening is frequently used with Alpha-Beta
to allow searching to successively deeper plies if there is time.
Non-Quiescent Nodes
- If a node represents a state in the middle of an exchange of
pieces, then the node is not quiescent and therefore the evaluation
function may not give a reliable estimate of board quality.
- A definition for chess: "a state is
non-quiescent if any piece is attacked by one of lower value, or
by more pieces than defenses, or if any check exists on a square
controlled by the opponent."
- In this case, expand more nodes and
only apply the evaluation function at quiescent nodes.
- The identification of non-quiescent nodes partially deals
with the horizon effect.
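This idea can be sketched with a toy node encoding (static value, quiescent flag, list of successors), all assumed for illustration:

```python
def quiescent_minimax(node, depth, is_max):
    value, quiet, children = node
    if not children:                 # terminal position: exact score
        return value
    if depth <= 0 and quiet:
        return value                 # static evaluation is reliable here
    # At the depth limit but non-quiescent: keep expanding, so the
    # evaluation function is only ever applied at quiescent positions.
    scores = [quiescent_minimax(c, depth - 1, not is_max) for c in children]
    return max(scores) if is_max else min(scores)
```

In the toy test below, a mid-exchange position whose static value looks like 9 actually resolves to 1, so a depth-1 search that trusted the static value would pick the overestimated line; extending the search past the non-quiescent node avoids that.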
Horizon Effect
- A negative horizon is where the state
seen by the evaluation function is evaluated as better than it really
is because an undesirable effect is just beyond this node (i.e., the
search horizon).
- A positive horizon is where the evaluation function
wrongly underestimates the value of a state when positive actions just
over the search horizon indicate otherwise.
Iterative Deepening
- Iterative Deepening is frequently used with Alpha-Beta
so that searches to successively deeper plies can be attempted if
there is time.
- The move selected is the one computed by the deepest search
completed when the time limit is reached.
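That loop can be sketched as follows. The depth-limited search and the heuristic applied at the cutoff are assumptions for illustration, and a real engine would also check the clock inside the search rather than only between iterations:

```python
import time

def limited_minimax(node, depth, is_max, heuristic):
    if isinstance(node, (int, float)):       # terminal leaf: exact score
        return node
    if depth == 0:
        return heuristic(node)               # static evaluation at the cutoff
    scores = [limited_minimax(c, depth - 1, not is_max, heuristic)
              for c in node]
    return max(scores) if is_max else min(scores)

def iterative_deepening(root, time_limit, max_depth, heuristic):
    deadline = time.monotonic() + time_limit
    value = heuristic(root)                  # fallback if no search completes
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        # Keep the value from the deepest search completed in time.
        value = limited_minimax(root, depth, True, heuristic)
    return value
```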