Table of Contents
- 1 micmac ASDF System Details
- 2 Introduction
- 3 Graph Search
- 4 Metropolis Hastings
- 5 Game Theory
[in package MICMAC]
- Version: 0.0.2
- Description: Micmac is mainly a library of graph search algorithms such as alpha-beta, UCT and beam search, but it also has some MCMC and other slightly unrelated stuff.
- Licence: MIT, see COPYING.
- Author: Gábor Melis
- Mailto: firstname.lastname@example.org
- Homepage: http://quotenil.com
Alpha-beta pruning for two player, zero-sum maximax (like minimax but both players maximize and the score is negated when passed between depths). Return the score of the game
STATE from the point of view of the player to move at
DEPTH and as the second value the list of actions of the principal variation.
CALL-WITH-ACTION is a function of (
ACTION FN). It carries out
NIL) to get the state corresponding to
DEPTH and calls FN with that state. It may destructively modify
STATE provided it undoes the damage after FN returns.
CALL-WITH-ACTION is called with
ACTION for the root of the tree, in this case
STATE need not be changed. FN returns the same kinds of values as
ALPHA-BETA. They may be useful for logging.
MAYBE-EVALUATE-STATE is a function of (
DEPTH is a terminal node then it returns the score from the point of view of the player to move and, as the second value, a list of actions that lead from
STATE to the position that was evaluated. The list of actions is typically empty. If we are not at a terminal node then
LIST-ACTIONS is a function of (
DEPTH) and returns a non-empty list of legal candidate moves for non-terminal nodes. Actions are tried in the order
LIST-ACTIONS returns them: stronger moves
RECORD-BEST, if non-NIL, is a function of (
ACTIONS). It is called when at
DEPTH a new best action is found.
ACTIONS is a list of all the actions in the principal variation corresponding to the newly found best score.
RECORD-BEST is useful for graceful degradation in case of timeout.
NIL (equivalent to -infinity, +infinity) but any real number is allowed if the range of scores can be bounded.
test/test-alpha-beta.lisp for an example.
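The maximax formulation described above (both players maximize, and the score is negated when passed between depths) can be sketched in a few lines. This is a minimal illustration in Python, not micmac's ALPHA-BETA: the nested-list game tree, the use of child indices as actions, and the function signature are all assumptions made for the example.

```python
import math


def alpha_beta(node, alpha=-math.inf, beta=math.inf):
    """Return (score, principal_variation) for the player to move at `node`.

    Assumed toy representation: a leaf is a number, its score from the point
    of view of the player to move there. An internal node is a list of
    children, and an action is the index of the child to move to.
    """
    if not isinstance(node, list):
        return node, []
    best_score, best_line = -math.inf, []
    for action, child in enumerate(node):
        # The child's score is from the opponent's point of view, so the
        # window is swapped and negated on the way down and the score is
        # negated on the way up.
        child_score, child_line = alpha_beta(child, -beta, -alpha)
        score = -child_score
        if score > best_score:
            best_score, best_line = score, [action] + child_line
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the opponent already has a better option elsewhere
    return best_score, best_line


# Root player chooses a child, the opponent then chooses a leaf.
result = alpha_beta([[3, 5], [2, 9], [4, 6]])
```

Here the second subtree is cut off after its first leaf, because a score of 2 can no longer beat the 3 already secured. Trying stronger moves first makes such cutoffs more frequent.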
In a graph, search for the nodes with the best scores using beam search. That is, starting from
START-NODES perform a breadth-first search but at each depth only keep
BEAM-WIDTH number of nodes with the best scores. Keep the best
N-SOLUTIONS (at most) complete solutions. Discard nodes known to be unable to get into the best
UPPER-BOUND-FN). Finally, return the solutions and the active nodes (the beam) as adjustable arrays sorted by score in descending order.
START-NODES (a sequence of elements of arbitrary type).
FINISHEDP-FN are all functions of one argument: the node.
SOLUTIONP-FN checks whether a node represents a complete solution (i.e. some goal is reached).
SCORE-FN returns a real number that's to be maximized; it's only called for nodes for which
NIL) returns a real number that is equal to or greater than the score of all solutions reachable from that node.
FINISHEDP-FN returns true iff there is nowhere to go from the node.
EXPAND-NODE-FN is also a function of a single node argument. It returns a sequence of nodes that are 'one step away' from its argument node.
EXPAND-BEAM-FN is similar, but it takes a vector of nodes and returns all nodes one step away from any of them. It's enough to provide either
EXPAND-BEAM-FN. The purpose of
EXPAND-BEAM-FN is to allow more efficient, batch-like operations.
test/test-beam-search.lisp for an example.
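The basic loop described above (expand the beam, keep the BEAM-WIDTH best successors, remember the best complete solutions) can be sketched as follows. This is an illustrative Python version, not micmac's BEAM-SEARCH: the parameter names only mirror the ones in the text, `max_depth` is a hypothetical addition, upper-bound pruning is left out, and the sketch scores every node, which may differ from when the library calls SCORE-FN.

```python
def beam_search(start_nodes, expand_node_fn, score_fn, solutionp_fn,
                finishedp_fn, beam_width, n_solutions=1, max_depth=100):
    """Return (solutions, beam), both sorted by descending score."""

    def best(nodes, n):
        return sorted(nodes, key=score_fn, reverse=True)[:n]

    beam = best(start_nodes, beam_width)
    solutions = best([n for n in beam if solutionp_fn(n)], n_solutions)
    for _ in range(max_depth):
        # Everything one step away from any unfinished node in the beam.
        successors = [child
                      for node in beam if not finishedp_fn(node)
                      for child in expand_node_fn(node)]
        if not successors:
            break
        beam = best(successors, beam_width)
        solutions = best(solutions + [n for n in beam if solutionp_fn(n)],
                         n_solutions)
    return solutions, beam


# Toy problem: build a 3-bit tuple maximizing a weighted sum of its bits.
weights = (1, -2, 3)


def score(node):
    return sum(w * bit for w, bit in zip(weights, node))


solutions, beam = beam_search(
    start_nodes=[()],
    expand_node_fn=lambda node: [node + (0,), node + (1,)],
    score_fn=score,
    solutionp_fn=lambda node: len(node) == len(weights),
    finishedp_fn=lambda node: len(node) == len(weights),
    beam_width=2,
)
```

With a beam width of 2 the search keeps only two partial tuples per depth, yet still reaches the optimum `(1, 0, 1)` in this small example. In general a narrow beam trades optimality for speed.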
This is very much like
BEAM-SEARCH except it solves a number of instances of the same search problem starting from different sets of nodes. The sole purpose of
PARALLEL-BEAM-SEARCH is to amortize the cost
EXPAND-BEAMS-FN is called with a sequence of beams (i.e. it's a sequence of sequences of nodes) and it must return another sequence of sequences of nodes. Each element of the returned sequence holds the nodes reachable from the nodes in the corresponding element of its argument sequence.
PARALLEL-BEAM-SEARCH returns a sequence of solution sequences, and a sequence of active node sequences.
test/test-beam-search.lisp for an example.
[in package MICMAC.UCT]
UCT Monte Carlo tree search. This is what makes current Go programs
tick. And Hex programs as well, for that matter. This is a cleanup
and generalization of code originally created in the course of the
Google AI Challenge 2010.
For now, the documentation is just a reference. See
test/test-uct.lisp for an example.
A node in the
UCT tree. Roughly translates to a state in the search space. Note that the state itself is not stored explicitly, but it can be recovered by 'replaying' the actions from the starting state or by customizing
Average reward over random playouts started from below this node. See
An edge in the
UCT tree. Represents an action taken from a state. The value of an action is the value of its target state, which is not quite as generic as it could be; feel free to specialize
AVERAGE-REWARD for the edges if that's not the case.
The node this edge points to if the edge has been visited or
Choose an action to take from a state, in other words an edge to follow from
NODE in the tree. The default implementation chooses randomly from the yet unvisited edges or, if there are none, moves down the edge with the maximum
EDGE-SCORE. If you are thinking of customizing this, for example to make it choose the minimum at odd depths, then you may want to consider specializing REWARD or
Compute the reward for a node in the tree from
OUTCOME that is the result of a playout. This is called by the default implementation of
UPDATE-UCT-STATISTICS. This is where one typically negates depending on the parity of
DEPTH in two player games.
Increment the number of visits and update the average reward in nodes and edges of
PATH. By default, edges simply get their visit counter incremented while nodes also get an update to
AVERAGE-REWARD based on what
Create a node representing the state that
EDGE leads to from
PARENT. Specialize this if you want to keep track of the state, which is not done by default as it can be expensive, especially in light of TAKE-ACTION mutating it. The default implementation simply creates an instance of the class of
PARENT so that one can start from a subclass of
UCT-NODE and be sure that that class is going to be used for nodes below it.
Return the state that corresponds to
NODE. This is not a straightforward accessor unless
NODE is customized to store it. The rest of the parameters are provided so that one can reconstruct the state by taking the action of
PARENT. It's okay to destroy
PARENT-STATE in the process as long as it's not stored elsewhere. This function must be customized.
Return a list of edges representing the possible actions from
STATE. This function must be customized.
Play a random game from
STATE and return the outcome that's fed into
UPDATE-UCT-STATISTICS. The way the random game is played is referred to as the 'default policy' and that's what makes or breaks
UCT search. This function must be customized.
Starting from the
ROOT node, search the tree, expanding it by one node for each playout. Finally return the mutated
ROOT may be the root node of any tree; it need not be a single node with no edges.
FRESH-ROOT-STATE is a function that returns a fresh state corresponding to
ROOT. This state will be destroyed unless special care is taken in
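The bookkeeping described for REWARD and UPDATE-UCT-STATISTICS above (edges count visits, nodes also keep a running average of the reward, and the outcome is negated depending on the parity of the depth) can be sketched as below. This is a hedged Python illustration, not the library's classes or generic functions: `Node`, `Edge`, `reward` and `update_statistics` are hypothetical names, and negating at odd depths is just one possible convention for two player games.

```python
from dataclasses import dataclass


@dataclass
class Node:
    n_visits: int = 0
    average_reward: float = 0.0


@dataclass
class Edge:
    n_visits: int = 0


def reward(depth, outcome):
    # Assumption: `outcome` is from the point of view of the player to move
    # at the root (depth 0), so it is negated at odd depths.
    return outcome if depth % 2 == 0 else -outcome


def update_statistics(nodes, edges, outcome):
    """Update the nodes and edges along one playout path, root first."""
    for edge in edges:
        edge.n_visits += 1
    for depth, node in enumerate(nodes):
        node.n_visits += 1
        # Incremental mean: avoids storing the individual rewards.
        node.average_reward += (
            (reward(depth, outcome) - node.average_reward) / node.n_visits)


root, child, edge = Node(), Node(), Edge()
update_statistics([root, child], [edge], 1.0)  # a playout won by the root player
update_statistics([root, child], [edge], 0.0)  # a playout with outcome 0
```

After these two playouts the root averages 0.5 while the child, seen from the opponent's side, averages -0.5.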
[in package MICMAC.METROPOLIS-HASTINGS]
Generic interface for the Metropolis-Hastings algorithm, also Metropolis Coupled MCMC.
Markov Chain Monte Carlo and Gibbs Sampling. Lecture Notes for EEB 581, version 26 April 2004, © B. Walsh 2004. http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf
Geyer, C.J. (1991) Markov chain Monte Carlo maximum likelihood
For now, the documentation is just a reference. See
test/test-metropolis-hastings.lisp for an example.
A simple Markov chain for Metropolis-Hastings. With temperature it is suitable for
The PROBABILITY-RATIO of samples is raised to the power of 1 /
TEMPERATURE before calculating the acceptance probability. This effectively flattens the peaks if
TEMPERATURE > 1, which makes it easier for the chain to traverse deep valleys.
From the current state of
JUMP (from the current distribution of
CHAIN) and return the sample where we landed. Reuse
Prepare for sampling from the F(X) = Q(SAMPLE->X) distribution. Called by
RANDOM-JUMP. The around method ensures that nothing is done unless there was a state change.
Sample a jump from the current distribution of jumps that was computed by
SAMPLE2). It's in the log domain to avoid overflows, and it's a ratio because that may allow computational shortcuts as opposed to calculating unnormalized probabilities separately.
Return Q(TARGET->STATE) / Q(STATE->TARGET) where Q is the jump distribution and
JUMP is from the current
Calculate the acceptance probability of
JUMP leads from the current
CANDIDATE. It does nothing by default, it's just a convenience for debugging.
Randomly accept or reject
CANDIDATE from the current state of
Take a step on the Markov chain. Return a boolean indicating whether the proposed jump was accepted.
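To make the role of the temperature concrete, here is a small Metropolis-Hastings sampler in which the log probability ratio is divided by the temperature before the accept/reject decision, as described above. It is a sketch in Python under stated assumptions, not the library's generic interface: the function names, the Gaussian random-walk proposal and the fixed seed are all choices made for the example. Because that proposal is symmetric, the jump ratio Q(TARGET->STATE) / Q(STATE->TARGET) is 1 and drops out.

```python
import math
import random


def metropolis_hastings(log_p, x0, n_steps, step_size=1.0, temperature=1.0,
                        seed=0):
    """Sample from p(x)^(1/temperature) with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    x, samples, n_accepted = x0, [], 0
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, step_size)
        # Log domain: raising the ratio to the power 1/temperature becomes
        # a division of the log ratio.
        log_acceptance = (log_p(candidate) - log_p(x)) / temperature
        # Accept with probability min(1, exp(log_acceptance)).
        if math.log(1.0 - rng.random()) <= log_acceptance:
            x = candidate
            n_accepted += 1
        samples.append(x)
    return samples, n_accepted / n_steps


def log_standard_normal(x):
    return -0.5 * x * x  # unnormalized log density of N(0, 1)


cold, cold_rate = metropolis_hastings(log_standard_normal, 0.0, 20000)
hot, hot_rate = metropolis_hastings(log_standard_normal, 0.0, 20000,
                                    temperature=4.0)
```

The hot chain targets a flattened density (here a normal with variance 4), so it wanders more widely than the cold one.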
High-probability islands separated by low valleys make the chain mix poorly.
MC3-CHAIN has a number of
HOT-CHAINS that have state probabilities similar to that of the main chain but less jagged. Often it suffices to set the temperatures of the
HOT-CHAINS higher and use the very same base probability distribution.
Swap the states of
Called when the swap of states of
CHAIN2 is rejected. It does nothing by default, it's just a convenience for debugging.
Swap of states of
Choose two chains randomly and swap their states with
A simple abstract chain subclass that explicitly enumerates the probabilities of the distribution.
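For the Metropolis-coupled setup described above, the usual rule for accepting a state swap between two chains that share a base distribution but differ in temperature can be written down directly. The Python sketch below shows that textbook rule; whether the library computes it in exactly this form is an assumption, and `swap_log_acceptance` and `maybe_swap` are hypothetical names.

```python
import math
import random


def swap_log_acceptance(log_p, state1, temperature1, state2, temperature2):
    """Log of the swap acceptance ratio for two tempered chains.

    Chain i targets p(x)^(1/temperature_i), so the ratio of the joint
    probability after and before the swap simplifies to the expression below.
    """
    return ((1.0 / temperature1 - 1.0 / temperature2)
            * (log_p(state2) - log_p(state1)))


def maybe_swap(log_p, state1, temperature1, state2, temperature2, rng):
    """Return (new_state1, new_state2, accepted)."""
    log_acceptance = swap_log_acceptance(
        log_p, state1, temperature1, state2, temperature2)
    if math.log(1.0 - rng.random()) <= log_acceptance:
        return state2, state1, True
    return state1, state2, False


def log_standard_normal(x):
    return -0.5 * x * x  # unnormalized log density of N(0, 1)


# The cold chain (temperature 1) sits in a low-probability state while the
# hot chain (temperature 4) has found the mode: the swap is always accepted.
swapped = maybe_swap(log_standard_normal, 4.0, 1.0, 0.0, 4.0, random.Random(0))
```

Swaps that hand the more probable state to the colder chain are always accepted. The reverse direction is accepted only occasionally, which lets the main chain benefit from the wider exploration of the hot chains.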
[in package MICMAC.GAME-THEORY]
Find a Nash equilibrium of a zero-sum game represented by
PAYOFF matrix (a 2d matrix or a nested list).
PAYOFF is from the point of view of the row player: the player who chooses the column wants to minimize, the row player wants to maximize. The first value returned is a vector of unnormalized probabilities assigned to each action of the row player, the second value is the same for the column player and the third is the expected payoff of the row player in the Nash equilibrium represented by the oddment vectors.
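To illustrate what oddment vectors look like, the following Python sketch solves the special case of a 2x2 zero-sum game with the classic oddments rule. It is not the library's algorithm, which handles general payoff matrices by a method not described here; `solve_2x2` is a hypothetical name and integer payoffs are assumed so that the value can be computed exactly.

```python
from fractions import Fraction


def solve_2x2(payoff):
    """Return (row_oddments, column_oddments, value) for a 2x2 zero-sum game.

    `payoff` is from the point of view of the row player, who maximizes.
    """
    (a, b), (c, d) = payoff
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Saddle point: both players have a pure optimal strategy.
        row, col = row_mins.index(maximin), col_maxs.index(minimax)
        return ([int(i == row) for i in range(2)],
                [int(j == col) for j in range(2)],
                Fraction(maximin))
    # No saddle point: each action's oddment is the absolute payoff
    # difference taken across the *other* row (or column).
    row_oddments = [abs(c - d), abs(a - b)]
    col_oddments = [abs(b - d), abs(a - c)]
    value = Fraction(a * d - b * c, a + d - b - c)
    return row_oddments, col_oddments, value


mixed = solve_2x2([[3, -1], [-2, 4]])
pennies = solve_2x2([[1, -1], [-1, 1]])
saddle = solve_2x2([[2, 1], [4, 3]])
```

In the first game the row oddments `[6, 4]` normalize to probabilities 0.6 and 0.4. Against either column that mix yields an expected payoff of 1, which is the value of the game.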