Micmac Manual
[in package MICMAC]
1 micmac ASDF System
 Version: 0.0.2
 Description: Micmac is mainly a library of graph search algorithms such as alpha-beta, UCT and beam search, but it also has some MCMC and other slightly unrelated stuff.
 Licence: MIT, see COPYING.
 Author: Gábor Melis mailto:mega@retes.hu
 Mailto: mega@retes.hu
 Homepage: http://melisgl.github.io/mglgpr
 Bug tracker: https://github.com/melisgl/mglgpr/issues
 Source control: GIT
2 Introduction
2.1 Overview
MICMAC is a Common Lisp library by Gábor Melis focusing on graph search algorithms.
2.2 Links
Here is the official repository and the HTML documentation for the latest version.
3 Graph Search
[function] alpha-beta state &key (depth 0) alpha beta call-with-action maybe-evaluate-state list-actions record-best
Alpha-beta pruning for two-player, zero-sum maximax (like minimax, but both players maximize and the score is negated when passed between depths). Return the score of the game state from the point of view of the player to move at depth and, as the second value, the list of actions of the principal variation.
call-with-action is a function of (state depth action FN). It carries out action (returned by list-actions or nil) to get the state corresponding to depth and calls FN with that state. It may destructively modify state provided it undoes the damage after FN returns. call-with-action is called with nil as action for the root of the tree; in this case state need not be changed. FN returns the same kinds of values as alpha-beta. They may be useful for logging.
maybe-evaluate-state is a function of (state depth). If state at depth is a terminal node, then it returns the score from the point of view of the player to move and, as the second value, a list of actions that lead from state to the position that was evaluated. The list of actions is typically empty. If we are not at a terminal node, then maybe-evaluate-state returns nil.
list-actions is a function of (state depth) and returns a non-empty list of legal candidate moves for non-terminal nodes. Actions are tried in the order list-actions returns them, so stronger moves should come first: good move ordering produces more cutoffs.
call-with-action, maybe-evaluate-state and list-actions are mandatory.
record-best, if non-NIL, is a function of (depth score actions). It is called when a new best action is found at depth. actions is a list of all the actions in the principal variation corresponding to the newly found best score. record-best is useful for graceful degradation in case of timeout.
alpha and beta are typically nil (equivalent to -infinity and +infinity), but any real number is allowed if the range of scores can be boxed.
See test/test-alpha-beta.lisp for an example.
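The negation trick and the cutoff can be seen in a small Python sketch. It is an illustration under stated assumptions, not MICMAC's Lisp API: the hypothetical apply_action returns a child state instead of calling a continuation the way call-with-action does, the hypothetical evaluate stands in for maybe-evaluate-state but returns only a score (or None for non-terminal nodes), and the window starts at -inf/+inf instead of nil.

```python
import math

# Hypothetical Python sketch, not MICMAC's Lisp API.


def alpha_beta(state, depth, alpha, beta, evaluate, list_actions, apply_action):
    """Negamax-style alpha-beta. Scores are always from the point of view of
    the player to move at DEPTH and are negated between depths.
    Returns (score, principal_variation)."""
    score = evaluate(state, depth)  # None for non-terminal nodes
    if score is not None:
        return score, []
    best_score, best_pv = -math.inf, []
    for action in list_actions(state, depth):
        child = apply_action(state, action)
        # The child's score is from the opponent's viewpoint: negate it and
        # pass the window down negated and swapped.
        child_score, child_pv = alpha_beta(child, depth + 1, -beta, -alpha,
                                           evaluate, list_actions, apply_action)
        child_score = -child_score
        if child_score > best_score:
            best_score, best_pv = child_score, [action] + child_pv
        alpha = max(alpha, child_score)
        if alpha >= beta:
            break  # cutoff: the opponent will never allow this line
    return best_score, best_pv


# Toy tree: the root player picks "a" or "b", then the opponent picks "x" or
# "y". Leaves sit at depth 2, where the root player is to move again, so leaf
# values are already from the mover's point of view.
tree = {"a": {"x": 3, "y": 5}, "b": {"x": 2, "y": 9}}
score, pv = alpha_beta(
    tree, 0, -math.inf, math.inf,
    evaluate=lambda s, d: s if isinstance(s, (int, float)) else None,
    list_actions=lambda s, d: list(s),
    apply_action=lambda s, a: s[a])
print(score, pv)  # 3 ['a', 'x']
```

In the "b" subtree the first reply already refutes the move (it is worse for the root player than "a"), so the "y" leaf under "b" is never evaluated.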
[function] beam-search start-nodes &key max-depth (n-solutions 1) (beam-width (length start-nodes)) expand-node-fn expand-beam-fn score-fn upper-bound-fn solutionp-fn (finishedp-fn solutionp-fn)
In a graph, search for the nodes with the best scores using beam search. That is, starting from start-nodes, perform a breadth-first search, but at each depth only keep the beam-width nodes with the best scores. Keep the best n-solutions (at most) complete solutions. Discard nodes known to be unable to get into the best n-solutions (due to upper-bound-fn). Finally, return the solutions and the active nodes (the beam) as adjustable arrays sorted by score in descending order.
start-nodes is a sequence of elements of arbitrary type.
score-fn, upper-bound-fn, solutionp-fn and finishedp-fn are all functions of one argument: the node. solutionp-fn checks whether a node represents a complete solution (i.e. some goal is reached). score-fn returns a real number that is to be maximized; it is only called for nodes for which solutionp-fn returned true. upper-bound-fn (if not nil) returns a real number that is equal to or greater than the score of all solutions reachable from that node. finishedp-fn returns true iff there is nowhere to go from the node.
expand-node-fn is also a function of a single node argument. It returns a sequence of nodes 'one step away' from its argument node. expand-beam-fn is similar, but it takes a vector of nodes and returns all nodes one step away from any of them. It is enough to provide either expand-node-fn or expand-beam-fn. The purpose of expand-beam-fn is to allow more efficient, batch-like operations.
See test/test-beam-search.lisp for an example.
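A minimal Python sketch of the same loop follows. It assumes a single scoring function that ranks partial nodes as well as solutions (a simplification: the description above says score-fn is only called on solutions), treats solutions as finished nodes, and returns lists instead of adjustable arrays. The function and parameter names are hypothetical stand-ins for the keyword arguments of beam-search.

```python
import heapq

# Hypothetical Python sketch, not MICMAC's Lisp API.


def beam_search(start_nodes, expand_node, score, solutionp, beam_width,
                max_depth, n_solutions=1, upper_bound=None):
    """Returns (solutions, beam), each sorted best first."""
    frontier = list(start_nodes)
    solutions, beam = [], []
    for depth in range(max_depth + 1):
        # Keep the best N_SOLUTIONS complete solutions seen so far.
        found = [n for n in frontier if solutionp(n)]
        solutions = heapq.nlargest(n_solutions, solutions + found, key=score)
        # Simplification: solutions are treated as finished (not expanded).
        beam = [n for n in frontier if not solutionp(n)]
        # Drop nodes that cannot beat the worst solution we would keep.
        if upper_bound is not None and len(solutions) == n_solutions:
            worst = score(solutions[-1])
            beam = [n for n in beam if upper_bound(n) > worst]
        # Keep only the BEAM_WIDTH best-scoring partial nodes.
        beam = heapq.nlargest(beam_width, beam, key=score)
        if not beam or depth == max_depth:
            break
        frontier = [child for node in beam for child in expand_node(node)]
    return solutions, beam


# Toy problem: build a 3-digit string over {0, 1, 2} maximizing the digit sum.
solutions, beam = beam_search(
    [()], expand_node=lambda n: [n + (d,) for d in range(3)],
    score=sum, solutionp=lambda n: len(n) == 3,
    beam_width=2, max_depth=3, n_solutions=2)
print(solutions)  # [(2, 2, 2), (2, 2, 1)]
```

heapq.nlargest keeps the earlier element on ties, so the result is deterministic for a given expansion order.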
[function] parallel-beam-search start-node-seqs &key max-depth (n-solutions 1) beam-width expand-node-fn expand-beams-fn score-fn upper-bound-fn solutionp-fn (finishedp-fn solutionp-fn)
This is very much like beam-search, except that it solves a number of instances of the same search problem starting from different sets of nodes. The sole purpose of parallel-beam-search is to amortize the cost of expand-beam-fn if possible.
expand-beams-fn is called with a sequence of beams (i.e. a sequence of sequences of nodes), and it must return another sequence of sequences of nodes. Each element of the returned sequence holds the nodes reachable from the nodes in the corresponding element of its argument sequence.
parallel-beam-search returns a sequence of solution sequences and a sequence of active node sequences.
See test/test-beam-search.lisp for an example.
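The batching that parallel-beam-search exists for can be sketched in Python as follows: the per-instance bookkeeping is the same as in a single beam search, but each depth makes exactly one call to the hypothetical expand_beams function with all beams at once. The names and the toy problem are illustrative assumptions, not MICMAC's API.

```python
import heapq

# Hypothetical Python sketch, not MICMAC's Lisp API.


def parallel_beam_search(start_node_seqs, expand_beams, score, solutionp,
                         beam_width, max_depth, n_solutions=1):
    """Run one beam search per instance, but expand all beams in one call."""
    frontiers = [list(seq) for seq in start_node_seqs]
    solutions = [[] for _ in frontiers]
    beams = [[] for _ in frontiers]
    for depth in range(max_depth + 1):
        for i, frontier in enumerate(frontiers):
            found = [n for n in frontier if solutionp(n)]
            solutions[i] = heapq.nlargest(n_solutions, solutions[i] + found,
                                          key=score)
            partial = [n for n in frontier if not solutionp(n)]
            beams[i] = heapq.nlargest(beam_width, partial, key=score)
        if depth == max_depth or not any(beams):
            break
        # A single batched call expands every instance's beam; this is the
        # call whose cost can be amortized (e.g. one batched model evaluation).
        frontiers = expand_beams(beams)
    return solutions, beams


calls = []


def expand_beams(beams):
    calls.append(len(beams))  # record how many beams each call received
    return [[node + (d,) for node in beam for d in range(3)] for beam in beams]


solutions, beams = parallel_beam_search(
    [[()], [(0,)]], expand_beams, score=sum,
    solutionp=lambda n: len(n) == 3, beam_width=2, max_depth=3)
print(solutions, calls)  # [[(2, 2, 2)], [(0, 2, 2)]] [2, 2, 2]
```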
3.1 UCT
[in package MICMAC.UCT]
UCT: Monte Carlo tree search. This is what makes current Go programs tick, and Hex programs as well, for that matter. This is a cleanup and generalization of code originally created in the course of the Google AI Challenge 2010.
For now, the documentation is just a reference. See test/test-uct.lisp for an example.

A node in the uct tree. Roughly translates to a state in the search space. Note that the state itself is not stored explicitly, but it can be recovered by `replaying' the actions from the starting state or by customizing make-uct-node.
[accessor] edges uct-node
Outgoing edges.
[accessor] average-reward uct-node (:average-reward = 0)
Average reward over random playouts started from below this node. See update-uct-statistics and REWARD.

An edge in the uct tree. Represents an action taken from a state. The value of an action is the value of its target state, which is not quite as generic as it could be; feel free to specialize average-reward for the edges if that's not the case.
[accessor] from-node uct-edge (:from-node)
The node this edge starts from.
[accessor] to-node uct-edge (= nil)
The node this edge points to if the edge has been visited, or nil otherwise.
[function] visited-edges node
[generic-function] edge-score node edge exploration-bias
[generic-function] select-edge node exploration-bias
Choose an action to take from a state, in other words an edge to follow from node in the tree. The default implementation chooses randomly from the yet unvisited edges or, if there are none, moves down the edge with the maximum edge-score. If you are thinking of customizing this, for example to make it choose the minimum at odd depths, then you may want to consider specializing REWARD or update-uct-statistics instead.
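UCT is commonly understood as selecting edges with the UCB1 formula. The Python sketch below shows that formula as edge_score together with the default selection rule described above. The Node and Edge classes are hypothetical, and it is an assumption that MICMAC's default edge-score uses exactly this formula.

```python
import math
import random

# Hypothetical Python sketch, not MICMAC's Lisp API.


class Node:
    def __init__(self, edges=()):
        self.edges = list(edges)
        self.visits = 0
        self.average_reward = 0.0


class Edge:
    def __init__(self, action):
        self.action = action
        self.visits = 0
        self.to_node = None  # set once the edge has been visited


def edge_score(node, edge, exploration_bias):
    # UCB1 (assumed): the target node's average reward plus an exploration
    # bonus that shrinks as the edge is visited more often than its siblings.
    return (edge.to_node.average_reward
            + exploration_bias * math.sqrt(math.log(node.visits) / edge.visits))


def select_edge(node, exploration_bias, rng=random):
    # Unvisited edges first, chosen at random; otherwise the best edge_score.
    unvisited = [e for e in node.edges if e.visits == 0]
    if unvisited:
        return rng.choice(unvisited)
    return max(node.edges, key=lambda e: edge_score(node, e, exploration_bias))
```

With exploration_bias = 0 the choice is purely greedy; larger values favour edges that have been tried less often relative to their parent.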
[generic-function] outcome->reward node outcome
Compute the reward for a node in the tree from outcome, which is the result of a playout. This is called by the default implementation of update-uct-statistics. This is where one typically negates depending on the parity of depth in two-player games.
[generic-function] update-uct-statistics root path outcome
Increment the number of visits and update the average reward in nodes and edges of path. By default, edges simply get their visit counter incremented, while nodes also get an update to average-reward based on what outcome->reward returns.
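The two-player bookkeeping can be sketched in Python. The path representation (a list of nodes from the root down plus the edges between them) and the outcome convention (+1 when the player to move at the root wins, -1 otherwise) are assumptions for illustration; MICMAC's actual path format is not described here.

```python
# Hypothetical Python sketch, not MICMAC's Lisp API.


class Stats:
    """Stand-in for the statistics kept on nodes and edges."""

    def __init__(self):
        self.visits = 0
        self.average_reward = 0.0


def outcome_to_reward(depth, outcome):
    # Assumed convention: OUTCOME is +1 if the player to move at the root
    # wins, -1 otherwise. A node at odd depth was entered by a move of the
    # root player, so the reward keeps the sign; at even depth it was entered
    # by the opponent, so the sign flips.
    return outcome if depth % 2 == 1 else -outcome


def update_uct_statistics(path_nodes, path_edges, outcome):
    """PATH_NODES runs from the root down; PATH_EDGES connects them."""
    for edge in path_edges:
        edge.visits += 1  # edges only count visits
    for depth, node in enumerate(path_nodes):
        node.visits += 1
        reward = outcome_to_reward(depth, outcome)
        # Incremental mean, so the sum of rewards need not be stored.
        node.average_reward += (reward - node.average_reward) / node.visits


nodes = [Stats(), Stats(), Stats()]  # root, child, grandchild
edges = [Stats(), Stats()]
for outcome in (+1, +1, -1):
    update_uct_statistics(nodes, edges, outcome)
print([round(n.average_reward, 3) for n in nodes])  # [-0.333, 0.333, -0.333]
```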
[generic-function] make-uct-node parent edge parent-state
Create a node representing the state that edge leads to (from parent). Specialize this if you want to keep track of the state, which is not done by default as it can be expensive, especially in light of TAKE-ACTION mutating it. The default implementation simply creates an instance of the class of parent so that one can start from a subclass of uct-node and be sure that that class is going to be used for nodes below it.
[generic-function] state node parent edge parent-state
Return the state that corresponds to node. This is not a straightforward accessor unless node is customized to store it. The rest of the parameters are provided so that one can reconstruct the state by taking the action of edge in the parent-state of parent. It is allowed to mutate parent-state and return it. This function must be specialized.
[generic-function] list-edges node state
Return a list of edges representing the possible actions from node with state. This function must be customized.
[generic-function] playout node state reverse-path
Play a random game from node with state and return the outcome that is fed into update-uct-statistics. The way the random game is played is referred to as the `default policy', and that's what makes or breaks uct search. This function must be customized.
[function] uct &key root fresh-root-state exploration-bias max-n-playouts
Starting from the root node, search the tree, expanding it by one node for each playout. Finally, return the mutated root. root may be the root node of any tree; it need not be a single node with no edges. fresh-root-state is a function that returns a fresh state corresponding to root. This state will be destroyed unless special care is taken in state.
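Putting the pieces together, here is a self-contained Python sketch of the selection, expansion, playout and backpropagation loop, on a hypothetical single-player toy problem: choose two bits, with reward 1 only for (1, 1). It does not use MICMAC's generic functions: selection is inlined UCB1 (an assumption, as above), the playout is a uniform random default policy, and there is no reward negation because there is only one player.

```python
import math
import random

# Hypothetical Python sketch, not MICMAC's Lisp API.


class Node:
    def __init__(self, state):
        self.state = state  # a tuple of bits
        self.children = {}  # action -> Node (visited edges)
        self.untried = [] if len(state) == 2 else [0, 1]
        self.visits = 0
        self.average_reward = 0.0


def playout(state, rng):
    # Default policy: finish the bit string uniformly at random.
    while len(state) < 2:
        state = state + (rng.randint(0, 1),)
    return 1.0 if state == (1, 1) else 0.0


def uct(root, max_n_playouts, exploration_bias=1.4, seed=0):
    rng = random.Random(seed)
    for _ in range(max_n_playouts):
        node, path = root, [root]
        # Selection: follow UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda c: c.average_reward + exploration_bias
                       * math.sqrt(math.log(parent.visits) / c.visits))
            path.append(node)
        # Expansion: add exactly one new node per playout.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            node.children[action] = Node(node.state + (action,))
            node = node.children[action]
            path.append(node)
        # Simulation and backpropagation (single player: no negation).
        reward = playout(node.state, rng)
        for n in path:
            n.visits += 1
            n.average_reward += (reward - n.average_reward) / n.visits
    return root


root = uct(Node(()), max_n_playouts=200)
best = max(root.children, key=lambda a: root.children[a].visits)
print(best)
```

With 200 playouts, almost all visits should end up under action 1, the only action that can lead to a reward.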
4 Metropolis Hastings
[in package MICMAC.METROPOLIS-HASTINGS with nicknames MICMAC.MH]
Generic interface for the Metropolis-Hastings algorithm, also Metropolis Coupled MCMC.
References:
Wikipedia: Metropolis–Hastings algorithm. http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm
Walsh, B. (2004) Markov Chain Monte Carlo and Gibbs Sampling. Lecture Notes for EEB 581, version 26 April 2004. http://web.mit.edu/~wingated/www/introductions/mcmcgibbsintro.pdf
Geyer, C.J. (1991) Markov chain Monte Carlo maximum likelihood.
For now, the documentation is just a reference. See test/test-metropolis-hastings.lisp for an example.

A simple Markov chain for Metropolis-Hastings. With temperature, it is suitable for MC3.
[accessor] temperature mc-chain (:temperature = 1.0d0)
The PROBABILITY-RATIO of samples is raised to the power of 1 / temperature before calculating the acceptance probability. This effectively flattens the peaks if temperature > 1, which makes it easier for the chain to traverse deep valleys.
[function] jump-to-sample chain jump &key (result-sample (state chain))
From the current state of chain, make jump (from the current distribution of chain) and return the sample where we landed. Reuse result-sample when possible.
[generic-function] jump-to-sample* chain jump result-sample
This function is called by jump-to-sample. It is where jump-to-sample behaviour shall be customized.
[generic-function] prepare-jump-distribution chain
Prepare for sampling from the F(X) = Q(SAMPLE->X) distribution. Called by random-jump. The around method ensures that nothing is done unless there was a state change.
[generic-function] random-jump chain
Sample a jump from the current distribution of jumps that was computed by prepare-jump-distribution.
[generic-function] log-probability-ratio chain sample1 sample2
Return P(sample1)/P(sample2) in the log domain, i.e. log P(sample1) - log P(sample2). It is in the log domain to avoid overflows, and it is a ratio because that may allow computational shortcuts, as opposed to calculating unnormalized probabilities separately.
[generic-function] log-probability-ratio-to-jump-target chain jump target
Return P(target)/P(state), in the log domain, where jump is from the current state of chain to the target sample. This can be specialized for speed. The default implementation just falls back on log-probability-ratio.
[generic-function] log-jump-probability-ratio chain jump target
Return Q(TARGET->STATE) / Q(STATE->TARGET), in the log domain, where Q is the jump distribution and jump is from the current state of chain to the target sample.
[generic-function] acceptance-probability chain jump candidate
Calculate the acceptance probability of candidate, to which jump leads from the current state of chain.
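Combining the ratios above with temperature gives the usual Metropolis-Hastings acceptance probability, min(1, (P(target)/P(state))^(1/temperature) * Q(target->state)/Q(state->target)). The Python sketch below computes it in the log domain. The function and argument names are hypothetical, and tempering only the probability ratio follows the description of temperature above rather than MICMAC's source.

```python
import math

# Hypothetical Python sketch, not MICMAC's Lisp API.


def acceptance_probability(log_probability_ratio, log_jump_probability_ratio,
                           temperature=1.0):
    """log_probability_ratio:      log(P(target) / P(state))
    log_jump_probability_ratio: log(Q(target->state) / Q(state->target))"""
    # Only the target-distribution ratio is tempered (raised to 1/temperature).
    log_a = log_probability_ratio / temperature + log_jump_probability_ratio
    # min(1, exp(log_a)), without overflowing when log_a is large.
    return 1.0 if log_a >= 0 else math.exp(log_a)


print(acceptance_probability(math.log(0.25), 0.0))                   # about 0.25
print(acceptance_probability(math.log(0.25), 0.0, temperature=2.0))  # about 0.5
```

Doubling the temperature turns a 1-in-4 acceptance into roughly 1-in-2, which is the flattening effect described for temperature.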
[generic-function] accept-jump chain jump candidate
Called when chain accepts jump to candidate.
[generic-function] reject-jump chain jump candidate
Called when chain rejects jump to candidate. It does nothing by default; it's just a convenience for debugging.
[generic-function] maybe-jump chain jump candidate acceptance-probability
Randomly accept or reject jump to candidate from the current state of chain with acceptance-probability.
[generic-function] jump chain
Take a step on the Markov chain. Return a boolean indicating whether the proposed jump was accepted.
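A complete step (propose, compute the acceptance probability, then accept or reject) can be sketched with a small Python class in the spirit of enumerating-chain. The class and its methods are hypothetical analogues of random-jump, log-probability-ratio and jump, not MICMAC's API; the proposal is assumed symmetric, so the jump probability ratio drops out.

```python
import math
import random

# Hypothetical Python sketch, not MICMAC's Lisp API.


class EnumeratingChain:
    """Metropolis-Hastings chain over a finite state space with explicitly
    listed unnormalized probabilities."""

    def __init__(self, weights, state=0, temperature=1.0, seed=0):
        self.weights = weights
        self.state = state
        self.temperature = temperature
        self.rng = random.Random(seed)

    def log_probability_ratio(self, sample1, sample2):
        # log(P(sample1) / P(sample2)); the normalizing constant cancels.
        return math.log(self.weights[sample1]) - math.log(self.weights[sample2])

    def random_jump(self):
        # Symmetric proposal: any other state, uniformly. Hence
        # Q(target->state) = Q(state->target) and the log jump ratio is 0.
        others = [s for s in range(len(self.weights)) if s != self.state]
        return self.rng.choice(others)

    def jump(self):
        """Take one step; return True if the proposed move was accepted."""
        candidate = self.random_jump()
        log_a = self.log_probability_ratio(candidate, self.state) / self.temperature
        if log_a >= 0 or self.rng.random() < math.exp(log_a):
            self.state = candidate  # accept
            return True
        return False  # reject: the chain stays where it is


chain = EnumeratingChain([1, 2, 3, 4])
counts = [0] * 4
for _ in range(100_000):
    chain.jump()
    counts[chain.state] += 1
# The frequencies approach the normalized weights [0.1, 0.2, 0.3, 0.4].
print([c / 100_000 for c in counts])
```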

High-probability islands separated by low-probability valleys make the chain mix poorly. mc3-chain has a number of hot-chains that have state probabilities similar to that of the main chain but less jagged. Often it suffices to set the temperatures of the hot-chains higher and use the very same base probability distribution.
[generic-function] accept-swap-chain-states mc3 chain1 chain2
Swap the states of chain1 and chain2 of mc3.
[generic-function] reject-swap-chain-states mc3 chain1 chain2
Called when the swap of states of chain1 and chain2 is rejected. It does nothing by default; it's just a convenience for debugging.
[generic-function] maybe-swap-chain-states mc3 chain1 chain2 acceptance-probability
Swap the states of chain1 and chain2 of mc3 with probability acceptance-probability.
[generic-function] jump-between-chains mc3
Choose two chains randomly and swap their states with the MC3 acceptance probability.
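For the swap itself, the standard Metropolis-coupled MCMC rule (Geyer 1991) accepts exchanging state x1 of chain1 (temperature T1) with state x2 of chain2 (temperature T2) with probability min(1, (P(x2)/P(x1))^(1/T1 - 1/T2)). Assuming that is what jump-between-chains computes, which is not confirmed here, a Python sketch with a hypothetical function name looks like this. Note that it needs only the ratio P(x2)/P(x1), which is what a log-probability-ratio style function provides.

```python
import math

# Hypothetical Python sketch, not MICMAC's Lisp API.


def swap_acceptance_probability(log_ratio_21, temperature1, temperature2):
    """log_ratio_21 = log(P(x2) / P(x1)) under the untempered distribution,
    where x1 and x2 are the current states of chain1 and chain2."""
    log_a = (1.0 / temperature1 - 1.0 / temperature2) * log_ratio_21
    return 1.0 if log_a >= 0 else math.exp(log_a)


# The hot chain (T=2) holds a state e^4 times more probable than the cold
# chain's (T=1): the swap that moves it into the cold chain is always accepted.
print(swap_acceptance_probability(4.0, 1.0, 2.0))   # 1.0
print(swap_acceptance_probability(-4.0, 1.0, 2.0))  # exp(-2), about 0.135
```

Chains at equal temperatures always swap, since the exponent 1/T1 - 1/T2 is then zero.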
[class] enumerating-chain mc-chain
A simple abstract chain subclass that explicitly enumerates the probabilities of the distribution.

Mix this in with your chain to have it print a trace of acceptances/rejections.
5 Game Theory
[in package MICMAC.GAME-THEORY]
[function] find-nash-equilibrium payoff &key (n-iterations 100)
Find a Nash equilibrium of a zero-sum game represented by the payoff matrix (a 2d matrix or a nested list). payoff is from the point of view of the row player: the player who chooses the column wants to minimize, and the row player wants to maximize. The first value returned is a vector of unnormalized probabilities assigned to each action of the row player, the second value is the same for the column player, and the third is the expected payoff of the row player in the Nash equilibrium represented by the oddment vectors.
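The n-iterations parameter and the unnormalized "oddment" output suggest an iterative method. One classic method of that kind is Brown-Robinson fictitious play, sketched below in Python: each player repeatedly best-responds to the opponent's history, and the play counts serve as unnormalized mixed strategies. Whether find-nash-equilibrium uses this method is an assumption, and fictitious_play is a hypothetical name.

```python
# Hypothetical Python sketch of fictitious play, not MICMAC's Lisp API.


def fictitious_play(payoff, n_iterations=100):
    """payoff[i][j] is the row player's payoff; the row player maximizes and
    the column player minimizes. Returns (row_counts, col_counts, value)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Cumulative payoff of each pure strategy against the opponent's history.
    row_totals = [0.0] * n_rows  # row r against all columns played so far
    col_totals = [0.0] * n_cols  # column c against all rows played so far
    i = 0  # the row player's first move is arbitrary
    for _ in range(n_iterations):
        row_counts[i] += 1
        for c in range(n_cols):
            col_totals[c] += payoff[i][c]
        # The column player best-responds (minimizes) to the row history.
        j = min(range(n_cols), key=lambda c: col_totals[c])
        col_counts[j] += 1
        for r in range(n_rows):
            row_totals[r] += payoff[r][j]
        # The row player best-responds (maximizes) to the column history.
        i = max(range(n_rows), key=lambda r: row_totals[r])
    # Expected payoff of the empirical mixed strategies.
    total = sum(row_counts) * sum(col_counts)
    value = sum(row_counts[r] * col_counts[c] * payoff[r][c]
                for r in range(n_rows) for c in range(n_cols)) / total
    return row_counts, col_counts, value


# A game with a saddle point: row 1 dominates row 0, column 1 dominates
# column 0, so the game value is payoff[1][1] = 2.
print(fictitious_play([[3, 1], [4, 2]]))  # ([1, 99], [0, 100], 1.99)
```

For games with only mixed equilibria, such as matching pennies, the normalized counts converge slowly toward the equilibrium, which is why the number of iterations matters.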