PROBABILISTIC INFERENCE USING MARKOV CHAIN MONTE CARLO METHODS

                         Radford M. Neal
                Department of Computer Science 
                     University of Toronto 

                       25 September 1993

Probabilistic inference is an attractive approach to uncertain
reasoning and empirical learning in artificial intelligence.
Computational difficulties arise, however, because probabilistic
models with the necessary realism and flexibility lead to 
complex distributions over high-dimensional spaces.

Related problems in other fields have been tackled using Monte 
Carlo methods based on sampling using Markov chains, providing 
a rich array of techniques that can be applied to problems in
artificial intelligence.  The "Metropolis algorithm" has been 
used to solve difficult problems in statistical physics for over 
forty years, and, in the last few years, the related method of 
"Gibbs sampling" has been applied to problems of statistical 
inference.  Concurrently, an alternative method for solving 
problems in statistical physics by means of dynamical simulation 
has been developed as well, and has recently been unified with 
the Metropolis algorithm to produce the "hybrid Monte Carlo" 
method.  In computer science, Markov chain sampling is the basis 
of the heuristic optimization technique of "simulated 
annealing", and has recently been used in randomized algorithms 
for approximate counting of large sets.

In this review, I outline the role of probabilistic inference in
artificial intelligence, present the theory of Markov chains, 
and describe various Markov chain Monte Carlo algorithms, along 
with a number of supporting techniques. I try to present a 
comprehensive picture of the range of methods that have been 
developed, including techniques from the varied literature that 
have not yet seen wide application in artificial intelligence, 
but which appear relevant.  As illustrative examples, I use the 
problems of probabilistic inference in expert systems, discovery 
of latent classes from data, and Bayesian learning for neural 
networks.