Neural Networks for Machine Learning
Lecture 12a
The Boltzmann Machine learning algorithm
Geoffrey Hinton
with Nitish Srivastava, Kevin Swersky, Tijmen Tieleman, Abdel-rahman Mohamed
The goal of learning
• We want to maximize the product of the probabilities that the Boltzmann machine assigns to the binary vectors in the training set.
– This is equivalent to maximizing the sum of the log probabilities that the Boltzmann machine assigns to the training vectors (written out in symbols below).
• It is also equivalent to maximizing the probability that we would obtain exactly the N training cases if we did the following:
– Let the network settle to its stationary distribution N different times with no external input.
– Sample the visible vector once each time.
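In symbols (notation added here; it is not on the slide): for training vectors $\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(N)}$ and weights $W$,

$$\arg\max_{W}\ \prod_{n=1}^{N} p(\mathbf{v}^{(n)}) \;=\; \arg\max_{W}\ \sum_{n=1}^{N} \log p(\mathbf{v}^{(n)})$$

because the log is monotonic; and the probability of sampling exactly the $N$ training cases from $N$ independent settlings is $\prod_{n} p(\mathbf{v}^{(n)})$ up to an ordering factor that does not depend on the weights, so it is the same objective once more.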
Why the learning could be difficult
Consider a chain of units with visible units at the two ends.
[Figure: a chain with a visible unit at each end, connected through hidden units by the weights w1, w2, w3, w4, w5.]
If the training set consists of (1,0) and (0,1), we want the product of all the weights to be negative.
So to know how to change w1 or w5 we must know w3 (a numerical check follows).
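A minimal brute-force check of this claim (a sketch, not from the lecture; it uses states in {-1, +1}, the Ising convention, so that the sign of the product of the weights is directly visible, whereas the lecture's units are binary 0/1): enumerate every configuration of the six-unit chain, form the Boltzmann distribution, and measure the correlation between the two visible ends.

```python
import itertools
import math

def visible_correlation(weights, beta=1.0):
    """Correlation <s_0 s_5> between the two end (visible) units of a
    chain s_0 - s_1 - ... - s_5 with states in {-1, +1} and energy
    E = -sum_k w_k * s_k * s_{k+1} (no biases)."""
    Z = 0.0
    corr = 0.0
    for states in itertools.product([-1, 1], repeat=len(weights) + 1):
        energy = -sum(w * states[k] * states[k + 1]
                      for k, w in enumerate(weights))
        p = math.exp(-beta * energy)   # unnormalized Boltzmann probability
        Z += p
        corr += p * states[0] * states[-1]
    return corr / Z

# All weights positive: the ends agree, favouring (1,1) and (0,0).
print(visible_correlation([1, 1, 1, 1, 1]))    # positive
# Product of the weights negative (w3 flipped): the ends anti-correlate,
# favouring (1,0) and (0,1) -- what this training set demands.
print(visible_correlation([1, 1, -1, 1, 1]))   # negative
```

Flipping the sign of w3 alone flips the sign of the end-to-end correlation, which is why the right direction for changing w1 depends on w3.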
A very surprising fact
• Everything that one weight needs to know about the other weights and the data is contained in the difference of two correlations.
$$\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} \;=\; \langle s_i s_j \rangle_{\mathbf{v}} \;-\; \langle s_i s_j \rangle_{\text{model}}$$

– The left-hand side is the derivative of the log probability of one training vector, $\mathbf{v}$, under the model.
– $\langle s_i s_j \rangle_{\mathbf{v}}$ is the expected value of the product of states at thermal equilibrium when $\mathbf{v}$ is clamped on the visible units.
– $\langle s_i s_j \rangle_{\text{model}}$ is the expected value of the product of states at thermal equilibrium with no clamping.

This gives the learning rule

$$\Delta w_{ij} \;\propto\; \langle s_i s_j \rangle_{\text{data}} \;-\; \langle s_i s_j \rangle_{\text{model}}$$
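A minimal sketch of the resulting learning loop, assuming a fully connected network of binary stochastic units; the function names, the fixed number of Gibbs sweeps used to approximate thermal equilibrium, and the choice to estimate each expectation from a single equilibrium sample are illustrative, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(s, W, clamped=()):
    """One sweep of Gibbs sampling: each unclamped unit i is turned on
    with probability sigmoid(sum_j w_ij * s_j)."""
    for i in range(len(s)):
        if i in clamped:
            continue
        p_on = 1.0 / (1.0 + np.exp(-W[i] @ s))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

def boltzmann_update(W, data, n_vis, steps=50, lr=0.01):
    """One learning step: W += lr * (<s_i s_j>_data - <s_i s_j>_model),
    with each expectation approximated by a single sample."""
    n = W.shape[0]
    # Positive phase: clamp the training vector on the visible units
    # and let the hidden units settle toward thermal equilibrium.
    s = rng.integers(0, 2, size=n).astype(float)
    s[:n_vis] = data
    for _ in range(steps):
        gibbs_step(s, W, clamped=range(n_vis))
    pos = np.outer(s, s)
    # Negative phase: no clamping -- the whole network runs free.
    s = rng.integers(0, 2, size=n).astype(float)
    for _ in range(steps):
        gibbs_step(s, W)
    neg = np.outer(s, s)
    # Hebbian term minus "unlearning" term.
    W += lr * (pos - neg)
    np.fill_diagonal(W, 0.0)   # no self-connections
    return W

# Tiny usage example (hypothetical sizes): 2 visible + 2 hidden units.
W = np.zeros((4, 4))
for v in [[1.0, 0.0], [0.0, 1.0]] * 100:
    W = boltzmann_update(W, np.array(v), n_vis=2)
```

In practice the two expectations are averaged over many equilibrium samples and over the training cases; one sample per phase keeps the sketch short but makes the gradient estimate very noisy.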
Why is the derivative so simple?
• The probability of a global configuration at thermal equilibrium is an exponential function of its energy.
– So settling to equilibrium makes the log probability a linear function of the energy.
• The energy is a linear function of the weights and states, so:
$$-\frac{\partial E}{\partial w_{ij}} = s_i s_j$$
• The process of settling to thermal equilibrium propagates information about the weights.
– We don't need backprop.
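Spelling out the two steps behind the simple derivative (a standard derivation with notation added here: $\mathbf{h}$ ranges over hidden configurations, and $\mathbf{u}, \mathbf{g}$ over all visible and hidden configurations):

$$\log p(\mathbf{v}) \;=\; \log \sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})} \;-\; \log \sum_{\mathbf{u},\mathbf{g}} e^{-E(\mathbf{u},\mathbf{g})}$$

$$\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} \;=\; \sum_{\mathbf{h}} p(\mathbf{h}\mid\mathbf{v})\, s_i s_j \;-\; \sum_{\mathbf{u},\mathbf{g}} p(\mathbf{u},\mathbf{g})\, s_i s_j \;=\; \langle s_i s_j \rangle_{\mathbf{v}} - \langle s_i s_j \rangle_{\text{model}}$$

where $-\partial E/\partial w_{ij} = s_i s_j$ is used in both sums: the linearity of the energy in the weights is exactly what makes each term a plain correlation.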