WTA-Boltzmann

 Preliminary draft: please send comments to James Bonaiuto

A series of simulations was run to demonstrate that the WTA mechanism in ACQ approximates the behavior of a Boltzmann machine. Each simulation tested a different value for \sigma_{PP}^{2}, the variance of the noise in the parallel planning layer. In each simulation, 1000 vectors of size N (set to 10) were generated with random elements between 0.0 and 1.0. To assess the probability of each element x_{i} of each vector x being selected by the ACQ mechanism, x was input to the network and the network was run until a winner was found. This was done 100 times for each x, and both the normalized value of x_{i}, \frac{x_{i}}{\sum_{j=1}^{N}x_{j}}, and the percentage of runs in which the element was selected as the winner by the network were recorded. In a Boltzmann machine, the probability of the i^{th} element of a set x being on is given by the sigmoid p(x_{i})=\frac{1}{1+e^{-\Delta E_{i}/T}}, where T is the temperature of the system and \Delta E_{i} is the change in energy of the system given that unit i is turned on. Here, the change in energy corresponds to the competitive weight of the element x_{i}, which in this case is given by the value x_{i} in the vector x.
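The sampling procedure described above can be sketched as follows. This is only an illustration: the actual WTA network dynamics are not given in this section, so the function noisy_argmax is a hypothetical stand-in (Gaussian noise with variance \sigma_{PP}^{2} added to each element, largest noisy value wins), and all function and parameter names are assumptions.

```python
import random


def noisy_argmax(x, sigma_sq, rng):
    """Hypothetical stand-in for the WTA network (NOT the ACQ dynamics):
    add zero-mean Gaussian noise with variance sigma_sq to each element
    and return the index of the largest noisy value."""
    sd = sigma_sq ** 0.5
    noisy = [xi + rng.gauss(0.0, sd) for xi in x]
    return max(range(len(x)), key=noisy.__getitem__)


def selection_statistics(sigma_sq, n_vectors=1000, n=10, n_runs=100, seed=0):
    """Return one (normalized value, selection frequency) pair per element
    of every randomly generated vector, as in the procedure in the text."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_vectors):
        # Random vector with elements between 0.0 and 1.0.
        x = [rng.random() for _ in range(n)]
        total = sum(x)
        wins = [0] * n
        # Run the (stand-in) network n_runs times on the same input.
        for _ in range(n_runs):
            wins[noisy_argmax(x, sigma_sq, rng)] += 1
        # Pair x_i / sum_j x_j with the fraction of runs element i won.
        points.extend((xi / total, w / n_runs) for xi, w in zip(x, wins))
    return points
```

Calling selection_statistics(0.05), for example, returns the 10000 points that one panel of such an experiment would plot. With sigma_sq set to 0.0 the stand-in selects the largest element on every run, so every frequency is either 0 or 1.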

[Figure, panels a-t: Boltzman_pp_<v>_ratio, with <v> = 0, 0.01, 0.02, ..., 0.09 for panels a-j and <v> = 0.1, 0.2, ..., 1 for panels k-t]

The probability of an element of a randomly generated vector being chosen by ACQ’s mechanism as a function of the normalized value of the element. Panels a-j show trials where \sigma_{PP}^{2}=[0.0:0.01:0.09]. Panels k-t show trials where \sigma_{PP}^{2}=[0.1:0.1:1.0].

[Figure: Temp_sigma_pp]

The fitted parameter T, representing the temperature of the system, as a function of the standard deviation of the noise in the parallel planning layer activity, \sigma_{PP}.

For each trial, a sigmoid function \frac{1}{1+e^{-\frac{x-\beta}{T}}} with parameters \beta and T was fitted to the data. The \beta parameter determines the position of the function, while T determines its width. In the context of the Boltzmann equation, however, T is referred to as temperature. Panels a-t in the previous figure show how the data points and fitted sigmoid functions vary as the variance of the noise in the parallel planning layer is changed from 0.0 to 0.09 in increments of 0.01 (a-j) and from 0.1 to 1.0 in increments of 0.1 (k-t). The data appear to be well represented by a sigmoid, and the steepness of the function decreases as \sigma_{PP}^{2} increases (the width, or temperature, increases). The one exception is panel a, which shows an entirely deterministic pattern of selection when the variance of the noise is 0.0. To derive the relationship between the temperature of the approximated Boltzmann machine and the noise in the parallel planning layer of ACQ, the fitted parameter T was compared across all tested values of the standard deviation of the noise, \sigma_{PP}. A linear function was fit to these values, yielding the relationship T=0.0271\sigma_{PP}+0.0183.
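The two fitting steps can be illustrated with a minimal sketch. The text does not say which fitting routine was used, so the coarse-to-fine grid search below is an assumption chosen to keep the example dependency-free, and the names fit_sigmoid, fit_line and temperature_from_sigma are hypothetical. Only the coefficients 0.0271 and 0.0183 come from the text.

```python
import math


def sigmoid(x, beta, T):
    """Fitted form 1 / (1 + exp(-(x - beta) / T)), written to avoid overflow."""
    z = (x - beta) / T
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sse(points, beta, T):
    """Sum of squared errors between observed frequencies and the sigmoid."""
    return sum((p - sigmoid(x, beta, T)) ** 2 for x, p in points)


def fit_sigmoid(points, beta_range=(0.0, 1.0), T_range=(1e-3, 1.0),
                steps=40, rounds=4):
    """Assumed fitting method: coarse-to-fine grid search over (beta, T).
    points is a list of (normalized value, selection frequency) pairs."""
    (b_lo, b_hi), (t_lo, t_hi) = beta_range, T_range
    best = None
    for _ in range(rounds):
        betas = [b_lo + (b_hi - b_lo) * i / steps for i in range(steps + 1)]
        temps = [t_lo + (t_hi - t_lo) * i / steps for i in range(steps + 1)]
        best = min((sse(points, b, t), b, t) for b in betas for t in temps)
        _, b, t = best
        # Shrink the search window to one grid cell around the current best.
        db, dt = (b_hi - b_lo) / steps, (t_hi - t_lo) / steps
        b_lo, b_hi = b - db, b + db
        t_lo, t_hi = max(t - dt, 1e-6), t + dt
    return best[1], best[2]


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def temperature_from_sigma(sigma_pp):
    """Linear relationship reported in the text: T = 0.0271*sigma_PP + 0.0183."""
    return 0.0271 * sigma_pp + 0.0183
```

In this sketch, fit_sigmoid would be applied once per tested noise level to obtain T, and fit_line would then be applied to the resulting (\sigma_{PP}, T) pairs. A library least-squares routine could replace the grid search without changing the overall procedure.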

It should be noted that the approximation to a Boltzmann machine is a network property and not a direct result of the activation function of the neurons in ACQ, which use the saturation function Θ(x) rather than a sigmoidal activation function. An important difference between Boltzmann machines and the WTA network, however, is that Boltzmann machines may take a long time to converge. The WTA network predicts contrast-dependent latency in convergence, but the longest mean latency with \sigma_{PP}^{2} was only 35 ms, which in these simulations corresponded to 35 discretized time steps.