If we want to create accurate models of how cognitive processes function, we need to be able to explain how the processes and representations in these models can be implemented in the brain.
Given that, why not construct models based upon the properties of neurons? The goal of connectionism is to build models of cognitive processing out of neuron-like units.
Connectionism’s Stone Age
The Perceptron: A simplified version of today’s neural networks.
Perceptrons were Boolean units, meaning that they were either on or off. Units were connected by either inhibitory or excitatory connections. If a unit received any inhibitory input, it was off; otherwise, it turned on if the sum of its excitatory inputs exceeded its threshold.
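A minimal sketch of such a unit in Python (the function name and the threshold value are illustrative choices, not taken from the original perceptron work):

    def perceptron_unit(excitatory_inputs, inhibitory_inputs, threshold=1):
        """Boolean unit: any inhibitory input vetoes it; otherwise it fires
        when the summed excitatory input exceeds the threshold."""
        if any(inhibitory_inputs):
            return 0                      # any inhibitory input turns the unit off
        return 1 if sum(excitatory_inputs) > threshold else 0

    print(perceptron_unit([1, 1], []))    # 1 (excitation exceeds the threshold)
    print(perceptron_unit([1, 1], [1]))   # 0 (inhibited)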
Pandemonium
The Dark Ages
Minsky & Papert: In 1969, Perceptrons, a book criticizing network models, was published. It contained three primary critiques:
The Third Critique
Perhaps most damningly, Minsky & Papert pointed out that, because of how early neural networks were built, they could not be applied to any problem beyond learning mere associations between inputs and outputs. They were effectively unable to model any processes occurring between the inputs and the outputs.
Sound familiar?
Coming so soon after behaviorism had been debunked, this spelled bad news for network models.
The Renaissance
Parallel Distributed Processing (PDP):
Developed by Rumelhart & McClelland to address the Minsky & Papert critiques. PDP networks differ from perceptrons in two important ways:
1. Units have continuous levels of activation between -1 and 1.
2. Inhibitory inputs do not automatically turn a unit off. Rather, they contribute a negative value of activation to the unit’s total activation (see the sketch after this list).
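A minimal sketch of how a single such unit might combine its inputs, assuming a plain weighted sum clipped to the -1 to 1 range (the clipping is an illustrative simplification; actual PDP models use a variety of activation functions):

    def pdp_unit_activation(input_activations, weights):
        """Inhibitory (negative-weight) inputs subtract from the total rather
        than shutting the unit off; the result stays between -1 and 1."""
        net = sum(a * w for a, w in zip(input_activations, weights))
        return max(-1.0, min(1.0, net))

    # One excitatory and one inhibitory connection
    print(round(pdp_unit_activation([0.8, 0.6], [0.9, -0.5]), 2))   # 0.42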
Early PDP models
Jets and Sharks:
This model was developed to show how a PDP network could function as a piece of LTM holding information that is known about various members of the Jets and Sharks gangs.
In a sense, this model is basically an implementation of spreading activation.
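A toy sketch of that spreading-activation idea, using made-up unit names, weights, decay, and cap values rather than anything from the actual Jets and Sharks network:

    # Hypothetical mini-network: weights[(source, destination)] = connection strength
    weights = {("Lance", "Jets"): 0.7, ("Jets", "Lance"): 0.7, ("Jets", "20s"): 0.5}
    activation = {"Lance": 1.0, "Jets": 0.0, "20s": 0.0}   # probe the name unit

    for _ in range(10):   # let activation spread for a few cycles
        new_activation = {}
        for unit in activation:
            incoming = sum(activation[src] * w
                           for (src, dst), w in weights.items() if dst == unit)
            # decayed old activation plus incoming activation, capped at 1.0
            new_activation[unit] = min(1.0, 0.8 * activation[unit] + incoming)
        activation = new_activation

    print(activation)   # "Jets" and "20s" now carry activation that spread from "Lance"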
Interactive Activation (IAC) model:
Developed to account for the word superiority effect. This was the first model to use interactive connections as an analogue for top-down processes.
Neither of these models involves any learning….
Learning in PDP networks
Feed-forward networks:
Pattern associators - Simple, two-layer feed-forward networks that learn to associate one set of input patterns with a corresponding set of output patterns.
While more powerful than perceptron networks, pattern associators suffer from many of the same drawbacks, largely because of the limitations of their feed-forward learning rules.
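As a rough sketch of the architecture (the weight values here are arbitrary), a pattern associator is just a single layer of weights mapping an input pattern directly onto an output pattern:

    def associate(input_pattern, weights):
        """Each output is a weighted sum of the inputs; there are no hidden units."""
        return [sum(w * a for w, a in zip(row, input_pattern)) for row in weights]

    weights = [[0.5, -0.5],    # weights into output unit 1
               [0.25, 0.25]]   # weights into output unit 2
    print(associate([1, -1], weights))   # [1.0, 0.0]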
Hebbian learning and the Delta rule
There are two main feed-forward learning algorithms that are used (both are sketched in code after their definitions):
Hebbian learning: The weight of the connection between two units should be changed as a function of the product of their activations.
Δw_ij = ε a_i a_j
Delta rule: The weight of the connection between two units should be changed in proportion to the difference between the output unit’s target activation and its actual activation, multiplied by the activation of the input unit on the other side of the connection.
Δw_ij = ε (t_i - a_i) a_j
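A minimal sketch of both update rules, assuming ε (written epsilon below) is the learning rate, i indexes the output unit, and j the input unit; the particular activation values are arbitrary illustrations:

    def hebbian_update(w_ij, a_i, a_j, epsilon=0.1):
        """Hebbian rule: change w_ij in proportion to the product a_i * a_j."""
        return w_ij + epsilon * a_i * a_j

    def delta_update(w_ij, t_i, a_i, a_j, epsilon=0.1):
        """Delta rule: change w_ij in proportion to the output error (t_i - a_i)."""
        return w_ij + epsilon * (t_i - a_i) * a_j

    # Both units active -> the Hebbian connection strengthens.
    print(round(hebbian_update(w_ij=0.0, a_i=1.0, a_j=1.0), 2))          # 0.1
    # Output too low (0.2 instead of 1.0) -> the delta rule raises the weight;
    # a perfect output would leave it unchanged.
    print(round(delta_update(w_ij=0.3, t_i=1.0, a_i=0.2, a_j=1.0), 2))   # 0.38
    print(round(delta_update(w_ij=0.3, t_i=1.0, a_i=1.0, a_j=1.0), 2))   # 0.3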
Why simple feed-forward learning doesn’t work
Using a feed-forward learning algorithm, there is nothing you can do in a network with more than two layers that cannot be done with a network of only two layers.
In other words, feed-forward learning algorithms cannot be used to solve difficult problems that would require more than two layers in the network.
In particular, take a problem like trying to design a pattern associator to compute the exclusive OR (XOR) of two inputs: the output should be on when exactly one input is on, but no single layer of weights can separate those four patterns, so a two-layer network cannot learn the mapping.
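As a quick, hedged illustration (the learning rate, number of passes, and use of a bias term are arbitrary choices), training a single layer of weights on XOR with the delta rule leaves every output stuck near 0.5 rather than at the targets 0 and 1:

    # XOR training patterns: (input1, input2) -> target
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w1 = w2 = bias = 0.0
    epsilon = 0.05

    for _ in range(5000):
        for (x1, x2), t in patterns:
            a = w1 * x1 + w2 * x2 + bias      # linear output unit
            error = t - a
            w1 += epsilon * error * x1        # delta rule applied to each weight
            w2 += epsilon * error * x2
            bias += epsilon * error

    for (x1, x2), t in patterns:
        output = round(w1 * x1 + w2 * x2 + bias, 2)
        print((x1, x2), "->", output, "(target", t, ")")
    # every output hovers around 0.5: no single layer of weights separates XOR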
Back-propagation
The back-propagation learning algorithm was developed as a non-linear learning procedure, allowing for the development of multi-layer networks that could solve linearly non-separable problems.
There are two stages to back-propagation learning (a code sketch follows the list):
1. Feed-forward: Activation propagates forward through the network, from the input units through to the output units.
2. Backward pass: Starting with the output units, their activations are compared to the target outputs and the weights of the connections leading into them are adjusted accordingly; the resulting error signal is then propagated backward so that the weights leading into earlier layers can be adjusted as well.
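To make the two stages concrete, here is a hedged sketch of a tiny multi-layer network trained with back-propagation on the XOR problem that stumps a two-layer associator. The layer sizes, learning rate, logistic units, and random seed are illustrative choices rather than a prescription from the PDP work, and convergence depends on a favorable random initialization:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    # Tiny network: 2 inputs -> 4 hidden units -> 1 output unit (with bias terms)
    W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    epsilon = 0.5   # learning rate

    for _ in range(10000):
        # Stage 1 -- feed-forward: activation flows from inputs to outputs
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # Stage 2 -- backward pass: compare outputs to targets, then push the
        # error signal back through the network and adjust each layer's weights
        d_out = (T - O) * O * (1 - O)
        d_hid = (d_out @ W2.T) * H * (1 - H)
        W2 += epsilon * H.T @ d_out
        b2 += epsilon * d_out.sum(axis=0)
        W1 += epsilon * X.T @ d_hid
        b1 += epsilon * d_hid.sum(axis=0)

    # Outputs should approach the XOR targets [0, 1, 1, 0] given a good initialization
    print(np.round(O.ravel(), 2))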