Generation 5: Artificial Intelligence Repository - Perceptrons

#### Perceptrons

This essay follows on from "Introduction to Neural Networks" - we will look at perceptrons (both the single perceptron and multi-perceptron arrangements) and the mathematics behind them. Before starting the main body of the essay, one point needs to be cleared up. The term 'perceptron' is one of the most widely used terms in neural networking. There is no single definition of a perceptron, nor do all 'perceptron-based' programs use the same algorithms.

##### Single Perceptron
The structure of a single perceptron is very simple. There are two inputs, a bias, and an output. A simple schematic diagram is shown at the right - the two circles at the bottom are the inputs, the black square is the bias, and the circle at the top is the output. Note that both the inputs and outputs of a perceptron are binary - that is, they can only be 0 or 1.

Each of the inputs and the bias is connected to the main perceptron by a weight. A weight is a real number, often initialized to a value between 0 and 1. When an input is fed into the perceptron, it is multiplied by its corresponding weight, then the weighted inputs are summed and fed through a hard-limiter. Basically, a hard-limiter is a function that defines the threshold value for 'firing' the perceptron. For example, the limiter could be:

f(net) = 1 if net > 0, otherwise f(net) = 0

For example, if the sum of the inputs multiplied by their weights is -2, the limiting function returns 0; if the sum is 3, the function returns 1.
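The single perceptron described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the essay, and the weight values passed in are purely illustrative:

```python
# A minimal sketch of a single perceptron with a hard-limiting step function.

def hard_limiter(net):
    """Fire (return 1) only when the weighted sum exceeds the threshold of 0."""
    return 1 if net > 0 else 0

def perceptron_output(weights, inputs):
    """weights[0] is the bias weight; the bias input is fixed at 1."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return hard_limiter(net)

print(hard_limiter(-2))   # -> 0, as in the example above
print(hard_limiter(3))    # -> 1
print(perceptron_output([0.5, 0.3, -0.8], (1, 1)))   # illustrative weights
```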

Now, the way a perceptron learns to distinguish patterns is through modifying its weights, so the concept of a learning rule must be introduced. In the perceptron, the most common form of learning is to adjust each weight by the difference between the desired output and the actual output, scaled by the corresponding input. Mathematically, this can be written:

w_i = w_i + η(d - y)x_i

where w_i is the weight on input x_i, d is the desired output, y is the actual output, and η is the learning rate.
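A short training loop makes the learning rule concrete. This is a sketch under assumptions not stated in the text: a learning rate of 1.0, a fixed number of passes over the data, and weights initialized to zero. It trains a perceptron on the 'x1 and (not x2)' function used later in the essay:

```python
# Sketch of the perceptron learning rule: each weight is nudged by the
# difference between desired and actual output, scaled by eta and the input.
# eta = 1.0, epochs = 25, and zero-initialized weights are assumptions.

def hard_limiter(net):
    return 1 if net > 0 else 0

def train(samples, eta=1.0, epochs=25):
    """samples: list of ((x1, x2), desired) pairs. Returns [bias, w1, w2]."""
    w = [0.0, 0.0, 0.0]                  # bias weight plus one weight per input
    for _ in range(epochs):
        for (x1, x2), d in samples:
            x = (1, x1, x2)              # prepend the fixed bias input of 1
            y = hard_limiter(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(3):
                w[i] += eta * (d - y) * x[i]   # w_i <- w_i + eta*(d - y)*x_i
    return w

# Train on 'x1 and (not x2)'
w = train([((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 0)])
```

By the Perceptron Convergence Theorem, this loop settles on a correct set of weights because the target function is linearly separable.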

Learning on a perceptron is guaranteed by the Perceptron Convergence Theorem, which states that if a solution can be implemented on a perceptron, the learning rule will find it in a finite number of steps. A proof of this theorem can be found in Minsky and Papert's 1969 book, Perceptrons.

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 0 |
| 1  | 0  | 1 |
| 1  | 1  | 0 |
Perceptrons can only solve problems where the solutions can be divided by a line (or hyperplane) - this is called linear separability. To explain the concept of linear separation further, let us look at the function shown to the left. For those of you unfamiliar with binary notation, the function reads 'x1 and (not x2)'. Let us assume that we run the function through a perceptron, and the weights converge at 0 for the bias, and 2, -2 for the inputs. If we calculate the net value (the weighted sum), we get:

net = 0·x0 + 2·x1 - 2·x2

Note that x0 is the bias, and x1 and x2 are the inputs. Now, if x1 is plotted on the y-axis and x2 on the x-axis, setting the net value to 0 reduces the equation to x1 = x2. Look at the truth table plotted on the axes, and the line that the perceptron 'derives':

So the perceptron correctly draws a line that divides the two groups of points. Note that it doesn't have to be a line: it can be a hyperplane dividing points in 3-D space, or beyond. This is where the power of perceptrons lies, but also where their limitations lie. For example, perceptrons cannot solve the XOR or XNOR binary functions, because these are not linearly separable.
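A quick check confirms that the converged weights {0, 2, -2} quoted above reproduce the truth table for 'x1 and (not x2)'. The hard-limiter here is the step function defined earlier in the essay:

```python
# Verify that the weights {0, 2, -2} (bias, w1, w2) from the text
# reproduce the truth table for 'x1 and (not x2)'.

def hard_limiter(net):
    return 1 if net > 0 else 0

weights = [0, 2, -2]
results = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        net = weights[0] + weights[1] * x1 + weights[2] * x2
        results[(x1, x2)] = hard_limiter(net)
        print(x1, x2, results[(x1, x2)])

# -> matches the truth table: only (1, 0) fires
```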

##### Multi-Perceptrons
The XOR problem can be solved by using three perceptrons. Note that this is not synonymous with multi-layer networks. The key is splitting the XOR problem into three different parts - this requires a little boolean mathematics. Remember that '^' is AND, 'v' is OR, and '¬' is NOT:

x1 XOR x2 = (x1 v x2) ^ ¬(x1 ^ x2)

y1 = x1 v x2
y2 = ¬(x1 ^ x2)
y = y1 ^ y2

The problem is now broken down into two linearly separable problems, whose results feed a third linearly separable problem. From the final three equations, you can see that the perceptrons would be 'arranged' something like this:

To prove that it works, let's look at the weights (and thus the lines) that the perceptrons converge at. The first perceptron balances at {0, 1, 1} (remember, the first element is the bias), the second at {2, -1, -1} and the final one at {-1, 1, 1}. The three equations are as follows:

net1 = 0 + x1 + x2
net2 = 2 - x1 - x2
net3 = -1 + y1 + y2

Remember that the final equation covers the third perceptron, which takes the outputs of the first two as its inputs, so it is plotted on a separate graph. The two graphs look like this:

You can see that the layer-1 lines cut the graph into three regions. In the centre region between the two lines, the perceptron generalizes to a '1'; the areas on and beyond the two lines give '0'. In the layer-2 graph, you can see how the third perceptron creates the final result. Notice that the two lines in layer 1 do not intersect at the origin, so the third perceptron never has to deal with that point.
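The whole three-perceptron arrangement can be sketched directly from the converged weights quoted above. The layer roles in the comments follow the boolean decomposition of XOR:

```python
# The three-perceptron XOR network, using the converged weights from the
# text: {0, 1, 1}, {2, -1, -1} for layer 1 and {-1, 1, 1} for layer 2.

def hard_limiter(net):
    return 1 if net > 0 else 0

def perceptron(weights, a, b):
    """weights = [bias, w1, w2]; a and b are the two binary inputs."""
    return hard_limiter(weights[0] + weights[1] * a + weights[2] * b)

def xor(x1, x2):
    y1 = perceptron([0, 1, 1], x1, x2)     # layer 1: x1 OR x2
    y2 = perceptron([2, -1, -1], x1, x2)   # layer 1: NOT (x1 AND x2)
    return perceptron([-1, 1, 1], y1, y2)  # layer 2: y1 AND y2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # -> fires only when exactly one input is 1
```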

##### Setting the Weights
You can see how the multi-perceptron architecture works. Any function can be implemented on a group of perceptrons (simply connected or, more commonly, layered) as long as you can set the weights. For something as small as the above example, setting them by hand is not too much trouble, but for problems involving tens or hundreds of neurons this is not an option. So how do you do it? There are many ways: a technique called back-propagation is the most popular, and research into using genetic algorithms to evolve the weights is also showing some success. Nevertheless, all of these techniques are complicated and are not covered here.