Generation 5: Artificial Intelligence Repository - BP Example: XOR Net

#### BP Case Study: The XOR Net

Firstly, please read the back-propagation essay. Also, if you want to make use of the programming classes, you must have a good working knowledge of C++. For the non-programmers out there, the programming part is just a small part at the end of the essay - you won't miss anything important! For the programmers - all major programming discussion takes place in another essay.

To demonstrate back-propagation we are going to look at a three-layer, five-neuron network (2 input, 2 hidden, 1 output), shown to the right. Before we start looking at the calculations, let us get some terminology straight.

The inputs/outputs of the neurons are described as follows. Each layer has a number, starting at 1 for the input layer. The inputs to a layer are written xl-1(n), where l is the layer and n is the neuron, with n = 0 reserved for the bias input. So, for example, the inputs fed into the hidden layer are x1(1) and x1(2) (marked red), and the outputs from the hidden layer to the output layer are x2(1) and x2(2) (marked green).

Weights are defined by wl(f,n), where l is the layer, f is the number of the neuron the connection came from in the previous layer, and n is the number of the neuron itself. Note that when f = 0, the weight is the bias for the neuron. For example, the weight from the output of the second neuron in the input layer to the first neuron in the hidden layer is w2(2,1) (marked blue).

#### Weights and Calculations

Firstly, the network would be initialized, and given random weights. Let's assign these initial weights. The weights can be anything between -1 and 1.

```
Hidden Neuron 1: w2(0,1) =  0.341232   w2(1,1) =  0.129952   w2(2,1) = -0.923123
Hidden Neuron 2: w2(0,2) = -0.115223   w2(1,2) =  0.570345   w2(2,2) = -0.328932
Output Neuron:   w3(0,1) = -0.993423   w3(1,1) =  0.164732   w3(2,1) =  0.752621
```

Since back-propagation training requires thousands of steps, we are obviously not going to go through them all; I will merely look at the first iteration. So, let us look at what would happen during training on the pair (0,0). Firstly, each neuron's weighted sum has to be calculated, then run through the sigmoid function to limit it.

```
x1(0) = 1 (bias)
x1(1) = 0
x1(2) = 0

Neuron 1: (1 * 0.341232) + (0 * 0.129952) + (0 * -0.923123) =  0.341232
Neuron 2: (1 *-0.115223) + (0 * 0.570345) + (0 * -0.328932) = -0.115223
```
So, we now have the net (weighted sum) values of the two hidden neurons. Now, to run them through our sigmoid function.
```
x2(1) = 1/(1+e^(-0.341232)) = 0.584490
x2(2) = 1/(1+e^( 0.115223)) = 0.471226
```
We now have the outputs for the hidden layer. So, let us now do the same for the output layer. Using x2(1) and x2(2) as the inputs for the output layer we can make the following calculations:
```
x2(0) = 1 (bias)
x2(1) = 0.584490
x2(2) = 0.471226

Net: (1 *-0.993423) + (0.584490 * 0.164732) + (0.471226 * 0.752621) = -0.542484

Therefore, x3(1) = 1/(1+e^(0.542484)) = 0.367610
```
This is the value that the network would output. This is only half of the training process, though: we now have to adjust all the weights to bring the result closer to the one we want (0 in this case). So, let's calculate our deltas using the formulas discussed in the BP essay. We will first calculate the delta for the output layer:
```
d3(1) = x3(1)(1 - x3(1))(d - x3(1))
      = 0.367610 * (1 - 0.367610) * (0 - 0.367610)
      = -0.085459
```
Now that we have that, we can use it to propagate the error backwards:
```
d2(1) = x2(1)(1 - x2(1))w3(1,1)d3(1)
      = 0.584490 * (1 - 0.584490) * 0.164732 * (-0.085459) = -0.003419
d2(2) = 0.471226 * (1 - 0.471226) * 0.752621 * (-0.085459) = -0.016027
```
That's all the deltas calculated for both layers. Now to actually alter the weights - remember that the learning coefficient h is defined by the user, and I have picked 0.5 to work with. For some of the weights the change will be 0, because you are multiplying by the inputs, which in our case are 0. Therefore, I am only going to show the calculations for the ones that change:
```
dw2(0,1) = h*x1(0)*d2(1) = 0.5 * 1 * -0.003419 = -0.001710
dw2(1,1) = 0
dw2(2,1) = 0

dw2(0,2) = 0.5 * 1 * -0.016027 = -0.008013
dw2(1,2) = 0
dw2(2,2) = 0

dw3(0,1) = 0.5 * 1 * -0.085459 = -0.042730
dw3(1,1) = 0.5 * 0.584490 * -0.085459 = -0.024975
dw3(2,1) = 0.5 * 0.471226 * -0.085459 = -0.020135
```
So, these are the weight changes. You would add these to their respective weights, then run the entire process again on the next set of training data. Slowly, as the training data is fed in and the network is retrained a few thousand times, the weights could balance out to values such as these:

```
Hidden Neuron 1: w2(0,1) = -6.062263   w2(1,1) = -6.072185   w2(2,1) =  2.454509
Hidden Neuron 2: w2(0,2) = -4.893081   w2(1,2) = -4.894898   w2(2,2) =  7.293063
Output Neuron:   w3(0,1) = -9.792470   w3(1,1) =  9.484580   w3(2,1) = -4.473972
```

With these weights, you would get the following results for XOR:

```
0 XOR 0 = 0.017622
0 XOR 1 = 0.981504
1 XOR 0 = 0.981491
1 XOR 1 = 0.022782
```
Which, with a small amount of rounding, is the correct truth table. Now, for a brief look at the C++ class.

#### C++ Class Code

The C++ class for this is very simple. There are only two functions you really care about, Train() and Run(). Train() takes three floating-point values: the two inputs and an expected output. The function returns the output of the net. Run() takes only the two inputs, and returns the output. Therefore, to apply the network to the above example, your main() should look like this:
```
int main() {
    CBPNet bp;

    for (int i = 0; i < BPM_ITER; i++) {
        bp.Train(0,0,0);
        bp.Train(0,1,1);
        bp.Train(1,0,1);
        bp.Train(1,1,0);
    }

    cout << "0,0 = " << bp.Run(0,0) << endl;
    cout << "0,1 = " << bp.Run(0,1) << endl;
    cout << "1,0 = " << bp.Run(1,0) << endl;
    cout << "1,1 = " << bp.Run(1,1) << endl;

    return 0;
}
```
BPM_ITER is defined as the number of iterations the network is to run for. Here is some sample output from the program:
```
C:\Program Files\DevStudio\MyProjects\BPNet\Release>bpnet.exe
0,0 = 0.0494681
0,1 = 0.955633
1,0 = 0.942529
1,1 = 0.0433488
```
To look into the class code, please see the CBPNet essay.
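As a rough idea of what a class with this interface might look like, here is a hypothetical minimal 2-2-1 sketch following the worked example above. It is my own illustration, not the original CBPNet code (see the CBPNet essay for that):

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical sketch of a 2-2-1 back-propagation net with the
// Train()/Run() interface described above -- not the original CBPNet code.
class CBPNet {
public:
    CBPNet() {
        // Random initial weights in [-1, 1]: { bias, input 1, input 2 }.
        for (int i = 0; i < 3; i++) {
            m_wh[0][i] = Rand();  // hidden neuron 1
            m_wh[1][i] = Rand();  // hidden neuron 2
            m_wo[i]    = Rand();  // output neuron: { bias, h1, h2 }
        }
    }

    // Forward pass only.
    float Run(float i1, float i2) {
        float h1 = Sigmoid(m_wh[0][0] + m_wh[0][1]*i1 + m_wh[0][2]*i2);
        float h2 = Sigmoid(m_wh[1][0] + m_wh[1][1]*i1 + m_wh[1][2]*i2);
        return Sigmoid(m_wo[0] + m_wo[1]*h1 + m_wo[2]*h2);
    }

    // Forward pass, then one back-propagation step towards target d.
    float Train(float i1, float i2, float d) {
        float h1  = Sigmoid(m_wh[0][0] + m_wh[0][1]*i1 + m_wh[0][2]*i2);
        float h2  = Sigmoid(m_wh[1][0] + m_wh[1][1]*i1 + m_wh[1][2]*i2);
        float out = Sigmoid(m_wo[0] + m_wo[1]*h1 + m_wo[2]*h2);

        // Deltas, as in the worked example.
        float d3  = out * (1 - out) * (d - out);
        float d21 = h1 * (1 - h1) * m_wo[1] * d3;
        float d22 = h2 * (1 - h2) * m_wo[2] * d3;

        // Weight updates: dw = h * input * delta, learning coefficient 0.5.
        const float lc = 0.5f;
        m_wo[0] += lc * 1  * d3;   m_wo[1] += lc * h1 * d3;   m_wo[2] += lc * h2 * d3;
        m_wh[0][0] += lc * 1  * d21; m_wh[0][1] += lc * i1 * d21; m_wh[0][2] += lc * i2 * d21;
        m_wh[1][0] += lc * 1  * d22; m_wh[1][1] += lc * i1 * d22; m_wh[1][2] += lc * i2 * d22;
        return out;
    }

private:
    static float Sigmoid(float net) { return 1.0f / (1.0f + std::exp(-net)); }
    static float Rand() { return 2.0f * std::rand() / RAND_MAX - 1.0f; }

    float m_wh[2][3];  // input-to-hidden weights
    float m_wo[3];     // hidden-to-output weights
};
```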
