Firstly, please read the back-propagation essay. Also, if you want to make use of the programming classes, you will need a good working knowledge of C++. For the non-programmers out there, the programming part is only a small section at the end of this essay, so you won't miss anything important. For the programmers, all major programming discussion takes place in another essay.
The inputs and outputs of the neurons are labelled as follows. Each layer has a number, starting at 1 for the input layer. The inputs to layer l are written x_{l-1}(n), where n is the neuron number (in the calculations below the subscript is written inline, so x2(1) means x_{2}(1)). Neurons are numbered from 1, and index 0 is reserved for the bias input, which is always 1. So, for example, the inputs fed into the network are x0(1) and x0(2) (marked red), and the outputs from the hidden layer to the output layer are x2(1) and x2(2) (marked green). Weights are written w_l(f,n), where l is the layer, f is the number of the neuron the connection comes from in the previous layer, and n is the number of the neuron itself. When f = 0, the weight is the neuron's bias. For example, the weight from the output of the second neuron in the input layer to the input of the first neuron in the hidden layer is w2(2,1) (marked blue). First, the network is initialized with random weights, which can be anything between -1 and 1. The initial weights used in the calculations below are:
w2(0,1) = 0.341232, w2(1,1) = 0.129952, w2(2,1) = -0.923123 (hidden neuron 1)
w2(0,2) = -0.115223, w2(1,2) = 0.570345, w2(2,2) = -0.328932 (hidden neuron 2)
w3(0,1) = -0.993423, w3(1,1) = 0.164732, w3(2,1) = 0.752621 (output neuron)
Since back-propagation training requires thousands of steps, we are obviously not going to go through them all; instead, I will look only at the first iteration. So, let us look at what happens when the network is trained on the input (0,0). First, each neuron's weighted sum has to be calculated, then run through the sigmoid function to limit it to the range 0 to 1.
x1(0) = 1 (bias)
x1(1) = 0
x1(2) = 0
Neuron 1: (1 * 0.341232) + (0 * 0.129952) + (0 * -0.923123) = 0.341232
Neuron 2: (1 * -0.115223) + (0 * 0.570345) + (0 * -0.328932) = -0.115223
So, we now have the net (weighted sum) values of the two hidden neurons. Now we run them through the sigmoid function:
x2(1) = 1/(1+e^(-0.341232)) = 0.584490
x2(2) = 1/(1+e^(0.115223)) = 0.471226
We now have the outputs for the hidden layer, so let us do the same for the output layer, using x2(1) and x2(2) as its inputs:
x2(0) = 1 (bias)
x2(1) = 0.584490
x2(2) = 0.471226
Net: (1 * -0.993423) + (0.584490 * 0.164732) + (0.471226 * 0.752621) = -0.542484
Therefore, x3(1) = 1/(1+e^(0.542484)) = 0.367610
This is the value that the network would output. This is only half of the training process, though: we now have to adjust all the weights to bring the output closer to the one we want (0 in this case). So, let's calculate our deltas using the formulas discussed in the BP essay, starting with the delta for the output layer, where d is the desired output:
d3(1) = x3(1)(1 - x3(1))(d - x3(1))
= 0.367610 * (1 - 0.367610) * (0 - 0.367610)
= -0.085459
Now that we have that, we can use it to propagate the error backwards:
d2(1) = x2(1)(1 - x2(1))w3(1,1)d3(1)
= 0.584490 * (1 - 0.584490) * (0.164732) * (-0.085459) = -0.003419
d2(2) = x2(2)(1 - x2(2))w3(2,1)d3(1)
= 0.471226 * (1 - 0.471226) * (0.752621) * (-0.085459) = -0.016026
That's all the deltas calculated. Now to actually alter the weights. Remember that the learning coefficient h is chosen by the user; I have picked 0.5 to work with. Each weight change is h times the input on that connection times the delta of the receiving neuron, so any weight fed by an input of 0 does not change here. As an example, here are the changes for the weights of the first hidden neuron; only its bias weight changes:
dw2(0,1) = h*x1(0)*d2(1) = 0.5 * 1 * -0.003419 = -0.0017095
dw2(1,1) = h*x1(1)*d2(1) = 0
dw2(2,1) = h*x1(2)*d2(1) = 0
The other weights are changed in the same way: the second hidden neuron's bias weight and all three output-layer weights also change, since their inputs are non-zero. You would add these changes to their respective weights, then run the entire process again on the next set of training data. Slowly, as the training data is fed in and the network is retrained a few thousand times, the network could balance out to values such as these:
With these weights, you would get the following results for XOR:
0 XOR 0 = 0.017622
0 XOR 1 = 0.981504
1 XOR 0 = 0.981491
1 XOR 1 = 0.022782
With a small amount of rounding, this is the correct truth table.
Now, for a brief look at the C++ class. It is very simple: there are only two functions you really care about, Train() and Run(). Train() takes three floating-point values, the two inputs and the expected output, and returns the output of the network. Run() takes only the two inputs and returns the output. Therefore, to apply the network to the above example, your main() should look like this (the header name in the #include line is a placeholder):

```cpp
#include <iostream>
#include "bpnet.h"  // assumed header name for CBPNet and BPM_ITER; use the one in the downloaded code

using namespace std;

int main() {
    CBPNet bp;

    // Train on all four XOR patterns, BPM_ITER times over.
    for (int i = 0; i < BPM_ITER; i++) {
        bp.Train(0, 0, 0);
        bp.Train(0, 1, 1);
        bp.Train(1, 0, 1);
        bp.Train(1, 1, 0);
    }

    // Show what the trained network outputs for each input pair.
    cout << "0,0 = " << bp.Run(0, 0) << endl;
    cout << "0,1 = " << bp.Run(0, 1) << endl;
    cout << "1,0 = " << bp.Run(1, 0) << endl;
    cout << "1,1 = " << bp.Run(1, 1) << endl;
    return 0;
}
```
BPM_ITER is defined as the number of iterations the network is to run for. Here is some sample output from the program:
```
C:\Program Files\DevStudio\MyProjects\BPNet\Release>bpnet.exe
0,0 = 0.0494681
0,1 = 0.955633
1,0 = 0.942529
1,1 = 0.0433488
```

To look into the class code, please see the CBPNet essay. You can download the code from here.