Implementing a Perceptron
In this part we will implement a single perceptron and run some experiments.
The first part can be found here.
A single Perceptron
A perceptron can be trained to classify data according to its training data, but a single perceptron can only classify linearly separable data.
What is linear separability?
Data is linearly separable if, when plotted, it can be split by a single straight line. In other words, for the given separation criterion there exists a straight line dividing the data points into 2 regions.
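As a small illustration of a separation criterion, the sketch below labels a point by which side of a line y = m*x + c it falls on. The helper name below_line is hypothetical and is not part of perceptron.h; the values m = 2 and c = 1 used in the usage note are taken from the test line printed by the demo later in this post.

```c
/* Hypothetical helper (not part of perceptron.h): returns 1 when the
   point (x, y) lies strictly below the line y = m*x + c, else 0. */
int below_line(double x, double y, double m, double c)
{
    return y < m * x + c;
}
```

For example, below_line(20.0, 2.0, 2.0, 1.0) returns 1, because 2 < 2*20 + 1. A data set labelled this way is linearly separable by construction, since the line itself is the separator.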
Now let's walk through a small implementation step by step. NOTE: We need a matrix manipulation library for all our neural network adventures. My tmatlib on GitHub is quite suitable for this purpose, though one should not expect it to perform like BLAS. In my code I used a similar library known as smatlib. As it is still growing, I have not published it on GitHub. But rest assured that the functionality and performance of the two are the same.
// structure of our perceptron.
typedef struct perceptron
{
    int n_inp;   // number of inputs
    matrix W;    // weight matrix
    double lr;   // learning rate
    double bias; // bias
}perceptron;
A straight line on a 2-dimensional plane can be expressed as y=mx+c. The bias parameter in our perceptron is related to c. Our training data is x and y, where x is the input and y is the expected value, or target. Our perceptron learns the value of m. The learning rate (lr) determines how fast or slow the perceptron learns. If the learning rate is too low, learning is slow and the perceptron may get trapped in a local minimum. If it is too high, it may oscillate and never reach the solution. The bottom line is that one has to experiment with the learning rate (lr) to get a good result. The weight matrix is initialized with random values and is gradually adjusted during training.
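To make the link between the weights, the bias and y=mx+c concrete, here is a short derivation. It assumes a 2-input perceptron whose first input is x and whose second input is y, and it takes the decision boundary to be where the sigmoid output is 0.5, that is, where the weighted sum is zero. The symbols w_1, w_2 and b are notation introduced here for the two weights and the bias.

```latex
\begin{align*}
w_1 x + w_2 y + b &= 0 \\
y &= -\frac{w_1}{w_2}\,x - \frac{b}{w_2} \qquad (w_2 \neq 0) \\
\text{so}\quad m &= -\frac{w_1}{w_2}, \qquad c = -\frac{b}{w_2}
\end{align*}
```

Under that input-order assumption, the trained weights printed in the Testing section (about 61.70 and -31.13) give a slope of roughly 1.98, close to the 2 in the demo's test line.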
I uploaded all the code to Simple-NeuralNetwork.
In the repo there are 4 files: perceptron.h, linearsep.c, and_gate.c and or_gate.c. We will discuss bits from those files and see how they work.
In perceptron.h there are functions to create, manipulate, train, display and delete the perceptron. The general sequence is:
- create a perceptron
- prepare data set
- train
- test with predict function
Simple, isn't it? Now we will look at 2 functions, predict and train. These functions are the core of this perceptron.
predict:
double predict(perceptron p, double inp[])
{
    double res = 0.0;
    for(int r = 0; r < p.n_inp; r++)
    {
        res += inp[r] * get_cell(p.W, 0, r);
    }
    res += p.bias;
    double ret = sigmoid(res);
    return(ret);
}
This function multiplies each input by its corresponding weight, sums the products and adds the bias. The sum is then passed through the sigmoid function to produce the output, which ranges from 0 to 1. Please see the first part here, where the summation and the sigmoid function are defined.
Train:
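void train(perceptron *p, double *data, double desired)
{
    double guess = predict(*p, data);  // current output
    double err = desired - guess;      // error = target - output
    for(int n = 0; n < p->n_inp; n++)
    {
        double cur_w = get_cell(p->W, 0, n);
        // new weight = old weight + learning rate * error * input
        set_cell(&p->W, 0, n, cur_w + p->lr * err * data[n]);
    }
    p->bias += p->lr * err;            // bias update
}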
In this function, the perceptron is trained by adjusting its weights and bias. The steps are as follows.
- calculate output
- find error by subtracting output from target
- find delta weight by multiplying error, data(input) and learning rate
- find delta bias by multiplying error and learning rate
- update weights
- update bias
In step 3 we calculate the delta weights. This is the most important step in the whole code. How and why the delta is the product of the error and the input is a topic in itself. As a hint: it is a gradient, so the derivative of the error term with respect to the weights has to be considered.
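One way to follow that hint is sketched below. It assumes the quantity being minimized is the cross-entropy loss, which the post does not state; under that assumption the gradient reduces exactly to error times input. Here y is the target, \hat{y} the sigmoid output, z the weighted sum and \eta the learning rate (lr); all symbols are notation introduced for this sketch.

```latex
\begin{align*}
z &= \sum_i w_i x_i + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \\
L &= -\bigl[\, y \ln \hat{y} + (1 - y) \ln (1 - \hat{y}) \,\bigr] \\
\frac{\partial L}{\partial \hat{y}} &= \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})}, \qquad
\frac{\partial \hat{y}}{\partial z} = \hat{y}\,(1 - \hat{y}), \qquad
\frac{\partial z}{\partial w_i} = x_i \\
\frac{\partial L}{\partial w_i} &= (\hat{y} - y)\, x_i, \qquad
\frac{\partial L}{\partial b} = \hat{y} - y \\
w_i &\leftarrow w_i - \eta \frac{\partial L}{\partial w_i} = w_i + \eta\,(y - \hat{y})\, x_i, \qquad
b \leftarrow b + \eta\,(y - \hat{y})
\end{align*}
```

The last line has the same form as the updates in train, with err = desired - guess. With a squared-error loss instead, an extra factor \hat{y}(1 - \hat{y}) would appear in the delta.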
Testing
To try out this implementation, 3 code files are provided, each of which follows the general sequence. The first one is linear_sep.c. In this demo program, random data sets are created and the perceptron is trained; it is then checked against a known data point. The output on my system is as below.
************Perceptron:**************
number of inputs = 2
Weights:
[ 1.00000000 ]
[ 1.00000000 ]
Bias = 1.000000
*************************************
************Perceptron:**************
number of inputs = 2
Weights:
[ 61.70302487 ]
[ -31.13024636 ]
Bias = 0.622900
*************************************
is (2.000000 < 2 x 20.000000 + 1) predicted = 1.000000
The other 2 demo programs ask: can this artificial neuron reliably mimic a digital gate? Well, that was the motive when the artificial neuron was proposed. It turns out that AND and OR gates can be simulated, but not the XOR gate. Why? From the diagrams below it is clear that the OR and AND functions are linearly separable, but the XOR function is not.
// OR gate implementation.
// x1 x2 y
// 0 0 0
// 0 1 1
// 1 0 1
// 1 1 1
//
// 0,1-------1,1
// | |
// \| |
// |\ |
// | \ |
// 0,0-\-----1,0
// \-separating line
// AND gate implementation.
// x1 x2 y
// 0 0 0
// 0 1 0
// 1 0 0
// 1 1 1
//
// 0,1---\----1,1
// | \ |
// | \ |
// | \|
// | |\-line separating
// 0,0-------1,0
// XOR function
// x1 x2 y
// 0 0 0
// 0 1 1
// 1 0 1
// 1 1 0
// /----------|----- 2 lines separates
// 0,1--/-----1,1 |
// | / | |
// | / | |
// |/ |/---|
// / /
// /| /|
// 0,0-------/-1,0
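The XOR case can also be checked algebraically. The sketch below assumes the output is read as 1 when the weighted sum w_1 x_1 + w_2 x_2 + b is positive and as 0 when it is negative (equivalent to thresholding the sigmoid at 0.5); w_1, w_2 and b are notation for the two weights and the bias.

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad b < 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0 \\[4pt]
\text{Adding the two middle rows:} \quad & w_1 + w_2 + 2b > 0
\;\Longrightarrow\; w_1 + w_2 + b > -b > 0
\end{align*}
```

This contradicts the last row, so no single choice of weights and bias satisfies all four rows of the XOR truth table.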
Output of OR gate
************Perceptron:**************
number of inputs = 2
Weights:
[ 0.00000000 ]
[ 0.00000000 ]
Bias = 1.000000
*************************************
************Perceptron:**************
number of inputs = 2
Weights:
[ 11.48282021 ]
[ 11.48242093 ]
Bias = -5.281243
*************************************
inputs 0, 0 predicted = 0.005060
inputs 1, 0 predicted = 0.997978
inputs 0, 1 predicted = 0.997977
inputs 1, 1 predicted = 1.000000
Output of AND gate
************Perceptron:**************
number of inputs = 2
Weights:
[ 0.00000000 ]
[ 0.00000000 ]
Bias = 1.000000
*************************************
************Perceptron:**************
number of inputs = 2
Weights:
[ 10.22930534 ]
[ 10.22993658 ]
Bias = -15.512550
*************************************
inputs 0, 0 predicted = 0.000000
inputs 1, 0 predicted = 0.005050
inputs 0, 1 predicted = 0.005053
inputs 1, 1 predicted = 0.992943
In the above results, if we read outputs of about 0.99 and above as 1 and outputs of about 0.005 and below as 0, our results match the truth tables.
As we saw above, XOR is not linearly separable, so we cannot simulate it with a single perceptron. We need more than one layer to simulate it; we will do that in a future post.
Till then, happy coding.