Tuesday 2 May 2023

Neural Network from Scratch Using C (Part-3)

 

XOR Problem

In part 2, the AND and OR gates could be trained using a single perceptron, but the XOR gate could not, as it is not linearly separable. The XOR function can be represented as \begin{align} y &= x_{1}\bar{x_{2}}+\bar{x_{1}}x_{2} \\ z_{1} &= x_{1}\bar{x_{2}} \\ z_{2} &= \bar{x_{1}}x_{2} \\ y &= z_{1}+z_{2} \end{align} Looking at the above equations \begin{align} z_{1} &= x_{1} \land \bar{x_{2}} \\ z_{2} &= \bar{x_{1}} \land x_{2} \\ y &= z_{1} \lor z_{2} \end{align} y is split into two functions, z1 and z2, each of which is linearly separable, while y itself is not; hence XOR cannot be realized using a single neuron, and the solution is to use multiple neurons. The organization of multiple neurons stacked in different layers is known as a "Multi Layer Perceptron". In this part we will build a very basic multi layer perceptron to solve the XOR problem. This chapter involves differential calculus. The basic ideas will be mentioned for information, but no in-depth calculus will be used; it is assumed the reader is familiar with derivatives.
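To verify the decomposition, here is the XOR truth table along with the intermediate terms z1 and z2:

	x1  x2 |  z1 = x1 AND (NOT x2)   z2 = (NOT x1) AND x2 |  y = z1 OR z2
	 0   0 |            0                     0           |       0
	 0   1 |            0                     1           |       1
	 1   0 |            1                     0           |       1
	 1   1 |            0                     0           |       0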

Multi layer Perceptron

A multi layer perceptron (MLP) is a network of neurons arranged in more than one layer: input, hidden and output. Generally the input layer is a direct connecting layer, the hidden layer is where the real magic happens, and the output layer is the interface to the user. A hidden layer is called so because the user does not interact with it directly. An MLP may contain multiple hidden layers, but there is a practical limitation due to the "Vanishing Gradient Problem"; if interested, there is plenty of material on the internet. Here we consider an MLP with one input layer, one hidden layer and one output layer, where the network is fully connected, that is, each neuron of a layer is connected to each neuron of the immediately neighbouring layer. Please see the picture below.

One thing to observe here is the hidden layer. Each neuron in this layer has a connection from every neuron in the input layer, and in turn is connected to the output neuron. This is a simple fully connected neural network with 4 inputs, 5 hidden neurons and one output neuron. Networks with multiple outputs will be discussed and used in future parts.

The connections between layers are weights. These weights are generally represented using matrices. For convenience we will represent each weight as wij, where i is the ith neuron of the next layer and j is the jth neuron of the current layer. In the above case the weight matrix between the input and hidden layers is Wih[5,4], i.e. 5 rows and 4 columns, and the weight matrix between the hidden and output layers is Who[1,5]. The input, the hidden layer's output and the final output are each represented as a column matrix.
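As a point of reference, a minimal sketch of the data structures such an implementation might use is given below. The names (matrix, nnet, wih, who, lr) mirror the ones used by the code later in this article, but the exact definitions in mlp1.c may differ; treat this purely as an illustration of the shapes involved.

	// Hypothetical definitions; the real mlp1.c may store things differently.
	typedef struct {
		int rows;
		int cols;
		double *data;   // rows*cols values, element (r,c) at data[r*cols + c]
	} matrix;

	typedef struct {
		matrix wih;     // hidden x input weights, e.g. 5x4 in the figure above
		matrix who;     // output x hidden weights, e.g. 1x5 in the figure above
		double lr;      // learning rate (alpha)
	} nnet;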

Forward Propagation

Input data moves through the network and produces some output. This is called forward propagation. Previously, with a single neuron, the calculation was easy; with multiple layers it is a bit more involved and uses matrix multiplication. Consider the above image. The input arriving at the ith neuron of the hidden layer can be expressed as: \begin{equation} h_{i} = w_{i0}x_{0}+w_{i1}x_{1}+w_{i2}x_{2}+w_{i3}x_{3} \end{equation} or, in general terms, \begin{equation} h_{i} = \sum_{j=0}^{n-1}w_{ij}x_{j} \end{equation} where n is the number of inputs. If the set of equations for hi is observed closely, it is seen to be the result of a matrix multiplication. $$ \begin{bmatrix} h_{0} \\ h_{1} \\ h_{2} \\ h_{3} \\ h_{4} \end{bmatrix} = \begin{bmatrix} w_{00}^{ih}&w_{01}^{ih}&w_{02}^{ih}&w_{03}^{ih} \\ w_{10}^{ih}&w_{11}^{ih}&w_{12}^{ih}&w_{13}^{ih} \\ w_{20}^{ih}&w_{21}^{ih}&w_{22}^{ih}&w_{23}^{ih} \\ w_{30}^{ih}&w_{31}^{ih}&w_{32}^{ih}&w_{33}^{ih} \\ w_{40}^{ih}&w_{41}^{ih}&w_{42}^{ih}&w_{43}^{ih} \end{bmatrix} \times \begin{bmatrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} $$ $$H = W^{ih}.X$$

The activation function of the hidden layer is then applied to this resultant matrix to produce the hidden layer's output. The values thus generated are the input to the output layer (in our case a single output neuron). As with the previous layer, we do a matrix multiplication to find the input to the output layer and apply the output layer's activation function to get the network output. $$ \begin{bmatrix} O_{0} \end{bmatrix} = \begin{bmatrix} w_{0}^{ho}&w_{1}^{ho}&w_{2}^{ho}&w_{3}^{ho}&w_{4}^{ho} \end{bmatrix} \times \begin{bmatrix} h_{0} \\ h_{1} \\ h_{2} \\ h_{3} \\ h_{4} \end{bmatrix} $$ $$ O = W^{ho}.H $$
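Putting the two layers together, a forward pass (the predict step used later) might look roughly like the sketch below. It assumes helper functions named mat_mult, matrix_map and delete_matrix and a sigmoid activation, in line with the names used in the training code further down; the actual implementation in mlp1.c may differ.

	// Sketch of forward propagation, assuming mat_mult allocates and returns
	// a new matrix and matrix_map applies a function to every element in place.
	matrix predict(nnet nn, matrix inp)
	{
		matrix hidden = mat_mult(nn.wih, inp);   // H = W_ih . X
		matrix_map(&hidden, sigmoid);            // activated hidden output
		matrix out = mat_mult(nn.who, hidden);   // O = W_ho . H
		matrix_map(&out, sigmoid);               // activated network output
		delete_matrix(&hidden);                  // intermediate no longer needed
		return out;
	}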

Back Propagation

The output O depends on the weights Who and on H, and H in turn depends on the weights Wih. As the initial weights are chosen randomly, they are unlikely to be perfect, hence the output O differs from the expected or target value. The difference between target and output is known as the error. This error has to be compensated by adjusting the weights Who and Wih. The adjustment of weights starts from the last layer and works back towards the beginning, hence the name back propagation.

The error has to be distributed among the different layers, depending on how much each layer contributed to it. To find each element's contribution, a partial derivative of the error has to be taken with respect to that element. The full description and derivation is beyond the scope of this article, as it would lengthen it considerably. For reference one can watch this video.
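As a flavour of what that derivation looks like, assuming a squared-error loss and a sigmoid activation (one common choice; not derived here), the gradient of the error with respect to a hidden-to-output weight expands by the chain rule as $$ E = \tfrac{1}{2}(t-o)^{2}, \qquad \frac{\partial E}{\partial w_{j}^{ho}} = \frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial net_{o}}\cdot\frac{\partial net_{o}}{\partial w_{j}^{ho}} = -(t-o)\,\sigma'(net_{o})\,h_{j} $$ where \(net_{o}=\sum_{j} w_{j}^{ho}h_{j}\) is the input to the output neuron. Gradient descent moves each weight opposite to this gradient, which is why the update used below is proportional to the error, the activation derivative and the layer's input.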

We can represent the updated weight matrix of a layer as follows (an expanded form matching the implementation is given after the list) $$ W_{new}=W_{cur}+\alpha\,[\text{target}-\text{output}]\cdot X^{T}$$ where

  • X is the input to the layer
  • α is the learning rate
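Expanded to match what the implementation below actually computes, where the error is first scaled element-wise (Hadamard product) by the derivative of the activation function, the updates are $$ \Delta W^{ho} = \alpha\,\bigl[(T-O)\odot f'(O)\bigr]\cdot H^{T}, \qquad \Delta W^{ih} = \alpha\,\bigl[E_{h}\odot f'(H)\bigr]\cdot X^{T} $$ where \(E_{h} = (W^{ho})^{T}\cdot(T-O)\) is the error back-propagated to the hidden layer, and f' is the derivative of the activation function (in the code below, sigmoid_prime applied to the already-activated outputs).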

 

Implementation

To adjust the weights in Who, we first find the error at the output, then apply the derivative of the activation function to the output and multiply the error by it element-wise, and finally multiply by the scalar learning rate (α). That finishes the scalar calculations. Multiplying the result by the transpose of the input to the layer, that is the activated output of the hidden layer, gives us the delta weights. Adding these delta weights to the weight matrix Who gives the updated hidden-to-output weights.

The same procedure has to be repeated for the input-to-hidden layer. To begin with, we need the error at the hidden layer, which can be found by multiplying (Who)T with the error at the output. The rest of the procedure is the same as above.


	// find error in output
	matrix err_out=subtract_matrix(expect,final_out);
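	// propagate the output error back to the hidden layer: err_hid = (Who)^T . err_out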
	matrix whoT = transpose(nn.who);
	matrix err_hid=mat_mult(whoT,err_out);
	
	// calculate delta errors 
	matrix_map(&final_out,sigmoid_prime);
	mat_mult_hadamard(&err_out,&final_out);
	matrix hidT= transpose(hidden_out);
	matrix delta_who = mat_mult(err_out,hidT);
	mat_mult_scalar(nn.lr,&delta_who);
	
	// repeat the same steps for the input-to-hidden weights, using the hidden-layer error
	matrix_map(&hidden_out,sigmoid_prime);
	mat_mult_hadamard(&err_hid,&hidden_out);
	matrix inpT = transpose(inp);
	matrix delta_wih = mat_mult(err_hid,inpT);
	mat_mult_scalar(nn.lr,&delta_wih);
	// update weights
	matrix tempa =add_matrix(nn.who,delta_who);
	assign_matrix(&nn.who,&tempa);
	matrix tempb = add_matrix(nn.wih,delta_wih);
	assign_matrix(&nn.wih,&tempb);

With this we are ready to make a very small neural net and train it on the XOR problem. The MNIST dataset can also be trained, but that needs patience and time (I achieved 91.6% accuracy). The Iris dataset and the wheat seeds dataset are trained very easily. Below is the XOR problem implementation.


#include <stdio.h>
#include "../mlp1.c"
int main(int argc, char *argv[])
{
	srand(0);
	nnet my_nn =create_nn(2,3,1);

	// training
	
	for(int n = 0; n< 30000; n++)
	{
		matrix inp = create_matrix(2,1);
		matrix expected = create_matrix(1,1);
		
		set_cell(&inp,0,0,1.0);
		set_cell(&inp,1,0,0.0);
		set_cell(&expected,0,0,1);
		train(my_nn,inp,expected);
		
		set_cell(&inp,0,0,1.0);
		set_cell(&inp,1,0,1.0);
		set_cell(&expected,0,0,0);
		train(my_nn,inp,expected);
		
		set_cell(&inp,0,0,0.0);
		set_cell(&inp,1,0,1.0);
		set_cell(&expected,0,0,1);
		train(my_nn,inp,expected);
		
		set_cell(&inp,0,0,0.0);
		set_cell(&inp,1,0,0.0);
		set_cell(&expected,0,0,0);
		train(my_nn,inp,expected);
		
		delete_matrix(&expected);	
		delete_matrix(&inp);
	}
	// save the trained network
	save_nn(my_nn,"xor2.net");

	// load nn from file
	my_nn = load_nn("xor2.net");
	// test trained network
	matrix inp = create_matrix(2,1);
	set_cell(&inp,0,0,1.0);
	set_cell(&inp,1,0,1.0);
	matrix out = predict(my_nn,inp);
	display_matrix(out);
	delete_matrix(&out);
	set_cell(&inp,0,0,0.0);
	set_cell(&inp,1,0,0.0);
	out = predict(my_nn,inp);
	display_matrix(out);
	delete_matrix(&out);
	set_cell(&inp,0,0,0.0);
	set_cell(&inp,1,0,1.0);
	out=predict(my_nn,inp);
	display_matrix(out);
	delete_matrix(&out);
	set_cell(&inp,0,0,1.0);
	set_cell(&inp,1,0,0.0);
	out = predict(my_nn,inp);
	display_matrix(out);
	
	delete_matrix(&out);
	delete_nn(my_nn);
	delete_matrix(&inp);
	return 0;
}
=======output====
[ashok@fedora xor]$ ./xor
[ 0.00559855 ]

[ 0.06528170 ]

[ 0.96165163 ]

[ 0.96165793 ]

Though I may put all the code here or on GitHub, I would like the reader to implement their own version analogous to this one. If you face any problem, contact me for the full code; it's free. But before asking, please try to implement it yourself.

Till now we discussed

  • What is an artificial neuron
  • How to implement it
  • Where it can be used (linearly separable problems)
  • How to train it
  • What is XOR problem
  • What is Multi layer neural network
  • How to implement one
  • How to train it
  • Example of XOR implementation
This is the end of the three-part series on a simple neural network. Part 1 and Part 2 can be found here. In a future article we will discuss Convolutional Neural Networks (CNN).
 
