# Coding Backpropagation and Gradient Descent From Scratch without using any libraries

Backpropagation is considered one of the core algorithms in Machine Learning. It is mainly used in training the neural network. And backpropagation is basically gradient descent. What if we tell you that understanding and implementing it is not that hard?  Anyone who knows basic Mathematics and has knowledge of the basics of Python Language can learn this in 2 hours. Let’s get started.

Though there are many high-level overviews of the backpropagation and gradient descent algorithms what I found is that unless one implements these from scratch, one is not able to understand many ideas behind neural networks.

Getting Started:

1. Clone this repository in your home directory
git clone https://github.com/cloudxlab/iitr-deep-learning-spl-tf2
2. Open ‘backprop_from_scratch.ipynb’ notebook in the folder

## Writing Functions – Python

A function is basically a sequence of operations. It may take inputs via arguments. And it may give the result by returning a value.

Let us define a function square which basically gives a square of whatever we pass as an argument.

You can visualize a function like this. There is input and there is output.

## Definition

In [2]:

```def sq(x):
return x*x
```

In [3]:

```sq(4)
```

Out[3]:

`16`

In [4]:

```sq(10)
```

Out[4]:

`100`

A function could also take multiple arguments

In [5]:

```# function could also take multiple arguments
# Here a and b are the two arguments and function is returning a single value.
return a+b

```

Out[5]:

`30`

In [8]:

```add(10.5, 120)
```

Out[8]:

`130.5`

## Variable arguments to a function

In [9]:

```# We can define functions with variable arguments
def sum(*args):
s = 0
for i in args:
s += i
return s
```

In [10]:

```sum(1)
```

Out[10]:

`1`

In [11]:

```sum(2, 3)
```

Out[11]:

`5`

In [12]:

```sum(2, 3, 4)
```

Out[12]:

`9`

## Calling a function with variable arguments

In [14]:

```# If we have a usual function such as the following:

return a+b+c

# we can call it using variable argument by putting an asterix
arr = range(3)
```

Out[14]:

`3`

## Passing a function as argument

In python, you can pass a function in another function just like another variable.In [15]:

```def double_it(x):
return 2*x

def apply_transformation(f, arr):
result = []
for a in arr:
v = f(a)
result.append(v)
return result
apply_transformation(double_it, [1,2,3])
```

Out[15]:

`[2, 4, 6]`

In [14]:

```# Similarly we can us another function - the one that we defined above.
apply_transformation(sq, [1,2,3])
```

Out[14]:

`[1, 4, 9]`

## Differentiation

Differentiation is basically about finding the rate of change.

Rate of change of distance is velocity. Rate of change of velocity is acceleration.

If you plot rate of change of y with respect to x for a straight line y = mx +c , you will get a constant horizontal line.

If you plot, rate of change of y with respect to x for a parabola, it gives a straight line.

## Numeric Differentiation

In [16]:

```# Slightly increase x and compute the result.
# Then compute the ratio of change in result with change in x

def diff(fn, x):
delta = 0.000000000001
y = fn(x)
x1 = x + delta
y1 = fn(x1)
return (y1 - y)/(x1 - x)
```

In [17]:

```diff(sq, 2)
```

Out[17]:

`4.0`

In [17]:

```x = 2
2*x
```

Out[17]:

`4`

In [18]:

```def cb(x):
return x*x*x
cb(10)
```

Out[18]:

`1000`

In [19]:

```diff(cb, 2)
```

Out[19]:

`12.0`

In [20]:

```# If you know symbolic / algebric differentiation,
# you would be knowing that the differentiation of an algebric function x^3 is 3*x^2.
# The result that we computed above is same as the algebric differentiation
x = 2
3*x**2
```

Out[20]:

`12`

## Partial Derivative

If there is function which is taking multiple inputs, the rate of change of output could be computed with respect of any of the inputs.In [21]:

```# Here is a very simple function
def fn(x1, x2):
return x1*5 + x2*x2

# Lets calculate the rate of change of output of this function wrt x1
x1 = 10
x2 = 20
delta_x = 0.000000000001

delta_y = fn(x1+delta_x, x2) - fn(x1, x2)
df_dx1 = delta_y / delta_x

print(df_dx1)
```
```5.002220859751105
```

In [22]:

```# Now, lets calculate the rate of change of output of this function wrt x2

delta_y = fn(x1, x2+delta_x) - fn(x1, x2)
df_dx2 = delta_y / delta_x
print(df_dx2)
```
```39.960923459148034
```

In [23]:

```### Now, let us generalize it a bit

import numpy as np

# The first argument is the function,
# second argument is the idx of the input with respect to which we are differentiating.
# Third arguments is a variable arguments - there can be many values. The function is called with these values.

def diffp(fn, idx, *args):
delta = 0.000000000001

# Call the underlying function with args as such
y = fn(*args)

# Now, increase only one input - at idx.
args = list(args)
args[idx] += delta

# Now, calulate the output after changing one of the inputs a little bit.
y1 = fn(*args)
return (y1 - y)/delta
```

In [24]:

```# Let us test it using a very simple function.
# This function can be differentiated with respect to m, c or x.
def line_y(m, c, x):
return m*x + c

# Notice how we are passing the function as the first argument.
# The second argument is the index of the value with respect to which we are differentiating
# Remaining values are the values of the arguments to be passed while calling line_y
diffp(line_y, 0, 1, 2, 3) # differentiate wrt m
```

Out[24]:

`3.000266701747023`

In [25]:

```diffp(line_y, 1, 1, 2, 3) # differentiate wrt c
```

Out[25]:

`1.000088900582341`

In [26]:

```diffp(line_y, 2, 1, 2, 3) # differentiate wrt x
```

Out[26]:

`1.000088900582341`

## Chain Rule

In [76]:

```# If we have a sequence of functions as mentioned above, we can represent it using another function:
def sq(x):
return x*x

def cb(t):
return t*t*t

def sqcb(x):
t = sq(x)
y = cb(t)
return y
```

In [77]:

```# Differentiating it will be really straightforward
diffp(sqcb, 0, 2)
```

Out[77]:

`192.01706891180947`

Third method:

In [28]:

```x = 2
dt_dx = diffp(sq, 0, x)
```

In [29]:

```t = sq(2)
dy_dt = diffp(cb, 0, t)
```

In [30]:

```dt_dx * dy_dt
```

Out[30]:

`192.03413934105515`

When we have a function and we want to find out at what point this function would attain a minima or maxima. See more here: https://youtu.be/4Cu9fizAtAoIn [38]:

```# Let us first generate some data

import numpy as np

# Generate 100 random numbers
X = 2*np.random.random((100, 1))
```

In [39]:

```# Now, let us generate the values of y corresponding to a line with slope 3 and intercept 4
# plus some noise
y = 4 + 3 * X + .3*np.random.randn(100, 1)
```

In [40]:

```# Let us take a look at first three records
y[0:3]
```

Out[40]:

```array([[7.46275157],
[5.42272672],
[5.26000136]])```

In [41]:

```X[0:3]
```

Out[41]:

```array([[1.13629739],
[0.52740146],
[0.61857458]])```

In [42]:

```4+3*X[0:3]
```

Out[42]:

```array([[7.40889218],
[5.58220439],
[5.85572374]])```

In [43]:

```# Let us plot it using matplotlib

from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
plt.scatter(X, y)
plt.show()

# horizontal axis is x and vertical axis is y
```

In [45]:

```def error(m, c, X, y):
ypred = m*X + c
err = ypred-y
return np.sum(err*err)
```

In [46]:

```error(3, 4, np.array([1,2,3]), np.array([1,2,3]))
```

Out[46]:

`200`

In [47]:

```error(3, 4, X, y)
```

Out[47]:

`11.370852782527962`

In [52]:

```error(3.04, 4, X, y)
```

Out[52]:

`11.164979946174462`

In [83]:

```def grad_fit(X, y):
m = np.random.random()
c = np.random.random()
learning_rate = 0.001
for i in range(100):
print("i: ", i, " Error: ", error(m, c, X, y))
dE_dm = diffp(error, 0, m, c, X, y)
dE_dc = diffp(error, 1, m, c, X, y)

m = m - learning_rate * dE_dm
c = c - learning_rate * dE_dc
return (m, c)
```

In [84]:

```m, c = grad_fit(X, y)
```
```i:  0  Error:  4083.7285975679897
i:  1  Error:  1357.6252100867644
i:  2  Error:  459.3592139100733
i:  3  Error:  163.29051705436652
i:  4  Error:  65.5610011647214
i:  5  Error:  33.1328146842778
i:  6  Error:  22.218172770940345
i:  7  Error:  18.399259306669478
i:  8  Error:  16.927083201555945
i:  9  Error:  16.240253000396848
i:  10  Error:  15.823423526524639
i:  11  Error:  15.506227542324433
i:  12  Error:  15.232001423349436
i:  13  Error:  14.981432087611957
i:  14  Error:  14.747680617493739
i:  15  Error:  14.52798983150515
i:  16  Error:  14.321092393591112
i:  17  Error:  14.125943647997243
i:  18  Error:  13.941884312505449
i:  19  Error:  13.76823354607412
i:  20  Error:  13.604392639484571
i:  21  Error:  13.449850857643208
i:  22  Error:  13.304041069116783
i:  23  Error:  13.1664926916126
i:  24  Error:  13.036677643076112
i:  25  Error:  12.914182340103585
i:  26  Error:  12.798551326229774
i:  27  Error:  12.689583643386808
i:  28  Error:  12.58679533656726
i:  29  Error:  12.48975407746471
i:  30  Error:  12.398176189973645
i:  31  Error:  12.311842550845657
i:  32  Error:  12.230368725423476
i:  33  Error:  12.15349017582421
i:  34  Error:  12.080929504768589
i:  35  Error:  12.01246968129011
i:  36  Error:  11.947881746213836
i:  37  Error:  11.886969741939057
i:  38  Error:  11.829508805337317
i:  39  Error:  11.775232938950976
i:  40  Error:  11.7240189978844
i:  41  Error:  11.675756159108447
i:  42  Error:  11.630169982103672
i:  43  Error:  11.58715098844346
i:  44  Error:  11.546590599235017
i:  45  Error:  11.508352340820826
i:  46  Error:  11.472259383091766
i:  47  Error:  11.438172320591587
i:  48  Error:  11.406064361894988
i:  49  Error:  11.375765097590975
i:  50  Error:  11.347175072319152
i:  51  Error:  11.320198663860097
i:  52  Error:  11.294768656072876
i:  53  Error:  11.27073288512202
i:  54  Error:  11.248069676645077
i:  55  Error:  11.22667985210804
i:  56  Error:  11.206523779032805
i:  57  Error:  11.187457348111439
i:  58  Error:  11.169498175945936
i:  59  Error:  11.15254221747177
i:  60  Error:  11.13654887644347
i:  61  Error:  11.12145424762985
i:  62  Error:  11.10723362998187
i:  63  Error:  11.093814960235605
i:  64  Error:  11.081145535104787
i:  65  Error:  11.069214843723582
i:  66  Error:  11.057944186806685
i:  67  Error:  11.04730077050891
i:  68  Error:  11.037282428082056
i:  69  Error:  11.027789502107026
i:  70  Error:  11.018836113798875
i:  71  Error:  11.010404557679433
i:  72  Error:  11.00246237284482
i:  73  Error:  10.994951898575751
i:  74  Error:  10.987863959791374
i:  75  Error:  10.981173260576831
i:  76  Error:  10.97487583850806
i:  77  Error:  10.968924851820436
i:  78  Error:  10.963317694512957
i:  79  Error:  10.95803070674304
i:  80  Error:  10.953029845889434
i:  81  Error:  10.948319335119882
i:  82  Error:  10.943855597092096
i:  83  Error:  10.939657430608891
i:  84  Error:  10.935700404741592
i:  85  Error:  10.93196932790682
i:  86  Error:  10.928446914760144
i:  87  Error:  10.925122307189463
i:  88  Error:  10.921987557948412
i:  89  Error:  10.919030199853355
i:  90  Error:  10.916238322489436
i:  91  Error:  10.913601802488085
i:  92  Error:  10.911115528431818
i:  93  Error:  10.908765814951405
i:  94  Error:  10.906556316793806
i:  95  Error:  10.904466955489564
i:  96  Error:  10.902504471647127
i:  97  Error:  10.900643630382485
i:  98  Error:  10.898883927123046
i:  99  Error:  10.897232145447704
```

In [85]:

```m, c
```

Out[85]:

`(3.153937346115943, 3.856832667230562)`

In [86]:

```from matplotlib import pyplot as plt

x1 = 0
x2 = 2

y1 = m*x1 + c
y2 = m*x2 + c

ycap = m*X + c
plt.figure(figsize=(11,4))
plt.subplot()

# plt.scatter(X, ycap, color='r')
plt.plot([x1, x2], [y1, y2], color = 'r')
plt.subplot()
plt.scatter(X, y, color='b')
plt.show()
```
```/usr/local/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:15: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
from ipykernel import kernelapp as app
```

## Backpropagation

In [88]:

```## Generate Data
import numpy as np

X = np.random.random((100, 1))
y = 4 + 3 * X + .3*np.random.randn(100, 1)
```

In [89]:

```def neuron(w11, w12, x):
# We will talk about Activation function later.
r = w11*x + w12
return r

def forward_nn(w11, w12, w21, w22, x):
h1 = neuron(w11, w12, x)
o1 = neuron(w21, w22, h1)
return o1

def sq_error(o1, yy):
e = o1 - yy
return e*e

def error_nn(w11, w12, w21, w22, x, yy):
o1 = forward_nn(w11, w12, w21, w22, x)
e = o1 - yy
return e*e
```

In [90]:

```w11 = 0.5
w12 = 0.5
w21 = 0.5
w22 = 0.5

learning_rate = 0.001

for epoch in range(100):
for i in range(len(X)):
x = X[i]
yy = y[i]

dE_dw11 = diffp(error_nn, 0, w11, w12, w21, w22, x, yy)
dE_dw12 = diffp(error_nn, 1, w11, w12, w21, w22, x, yy)
dE_dw21 = diffp(error_nn, 2, w11, w12, w21, w22, x, yy)
dE_dw22 = diffp(error_nn, 3, w11, w12, w21, w22, x, yy)

w11 = w11 - learning_rate * dE_dw11
w12 = w12 - learning_rate * dE_dw12

w21 = w21 - learning_rate * dE_dw21
w22 = w22 - learning_rate * dE_dw22
```

In [91]:

```ycap = forward_nn(w11, w12, w21, w22, X)
```

In [ ]:

In [92]:

```from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
plt.subplot()
plt.plot(X, ycap, color='r')
plt.subplot()
plt.scatter(X, y, color='b')
plt.show()
```
```/usr/local/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:6: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

```

## Further Improvements – Backpropagation

In [93]:

```# Initialize
w11 = 0.5
w12 = 0.5
w21 = 0.5
w22 = 0.5

def neuron(w11, w12, x):
#TODO: activation function
return w11*x + w12

def sq_error(o1, yy):
e = o1 - yy
return e*e

# Learning Rate
eta = 0.01

for epoch in range(50):
for i in range(len(X)):
x = X[i]
yy = y[i]

#Forward Pass
h1 = neuron(w11, w12, x)
o1 = neuron(w21, w22, h1)

dE_do1 = diffp(sq_error, 0, o1, yy)
do1_dw21 = diffp(neuron, 0, w21, w22, h1)
dE_w21 = dE_do1 * do1_dw21

do1_dw22 = diffp(neuron, 1, w21, w22, h1)
dE_w22 = dE_do1 * do1_dw22

do1_dh = diffp(neuron, 2, w21, w22, h1)
dE_dh = dE_do1 * do1_dh
dh_dw11 = diffp(neuron, 0, w11, w12, x)
dE_dw11 = dE_dh * dh_dw11

dh_dw12 = diffp(neuron, 1, w11, w12, x)
dE_dw12 = dE_do1 * do1_dh * dh_dw12

w11 = w11 - eta * dE_dw11
w12 = w12 - eta * dE_dw12

w21 = w21 - eta * dE_w21
w22 = w22 - eta * dE_w22
```

In [94]:

```# ycap = forward_nn(w11, w12, w21, w22, X)

h1 = neuron(w11, w12, X)
ycap1 = neuron(w21, w22, h1)

from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
# plt.subplot()
# plt.scatter(X, ycap, color='r')
plt.subplot()
plt.plot(X, ycap1, color='g')
plt.subplot()
plt.scatter(X, y, color='b')
plt.show()
```
```/usr/local/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:13: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
del sys.path[0]
```

## Extensions

1. Neurons are many. Weight are computed in a single shot using numpy.
2. The layers are many and the code needs to be flexible to stitch many layers flexible
3. Neuron definition is complex – activation function

Let us know in the comments below if you find it helpful.

In order to claim the certificate from E&ICT Academy, IIT Roorkee, visit https://www.cloudxlab.com

Website ► https://www.cloudxlab.com