Backpropagation is one of the core algorithms in machine learning. It is mainly used for training neural networks, and at its heart it is simply gradient descent combined with the chain rule. What if we told you that understanding and implementing it is not that hard? Anyone who knows basic mathematics and the basics of Python can learn this in 2 hours. Let's get started.
Though there are many high-level overviews of the backpropagation and gradient descent algorithms, I have found that unless you implement them from scratch, you cannot really appreciate many of the ideas behind neural networks.
Getting Started:
- Clone this repository in your home directory: git clone https://github.com/cloudxlab/iitr-deep-learning-spl-tf2
- Open the 'backprop_from_scratch.ipynb' notebook in the folder
Writing Functions – Python
A function is basically a sequence of operations. It may take inputs via arguments, and it may give back a result by returning a value.
Let us define a function sq which returns the square of whatever we pass as an argument.
You can visualize a function as a box: input goes in, output comes out.
Definition
In [2]:
def sq(x):
    return x*x
In [3]:
sq(4)
Out[3]:
16
In [4]:
sq(10)
Out[4]:
100
A function can also take multiple arguments.
In [5]:
# function could also take multiple arguments
# Here a and b are the two arguments and the function is returning a single value.
def add(a, b):
    return a+b

add(10, 20)
Out[5]:
30
In [8]:
add(10.5, 120)
Out[8]:
130.5
Variable arguments to a function
In [9]:
# We can define functions with variable arguments
def sum(*args):
    s = 0
    for i in args:
        s += i
    return s
In [10]:
sum(1)
Out[10]:
1
In [11]:
sum(2, 3)
Out[11]:
5
In [12]:
sum(2, 3, 4)
Out[12]:
9
Calling a function with variable arguments
In [14]:
# If we have a usual function such as the following:
def add(a, b, c):
    return a+b+c

# we can call it using variable arguments by putting an asterisk
arr = range(3)
add(*arr)
Out[14]:
3
Passing a function as argument
In Python, you can pass a function to another function just like any other variable.
In [15]:
def double_it(x):
    return 2*x

def apply_transformation(f, arr):
    result = []
    for a in arr:
        v = f(a)
        result.append(v)
    return result

apply_transformation(double_it, [1,2,3])
Out[15]:
[2, 4, 6]
In [14]:
# Similarly, we can use another function - the one that we defined above.
apply_transformation(sq, [1,2,3])
Out[14]:
[1, 4, 9]
Differentiation
Differentiation is basically about finding the rate of change.
Rate of change of distance is velocity. Rate of change of velocity is acceleration.
If you plot the rate of change of y with respect to x for a straight line y = mx + c, you get a constant horizontal line.
If you plot the rate of change of y with respect to x for a parabola, you get a straight line.
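To see why, here is a quick worked check (not part of the original notebook), using the idea of nudging x by a tiny amount dx:

For the straight line y = mx + c:
    dy/dx = ((m*(x + dx) + c) - (m*x + c)) / dx = m, a constant.

For the parabola y = x^2:
    dy/dx = ((x + dx)^2 - x^2) / dx = 2x + dx, which approaches the straight line 2x as dx shrinks.

This is exactly the idea the numeric differentiation below exploits: pick a tiny dx and compute the ratio of the change in output to the change in input.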
Numeric Differentiation
In [16]:
# Slightly increase x and compute the result.
# Then compute the ratio of change in result with change in x
def diff(fn, x):
    delta = 0.000000000001
    y = fn(x)
    x1 = x + delta
    y1 = fn(x1)
    return (y1 - y)/(x1 - x)
In [17]:
diff(sq, 2)
Out[17]:
4.0
In [17]:
x = 2
2*x
Out[17]:
4
In [18]:
def cb(x):
    return x*x*x

cb(10)
Out[18]:
1000
In [19]:
diff(cb, 2)
Out[19]:
12.0
In [20]:
# If you know symbolic / algebraic differentiation,
# you would know that the derivative of x^3 is 3*x^2.
# The result that we computed above matches the algebraic differentiation
x = 2
3*x**2
Out[20]:
12
Partial Derivative
If a function takes multiple inputs, the rate of change of the output can be computed with respect to any one of the inputs.
In [21]:
# Here is a very simple function
def fn(x1, x2):
    return x1*5 + x2*x2

# Let's calculate the rate of change of the output of this function wrt x1
x1 = 10
x2 = 20
delta_x = 0.000000000001
delta_y = fn(x1+delta_x, x2) - fn(x1, x2)
df_dx1 = delta_y / delta_x
print(df_dx1)
5.002220859751105
In [22]:
# Now, let's calculate the rate of change of the output of this function wrt x2
delta_y = fn(x1, x2+delta_x) - fn(x1, x2)
df_dx2 = delta_y / delta_x
print(df_dx2)
39.960923459148034
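As a quick check (not part of the original notebook), the symbolic partial derivatives of fn(x1, x2) = 5*x1 + x2^2 are df/dx1 = 5 and df/dx2 = 2*x2, which is 40 at x2 = 20. The numeric results above (about 5.00 and 39.96) agree, up to the error introduced by the finite delta.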
In [23]:
### Now, let us generalize it a bit
import numpy as np

# The first argument is the function,
# the second argument is the idx of the input with respect to which we are differentiating.
# The third is a variable argument list - there can be many values. The function is called with these values.
def diffp(fn, idx, *args):
    delta = 0.000000000001
    # Call the underlying function with args as such
    y = fn(*args)
    # Now, increase only one input - at idx.
    args = list(args)
    args[idx] += delta
    # Now, calculate the output after changing one of the inputs a little bit.
    y1 = fn(*args)
    return (y1 - y)/delta
In [24]:
# Let us test it using a very simple function.
# This function can be differentiated with respect to m, c or x.
def line_y(m, c, x):
    return m*x + c

# Notice how we are passing the function as the first argument.
# The second argument is the index of the value with respect to which we are differentiating.
# The remaining values are the arguments to be passed while calling line_y.
diffp(line_y, 0, 1, 2, 3) # differentiate wrt m
Out[24]:
3.000266701747023
In [25]:
diffp(line_y, 1, 1, 2, 3) # differentiate wrt c
Out[25]:
1.000088900582341
In [26]:
diffp(line_y, 2, 1, 2, 3) # differentiate wrt x
Out[26]:
1.000088900582341
Chain Rule
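A brief note added here for clarity (not from the original notebook): the chain rule says that if y depends on t and t depends on x, then dy/dx = dy/dt * dt/dx. In the example below, t = sq(x) = x^2 and y = cb(t) = t^3, so dy/dt = 3*t^2, dt/dx = 2*x, and dy/dx = 3*t^2 * 2*x = 6*x^5. At x = 2 this is 6 * 32 = 192, which matches the numeric results that follow.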
In [76]:
# If we have a sequence of functions as mentioned above, we can represent it using another function:
def sq(x):
    return x*x

def cb(t):
    return t*t*t

def sqcb(x):
    t = sq(x)
    y = cb(t)
    return y
In [77]:
# Differentiating it will be really straightforward
diffp(sqcb, 0, 2)
Out[77]:
192.01706891180947
Third method: apply the chain rule by differentiating each function separately and multiplying the results.
In [28]:
x = 2
dt_dx = diffp(sq, 0, x)
In [29]:
t = sq(2)
dy_dt = diffp(cb, 0, t)
In [30]:
dt_dx * dy_dt
Out[30]:
192.03413934105515
Gradient descent
Gradient descent is what we use when we have a function and want to find the point at which it attains its minimum (or maximum). See more here: https://youtu.be/4Cu9fizAtAo
In [38]:
# Let us first generate some data
import numpy as np

# Generate 100 random numbers
X = 2*np.random.random((100, 1))
In [39]:
# Now, let us generate the values of y corresponding to a line with slope 3 and intercept 4
# plus some noise
y = 4 + 3 * X + .3*np.random.randn(100, 1)
In [40]:
# Let us take a look at the first three records
y[0:3]
Out[40]:
array([[7.46275157], [5.42272672], [5.26000136]])
In [41]:
X[0:3]
Out[41]:
array([[1.13629739], [0.52740146], [0.61857458]])
In [42]:
4+3*X[0:3]
Out[42]:
array([[7.40889218], [5.58220439], [5.85572374]])
In [43]:
# Let us plot it using matplotlib
from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
plt.scatter(X, y)
plt.show()
# horizontal axis is x and vertical axis is y
In [45]:
def error(m, c, X, y):
    ypred = m*X + c
    err = ypred - y
    return np.sum(err*err)
In [46]:
error(3, 4, np.array([1,2,3]), np.array([1,2,3]))
Out[46]:
200
In [47]:
error(3, 4, X, y)
Out[47]:
11.370852782527962
In [52]:
error(3.04, 4, X, y)
Out[52]:
11.164979946174462
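Before writing the full training loop, here is a minimal sketch of a single gradient-descent update. This cell is not in the original notebook; it assumes the error, diffp, X and y defined above, and the starting guesses 0.5 are arbitrary:

m, c = 0.5, 0.5                        # arbitrary starting guesses
learning_rate = 0.001

dE_dm = diffp(error, 0, m, c, X, y)    # numeric partial derivative of the error wrt m
dE_dc = diffp(error, 1, m, c, X, y)    # numeric partial derivative of the error wrt c

m = m - learning_rate * dE_dm          # step downhill along m
c = c - learning_rate * dE_dc          # step downhill along c
print(error(m, c, X, y))               # the error should be lower than before the step

Repeating this update many times is exactly what grad_fit below does.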
In [83]:
def grad_fit(X, y):
    m = np.random.random()
    c = np.random.random()
    learning_rate = 0.001
    for i in range(100):
        print("i: ", i, " Error: ", error(m, c, X, y))
        dE_dm = diffp(error, 0, m, c, X, y)
        dE_dc = diffp(error, 1, m, c, X, y)
        m = m - learning_rate * dE_dm
        c = c - learning_rate * dE_dc
    return (m, c)
In [84]:
m, c = grad_fit(X, y)
i: 0 Error: 4083.7285975679897 i: 1 Error: 1357.6252100867644 i: 2 Error: 459.3592139100733 i: 3 Error: 163.29051705436652 i: 4 Error: 65.5610011647214 i: 5 Error: 33.1328146842778 i: 6 Error: 22.218172770940345 i: 7 Error: 18.399259306669478 i: 8 Error: 16.927083201555945 i: 9 Error: 16.240253000396848 i: 10 Error: 15.823423526524639 i: 11 Error: 15.506227542324433 i: 12 Error: 15.232001423349436 i: 13 Error: 14.981432087611957 i: 14 Error: 14.747680617493739 i: 15 Error: 14.52798983150515 i: 16 Error: 14.321092393591112 i: 17 Error: 14.125943647997243 i: 18 Error: 13.941884312505449 i: 19 Error: 13.76823354607412 i: 20 Error: 13.604392639484571 i: 21 Error: 13.449850857643208 i: 22 Error: 13.304041069116783 i: 23 Error: 13.1664926916126 i: 24 Error: 13.036677643076112 i: 25 Error: 12.914182340103585 i: 26 Error: 12.798551326229774 i: 27 Error: 12.689583643386808 i: 28 Error: 12.58679533656726 i: 29 Error: 12.48975407746471 i: 30 Error: 12.398176189973645 i: 31 Error: 12.311842550845657 i: 32 Error: 12.230368725423476 i: 33 Error: 12.15349017582421 i: 34 Error: 12.080929504768589 i: 35 Error: 12.01246968129011 i: 36 Error: 11.947881746213836 i: 37 Error: 11.886969741939057 i: 38 Error: 11.829508805337317 i: 39 Error: 11.775232938950976 i: 40 Error: 11.7240189978844 i: 41 Error: 11.675756159108447 i: 42 Error: 11.630169982103672 i: 43 Error: 11.58715098844346 i: 44 Error: 11.546590599235017 i: 45 Error: 11.508352340820826 i: 46 Error: 11.472259383091766 i: 47 Error: 11.438172320591587 i: 48 Error: 11.406064361894988 i: 49 Error: 11.375765097590975 i: 50 Error: 11.347175072319152 i: 51 Error: 11.320198663860097 i: 52 Error: 11.294768656072876 i: 53 Error: 11.27073288512202 i: 54 Error: 11.248069676645077 i: 55 Error: 11.22667985210804 i: 56 Error: 11.206523779032805 i: 57 Error: 11.187457348111439 i: 58 Error: 11.169498175945936 i: 59 Error: 11.15254221747177 i: 60 Error: 11.13654887644347 i: 61 Error: 11.12145424762985 i: 62 Error: 11.10723362998187 i: 63 Error: 11.093814960235605 i: 64 Error: 11.081145535104787 i: 65 Error: 11.069214843723582 i: 66 Error: 11.057944186806685 i: 67 Error: 11.04730077050891 i: 68 Error: 11.037282428082056 i: 69 Error: 11.027789502107026 i: 70 Error: 11.018836113798875 i: 71 Error: 11.010404557679433 i: 72 Error: 11.00246237284482 i: 73 Error: 10.994951898575751 i: 74 Error: 10.987863959791374 i: 75 Error: 10.981173260576831 i: 76 Error: 10.97487583850806 i: 77 Error: 10.968924851820436 i: 78 Error: 10.963317694512957 i: 79 Error: 10.95803070674304 i: 80 Error: 10.953029845889434 i: 81 Error: 10.948319335119882 i: 82 Error: 10.943855597092096 i: 83 Error: 10.939657430608891 i: 84 Error: 10.935700404741592 i: 85 Error: 10.93196932790682 i: 86 Error: 10.928446914760144 i: 87 Error: 10.925122307189463 i: 88 Error: 10.921987557948412 i: 89 Error: 10.919030199853355 i: 90 Error: 10.916238322489436 i: 91 Error: 10.913601802488085 i: 92 Error: 10.911115528431818 i: 93 Error: 10.908765814951405 i: 94 Error: 10.906556316793806 i: 95 Error: 10.904466955489564 i: 96 Error: 10.902504471647127 i: 97 Error: 10.900643630382485 i: 98 Error: 10.898883927123046 i: 99 Error: 10.897232145447704
In [85]:
m, c
Out[85]:
(3.153937346115943, 3.856832667230562)
In [86]:
from matplotlib import pyplot as plt

x1 = 0
x2 = 2
y1 = m*x1 + c
y2 = m*x2 + c
ycap = m*X + c

plt.figure(figsize=(11,4))
# plt.scatter(X, ycap, color='r')
plt.plot([x1, x2], [y1, y2], color='r')
plt.scatter(X, y, color='b')
plt.show()
Backpropagation
Learn more about neural networks and backpropagation here: https://youtu.be/4Cu9fizAtAo?t=1652
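The tiny network used below (defined in the next cell) is just two linear neurons chained together: h1 = w11*x + w12 and o1 = w21*h1 + w22. Written out, the output is

o1 = w21*(w11*x + w12) + w22 = (w21*w11)*x + (w21*w12 + w22)

which is still a straight line in x, and that is why it can fit the same kind of synthetic linear data we generated for gradient descent.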
In [88]:
## Generate Data
import numpy as np

X = np.random.random((100, 1))
y = 4 + 3 * X + .3*np.random.randn(100, 1)
In [89]:
def neuron(w11, w12, x):
    # We will talk about Activation function later.
    r = w11*x + w12
    return r

def forward_nn(w11, w12, w21, w22, x):
    h1 = neuron(w11, w12, x)
    o1 = neuron(w21, w22, h1)
    return o1

def sq_error(o1, yy):
    e = o1 - yy
    return e*e

def error_nn(w11, w12, w21, w22, x, yy):
    o1 = forward_nn(w11, w12, w21, w22, x)
    e = o1 - yy
    return e*e
In [90]:
w11 = 0.5
w12 = 0.5
w21 = 0.5
w22 = 0.5
learning_rate = 0.001

for epoch in range(100):
    for i in range(len(X)):
        x = X[i]
        yy = y[i]
        dE_dw11 = diffp(error_nn, 0, w11, w12, w21, w22, x, yy)
        dE_dw12 = diffp(error_nn, 1, w11, w12, w21, w22, x, yy)
        dE_dw21 = diffp(error_nn, 2, w11, w12, w21, w22, x, yy)
        dE_dw22 = diffp(error_nn, 3, w11, w12, w21, w22, x, yy)
        w11 = w11 - learning_rate * dE_dw11
        w12 = w12 - learning_rate * dE_dw12
        w21 = w21 - learning_rate * dE_dw21
        w22 = w22 - learning_rate * dE_dw22
In [91]:
ycap = forward_nn(w11, w12, w21, w22, X)
In [92]:
from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
plt.plot(X, ycap, color='r')
plt.scatter(X, y, color='b')
plt.show()
Further Improvements – Backpropagation
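The improvement below avoids differentiating the whole network end to end. Instead it differentiates each neuron locally and combines the pieces with the chain rule, exactly as in the Chain Rule section above:

dE/dw21 = dE/do1 * do1/dw21
dE/dw22 = dE/do1 * do1/dw22
dE/dw11 = dE/do1 * do1/dh1 * dh1/dw11
dE/dw12 = dE/do1 * do1/dh1 * dh1/dw12

Reusing dE/do1 and do1/dh1 across weights is what backpropagation is about: gradients computed at the output are propagated backwards to the earlier layers.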
In [93]:
# Initialize
w11 = 0.5
w12 = 0.5
w21 = 0.5
w22 = 0.5

def neuron(w11, w12, x):
    #TODO: activation function
    return w11*x + w12

def sq_error(o1, yy):
    e = o1 - yy
    return e*e

# Learning Rate
eta = 0.01

for epoch in range(50):
    for i in range(len(X)):
        x = X[i]
        yy = y[i]

        # Forward Pass
        h1 = neuron(w11, w12, x)
        o1 = neuron(w21, w22, h1)

        dE_do1 = diffp(sq_error, 0, o1, yy)

        do1_dw21 = diffp(neuron, 0, w21, w22, h1)
        dE_w21 = dE_do1 * do1_dw21

        do1_dw22 = diffp(neuron, 1, w21, w22, h1)
        dE_w22 = dE_do1 * do1_dw22

        do1_dh = diffp(neuron, 2, w21, w22, h1)
        dE_dh = dE_do1 * do1_dh

        dh_dw11 = diffp(neuron, 0, w11, w12, x)
        dE_dw11 = dE_dh * dh_dw11

        dh_dw12 = diffp(neuron, 1, w11, w12, x)
        dE_dw12 = dE_do1 * do1_dh * dh_dw12

        w11 = w11 - eta * dE_dw11
        w12 = w12 - eta * dE_dw12
        w21 = w21 - eta * dE_w21
        w22 = w22 - eta * dE_w22
In [94]:
# ycap = forward_nn(w11, w12, w21, w22, X)
h1 = neuron(w11, w12, X)
ycap1 = neuron(w21, w22, h1)

from matplotlib import pyplot as plt

plt.figure(figsize=(11,4))
# plt.scatter(X, ycap, color='r')
plt.plot(X, ycap1, color='g')
plt.scatter(X, y, color='b')
plt.show()
Extensions
- In real networks there are many neurons, and the weights are updated in a single shot using NumPy arrays.
- There are many layers, and the code needs to be flexible enough to stitch any number of layers together.
- The neuron definition is more complex: it includes an activation function (see the sketch below).
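As a starting point for the last item, here is a minimal sketch of a neuron with a sigmoid activation. It is not from the original notebook, and the names sigmoid and neuron_with_activation are ours:

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron_with_activation(w, b, x):
    # weighted input followed by a non-linear activation
    return sigmoid(w*x + b)

print(neuron_with_activation(0.5, 0.5, 1.0))   # ~0.73

With an activation function in place, stacking neurons no longer collapses into a single straight line, which is what lets a network fit non-linear data.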
Let us know in the comments below if you found this helpful.
In order to claim the certificate from E&ICT Academy, IIT Roorkee, visit https://www.cloudxlab.com
Website ► https://www.cloudxlab.com
Facebook ► https://www.facebook.com/cloudxlab
Instagram ► https://www.instagram.com/cloudxlab
Twitter ► http://www.twitter.com/cloudxlab