54 Comments
Hi Team,
In the LeNet architecture, the maps column has the values 6, 6, 16, 16, and so on. By "map", what I understood from the tutorial is the number of filters, but nowhere did we define what kind of filters they are. For instance, in the example of the china.jpg image, we took horizontal and vertical filters. Can you please clarify what kind of filters these are, and is there any calculation for choosing the numbers?
And at the output, the size is 10 and the rest of the parameters become null. So is this a normal ANN, where the previous layer's neurons are connected to the 10 output-layer neurons? And why 10? Is it because we classify 10 different classes of images here? In the ANN lectures we saw that the number of neurons in the output layer equals the number of classes.
Please clarify my above doubts.
Regards,
Birendra Singh
Hi,
It is the number of filters. They act as feature extractors, detecting edges, colors, shades, etc. in the earlier layers of the network, and the deeper filters act as feature extractors on top of the basic features extracted in the earlier layers. So training means modifying the values of the filters such that the final result is close to the actual label.
In the final layer, we have dense layers to output the probability of the image belonging to each class.
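As a rough sketch of the "feature extractor" idea (using NumPy and a hand-crafted vertical-edge filter purely for illustration; in a real CNN these filter values are learned during training, not written by hand):

```python
import numpy as np

# Toy 6x6 grayscale "image" with a bright vertical stripe at column 3
image = np.zeros((6, 6))
image[:, 3] = 1.0

# Hand-crafted 3x3 vertical-edge filter (a trained CNN learns such values)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

# Valid cross-correlation with stride 1 -> a 4x4 feature map
h = image.shape[0] - kernel.shape[0] + 1
w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map)  # strong positive/negative responses around column 3
```

The feature map lights up only where the filter's pattern (a left-to-right intensity change) occurs, which is what "detecting a feature" means here.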
Thanks.
Hi Team,
The number of output feature maps is equal to the number of filters, right? So when you say a 5x5 kernel outputting 200 feature maps of size 150x100, it means that from the 150x100 image we are getting 200 different feature maps by applying 200 different filters of size 5x5, right?
And does the value of the bias depend upon the number of strides?
Regards,
Birendra Singh
Hi,
Just like in an ANN, the bias and filter values are parameters that we optimize through training; the bias does not depend on the stride.
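To make the earlier numbers concrete, here is a small sketch of the parameter count for 200 filters of size 5x5 (assuming a single input channel and the usual convention of one bias per filter; note the stride does not appear anywhere in the count):

```python
# Parameter count for a conv layer: one (k x k x in_channels) kernel
# plus one bias per filter; here 200 filters of 5x5 on a 1-channel input
k, in_channels, n_filters = 5, 1, 200
weights_per_filter = k * k * in_channels             # 25 weights per filter
total_params = (weights_per_filter + 1) * n_filters  # +1 for each filter's bias

print(total_params)  # 5200
```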
Thanks.
Hi,
Pooling confers stability to deformation at initialization, but the stability changes significantly over the course of training and converges to a similar level regardless of whether pooling is used. On slide 95, in the case of AlexNet, pooling is used after C1 and C3 but not after the remaining convolutional layers, presumably because of my first statement.
My question is how would I come to know that these many pooling layers are required to get deformation stability?
Thanks!
Hi,
Unfortunately, there are no hard and fast rules for choosing the number of layers. There is no one-size-fits-all when it comes to ML/DL. It would vary with each dataset/problem at hand.
Thanks.
Hi,
On slide no. 41, the second statement says a CNN is capable of detecting multiple features anywhere in its inputs.
If I apply a rotation to an image, the position of a specific feature will change. A limitation of the feature maps output by convolutional layers is that they record the precise position of features in the input. This means that small movements of a feature in the input image will result in a different feature map. So one feature map will have different values before and after rotating the image. Then how can we say the CNN will detect the object correctly in both images?
I think there is a gap in my understanding. Kindly fill it.
Thanks!
Hi,
Great question!
You can find a detailed explanation at the below link:
https://stats.stackexchange.com/questions/239076/about-cnn-kernels-and-scale-rotation-invariance
Thanks.
In the lecture, we talk about a filter being defined as an array, but in the final classification we define the filter as conv1_fmaps = 32. How is this filter applied? Shouldn't this also be an array? Or does it mean that each individual pixel is multiplied by 32?
Hi,
Could you please refer me to the timestamp of that part of the video where we have a filter defined by the number 32?
Thanks.
Hi,
This is at 2:45:03, though it is also in the Jupyter notebook. Reproducing the screenshot below:
Hi,
These are the individual components. If you look just above that, we have height, width, channels, etc. These are also individual components. If you look below, you will see that we use them together to form the maps, strides, etc.
Thanks.
Sorry for asking again. In the previous example we had 2 filters, one vertical and one horizontal, which had 1s only in a single column and a single row respectively. In this example, what is the filter array generated?
Hi,
There are 2 convolutional layers here, conv1 and conv2. For each, we have defined an fmap, ksize, stride, and padding value. I would suggest you consult the slides to understand what each of these does. Also, a filter will not always be a simple horizontal or vertical filter; it was shown that way so that you can understand its function easily.
Thanks.
Sure, I get that. But fmap is a simple number here? Isn't fmap the filter here? The reason I am confused is that the filter is represented by a single number as opposed to an array. ksize, stride, and padding I totally understand, but my question is purely about the filter or fmap. Can a filter be a single number? Is my understanding correct?
Hi,
Here, filters is the dimensionality of the output space (i.e. the number of output filters in the convolution). It only accepts an integer input; the actual filter arrays are created and learned internally. Please go through the official documentation for more details:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D
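To see why the integer is enough, here is a NumPy sketch of the shapes involved (the `[kh, kw, in_ch, out_ch]` layout matches TensorFlow's kernel convention; the random values stand in for the learned weights):

```python
import numpy as np

# With filters=32 and kernel_size=3 on an RGB input, the framework allocates
# a weight tensor of shape (kernel_h, kernel_w, in_channels, out_channels)
kh, kw, in_ch, out_ch = 3, 3, 3, 32
kernels = np.random.randn(kh, kw, in_ch, out_ch)

# Each of the 32 slices kernels[:, :, :, i] is one learnable filter array
one_filter = kernels[:, :, :, 0]
print(kernels.shape)     # (3, 3, 3, 32)
print(one_filter.shape)  # (3, 3, 3)
```

So conv1_fmaps = 32 only tells the layer how many such arrays to create; the arrays themselves are initialized randomly and shaped from the kernel size and the input's channel count.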
Thanks.
Hello,
Can you please explain on what basis we need to decide the number of filters, kernel size, stride, number of convolution layers, pooling layer, etc?
Regards,
Sekar MP
Hi,
This should explain it in detail:
https://stackoverflow.com/questions/36243536/what-is-the-number-of-filter-in-cnn
Thanks.
Is it possible to do something like
model = keras.Sequential([conv2d,
where conv2d is a layer as created in this video?
Hi,
You could do that. This might help you: https://keras.io/api/layers/convolution_layers/convolution2d/
Thanks.
The architecture of ResNet is not in the slides?
Hi,
You can refer to the ResNet-50 architecture here: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
All the best!
According to convolutional_neural_networks.ipynb, there are 2 ways to make predictions, am I right?
1. Chain the layers and take the output of the softmax or dense layer.
2. Use a model: compile it and predict from the model.
Hi,
Could you please tell me which part of the notebook you are referring to in each pointer?
Thanks.
section 3
and section 4
Hi,
Could you please name the sections, because the RNN notebook does not have any section numbers.
Thanks.
It is the CNN notebook I am referring to.
Hi,
Apologies for the typo. I meant the convolutional_neural_networks.ipynb notebook in our repository. If you look at that, you will find that there are no section numbers. So it would help if you could mention the section names.
Thanks.
https://jupyter.e.cloudxlab.com/user/manjarisingh8687/notebooks/ml/deep_learning/convolutional_neural_networks.ipynb
Hi,
Please refer to the below link and let me know the sections you are referring to:
https://github.com/cloudxlab/ml/blob/master/deep_learning/convolutional_neural_networks.ipynb
This is our GitHub repository; if you refer to this notebook you will notice that there are no section numbers. So it would be very helpful if you would please tell me the names of the sections. Also, as mentioned in my previous comments, I would suggest you go through the course materials first. Without going through them, you will not be able to build clear concepts in Machine Learning or Deep Learning.
Thanks.
What does the line
image = china[150:220,130:250]
do?
I did the same:
import numpy
arr = numpy.array([[1,2,3],[3,4,5]])
sarr = arr[0:1,1:2]
print(arr)
print(sarr)
and it gave the output:
[[1 2 3] [3 4 5]]
[[2]]
Hi,
This takes a slice of the china image: it keeps rows 150 to 219 and columns 130 to 249, cropping out that rectangular region.
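To make the slicing concrete, here is a minimal NumPy sketch (a zero-filled dummy array stands in for the actual china image; only the shapes matter here). The end index of a slice is exclusive, so `150:220` keeps 70 rows and `130:250` keeps 120 columns:

```python
import numpy as np

# Dummy stand-in for the china image (height x width)
china = np.zeros((427, 640))

# Rows 150..219 and columns 130..249 (end index is exclusive)
image = china[150:220, 130:250]

print(image.shape)  # (70, 120)
```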
Thanks.
Are there tools/libraries in the market which do all of this, such as OpenCV? This seems like too much manual work.
Hi,
Here we have explained the concepts. Without understanding the concepts, you would not be able to understand which library to use and where.
Thanks.
Are all the types of CNN layers, GoogLeNet, etc. most relevant only for image classification?
Hi,
CNN has applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.
Thanks.
Hi Manjari,
I checked from my end that you have not gone through most of the lecture videos in both Machine Learning and Deep Learning. I would suggest you start from topic #1, and move onwards only after you have gone through each and every assessment and lecture video in that topic. Without going through these topics, it would be next to impossible for you to put together the pieces of this course.
Thanks.
How do we define the filter, e.g.
[ : , 3, : , 0 ] = 1 # vertical line
Now, what are the colons defining and what is the 3 defining? Is it that the width is 3 and the height is all of it? And are the entries at this position set equal to 1? I have a little doubt about this.
Hi,
I would request you to go over the lecture video once again to understand the concept of filters. It has been explained in detail there.
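In the meantime, here is a hedged NumPy sketch of what such an assignment does (assuming the `[height, width, channels, n_filters]` shape convention used in the lecture code; the 7x7x1x2 shape is illustrative). A `:` means "every index along this axis", and the 3 selects a single column, so `filters[:, 3, :, 0] = 1` writes a vertical line of 1s into filter 0:

```python
import numpy as np

# Filter bank: 7x7 filters, 1 channel, 2 filters
# -> shape (height, width, channels, n_filters)
filters = np.zeros((7, 7, 1, 2))

# ':' selects all rows and all channels; 3 selects column 3; 0 selects filter 0
filters[:, 3, :, 0] = 1  # vertical line in filter 0
filters[3, :, :, 1] = 1  # horizontal line in filter 1

print(filters[:, :, 0, 0])  # 7x7 grid with a column of 1s at column 3
```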
Thanks.
Is the stride 4 numbers, e.g. (1,1,2,3), or 1 number, e.g. just 2?
Hi,
Stride is the number of pixels the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at a time, and so on. The stride controls how the filter convolves around the input volume.
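The effect of the stride on the output size can be sketched with the standard formula for a 'valid' (no padding) convolution (this is the general formula, not code from the lecture):

```python
# Output size of a 'valid' convolution: out = (in - k) // s + 1
def conv_out_size(in_size, k, s):
    return (in_size - k) // s + 1

print(conv_out_size(28, 3, 1))  # 26 -> stride 1 shifts one pixel at a time
print(conv_out_size(28, 3, 2))  # 13 -> stride 2 roughly halves the resolution
```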
Thanks.
How should one choose stride = 1, 2, or 3? Aren't we missing data from some neurons by just jumping over them?
Hi,
I would request you to go through the lecture to understand strides in detail.
Thanks.
Upvote ShareThis comment has been removed.
This comment has been removed.
This comment has been removed.
This comment has been removed.
Hi,
What do these 2 lines mean in the CNN code?
fmap[3, 3, 0, 2] = 1
plot_image(fmap[:, :, 0, 2])
Hi,
The first line sets that one element of the fmap array to 1, and the second plots the 2-D slice for filter 2 (channel 0) as an image. You can check after every such line by printing the fmap array in a new cell.
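A quick way to verify this yourself, as suggested above (the 7x7x1x3 shape is a hypothetical stand-in; the notebook's fmap array may be sized differently):

```python
import numpy as np

# Hypothetical stand-in for the fmap array: (height, width, channels, n_filters)
fmap = np.zeros((7, 7, 1, 3))

# Set a single element of filter 2 to 1, then view that filter as a 2-D image
fmap[3, 3, 0, 2] = 1
print(fmap[:, :, 0, 2])  # all zeros except a 1 at row 3, column 3
```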
Thanks.
Why is strides a four-dimensional tensor?
Hi,
The inputs are 4-dimensional, of the form [batch_size, image_rows, image_cols, number_of_channels], so one stride value is supplied per input dimension.
You can find more information in this thread:
https://stackoverflow.com/q...
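Concretely (a NumPy sketch of the layout; the `[1, sh, sw, 1]` pattern is the usual choice for `tf.nn.conv2d`-style strides, so that no examples in the batch and no channels are skipped):

```python
import numpy as np

# A batch of 32 RGB images of 28x28: [batch_size, rows, cols, channels]
batch = np.zeros((32, 28, 28, 3))
print(batch.shape)  # (32, 28, 28, 3)

# One stride per input dimension; typically [1, sh, sw, 1]:
# never skip batch examples, step 2 pixels spatially, never skip channels
strides = [1, 2, 2, 1]
```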
Thanks.
-- Rajtilak Bhattacharjee
Hello
Upvote ShareIf CNN is trained on MNIST data with input shape 128*128, then if a new data of size say 130*130 is provided to the trained CNN for prediction then will the trained CNN be able to recognise the changed input shape and do the prediction? I ask this because the trained network connection weights are according to 128*128 shape.
Hi Alok,
You will need to resize the image to 128x128. For this you can use the Pillow library.
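A minimal sketch of that resize with Pillow (the image here is built in memory for illustration rather than loaded from a file; note that `Image.resize` takes a (width, height) tuple):

```python
from PIL import Image
import numpy as np

# Hypothetical 130x130 grayscale input, built in memory for illustration
img = Image.fromarray(np.zeros((130, 130), dtype=np.uint8))

# Resize to the 128x128 shape the network was trained on
resized = img.resize((128, 128))
print(resized.size)  # (128, 128)
```

The resized image (or a NumPy array made from it) can then be fed to the trained network, whose weights expect the 128x128 shape.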
Thanks