Hello
I did not understand w2, u, s, and v in slide 60. What are they? Can you explain?
Hi, You can refer to https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html for the details.
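For reference, a minimal sketch of how those names are typically used in the PCA-via-SVD demo; the toy data and the variable name W2 (the first two principal axes) are assumptions based on the notebook's convention:

import numpy as np

# Toy data standing in for the slide's dataset.
X = np.random.rand(100, 3)
X_centered = X - X.mean(axis=0)

# u, s, v: the SVD factors of the centered data matrix,
# X_centered = U @ diag(s) @ Vt, where s holds the singular values.
U, s, Vt = np.linalg.svd(X_centered)

# W2: the first two principal axes (columns of V), used to
# project the data onto the top two principal components.
W2 = Vt.T[:, :2]
X2D = X_centered @ W2
print(X2D.shape)  # (100, 2)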
Thank you very much for your constant help.
After PCA, can we find out which features have been considered? If yes, then how can we find out?
Hello,
Please clarify my doubts regarding manifold data.
1) During the initial data exploration of a given dataset, how would we be able to know whether the data lies on a manifold?
2) During initial data exploration we generally make a scatterplot of the data to visualize its spread and orientation. So if we also have to check whether the data is a manifold or not, are there any steps we need to include to arrive at this conclusion, so that we know exactly what we need to do with the data?
Thanks
Hi,
Manifold learning is a class of unsupervised estimators that seeks to describe datasets as low-dimensional manifolds embedded in high-dimensional spaces. So if you have a high-dimensional dataset and you want to reduce the number of dimensions, you can use the manifold learning classes. Now, to check the shape of the dataset, you can plot it using Matplotlib. The code can be found in the Jupyter notebook for this course:
ml/dimensionality_reduction.ipynb at master · cloudxlab/ml (github.com)
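For example, a hedged sketch of such a plot, using scikit-learn's Swiss roll (a classic 2D manifold embedded in 3D) as a stand-in for your own data:

import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll

# Swiss roll: a 2D sheet rolled up inside 3D space.
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, cmap=plt.cm.hot)
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("x3")
plt.show()

If the points trace out a curved sheet or ribbon like this, rather than filling the space, that is a visual hint the data lies on a lower-dimensional manifold.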
Thanks.
Slides for dimensionality reduction aren't accessible and are not updated with archives. Please do so.
Hi,
I have updated the link. Could you please check and let me know if it is working now?
Thanks.
I feel the archive page is incorrect at the end. Moreover, the video has 105 slides and this PPT has 103. Can you please recheck?
Hi,
I have fixed this too. Could you please recheck? As for a single place to download all slides, we do not have that facility as of now. So, we would request you to download the slides as and when required.
Thanks.
Thanks, but I still feel there's a problem in the slides. Pg 57 and 105 are the same, and there isn't an archives page at 105.
Hi,
Slide# 104 onwards are the archive slides. The presentation may have changed slightly as we keep updating our course materials.
Thanks.
Upvote ShareYes i understand the files keep changing.What I'm trying to say is that archives are supposed to be references but page 105 is basically page 57.Please recheck this.
Thanks
Upvote ShareThhis is slide number 105 where archives should be a URL or related documents ?
Hi,
You are correct! As of now we do not have any content for the archive section, so this is the complete set of slides.
Thanks.
Also, is there any place we can get all the slides in one place for convenience?
Is there any WhatsApp group for CloudxLab learners, as mentioned by Sandeep sir in this session?
Hi,
There is no WhatsApp group for learners as of now. However, you can post your queries here in your comments, in our discussion forum, or you can mail us at reachus@cloudxlab.com.
Thanks.
-- Rajtilak Bhattacharjee
Hi
If manifold techniques are better than PCA, then should we always prefer manifold techniques for dimensionality reduction?
Thanks
Hi,
Manifold learning is used in the case of non-linear data. So, it depends on the data you are working with.
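As an illustration, a minimal sketch contrasting linear PCA with a manifold method (LLE) on the non-linear Swiss roll; the dataset and hyperparameters here are assumptions chosen for demonstration:

from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# Linear projection: squashes the roll, mixing faraway points.
X_pca = PCA(n_components=2).fit_transform(X)

# Manifold method: preserves local neighborhoods and "unrolls" it.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10,
                             random_state=42)
X_lle = lle.fit_transform(X)
print(X_pca.shape, X_lle.shape)  # (1000, 2) (1000, 2)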
Thanks.
-- Rajtilak Bhattacharjee
Hi
In KPCA, how would we choose the gamma values in the options for selecting the best gamma in the code below:
param_grid = [{
"kpca__gamma": np.linspace(0.03, 0.05, 10),
"kpca__kernel": ["rbf", "sigmoid"]}]
Hi,
One of the most common methods of model selection (in this case, of the parameter gamma) is cross-validation. The idea is to hold out a subset of your data that you will not use for training your algorithm, and then compare the cost functions associated with the two sets (training and CV) in order to find the "sweet spot" between high variance and high bias. You can find more details here:
https://stats.stackexchange...
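For instance, a minimal sketch of cross-validated selection of gamma with GridSearchCV over that same param_grid; the dataset (make_moons) and the downstream classifier (LogisticRegression) are assumptions for illustration:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy non-linear dataset standing in for your own labeled data.
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]

# 3-fold cross-validation scores each (gamma, kernel) combination
# on held-out folds and keeps the best one.
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)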
Thanks.
-- Rajtilak Bhattacharjee
Hi
Can memory mapping be used in any kind of PCA implementation?
Hi,
Memory mapping and PCA are not related, but can be used in conjunction.
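For example, a hedged sketch of using a NumPy memmap together with IncrementalPCA, so the full dataset never has to fit in RAM; the file name, shapes, and batch size are made up for the demo:

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_samples, n_features = 10_000, 784

# Create a demo file on disk; in practice it would already exist.
np.random.rand(n_samples, n_features).astype(np.float32).tofile("data.bin")

# Memory-map the file: pages are loaded from disk only on demand.
X_mm = np.memmap("data.bin", dtype=np.float32, mode="r",
                 shape=(n_samples, n_features))

# IncrementalPCA reads the memmapped array in mini-batches
# (batch_size must be >= n_components).
ipca = IncrementalPCA(n_components=154, batch_size=n_samples // 20)
ipca.fit(X_mm)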
Thanks.
-- Rajtilak Bhattacharjee
Hi
In randomized PCA, there would be a possibility of leaving some data behind when choosing random samples. So, as compared to randomized PCA, would it be better to choose incremental PCA over batch PCA?
Hi,
Incremental principal component analysis (IPCA) is typically used as a replacement for principal component analysis (PCA) when the dataset to be decomposed is too large to fit in memory. IPCA builds a low-rank approximation for the input data using an amount of memory which is independent of the number of input data samples. It is still dependent on the input data features, but changing the batch size allows for control of memory usage. You can find more information here:
https://scikit-learn.org/st...
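For example, a minimal sketch of IncrementalPCA consuming mini-batches via partial_fit; the array sizes and batch count are assumptions for illustration:

import numpy as np
from sklearn.decomposition import IncrementalPCA

X = np.random.rand(5_000, 784)  # stand-in for a large dataset

n_batches = 20  # each batch must hold at least n_components samples
inc_pca = IncrementalPCA(n_components=154)
for X_batch in np.array_split(X, n_batches):
    inc_pca.partial_fit(X_batch)  # one mini-batch at a time

X_reduced = inc_pca.transform(X)
print(X_reduced.shape)  # (5000, 154)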
Thanks.
-- Rajtilak Bhattacharjee
Hi
Can we have some mock interviews?
Thanks
Hi,
Thank you for your suggestion. We will look into this and will get back to you.
Thanks.
-- Rajtilak Bhattacharjee
Thanks for considering. It would be really great if this could be arranged. Looking forward to it.
Thanks
Hi Prachi/CloudxLab team, I am also looking forward to a mock interview session.
It will be of great help.
Best Wishes! Sasmita
Hi Sasmita,
As of now we do not have any provision for mock interview sessions.
Thanks.
In slide no. 61 of Dimensionality Reduction, please explain: in "d = np.argmax(cumsum >= 0.95) + 1", what is the significance of the "+1"?
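A quick illustration of that "+1", assuming cumsum holds the cumulative explained-variance ratio as in the slide: np.argmax on a boolean array returns the 0-based index of the first True, and the "+1" turns that index into a count of components.

import numpy as np

# cumsum: cumulative explained-variance ratio per component.
cumsum = np.array([0.60, 0.80, 0.90, 0.96, 1.00])

# (cumsum >= 0.95) -> [False, False, False, True, True]
# np.argmax returns the index of the first True: 3 (0-based).
d = np.argmax(cumsum >= 0.95) + 1
print(d)  # 4: the first 4 components explain >= 95% of the variance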
I can't download the slides for XGBoost and Naive Bayes.
When we used the MNIST dataset to form an image we used 28*28, but then we dimensionally reduced the number of dimensions to 157. How will we form a 28*28 image from the 784 features?
Hi, Vinod.
Good question.
That is the concept of PCA (Principal Component Analysis) and dimensionality reduction:
using a minimum number of components, you are still able to retrieve most of the information.
All the best
We can recover the data by using, say, the inverse transform of PCA, if we have used PCA for dimensionality reduction. This will give us back all 784 features, but we will lose some information in the recovery process, termed the reconstruction error.
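For instance, a minimal sketch of this round trip, with random data standing in for MNIST:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1_000, 784)  # stand-in for flattened 28x28 images

pca = PCA(n_components=157)
X_reduced = pca.fit_transform(X)                # (1000, 157)
X_recovered = pca.inverse_transform(X_reduced)  # back to (1000, 784)

# Reconstruction error: the information lost by dropping components.
error = np.mean(np.sum((X - X_recovered) ** 2, axis=1))
print(X_recovered.shape, round(error, 4))

# Each recovered row can be reshaped back into a 28x28 image:
first_image = X_recovered[0].reshape(28, 28)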
What is the purpose of each dimension being orthogonal to the others? What happens if those dimensions are not orthogonal?
Upvote ShareHi, Vinod.
If the dimensions are orthogonal to each other, then you will be able to clearly distinguish between the components, and the effect of one will not distort the other.
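A quick way to see this, as a hedged sketch on toy data: the principal axes PCA finds form an orthonormal set, so their Gram matrix is (approximately) the identity.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 5)
pca = PCA(n_components=3).fit(X)

# Rows of components_ are the principal axes; for orthonormal axes
# W @ W.T is ~identity: off-diagonal ~0 means no axis distorts another.
W = pca.components_
print(np.round(W @ W.T, 6))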
All the best!