In the case of PCA, I suppose we are fixing one axis, so how can it have multiple orthogonal axes?
Hi,
Principal Components Analysis chooses the first PCA axis as the line that passes through the centroid and minimizes the sum of squared distances of the points to that line. In that sense, the line is as close to all of the data as possible; equivalently, it captures the maximum variation in the data. The second axis is chosen in the same way but constrained to be orthogonal to the first, the third orthogonal to the first two, and so on, which is how PCA ends up with multiple orthogonal axes.
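For instance, here is a minimal sketch with scikit-learn (the data X below is made up for illustration) showing that the fitted axes are mutually orthogonal:

import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 100 samples, 3 features
rng = np.random.RandomState(42)
X = rng.rand(100, 3)

pca = PCA(n_components=3)
pca.fit(X)

# Each row of components_ is one principal axis; the dot products between
# different axes are (numerically) zero, so the axes are orthogonal.
print(np.round(pca.components_ @ pca.components_.T, 6))   # ~ identity matrix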
Thanks.
In the case of an image recognition ML problem, the sparseness of the data may depend on the number of dimensions. But in other cases, say data collected on features of a certain product to build an ML model, if we remove any particular dimension, the data collected for that dimension is also removed. So, in this case, how can the sparseness of the data be said to depend on the number of dimensions? I am confused because dimensionality reduction also implies feature selection, so when a feature is removed, the data collected against it is also removed. Please explain.
Hi,
Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. Having more input features often makes a predictive modeling task more challenging, a problem generally referred to as the curse of dimensionality. The performance of machine learning algorithms can degrade when there are too many input variables.
Fewer input dimensions often mean correspondingly fewer parameters or a simpler structure in the machine learning model, referred to as degrees of freedom. A model with too many degrees of freedom is likely to overfit the training dataset and therefore may not perform well on new data.
It is desirable to have simple models that generalize well, and in turn, input data with few input variables. This is particularly true for linear models where the number of inputs and the degrees of freedom of the model are often closely related.
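As a rough illustration (a sketch only, with made-up data), reducing the inputs with PCA before a linear model directly reduces the number of coefficients the model has to fit:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up data: 50 correlated input features driven by 5 underlying factors
rng = np.random.RandomState(0)
latent = rng.rand(200, 5)
X = latent @ rng.rand(5, 50)
y = (latent[:, 0] > 0.5).astype(int)

# Keep only the components explaining 95% of the variance,
# then fit a linear classifier on the reduced inputs.
model = make_pipeline(PCA(n_components=0.95), LogisticRegression())
model.fit(X, y)

print(model.named_steps["pca"].n_components_)               # components kept (at most 5 here)
print(model.named_steps["logisticregression"].coef_.shape)  # (1, n_components): fewer coefficients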
Thanks.
Thanks for the detailed explanation of dimensionality reduction and its importance in ML. But am I correct to assume that sparseness of data due to a larger number of dimensions is particularly related to image recognition problems? In statistics we reduce the dimensions/features by factor analysis, where only the most significant features/dimensions are retained for the final analysis.
Hi,
Good question!
Dimensionality reduction has far more varied applications than just image-related problems. For example, we can use it for spam classifiers.
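For example, a rough sketch (the tiny corpus and labels below are made up) of shrinking high-dimensional text features with TruncatedSVD before training a classifier:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = ham
emails = ["win a free prize now", "meeting at 10 am tomorrow",
          "claim your free reward today", "project report attached"]
labels = [1, 0, 1, 0]

# Tf-idf can produce thousands of sparse features on a real corpus;
# TruncatedSVD (latent semantic analysis) projects them down to a few dense components.
model = make_pipeline(TfidfVectorizer(),
                      TruncatedSVD(n_components=2),
                      LogisticRegression())
model.fit(emails, labels)
print(model.predict(["free prize waiting for you"]))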
Thanks.
Hi
Why have we added 1 to this expression while computing the number of features after applying PCA, i.e. 'd = np.argmax(cumsum >= 0.95) + 1'?
Hi,
This is because for visualization purposes it had to be reduced to 2 or 3.
Thanks.
-- Rajtilak Bhattacharjee
Hi
I didn't get your point about this reduction. Please clarify.
Thanks
Hi,
Can you try the code without adding the 1 and then share the output?
Thanks.
-- Rajtilak Bhattacharjee
Hi
The output is 153 features. But if we add 1, it will show 154 features.
Hi,
Please find the explanation here:
https://discuss.cloudxlab.c...
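In short, np.argmax(cumsum >= 0.95) returns the zero-based index of the first component at which the cumulative explained variance crosses 0.95, so adding 1 converts that index into a count of components. A minimal sketch (the cumsum values below are hypothetical):

import numpy as np

# Hypothetical cumulative explained-variance ratios for 5 components
cumsum = np.array([0.60, 0.80, 0.90, 0.96, 1.00])

idx = np.argmax(cumsum >= 0.95)   # zero-based index of the first value >= 0.95 -> 3
d = idx + 1                       # number of components needed -> 4
print(idx, d)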
Thanks.
-- Rajtilak Bhattacharjee
Could you please correct the notebooks where we do
from sklearn.datasets import fetch_mldata
? There is always this error: ImportError: cannot import name 'fetch_mldata'
Hi,
We have already updated our notebooks. We would request you to clone the updated files from our GitHub repository.
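In the meantime, if you are running an older copy of the notebook locally, a common workaround (assuming the notebook is loading MNIST) is to switch to fetch_openml, which replaced the removed fetch_mldata in recent scikit-learn versions:

from sklearn.datasets import fetch_openml

# fetch_mldata was removed from scikit-learn; fetch_openml loads the
# same MNIST data from openml.org instead.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
print(X.shape)   # (70000, 784)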
Thanks.
-- Rajtilak Bhattacharjee
Thank you.
The slides are not available. Please kindly fix the issue!
Thanks.
Hi Jean,
Apologies for the inconvenience. We have fixed the issue, and you should now be able to view/download the slides.
Thanks.
-- Rajtilak Bhattacharjee