In the case of PCA, I suppose we are fixing one axis, so how can it have multiple orthogonal axes?
Hi,
Principal Components Analysis chooses the first PCA axis as the line that passes through the centroid and minimizes the sum of squared distances of the points to that line. In that sense, the line is as close to all of the data as possible; equivalently, it captures the maximum variation in the data. The second axis is chosen in the same way but constrained to be orthogonal to the first, the third orthogonal to the first two, and so on, which is how PCA ends up with multiple orthogonal axes.
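For instance, here is a minimal sketch with scikit-learn (the data X below is made up for illustration) showing that the fitted axes are mutually orthogonal:

import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 100 samples, 3 features
rng = np.random.RandomState(42)
X = rng.rand(100, 3)

pca = PCA(n_components=3)
pca.fit(X)

# Each row of components_ is one principal axis; the dot products between
# different axes are (numerically) zero, so the axes are orthogonal.
print(np.round(pca.components_ @ pca.components_.T, 6))   # ~ identity matrix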
Thanks.
In the case of an image recognition ML problem, the sparseness of the data may depend on the number of dimensions. But in other cases, say data collected on features of a certain product to build an ML model, if we remove any particular dimension, the data collected for that dimension is also removed. So, in this case, how can the sparseness of the data be said to depend on the number of dimensions? I am confused because dimensionality reduction also implies feature selection, so when a feature is removed, the data collected against it is also removed. Please explain.
Hi,
Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. Having more input features often makes a predictive modeling task more challenging, a problem generally referred to as the curse of dimensionality. The performance of machine learning algorithms can degrade when there are too many input variables.
Fewer input dimensions often mean correspondingly fewer parameters or a simpler structure in the machine learning model, referred to as degrees of freedom. A model with too many degrees of freedom is likely to overfit the training dataset and therefore may not perform well on new data.
It is desirable to have simple models that generalize well, and in turn, input data with few input variables. This is particularly true for linear models where the number of inputs and the degrees of freedom of the model are often closely related.
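As a rough illustration (a sketch only, with made-up data), reducing the inputs with PCA before a linear model directly reduces the number of coefficients the model has to fit:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up data: 50 correlated input features driven by 5 underlying factors
rng = np.random.RandomState(0)
latent = rng.rand(200, 5)
X = latent @ rng.rand(5, 50)
y = (latent[:, 0] > 0.5).astype(int)

# Keep only the components explaining 95% of the variance,
# then fit a linear classifier on the reduced inputs.
model = make_pipeline(PCA(n_components=0.95), LogisticRegression())
model.fit(X, y)

print(model.named_steps["pca"].n_components_)               # components kept (at most 5 here)
print(model.named_steps["logisticregression"].coef_.shape)  # (1, n_components): fewer coefficients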
Thanks.
Thanks for the detailed explanation of dimensionality reduction and its importance in ML. But am I correct to assume that sparseness of data due to a larger number of dimensions is particularly related to image recognition problems? In statistics we reduce the dimensions/features by factor analysis, where only the most significant features/dimensions are retained for the final analysis.
Hi,
Good question!
Dimensionality reduction has far more varied applications than just image-related problems. For example, we can use it for spam classifiers.
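For example, a rough sketch (the tiny corpus and labels below are made up) of shrinking high-dimensional text features with TruncatedSVD before training a classifier:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = ham
emails = ["win a free prize now", "meeting at 10 am tomorrow",
          "claim your free reward today", "project report attached"]
labels = [1, 0, 1, 0]

# Tf-idf can produce thousands of sparse features on a real corpus;
# TruncatedSVD (latent semantic analysis) projects them down to a few dense components.
model = make_pipeline(TfidfVectorizer(),
                      TruncatedSVD(n_components=2),
                      LogisticRegression())
model.fit(emails, labels)
print(model.predict(["free prize waiting for you"]))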
Thanks.
Hi
Why have we added 1 to this expression while computing the number of features after applying PCA, i.e. 'd = np.argmax(cumsum >= 0.95) + 1'?
Hi,
This is because for visualization purposes it had to be reduced to 2 or 3.
Thanks.
-- Rajtilak Bhattacharjee
Hi
I didn't get your point about this reduction. Please clarify.
Thanks
Hi,
Can you try the code without adding the 1 and then share the output?
Thanks.
-- Rajtilak Bhattacharjee
Hi
The output is 153 features. But if we add 1, it will show 154 features.
Hi,
Please find the explanation here:
https://discuss.cloudxlab.c...
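In short, np.argmax(cumsum >= 0.95) returns the zero-based index of the first component at which the cumulative explained variance crosses 0.95, so adding 1 converts that index into a count of components. A minimal sketch (the cumsum values below are hypothetical):

import numpy as np

# Hypothetical cumulative explained-variance ratios for 5 components
cumsum = np.array([0.60, 0.80, 0.90, 0.96, 1.00])

idx = np.argmax(cumsum >= 0.95)   # zero-based index of the first value >= 0.95 -> 3
d = idx + 1                       # number of components needed -> 4
print(idx, d)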
Thanks.
-- Rajtilak Bhattacharjee
Could you please correct the notebooks where we do
from sklearn.datasets import fetch_mldata
? There is always this error: ImportError: cannot import name 'fetch_mldata'
Hi,
We have already updated our notebooks. We would request you to clone the updated files from our GitHub repository.
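In the meantime, if you are running an older copy of the notebook locally, a common workaround (assuming the notebook is loading MNIST) is to switch to fetch_openml, which replaced the removed fetch_mldata in recent scikit-learn versions:

from sklearn.datasets import fetch_openml

# fetch_mldata was removed from scikit-learn; fetch_openml loads the
# same MNIST data from openml.org instead.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
print(X.shape)   # (70000, 784)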
Thanks.
-- Rajtilak Bhattacharjee
Thank you.
The slides are not available. Please kindly fix the issue!
Thanks.
Hi Jean,
Apologies for the inconvenience. We have fixed the issue, and you should now be able to view/download the slides.
Thanks.
-- Rajtilak Bhattacharjee