Dimensionality Reduction is a method used to overcome the curse of dimensionality. When the number of independent features increases significantly, the dataset is likely to include redundant and unnecessary features. Dimensionality Reduction handles this by shrinking the feature set to a smaller, more informative one.
There are two ways to do this:
1) Feature Selection
In Feature Selection, the goal is to find the 'k' most significant of the original features, those that best represent the entire dataset. How well a feature represents the data is usually decided by scoring each feature and comparing the score against a threshold value.
Various tests and methods can be used for this: ANOVA, Pearson correlation, Spearman correlation, Chi-Squared, and tree-based methods such as Random Forest.
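As a rough illustration of the idea, the sketch below scores each feature by its absolute Pearson correlation with the target and keeps the top 'k'. The helper names (`pearson`, `select_k_best`) and the toy housing data are made up for this example. In practice a library routine would normally be used instead of hand-written code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_k_best(features, target, k):
    """Rank features by |correlation with target| and return the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: the target is exactly 3 * size.
features = {
    "size":  [50, 60, 80, 100, 120],
    "rooms": [1, 2, 2, 3, 4],
    "noise": [7, 1, 5, 2, 6],
}
target = [150, 180, 240, 300, 360]

selected, scores = select_k_best(features, target, k=2)
print(selected)
```

Instead of a fixed 'k', the same scores could be filtered with a threshold, for example keeping every feature whose score is above 0.5.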
2) Feature Extraction
In Feature Extraction, a new set of features is generated from the original ones, chosen so that it captures the majority of the variance of the dataset. PCA is the most commonly used method of performing Feature Extraction. Neural Networks also perform Feature Extraction, since their hidden layers learn new representations of the input. Linear Discriminant Analysis is another widely used method. t-SNE is used to visualize a high-dimensional distribution in 2 or 3 dimensions.
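To make "captures the majority of the variance" concrete, here is a minimal sketch of PCA that finds only the first principal component, using power iteration on the covariance matrix. This is a simplification chosen to keep the example dependency-free, not how PCA is usually computed in libraries, and the function names and toy data are invented for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Approximate the top eigenvector and eigenvalue by power iteration."""
    d = len(cov)
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical 2-D data lying close to the line y = 2x.
rows = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]

centered = center(rows)
cov = covariance(centered)
component, variance = first_component(cov)

# Share of total variance kept by the single new feature.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = variance / total_variance

# The extracted 1-D feature: each row projected onto the component.
projected = [sum(r[j] * component[j] for j in range(len(component)))
             for r in centered]
print(explained_ratio)
```

Here two original features are replaced by one extracted feature that keeps almost all of the variance, which is the trade-off Feature Extraction aims for.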