Feature selection for Machine Learning
The data features used to train a machine learning model have a great impact on its ultimate performance. Irrelevant or only partially relevant features can negatively influence the model.
Benefits of performing feature selection before modeling:
- Reduces overfitting
- Improves accuracy
- Reduces training time
The common automatic feature selection techniques are:
- Univariate Selection
- Recursive Feature Elimination
- Principal Component Analysis
- Feature Importance
Univariate Selection:
- Univariate selection uses statistical tests to select the features that have the strongest relationship with the output variable.
- The example below uses scikit-learn's SelectKBest class, combined with the chi-squared (chi2) statistical test for non-negative features, to select the 4 best features from the Pima Indians Diabetes dataset.
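A minimal sketch of univariate selection with SelectKBest and chi2. Note the Pima Indians dataset is not bundled with scikit-learn, so this sketch substitutes the built-in Iris dataset (4 non-negative features) and keeps the 2 highest-scoring features; with the Pima data loaded as X, y you would set k=4 instead.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load a small dataset with non-negative features (chi2 requires them).
X, y = load_iris(return_X_y=True)

# Score each feature against the target with the chi-squared test
# and keep the k best-scoring ones.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape)       # original feature matrix: (150, 4)
print(X_new.shape)   # reduced feature matrix: (150, 2)
print(selector.scores_)  # chi2 score per original feature
```

Higher chi2 scores indicate a stronger dependence between the feature and the class label.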
Recursive Feature Elimination:
- RFE works by recursively removing attributes and building a model on those attributes that remain.
- The example below uses RFE with the Logistic Regression algorithm to select the top 4 features.
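A sketch of RFE wrapped around logistic regression. Since the Pima data is not bundled with scikit-learn, this uses a synthetic classification dataset; the estimator and n_features_to_select=4 mirror the description above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 8 features, of which 4 are informative.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=4, random_state=0)

# Recursively fit the model and drop the weakest feature
# until only 4 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=4)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask: True for the selected features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
```

The `support_` mask can be applied directly to the feature matrix, e.g. `X[:, rfe.support_]`.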
Principal Component Analysis:
- PCA is a data compression, or dimensionality reduction, technique that transforms the features into a smaller set of uncorrelated components.
- We can choose the number of dimensions, or principal components, to keep in the transformed result.
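A sketch of PCA reduction with scikit-learn; the Iris dataset stands in for the Pima data here, and n_components=3 is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 original features onto the top 3 principal components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 3)
print(pca.explained_variance_ratio_)  # variance captured by each component
```

Inspecting `explained_variance_ratio_` helps pick the number of components: keep enough to capture most of the variance in the data.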