Feature Selection for Machine Learning

The data features used to train a machine learning model have a great impact on its ultimate performance. Irrelevant or partially relevant features can negatively influence the model.
Four common automatic feature selection techniques are:

  1. Univariate Selection
  2. Recursive feature elimination
  3. Principal Component Analysis
  4. Feature Importance

Benefits of feature selection techniques:
  • Reduces overfitting 
  • Improves accuracy 
  • Reduces training time

Univariate Selection:

  • Statistical tests can be used to select the features that have the strongest relationship with the output variable.
  • The example below uses scikit-learn's SelectKBest class in combination with the chi-squared (chi2) statistical test for non-negative features to select 4 of the best features from the Pima Indians Diabetes dataset.
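
A minimal sketch of this step, assuming the dataset is available locally as pima-indians-diabetes.csv with the usual 8 input columns plus a class column (the file name and column names are assumptions; adjust them to your copy of the data):

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Assumed column layout of the Pima Indians Diabetes dataset.
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X = data.values[:, 0:8]   # 8 input features (non-negative, as chi2 requires)
y = data.values[:, 8]     # output variable: onset of diabetes

# Score every feature against the output with the chi-squared test
# and keep the 4 highest-scoring features.
selector = SelectKBest(score_func=chi2, k=4)
fit = selector.fit(X, y)

print(fit.scores_)         # chi-squared score per feature
selected = fit.transform(X)
print(selected[0:5, :])    # first 5 rows of the 4 selected features
```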

Recursive Feature Elimination:

  • RFE works by recursively removing attributes and building a model on those attributes that remain.
  • The example below uses RFE with the Logistic Regression algorithm to select the top 4 features.
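
A minimal sketch under the same assumptions about the local CSV file. RFE fits the estimator, drops the weakest attribute, and refits until only the requested number of features remains:

```python
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X, y = data.values[:, 0:8], data.values[:, 8]

# Recursively eliminate features until 4 remain, refitting the
# logistic regression model at every step.
model = LogisticRegression(solver='liblinear')
rfe = RFE(estimator=model, n_features_to_select=4)
fit = rfe.fit(X, y)

print('Num features:', fit.n_features_)
print('Selected:', fit.support_)   # True for each of the 4 kept features
print('Ranking:', fit.ranking_)    # 1 = selected; higher = eliminated earlier
```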

Principal Component Analysis:

  • PCA uses linear algebra to compress the dataset, and is often described as a data reduction (or dimensionality reduction) technique.
  • We can choose the number of dimensions or principal components to keep in the transformed result, as in the sketch below.
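
A minimal sketch, again assuming the local Pima Indians CSV; keeping 3 principal components is an arbitrary choice for illustration:

```python
import pandas as pd
from sklearn.decomposition import PCA

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X = data.values[:, 0:8]

# Project the 8 original features onto 3 principal components.
pca = PCA(n_components=3)
fit = pca.fit(X)

print('Explained variance:', fit.explained_variance_ratio_)
print(fit.components_)   # each row is one principal component
```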

Feature Importance:

  • Ensembles of decision trees, such as Random Forest and Extra Trees, can be used to estimate the importance of features.
  • The example below uses the ExtraTreesClassifier class from the scikit-learn library.
  • A larger score indicates a more important feature.
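
A minimal sketch under the same data assumptions; n_estimators and random_state are arbitrary values chosen for reproducibility:

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X, y = data.values[:, 0:8], data.values[:, 8]

# Fit an ensemble of randomized decision trees and read off the
# impurity-based importance score of each input feature.
model = ExtraTreesClassifier(n_estimators=100, random_state=7)
model.fit(X, y)

for name, score in zip(names[:8], model.feature_importances_):
    print(f'{name}: {score:.3f}')   # larger score = more important feature
```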
