Feature selection for Machine Learning

The data features used to train a machine learning model have a great impact on its ultimate performance. Irrelevant or partially relevant features can negatively influence the model.

Common automatic feature selection techniques are:

- Univariate Selection
- Recursive Feature Elimination
- Principal Component Analysis
- Feature Importance

Benefits of feature selection:

- Reduces overfitting
- Improves accuracy
- Reduces training time

Univariate Selection: Statistical tests can be used to select the features that have the strongest relationship with the output variable. scikit-learn provides the SelectKBest class, which can be combined with the chi-squared (chi2) statistical test for non-negative features to select the 4 best features from the Pima Indians dataset, as in the first sketch below.

Recursive Feature Elimination: RFE works by recursively removing attributes and building a model on those attributes that remain, using model accuracy to identify which attributes contribute most to predicting the target variable, as in the second sketch below.
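A minimal sketch of univariate selection with SelectKBest and chi2. The file name pima-indians-diabetes.csv and the short column names are assumptions for illustration; adjust them to wherever your copy of the dataset lives.

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Load the Pima Indians Diabetes data (assumed local CSV with no header row)
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X = data.values[:, 0:8]  # the eight input features
y = data.values[:, 8]    # the output variable (0 or 1)

# Score every feature against the output with chi2 and keep the best 4
selector = SelectKBest(score_func=chi2, k=4)
fit = selector.fit(X, y)
print(fit.scores_)            # one chi-squared score per input feature
selected = fit.transform(X)   # only the 4 highest-scoring columns remain
print(selected[0:5, :])

Higher scores mean a stronger relationship with the output variable; fit.transform then drops every column outside the top 4.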
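A minimal sketch of RFE on the same assumed CSV, using logistic regression as the base estimator; the estimator choice and n_features_to_select=3 are illustrative, not prescribed by the original post.

import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima-indians-diabetes.csv', names=names)
X = data.values[:, 0:8]
y = data.values[:, 8]

# Recursively drop the weakest attribute until 3 remain
model = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=model, n_features_to_select=3)
fit = rfe.fit(X, y)
print("Selected:", fit.support_)   # True for each attribute that was kept
print("Ranking:", fit.ranking_)    # 1 marks the selected attributes

Any estimator that exposes coef_ or feature_importances_ can serve as the base model here.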