Knowing a sample dataset for use in Machine Learning

The Pima Indians Diabetes Dataset is used to illustrate the Machine Learning concepts. The dataset describes the medical records of Pima Indian patients and whether or not each patient had an onset of diabetes within five years.

# Columns description
Pregnancies (preg): Number of times pregnant
Glucose (plas): Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure (pres): Diastolic blood pressure (mm Hg)
SkinThickness (skin): Triceps skin fold thickness (mm)
Insulin (test): 2-hour serum insulin (mu U/ml)
BMI (mass): Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction (pedi): Diabetes pedigree function
Age (age): Age (years)
Outcome (class): Class variable (0 or 1); 268 of the 768 records are 1, the other 500 are 0 ...
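The class counts above already say something useful before any model is trained: a classifier that always predicts 0 would be right on 500 of 768 records. The sketch below works that baseline out and adds a small parser for rows of the dataset. It assumes the data sits in a headerless CSV whose columns follow the order listed above; the names `COLUMNS`, `parse_pima_rows` and `class_balance` are hypothetical helpers for illustration, not part of any library.

```python
import csv

# Short column names from the list above, assumed to match the CSV column order.
COLUMNS = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]


def parse_pima_rows(lines):
    """Parse headerless CSV lines into dicts keyed by the short column names.

    Assumption: every non-empty row has exactly the 9 values listed in COLUMNS.
    """
    return [dict(zip(COLUMNS, map(float, row))) for row in csv.reader(lines) if row]


def class_balance(n_positive, n_total):
    """Return class proportions and the accuracy of always predicting the majority class."""
    n_negative = n_total - n_positive
    return {
        "positive_rate": n_positive / n_total,
        "negative_rate": n_negative / n_total,
        "majority_baseline_accuracy": max(n_positive, n_negative) / n_total,
    }


if __name__ == "__main__":
    # Counts taken from the column description: 268 of 768 records have class 1.
    for name, value in class_balance(268, 768).items():
        print(f"{name}: {value:.3f}")
```

With a local copy of the data (the file name here is only a placeholder), `with open("pima-indians-diabetes.csv", newline="") as f: rows = parse_pima_rows(f)` would give one dict per patient. Any model evaluated on this dataset should beat the roughly 65% majority-class baseline to be worth using.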
Pre-processing the data before applying a Machine Learning Algorithm

Rescale data
Standardize data
Normalize data
Binarize data

Rescale Data: Rescale all the attributes (columns) to the same range. Attributes are commonly rescaled into the range between 0 and 1, which can help optimization algorithms such as gradient descent converge. We can rescale the data with scikit-learn using the MinMaxScaler class.

Standardize Data: Standardization shifts and scales each attribute to have a mean of 0 and a standard deviation of 1. It does not make the data Gaussian; it is most useful for attributes that already follow a roughly Gaussian distribution. We can standardize the data with scikit-learn using the StandardScaler class.

Normalize Data: Normalizing refers to rescaling each observation (row) to have a length of 1 (unit norm). We can normalize data with scikit-learn using the Normalizer class. This pre-processing can be useful for sparse datasets (attributes with many zero values) ...
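To make the four transformations concrete, here is a from-scratch sketch in plain Python of the arithmetic each one performs. It is not the scikit-learn implementation; the comments note which scikit-learn class each function is meant to mirror (MinMaxScaler, StandardScaler, Normalizer, Binarizer), assuming their default settings. The function names are hypothetical. Note the direction of each operation: rescaling, standardizing and binarizing work on one attribute (column) at a time, while normalizing works on one observation (row).

```python
import math


def min_max_rescale(column, low=0.0, high=1.0):
    """Rescale one attribute into [low, high] (the idea behind MinMaxScaler)."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # Constant column: map every value to the lower bound.
        return [low for _ in column]
    return [low + (x - c_min) * (high - low) / span for x in column]


def standardize(column):
    """Shift and scale one attribute to mean 0, standard deviation 1 (like StandardScaler).

    Uses the population standard deviation (divide by n).
    """
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        # Constant column: every value equals the mean, so the result is all zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def normalize_row(row):
    """Scale one observation to unit Euclidean (L2) length (like Normalizer's default)."""
    length = math.sqrt(sum(x * x for x in row))
    if length == 0:
        # All-zero row: leave it unchanged.
        return list(row)
    return [x / length for x in row]


def binarize(column, threshold=0.0):
    """Map values above the threshold to 1 and all others to 0 (like Binarizer)."""
    return [1 if x > threshold else 0 for x in column]


if __name__ == "__main__":
    ages = [21.0, 33.0, 50.0, 81.0]  # illustrative values, not taken from the dataset
    print(min_max_rescale(ages))
    print(standardize(ages))
    print(normalize_row([3.0, 4.0]))  # prints [0.6, 0.8]
    print(binarize(ages, threshold=40.0))
```

In scikit-learn, the scalers learn their statistics (minimum and maximum, or mean and standard deviation) in `fit()` and apply them in `transform()`, so they are usually fitted on the training data only and then reused on the test data.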