Pre-processing the data before applying a Machine Learning Algorithm
- Rescale data
- Standardize data
- Normalize data
- Binarize data
Rescale data:
- Rescaling puts all attributes on the same scale. Attributes are often rescaled into the range 0 to 1, which can help optimization.
- We can rescale the data in scikit-learn using the MinMaxScaler class.
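As a minimal sketch of the rescaling step, the snippet below applies MinMaxScaler to a small made-up array. The array X and its values are assumptions for illustration only, not data from this post.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 3 observations (rows), 2 attributes on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Rescale every attribute (column) independently into the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
X_rescaled = scaler.fit_transform(X)

print(X_rescaled)
```

After the transform, each column has a minimum of 0 and a maximum of 1. In practice the scaler would usually be fitted on training data only and then reused to transform test data.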
Standardize data:
- Standardization shifts and scales each attribute so that it has mean 0 and standard deviation 1, which can help optimization. It does not change the shape of the distribution, so it is most suitable when the attributes are already roughly Gaussian.
- We can standardize data in scikit-learn using the StandardScaler class.
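A short sketch of standardization with StandardScaler follows, again on a made-up array (X is an assumption for illustration, not data from this post).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 observations (rows), 2 attributes.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

print(X_standardized)
print(X_standardized.mean(axis=0))  # approximately 0 for every column
print(X_standardized.std(axis=0))   # approximately 1 for every column
```

StandardScaler works column by column, so each attribute ends up with mean 0 and standard deviation 1 regardless of its original units.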
Normalize data:
- Normalizing refers to rescaling each observation (row) to have a length of 1 (unit norm).
- We can normalize data in scikit-learn using the Normalizer class. This pre-processing is useful for sparse datasets (attributes with many zero values).
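The row-wise behaviour of normalization can be sketched as below. The array X is a made-up example, and the L2 norm is chosen explicitly here as an assumption, since the post does not say which norm it has in mind.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical toy data with several zero entries: 3 observations (rows), 2 attributes.
X = np.array([[3.0, 4.0],
              [0.0, 5.0],
              [1.0, 0.0]])

# Scale each row independently so that its Euclidean (L2) length is 1.
normalizer = Normalizer(norm="l2")
X_normalized = normalizer.fit_transform(X)

print(X_normalized)
print(np.linalg.norm(X_normalized, axis=1))  # length of each row
```

Unlike the two scalers above, Normalizer operates on rows rather than columns, so zero entries stay zero and sparsity is preserved.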