Pre-processing the data before applying a Machine Learning Algorithm

  • Rescale data
  • Standardize data
  • Normalize data
  • Binarize data

Rescale Data: 

  • Rescaling puts all attributes on the same scale, most often the range 0 to 1. This helps algorithms that are sensitive to attribute magnitude, such as those trained with gradient descent or based on distance measures.
  • In scikit-learn, we can rescale data with the MinMaxScaler class.
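As a minimal sketch of rescaling with MinMaxScaler, the example below uses a small made-up array (the values and the variable names are assumptions for illustration, not data from this post). Each column is mapped so that its minimum becomes 0 and its maximum becomes 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 4 observations (rows), 2 attributes on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

# Rescale every attribute (column) into the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
X_rescaled = scaler.fit_transform(X)

print(X_rescaled)
```

In practice the scaler is usually fitted on the training data only, and the same fitted scaler is then applied to new data with transform().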









Standardize data:

  • Standardization shifts and scales each attribute so that it has mean 0 and standard deviation 1. It does not change the shape of the distribution, and it suits algorithms that work best with roughly Gaussian input attributes.
  • We can standardize data in scikit-learn with the StandardScaler class.
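A minimal sketch of standardization with StandardScaler, again on a small made-up array (the data and variable names are assumptions for illustration). After the transform, each column should have a mean of roughly 0 and a standard deviation of roughly 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 observations (rows), 2 attributes.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

print(X_standardized.mean(axis=0))  # close to 0 for every column
print(X_standardized.std(axis=0))   # close to 1 for every column
```

Note that StandardScaler uses the population standard deviation (the same as NumPy's default std), so the check above uses std() with its default settings.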










Normalize Data:

  • Normalizing refers to rescaling each observation (row) to have a length of 1 (unit norm).
  • We can normalize data in scikit-learn with the Normalizer class. This pre-processing is useful for sparse datasets (attributes with many zero values).
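A minimal sketch of row-wise normalization with Normalizer, on made-up data (the values and variable names are assumptions for illustration). The L2 norm is chosen explicitly here; unlike the scalers, Normalizer works on each row independently rather than on each column.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical toy data: 3 observations (rows), 2 attributes.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

# Scale each row so that its Euclidean (L2) length is 1.
normalizer = Normalizer(norm="l2")
X_normalized = normalizer.fit_transform(X)

print(X_normalized)                          # the first row becomes [0.6, 0.8]
print(np.linalg.norm(X_normalized, axis=1))  # every row has length 1
```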










Binarize Data:

  • We can transform the data using a binary threshold: all values above the threshold are marked as 1, and all values equal to or below it are marked as 0.
  • This is useful for turning probabilities used as feature values into crisp values.
  • In scikit-learn, we use the Binarizer class.
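A minimal sketch of thresholding with Binarizer, using made-up probability-like values and an assumed threshold of 0.5 (both are illustrative choices, not values from this post). Values strictly above the threshold become 1, and values equal to or below it become 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical toy data: probability-like feature values.
X = np.array([[0.2, 0.5, 0.9],
              [0.7, 0.1, 0.5]])

# Values > 0.5 map to 1; values <= 0.5 map to 0.
binarizer = Binarizer(threshold=0.5)
X_binary = binarizer.fit_transform(X)

print(X_binary)  # rows become [0, 0, 1] and [1, 0, 0]
```

A value exactly equal to the threshold (0.5 here) is mapped to 0, which matches the rule described in the bullets above.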
