Table of Contents
- Feature scaling
- Categorical feature encoding
Feature scaling
Motivation
- The range of all features should be normalized so that each feature contributes approximately proportionately to the final distance (important for distance-based methods such as k-NN or k-means).
- Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
- It's also important to apply feature scaling if regularization is used as part of the loss function (so that coefficients are penalized appropriately).
Methods
- Rescaling (min-max normalization)
Rescaling is the simplest method: it linearly rescales the range of each feature to [0, 1] or [−1, 1] (formulas for all four methods appear in the sketch after this list).
- Mean normalization
- Standardization (Z-score Normalization)
- Scaling to unit length
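As a minimal sketch, all four methods can be written directly in NumPy; the array values below are illustrative, and the formulas in the comments are the standard definitions:

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0, 20.0])  # one feature column; values are illustrative

# Rescaling (min-max normalization): x' = (x - min) / (max - min) -> range [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Mean normalization: x' = (x - mean) / (max - min) -> centered around 0
x_meannorm = (x - x.mean()) / (x.max() - x.min())

# Standardization (z-score): x' = (x - mean) / std -> zero mean, unit variance
x_zscore = (x - x.mean()) / x.std()

# Scaling to unit length: x' = x / ||x|| (Euclidean norm)
x_unit = x / np.linalg.norm(x)
```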
Categorical feature encoding
Methods
- Ordinal Encoding
We use this technique when the categorical feature is ordinal. In this case, retaining the order is important, so each category is mapped to an integer that preserves the ranking.
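As a minimal sketch, an explicit category-to-integer mapping preserves the order; the `size` column and its ranking below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"size": ["low", "high", "medium", "low"]})  # illustrative data
order = {"low": 0, "medium": 1, "high": 2}  # explicit ranking: low < medium < high
df["size_encoded"] = df["size"].map(order)
```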
- One-hot Encoding
We use this categorical data encoding technique when the features are nominal (they have no intrinsic order). In one-hot encoding, we create a new variable for each level of the categorical feature. Each category is mapped to a binary variable containing either 0 or 1, where 0 represents the absence and 1 the presence of that category. These newly created binary features are known as dummy variables.
Disadvantages
- In some cases, one-hot encoding introduces sparsity in the dataset. In other words, it creates multiple dummy features without adding much information.
- It might lead to the dummy variable trap, a phenomenon where features are highly correlated: using the other dummy variables, we can easily predict the value of any one of them.
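A minimal sketch with pandas (the `color` column is an illustrative assumption); passing drop_first=True drops one dummy per feature, which avoids the dummy variable trap described above:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # illustrative data
# One binary column per category; drop_first=True removes one dummy
# to avoid perfect multicollinearity (the dummy variable trap).
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
df = pd.concat([df, dummies], axis=1)
```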
- Binary Encoding
In this encoding scheme, the categorical feature is first converted to integers using an ordinal encoder. The integers are then written in binary, and each binary digit is split into its own column.
Advantages
- Binary encoding is a memory-efficient encoding scheme, as it uses fewer features than one-hot encoding.
- It reduces the curse of dimensionality for data with high cardinality: a feature with n categories needs only about ⌈log₂ n⌉ binary columns instead of n dummy columns.
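A minimal sketch of those two steps with pandas (the `city` column is an illustrative assumption; the third-party category_encoders package offers a ready-made BinaryEncoder):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "SF", "LA", "CHI"]})  # illustrative data

# Step 1: ordinal-encode the categories as integers.
codes = df["city"].astype("category").cat.codes.to_numpy()

# Step 2: write each integer in binary and split the bits into columns
# (about ceil(log2(n)) columns for n categories, vs. n columns for one-hot).
n_bits = max(1, int(codes.max()).bit_length())
for i in range(n_bits):
    df[f"city_bit{i}"] = (codes >> i) & 1
```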