# Feature Engineering Techniques

Standardizing or normalizing quantitative features is a routine step in a machine learning project, but we seldom think about which feature scaling technique to use.

This article by Shay Geller discusses in detail how choosing an appropriate scaling technique can increase the accuracy of our ML models. In this article, I show how to implement various feature engineering techniques in Python and what effect each has on the data. For this analysis, I use max acceleration values for different car models and years, stored in the variable `accel`.
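The snippets that follow use `accel` without defining it. A minimal sketch of a stand-in is shown here, assuming `accel` is a 2-D NumPy array with one column, which is the shape scikit-learn transformers expect. The values are synthetic and are not the car data used in this article; the row count of 398 is an assumption borrowed from the `n_quantiles=398` argument used later.

```python
import numpy as np

# Synthetic stand-in for the acceleration column (NOT the article's car data).
# Shape (n_samples, 1): one feature, one row per car.
rng = np.random.default_rng(0)
accel = rng.normal(loc=15.5, scale=2.75, size=(398, 1))

print(accel.shape)  # (398, 1)
```

With a pandas DataFrame, selecting the column with double brackets and calling `.to_numpy()` would give the same 2-D layout.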

1. Max Abs Scaling

Max Abs scaling divides each value in a column by the column's maximum absolute value, so the largest absolute value becomes 1. It does not shift or center the data.

```python
from sklearn.preprocessing import MaxAbsScaler

# Fit the scaler on the data, then apply the transformation
accel_transformed = MaxAbsScaler().fit(accel)
transform_data = accel_transformed.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 0.6277455827524719
Standard deviation of transformed data = 0.1110573515158878
Max of transformed data = 1.0
```
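As a quick check of what Max Abs scaling computes, the sketch below reproduces the scaler with plain NumPy on a small illustrative column (the values are made up for the example, not taken from the car data).

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy single-feature column (illustrative values only).
x = np.array([[8.0], [12.5], [15.0], [24.8]])

scaled = MaxAbsScaler().fit_transform(x)

# Manual equivalent: divide by the largest absolute value in the column.
manual = x / np.abs(x).max(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.max())                 # 1.0
```

Because every value is divided by the same constant, the mean and standard deviation shrink by that factor too: in the output above, 15.568 / 24.8 is about 0.6277.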

2. Min Max Scaling

Min Max scaling rescales the values so that they range between 0 and 1 after transformation.

```python
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the data, then apply the transformation
scaler = MinMaxScaler()
accel_transformed = scaler.fit(accel)
transform_data = accel_transformed.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 0.45048157453936344
Standard deviation of transformed data = 0.1639418046186915
Max of transformed data = 0.9999999999999999
```
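The Min Max formula is simple enough to verify by hand. The sketch below compares scikit-learn's result with a direct NumPy computation of (x - min) / (max - min) on illustrative values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature column (illustrative values only).
x = np.array([[8.0], [12.5], [15.0], [24.8]])

scaled = MinMaxScaler().fit_transform(x)

# Manual equivalent: shift by the minimum, divide by the range.
x_min, x_max = x.min(axis=0), x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)

print(np.allclose(scaled, manual))  # True
```

The maximum of 0.9999999999999999 in the output above is presumably floating-point rounding: scikit-learn applies the transformation as a multiplication followed by an addition rather than as the single division shown here.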

3. Normalizer

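Normalizer rescales each row (sample) of the data set independently so that it has unit norm. Because `accel` has a single column, each row is divided by its own absolute value, which is why every transformed value here equals 1.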

```python
from sklearn.preprocessing import Normalizer

# Normalizer works row by row; fit() only validates the input
transformer = Normalizer(norm='l1').fit(accel)
transform_data = transformer.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 1.0
Standard deviation of transformed data = 0.0
Max of transformed data = 1.0
```
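Normalizer is easier to understand on data with more than one column. The sketch below, using made-up values, shows that each row is scaled to unit L1 or L2 norm, and that a single positive column collapses to all ones, as in the output above.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two rows with three features each (illustrative values only).
X = np.array([[1.0, 2.0, 2.0],
              [3.0, 0.0, 4.0]])

l1 = Normalizer(norm="l1").fit_transform(X)
l2 = Normalizer(norm="l2").fit_transform(X)

# Under L1 the absolute values in each row sum to 1;
# under L2 each row has Euclidean length 1.
print(np.allclose(np.abs(l1).sum(axis=1), 1.0))      # True
print(np.allclose(np.linalg.norm(l2, axis=1), 1.0))  # True

# With a single positive column, every row is divided by itself.
single = np.array([[8.0], [12.5], [24.8]])
print(Normalizer(norm="l1").fit_transform(single).ravel())  # [1. 1. 1.]
```

This is why Normalizer is usually applied to multi-feature rows, such as text vectors, rather than to one column at a time.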

4. Power Transformer (Yeo-Johnson)

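This transformer applies a power transformation to a feature to make its distribution more Gaussian-like. By default, scikit-learn's PowerTransformer also standardizes the result to zero mean and unit variance.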

```python
from sklearn.preprocessing import PowerTransformer

# Estimate the Yeo-Johnson lambda from the data, then transform
pt = PowerTransformer(method='yeo-johnson')
pt.fit(accel)
transform_data = pt.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = -1.0711699534071861e-15
Standard deviation of transformed data = 0.9999999999999999
Max of transformed data = 3.043039377098009
```
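To make the power transformation itself visible, here is a minimal sketch of the Yeo-Johnson formula in NumPy, compared against PowerTransformer with `standardize=False`. The helper `yeo_johnson` and the synthetic skewed data are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a 1-D array for a given lambda (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative values: Box-Cox-like transform of (x + 1)
    if abs(lam) < 1e-12:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    # Negative values: mirrored transform with exponent (2 - lambda)
    if abs(lam - 2.0) < 1e-12:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    return out


# Synthetic right-skewed data (not the article's car data).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=200) - 0.5
X = x.reshape(-1, 1)

# Without standardization, sklearn should match the raw formula.
raw = PowerTransformer(method="yeo-johnson", standardize=False).fit(X)
lam = raw.lambdas_[0]
matches = np.allclose(raw.transform(X).ravel(), yeo_johnson(x, lam))
print("lambda =", lam, "| matches manual formula:", matches)

# With the default standardize=True, the output has zero mean and unit variance.
standardized = PowerTransformer(method="yeo-johnson").fit_transform(X)
print(standardized.mean(), standardized.std())
```

The fitted exponent is stored in `lambdas_`, one value per feature, and is chosen by maximum likelihood so that the transformed data looks as Gaussian as possible.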

5. Quantile Transformation — Normal

Quantile transformation with a normal output distribution is another technique for mapping a data set to a normal distribution.

```python
from sklearn.preprocessing import quantile_transform

# Map the empirical quantiles of the data onto a normal distribution
transform_data = quantile_transform(accel, n_quantiles=398, random_state=1,
                                    copy=True, output_distribution='normal')

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 0.0008961428054773563
Standard deviation of transformed data = 1.050687337227496
Max of transformed data = 5.19933758270342
```

6. Quantile Transformation — Uniform

Quantile transformation with a uniform output distribution maps a data set to a uniform distribution between 0 and 1.

```python
from sklearn.preprocessing import quantile_transform

# Map the empirical quantiles of the data onto a uniform distribution
transform_data = quantile_transform(accel, n_quantiles=398, random_state=1,
                                    copy=True, output_distribution='uniform')

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 0.5001329063453287
Standard deviation of transformed data = 0.28934184755702674
Max of transformed data = 1.0
```
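The two quantile transformations are closely related. The sketch below, on synthetic skewed data, suggests how: with one quantile per sample the uniform output is essentially each value's rank scaled to [0, 1], and the normal output is the standard-normal inverse CDF of that rank. The data and the use of `scipy.stats.norm` are assumptions made for the illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import quantile_transform

# Synthetic skewed data with distinct values (not the article's car data).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=(200, 1))

uniform = quantile_transform(x, n_quantiles=200, copy=True,
                             output_distribution="uniform")
normal = quantile_transform(x, n_quantiles=200, copy=True,
                            output_distribution="normal")

# Uniform output: rank of each value, scaled to [0, 1].
ranks = x.ravel().argsort().argsort() / (len(x) - 1)
uniform_is_rank = np.allclose(uniform.ravel(), ranks, atol=1e-6)

# Normal output: inverse normal CDF of the uniform output.
# The two extreme samples (0 and 1) are excluded because they are clipped.
inner = (uniform > 0) & (uniform < 1)
normal_is_ppf = np.allclose(normal[inner], norm.ppf(uniform[inner]), atol=1e-6)

print(uniform_is_rank, normal_is_ppf)
print(normal.max(), norm.ppf(1 - 1e-7))
```

The last line hints at why the maximum in the normal output above is about 5.199: the largest sample would map to infinity, so scikit-learn clips it close to the inverse CDF of 1 - 1e-7.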

7. Robust Scaler

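Robust Scaler centers the data on the median and scales it by the interquartile range (IQR). Outliers are not removed, but because the median and IQR are barely affected by extreme values, outliers have far less influence on the scaling than they do with Standard Scaler.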

```python
from sklearn.preprocessing import RobustScaler

# Center on the median and scale by the interquartile range
transformer = RobustScaler(with_centering=True, with_scaling=True).fit(accel)
transform_data = transformer.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = 0.020325508137703445
Standard deviation of transformed data = 0.8221559156997077
Max of transformed data = 2.776119402985078
```
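A small example with one deliberate outlier shows what Robust Scaler does with extreme values. The sketch below uses made-up numbers and compares scikit-learn's default settings with a manual median and IQR computation.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column with one obvious outlier, 60.0 (illustrative values only).
x = np.array([[8.0], [12.5], [14.0], [15.0], [16.5], [18.0], [60.0]])

scaled = RobustScaler().fit_transform(x)

# Manual equivalent: subtract the median, divide by the interquartile range.
median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
manual = (x - median) / (q3 - q1)

print(np.allclose(scaled, manual))  # True
print(scaled.shape == x.shape)      # True: no rows were dropped
print(scaled.max())                 # 11.25: the outlier is still there
```

The outlier stays in the data and remains far from the rest, but it did not distort the center or the scale, which were computed from the median (15.0) and the IQR (4.0).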
8. Standard Scaler

Standard Scaler standardizes a feature by subtracting its mean and dividing by its standard deviation, so the transformed data has zero mean and unit variance.

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the data, then apply the transformation
scaler = StandardScaler()
scaler.fit(accel)
transform_data = scaler.transform(accel)

print("Mean of original data = {}".format(accel.mean()))
print("Standard deviation of original data = {}".format(accel.std()))
print("Max of original data = {}".format(accel.max()))
print("\n")
print("Mean of transformed data = {}".format(transform_data.mean()))
print("Standard deviation of transformed data = {}".format(transform_data.std()))
print("Max of transformed data = {}".format(transform_data.max()))
```

```
Mean of original data = 15.568090452261307
Standard deviation of original data = 2.7542223175940177
Max of original data = 24.8

Mean of transformed data = -2.6779248835179653e-16
Standard deviation of transformed data = 0.9999999999999998
Max of transformed data = 3.351911531892361
```
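The StandardScaler result above can be reproduced as a plain z-score. A minimal sketch on illustrative values follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature column (illustrative values only).
x = np.array([[8.0], [12.5], [15.0], [24.8]])

scaled = StandardScaler().fit_transform(x)

# Manual z-score. NumPy's std() defaults to ddof=0 (population standard
# deviation), which is what StandardScaler uses.
manual = (x - x.mean(axis=0)) / x.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

Note that pandas defaults to `ddof=1` in `.std()`, so a z-score computed directly on a DataFrame column will differ slightly from StandardScaler unless `ddof=0` is passed.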