Table of Contents
- Feature scaling
- Categorical feature encoding
Feature scaling
Motivation
- The range of all features should be normalized so that each feature contributes approximately proportionately to the final distance (important for distance-based methods such as k-NN or k-means).
- Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
- It's also important to apply feature scaling if regularization is used as part of the loss function (so that coefficients are penalized appropriately).
Methods
- Rescaling (min-max normalization)
Rescaling is the simplest method: it linearly rescales the range of each feature to [0, 1] or [−1, 1] (formulas for all four methods appear in the sketch after this list).
- Mean normalization
- Standardization (Z-score Normalization)
- Scaling to unit length
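As a minimal sketch, all four methods can be written directly in NumPy; the array values below are illustrative, and the formulas in the comments are the standard definitions:

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0, 20.0])  # one feature column; values are illustrative

# Rescaling (min-max normalization): x' = (x - min) / (max - min) -> range [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Mean normalization: x' = (x - mean) / (max - min) -> centered around 0
x_meannorm = (x - x.mean()) / (x.max() - x.min())

# Standardization (z-score): x' = (x - mean) / std -> zero mean, unit variance
x_zscore = (x - x.mean()) / x.std()

# Scaling to unit length: x' = x / ||x|| (Euclidean norm)
x_unit = x / np.linalg.norm(x)
```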
Categorical feature encoding
Methods
- Ordinal Encoding
We use this technique when the categorical feature is ordinal. In this case, retaining the order is important, so each category is mapped to an integer that preserves the ranking.
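As a minimal sketch, an explicit category-to-integer mapping preserves the order; the `size` column and its ranking below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"size": ["low", "high", "medium", "low"]})  # illustrative data
order = {"low": 0, "medium": 1, "high": 2}  # explicit ranking: low < medium < high
df["size_encoded"] = df["size"].map(order)
```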
- One-hot Encoding
We use this categorical data encoding technique when the features are nominal (they have no intrinsic order). In one-hot encoding, we create a new variable for each level of the categorical feature. Each category is mapped to a binary variable containing either 0 or 1, where 0 represents the absence and 1 the presence of that category. These newly created binary features are known as dummy variables.
Disadvantages
- In some cases, one-hot encoding introduces sparsity in the dataset. In other words, it creates multiple dummy features without adding much information.
- It might lead to the dummy variable trap, a phenomenon where features are highly correlated: using the other dummy variables, we can easily predict the value of any one of them.
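A minimal sketch with pandas (the `color` column is an illustrative assumption); passing drop_first=True drops one dummy per feature, which avoids the dummy variable trap described above:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # illustrative data
# One binary column per category; drop_first=True removes one dummy
# to avoid perfect multicollinearity (the dummy variable trap).
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
df = pd.concat([df, dummies], axis=1)
```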
- Binary Encoding
In this encoding scheme, the categorical feature is first converted to integers using an ordinal encoder. The integers are then written in binary, and each binary digit is split into its own column.
Advantages
- Binary encoding is a memory-efficient encoding scheme, as it uses fewer features than one-hot encoding.
- It reduces the curse of dimensionality for data with high cardinality: a feature with n categories needs only about ⌈log₂ n⌉ binary columns instead of n dummy columns.
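A minimal sketch of those two steps with pandas (the `city` column is an illustrative assumption; the third-party category_encoders package offers a ready-made BinaryEncoder):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "SF", "LA", "CHI"]})  # illustrative data

# Step 1: ordinal-encode the categories as integers.
codes = df["city"].astype("category").cat.codes.to_numpy()

# Step 2: write each integer in binary and split the bits into columns
# (about ceil(log2(n)) columns for n categories, vs. n columns for one-hot).
n_bits = max(1, int(codes.max()).bit_length())
for i in range(n_bits):
    df[f"city_bit{i}"] = (codes >> i) & 1
```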