2021-04-27 Stop Using All Your F

Author: 春生阁 | Published 2021-04-27 18:12

A real-world dataset contains many relevant and redundant features. It is true that more data in terms of the number of instances or rows helps train a better machine learning model, but the same does not hold for the number of features.

Before proceeding, we should understand why it is not recommended to use every available feature. To train a robust machine learning model, the data must be free of redundant features. There are several reasons why feature selection is important:

  • Garbage In, Garbage Out: The quality of the data used to train the model determines the quality of the resulting model. Real-world data contains many redundant features, which need to be removed in order to train a robust model (a minimal sketch of one way to do this follows this list).
  • Curse of Dimensionality: As the dimensionality of the data, i.e. the number of features, increases, the data becomes sparser, so the fraction of possible feature configurations covered by the training samples shrinks. If the data contains more features than instances, the trained model will not generalize to new samples.
  • Occam’s Razor: Model explainability decreases when the input data has many features, making the model difficult to interpret.
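
To illustrate the point about redundant features, here is a minimal sketch of a correlation-based filter. The DataFrame, the 0.95 cutoff, and the helper name `drop_redundant_features` are illustrative assumptions, not part of the original article:

```python
import numpy as np
import pandas as pd

def drop_redundant_features(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Look at the upper triangle only, so each feature pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Tiny synthetic example: x2 is a near-copy of x1 and should be dropped.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.01 * rng.normal(size=200),  # redundant with x1
    "x3": rng.normal(size=200),
})
print(drop_redundant_features(df).columns.tolist())  # ['x1', 'x3']
```

This kind of filter only catches pairwise linear redundancy; it is one simple option among many, not a complete recipe.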

So it’s essential to remove the irrelevant features from the dataset, and a data scientist should be selective about the features used for model training. Evaluating every possible combination of features and picking the best subset is not feasible in practice: with n features there are 2^n subsets, so exhaustive search grows exponentially. Instead, there are various feature selection techniques that find a good subset far more cheaply, and it is worth knowing a few of them.
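
As one concrete example of such a technique, the sketch below applies a univariate filter with scikit-learn's SelectKBest. The synthetic dataset and the choices of the ANOVA F-test and k=10 are assumptions made for illustration, not a method prescribed by the article:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic classification data: 100 features, only a handful carry signal.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           n_redundant=10, random_state=42)

# Univariate filter: score every feature against the target with an ANOVA
# F-test and keep the 10 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)      # (500, 100) -> (500, 10)
print(selector.get_support(indices=True))   # column indices that were kept
```

Univariate filters are fast because they score each feature independently of the others; wrapper and embedded methods consider feature interactions at a higher computational cost.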
