This week was fairly busy, and I went home for the holiday, so the naive Bayes module that was originally part of this week's plan didn't get finished; I'll catch up on it next week. Fortunately I've encountered this material before. The current course leans more toward the underlying theory, so it is challenging, but still manageable for now.
A quick review of what I studied this week:
1. Learned how a computer "thinks" when making recommendations
(course illustrations: recommending apps; decision tree)
2. Entropy and how to compute it
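The two formula images did not survive here, so reconstructing from memory: what the lesson presents should be the standard Shannon entropy. For a node whose samples fall into classes with proportions $p_1, \dots, p_k$:

$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$

For example, a node whose samples are split 50/50 between two classes has entropy $-0.5\log_2 0.5 - 0.5\log_2 0.5 = 1$ bit, while a pure node has entropy 0.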
3. Information Gain
Of the three splitting methods shown below, which one gives us the most information about the data?
(course illustration: information gain)
Information gain = the change in entropy.
At each node of the decision tree, we can compute the entropy of the data at the parent node and the entropy of each of the two child nodes; the information gain is the difference between the parent's entropy and the weighted average of the children's entropies (weighted by the number of samples in each child).
The formula for information gain:
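The formula image is missing here; the standard definition, consistent with the description above, is: if the parent node's $m + n$ samples are split into children of sizes $m$ and $n$, then

$$\text{Information Gain} = H(\text{parent}) - \left[ \frac{m}{m+n}\, H(\text{child}_1) + \frac{n}{m+n}\, H(\text{child}_2) \right]$$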
When building a decision tree, we choose the split that yields the largest information gain.
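A small sketch of these two formulas in Python (the function names are my own, not from the course):

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a sequence of class labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Parent entropy minus the size-weighted average entropy of the children.
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

# A split that separates the two classes perfectly removes all uncertainty,
# so the gain equals the parent's entropy (1 bit here).
parent = [0, 0, 1, 1]
print(information_gain(parent, [0, 0], [1, 1]))  # 1.0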
4. Hyperparameters
(1) Maximum depth
(2) Minimum number of samples per leaf
(3) Minimum number of samples per split
(4) Maximum number of features
5. Decision Trees in sklearn
>>> from sklearn.tree import DecisionTreeClassifier
>>> model = DecisionTreeClassifier()
>>> model.fit(x_values, y_values)
>>> print(model.predict([[0.2, 0.8], [0.5, 0.4]]))
[0. 1.]
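Since x_values and y_values are not defined in the snippet above, here is a minimal self-contained version; the data values are made up purely for illustration:

from sklearn.tree import DecisionTreeClassifier

# Made-up 2-D feature points and binary labels, for illustration only.
x_values = [[0.1, 0.9], [0.3, 0.7], [0.6, 0.3], [0.8, 0.2]]
y_values = [0, 0, 1, 1]

model = DecisionTreeClassifier()
model.fit(x_values, y_values)
print(model.predict([[0.2, 0.8], [0.5, 0.4]]))  # e.g. [0 1]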
Hyperparameters
When we define the model, we can specify the hyperparameters. In practice, the most common ones are:
max_depth: The maximum number of levels in the tree.
min_samples_leaf: The minimum number of samples allowed in a leaf.
min_samples_split: The minimum number of samples required to split an internal node.
max_features: The number of features to consider when looking for the best split.
For example, here we define a model where the maximum depth of the tree, max_depth, is 7, and the minimum number of samples in each leaf, min_samples_leaf, is 10.
>>> model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10)
>>> from sklearn.metrics import accuracy_score
>>> y_pred = model.predict(x_values)
>>> acc = accuracy_score(y_values, y_pred)  # fraction of correct predictions