7. Overfitting

Author: 玄语梨落 | Published 2020-08-17 13:57


The Problem of Overfitting

  • Underfitting (high bias): the algorithm doesn't fit the training set well.
  • Overfitting (high variance): if we have too many features, the learned hypothesis may fit the training set very well, so the cost function may be very close to zero (perhaps exactly zero), but it fails to generalize to new examples.

Addressing overfitting

Options:

  1. Reduce number of features.
    • Manually select which features to keep.
    • Model selection algorithm.
  2. Regularization
    • Keep all the features, but reduce magnitude/values of parameters \theta_j.
    • Works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization

Suppose we penalize \theta_3 and \theta_4 (some of the parameters) and make them really small, as in the worked example below.
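For instance, take the linear-regression objective and add large penalty coefficients on \theta_3 and \theta_4 (the factor 1000 below is just an illustrative choice, not a value from the notes):

\min_\theta\;\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+1000\,\theta_3^2+1000\,\theta_4^2

Minimizing this objective forces \theta_3 and \theta_4 close to zero, which effectively removes the corresponding high-order terms from the hypothesis.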

Keeping the parameter values small gives us:

  • "Smipler" hypothesis
  • Less prone to overfitting

J(\theta)=\frac{1}{2m}[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2]
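As a rough NumPy sketch of this cost function (the function name regularized_cost and the variable names are mine, not from the notes):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X     : (m, n+1) design matrix whose first column is all ones
    y     : (m,) vector of targets
    theta : (n+1,) parameter vector; theta[0] is not penalized
    lam   : regularization parameter lambda
    """
    m = len(y)
    residuals = X @ theta - y                # h_theta(x^(i)) - y^(i) for all i
    penalty = lam * np.sum(theta[1:] ** 2)   # lambda * sum_j theta_j^2, j >= 1
    return (residuals @ residuals + penalty) / (2 * m)
```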

Regularized Linear Regression

\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \newline \theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\right] \quad (j=1,\dots,n) \newline \theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
We don't penalize \theta_0.
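A minimal NumPy sketch of one such update step, assuming a design matrix X whose first column is all ones (the helper name gradient_descent_step is illustrative):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update on theta (theta[0] unpenalized)."""
    m = len(y)
    error = X @ theta - y                # (m,) vector of h_theta(x^(i)) - y^(i)
    grad = (X.T @ error) / m             # (1/m) * sum_i error_i * x_j^(i) for each j
    grad[1:] += (lam / m) * theta[1:]    # add (lambda/m) * theta_j only for j >= 1
    return theta - alpha * grad
```

Repeating this step until convergence is equivalent to the "shrink then update" form in the last line of the equations above.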

Normal equation

\theta=\left(X^TX+\lambda L\right)^{-1}X^Ty,\quad L=\begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \\ \end{bmatrix}
Here L is an (n+1)\times(n+1) matrix whose top-left entry is 0 (so \theta_0 is not penalized) and whose remaining diagonal entries are 1.
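A hedged NumPy sketch of this closed-form solution (names are illustrative, not from the notes):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form theta = (X^T X + lambda * L)^(-1) X^T y."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                                    # leave theta_0 unpenalized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Using np.linalg.solve instead of forming the explicit inverse is the usual, numerically safer choice.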

Non-invertibility issue

If m\le n, then X^TX would be non-invertible (singular).
If \lambda>0, however, the regularized matrix X^TX+\lambda L is invertible.
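A quick numerical check of this claim on a toy design matrix with m = 2 examples and n + 1 = 4 columns (the numbers are arbitrary):

```python
import numpy as np

# m = 2 examples but n + 1 = 4 columns, so X^T X (4x4) has rank at most 2.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 5.0, 6.0, 7.0]])
L = np.eye(4)
L[0, 0] = 0.0
lam = 1.0

print(np.linalg.matrix_rank(X.T @ X))            # 2  -> singular
print(np.linalg.matrix_rank(X.T @ X + lam * L))  # 4  -> invertible
```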

Regularized Logistic Regression

Cost function:
J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
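A minimal NumPy sketch of this regularized logistic cost, following the same conventions as the linear-regression sketches above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost; theta[0] is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)                                    # h_theta(x^(i)) for all i
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```

The gradient takes the same form as in regularized linear regression, with h_\theta now being the sigmoid of \theta^Tx.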
