1. Representation
$a_i^{(j)}$: activation of unit $i$ in layer $j$
$\Theta^{(j)}$: matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
Figure 1
Figure 2
If the network has $s_j$ units in layer $j$:
$z^{(j)}$ represents the input of layer $j$ ($z^{(j)} \in \mathbb{R}^{s_j}$);
$a^{(j)}$ represents the output of layer $j$ ($a^{(j)} \in \mathbb{R}^{s_j}$);
$\Theta_{ik}^{(j)}$ is the weight controlling the function mapping from unit $k$ in layer $j$ to unit $i$ in layer $j+1$.
We use the sigmoid function $g(z) = \frac{1}{1+e^{-z}}$ in this neural network, so we get the relation: $z^{(j+1)} = \Theta^{(j)} a^{(j)}$, $a^{(j+1)} = g(z^{(j+1)})$.
To keep the form consistent, add the bias unit of layer 1: $a^{(1)} = [1; x]$, i.e. $a_0^{(1)} = 1$.
According to Figure 2, $z^{(2)} = \Theta^{(1)} a^{(1)}$, so we can get $a^{(2)} = g(z^{(2)})$. But it should be noted that layer 2 has a bias unit, so we should add it to $a^{(2)}$, that is $a_0^{(2)} = 1$, whose MATLAB command is a2 = [1; a2].
In a similar way, $z^{(3)} = \Theta^{(2)} a^{(2)}$ and $a^{(3)} = g(z^{(3)})$, but we needn't add a bias unit, because this is the last layer (the output layer), so the output is $h_\Theta(x) = a^{(3)}$.
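The forward pass above can also be written in code. The notes use MATLAB; below is an equivalent NumPy sketch (the function names and layer sizes are my own, chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward propagation for the 3-layer network above.

    x      : input vector, shape (n,)
    Theta1 : weights from layer 1 to layer 2, shape (s2, n+1)
    Theta2 : weights from layer 2 to layer 3, shape (K, s2+1)
    """
    a1 = np.concatenate(([1.0], x))            # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # MATLAB: a2 = [1; a2]
    z3 = Theta2 @ a2
    a3 = sigmoid(z3)                           # output layer: no bias added
    return a3
```

With all-zero weights every pre-activation is 0, so the output is $g(0) = 0.5$, which is a quick sanity check for the bias handling.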
2. Learning
2.1. Cost function
$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log \left(1 - (h_\Theta(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2$$
To get the $\Theta$ which minimizes $J(\Theta)$ is our goal. We can use gradient descent.
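The cost function can be sketched directly from the formula. This is a NumPy sketch with names of my own choosing; it takes the already-computed hypotheses rather than running forward propagation itself:

```python
import numpy as np

def cost(h, Y, Thetas, lam, m):
    """Cross-entropy cost with L2 regularization.

    h      : (m, K) matrix of hypotheses h_Theta(x^(i))
    Y      : (m, K) matrix of one-hot labels
    Thetas : list of weight matrices Theta^(l)
    """
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # the regularization term skips the bias column (j = 0)
    J += lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return J
```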
2.2. Gradient descent
We only need to compute $J(\Theta)$ and $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$, then use an advanced optimization routine such as fmincg.
2.2.1. Forward propagation
$a^{(1)} = x$ (add $a_0^{(1)} = 1$)
$z^{(2)} = \Theta^{(1)} a^{(1)}$, $a^{(2)} = g(z^{(2)})$ (add $a_0^{(2)} = 1$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$, $a^{(3)} = g(z^{(3)}) = h_\Theta(x)$ --- output
2.2.2. Backpropagation algorithm
Focusing on a single example $(x, y)$, the case of 1 output unit, and ignoring regularization ($\lambda = 0$), the cost is
$$\mathrm{cost} = -y \log h_\Theta(x) - (1 - y) \log (1 - h_\Theta(x)).$$
We define $\delta_j^{(l)}$ as the error of cost for $a_j^{(l)}$ (unit $j$ in layer $l$).
Formally, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \mathrm{cost}$.
Goal:
compute $\frac{\partial}{\partial \Theta_{ij}^{(l)}} \mathrm{cost}$; further, update $\Theta_{ij}^{(l)} := \Theta_{ij}^{(l)} - \alpha \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$.
Given: $z^{(l+1)} = \Theta^{(l)} a^{(l)}$, $a^{(l)} = g(z^{(l)})$, and $g'(z) = g(z)(1 - g(z))$.
Derivation:
Obviously, for the output layer, $\delta^{(L)} = a^{(L)} - y$.
For a hidden layer, the chain rule gives $\delta^{(l)} = (\Theta^{(l)})^T \delta^{(l+1)} .\!* \; g'(z^{(l)})$ (where $.*$ denotes the element-wise product),
so we get a recursion for $\delta^{(l)}$; we can compute it in a loop for $l = L-1, L-2, \ldots, 2$.
After we have computed the $\delta$'s, we should use them to compute $\frac{\partial}{\partial \Theta_{ij}^{(l)}} \mathrm{cost} = a_j^{(l)} \delta_i^{(l+1)}$.
Algorithm:
Training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of m training examples.
Set $\Delta_{ij}^{(l)} = 0$ (for all $l, i, j$), used to accumulate $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$ over the loop.
For training example t = 1 to m:
1. Set $a^{(1)} = x^{(t)}$
2. Perform forward propagation, compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
3. Compute the error of the output layer by the formula $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$
5. $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$.
ENDFOR
Update:
$D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)} + \frac{\lambda}{m} \Theta_{ij}^{(l)}$ if $j \neq 0$; $D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)}$ if $j = 0$. Then $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$.
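The accumulation loop above can be sketched for the 3-layer network of Figure 2. The notes use MATLAB; this is an equivalent NumPy sketch (function names are my own, and the per-example loop is kept deliberately close to steps 1-5 rather than fully vectorized over examples):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lam):
    """One pass of the accumulation loop for the 3-layer network.

    X : (m, n) inputs; Y : (m, K) one-hot labels.
    Returns gradients D1, D2 with the same shapes as Theta1, Theta2.
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for t in range(m):
        # steps 1-2: forward propagation with bias units
        a1 = np.concatenate(([1.0], X[t]))
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        a3 = sigmoid(Theta2 @ a2)
        # step 3: output-layer error
        d3 = a3 - Y[t]
        # step 4: hidden-layer error (drop the bias column of Theta2)
        d2 = (Theta2[:, 1:].T @ d3) * sigmoid(z2) * (1.0 - sigmoid(z2))
        # step 5: Delta^(l) += delta^(l+1) (a^(l))^T
        Delta1 += np.outer(d2, a1)
        Delta2 += np.outer(d3, a2)
    # update step: regularize all but the bias column (j = 0)
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```

The returned D1, D2 can be handed to gradient descent or to an optimizer such as fmincg (or, in Python, scipy.optimize.minimize) together with the cost.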
3. Summary
1. Randomly initialize the weights $\Theta$
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
3. Implement code to compute the cost function $J(\Theta)$
4. Implement backpropagation to compute the partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$
5. Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ computed using backpropagation vs. the numerical estimate of the gradient of $J(\Theta)$. Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
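Step 5 is usually implemented with a two-sided finite difference. A minimal NumPy sketch (function name and step size are my own; $\varepsilon \approx 10^{-4}$ is a common choice):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Estimate dJ/dtheta_i as (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (J(theta + e) - J(theta - e)) / (2.0 * eps)
    return grad
```

Compare this estimate against the backpropagation gradient; they should agree to several decimal places. It is disabled afterwards because it needs two full cost evaluations per parameter, which is far too slow for training.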











