neural networks

Author: 习惯了千姿百态 | Published 2018-08-17 12:01

1. Representation

a_i^{(j)}: activation of unit i in layer j
\theta^{(j)}: matrix of weights controlling the function mapping from layer j to layer j+1

Figure 1
Figure 2
If the network has s_j units in layer j and s_{j+1} units in layer j+1, then \theta^{(j)} will be of dimension s_{j+1}\times(s_j+1)

z^{(j)}: the input of layer j (j\ge2)
a^{(j)}: the output of layer j (j\ge2)
\theta_{i,j}^{(l)}: weight controlling the function mapping from unit j in layer l to unit i in layer l+1
Since we use the sigmoid function in this neural network, we get the relation:
a^{(j)}=g(z^{(j)})
To keep the form consistent, a^{(1)}=[1;X^{(1)}] (add the bias unit to layer 1)
According to Figure 2, z_1^{(2)}=\theta_1^{(1)}a^{(1)}\qquad z_2^{(2)}=\theta_2^{(1)}a^{(1)}\qquad z_3^{(2)}=\theta_3^{(1)}a^{(1)}, where \theta_i^{(1)} denotes the i-th row of \theta^{(1)}
so we get z^{(2)}=\theta^{(1)}a^{(1)} and then a^{(2)}=g(z^{(2)}), but note that layer 2 has a bias unit, so we should add it to a^{(2)}, that is, a_0^{(2)}=1; the MATLAB command is a2=[1;a2].
In a similar way, z^{(3)}=\theta^{(2)}a^{(2)}, but we need not add a bias unit, because this is the last layer (the output layer), so the output is h_\theta(x)=a^{(3)}=g(z^{(3)})
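As a concrete sketch of this forward pass, in Python/NumPy rather than MATLAB (the layer sizes, random weights, and input below are made-up assumptions, not values from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 input units, 5 hidden units, 2 output units.
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((5, 3 + 1))   # theta^{(1)}: layer 1 -> layer 2
theta2 = rng.standard_normal((2, 5 + 1))   # theta^{(2)}: layer 2 -> layer 3
x = rng.standard_normal(3)                 # one training example

a1 = np.concatenate(([1.0], x))            # a^{(1)} = [1; x] (bias unit)
z2 = theta1 @ a1                           # z^{(2)} = theta^{(1)} a^{(1)}
a2 = np.concatenate(([1.0], sigmoid(z2)))  # a^{(2)} = g(z^{(2)}), then a2=[1;a2]
z3 = theta2 @ a2                           # z^{(3)} = theta^{(2)} a^{(2)}
h = sigmoid(z3)                            # h_theta(x) = a^{(3)}, no bias added
print(h.shape)  # (2,)
```

Note how each \theta^{(j)} has one extra column to absorb the bias unit, matching the s_{j+1}\times(s_j+1) dimension above.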

2. Learning

2.1. Cost function

h_\theta(x)\in R^K\qquad (h_\theta(x))_k=k^{th} output
J(\Theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\bigg[ y_{k}^{(i)}\log\big((h_{\Theta}(x^{(i)}))_{k}\big)+(1-y_{k}^{(i)})\log\big(1-(h_{\Theta}(x^{(i)}))_{k}\big) \bigg] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_{l}}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2
Our goal is to find the \theta that minimizes J(\theta); we can use gradient descent.
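A minimal sketch of computing this cost (Python/NumPy; the toy predictions H, labels Y, and \lambda are made-up assumptions):

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H: (m, K) outputs (h_theta(x^{(i)}))_k, Y: (m, K) one-hot labels,
    thetas: list of weight matrices Theta^{(l)}, lam: regularization strength.
    """
    m = H.shape[0]
    cross_entropy = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # The bias columns (j = 0) are excluded from the regularization term.
    reg = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return cross_entropy + reg

# Tiny made-up example: m = 2 examples, K = 2 classes, no regularization.
H = np.array([[0.9, 0.1], [0.2, 0.8]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nn_cost(H, Y, thetas=[], lam=0.0))  # ≈ 0.3285
```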

2.2. Gradient descent

We only need to compute J(\theta) and \frac{\partial}{\partial\theta_{i,j}^{(l)}}J(\theta), then use an advanced optimization method such as fmincg.

2.2.1. Forward propagation

a^{(1)}=x
z^{(2)}=\theta^{(1)}a^{(1)}
a^{(2)}=g(z^{(2)}) (add a_0^{(2)}=1)
z^{(3)}=\theta^{(2)}a^{(2)}
a^{(3)}=g(z^{(3)}) (add a_0^{(3)}=1)
z^{(4)}=\theta^{(3)}a^{(3)}
a^{(4)}=g(z^{(4)}) (output)
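The steps above generalize to any number of layers. A sketch of the generic loop (Python/NumPy; the 4-layer sizes and random weights are made-up assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Return the activations a^{(1)}, ..., a^{(L)} for one example x.

    thetas[l] is the matrix mapping layer l+1 to layer l+2 (0-based list).
    """
    a = np.concatenate(([1.0], x))   # a^{(1)} with its bias unit
    activations = [a]
    for l, theta in enumerate(thetas):
        z = theta @ a                # z^{(l+1)} = theta^{(l)} a^{(l)}
        a = sigmoid(z)               # a^{(l+1)} = g(z^{(l+1)})
        if l < len(thetas) - 1:      # no bias unit on the output layer
            a = np.concatenate(([1.0], a))
        activations.append(a)
    return activations

# Made-up 4-layer network: 3 -> 4 -> 4 -> 2 units.
rng = np.random.default_rng(1)
thetas = [rng.standard_normal((4, 4)),
          rng.standard_normal((4, 5)),
          rng.standard_normal((2, 5))]
acts = forward_propagate(rng.standard_normal(3), thetas)
print([a.shape for a in acts])  # [(4,), (5,), (5,), (2,)]
```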

2.2.2. Backpropagation algorithm

Focusing on a single example (x^{(i)},y^{(i)}), the case of one output unit, and ignoring regularization (\lambda=0):
J(\theta)=C(\theta)=-\bigg[ y\log(a^{(L)})+(1-y)\log(1-a^{(L)}) \bigg]
we define \delta_j^{(l)} as the "error" of the cost for a_j^{(l)} (unit j in layer l).
Formally, \delta^{(l)}=\frac{\partial C(\theta)}{\partial z^{(l)}},
Goal:
compute \frac{\partial C}{\partial \theta^{(l)}} and then update \theta^{(l)}:=\theta^{(l)}-\alpha\frac{\partial C}{\partial \theta^{(l)}} (with learning rate \alpha)
Given:
\delta^{(l)}=\frac{\partial C}{\partial z^{(l)}}
z^{(l+1)}=\theta^{(l)}*a^{(l)}=\theta^{(l)}*g(z^{(l)})
g'(z^{(l)})=\frac{\partial g(z^{(l)})}{\partial z^{(l)}}=g(z^{(l)})[1-g(z^{(l)})]=a^{(l)}[1-a^{(l)}]
Derivation:
By the chain rule, \frac{\partial C}{\partial a^{(l)}} = \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \frac{\partial C}{\partial z^{(l+1)}}=(\theta^{(l)})^{T}\delta^{(l+1)}, so
\delta^{(l)}=\frac{\partial C}{\partial z^{(l)}}=\frac{\partial C}{\partial a^{(l)}}\cdot\frac{\partial a^{(l)}}{\partial z^{(l)}}=\big((\theta^{(l)})^{T}\delta^{(l+1)}\big).*a^{(l)}.*\big(1-a^{(l)}\big)\qquad (1)
So we get a recursion for \delta^{(l)}: after computing \delta^{(L)}, we can loop over l=L-1,L-2,\dots,2.
\delta^{(L)}=\frac{\partial C}{\partial z^{(L)}}=\frac{\partial C}{\partial a^{(L)}}\frac{\partial a^{(L)}}{\partial z^{(L)}}=\frac{a^{(L)}-y}{a^{(L)}(1-a^{(L)})}\big[a^{(L)}(1-a^{(L)})\big]=a^{(L)}-y\qquad (2)
Finally, having computed \delta^{(l)}, we use it to compute \frac{\partial C}{\partial \theta^{(l)}}:
\frac{\partial C}{\partial \theta^{(l)}}=\frac{\partial C}{\partial z^{(l+1)}}\frac{\partial z^{(l+1)}}{\partial \theta^{(l)}}=\delta^{(l+1)}\big(a^{(l)}\big)^T
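For one example, recursion (1), formula (2), and the gradient formula can be sketched as follows (Python/NumPy; the small network and example are made-up assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up 3-layer network: 2 -> 3 -> 1 units, one example (x, y).
rng = np.random.default_rng(2)
theta1 = rng.standard_normal((3, 3))   # theta^{(1)}: layer 1 -> 2
theta2 = rng.standard_normal((1, 4))   # theta^{(2)}: layer 2 -> 3
x, y = rng.standard_normal(2), 1.0

# Forward pass
a1 = np.concatenate(([1.0], x))
a2 = np.concatenate(([1.0], sigmoid(theta1 @ a1)))
a3 = sigmoid(theta2 @ a2)

# Backward pass
delta3 = a3 - y                               # formula (2)
delta2 = (theta2.T @ delta3) * a2 * (1 - a2)  # formula (1)
delta2 = delta2[1:]                           # drop the bias unit's error

grad2 = np.outer(delta3, a2)   # dC/dtheta^{(2)} = delta^{(3)} (a^{(2)})^T
grad1 = np.outer(delta2, a1)   # dC/dtheta^{(1)} = delta^{(2)} (a^{(1)})^T
print(grad1.shape, grad2.shape)  # (3, 3) (1, 4)
```

Each gradient matrix has the same shape as its \theta^{(l)}, as it must for the update step.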


Algorithm:
Training set (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)}) (m training examples).
Set \Delta_{ij}^{(l)}=0 (for all l,i,j), used to accumulate \frac{\partial}{\partial\Theta_{i,j}^{(l)}}J(\Theta) over the loop.
For training example t=1 to m:
 1. Set a^{(1)}:=x^{(t)}
 2. Perform forward propagation to compute a^{(l)} for l=2,3,\dots,L
 3. Compute the error of the output layer by the formula \delta^{(L)}=a^{(L)}-y^{(t)}
 4. Compute \delta^{(L-1)},\delta^{(L-2)},\dots,\delta^{(2)} using \delta^{(l)}=\big((\Theta^{(l)})^{T}\delta^{(l+1)}\big).*a^{(l)}.*\big(1-a^{(l)}\big)
 5. \Delta_{i,j}^{(l)}:=\Delta_{i,j}^{(l)}+a_{j}^{(l)}\delta_{i}^{(l+1)}, or with vectorization, \Delta^{(l)}:=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^T
ENDFOR
update:
D_{i,j}^{(l)}:=\frac{1}{m}\big(\Delta_{i,j}^{(l)}+\lambda\Theta_{i,j}^{(l)} \big),if \quad j \neq 0
D_{i,j}^{(l)}:=\frac{1}{m}\Delta_{i,j}^{(l)},if\quad j =0
\boxed{\frac{∂}{∂\Theta_{i,j}^{(l)}}J(\Theta)=D_{i,j}^{(l)}}
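The whole algorithm above, with the \Delta accumulation and the final D matrices, can be sketched as (Python/NumPy; the sizes, data, and \lambda are made-up assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(X, Y, thetas, lam):
    """Accumulate Delta over all m examples and return the D matrices."""
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]
    for t in range(m):
        # Forward propagation, storing a^{(l)} for every layer.
        a = np.concatenate(([1.0], X[t]))
        acts = [a]
        for l, theta in enumerate(thetas):
            a = sigmoid(theta @ a)
            if l < len(thetas) - 1:
                a = np.concatenate(([1.0], a))  # bias unit on hidden layers
            acts.append(a)
        # Backward propagation.
        delta = acts[-1] - Y[t]                    # delta^{(L)}
        for l in range(len(thetas) - 1, -1, -1):
            Deltas[l] += np.outer(delta, acts[l])  # Delta += delta (a^{(l)})^T
            if l > 0:
                delta = (thetas[l].T @ delta) * acts[l] * (1 - acts[l])
                delta = delta[1:]                  # drop the bias unit's error
    # D: average, regularizing every column except the bias column (j = 0).
    Ds = []
    for theta, Delta in zip(thetas, Deltas):
        D = Delta / m
        D[:, 1:] += lam / m * theta[:, 1:]
        Ds.append(D)
    return Ds

# Made-up data: 5 examples, network 2 -> 3 -> 1.
rng = np.random.default_rng(3)
thetas = [rng.standard_normal((3, 3)), rng.standard_normal((1, 4))]
X = rng.standard_normal((5, 2))
Y = rng.integers(0, 2, (5, 1)).astype(float)
Ds = backprop_gradients(X, Y, thetas, lam=0.1)
print([D.shape for D in Ds])  # [(3, 3), (1, 4)]
```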

3. Summary

1. Randomly initialize the weights
2. Implement forward propagation to get h_\theta(x^{(i)}) for any x^{(i)}
3. Implement code to compute the cost function J(\theta)
4. Implement backpropagation to compute the partial derivatives \frac{\partial}{\partial\theta_{j,k}^{(l)}}J(\theta)
5. Use gradient checking to compare \frac{\partial}{\partial\theta_{j,k}^{(l)}}J(\theta) computed using backpropagation with a numerical estimate of the gradient of J(\theta); then disable the gradient-checking code
6. Use gradient descent or an advanced optimization method with backpropagation to minimize J(\theta) as a function of the parameters \theta
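Step 5's numerical estimate can be sketched with two-sided finite differences (Python/NumPy; the test function J and \epsilon are illustrative assumptions, checked here on a function whose gradient is known exactly):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta, element by element."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        t_plus, t_minus = theta.copy(), theta.copy()
        t_plus[idx] += eps
        t_minus[idx] -= eps
        grad[idx] = (J(t_plus) - J(t_minus)) / (2 * eps)
    return grad

# Check on J(theta) = sum(theta^2), whose exact gradient is 2 * theta.
theta = np.array([[1.0, -2.0], [0.5, 3.0]])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(np.max(np.abs(approx - 2 * theta)) < 1e-6)  # True
```

In practice J would be the network cost and theta the unrolled weights; the loop over every element is why gradient checking is far too slow to leave enabled during training.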


Source: https://www.haomeiwen.com/subject/udxfbftx.html