grad along the y axis keeps decreasing, but vanilla SGD uses the same LR for every param
to refine this, Adagrad adapts the LR per parameter (dividing by the root of the accumulated squared gradients)
sparse data -> only a few params are frequently updated; Adagrad gives rarely-updated params a larger effective LR
automatically decaying LR -> pro or con? pro: no manual LR schedule; con: the squared-gradient sum only grows, so the effective LR can shrink toward zero and halt learning
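A minimal NumPy sketch of the Adagrad update described above (function and variable names are my own, not from the notes). It shows both points: a frequently-updated parameter accumulates a large squared-gradient sum (so its effective LR shrinks), while a sparsely-updated one keeps a larger effective LR.

```python
import numpy as np

def adagrad_step(param, grad, cache, lr=0.1, eps=1e-8):
    # accumulate squared gradients per parameter (grows monotonically)
    cache = cache + grad ** 2
    # per-parameter effective LR: lr / sqrt(accumulated squared grads)
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# two params: index 0 gets large gradients, index 1 sparse/small ones
param, cache = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(10):
    grad = np.array([1.0, 0.01])
    param, cache = adagrad_step(param, grad, cache)
```

After the loop, `cache[0]` is much larger than `cache[1]`, so the sparse parameter retains a larger effective learning rate, which is the benefit Adagrad gives sparse features.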
RMSprop ≈ Adadelta (both replace Adagrad's running sum with an exponential moving average of squared grads, fixing the vanishing-LR issue); they are not identical — Adadelta additionally replaces the global LR with an RMS of past parameter updates
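A sketch of the RMSprop variant of the update (names are my own). The only change from Adagrad is the exponential moving average: old history decays instead of accumulating forever, so with a constant gradient the denominator settles at |grad| rather than growing without bound.

```python
import numpy as np

def rmsprop_step(param, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    # EMA of squared gradients: bounded, unlike Adagrad's running sum
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    param = param - lr * grad / (np.sqrt(avg_sq) + eps)
    return param, avg_sq

# with a constant gradient of 0.5, the EMA converges to 0.5**2 = 0.25
param, avg_sq = np.array([1.0]), np.zeros(1)
for _ in range(100):
    param, avg_sq = rmsprop_step(param, np.array([0.5]), avg_sq)
```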
Batch norm
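The notes only name batch norm; as a reminder of the mechanism, a minimal sketch (my own, omitting the running statistics used at inference time): normalize each feature over the batch dimension, then apply a learned scale gamma and shift beta.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the batch dimension (axis 0)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # learned scale and shift restore representational capacity
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(64, 4))   # batch of 64, 4 features
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma = 1 and beta = 0 the output has roughly zero mean and unit variance per feature, regardless of the input's scale.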