What does the residual network solve, and why is it effective? (Abstract)
2022-07-22 04:07:00 【Gulu Gulu day】
Contents

1. Motivation: the "two dark clouds" over deep neural networks
2. Formal definition and implementation of residual networks
3. Why do residual networks work? Several explanations
a. From the perspective of information propagation
b. From the perspective of ensemble learning
c. From the perspective of shattered gradients
4. Residual structures in NLP
1. Motivation: the "two dark clouds" over deep neural networks
It is generally believed that a trained deep neural network abstracts data features layer by layer, finally extracting the features/representations needed for the task, so that a simple classifier (or other learner) on top can complete it. This is why deep learning is also called representation/feature learning.
Intuitively, with nonlinear activation functions, a deeper network has a larger hypothesis space and is therefore more likely to contain an optimal solution; but it is also harder to train. Besides overfitting, deeper networks are more prone to the gradient vanishing/explosion problem and to network degradation.
Gradient vanishing: during backpropagation, if each layer's activation derivative lies between 0 and 1, the chain rule multiplies these factors together, so in a network with many layers the gradient shrinks toward 0 as it approaches the input layers.
Gradient explosion: conversely, when the per-layer factors in the chain rule exceed 1, multiplying them across many layers makes the gradient extremely large.
Both gradient vanishing and gradient explosion make a model hard to converge, but today they are largely controlled by careful weight initialization and intermediate normalization layers (e.g., batch normalization), which make deep networks much easier to converge. A toy numeric sketch of both effects follows.
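As an illustration (a minimal sketch, not from the original post; the sigmoid, depths, and weight scales are arbitrary choices), the product of per-layer chain-rule factors shrinks toward 0 when each factor is below 1 and blows up when the factors exceed 1:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def chained_grad(depth: int, weight_scale: float) -> float:
    """|product of per-layer chain-rule factors sigma'(z) * w|."""
    grad = 1.0
    for _ in range(depth):
        z = rng.normal()                      # pre-activation at this layer
        w = weight_scale * rng.normal()       # a typical weight
        grad *= sigmoid(z) * (1.0 - sigmoid(z)) * w   # sigma'(z) <= 0.25
    return abs(grad)

for depth in (5, 20, 50):
    print(f"depth={depth:2d}  modest weights: {chained_grad(depth, 1.0):.2e}   "
          f"large weights: {chained_grad(depth, 20.0):.2e}")
```

With modest weights each factor is below 1 (the sigmoid's derivative is at most 0.25), so the backpropagated gradient decays roughly exponentially with depth; with large weights the factors exceed 1 and the gradient explodes.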
Network degradation: even when the network can converge, as depth increases the model's performance first rises to saturation and then drops rapidly. This degradation is not caused by overfitting, because even with the same number of training epochs, the training error of the degraded deeper network is higher than that of a shallower one. Overfitting, by contrast, is when a trained model performs much better on the training set than on the test set, mainly because the training set does not cover the data distribution well; collecting more training data is the most effective remedy. As shown in the figure:
2. Formal definition and implementation of residual networks
Let H(x) be the mapping a layer (or block) needs to fit; the residual approach splits H into two parts, H(x) = x + F(x), where x is the identity mapping and F(x) = H(x) − x is the residual function that the stacked layers actually fit.
A residual unit is implemented as a skip connection: the output of the unit's stacked layers is added directly to the unit's input, and the sum is then passed through the activation. As shown in the figure, and in the sketch below:
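A minimal sketch of such a block, assuming PyTorch (the post names no framework; the channel sizes and conv/BN layout are illustrative, loosely following the original ResNet design):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = relu(x + F(x)): the stacked layers fit the residual F,
    and the skip connection adds the input back before the final activation."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.f(x))   # skip connection, then activation

x = torch.randn(2, 64, 32, 32)
print(ResidualBlock(64)(x).shape)          # torch.Size([2, 64, 32, 32])
```

Note that the addition requires the input and F(x) to have the same shape; when they differ (e.g., when the number of channels changes), the original paper uses a 1x1 convolution on the skip path.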
Residual networks largely solve the degradation problem of deep neural networks and have achieved excellent results on image tasks such as ImageNet and CIFAR-10. With the same number of layers, a residual network also converges faster, and removing individual layers barely affects its performance.
3. Why do residual networks work? Several explanations
a. From the perspective of information propagation
The explanation given by the author, Kaiming He, is that after expanding the residual recursion, the input signal can propagate directly from any lower layer to any higher layer in the forward pass. The network thus contains a natural identity mapping, which to some extent resolves the degradation problem.
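Concretely, for residual units $x_{l+1} = x_l + F(x_l, W_l)$, the recursion unrolls as (these identities are from He et al.'s follow-up paper, *Identity Mappings in Deep Residual Networks*):

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i), \qquad \frac{\partial \mathcal{E}}{\partial x_l} = \frac{\partial \mathcal{E}}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i)\right).$$

The additive 1 means the gradient of the loss $\mathcal{E}$ reaches every lower layer directly, without being multiplied through intervening weight layers, so it does not vanish even when the weighted term is small.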
b. From the perspective of ensemble learning
Andreas Veit et al. analyze the residual network from an ensemble-learning perspective: unrolling the residual network yields the figure below.
A residual network can then be seen as an ensemble of many paths, where different paths pass through different subsets of the network's layers. Veit et al. deleted some layers of a trained residual network, or swapped the order of some modules, and their experiments show that performance varies smoothly with the number of valid paths that remain. This suggests the unrolled paths have a degree of independence and redundancy, making the residual network behave like an ensemble model. They also showed that during training the gradient is contributed mainly by the relatively short paths. The unrolling is made concrete below.
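To make the unrolling concrete (the expansion below is the standard one; Veit et al. draw the same picture), a stack of three residual blocks $y_i = y_{i-1} + f_i(y_{i-1})$ expands into

$$y_3 = y_2 + f_3(y_2) = \big[y_1 + f_2(y_1)\big] + f_3\big(y_1 + f_2(y_1)\big) = \big[y_0 + f_1(y_0) + f_2(y_0 + f_1(y_0))\big] + f_3\big(\cdots\big),$$

a sum in which each term either passes through or skips each block. An $n$-block residual network thus implicitly contains $2^n$ paths, and deleting a single layer removes only half of them, which is why performance degrades gracefully.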
c. From the perspective of shattered gradients
In a standard feedforward network, the gradient increasingly resembles white noise as depth grows. By visualizing gradients, Balduzzi et al. ("The Shattered Gradients Problem") found that gradients in shallow networks look like brown noise, while in deep plain networks they look like white noise: the correlation between gradients decays exponentially with depth, and the spatial structure of the gradient is gradually destroyed.
Why does gradient shattering matter? Many optimization methods assume that gradients at nearby points are similar, so shattered gradients greatly reduce their effectiveness; and if the gradient behaves like white noise, a single neuron's influence on the network output becomes very unstable. The authors show that residual networks slow the decay of gradient correlation from exponential in depth down to sublinear. A small experiment in this spirit follows.
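A small experiment in the spirit of Balduzzi et al.'s visualization (a sketch, not their code: the depths, widths, and correlation measure are our arbitrary choices, and exact numbers vary by initialization). It sweeps inputs along a line and compares how correlated the input-gradients at neighboring inputs are for a plain stack versus a residual stack:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
WIDTH = 32

def plain_net(depth: int) -> nn.Module:
    """A deep 'plain' feedforward stack of Linear+ReLU layers."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(WIDTH, WIDTH), nn.ReLU()]
    return nn.Sequential(*layers)

class ResBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(WIDTH, WIDTH), nn.ReLU(),
                               nn.Linear(WIDTH, WIDTH))
    def forward(self, x):
        return torch.relu(x + self.f(x))     # skip connection

def neighbor_grad_corr(net: nn.Module) -> float:
    """Mean cosine similarity between input-gradients at adjacent inputs."""
    t = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)       # smooth 1-D sweep
    x = (t * torch.ones(1, WIDTH)).requires_grad_(True)
    net(x).sum().backward()
    g = x.grad / (x.grad.norm(dim=1, keepdim=True) + 1e-8)
    return (g[:-1] * g[1:]).sum(dim=1).mean().item()

print("plain, 24 layers   :", neighbor_grad_corr(plain_net(24)))
print("residual, 12 blocks:", neighbor_grad_corr(
    nn.Sequential(*[ResBlock() for _ in range(12)])))
```

If the paper's analysis holds here, the plain stack should show noticeably lower neighbor correlation (more "shattered" gradients) than the residual stack.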
4. Residual structures in NLP
Residual connections are now used throughout deep learning; in NLP, for example, every Transformer layer wraps its attention and feed-forward sublayers in residual connections. A closely related architecture is the Highway Network, which also uses skip connections but adds a gating mechanism, as sketched below.
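A minimal sketch of a highway layer, again assuming PyTorch (the dimensions and the ReLU transform are illustrative; the gating formula y = T(x)·H(x) + (1−T(x))·x and the negative gate-bias initialization follow Srivastava et al.'s Highway Networks paper):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x: the learned gate T decides how much
    of the transformed signal H(x) versus the raw input x to pass through."""
    def __init__(self, dim: int):
        super().__init__()
        self.h = nn.Linear(dim, dim)            # transform
        self.t = nn.Linear(dim, dim)            # gate
        nn.init.constant_(self.t.bias, -2.0)    # bias the gate toward "carry" at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.h(x))
        t = torch.sigmoid(self.t(x))
        return t * h + (1.0 - t) * x

x = torch.randn(4, 128)
print(HighwayLayer(128)(x).shape)               # torch.Size([4, 128])
```

When the gate saturates at T(x) = 0 the layer reduces to an identity mapping, which is exactly the behavior the residual block hard-wires with its ungated skip connection.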