Deep learning - deep understanding of normalization and batchnorm (theoretical part)
2022-07-20 19:31:00 【Program meow;】
Normalization
The role of normalization
Normalization gives the inputs a similar distribution, which makes the model converge faster and allows a larger learning rate.
Why do we need normalization
Before training a neural network, we need to normalize the input data. The reason is that the essence of the neural network learning process is to learn the distribution of the data: once the training data and the test data have different distributions, the generalization ability of the network drops sharply. On the other hand, if the distribution of the training data differs from batch to batch, the network must adapt to a different distribution at every iteration, which greatly slows down training. That is why we apply normalization as a preprocessing step to all the data.
Normalization is also what makes deep neural networks trainable: the deeper the layer, the smaller the gradients that reach it can become, which may prevent the different layers from converging together.
How BatchNorm implements normalization
B is a mini-batch of inputs.
The parameters to learn are δ (the scale) and β (the shift).
$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$$

$$y_i = \delta \hat{x}_i + \beta$$
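A minimal NumPy sketch of this forward pass (training mode, fully connected case; the function and variable names are my own, not from the original post):

```python
import numpy as np

def batchnorm_forward(x, delta, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (m, num_features), then scale and shift."""
    mu = x.mean(axis=0)                    # mu_B: per-feature batch mean
    var = x.var(axis=0)                    # sigma_B^2: per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return delta * x_hat + beta            # y_i = delta * x_hat_i + beta

# Toy usage: 4 samples, 3 features; delta=1, beta=0 gives plain normalization.
x = np.random.randn(4, 3) * 10 + 5
y = batchnorm_forward(x, delta=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))  # close to 0 and 1
```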
The use of normalization
1. Order of use
Convolution → BatchNorm → Activation → Pooling (normalize after the convolution, before the activation), as sketched below.
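A sketch of this ordering in PyTorch (the channel counts are arbitrary, chosen only for illustration):

```python
import torch.nn as nn

# Convolution -> BatchNorm -> Activation -> Pooling
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),     # normalizes each channel over the batch
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
)
```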
2. Training-time and test-time statistics differ
At training time, the mean and variance are computed from each batch and used to normalize that batch.
The test set cannot be handled the same way: a test batch may contain only a single sample, which cannot be meaningfully normalized on its own.
But the test data cannot simply be left unnormalized either: if the training data were normalized and the test data were not, the test-time outputs would differ wildly from those seen in training.
So at test time, the mean and variance should be estimates taken over the whole data set.
How the global mean and variance are updated in practice: they are maintained during training, typically as running averages of the per-batch statistics:
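A sketch of the usual scheme, assuming the exponential-moving-average update that frameworks like PyTorch use (the momentum value and function names here are my own; the original post's figure is not available):

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch_mean, batch_var, momentum=0.1):
    """Blend each batch's statistics into global running estimates (EMA)."""
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

def batchnorm_inference(x, running_mean, running_var, delta, beta, eps=1e-5):
    """At test time, normalize with the stored global statistics, not batch ones."""
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return delta * x_hat + beta
```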
The difference between normalization and standardization
The difference between standardization and normalization lies in the final linear transformation (normalization amounts to exactly such a linear rescaling).
Both standardization and normalization can compress the data into the interval (0, 1) or (−1, 1), but standardization changes the distribution of the original data while normalization does not: normalization only compresses (rescales) the original data.
Normalization only scales the range of the data; standardization makes the data conform to a new distribution (such as the normal distribution).
For example:
1. Normalization
2. Standardization
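The original post showed these two examples as images; standing in for them, the standard formulas are:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(normalization: min-max scaling into } [0, 1])$$

$$x' = \frac{x - \mu}{\sigma} \qquad \text{(standardization: z-score)}$$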
Why normalization speeds up model convergence
Suppose there are two features, both uniformly distributed: x1 ranges over [10000, 20000] and x2 over [1, 2]. Many of the sample points lie close to a single line; call this line L. If we now try to classify, x2 can almost be ignored: it is drowned out purely because of the difference in scale (the so-called dimensional problem). Even if x2 is not drowned out, consider what happens when we continue to solve by gradient descent. Clearly, if the descent direction obtained at some step does not lie along the line L, that step makes almost no progress. This leads to non-convergence, or very slow convergence.
Without normalization: (figure omitted)
After normalization: (figure omitted)
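To make this concrete, here is a small self-contained experiment (my own illustration, not from the original post): plain gradient descent on a linear regression with one large-scale and one small-scale feature. With raw features the loss surface is a long, narrow valley, so a stable learning rate makes almost no progress; after standardizing the features, a much larger learning rate converges quickly.

```python
import numpy as np

def gd_steps(X, y, lr, tol=1e-6, max_steps=100000):
    """Plain gradient descent on mean-squared error; returns steps to converge."""
    w = np.zeros(X.shape[1])
    for step in range(max_steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        if np.linalg.norm(grad) < tol:
            return step
    return max_steps

rng = np.random.default_rng(0)
x1 = rng.uniform(10000, 20000, 200)   # large-scale feature
x2 = rng.uniform(1, 2, 200)           # small-scale feature
X = np.stack([x1, x2], axis=1)
y = 3e-4 * x1 + 2.0 * x2              # ground-truth linear relation

# Raw features: the learning rate must be tiny to stay stable, and progress stalls.
raw = gd_steps(X, y, lr=1e-9)

# Standardized features: a large learning rate works and converges quickly.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
norm = gd_steps(Xn, y - y.mean(), lr=0.1)

print(raw, norm)  # raw hits the step cap; normalized converges in a few dozen steps
```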