006_ SS_ Dual Diffusion Implicit Bridges For Image-to-Image Translation

2022-07-21 23:28:00 【Artificial Idiots】

Dual Diffusion Implicit Bridges for Image-to-Image Translation (DDIB)

This paper proposes DDIB, a new method for image-to-image translation. It only requires training one DDIM model on the source domain and one on the target domain (for DDIM, see the earlier notes on Denoising Diffusion Implicit Models). The source DDIM encodes a source image into a latent, and the target DDIM decodes that latent into the target domain. The method needs no paired training data, and the source and target models can be trained independently of each other, which keeps each domain's data private.

1. Introduction

Image-to-image translation usually requires paired training data. Paired data is sometimes unavailable, and in those cases the translation between the two domains has to be learned directly from unpaired data. Existing unpaired image translation methods are mostly based on GANs, such as CycleGAN and DualGAN; others are based on normalizing flows.

Although existing methods can produce high-quality results, a major drawback is their lack of adaptability: a trained model only realizes the translation from one particular domain to another. Translating among $n$ domains therefore requires on the order of $n^2$ models. The authors propose DDIB instead, which trains one DDIM independently on each domain, so $n$ models are enough for translation among all $n$ domains.

The authors also point out that in some applications the data of the two domains cannot be shared for privacy reasons, so conventional models that need training data from both domains cannot be used. DDIB only trains a separate DDIM on each domain, so it satisfies this requirement.

2. DDIB

First, recall that DDIM connects diffusion models with ODEs: both the forward and the reverse process can be expressed as an ODE, and the two processes are inverses of each other.

[Figure: the forward and reverse ODEs of DDIM]

In short, for a well-trained DDIM, the forward ODE maps any image $x_0$ to a unique $x_T$, and the reverse ODE maps that $x_T$ back to $x_0$. With this property in hand, the DDIB method is easy to understand.
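The round trip described above can be checked on a toy problem. The following is a minimal sketch, not the authors' code: it assumes 1-D Gaussian data, for which the ideal noise predictor has a closed form, and a cosine-shaped schedule chosen only for illustration. The names `make_eps_model` and `ddim_ode` are hypothetical helpers.

```python
import numpy as np

def make_eps_model(mu, sigma):
    """Exact noise predictor when the data are 1-D Gaussian N(mu, sigma^2) (toy assumption)."""
    def eps_model(x, a):
        var = a * sigma**2 + (1.0 - a)   # variance of x_t at noise level a = alpha_bar_t
        return np.sqrt(1.0 - a) * (x - np.sqrt(a) * mu) / var
    return eps_model

def ddim_ode(x, eps_model, alphas):
    """Apply deterministic DDIM updates along the given alpha_bar sequence."""
    for a_cur, a_next in zip(alphas[:-1], alphas[1:]):
        eps = eps_model(x, a_cur)
        x0_pred = (x - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x

# alpha_bar decreases from 1 (clean data) to about 1e-6 (almost pure noise).
alphas = np.cos(np.linspace(0.0, np.pi / 2 - 1e-3, 2001)) ** 2

eps_model = make_eps_model(mu=-2.0, sigma=1.0)
x0 = np.array([-3.0, -2.0, 0.0])
x_T = ddim_ode(x0, eps_model, alphas)            # forward ODE: x_0 -> x_T
x0_rec = ddim_ode(x_T, eps_model, alphas[::-1])  # reverse ODE: x_T -> x_0
print(np.max(np.abs(x0_rec - x0)))               # small discretization error
```

The reconstruction is not bit-exact because the ODE is solved with a finite number of steps; the error shrinks as the number of steps grows.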

The pseudocode for DDIB is as follows:

[Figure: DDIB pseudocode]

The method is simple: use the forward ODE of the source-domain DDIM to convert a source image $x_0^{(s)}$ into its latent representation $x^{(l)}$, then feed this $x^{(l)}$ into the reverse ODE of the target-domain DDIM to obtain the corresponding target-domain output $x_0^{(t)}$.
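The two steps above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's implementation: both domains are 1-D Gaussians with closed-form noise predictors, and `make_eps_model`, `ddim_ode` and `ddib_translate` are hypothetical names. With real images, each noise predictor would be a trained network.

```python
import numpy as np

def make_eps_model(mu, sigma):
    """Exact noise predictor for 1-D data N(mu, sigma^2); stands in for a trained DDIM."""
    def eps_model(x, a):
        var = a * sigma**2 + (1.0 - a)
        return np.sqrt(1.0 - a) * (x - np.sqrt(a) * mu) / var
    return eps_model

def ddim_ode(x, eps_model, alphas):
    """Deterministic DDIM updates along the alpha_bar sequence (either direction)."""
    for a_cur, a_next in zip(alphas[:-1], alphas[1:]):
        eps = eps_model(x, a_cur)
        x0_pred = (x - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x

def ddib_translate(x_src, eps_src, eps_tgt, alphas):
    """Encode with the source model, then decode with the target model."""
    latent = ddim_ode(x_src, eps_src, alphas)        # x_0^(s) -> x^(l)
    return ddim_ode(latent, eps_tgt, alphas[::-1])   # x^(l)  -> x_0^(t)

alphas = np.cos(np.linspace(0.0, np.pi / 2 - 1e-3, 2001)) ** 2

eps_src = make_eps_model(mu=-2.0, sigma=1.0)   # source domain, trained independently
eps_tgt = make_eps_model(mu=3.0, sigma=0.5)    # target domain, trained independently

x_src = np.array([-3.0, -2.0, 0.0])
x_tgt = ddib_translate(x_src, eps_src, eps_tgt, alphas)
print(x_tgt)   # close to [2.5, 3.0, 4.0]
```

Note that the two models never see each other's data: the only thing passed between them is the latent. In this toy, the source mean $-2$ lands near the target mean $3$, and points one source standard deviation away land about one target standard deviation away.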

The method looks simple, but why can the latent representation be used directly, without any further processing? In other words, what makes this method work? Keep this question in mind while reading on.

3. Connection with optimal transport

The authors first relate the DDIM training objective to the optimal transport problem.

Monge optimal transport refers to finding the minimum-cost way of moving one data distribution onto another.

In mathematical form, suppose there are two probability measures $\alpha, \beta$ on the spaces $X, Y$ respectively, and a cost function $c(x, y)$. The Monge optimal transport problem is then the following optimization problem:

[Equation: the Monge optimal transport problem]
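For reference, the standard textbook statement of the Monge problem, written with the symbols defined above, is given below. The notation in the original equation image may differ; $T$ (the transport map) and $T_\sharp \alpha$ (the pushforward of $\alpha$ under $T$) are symbols introduced here.

$$
\min_{T : X \to Y} \int_X c\big(x, T(x)\big)\, \mathrm{d}\alpha(x)
\quad \text{subject to} \quad T_\sharp \alpha = \beta
$$

The constraint $T_\sharp \alpha = \beta$ says that moving every point $x$ to $T(x)$ turns the distribution $\alpha$ into $\beta$; among all such maps, the problem asks for the one with the smallest total cost.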

Next, look at the optimization objective of DDPM and DDIM (if you are not familiar with it, see the earlier DDPM and DDIM notes):

[Equation: the DDPM/DDIM training objective]

Using $l_t$ to denote a linear combination, this can be written as:

[Equation: the objective rewritten with $l_t$]

Because $x_t$ can be obtained directly from $x_0$ during DDIM training, the objective can be expressed in the following form:

[Equation: the objective expressed in terms of $x_0$]
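The fact that $x_t$ is available in closed form from $x_0$ is what makes the objective cheap to estimate. The sketch below uses the standard simplified noise-prediction loss from DDPM as an assumption; the weighting in the original equation images may differ. The data are a toy 1-D Gaussian, and `eps_exact` is a hypothetical closed-form predictor standing in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a = -2.0, 1.0, 0.5     # toy data N(mu, sigma^2); a plays the role of alpha_bar_t

def eps_exact(x_t):
    """Closed-form optimal noise predictor for Gaussian data (toy assumption)."""
    var = a * sigma**2 + (1.0 - a)
    return np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * mu) / var

x0 = rng.normal(mu, sigma, size=200_000)
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps   # x_t sampled directly from x_0

# Monte Carlo estimates of E || eps - eps_theta(x_t) ||^2 at this noise level.
loss_exact = np.mean((eps - eps_exact(x_t)) ** 2)   # optimal predictor
loss_zero = np.mean((eps - 0.0) ** 2)               # trivial predictor that outputs 0
print(loss_exact, loss_zero)
```

Even the optimal predictor has a nonzero loss, because $x_t$ does not determine the noise exactly; training can only drive the loss down to this floor.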

Writing this objective as an expectation gives:

[Equation: the objective written as an expectation]

Optimizing this objective is equivalent to optimizing a Monge optimal transport objective.

However, all of this only shows the connection between optimal transport and the objective of a DDIM trained on a single domain. DDIB can thus be seen as an optimal transport from the source domain to the latent domain, followed by an optimal transport from the latent domain to the target domain. It still does not explain why the intermediate latent variables can be passed between the two domains without any processing. The authors note that using the latent directly is not strictly an optimal transport from the source domain to the target domain, but experiments show that it comes very close.
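As a hedged illustration of how the two notions can coincide, consider a special case that is not taken from the paper: one-dimensional Gaussian domains with means $\mu^{(s)}, \mu^{(t)}$ and standard deviations $\sigma^{(s)}, \sigma^{(t)}$ (symbols introduced here), an idealized DDIM with exact scores, a continuous ODE, and a standard normal latent. In one dimension the ODE flow is monotone and transports the data distribution to the latent distribution, so encoding reduces to standardization and the full translation is an affine map:

$$
x^{(l)} = \frac{x_0^{(s)} - \mu^{(s)}}{\sigma^{(s)}}, \qquad
x_0^{(t)} = \mu^{(t)} + \sigma^{(t)}\, x^{(l)}
= \mu^{(t)} + \frac{\sigma^{(t)}}{\sigma^{(s)}} \left( x_0^{(s)} - \mu^{(s)} \right)
$$

This increasing affine map is the Monge map between the two Gaussians for the squared-distance cost $c(x, y) = (x - y)^2$, so in this special case the composition of the two bridges is exactly optimal. In higher dimensions this need not hold, which is consistent with the authors' caveat.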

The authors ran small experiments on two-dimensional distributions to demonstrate this:

[Figure: DDIB experiments on two-dimensional distributions]

They also compare the results of DDIB with those of other optimal transport methods:

[Figure: comparison of DDIB with other optimal transport methods]

In summary, the authors do not give a theoretical explanation of why DDIB works, only experimental evidence, which is a weakness.