[GAN Paper 01] Translation: Progressive Growing of GANs for Improved Quality, Stability, and Variation
Published as a conference paper at ICLR 2018
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen
NVIDIA and Aalto University
Part I: Paper Translation
ABSTRACT
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
Abstract:
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and the discriminator progressively: starting from a low resolution, we keep adding new layers during training so that the model captures increasingly fine details. This both speeds up training and greatly stabilizes it, and the generated images are of unprecedented quality, e.g., 1024×1024 CELEBA images. We also propose a simple way to increase the variation in the generated images, and achieve a record inception score of 8.80 on unsupervised CIFAR10. In addition, we describe several implementation details that are important for discouraging unhealthy competition between the generator and the discriminator. Finally, we suggest a new metric for evaluating GAN results, covering both image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
1 INTRODUCTION
Generative methods that produce novel samples from high-dimensional data distributions, such as images, are finding widespread use, for example in speech synthesis (van den Oord et al., 2016a), image-to-image translation (Zhu et al., 2017; Liu et al., 2017; Wang et al., 2017), and image inpainting (Iizuka et al., 2017). Currently the most prominent approaches are autoregressive models (van den Oord et al., 2016b;c), variational autoencoders (VAE) (Kingma & Welling, 2014), and generative adversarial networks (GAN) (Goodfellow et al., 2014). Currently they all have significant strengths and weaknesses. Autoregressive models – such as PixelCNN – produce sharp images but are slow to evaluate and do not have a latent representation as they directly model the conditional distribution over pixels, potentially limiting their applicability. VAEs are easy to train but tend to produce blurry results due to restrictions in the model, although recent work is improving this (Kingma et al., 2016). GANs produce sharp images, albeit only in fairly small resolutions and with somewhat limited variation, and the training continues to be unstable despite recent progress (Salimans et al., 2016; Gulrajani et al., 2017; Berthelot et al., 2017; Kodali et al., 2017). Hybrid methods combine various strengths of the three, but so far lag behind GANs in image quality (Makhzani & Frey, 2017; Ulyanov et al., 2017; Dumoulin et al., 2016).
1 Introduction
Generative methods that produce novel samples from high-dimensional data distributions such as images are finding widespread use, for example in speech synthesis (van den Oord et al., 2016a), image-to-image translation (Zhu et al., 2017; Liu et al., 2017; Wang et al., 2017), and image inpainting (Iizuka et al., 2017). Currently the most prominent approaches are autoregressive models (van den Oord et al., 2016b;c), variational autoencoders (VAE) (Kingma & Welling, 2014), and GANs (Goodfellow et al., 2014). They all have significant strengths and weaknesses. Autoregressive models such as PixelCNN produce sharp images but are slow to evaluate and have no latent representation, because they model the conditional distribution directly over pixels, which potentially limits their applicability. VAEs are easy to train but, due to restrictions in the model, tend to produce blurry results, although recent work is improving this (Kingma et al., 2016). GANs produce sharp images, albeit only at fairly small resolutions and with somewhat limited variation, and despite recent progress (Salimans et al., 2016; Gulrajani et al., 2017; Berthelot et al., 2017; Kodali et al., 2017) training remains unstable. Hybrid methods combine the different strengths of these three approaches, but so far still lag behind GANs in image quality (Makhzani & Frey, 2017; Ulyanov et al., 2017; Dumoulin et al., 2016).
Typically, a GAN consists of two networks: generator and discriminator (aka critic). The generator produces a sample, e.g., an image, from a latent code, and the distribution of these images should ideally be indistinguishable from the training distribution. Since it is generally infeasible to engineer a function that tells whether that is the case, a discriminator network is trained to do the assessment, and since networks are differentiable, we also get a gradient we can use to steer both networks to the right direction.
Typically, the generator is of main interest – the discriminator is an adaptive loss function that gets discarded once the generator has been trained.
Typically, a GAN consists of two networks: a generator and a discriminator (aka critic). The generator produces a sample, e.g., an image, from a latent code, and ideally the distribution of these generated images should be indistinguishable from the training distribution. Since it is generally infeasible to engineer a function that tells whether this is the case, a discriminator network is trained to perform this assessment, and because the networks are differentiable, we also obtain a gradient that can be used to steer both networks in the right direction. Typically, the generator is the network of primary interest; the discriminator is an adaptive loss function that is discarded once the generator has been trained.
There are multiple potential problems with this formulation. When we measure the distance between the training distribution and the generated distribution, the gradients can point to more or less random directions if the distributions do not have substantial overlap, i.e., are too easy to tell apart (Arjovsky & Bottou, 2017). Originally, Jensen-Shannon divergence was used as a distance metric (Goodfellow et al., 2014), and recently that formulation has been improved (Hjelm et al., 2017) and a number of more stable alternatives have been proposed, including least squares (Mao et al., 2016b), absolute deviation with margin (Zhao et al., 2017), and Wasserstein distance (Arjovsky et al., 2017; Gulrajani et al., 2017). Our contributions are largely orthogonal to this ongoing discussion, and we primarily use the improved Wasserstein loss, but also experiment with least-squares loss.
This formulation has several potential problems. For example, when we measure the distance between the training distribution and the generated distribution, the gradients can point in more or less random directions if the two distributions have no substantial overlap, i.e., are too easy to tell apart (Arjovsky & Bottou, 2017). Originally, the Jensen-Shannon divergence was used as the distance metric (Goodfellow et al., 2014); recently that formulation has been improved (Hjelm et al., 2017) and a number of more stable alternatives have been proposed, including least squares (Mao et al., 2016b), absolute deviation with margin (Zhao et al., 2017), and the Wasserstein distance (Arjovsky et al., 2017; Gulrajani et al., 2017). Our contributions are largely orthogonal to this ongoing discussion; we primarily use the improved Wasserstein loss, but also experiment with the least-squares loss.
The generation of high-resolution images is difficult because higher resolution makes it easier to tell the generated images apart from training images (Odena et al., 2017), thus drastically amplifying the gradient problem. Large resolutions also necessitate using smaller minibatches due to memory constraints, further compromising training stability. Our key insight is that we can grow both the generator and discriminator progressively, starting from easier low-resolution images, and add new layers that introduce higher-resolution details as the training progresses. This greatly speeds up training and improves stability in high resolutions, as we will discuss in Section 2.
Generating high-resolution images is difficult because higher resolution makes it easier for the discriminator to tell generated images apart from training images (Odena et al., 2017), which drastically amplifies the gradient problem. Large resolutions also force the use of smaller minibatches due to memory constraints, further compromising training stability. Our key insight is that we can grow both the generator and the discriminator progressively: starting from easier low-resolution images, we keep adding new layers that introduce higher-resolution details as training progresses. This greatly speeds up training and improves stability at high resolutions, as we discuss in Section 2.
The GAN formulation does not explicitly require the entire training data distribution to be represented by the resulting generative model. The conventional wisdom has been that there is a tradeoff between image quality and variation, but that view has been recently challenged (Odena et al., 2017). The degree of preserved variation is currently receiving attention and various methods have been suggested for measuring it, including inception score (Salimans et al., 2016), multi-scale structural similarity (MS-SSIM) (Odena et al., 2017; Wang et al., 2003), birthday paradox (Arora & Zhang, 2017), and explicit tests for the number of discrete modes discovered (Metz et al., 2016). We will describe our method for encouraging variation in Section 3, and propose a new metric for evaluating the quality and variation in Section 5.
The GAN formulation does not explicitly require the resulting generative model to represent the entire training data distribution. The conventional wisdom has been that there is a tradeoff between image quality and variation, but that view has recently been challenged (Odena et al., 2017). The degree of preserved variation is currently receiving attention, and various methods have been suggested for measuring it, including the inception score (Salimans et al., 2016), multi-scale structural similarity (MS-SSIM) (Odena et al., 2017; Wang et al., 2003), the birthday paradox (Arora & Zhang, 2017), and explicit tests for the number of discrete modes discovered (Metz et al., 2016). We describe our method for encouraging variation in Section 3, and propose a new metric for evaluating quality and variation in Section 5.
Section 4.1 discusses a subtle modification to the initialization of networks, leading to a more balanced learning speed for different layers. Furthermore, we observe that mode collapses traditionally plaguing GANs tend to happen very quickly, over the course of a dozen minibatches. Commonly they start when the discriminator overshoots, leading to exaggerated gradients, and an unhealthy competition follows where the signal magnitudes escalate in both networks. We propose a mechanism to stop the generator from participating in such escalation, overcoming the issue (Section 4.2).
Section 4.1 discusses a subtle modification to the initialization of the networks that leads to a more balanced learning speed across the different layers. Furthermore, we observe that the mode collapses that traditionally plague GANs tend to happen very quickly, over the course of a dozen minibatches. Commonly they start when the discriminator overshoots, leading to exaggerated gradients, followed by an unhealthy competition in which the signal magnitudes escalate in both networks. We propose a mechanism that stops the generator from participating in such escalation, overcoming the issue (Section 4.2).
We evaluate our contributions using the CELEBA, LSUN, CIFAR10 datasets. We improve the best published inception score for CIFAR10. Since the datasets commonly used in benchmarking generative methods are limited to a fairly low resolution, we have also created a higher quality version of the CELEBA dataset that allows experimentation with output resolutions up to 1024 × 1024 pixels. This dataset and our full implementation are available at
2 PROGRESSIVE GROWING OF GANS
Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks as visualized in Figure 1. This incremental nature allows the training to first discover large-scale structure of the image distribution and then shift attention to increasingly finer scale detail, instead of having to learn all scales simultaneously.
2 Progressive Growing of GANs
Our primary contribution is a training methodology for GANs: we start with low-resolution images and then progressively increase the resolution by adding layers to the networks, as visualized in Figure 1. This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.
Figure 1: Our training starts with both the generator and discriminator having a low spatial resolution of 4×4 pixels. As training advances, we incrementally add layers to the generator and discriminator networks, thus increasing the spatial resolution of the generated images. All existing layers remain trainable throughout the process. Here N×N refers to convolutional layers operating on an N×N spatial resolution. This allows stable synthesis at high resolutions and also speeds up training. On the right we show six example images generated using progressive growing at 1024×1024.
We use generator and discriminator networks that are mirror images of each other and always grow in synchrony. All existing layers in both networks remain trainable throughout the training process. When new layers are added to the networks, we fade them in smoothly, as illustrated in Figure 2. This avoids sudden shocks to the already well-trained, smaller-resolution layers. Appendix A describes structure of the generator and discriminator in detail, along with other training parameters.
We use generator and discriminator networks that are mirror images of each other and always grow in synchrony. All existing layers in both networks remain trainable throughout the training process. When new layers are added to the networks, we fade them in smoothly, as illustrated in Figure 2. This avoids sudden shocks to the already well-trained, smaller-resolution layers. Appendix A describes the structure of the generator and discriminator in detail, along with the other training parameters.
Figure 2: When doubling the resolution of the generator and discriminator, we fade in the new layers smoothly. This example illustrates the transition from 16×16 images to 32×32 images. During the transition (b), we treat the layers that operate on the higher resolution like a residual block, whose weight α increases linearly from 0 to 1. Here 2× and 0.5× refer to doubling and halving the image resolution using nearest-neighbor filtering and average pooling, respectively. toRGB represents a layer that projects feature vectors to RGB colors and fromRGB does the reverse; both use 1×1 convolutions. When training the discriminator, we feed in real images that are downscaled to match the current resolution of the network. During a resolution transition, we interpolate between two resolutions of the real images, similarly to how the generator output combines two resolutions.
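In code, the fade-in of Figure 2 amounts to a linear blend between the upsampled output of the old, already-trained block and the output of the newly added block. Below is a minimal PyTorch sketch of the generator-side transition (b); the module names, and the exact placement of the upsampling inside the new block, are our assumptions rather than the paper's reference implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class FadeInGenerator(nn.Module):
    """Sketch of transition (b) in Figure 2 for the generator: blend the new
    32x32 block with the nearest-neighbor-upsampled output of the 16x16 block."""
    def __init__(self, block_16, block_32, to_rgb_16, to_rgb_32):
        super().__init__()
        self.block_16 = block_16      # layers operating at 16x16 (already trained)
        self.block_32 = block_32      # newly added layers operating at 32x32
        self.to_rgb_16 = to_rgb_16    # 1x1 conv: features -> RGB at 16x16
        self.to_rgb_32 = to_rgb_32    # 1x1 conv: features -> RGB at 32x32

    def forward(self, x, alpha):
        f16 = self.block_16(x)
        # Old path: project 16x16 features to RGB, then upsample 2x (nearest neighbor).
        old_rgb = F.interpolate(self.to_rgb_16(f16), scale_factor=2, mode="nearest")
        # New path: upsample features, run the new 32x32 block, project to RGB.
        new_rgb = self.to_rgb_32(self.block_32(F.interpolate(f16, scale_factor=2, mode="nearest")))
        # alpha grows linearly from 0 to 1 over the course of the transition.
        return (1.0 - alpha) * old_rgb + alpha * new_rgb
```

The discriminator side mirrors this: during the transition, real images are blended between their 16×16 (upsampled) and 32×32 versions with the same α.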
We observe that the progressive training has several benefits. Early on, the generation of smaller images is substantially more stable because there is less class information and fewer modes (Odena et al., 2017). By increasing the resolution little by little we are continuously asking a much simpler question compared to the end goal of discovering a mapping from latent vectors to 1024² images. This approach has conceptual similarity to recent work by Chen & Koltun (2017). In practice it stabilizes the training sufficiently for us to reliably synthesize megapixel-scale images using WGAN-GP loss (Gulrajani et al., 2017) and even LSGAN loss (Mao et al., 2016b).
Another benefit is the reduced training time. With progressively growing GANs most of the iterations are done at lower resolutions, and comparable result quality is often obtained up to 2–6 times faster, depending on the final output resolution.
We observe that progressive training has several benefits. Early on, the generation of smaller images is substantially more stable because there is less class information and there are fewer modes (Odena et al., 2017). By increasing the resolution little by little, we keep asking a much simpler question compared to the end goal of discovering a mapping from latent vectors to 1024×1024 images. This approach is conceptually similar to recent work by Chen & Koltun (2017). In practice it stabilizes training sufficiently for us to reliably synthesize megapixel-scale images using the WGAN-GP loss (Gulrajani et al., 2017) and even the LSGAN loss (Mao et al., 2016b).
Another benefit is the reduced training time. With progressively growing GANs, most of the iterations are done at lower resolutions, and comparable result quality is often obtained 2–6 times faster, depending on the final output resolution.
The idea of growing GANs progressively is related to the work of Wang et al. (2017), who use multiple discriminators that operate on different spatial resolutions. That work in turn is motivated by Durugkar et al. (2016) who use one generator and multiple discriminators concurrently, and Ghosh et al. (2017) who do the opposite with multiple generators and one discriminator. Hierarchical GANs (Denton et al., 2015; Huang et al., 2016; Zhang et al., 2017) define a generator and discriminator for each level of an image pyramid. These methods build on the same observation as our work – that the complex mapping from latents to high-resolution images is easier to learn in steps – but the crucial difference is that we have only a single GAN instead of a hierarchy of them. In contrast to early work on adaptively growing networks, e.g., growing neural gas (Fritzke, 1995) and neuroevolution of augmenting topologies (Stanley & Miikkulainen, 2002) that grow networks greedily, we simply defer the introduction of pre-configured layers. In that sense our approach resembles layer-wise training of autoencoders (Bengio et al., 2007).
The idea of growing GANs progressively is related to the work of Wang et al. (2017), who use multiple discriminators operating on different spatial resolutions. That work is in turn motivated by Durugkar et al. (2016), who use one generator and multiple discriminators concurrently, and by Ghosh et al. (2017), who do the opposite with multiple generators and a single discriminator. Hierarchical GANs (Denton et al., 2015; Huang et al., 2016; Zhang et al., 2017) define a generator and a discriminator for each level of an image pyramid. These methods build on the same observation as our work – that the complex mapping from latents to high-resolution images is easier to learn in steps – but the crucial difference is that we have only a single GAN instead of a hierarchy of them. In contrast to early work on adaptively growing networks, e.g., growing neural gas (Fritzke, 1995) and neuroevolution of augmenting topologies (Stanley & Miikkulainen, 2002), which grow networks greedily, we simply defer the introduction of pre-configured layers. In that sense our approach resembles layer-wise training of autoencoders (Bengio et al., 2007).
3 INCREASING VARIATION USING MINIBATCH STANDARD DEVIATION
GANs have a tendency to capture only a subset of the variation found in training data, and Salimans et al. (2016) suggest "minibatch discrimination" as a solution. They compute feature statistics not only from individual images but also across the minibatch, thus encouraging the minibatches of generated and training images to show similar statistics. This is implemented by adding a minibatch layer towards the end of the discriminator, where the layer learns a large tensor that projects the input activation to an array of statistics. A separate set of statistics is produced for each example in a minibatch and it is concatenated to the layer's output, so that the discriminator can use the statistics internally. We simplify this approach drastically while also improving the variation.
3 Increasing Variation Using Minibatch Standard Deviation
GANs have a tendency to capture only a subset of the variation found in the training data, and Salimans et al. (2016) suggest "minibatch discrimination" as a solution. They compute feature statistics not only from individual images but also across the minibatch, thus encouraging the minibatches of generated and training images to show similar statistics. This is implemented by adding a minibatch layer towards the end of the discriminator, where the layer learns a large tensor that projects the input activation to an array of statistics. A separate set of statistics is produced for each example in the minibatch and it is concatenated to the layer's output, so that the discriminator can use the statistics internally. We simplify this approach drastically while also improving the variation.
Our simplified solution has neither learnable parameters nor new hyperparameters. We first compute the standard deviation for each feature in each spatial location over the minibatch. We then average these estimates over all features and spatial locations to arrive at a single value. We replicate the value and concatenate it to all spatial locations and over the minibatch, yielding one additional (constant) feature map. This layer could be inserted anywhere in the discriminator, but we have found it best to insert it towards the end (see Appendix A.1 for details). We experimented with a richer set of statistics, but were not able to improve the variation further. In parallel work, Lin et al. (2017) provide theoretical insights about the benefits of showing multiple images to the discriminator.
Our simplified solution has neither learnable parameters nor new hyperparameters. We first compute the standard deviation of each feature at each spatial location over the minibatch. We then average these estimates over all features and spatial locations to arrive at a single value. We replicate the value and concatenate it to all spatial locations and over the minibatch, yielding one additional (constant) feature map. This layer could be inserted anywhere in the discriminator, but we have found it best to insert it towards the end (see Appendix A.1 for details). We experimented with a richer set of statistics, but were not able to improve the variation further. In parallel work, Lin et al. (2017) provide theoretical insights about the benefits of showing multiple images to the discriminator.
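As a rough illustration of the layer just described, a PyTorch version for NCHW feature tensors might look like the sketch below; this is our reading of the text, not the official implementation, and the ε used for numerical stability is an assumption.

```python
import torch

def minibatch_stddev_layer(x, eps=1e-8):
    """x: (N, C, H, W). Appends one constant feature map holding the average
    (over features and pixels) of the per-feature, per-location standard
    deviation computed across the minibatch."""
    # Standard deviation of each feature at each spatial location over the batch.
    std = torch.sqrt(x.var(dim=0, unbiased=False) + eps)   # shape (C, H, W)
    # Average over all features and spatial locations -> a single scalar.
    mean_std = std.mean()
    # Replicate the scalar into one extra feature map for every example.
    n, _, h, w = x.shape
    extra = mean_std.view(1, 1, 1, 1).expand(n, 1, h, w)
    return torch.cat([x, extra], dim=1)
```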
Alternative solutions to the variation problem include unrolling the discriminator (Metz et al., 2016) to regularize its updates, and a "repelling regularizer" (Zhao et al., 2017) that adds a new loss term to the generator, trying to encourage it to orthogonalize the feature vectors in a minibatch. The multiple generators of Ghosh et al. (2017) also serve a similar goal. We acknowledge that these solutions may increase the variation even more than our solution – or possibly be orthogonal to it – but leave a detailed comparison to a later time.
Alternative solutions to the variation problem include unrolling the discriminator (Metz et al., 2016) to regularize its updates, and a "repelling regularizer" (Zhao et al., 2017) that adds a new loss term to the generator, encouraging it to orthogonalize the feature vectors within a minibatch. The multiple generators of Ghosh et al. (2017) also serve a similar goal. We acknowledge that these solutions may increase variation even more than ours – or possibly be orthogonal to it – but we leave a detailed comparison to a later time.
4 NORMALIZATION IN GENERATOR AND DISCRIMINATOR
GANs are prone to the escalation of signal magnitudes as a result of unhealthy competition between the two networks. Most if not all earlier solutions discourage this by using a variant of batch normalization (Ioffe & Szegedy, 2015; Salimans & Kingma, 2016; Ba et al., 2016) in the generator, and often also in the discriminator. These normalization methods were originally introduced to eliminate covariate shift. However, we have not observed that to be an issue in GANs, and thus believe that the actual need in GANs is constraining signal magnitudes and competition. We use a different approach that consists of two ingredients, neither of which include learnable parameters.
4 Normalization in Generator and Discriminator
GANs are prone to the escalation of signal magnitudes as a result of unhealthy competition between the two networks. Most if not all earlier solutions discourage this by using a variant of batch normalization (Ioffe & Szegedy, 2015; Salimans & Kingma, 2016; Ba et al., 2016) in the generator, and often also in the discriminator. These normalization methods were originally introduced to eliminate covariate shift. However, we have not observed that to be an issue in GANs, and we therefore believe that the actual need in GANs is to constrain signal magnitudes and competition. We use a different approach that consists of two ingredients, neither of which includes learnable parameters.
4.1 EQUALIZED LEARNING RATE
We deviate from the current trend of careful weight initialization, and instead use a trivial N(0, 1) initialization and then explicitly scale the weights at runtime. To be precise, we set ŵ_i = w_i / c, where w_i are the weights and c is the per-layer normalization constant from He's initializer (He et al., 2015). The benefit of doing this dynamically instead of during initialization is somewhat subtle, and relates to the scale-invariance in commonly used adaptive stochastic gradient descent methods such as RMSProp (Tieleman & Hinton, 2012) and Adam (Kingma & Ba, 2015). These methods normalize a gradient update by its estimated standard deviation, thus making the update independent of the scale of the parameter. As a result, if some parameters have a larger dynamic range than others, they will take longer to adjust. This is a scenario modern initializers cause, and thus it is possible that a learning rate is both too large and too small at the same time. Our approach ensures that the dynamic range, and thus the learning speed, is the same for all weights. A similar reasoning was independently used by van Laarhoven (2017).
4.1 Equalized Learning Rate
We deviate from the current trend of careful weight initialization and instead use a trivial N(0, 1) initialization, then explicitly scale the weights at runtime. To be precise, we set ŵ_i = w_i / c, where w_i are the weights and c is the per-layer normalization constant from He's initializer (He et al., 2015). The benefit of doing this dynamically at runtime instead of during initialization is somewhat subtle: it relates to the scale-invariance of commonly used adaptive stochastic gradient descent methods such as RMSProp (Tieleman & Hinton, 2012) and Adam (Kingma & Ba, 2015). These methods normalize a gradient update by its estimated standard deviation, thus making the update independent of the scale of the parameter. As a result, if some parameters have a larger dynamic range than others, they will take longer to adjust. This is a scenario that modern initializers cause, and thus it is possible for a learning rate to be both too large and too small at the same time. Our approach ensures that the dynamic range, and thus the learning speed, is the same for all weights. A similar reasoning was independently used by van Laarhoven (2017).
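A sketch of how the runtime scaling could be implemented for a convolution layer follows. Here dividing by c is realized as multiplying the N(0, 1) weights by the He standard deviation sqrt(2 / fan_in); the class name and the choice of a plain 2D convolution are illustrative assumptions, not the paper's reference code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class EqualizedConv2d(nn.Module):
    """Equalized learning rate sketch: weights are stored as N(0, 1) and scaled
    at runtime by the per-layer He constant, so every weight sees the same
    effective dynamic range under Adam/RMSProp."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        fan_in = in_ch * kernel_size * kernel_size
        self.scale = math.sqrt(2.0 / fan_in)   # He initializer standard deviation
        self.padding = padding

    def forward(self, x):
        # Apply the scale at runtime instead of baking it into the initialization.
        return F.conv2d(x, self.weight * self.scale, self.bias, padding=self.padding)
```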
4.2 PIXELWISE FEATURE VECTOR NORMALIZATION IN GENERATOR
4.2 Pixelwise Feature Vector Normalization in Generator
To disallow the scenario where the magnitudes in the generator and discriminator spiral out of control as a result of competition, we normalize the feature vector in each pixel to unit length in the generator after each convolutional layer. We do this using a variant of "local response normalization" (Krizhevsky et al., 2012), configured as b_{x,y} = a_{x,y} / sqrt((1/N) · Σ_{j=0}^{N−1} (a^j_{x,y})² + ε), where ε = 10⁻⁸, N is the number of feature maps, and a_{x,y} and b_{x,y} are the original and normalized feature vectors in pixel (x, y), respectively. We find it surprising that this heavy-handed constraint does not seem to harm the generator in any way, and indeed with most datasets it does not change the results much, but it prevents the escalation of signal magnitudes very effectively when needed.
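In practice this normalization is a one-liner; a sketch assuming NCHW tensors and the ε = 10⁻⁸ from the formula above:

```python
import torch

def pixel_norm(a, eps=1e-8):
    """Pixelwise feature vector normalization: divide each pixel's feature
    vector by the root mean square of its N channel values.
    a: (batch, N, H, W)."""
    return a / torch.sqrt(torch.mean(a * a, dim=1, keepdim=True) + eps)
```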
5 MULTI-SCALE STATISTICAL SIMILARITY FOR ASSESSING GAN RESULTS
In order to compare the results of one GAN to another, one needs to investigate a large number of images, which can be tedious, difficult, and subjective. Thus it is desirable to rely on automated methods that compute some indicative metric from large image collections. We noticed that existing methods such as MS-SSIM (Odena et al., 2017) find large-scale mode collapses reliably but fail to react to smaller effects such as loss of variation in colors or textures, and they also do not directly assess image quality in terms of similarity to the training set.
5 Multi-Scale Statistical Similarity for Assessing GAN Results
In order to compare the results of one GAN to another, one needs to inspect a large number of images, which can be tedious, difficult, and subjective. It is therefore desirable to rely on automated methods that compute some indicative metric from large image collections. We noticed that existing methods such as MS-SSIM (Odena et al., 2017) find large-scale mode collapses reliably but fail to react to smaller effects such as loss of variation in colors or textures, and they also do not directly assess image quality in terms of similarity to the training set.
We build on the intuition that a successful generator will produce samples whose local image structure is similar to the training set over all scales. We propose to study this by considering the multiscale statistical similarity between distributions of local image patches drawn from Laplacian pyramid (Burt & Adelson, 1987) representations of generated and target images, starting at a low-pass resolution of 16 × 16 pixels. As per standard practice, the pyramid progressively doubles until the full resolution is reached, each successive level encoding the difference to an up-sampled version of the previous level.
Our intuition is that a successful generator will produce samples whose local image structure is similar to the training set over all scales. We propose to study this by considering the multi-scale statistical similarity between distributions of local image patches drawn from Laplacian pyramid (Burt & Adelson, 1987) representations of generated and target images, starting at a low-pass resolution of 16×16 pixels. As per standard practice, the pyramid progressively doubles until the full resolution is reached, with each successive level encoding the difference to an up-sampled version of the previous level.
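For concreteness, a small NumPy/SciPy sketch of building such a pyramid of band-pass residuals is given below. The exact smoothing filter and down/upsampling the authors used are not specified here, so the Gaussian blur and bilinear zoom are assumptions; the sketch also assumes power-of-two image sizes.

```python
import numpy as np
from scipy import ndimage

def laplacian_pyramid(img, n_levels):
    """img: float array (H, W, 3) with H, W powers of two.
    Returns band-pass residuals from fine to coarse, plus a final low-pass level."""
    levels = []
    current = img
    for _ in range(n_levels):
        low = ndimage.gaussian_filter(current, sigma=(1, 1, 0))  # blur spatial dims only
        down = low[::2, ::2, :]                                  # halve the resolution
        up = ndimage.zoom(down, (2, 2, 1), order=1)              # upsample back
        levels.append(current - up)                              # band-pass residual at this scale
        current = down
    levels.append(current)                                       # remaining low-pass image
    return levels
```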
A single Laplacian pyramid level corresponds to a specific spatial frequency band. We randomly sample 16384 images and extract 128 descriptors from each level of the Laplacian pyramid, giving us 2.1 million (2²¹) descriptors per level. Each descriptor is a 7×7 pixel neighborhood with 3 color channels, denoted by x ∈ R^(7×7×3) = R^147. We denote the patches from level l of the training set and the generated set by {x_i^l} and {y_i^l}, respectively. We first normalize {x_i^l} and {y_i^l} with respect to the mean and standard deviation of each color channel, and then estimate the statistical similarity by computing their sliced Wasserstein distance SWD({x_i^l}, {y_i^l}), an efficient, randomized approximation of the earth mover's distance, using 512 projections (Rabin et al., 2011).
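A sketch of the sliced Wasserstein distance itself is shown below, assuming both descriptor sets have the same number of samples (as they do here) and have already been normalized per color channel. The random projections and the sorting-based 1-D distance follow the standard randomized approximation; the exact variant used by the authors may differ.

```python
import numpy as np

def sliced_wasserstein(a, b, n_projections=512, rng=None):
    """Randomized approximation of the earth mover's distance between two
    descriptor sets a, b of shape (n_samples, dim), e.g. dim = 7*7*3 = 147.
    Projects onto random unit directions and averages 1-D Wasserstein distances."""
    rng = np.random.default_rng() if rng is None else rng
    dists = []
    for _ in range(n_projections):
        direction = rng.standard_normal(a.shape[1])
        direction /= np.linalg.norm(direction)
        # 1-D Wasserstein distance = mean absolute difference of sorted projections.
        pa = np.sort(a @ direction)
        pb = np.sort(b @ direction)
        dists.append(np.abs(pa - pb).mean())
    return float(np.mean(dists))
```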
Intuitively a small Wasserstein distance indicates that the distribution of the patches is similar, meaning that the training images and generator samples appear similar in both appearance and variation at this spatial resolution. In particular, the distance between the patch sets extracted from the lowest-resolution 16 × 16 images indicates similarity in large-scale image structures, while the finest-level patches encode information about pixel-level attributes such as sharpness of edges and noise.
Intuitively, a small Wasserstein distance indicates that the distributions of the patches are similar, meaning that the training images and generator samples appear similar in both appearance and variation at this spatial resolution. In particular, the distance between the patch sets extracted from the lowest-resolution 16×16 images indicates similarity in large-scale image structures, while the finest-level patches encode information about pixel-level attributes such as the sharpness of edges and noise.
6 EXPERIMENTS
In this section we discuss a set of experiments that we conducted to evaluate the quality of our results. Please refer to Appendix A for a detailed description of our network structures and training configurations. We also invite the reader to consult the accompanying video.
In this section we will distinguish between the network structure (e.g., convolutional layers, resizing), training configuration (various normalization layers, minibatch-related modifications), and the training loss (WGAN-GP, LSGAN).
