Title

Image Processing Using Multi-Code GAN Prior

Information

Paper: https://arxiv.org/abs/1912.07116
GitHub: https://github.com/genforce/mganprior

Summary

The authors propose reconstructing an image with multiple latent codes: each code is fed through a pretrained GAN, and the resulting feature maps are combined at some intermediate layer with adaptive weights, which yields noticeably better reconstructions. The same scheme also applies to colorization, super-resolution, inpainting, denoising, and other tasks.

Research Objective

Improve the quality of image reconstruction with pretrained GANs.

Problem Statement

The usual approach to image reconstruction is to recover a latent code from the image, and there are two ways to do this:

  1. Optimize the latent code by back-propagating a reconstruction loss
  2. Train an extra encoder that maps images to latent codes

Neither approach reaches the expected reconstruction quality, and the gap widens when the input image lies in a different domain from the GAN's training data. The authors argue that a single latent code simply cannot reconstruct an arbitrary image faithfully (otherwise image compression would have seen a major breakthrough).

The authors aim to achieve high-quality reconstruction of high-resolution images on top of existing pretrained GAN models.
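For concreteness, a minimal sketch of baseline (1), assuming a frozen pretrained generator `G` is available as a callable; the function name and hyperparameters are illustrative, not the paper's:

```python
import torch

# Minimal sketch: invert ONE latent code by back-propagating a pixel-wise
# reconstruction loss through a frozen pretrained generator G.
def invert_single_code(G, x, latent_dim=512, steps=1000, lr=1e-2):
    z = torch.randn(1, latent_dim, requires_grad=True)  # random initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), x)    # reconstruction loss
        loss.backward()                                 # gradient w.r.t. z only
        opt.step()
    return z.detach()
```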

Method(s)

Structure

[Figure: overview of the mGANprior framework]
The authors use multiple latent codes: each code is mapped by the pretrained GAN to an intermediate feature map, and the maps are fused with adaptive per-channel weights (Feature Composition & Adaptive Channel Importance) before being decoded into the final image.
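A minimal sketch of the composition step, assuming the pretrained generator has been split at layer $\ell$ into two hypothetical callables `G1` (latent code → feature map) and `G2` (feature map → image):

```python
import torch

def compose(G1, G2, zs, alphas):
    """Fuse the N codes' feature maps with adaptive channel weights.

    zs:     list of N latent codes, each of shape (1, latent_dim)
    alphas: tensor of shape (N, C), one channel-weight vector per code
    """
    feats = [G1(z) for z in zs]                  # N feature maps of shape (1, C, H, W)
    weighted = [f * a.view(1, -1, 1, 1)          # scale each channel by its weight
                for f, a in zip(feats, alphas)]
    fused = torch.stack(weighted).sum(dim=0)     # sum the contributions of all codes
    return G2(fused)                             # decode fused features to an image
```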

Optimization

General form of the loss: the N latent codes and the channel weights are optimized jointly,

$$\{z_n^*\}_{n=1}^{N},\ \{\alpha_n^*\}_{n=1}^{N} = \arg\min_{\{z_n\},\{\alpha_n\}} \mathcal{L}\big(x^{\mathrm{inv}},\ x\big), \qquad x^{\mathrm{inv}} = G_2^{(\ell)}\Big(\sum_{n=1}^{N} G_1^{(\ell)}(z_n) \odot \alpha_n\Big)$$

where $\odot$ scales each channel of the $n$-th feature map by its weight, and $\mathcal{L}$ combines a pixel-wise L2 term with a perceptual (VGG feature) term. A sketch of this optimization loop follows; the task-specific objectives are listed after it.
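A minimal sketch of that loop, reusing `compose` from the previous sketch; `vgg_feat` stands for an assumed fixed feature extractor (e.g. a truncated VGG16), and the equal weighting of the two loss terms is an assumption rather than the paper's exact setting:

```python
import torch
import torch.nn.functional as F

def mgan_invert(G1, G2, vgg_feat, x, N=20, latent_dim=512, C=512,
                steps=1000, lr=1e-2):
    # The N latent codes and N channel-weight vectors are the only free
    # variables; the generator itself stays frozen.
    zs = torch.randn(N, 1, latent_dim, requires_grad=True)
    alphas = torch.ones(N, C, requires_grad=True)
    opt = torch.optim.Adam([zs, alphas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_inv = compose(G1, G2, list(zs), alphas)           # x^inv
        loss = (F.mse_loss(x_inv, x)                        # pixel-wise term
                + F.l1_loss(vgg_feat(x_inv), vgg_feat(x)))  # perceptual term
        loss.backward()
        opt.step()
    return zs.detach(), alphas.detach()
```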

  • Colorization: given a grayscale image $I_{gray}$,

    $$L_{color} = \mathcal{L}\big(\mathrm{gray}(x^{\mathrm{inv}}),\ I_{gray}\big)$$

  • Super-resolution: given a low-resolution image $I_{LR}$,

    $$L_{SR} = \mathcal{L}\big(\mathrm{down}(x^{\mathrm{inv}}),\ I_{LR}\big)$$

  • Inpainting: given the original image $I_{ori}$ and a binary mask $m$,

    $$L_{inp} = \mathcal{L}\big(x^{\mathrm{inv}} \circ m,\ I_{ori} \circ m\big)$$

    where $\circ$ denotes pixel-wise multiplication. Minimal sketches of all three objectives follow.
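Each objective applies a known degradation to `x_inv` before comparing against the observation, leaving the GAN prior to fill in the rest; the grayscale weights, the pooling-based `down(·)`, and the mask convention below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# L is the reconstruction loss from above (pixel-wise + perceptual).

def colorization_loss(x_inv, I_gray, L):
    w = x_inv.new_tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)
    return L((x_inv * w).sum(dim=1, keepdim=True), I_gray)   # gray(x^inv) vs I_gray

def super_resolution_loss(x_inv, I_lr, L, factor=8):
    return L(F.avg_pool2d(x_inv, kernel_size=factor), I_lr)  # down(x^inv) vs I_LR

def inpainting_loss(x_inv, I_ori, m, L):
    # m is 1 on known pixels and 0 in the hole; * is the pixel-wise product
    return L(x_inv * m, I_ori * m)
```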

Evaluation

The GAN priors are PGGAN and StyleGAN models pretrained on different datasets, including CelebA-HQ, FFHQ, and LSUN.

Comparison with other inversion methods

  1. Directly optimizing a single latent code
  2. Generating the latent code with an encoder
  3. A combination of 1 and 2
  4. The proposed mGANprior method

Metrics: Peak Signal-to-Noise Ratio (PSNR, higher is better) and LPIPS (lower is better).
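A quick sketch of the two metrics: PSNR computed directly from the MSE, and LPIPS assumed to come from the official `lpips` package (which expects inputs scaled to [-1, 1]):

```python
import torch
import lpips

def psnr(x, y, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher is better."""
    mse = torch.mean((x - y) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

lpips_fn = lpips.LPIPS(net='alex')  # perceptual distance; lower is better
# distance = lpips_fn(x_reconstructed, x_target)
```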
[Figure: quantitative (PSNR/LPIPS) and qualitative comparisons with the other inversion methods]
The proposed method leads on both PSNR and LPIPS and restores details more faithfully; it can even reconstruct Asian faces with a GAN pretrained on Western faces.

Ablation studies

  1. Number of latent codes
    [Figure 4 in the paper: reconstruction quality vs. number of latent codes and composition layer]
    Using more latent codes recovers more detail, but the gains saturate beyond about 20 codes.
  2. At which layer to perform feature composition
    The results are also in Figure 4. The PGGAN experiments show that the higher the composition layer, the better the reconstruction. However, feature maps at higher layers carry more pixel-level information and less semantic information, which works against reusing the GAN's semantic prior knowledge.
  3. Visualizing each latent code's effect on the final result
    [Figure: visualization of each latent code's contribution]
    Each latent code turns out to be responsible for a particular region of the generated image, so using multiple latent codes is meaningful (a simplified probe of this is sketched below).
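A simplified, hypothetical probe of that observation (not the paper's exact analysis): decode each code's weighted feature map with all other codes zeroed out and inspect which region of the image it dominates; `G1`, `G2`, and the shapes follow the earlier sketches:

```python
import torch

def per_code_images(G1, G2, zs, alphas):
    outs = []
    for z, a in zip(zs, alphas):
        f = G1(z) * a.view(1, -1, 1, 1)  # keep only this code's contribution
        outs.append(G2(f))               # image generated from that code alone
    return outs
```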

Experiments on different image tasks

Not covered in detail here: these are per-task comparison experiments (colorization, inpainting, and so on).

Effect of the composition layer on each task

  • Reconstruction: the higher the layer, the better, since higher layers carry content details
  • Colorization: layer 8 works best, since colorization is a low-level rendering task
  • Inpainting: layer 4 works best, since the GAN has to fill in the missing parts

Conclusion

The authors conclude that their method reconstructs images with high fidelity and performs strongly on colorization, super-resolution, inpainting, and semantic manipulation.

Criticism

In the experiment on the number of latent codes, the authors plot a figure whose vertical axis is "correlation"; I cannot make sense of that figure.

Reference

  • Improving GAN generation quality
    • [23] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. In ICLR, 2018.
    • [8] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. In ICLR, 2019.
    • [24] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019.
  • Improving GAN training stability
    • [1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
    • [7] David Berthelot, Thomas Schumm, and Luke Metz. Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
    • [17] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In NeurIPS, 2017.
  • Face attribute editing
    • [27] (new architecture) Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, and Marc’Aurelio Ranzato. Fader networks: Manipulating images by sliding attributes. In NeurIPS, 2017.
    • [36] (new loss) Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, and Xiaoou Tang. Faceid-gan: Learning a symmetry three-player gan for identity-preserving face synthesis. In CVPR, 2018.
    • [35] Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of gans for semantic face editing. In CVPR, 2020.

  • Super-resolution
    • [28] (new loss) Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
    • [42]Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In ECCV Workshop, 2018.
  • Image-to-image translation
    • [53] (new architecture) Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
    • [11] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018.
    • [31]Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, and Jan Kautz. Few-shot unsupervised image-to-image translation. In ICCV, 2019.
  • GAN as a prior for image generation and reconstruction
    • [40] Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, and Kilian Weinberger. Deep feature interpolation for image content changes. In CVPR, 2017.
    • [39] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Deep image prior. In CVPR, 2018.
    • [2] ShahRukh Athar, Evgeny Burnaev, and Victor Lempitsky. Latent convolutional models. In ICLR, 2019.
  • Inferring the latent code from an image (GAN inversion)
    • Optimizing the latent code by back-propagating a reconstruction loss
      • [30] Zachary C Lipton and Subarna Tripathi. Precise recovery of latent vectors from generative adversarial networks. In ICLR Workshop, 2017.
      • [12] Antonia Creswell and Anil Anthony Bharath. Inverting the generator of a generative adversarial network. TNNLS, 2018.
      • [32] Fangchang Ma, Ulas Ayaz, and Sertac Karaman. Invertibility of convolutional generative networks from partial measurements. In NeurIPS, 2018.
    • Training an extra encoder
      • [34] Guim Perarnau, Joost Van De Weijer, Bogdan Raducanu, and Jose M. Álvarez. Invertible conditional gans for image editing. In NeurIPS Workshop, 2016.
      • [52] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. Generative visual manipulation on the natural image manifold. In ECCV, 2016.
      • [6]David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, and Antonio Torralba. Seeing what a gan cannot generate. In ICCV, 2019.
      • [5]David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, and Antonio Torralba. Inverting layers of a large generator. In ICLR Workshop, 2019.
  • Perceptual features
    • [22]Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.