Super-Resolution - Attention-based Multi-Reference Learning for Image Super-Resolution - Machine Learning

Abstract
1. Introduction
2. Related work
- 2.1 Single-image super-resolution (SISR)

Abstract

This paper proposes a novel Attention-based Multi Reference Super-resolution network (AMRSR) that, given a low-resolution image, learns to adaptively transfer the most similar texture from multiple reference images to the super-resolution output whilst maintaining spatial coherence.
[Note: texture details are learned from related images and used to restore the image.] The use of multiple reference images together with attention-based sampling is demonstrated to achieve significantly improved performance over state-of-the-art reference super-resolution approaches on multiple benchmark datasets. Reference super-resolution approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution reference image. Multi-reference super-resolution extends this approach by providing a more diverse pool of image features to overcome the inherent information deficit whilst maintaining memory efficiency. A novel hierarchical attention-based sampling approach is introduced to learn the similarity between low-resolution image features and multiple reference images based on a perceptual loss. Ablation demonstrates the contribution of both multi-reference and hierarchical attention-based sampling to overall performance. Perceptual and quantitative ground-truth evaluation demonstrates significant improvement in performance even when the reference images deviate significantly from the target image. The project website can be found at https://marcopesavento.github.io/AMRSR/

The core idea of this paper is to learn detail content from similar images and use it for image super-resolution.

Ablation demonstrates the contribution of both multi-reference and hierarchical attention-based sampling to overall performance.
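The authors also ran an ablation experiment to examine the contribution of multi-reference and hierarchical attention-based sampling.
Two core concepts are mentioned here: multi-reference and hierarchical attention-based sampling.
My understanding of multi-reference is that textures from other images are used to restore the current image. What, then, is hierarchical attention-based sampling? Attention-based sampling organised in levels?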

What is meant by:

Perceptual and quantitative ground-truth evaluation demonstrates significant improvement in performance even when the reference images deviate significantly from the target image.

1. Introduction

Image super-resolution (SR) aims to estimate a perceptually plausible high-resolution (HR) image from a low-resolution (LR) input image [38]. This problem is ill-posed due to the inherent information deficit between LR and HR images. Classic super-resolution image processing [24] and deep learning based approaches [37] result in visual artefacts for large up-scaling factors (4×). To overcome this limitation, recent research has introduced the sub-problem of reference image super-resolution (RefSR) [6, 41, 46]. Given an input LR image and a similar HR reference image, RefSR approaches estimate a SR image. Reference super-resolution with a single reference image has been demonstrated to improve performance over general SR methods, achieving large up-scaling with reduced visual artefacts.

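From this we can see:
Classic super-resolution image processing and deep learning based approaches result in visual artefacts for large up-scaling factors. In other words, both classic SR and deep learning methods produce image artefacts when the up-scaling factor is large.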

Reference super-resolution with a single reference image has been demonstrated to improve performance over general SR methods, achieving large up-scaling with reduced visual artefacts.
Here, the reference-based method reduces image artefacts.

In this paper we generalise reference super-resolution to use multiple reference images giving a pool of image features and propose a novel attention-based sampling approach to learn the perceptual similarity between reference features and the LR input.
Here, a novel attention-based sampling approach is used to learn the perceptual similarity.
This indicates that the reference images are the data, and attention-based sampling is the learning method.

The proposed attention-based multiple reference super-resolution network (AMRSR) is designed to allow multiple HR reference images by introducing a hierarchical attention-based mapping of LR input feature subvectors into HR reference feature vectors, focusing the learning attention on the LR input. This allows training with multiple HR reference images which would not be possible with a naive extension of existing single-reference super resolution methods without a significant increase in memory footprint. Figure 1 qualitatively illustrates the performance of the proposed AMRSR approach against state-of-the-art single-image super-resolution (CSNLN [22], RSRGAN [42]) and RefSR (SRNTT [44]) approaches.

What does "attention-based multiple reference super-resolution network" mean? It is a super-resolution network that processes multiple references on top of an attention-based mechanism. It does so by "introducing a hierarchical attention-based mapping of LR input feature subvectors into HR reference feature vectors, focusing the learning attention on the LR input": a hierarchical attention-based method maps LR input feature subvectors to HR reference feature vectors. The key point is that the learned attention is focused on the LR input.

This allows training with multiple HR reference images which would not be possible with a naive extension of existing single-reference super resolution methods without a significant increase in memory footprint.
I do not understand why this method has anything to do with memory.
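
One possible reading of the memory remark, offered as an assumption because the excerpt gives no details: reference-based methods usually compare every LR feature position with every reference feature position, so a dense similarity table grows with the number of references. The sketch below only counts table entries. The feature-map sizes, the number of parts, and the scheme of visiting one part and one reference at a time are all hypothetical and are not taken from the paper.

```python
def dense_entries(lr_positions: int, ref_positions: int, num_refs: int) -> int:
    """Entries in a dense table: every LR position vs every position of every reference."""
    return lr_positions * ref_positions * num_refs


def peak_entries_by_parts(lr_positions: int, ref_positions: int, num_parts: int) -> int:
    """Peak entries when the LR features are split into parts and only one
    part is compared with one reference at a time (hypothetical scheme)."""
    part_size = -(-lr_positions // num_parts)  # ceiling division
    return part_size * ref_positions


lr = 40 * 40      # hypothetical number of LR feature positions
ref = 160 * 160   # hypothetical number of positions per reference
dense = dense_entries(lr, ref, num_refs=4)
parts = peak_entries_by_parts(lr, ref, num_parts=16)
print(f"dense table: {dense} entries, part-based peak: {parts} entries")
```

Under these made-up sizes the part-based peak is much smaller than the dense table, which is at least consistent with the claim that a naive multi-reference extension would need far more memory.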

Figure 1 qualitatively illustrates the performance of the proposed AMRSR approach against state-of-the-art single-image super-resolution (CSNLN [22], RSRGAN [42]) and RefSR (SRNTT [44]) approaches. Given N_M reference images, AMRSR produces a 4× SR image which is perceptually plausible and has a similar level of detail to the ground-truth HR image.

The primary contributions of the AMRSR approach presented in this paper are:
• Generalisation of single reference super-resolution to multiple reference images whilst improving memory efficiency thanks to a part-based mechanism.
• Hierarchical attention-based adaptive sampling for perceptual similarity learning between low-resolution image features and multiple HR reference images.
• Improved quantitative and perceptual performance for image super-resolution compared with state-of-the-art single-image RefSR.

Generalisation of single reference super-resolution to multiple reference images whilst improving memory efficiency thanks to a part-based mechanism.
Single-reference super-resolution is generalised to work with multiple reference images, while a part-based mechanism improves memory efficiency.

Hierarchical attention-based adaptive sampling for perceptual similarity learning between low-resolution image features and multiple HR reference images.
Hierarchical attention-based adaptive sampling is used to learn the perceptual similarity between the LR image features and the HR reference images.
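
To make the idea of matching LR features against several references more concrete, here is a toy sketch. It is not the AMRSR algorithm: the two-level search (best position inside each reference, then best reference overall) is only one plausible reading of "hierarchical", cosine similarity stands in for the learned perceptual similarity, and all function names and feature values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_in_reference(lr_feature, ref_features):
    """Level 1: most similar position inside one reference, as (position, similarity)."""
    scores = [cosine(lr_feature, f) for f in ref_features]
    position = max(range(len(scores)), key=scores.__getitem__)
    return position, scores[position]


def best_across_references(lr_feature, references):
    """Level 2: compare the per-reference winners, as (reference, position, similarity)."""
    winners = [best_in_reference(lr_feature, ref) for ref in references]
    ref_index = max(range(len(winners)), key=lambda r: winners[r][1])
    position, score = winners[ref_index]
    return ref_index, position, score


# Two toy references, each holding two 2-D feature vectors (made-up values).
references = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [2.0, 0.1]],
]
lr_feature = [0.5, 0.5]
ref_index, position, score = best_across_references(lr_feature, references)
print(ref_index, position, round(score, 3))
```

The second level only ever sees one winner per reference, so the comparison across references stays small no matter how large each reference is.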

Improved quantitative and perceptual performance for image super-resolution compared with state-of-the-art single-image RefSR.
Both quantitative and perceptual performance are improved.

2. Related work

2.1 Single-image super-resolution (SISR)

A breakthrough in the SISR task was achieved when Dong et al. [9] tackled the problem with a convolutional neural network (CNN). From this work, the application of deep learning progressively replaced classic SR computer vision methods [37]. The pioneering work of Dong et al. [9] belongs to a group of SR methods that use mean squared error (MSE) as their objective function. VDSR [14] shows the importance of a deep layer architecture while SRResNet [15] and EDSR [19] demonstrate the benefit of using residual blocks [12] to ease the training. Several modifications of the residual structure such as skip connections [33], recursive structures [31] and channel attention [43] further improved the accuracy of SISR. The state-of-the-art CSNLN [22] integrates a cross-scale non-local attention module to learn dependencies between the LR and HR images. Other works propose lightweight networks to alleviate computational cost [20, 23]. These residual networks ignore human perception and only aim for high values of PSNR and SSIM, producing blurry SR images [37]. Generative adversarial networks (GANs), introduced in the SR task by Ledig et al. with SRGAN [15], aim to enhance the perceptual quality of the SR images. The performance of SRGAN was improved by ESRGAN [15], which replaces the adversarial loss with a relativistic adversarial loss. RSRGAN [42] develops a rank-content loss by training a ranker to obtain state-of-the-art visual results.

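This paragraph mainly says that once Dong et al. first used a deep network for the SR task, the SR field entered a new era and classic SR methods were gradually replaced. As the networks evolved, so did the choice of loss function. VDSR uses a very deep convolutional network. Building on that line of work, SRResNet introduced residual blocks to address the vanishing and exploding gradient problems of very deep networks.
EDSR = Enhanced Deep Residual Networks for Single Image Super-Resolution
What did EDSR propose again? I had forgotten: EDSR is essentially SRResNet with the batch normalization (BN) layers removed.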
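
The MSE objective and the PSNR metric mentioned in the paragraph above are directly linked, since PSNR = 10 · log10(MAX² / MSE). The minimal sketch below computes both on flat lists of pixel values. The pixel values are made up, and the peak value of 255 assumes 8-bit images.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


hr = [52, 55, 61, 59]   # made-up ground-truth pixels
sr = [50, 56, 60, 62]   # made-up super-resolved pixels
error = mse(hr, sr)
quality = psnr(hr, sr)
print(error, round(quality, 2))
```

Because PSNR depends only on MSE, a network trained purely on MSE maximises PSNR directly, which is the behaviour the paragraph associates with blurry outputs.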

3D appearance super-resolution: There are only two deep learning works that super-resolve texture maps to enhance the appearance of 3D objects. The method proposed by Li et al. [18] processes, with a modified version of EDSR [19], LR texture maps and their normal maps to incorporate geometric information of the model in the learning. The pre-process to create normal maps introduces heavy computational cost. In the second work [25], a redundancy-based encoder generates a blurry texture map from LR images that is then deblurred by a SISR decoder. Its main objective is not the super-resolution but the creation of texture maps from a set of LR multi-view images.