LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1、[CV] Enhancing Photorealism Enhancement

S R. Richter, H A AlHaija, V Koltun
[Intel Labs]

We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and find that they differ in important ways. We hypothesize that this is one of the causes of strong artifacts that can be observed in the results of many prior methods. To address this we propose a new strategy for sampling image patches during training. We also introduce multiple architectural improvements in the deep network modules used for photorealism enhancement. We confirm the benefits of our contributions in controlled experiments and report substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.


2、[LG] Diffusion Models Beat GANs on Image Synthesis

P Dhariwal, A Nichol
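This paper shows that diffusion models can achieve image sample quality superior to current state-of-the-art generative models. For unconditional image synthesis, this is achieved by finding a better architecture through a series of ablations. For conditional image synthesis, classifier guidance further improves sample quality: it is a simple, compute-efficient method that uses classifier gradients to trade off diversity against sample quality. The models reach an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and match BigGAN-deep with as few as 25 forward passes per sample while maintaining better coverage of the distribution. Classifier guidance also combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512.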

We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512. We release our code at https://github.com/openai/guided-diffusion.
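To make the classifier-guidance idea concrete, here is a minimal one-dimensional sketch in Python. It assumes a commonly described formulation in which each sampling step draws from a Gaussian whose mean is shifted by the scaled classifier gradient, mean + scale * variance * grad log p(y | x); the abstract itself only states that classifier gradients are used. The toy classifier, the function names, and the `scale` parameter are hypothetical and chosen purely for illustration.

```python
import math
import random


def toy_log_prob_grad(x, target=2.0, width=1.0):
    """Gradient of log p(y | x) for a toy classifier (an assumption).

    We pretend log p(y | x) = -(x - target)^2 / (2 * width^2) + const,
    so the gradient points from x toward `target`.
    """
    return -(x - target) / (width ** 2)


def guided_mean(mean, variance, x, scale=1.0):
    """Shift the unguided mean by scale * variance * classifier gradient."""
    return mean + scale * variance * toy_log_prob_grad(x)


def guided_step(mean, variance, x, scale=1.0, rng=random):
    """Draw one sample from the Gaussian centred on the guided mean."""
    noise = rng.gauss(0.0, 1.0)
    return guided_mean(mean, variance, x, scale) + math.sqrt(variance) * noise


# Larger scales pull the mean further toward the classifier's preferred region.
for s in (0.0, 1.0, 2.0):
    print(s, guided_mean(mean=0.0, variance=0.5, x=0.0, scale=s))
```

With `scale=0.0` the step reduces to the unguided sampler; raising the scale pulls samples harder toward what the classifier favours, which is one way to picture the diversity-for-quality trade-off mentioned in the abstract.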


3、[LG] Deep Neural Networks as Point Estimates for Deep Gaussian Processes

V Dutordoir, J Hensman, M v d Wilk, C H Ek, Z Ghahramani, N Durrande
[University of Cambridge & Amazon & Imperial College London]
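Deep Gaussian processes (DGPs) have struggled for relevance in applications because of the challenges and cost of Bayesian inference. This paper proposes a sparse variational approximation for DGPs whose approximate posterior mean has the same mathematical structure as a deep neural network. By finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions, the forward pass through a DGP becomes equivalent to that of a ReLU DNN. This unification allows the DGP to be initialised and trained as a neural network, drawing on well-established deep learning practice and greatly aiding inference. Experiments show improved accuracy and faster training compared with current DGP methods, while retaining favourable predictive uncertainties.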

Deep Gaussian processes (DGPs) have struggled for relevance in applications due to the challenges and cost associated with Bayesian inference. In this paper we propose a sparse variational approximation for DGPs for which the approximate posterior mean has the same mathematical structure as a Deep Neural Network (DNN). We make the forward pass through a DGP equivalent to a ReLU DNN by finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions. This unification enables the initialisation and training of the DGP as a neural network, leveraging the well established practice in the deep learning community, and so greatly aiding the inference task. The experiments demonstrate improved accuracy and faster training compared to current DGP methods, while retaining favourable predictive uncertainties.
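The structural claim, that a posterior mean written as a sum of ReLU basis functions has the same form as a ReLU network layer, can be illustrated with a small Python sketch. The directions, coefficients, and function names below are hypothetical placeholders; the paper's actual interdomain transformation and basis construction are not reproduced here.

```python
def relu(t):
    """Rectified linear unit on a scalar."""
    return t if t > 0.0 else 0.0


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def posterior_mean(x, directions, coefficients):
    """Mean written as a weighted sum of ReLU basis functions."""
    return sum(c * relu(dot(w, x)) for w, c in zip(directions, coefficients))


def relu_layer_forward(x, hidden_weights, output_weights):
    """The same computation organised as a hidden layer plus linear readout."""
    hidden = [relu(dot(w, x)) for w in hidden_weights]
    return dot(output_weights, hidden)


# Hypothetical values: three basis functions on a 2D input.
directions = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
coefficients = [0.5, -2.0, 3.0]
x = [1.0, 2.0]
print(posterior_mean(x, directions, coefficients))
print(relu_layer_forward(x, directions, coefficients))
```

Both functions compute the same number, which is the sense in which a mean of this form can be evaluated, initialised, and trained with ordinary neural-network tooling.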


4、[CV] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

A Bardes, J Ponce, Y LeCun
[Facebook AI Research & PSL Research University]

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
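The three ingredients named in the abstract (variance, invariance, covariance) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the hinge form of the variance term, the target `gamma`, the small constant `eps`, and the default weights are assumptions, and all function names are hypothetical. Embeddings are given as a batch of rows, one list of floats per sample.

```python
import math


def covariance_matrix(z):
    """Unbiased covariance between embedding dimensions over the batch."""
    n, d = len(z), len(z[0])
    mu = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - mu[j] for j in range(d)] for row in z]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on each dimension's standard deviation; large when embeddings collapse."""
    cov = covariance_matrix(z)
    d = len(cov)
    return sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d


def covariance_term(z):
    """Sum of squared off-diagonal covariances, pushing dimensions to decorrelate."""
    cov = covariance_matrix(z)
    d = len(cov)
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d


def invariance_term(za, zb):
    """Mean squared distance between embeddings of the two views."""
    n = len(za)
    return sum(sum((a - b) ** 2 for a, b in zip(ra, rb))
               for ra, rb in zip(za, zb)) / n


def vicreg_style_loss(za, zb, w_inv=25.0, w_var=25.0, w_cov=1.0):
    """Weighted sum of the three terms; the weights here are assumed defaults."""
    return (w_inv * invariance_term(za, zb)
            + w_var * (variance_term(za) + variance_term(zb))
            + w_cov * (covariance_term(za) + covariance_term(zb)))


collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]   # constant encoder output
spread = [[-2.0, 0.0], [0.0, 2.0], [2.0, -2.0]]    # embeddings that vary
print(variance_term(collapsed), variance_term(spread))
```

In this toy batch the constant embeddings incur a variance penalty close to `gamma`, while the spread-out embeddings incur none, which is how an explicit variance term can rule out the collapsed solution without relying on architectural tricks.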


5、[CV] NeRD: Neural 3D Reflection Symmetry Detector

Y Zhou, S Liu, Y Ma
[UC Berkeley & Univ. of Southern California]

Recent advances have shown that symmetry, a structural prior that most objects exhibit, can support a variety of single-view 3D understanding tasks. However, detecting 3D symmetry from an image remains a challenging task. Previous works either assume that the symmetry is given or detect the symmetry with a heuristic-based method. In this paper, we present NeRD, a Neural 3D Reflection Symmetry Detector, which combines the strength of learning-based recognition and geometry-based reconstruction to accurately recover the normal direction of objects’ mirror planes. Specifically, we first enumerate the symmetry planes with a coarse-to-fine strategy and then find the best ones by building 3D cost volumes to examine the intra-image pixel correspondence from the symmetry. Our experiments show that the symmetry planes detected with our method are significantly more accurate than the planes from direct CNN regression on both synthetic and real-world datasets. We also demonstrate that the detected symmetry can be used to improve the performance of downstream tasks such as pose estimation and depth map regression. The code of this paper has been made public at https://github.com/zhou13/nerd.
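For intuition about what a mirror plane and its normal direction encode, the following Python sketch reflects a 3D point across a plane given by a normal vector and an offset. This is standard geometry, not the NeRD detection pipeline; the function names and the plane parameterization (n . x = offset) are assumptions made for illustration.

```python
import math


def normalize(n):
    """Return the unit vector pointing along n."""
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    return [c / length for c in n]


def reflect_point(point, normal, offset=0.0):
    """Reflect `point` across the plane {x : n . x = offset}, with n normalised."""
    n = normalize(normal)
    signed_distance = sum(p * c for p, c in zip(point, n)) - offset
    return [p - 2.0 * signed_distance * c for p, c in zip(point, n)]


# Mirror plane x = 0: the x coordinate flips, y and z are unchanged.
mirrored = reflect_point([1.0, 2.0, 3.0], [1.0, 0.0, 0.0])
print(mirrored)
# Reflecting twice returns the original point.
print(reflect_point(mirrored, [1.0, 0.0, 0.0]))
```

Once the plane normal is known, every visible surface point has a mirrored counterpart computed this way, which is the kind of correspondence a symmetry-based reconstruction can exploit.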



[CV] Measuring Model Biases in the Absence of Ground Truth

O Aka, K Burke, A Bäuerle, C Greer, M Mitchell
[Google & Ulm University]

[SI] COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies

G Muric, Y Wu, E Ferrara
[University of Southern California]

[LG] Leveraging Sparse Linear Layers for Debuggable Deep Networks

E Wong, S Santurkar, A Mądry

[LG] Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

K Xu, M Zhang, S Jegelka, K Kawaguchi
[MIT & The University of Maryland & Harvard University]