LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1、[LG] World-GAN: a Generative Model for Minecraft Worlds

M Awiszus, F Schubert, B Rosenhahn
[Leibniz University Hannover]

This work introduces World-GAN, the first method to perform data-driven Procedural Content Generation via Machine Learning in Minecraft from a single example. Based on a 3D Generative Adversarial Network (GAN) architecture, we are able to create arbitrarily sized world snippets from a given sample. We evaluate our approach on creations from the community as well as structures generated with the Minecraft World Generator. Our method is motivated by the dense representations used in Natural Language Processing (NLP) introduced with word2vec [1]. The proposed block2vec representations make World-GAN independent from the number of different blocks, which can vary a lot in Minecraft, and enable the generation of larger levels. Finally, we demonstrate that changing this new representation space allows us to change the generated style of an already trained generator. World-GAN enables its users to generate Minecraft worlds based on parts of their creations.
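To make the word2vec analogy concrete, here is a minimal sketch of how skip-gram style training pairs could be collected from a 3D block grid. The abstract does not say how block2vec defines context, so the 6-neighbor context, the `grid[x][y][z]` layout, and the function names `neighbor_pairs` and `block_vocabulary` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: grid[x][y][z] holds a block name; the grid is rectangular.
Grid = List[List[List[str]]]

# Assumption: a block's "context" is its 6 face-adjacent neighbors.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbor_pairs(grid: Grid) -> List[Tuple[str, str]]:
    """Collect (target, context) block-name pairs, analogous to skip-gram pairs."""
    size_x, size_y, size_z = len(grid), len(grid[0]), len(grid[0][0])
    pairs = []
    for x in range(size_x):
        for y in range(size_y):
            for z in range(size_z):
                for dx, dy, dz in OFFSETS:
                    nx, ny, nz = x + dx, y + dy, z + dz
                    # Skip neighbors that fall outside the sample.
                    if 0 <= nx < size_x and 0 <= ny < size_y and 0 <= nz < size_z:
                        pairs.append((grid[x][y][z], grid[nx][ny][nz]))
    return pairs


def block_vocabulary(grid: Grid) -> Dict[str, int]:
    """Map each distinct block name to an integer id (sorted for determinism)."""
    names = sorted({block for plane in grid for column in plane for block in column})
    return {name: idx for idx, name in enumerate(names)}
```

Pairs like these could then be fed to any standard skip-gram trainer to obtain one dense vector per block type, so that the generator's output dimension no longer depends on how many distinct blocks appear in the sample.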


2、[LG] Distributed Deep Learning in Open Collaborations

M Diskin, A Bukhtiyarov, M Ryabinin, L Saulnier, Q Lhoest, A Sinitsin, D Popov, D Pyrkin, M Kashirin, A Borzunov, A V d Moral, D Mazur, I Kobelev, Y Jernite, T Wolf, G Pekhimenko
[Yandex & Hugging Face & HSE University & University of Toronto]

Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative language model pretraining with 40 participants.


3、[LG] Riemannian Convex Potential Maps

S Cohen, B Amos, Y Lipman
[University College London & Facebook AI Research]

Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres and tori, on both synthetic and geological data.


4、[CV] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

A Steiner, A Kolesnikov, X Zhai, R Wightman, J Uszkoreit, L Beyer
[Google Research]

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer’s weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation (“AugReg” for short) when training on smaller training datasets. We conduct a systematic empirical study in order to better understand the interplay between the amount of training data, AugReg, model size and compute budget. As one result of this study we find that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data: we train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available JFT-300M dataset.


5、[CV] HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping

Y Wang, X Chen, J Zhu, W Chu, Y Tai, C Wang, J Li, Y Wu, F Huang, R Ji
[Tencent & Xiamen University]

In this work, we propose a high fidelity face swapping method, called HifiFace, which preserves the face shape of the source face well and generates photo-realistic results. Unlike existing face swapping works that only use a face recognition model to maintain identity similarity, we propose a 3D shape-aware identity to control the face shape with geometric supervision from a 3DMM and a 3D face reconstruction method. Meanwhile, we introduce the Semantic Facial Fusion module to optimize the combination of encoder and decoder features and perform adaptive blending, which makes the results more photo-realistic. Extensive experiments on faces in the wild demonstrate that our method preserves identity better, especially the face shape, and generates more photo-realistic results than previous state-of-the-art methods.



[LG] Non Gaussian Denoising Diffusion Models

E Nachmani, R S Roman, L Wolf
[Tel-Aviv University]

[CL] Memory-efficient Transformers via Top-k Attention

A Gupta, G Dar, S Goodman, D Ciprut, J Berant
[IBM Research & Tel Aviv University]

[CV] Bridging the Gap Between Object Detection and User Intent via Query-Modulation

M Fornoni, C Yan, L Luo, K Wilber, A Stark, Y Cui, B Gong, A Howard
[Google Research & University of Texas at Arlington]

[LG] Learning to Generate Code Sketches

D Guo, A Svyatkovskiy, J Yin, N Duan, M Brockschmidt, M Allamanis
[Microsoft Research & Microsoft & Sun Yat-sen University]