1、[CL] Larger-Scale Transformers for Multilingual Masked Language Modeling

N Goyal, J Du, M Ott, G Anantharaman, A Conneau
[Facebook AI]

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI, respectively. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests that pretrained models with larger capacity may obtain strong performance on high-resource languages while greatly improving performance on low-resource languages. We make our code and models publicly available.


2、[CV] Neural Monocular 3D Human Motion Capture with Physical Awareness

S Shimada, V Golyanik, W Xu, P Pérez, C Theobalt
[Max Planck Institute for Informatics & Facebook Reality Labs & Valeo.ai]
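Physics-aware neural monocular 3D human motion capture. The paper proposes a new trainable system for physically plausible markerless 3D human motion capture that achieves state-of-the-art results across a broad range of challenging scenarios. Unlike most neural approaches to human motion capture, the proposed method, physionical, is aware of physical and environmental constraints and combines several key innovations in a fully differentiable way: 1. a proportional-derivative controller whose gains are predicted by a neural network, which reduces delays even under fast motion; 2. an explicit rigid body dynamics model; 3. a new optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The system's inputs are 2D joint keypoints, canonicalised in a new way that reduces the dependency on intrinsic camera parameters at both training and test time. When 3D annotations are unavailable, the model can be fine-tuned with 2D annotations only. Across a variety of challenging scenes, including newly recorded ones, it produces smooth, physically principled 3D motion at interactive frame rates; its advantages are especially noticeable on in-the-wild sequences that differ substantially from common 3D pose estimation benchmarks such as Human 3.6M and MPI-INF-3DHP.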

We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub physionical, is aware of physical and environmental constraints. It combines in a fully differentiable way several key innovations, i.e., 1. a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, 2. an explicit rigid body dynamics model and 3. a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters, both at train and test time. This enables more accurate global translation estimation without generalisability loss. Our model can be finetuned only with 2D annotations when the 3D annotations are not available. It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that significantly differ from common 3D pose estimation benchmarks such as Human 3.6M and MPI-INF-3DHP. Qualitative results are available at this http URL
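
The proportional-derivative control law named in item 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the gains here are fixed constants, whereas the abstract says they are predicted by a neural network, and the names `pd_torque` and `simulate`, the single unit-inertia joint, and the semi-implicit Euler integrator are all assumptions made for illustration.

```python
def pd_torque(q, q_dot, q_target, kp, kd):
    """PD control law: pull q towards q_target, damped by the velocity."""
    return kp * (q_target - q) - kd * q_dot


def simulate(q0, q_target, kp, kd, dt=0.01, steps=1000):
    """Integrate one unit-inertia joint driven by the PD torque (assumed toy model)."""
    q, q_dot = q0, 0.0
    for _ in range(steps):
        tau = pd_torque(q, q_dot, q_target, kp, kd)
        q_dot += tau * dt  # unit inertia: acceleration equals torque
        q += q_dot * dt    # semi-implicit Euler: use the updated velocity
    return q


# Fixed, hand-picked gains (critically damped for unit inertia); hypothetical values.
final_angle = simulate(q0=0.0, q_target=1.0, kp=100.0, kd=20.0)
```

Larger `kp` tracks the target more tightly and larger `kd` suppresses overshoot; predicting both per frame, as the abstract describes, would let a system trade these off depending on how fast the motion is.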


3、[CV] Robust and Generalizable Visual Representation Learning via Random Convolutions

Z Xu, D Liu, J Yang, C Raffel, M Niethammer
[University of North Carolina at Chapel Hill & Yale University]

While successful for various computer vision tasks, deep neural networks have been shown to be vulnerable to texture style shifts and small perturbations to which humans are robust. In this work, we show that the robustness of neural networks can be greatly improved through the use of random convolutions as data augmentation. Random convolutions are approximately shape-preserving and may distort local textures. Intuitively, randomized convolutions create an infinite number of new domains with similar global shapes but random local texture. Therefore, we explore using outputs of multi-scale random convolutions as new images or mixing them with the original images during training. When applying a network trained with our approach to unseen domains, our method consistently improves the performance on domain generalization benchmarks and is scalable to ImageNet. In particular, in the challenging scenario of generalizing to the sketch domain in PACS and to ImageNet-Sketch, our method outperforms state-of-the-art methods by a large margin. More interestingly, our method can benefit downstream tasks by providing a more robust pretrained visual representation.
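
A rough sketch of random convolutions as data augmentation, under stated assumptions, might look as follows. The abstract only specifies multi-scale random convolutions whose outputs are used directly or mixed with the original image; the Gaussian kernel initialisation, the kernel sizes, the uniform mixing weight, and all function names below are assumptions, and the example uses a single-channel image in pure Python for readability.

```python
import random


def random_kernel(k, rng):
    """Sample a k x k kernel from a zero-mean Gaussian (assumed initialisation)."""
    std = 1.0 / k
    return [[rng.gauss(0.0, std) for _ in range(k)] for _ in range(k)]


def filter2d_same(image, kernel):
    """Slide the kernel over a 2D image with zero padding, keeping its size.

    This is cross-correlation (no kernel flip), as in most deep learning
    frameworks; for a randomly sampled kernel the flip makes no difference.
    """
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def randconv_augment(image, rng, kernel_sizes=(1, 3, 5), mix=True):
    """Filter with a fresh random kernel of random size; optionally mix with the input."""
    k = rng.choice(kernel_sizes)  # multi-scale: the kernel size varies per call
    filtered = filter2d_same(image, random_kernel(k, rng))
    if not mix:
        return filtered
    alpha = rng.random()  # mixing weight in [0, 1)
    return [[alpha * p + (1.0 - alpha) * f for p, f in zip(row, frow)]
            for row, frow in zip(image, filtered)]


rng = random.Random(0)
image = [[float((x + y) % 4) for x in range(8)] for y in range(8)]  # toy 8x8 image
augmented = randconv_augment(image, rng)
```

Because a new kernel is drawn on every call, each training step sees a different local texture while small kernels leave the global shape largely intact.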


4、[CL] BERT memorisation and pitfalls in low-resource scenarios

M Tänzer, S Ruder, M Rei
[Imperial College London & DeepMind]

State-of-the-art pre-trained models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal performance even on extremely noisy datasets. Conversely, we also find that they completely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose a novel architecture based on BERT and prototypical networks that improves performance in low-resource named entity recognition tasks.
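
The abstract does not describe the proposed architecture in detail, so the following is only a sketch of the generic prototypical-network classification step it builds on: each class is represented by the mean of its support embeddings, and a query is assigned by distance to those means. The 2-D vectors stand in for BERT token embeddings, and the labels and function names are hypothetical.

```python
import math


def prototypes(support):
    """Average the embeddings of each class to get one prototype per class."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def classify(query, protos):
    """Softmax over negative squared Euclidean distances to the prototypes."""
    dists = {label: sum((q - p) ** 2 for q, p in zip(query, proto))
             for label, proto in protos.items()}
    m = min(dists.values())  # shift by the smallest distance for numerical stability
    weights = {label: math.exp(-(d - m)) for label, d in dists.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical 2-D embeddings standing in for BERT outputs, two examples per class.
support = {
    "PER": [[1.0, 0.0], [0.8, 0.2]],
    "LOC": [[0.0, 1.0], [0.2, 0.8]],
}
protos = prototypes(support)
probs = classify([0.9, 0.1], protos)
predicted = max(probs, key=probs.get)
```

Since a prototype needs only a handful of labelled examples, this kind of classifier is a natural fit for the few-shot and rare-entity settings the abstract highlights.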


5、[AI] One Model to Rule them All: Towards Zero-Shot Learning for Databases

B Hilprecht, C Binnig
[TU Darmstadt]

In this paper, we present our vision of so-called zero-shot learning for databases, which is a new learning approach for database components. Zero-shot learning for databases is inspired by recent advances in transfer learning of models such as GPT-3 and can support a new database out of the box without the need to train a new model. As a first concrete contribution in this paper, we show the feasibility of zero-shot learning for the task of physical cost estimation and present very promising initial results. Moreover, as a second contribution, we discuss the core challenges related to zero-shot learning for databases and present a roadmap to extend zero-shot learning towards many other tasks beyond cost estimation or even beyond classical database systems and workloads.



