LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1. [CL] Reliability Testing for Natural Language Processing Systems

S Tan, S Joty, K Baxter, A Taeihagh, G A. Bennett, M Kan
[Salesforce Research & National University of Singapore]

Questions of fairness, robustness, and transparency are paramount to address before deploying NLP systems. Central to these concerns is the question of reliability: Can NLP systems reliably treat different demographics fairly and function correctly in diverse and noisy environments? To address this, we argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests. We argue that reliability testing — with an emphasis on interdisciplinary collaboration — will enable rigorous and targeted testing, and aid in the enactment and enforcement of industry standards.
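The abstract describes a framework, not code. The sketch below shows one possible reading of "adversarial attacks reframed as reliability tests": instead of searching for a single fooling input, score a model by its worst-case accuracy over a set of plausible perturbations. Everything here is hypothetical and illustrative: `toy_classifier`, the two perturbations (`swap_adjacent_chars`, `shout`) and `reliability_test` are stand-ins of our own, not the authors' framework or any library API.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]                          # (input text, gold label)
Perturbation = Callable[[str, random.Random], str]


def toy_classifier(text: str) -> int:
    """Hypothetical stand-in model: predicts 1 (positive) if a keyword appears."""
    positive = {"good", "great", "excellent"}
    return int(any(tok in positive for tok in text.lower().split()))


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Simulate a typo by swapping two adjacent characters in one random word."""
    words = text.split()
    if not words:
        return text
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) > 1:
        j = rng.randrange(len(w) - 1)
        w = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    words[i] = w
    return " ".join(words)


def shout(text: str, rng: random.Random) -> str:
    """Casing variation: the whole input in upper case (rng unused)."""
    return text.upper()


def reliability_test(model: Callable[[str], int], data: Sequence[Example],
                     perturbations: List[Perturbation],
                     n_samples: int = 5, seed: int = 0) -> float:
    """Worst-case accuracy: an example passes only if the model is correct on
    the original input and on every sampled perturbed variant of it."""
    rng = random.Random(seed)
    passed = 0
    for text, label in data:
        variants = [text] + [p(text, rng) for p in perturbations
                             for _ in range(n_samples)]
        if all(model(v) == label for v in variants):
            passed += 1
    return passed / len(data)
```

For example, `reliability_test(toy_classifier, [("a great movie", 1), ("a dull movie", 0)], [shout, swap_adjacent_chars])` reports how often the toy model stays correct when typos hit its keyword. Choosing worst-case rather than average accuracy mirrors the adversarial framing; which perturbations belong in the set is exactly where the interdisciplinary input the abstract calls for would come in.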


2. [CV] Computer-Aided Design as Language

Y Ganin, S Bartunov, Y Li, E Keller, S Saliceti
[DeepMind & Onshape]

Computer-Aided Design (CAD) applications are used in manufacturing to model everything from coffee mugs to sports cars. These programs are complex and require years of training and experience to master. One component of all CAD models that is particularly difficult to create is the set of highly structured 2D sketches that lie at the heart of every 3D construction. In this work, we propose a machine learning model capable of automatically generating such sketches. Through this, we pave the way for developing intelligent tools that would help engineers create better designs with less effort. Our method is a combination of a general-purpose language modeling technique alongside an off-the-shelf data serialization protocol. We show that our approach has enough flexibility to accommodate the complexity of the domain and performs well for both unconditional synthesis and image-to-sketch translation.
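The abstract does not specify how a sketch becomes something a language model can consume. As a hypothetical illustration only, the code below flattens geometric primitives and the constraints between them into a discrete token sequence, quantizing coordinates into bins. The `Line`/`Circle`/`Constraint` classes, the token names and the 64-bin quantizer are our assumptions; they stand in for, and are not, the off-the-shelf serialization protocol the paper actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Line:
    start: Tuple[float, float]
    end: Tuple[float, float]


@dataclass
class Circle:
    center: Tuple[float, float]
    radius: float


@dataclass
class Constraint:
    kind: str               # e.g. "PARALLEL", "COINCIDENT" (hypothetical names)
    refs: Tuple[int, ...]   # indices of the entities the constraint binds


Entity = Union[Line, Circle]


def quantize(x: float, bins: int = 64, lo: float = -1.0, hi: float = 1.0) -> int:
    """Clamp x to [lo, hi] and map it to an integer bin, i.e. a discrete token id."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (bins - 1))


def serialize(entities: List[Entity], constraints: List[Constraint]) -> List[str]:
    """Emit primitives first, then constraints that refer back to them by index."""
    tokens = ["<BOS>"]
    for e in entities:
        if isinstance(e, Line):
            tokens += ["LINE", *(f"Q{quantize(v)}" for v in (*e.start, *e.end))]
        else:
            tokens += ["CIRCLE", *(f"Q{quantize(v)}" for v in (*e.center, e.radius))]
    for c in constraints:
        tokens += [c.kind, *(f"REF{i}" for i in c.refs)]
    tokens.append("<EOS>")
    return tokens
```

Two horizontal lines tied by a parallel constraint become `<BOS> LINE Q0 Q0 Q63 Q0 LINE Q0 Q63 Q63 Q63 PARALLEL REF0 REF1 <EOS>`. Placing constraints after the entities lets each `REF` token point at something already generated, which is one way a left-to-right model could emit valid cross-references.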


3. [CV] Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors

T Yu, Z Zheng, K Guo, P Liu, Q Dai, Y Liu
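[Tsinghua University & Google & Chinese Academy of Sciences]

Human volumetric capture is a long-standing topic in computer vision and computer graphics. Although high-quality results can be achieved with sophisticated offline systems, real-time human capture of complex scenarios, especially with lightweight setups, remains challenging. This paper proposes a human volumetric capture method that combines temporal volumetric fusion with deep implicit functions. To achieve high-quality and temporally continuous reconstruction, dynamic sliding fusion is proposed to fuse neighboring depth observations together with topology consistency. To generate detailed and complete surfaces, detail-preserving deep implicit functions are proposed for RGBD input, which not only preserve the geometric details of the depth input but also generate more plausible texturing results. Experiments show that the method outperforms existing methods in terms of view sparsity, generalization capacity, reconstruction quality, and runtime efficiency.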
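To make "fusing neighboring depth observations" concrete, here is a minimal sketch of the classic weighted truncated-signed-distance (TSDF) average over a temporal window of frames. It is a generic stand-in, not the paper's dynamic sliding fusion: it assumes the frames are already warped into a common reference frame and ignores the non-rigid alignment and topology-consistency handling the abstract mentions. The temporal weighting in `window_weights` is a hypothetical choice of ours.

```python
from typing import List, Optional, Sequence

SDF = Optional[float]   # None = voxel not observed in that frame


def truncate(sdf: float, trunc: float) -> float:
    """Clamp a signed distance to [-trunc, trunc]."""
    return max(-trunc, min(trunc, sdf))


def window_weights(radius: int) -> List[float]:
    """Hypothetical weighting: frames closer in time to the centre frame count more."""
    return [1.0 / (1 + abs(dt)) for dt in range(-radius, radius + 1)]


def fuse_window(observations: Sequence[Sequence[SDF]],
                weights: Sequence[float], trunc: float = 0.05) -> List[SDF]:
    """Per-voxel weighted average of truncated SDFs across the frames of a window.

    observations[f][v] is the signed distance of voxel v seen from frame f,
    already expressed in the shared (centre-frame) coordinate system.
    """
    n_voxels = len(observations[0])
    fused: List[SDF] = []
    for v in range(n_voxels):
        num = den = 0.0
        for obs, w in zip(observations, weights):
            d = obs[v]
            if d is None:          # sparse views: skip voxels this frame missed
                continue
            num += w * truncate(d, trunc)
            den += w
        fused.append(num / den if den > 0 else None)
    return fused
```

With `window_weights(1)` giving `[0.5, 1.0, 0.5]`, a voxel occluded in the centre frame can still receive a value from its neighbors, which hints at why fusing adjacent frames helps with very sparse camera setups.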

4. [CL] Keep Learning: Self-supervised Meta-learning for Learning from Inference

A Kedia, S C Chinthakindi
[Samsung Research]

A common approach in many machine learning algorithms involves self-supervised learning on large unlabeled data before fine-tuning on downstream tasks to further improve performance. A new approach for language modelling, called dynamic evaluation, further fine-tunes a trained model during inference using trivially-present ground-truth labels, giving a large improvement in performance. However, this approach does not easily extend to classification tasks, where ground-truth labels are absent during inference. We propose to solve this issue by utilizing self-training and back-propagating the loss from the model’s own class-balanced predictions (pseudo-labels), adapting the Reptile algorithm from meta-learning, combined with an inductive bias towards pre-trained weights to improve generalization. Our method improves the performance of standard backbones such as BERT, Electra, and ResNet-50 on a wide variety of tasks, such as question answering on SQuAD and NewsQA, benchmark task SuperGLUE, conversation response selection on Ubuntu Dialog corpus v2.0, as well as image classification on MNIST and ImageNet without any changes to the underlying models. Our proposed method outperforms previous approaches, enables self-supervised finetuning during inference of any classifier model to better adapt to target domains, can be easily adapted to any model, and is also effective in online and transfer-learning settings.
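The recipe in the abstract can be sketched for a tiny binary logistic-regression model: generate class-balanced pseudo-labels from the model's own predictions, take a few inner SGD steps on them, apply a Reptile-style outer update, then pull the weights back toward the pre-trained ones. This is a minimal illustration under stated assumptions, not the authors' implementation: the balancing rule (top-scoring half labelled 1), the linear pull toward `w_pre`, and all names and hyperparameters (`eps`, `pull`, `lr`, `steps`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def balanced_pseudo_labels(w: Sequence[float], batch: Sequence[Sequence[float]]) -> List[int]:
    """Assumed balancing rule: the higher-scoring half of the batch gets label 1."""
    scores = [predict(w, x) for x in batch]
    order = sorted(range(len(batch)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(batch)
    for i in order[: len(batch) // 2]:
        labels[i] = 1
    return labels


def inner_sgd(w: Sequence[float], batch, labels, lr: float = 0.5, steps: int = 5) -> Vector:
    """A few SGD passes on binary cross-entropy against the pseudo-labels."""
    phi = list(w)
    for _ in range(steps):
        for x, y in zip(batch, labels):
            err = predict(phi, x) - y          # d(BCE)/d(logit)
            phi = [p - lr * err * xi for p, xi in zip(phi, x)]
    return phi


def keep_learning_step(w: Sequence[float], w_pre: Sequence[float], batch,
                       eps: float = 0.5, pull: float = 0.1) -> Vector:
    """One inference-time adaptation step on an unlabeled batch."""
    labels = balanced_pseudo_labels(w, batch)
    phi = inner_sgd(w, batch, labels)
    # Reptile outer update: move the current weights part-way toward the adapted ones.
    w_new = [wi + eps * (pi - wi) for wi, pi in zip(w, phi)]
    # Assumed inductive bias: shrink back toward the pre-trained weights.
    return [wi - pull * (wi - w0) for wi, w0 in zip(w_new, w_pre)]
```

Balancing the pseudo-labels prevents the degenerate fix point where the model labels everything as one class and then confirms its own bias; the pull toward `w_pre` plays a similar regularizing role across many inference batches.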


5. [CV] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface

Y Li, H Takehara, T Taketomi, B Zheng, M Nießner
[The University of Tokyo & Huawei & Technical University Munich]

Tracking non-rigidly deforming scenes using range sensors has numerous applications, including computer vision, AR/VR, and robotics. However, due to occlusions and the physical limitations of range sensors, existing methods only handle the visible surface, which causes discontinuities and incompleteness in the motion field. To address this, we introduce 4DComplete, a novel data-driven approach that estimates the non-rigid motion of the unobserved geometry. 4DComplete takes as input a partial shape and motion observation, extracts a 4D time-space embedding, and jointly infers the missing geometry and motion field using a sparse fully-convolutional network. For network training and benchmarking, we constructed a large-scale synthetic dataset called DeformingThings4D, which consists of 1,972 animation sequences (122,365 frames) spanning 31 different animal or humanoid categories with dense 4D annotation. Experiments show that 4DComplete 1) reconstructs high-resolution volumetric shape and motion field from a partial observation, 2) learns an entangled 4D feature representation that benefits both shape and motion estimation, 3) yields more accurate and natural deformation than classic non-rigid priors such as As-Rigid-As-Possible (ARAP) deformation, and 4) generalizes well to unseen objects in real-world sequences.
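For readers unfamiliar with the ARAP baseline named in item 3), the sketch below computes the textbook As-Rigid-As-Possible energy on a small 2D graph: for each vertex, find the rotation that best maps its rest-pose edges onto its deformed edges (closed form in 2D), then sum the squared residuals. It illustrates the classic prior 4DComplete is compared against, not 4DComplete itself. Uniform edge weights are an assumption of this sketch; mesh implementations commonly use cotangent weights.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def rotate(theta: float, v: Point) -> Point:
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def best_rotation(i: int, rest: Sequence[Point], deformed: Sequence[Point],
                  nbrs: Dict[int, List[int]]) -> float:
    """Closed-form 2D angle minimising sum_j |e'_ij - R e_ij|^2 over vertex i's edges."""
    a = b = 0.0
    for j in nbrs[i]:
        ex, ey = rest[i][0] - rest[j][0], rest[i][1] - rest[j][1]
        fx, fy = deformed[i][0] - deformed[j][0], deformed[i][1] - deformed[j][1]
        a += ex * fx + ey * fy      # dot(e, e')
        b += ex * fy - ey * fx      # cross(e, e')
    return math.atan2(b, a)


def arap_energy(rest: Sequence[Point], deformed: Sequence[Point],
                nbrs: Dict[int, List[int]]) -> float:
    """Sum of squared deviations of deformed edges from best-fit rotated rest edges."""
    total = 0.0
    for i, js in nbrs.items():
        theta = best_rotation(i, rest, deformed, nbrs)
        for j in js:
            rx, ry = rotate(theta, (rest[i][0] - rest[j][0], rest[i][1] - rest[j][1]))
            dx = (deformed[i][0] - deformed[j][0]) - rx
            dy = (deformed[i][1] - deformed[j][1]) - ry
            total += dx * dx + dy * dy   # uniform weight w_ij = 1 (assumption)
    return total
```

A rigidly rotated and translated triangle scores (numerically) zero, while stretching one edge does not. Because the energy only penalizes local non-rigidity, it says nothing about geometry that was never observed, which is the gap a learned, data-driven prior like 4DComplete targets.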



[LG] On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning

M A Vischer, R T Lange, H Sprekeler
[Technical University Berlin]

[CL] Scaling End-to-End Models for Large-Scale Multilingual ASR

B Li, R Pang, T N. Sainath, A Gulati, Y Zhang, J Qin, P Haghani, W. R Huang, M Ma
[Google LLC]

[CV] Learned Spatial Representations for Few-shot Talking-Head Synthesis

M Meshry, S Suri, L S. Davis, A Shrivastava
[University of Maryland]

[CV] SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models

H Li, Y Yang, M Chang, H Feng, Z Xu, Q Li, Y Chen
[Zhejiang University]