LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics (* marks papers of particular note)

    1. [LG] *Underspecification Presents Challenges for Credibility in Modern Machine Learning
    A D’Amour, K Heller, D Moldovan, B Adlam…
    [Google]
    Underspecification presents challenges for credibility in modern machine learning. Underspecification is the phenomenon in which an ML pipeline can return many predictors with the same level of held-out performance in the training domain; it is a key factor behind unexpected behavior when ML models are actually deployed. These predictors can behave very differently in the deployment domain. The resulting ambiguity leads to instability and poor model performance in practice, a failure mode clearly distinct from previously identified problems caused by structural mismatch between the training and deployment domains. Examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics show that the problem is pervasive across real-world ML pipelines. Modeling pipelines intended for real-world deployment in any domain must explicitly account for underspecification.

    ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.

    https://weibo.com/1402400261/Jtcb8khSl
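    To make the failure mode concrete, here is a minimal sketch (a hypothetical setup, not the paper's experiments): several predictors differing only in random seed come out essentially equivalent on held-out data, while their accuracy under a synthetic shift of the features the task leaves unconstrained can spread.

```python
# Hypothetical illustration of underspecification (not the paper's setup):
# predictors that differ only in random seed look equivalent on held-out data
# but may diverge once features the task leaves unconstrained are perturbed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# shuffle=False keeps the 5 informative features in columns 0-4.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=5,
                           shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Synthetic "deployment shift": perturb the non-informative feature block.
rng = np.random.default_rng(0)
X_shift = X_te.copy()
X_shift[:, 5:] += rng.normal(scale=3.0, size=X_shift[:, 5:].shape)

for seed in range(5):
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    print(f"seed={seed}  held-out={clf.score(X_te, y_te):.3f}  "
          f"shifted={clf.score(X_shift, y_te):.3f}")
```
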
    2. [CV] *Modular Primitives for High-Performance Differentiable Rendering
    S Laine, J Hellsten, T Karras, Y Seol, J Lehtinen, T Aila
    [Nvidia]
    High-performance differentiable rendering with modular primitives. Proposes a modular differentiable renderer design that leverages existing, highly optimized hardware graphics pipelines to render high-resolution images of complex 3D scenes orders of magnitude faster than previous methods, while supporting key features such as filtered texture mapping with accurate gradients.

    We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing, all in high resolutions. Our modular primitives allow custom, high-performance graphics pipelines to be built directly within automatic differentiation frameworks such as PyTorch or TensorFlow. As a motivating application, we formulate facial performance capture as an inverse rendering problem and show that it can be solved efficiently using our tools. Our results indicate that this simple and straightforward approach achieves excellent geometric correspondence between rendered results and reference imagery.

    https://weibo.com/1402400261/JtcmCFzUT
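    The renderer was released by NVIDIA as nvdiffrast. A rough sketch of how its modular primitives (rasterize, interpolate, texture, antialias) compose inside a PyTorch optimization loop might look like the following; the scene content, tensor shapes, and optimization target are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of inverse rendering with nvdiffrast-style modular primitives
# inside PyTorch autodiff. Scene content, shapes, and the target are made-up
# placeholders; requires a CUDA GPU and the nvdiffrast package.
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()  # context for the hardware rasterizer

# Toy geometry: 100 vertices in clip space (w fixed to 1), 50 triangles.
xyz = torch.rand(1, 100, 3, device='cuda') * 2 - 1
pos = torch.cat([xyz, torch.ones(1, 100, 1, device='cuda')], dim=-1)
tri = torch.randint(0, 100, (50, 3), device='cuda', dtype=torch.int32)
uv = torch.rand(1, 100, 2, device='cuda')
tex = torch.rand(1, 256, 256, 3, device='cuda', requires_grad=True)
target = torch.rand(1, 256, 256, 3, device='cuda')  # reference image

opt = torch.optim.Adam([tex], lr=1e-2)
for step in range(200):
    rast, _ = dr.rasterize(glctx, pos, tri, resolution=[256, 256])
    uv_px, _ = dr.interpolate(uv, rast, tri)     # attribute interpolation
    color = dr.texture(tex, uv_px)               # filtered texture lookup
    color = dr.antialias(color, rast, pos, tri)  # visibility-related gradients
    loss = ((color - target) ** 2).mean()        # fit texture to the reference
    opt.zero_grad()
    loss.backward()
    opt.step()
```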

    3. [CV] **Disentangling 3D Prototypical Networks For Few-Shot Concept Learning
    M Prabhudesai, S Lal, D Patil, H Tung, A W Harley, K Fragkiadaki
    [CMU]
    Disentangling 3D prototypical networks for few-shot concept learning. Proposes D3DP-Nets, which learn disentangled 3D representations of scenes and objects from multi-view RGB-D videos of static scenes via end-to-end self-supervised training, distilling 3D and 1D prototypes of shape and style. Explores applications of D3DP-Nets to few-shot 3D object detection and few-shot concept classification. The resulting 3D neural representations are compositional: novel 3D scene feature maps can be generated by mixing object shapes and styles, resizing, and adding object 3D feature maps onto background scene feature maps. The representation generalizes better than 2D and 2D-disentangled representations, and with less training data.

    We present neural architectures that disentangle RGB-D images into objects’ shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification. Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay. They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes. Objects and scenes are represented in terms of 3D feature grids in the bottleneck of the network. We show that the proposed 3D neural representations are compositional: they can generate novel 3D scene feature maps by mixing object shapes and styles, resizing and adding the resulting object 3D feature maps over background scene feature maps. We show that classifiers for object categories, color, materials, and spatial relationships trained over the disentangled 3D feature sub-spaces generalize better with dramatically fewer examples than the current state-of-the-art, and enable a visual question answering system that uses them as its modules to generalize one-shot to novel objects in the scene.

    https://weibo.com/1402400261/JtcqGslpV
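    A minimal sketch of the prototype-based few-shot classification idea follows; the random features are hypothetical stand-ins for the encoder outputs (D3DP-Nets actually produce 3D shape feature grids and 1D style codes, each classified in its own sub-space).

```python
# Minimal sketch of few-shot classification with class prototypes computed in
# a disentangled feature sub-space. The random features stand in for D3DP-Nets
# encoder outputs (3D shape grids or 1D style codes, flattened).
import torch
import torch.nn.functional as F

def prototypes(feats, labels, n_classes):
    """Average the support embeddings of each class into one prototype."""
    return torch.stack([feats[labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(queries, protos):
    """Assign each query to the nearest prototype by cosine similarity."""
    sims = F.cosine_similarity(queries.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)

n_classes, shots, dim = 4, 5, 128
support = torch.randn(n_classes * shots, dim)          # stand-in embeddings
labels = torch.arange(n_classes).repeat_interleave(shots)

protos = prototypes(support, labels, n_classes)
queries = torch.randn(10, dim)
print(classify(queries, protos))                       # predicted class ids
```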

    4. [LG] Complex Query Answering with Neural Link Predictors
    E Arakelyan, D Daza, P Minervini, M Cochez
    [University College London & Vrije Universiteit Amsterdam]
    Complex query answering with neural link predictors. Proposes the Complex Query Decomposition (CQD) framework, which answers complex logical queries by reasoning over sets of entities in embedding space: answering a complex query is reduced to answering each of its sub-queries and aggregating the resulting scores with t-norms. Only a neural link predictor for atomic queries needs to be trained; the framework can then answer a given complex query without training on large numbers of generated complex queries. Regardless of query complexity, each step of the query-answering process can also be explained. The approach is query-type agnostic and generalizes without explicit training on specific query types.

    Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions, disjunctions, and existential quantifiers, while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. We then analyse two solutions to the optimisation problem, including gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods (black-box neural models trained on millions of generated queries) without the need to train on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms.

    https://weibo.com/1402400261/Jtcx1dcT5
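    A sketch of the CQD idea for a conjunctive path query: each atom is scored by a pretrained neural link predictor, conjunctions are combined with a product t-norm, and a beam of candidate intermediate entities is kept. The predictor below is a deterministic random stand-in, not a real trained model.

```python
# Sketch of CQD-style answering for a conjunctive path query: each atom is
# scored by a pretrained link predictor, conjunctions are combined with a
# product t-norm, and a beam of intermediate entities is kept. The predictor
# below is a deterministic random stand-in, not a real trained model.
import torch

N_ENTITIES, BEAM = 1000, 10

def link_predictor(head, relation):
    """Stand-in: a plausibility score for every entity as the atom's tail."""
    torch.manual_seed(hash((head, relation)) % (2**31))
    return torch.rand(N_ENTITIES)

def answer_path_query(anchor, relations):
    """Answer ?Y : r1(anchor, X) AND r2(X, Y) for a chain of relations."""
    beam = [(anchor, 1.0)]                    # (entity, score so far)
    for rel in relations:
        cands = []
        for ent, score in beam:
            top = link_predictor(ent, rel).topk(BEAM)
            # Product t-norm: truth value of the conjunction so far.
            cands += [(int(i), score * float(s))
                      for s, i in zip(top.values, top.indices)]
        beam = sorted(cands, key=lambda c: -c[1])[:BEAM]
    return beam                               # ranked answers with scores

print(answer_path_query(anchor=42, relations=["r1", "r2"]))
```

    The surviving beam at each hop is exactly the kind of intermediate solution the paper uses to explain its answers.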

    5. [CV] Large-scale multilingual audio visual dubbing
    Y Yang, B Shillingford, Y Assael, M Wang, W Liu, Y Chen, Y Zhang, E Sezener, L C. Cobo, M Denil, Y Aytar, N d Freitas
    [DeepMind & Google]
    Large-scale multilingual audio-visual dubbing. Presents a large-scale audiovisual translation and dubbing system that translates both the audio and the visual content of a target video, synthesizing lip movements to match the speaker to the translated voice and creating a seamless audiovisual experience in the target language.

    We describe a system for large-scale audiovisual translation and dubbing, which translates videos from one language to another. The source language’s speech content is transcribed to text, translated, and automatically synthesized into target language speech using the original speaker’s voice. The visual content is translated by synthesizing lip movements for the speaker to match the translated audio, creating a seamless audiovisual experience in the target language. The audio and visual translation subsystems each contain a large-scale generic synthesis model trained on thousands of hours of data in the corresponding domain. These generic models are fine-tuned to a specific speaker before translation, either using an auxiliary corpus of data from the target speaker, or using the video to be translated itself as the input to the fine-tuning process. This report gives an architectural overview of the full system, as well as an in-depth discussion of the video dubbing component. The role of the audio and text components in relation to the full system is outlined, but their design is not discussed in detail. Translated and dubbed demo videos generated using our system can be viewed at this https URL

    https://weibo.com/1402400261/JtcDapj9y
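    The described pipeline decomposes into four stages (transcribe, translate, synthesize speech, synthesize lips); the skeleton below mirrors that decomposition with dummy stand-ins. All function names and signatures are assumptions for illustration; in the real system each stage is a large model fine-tuned to the target speaker.

```python
# Skeleton of the four-stage dubbing pipeline described in the report. Every
# function is a dummy stand-in (names and signatures are assumptions); in the
# real system each stage is a large model fine-tuned to the target speaker.
from dataclasses import dataclass

def transcribe(audio: bytes, lang: str) -> str:
    return "placeholder transcript"           # ASR subsystem

def translate(text: str, src: str, tgt: str) -> str:
    return "placeholder translation"          # MT subsystem

def synthesize_speech(text: str, speaker_ref: bytes) -> bytes:
    return b"tts-audio"                       # TTS in the speaker's voice

def synthesize_lips(video: bytes, audio: bytes) -> bytes:
    return b"lip-synced video"                # video dubbing component

@dataclass
class DubbedVideo:
    audio: bytes
    video: bytes

def dub(video: bytes, audio: bytes, src: str, tgt: str) -> DubbedVideo:
    text = transcribe(audio, src)
    translated = translate(text, src, tgt)
    new_audio = synthesize_speech(translated, speaker_ref=audio)
    new_video = synthesize_lips(video, new_audio)
    return DubbedVideo(audio=new_audio, video=new_video)

print(dub(b"raw video", b"raw audio", "en", "es"))
```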

    Several other papers worth noting:

    [LG] The Value Equivalence Principle for Model-Based Reinforcement Learning
    The value equivalence principle for model-based reinforcement learning
    C Grimm, A Barreto, S Singh, D Silver
    [University of Michigan & DeepMind]
    https://weibo.com/1402400261/JtcJi8YFj

    [CV] “What’s This?” — Learning to Segment Unknown Objects from Manipulation Sequences
    Learning to segment unknown objects from (robotic) manipulation sequences
    W Boerdijk, M Sundermeyer, M Durner, R Triebel
    [German Aerospace Center & Technical University of Munich]
    https://weibo.com/1402400261/JtcNH5yis