LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1. [CV] Adaptive adversarial neural networks for the analysis of lossy and domain-shifted datasets of medical images

M K Kanakasabapathy, P Thirumalaraju, H Kandula…
[Harvard Medical School]

In machine learning for image-based medical diagnostics, supervised convolutional neural networks are typically trained with large and expertly annotated datasets obtained using high-resolution imaging systems. Moreover, the network’s performance can degrade substantially when applied to a dataset with a different distribution. Here, we show that adversarial learning can be used to develop high-performing networks trained on unannotated medical images of varying image quality. Specifically, we used low-quality images acquired using inexpensive portable optical systems to train networks for the evaluation of human embryos, the quantification of human sperm morphology and the diagnosis of malarial infections in the blood, and show that the networks performed well across different data distributions. We also show that adversarial learning can be used with unlabelled data from unseen domain-shifted datasets to adapt pretrained supervised networks to new distributions, even when data from the original distribution are not available. Adaptive adversarial networks may expand the use of validated neural-network models for the evaluation of data collected from multiple imaging systems of varying quality without compromising the knowledge stored in the network. Adversarial learning can be used to develop high-performing networks trained on unannotated medical images of varying image quality, and to adapt pretrained supervised networks to new domain-shifted datasets.


2. [LG] Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

B Şimşek, F Ged, A Jacot, F Spadaro, C Hongler, W Gerstner, J Brea
[Ecole Polytechnique Federale de Lausanne]

We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
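To make the counting in this abstract concrete, the short sketch below computes the number of isolated, permutation-equivalent minima: the product of the factorials of the minimal hidden-layer widths. The function name and the example widths are illustrative assumptions. The closed-form expressions for T and G are not given in the abstract, so they are not reproduced here.

```python
from math import factorial, prod


def count_permutation_minima(hidden_widths):
    """Count zero-loss minima that differ only by neuron permutations.

    hidden_widths: minimal widths of the hidden layers (hypothetical input).
    A hidden layer with r neurons can be reordered in r! ways, and the
    layers are permuted independently, so the counts multiply.
    """
    return prod(factorial(r) for r in hidden_widths)


# One hidden layer of width 4: 4! orderings.
minima_two_layer = count_permutation_minima([4])
# Two hidden layers of widths 3 and 2: 3! * 2! orderings.
minima_deep = count_permutation_minima([3, 2])
print(minima_two_layer, minima_deep)
```

Even small widths give many symmetric copies of the same solution, which is why connecting them into one manifold by adding a single neuron per layer is a notable claim.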


3. [LG] Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

J Z. HaoChen, C Wei, A Gaidon, T Ma
[Stanford University & Toyota Research Institute]

Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite the empirical successes, theoretical foundations are limited – prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image). Our work analyzes contrastive learning without assuming conditional independence of positive pairs using a novel concept of the augmentation graph on data. Edges in this graph connect augmentations of the same data, and ground-truth classes naturally form connected sub-graphs. We propose a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations. Minimizing this objective leads to features with provable accuracy guarantees under linear probe evaluation. By standard generalization bounds, these accuracy guarantees also hold when minimizing the training contrastive loss. Empirically, the features learned by our objective can match or outperform several strong baselines on benchmark vision datasets. In all, this work provides the first provable analysis for contrastive learning where guarantees for linear probe evaluation can apply to realistic empirical settings.
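The abstract does not state the objective explicitly. A minimal sketch, assuming the commonly quoted form of the spectral contrastive loss, -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2], might look as follows. The batch estimator, the function name, and the use of other rows in the batch as negatives are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def spectral_contrastive_loss(z1, z2):
    """Batch estimate of -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2].

    z1, z2: arrays of shape (n, d) with n >= 2. Row i of z1 and row i of z2
    hold features of two augmentations of the same example (a positive
    pair). Rows with different indices are treated as negative pairs.
    """
    n = z1.shape[0]
    sim = z1 @ z2.T  # sim[i, j] = inner product of z1[i] and z2[j]
    # Positive pairs sit on the diagonal.
    positive_term = -2.0 * np.trace(sim) / n
    # Negative pairs are all off-diagonal entries, squared and averaged.
    off_diagonal = ~np.eye(n, dtype=bool)
    negative_term = np.sum(sim[off_diagonal] ** 2) / (n * (n - 1))
    return positive_term + negative_term


# Orthonormal features: positives align perfectly, negatives are orthogonal.
loss_orthogonal = spectral_contrastive_loss(np.eye(2), np.eye(2))
# Collapsed features: every pair looks identical, so negatives are penalized.
collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])
loss_collapsed = spectral_contrastive_loss(collapsed, collapsed)
print(loss_orthogonal, loss_collapsed)
```

Unlike softmax-based contrastive losses, this form has no logarithm or temperature: the negative term simply penalizes squared inner products, so collapsed features score worse than well-separated ones.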


4. [CV] Implicit-PDF: Non-Parametric Representation of Probability Distributions on the Rotation Manifold

K Murphy, C Esteves, V Jampani, S Ramalingam, A Makadia
[Google Research]

Single image pose estimation is a fundamental problem in many vision and robotics tasks, and existing deep learning approaches suffer by not completely modeling and handling: i) uncertainty about the predictions, and ii) symmetric objects with multiple (sometimes infinitely many) correct poses. To this end, we introduce a method to estimate arbitrary, non-parametric distributions on SO(3). Our key idea is to represent the distributions implicitly, with a neural network that estimates the probability given the input image and a candidate pose. Grid sampling or gradient ascent can be used to find the most likely pose, but it is also possible to evaluate the probability at any pose, enabling reasoning about symmetries and uncertainty. This is the most general way of representing distributions on manifolds, and to showcase the rich expressive power, we introduce a dataset of challenging symmetric and nearly-symmetric objects. We require no supervision on pose uncertainty – the model trains only with a single pose per example. Nonetheless, our implicit model is expressive enough to handle complex distributions over 3D poses, while still obtaining accurate pose estimates in standard non-ambiguous environments, achieving state-of-the-art performance on the Pascal3D+ and ModelNet10-SO(3) benchmarks. Code, data, and visualizations may be found at implicit-pdf.github.io.
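The idea of scoring candidate poses and normalizing over a set of samples can be illustrated with a small sketch. Everything here is an assumption for illustration: `toy_score` is a hypothetical stand-in for the neural network, the object is given a made-up two-fold symmetry, and uniformly random rotations replace whatever sampling grid the authors actually use.

```python
import numpy as np


def random_rotations(n, rng):
    """Sample n rotation matrices uniformly from SO(3) via unit quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    # Standard unit-quaternion to rotation-matrix conversion, row by row.
    return np.stack([
        np.stack([1 - 2*(y*y + z*z), 2*(x*y - z*w), 2*(x*z + y*w)], axis=-1),
        np.stack([2*(x*y + z*w), 1 - 2*(x*x + z*z), 2*(y*z - x*w)], axis=-1),
        np.stack([2*(x*z - y*w), 2*(y*z + x*w), 1 - 2*(x*x + y*y)], axis=-1),
    ], axis=1)


def toy_score(rotations, modes, sharpness=10.0):
    """Hypothetical scorer: high when a candidate is near any correct pose."""
    # overlaps[n, m] = trace(modes[m]^T @ rotations[n]); it peaks at 3.
    overlaps = np.einsum("mij,nij->nm", modes, rotations)
    return sharpness * overlaps.max(axis=1)


def pose_probabilities(scores):
    """Softmax over candidates, giving one probability mass per pose."""
    weights = np.exp(scores - scores.max())  # shift for numerical stability
    return weights / weights.sum()


rng = np.random.default_rng(0)
candidates = random_rotations(5000, rng)
# Two equally valid poses: identity and a half-turn about the z-axis.
modes = np.stack([np.eye(3), np.diag([-1.0, -1.0, 1.0])])
probs = pose_probabilities(toy_score(candidates, modes))
best = candidates[np.argmax(probs)]  # most likely pose among the samples
print(best.round(2))
```

Because the probability is available at every candidate, both symmetric modes receive mass, which a single-pose regressor cannot express.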


5. [LG] XBNet: An Extremely Boosted Neural Network

T Sarkar
[KJ Somaiya College of Engineering]

Neural networks have proved to be very robust at processing unstructured data like images, text, videos, and audio. However, their performance has been observed to fall short on tabular data; hence tree-based models are preferred in such scenarios. A popular model for tabular data is boosted trees, a highly efficacious and extensively used machine learning method that also provides good interpretability compared to neural networks. In this paper, we describe a novel architecture, XBNet, which tries to combine tree-based models with neural networks to create a robust architecture trained with a novel optimization technique, Boosted Gradient Descent for Tabular Data, which increases its interpretability and performance.



[CV] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

H K Cheng, Y Tai, C Tang
[University of Illinois Urbana-Champaign & Kuaishou Technology & The Hong Kong University of Science and Technology]

[LG] Consistency Regularization for Variational Auto-Encoders

S Sinha, A B. Dieng
[University of Toronto & Google Brain]

[CL] Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

J Zhao, D Khashabi, T Khot, A Sabharwal, K Chang
[University of California, Los Angeles & Allen Institute for AI]

[LG] Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

E Nikishin, R Abachi, R Agarwal, P Bacon
[Université de Montréal & University of Toronto]