LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1. [LG] Revisiting the Calibration of Modern Neural Networks

M Minderer, J Djolonga, R Romijnders, F Hubis, X Zhai, N Houlsby, D Tran, M Lucic
[Google Research]

Accurate estimation of predictive uncertainty (model calibration) is essential for the safe application of neural networks. Many instances of miscalibration in modern neural networks have been reported, suggesting a trend that newer, more accurate models produce poorly calibrated predictions. Here, we revisit this question for recent state-of-the-art image classification models. We systematically relate model calibration and accuracy, and find that the most recent models, notably those not using convolutions, are among the best calibrated. Trends observed in prior model generations, such as decay of calibration with distribution shift or model size, are less pronounced in recent architectures. We also show that model size and amount of pretraining do not fully explain these differences, suggesting that architecture is a major determinant of calibration properties.
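Calibration, in the sense used above, is often quantified with the expected calibration error (ECE): predictions are grouped by confidence, and the gap between average confidence and observed accuracy is averaged over the groups. The abstract does not name a metric, so the choice of ECE here is an assumption, as are the function name `expected_calibration_error` and the default of 10 equal-width bins. A minimal sketch:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction matched the label.
    n_bins:      number of equal-width confidence bins (illustrative default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_confidence)
    return ece


# A model that reports 90% confidence but is right 3 times out of 4 is overconfident.
overconfident_ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

A well calibrated model would score close to zero. The value depends on the binning scheme, so comparisons across models are only meaningful with the same `n_bins`.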


2. [LG] Tree-Values: selective inference for regression trees

A C. Neufeld, L L. Gao, D M. Witten
[University of Washington & University of Waterloo]

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.


3. [LG] On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

S Akiyama, T Suzuki
[University of Tokyo]

Deep learning empirically achieves high performance in many applications, but its training dynamics have not been fully understood theoretically. In this paper, we theoretically analyze the training of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. We show that with a specific regularization and sufficient overparameterization, the student network can identify the parameters of the teacher network with high probability via gradient descent with a norm-dependent stepsize, even though the objective function is highly non-convex. The key theoretical tools are the measure representation of neural networks and a novel application of a dual certificate argument for sparse estimation on a measure space. We analyze the global minima and the global convergence property in the measure space.
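The teacher-student setting can be made concrete with a small simulation: a fixed two-layer ReLU teacher labels random inputs, and a wider student is fitted to those labels by gradient descent. The sketch below is only an illustration of the setup, not the paper's method: it uses plain full-batch gradient descent with a constant step size, no regularization, and trains only the hidden-layer weights. All names, widths, and hyperparameters are assumptions chosen for the example.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def forward(hidden, outer, x):
    """Two-layer ReLU network: f(x) = sum_j outer[j] * relu(<hidden[j], x>)."""
    return sum(a * relu(dot(w, x)) for a, w in zip(outer, hidden))


def squared_loss(hidden, outer, data):
    """Half mean squared error over (input, target) pairs."""
    return sum((forward(hidden, outer, x) - y) ** 2 for x, y in data) / (2 * len(data))


def gradient_step(hidden, outer, data, lr):
    """One full-batch gradient descent step on the hidden-layer weights only."""
    grads = [[0.0] * len(w) for w in hidden]
    for x, y in data:
        err = forward(hidden, outer, x) - y
        for j, (a, w) in enumerate(zip(outer, hidden)):
            if dot(w, x) > 0.0:  # ReLU passes gradient only for active units
                for k, xk in enumerate(x):
                    grads[j][k] += err * a * xk / len(data)
    return [[wi - lr * gi for wi, gi in zip(w, g)] for w, g in zip(hidden, grads)]


def run_experiment(seed=0, dim=2, teacher_width=2, student_width=8,
                   n_samples=50, steps=200, lr=0.05):
    """Fit an overparameterized student to a teacher; return (initial, final) loss."""
    rng = random.Random(seed)

    def gaussian_vector():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Teacher: unknown to the student, observed only through its outputs.
    teacher_hidden = [gaussian_vector() for _ in range(teacher_width)]
    teacher_outer = [1.0] * teacher_width
    inputs = [gaussian_vector() for _ in range(n_samples)]
    data = [(x, forward(teacher_hidden, teacher_outer, x)) for x in inputs]

    # Student: more hidden units than the teacher (overparameterization).
    student_hidden = [gaussian_vector() for _ in range(student_width)]
    student_outer = [1.0 / student_width] * student_width

    initial_loss = squared_loss(student_hidden, student_outer, data)
    for _ in range(steps):
        student_hidden = gradient_step(student_hidden, student_outer, data, lr)
    final_loss = squared_loss(student_hidden, student_outer, data)
    return initial_loss, final_loss


initial_loss, final_loss = run_experiment()
```

Comparing `initial_loss` and `final_loss` shows how far the student has moved toward the teacher's outputs. Recovering the teacher's parameters, as opposed to merely lowering the loss, is the harder question the paper addresses.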


4. [CL] Modeling Worlds in Text

P Ammanabrolu, M O. Riedl
[Georgia Institute of Technology]

We provide a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. Interactive narratives (or text-adventure games) are partially observable environments structured as long puzzles or quests in which an agent perceives and interacts with the world purely through textual natural language. Each individual game typically contains hundreds of locations, characters, and objects, each with their own unique descriptions, providing an opportunity to study the problem of giving language-based agents the structured memory necessary to operate in such worlds. Our dataset provides 24198 mappings between rich natural language observations and: (1) knowledge graphs that reflect the world state in the form of a map; (2) natural language actions that are guaranteed to cause a change in that particular world state. The training data is collected across 27 games in multiple genres and contains a further 7836 held-out instances over 9 additional games in the test set. We further provide baseline models using rules-based, question-answering, and sequence learning approaches in addition to an analysis of the data and corresponding learning tasks.
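One way to picture a single mapping from the description above is as a record that pairs an observation with a set of knowledge-graph triples and a list of state-changing actions. The sketch below is purely illustrative: the class name `WorldModelExample`, its fields, the helper `graph_diff`, and the sample text are all hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# A knowledge-graph edge as (subject, relation, object).
Triple = Tuple[str, str, str]


@dataclass
class WorldModelExample:
    """Hypothetical record: one observation with its graph and valid actions."""
    observation: str
    graph: Set[Triple] = field(default_factory=set)
    valid_actions: List[str] = field(default_factory=list)


def graph_diff(before: Set[Triple], after: Set[Triple]):
    """Return (added, removed) triples between two world states."""
    return after - before, before - after


# Invented sample in the style of a text-adventure game.
example = WorldModelExample(
    observation="You are in a field west of a house. There is a small mailbox here.",
    graph={("you", "in", "field"), ("mailbox", "in", "field")},
    valid_actions=["open mailbox", "go north"],
)

# World state after a hypothetical "open mailbox" action.
state_after_open = example.graph | {("mailbox", "is", "open")}
added, removed = graph_diff(example.graph, state_after_open)
```

Representing the graph as a set of triples makes the effect of an action easy to express as a set difference, which matches the abstract's framing of actions as changes to the world state.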


5. [LG] Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

K Ahuja, E Caballero, D Zhang, Y Bengio, I Mitliagkas, I Rish

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.



[LG] A Neural Tangent Kernel Perspective of GANs

J Franceschi, E d Bézenac, I Ayed, M Chen, S Lamprier, P Gallinari
[Sorbonne Université & Valeo.ai]

[CV] SinIR: Efficient General Image Manipulation with Single Image Reconstruction

J Yoo, Q Chen
[Hong Kong University of Science and Technology]

[LG] Partial success in closing the gap between human and machine vision

R Geirhos, K Narayanappa, B Mitzkus, T Thieringer, M Bethge, F A. Wichmann, W Brendel
[University of Tübingen]

[LG] Training Graph Neural Networks with 1000 Layers

G Li, M Müller, B Ghanem, V Koltun
[Intel Labs]