Archive - 爱可可 AI Frontier Picks (3.28) - 《爱可可老师分享》

1、[LG] Minimum-Distortion Embedding
2、[CV] Measuring and modeling the motor system with machine learning
3、[LG] Learning Neural Event Functions for Ordinary Differential Equations
4、[CV] Multimodal Motion Prediction with Stacked Transformers
5、[CV] Efficient Visual Pretraining with Contrastive Detection
[CV] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
[CV] Meta-DETR: Few-Shot Object Detection via Unified Image-Level Meta-Learning
[LG] How to decay your learning rate
[CL] Autoregressive Entity Retrieval

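LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics (* marks papers that deserve particular attention)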

1、[LG] Minimum-Distortion Embedding

A Agrawal, A Ali, S Boyd
[Stanford University]
Minimum-distortion embedding. The paper studies the vector embedding problem: given a finite set of items, assign a representative vector to each item, possibly under constraints (for example, that the collection of vectors is standardized, with zero mean and unit covariance), and choose the embedding with the smallest total distortion. This is called the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It covers many specific embedding methods, including spectral embedding, principal component analysis, multidimensional scaling, Euclidean distance problems, dimensionality reduction methods (such as Isomap and UMAP), semi-supervised learning, sphere packing, and force-directed layout. It also admits new embeddings and offers principled ways to validate or sanity-check both old and new ones. In a few special cases the MDE problem can be solved exactly; for the rest, the authors develop a projected quasi-Newton method that approximately minimizes the distortion, scales to very large data sets, and places few assumptions on the distortion functions and constraints.
We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., have zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Our software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes. https://weibo.com/1402400261/K8dFkeOFB
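
To make the objective concrete, here is a minimal sketch of the two ingredients the abstract names: the distortion over similar pairs and the standardization constraint (zero mean, unit covariance). It is not PyMDE code. The function names, the `(i, j, weight)` edge format, and the choice of a quadratic distortion `w * d**2` are assumptions made for illustration.

```python
import math

def average_distortion(X, edges):
    """Mean distortion over the listed pairs.

    X is a list of embedding vectors; edges is a list of (i, j, weight).
    The quadratic distortion weight * d_ij**2 is an assumed, simple choice.
    """
    total = 0.0
    for i, j, weight in edges:
        d = math.dist(X[i], X[j])  # Euclidean distance between the two vectors
        total += weight * d * d
    return total / len(edges)

def is_standardized(X, tol=1e-9):
    """Check zero mean and (1/n) X^T X = I, up to a tolerance."""
    n, m = len(X), len(X[0])
    for a in range(m):
        if abs(sum(x[a] for x in X) / n) > tol:
            return False
        for b in range(m):
            second_moment = sum(x[a] * x[b] for x in X) / n
            target = 1.0 if a == b else 0.0
            if abs(second_moment - target) > tol:
                return False
    return True

# Four items whose similar pairs form a cycle 0-1-2-3-0.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
square = [(1, 1), (1, -1), (-1, -1), (-1, 1)]    # neighbors sit on adjacent corners
shuffled = [(1, 1), (-1, -1), (1, -1), (-1, 1)]  # some neighbors sit on diagonals

for name, X in [("square", square), ("shuffled", shuffled)]:
    print(name, is_standardized(X), average_distortion(X, edges))
```

Both candidate embeddings satisfy the constraint, so they can be compared on distortion alone; the one that keeps similar items on adjacent corners scores lower.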

2、[CV] Measuring and modeling the motor system with machine learning

S B. Hausmann, A M Vargas, A Mathis, M W. Mathis
[Swiss Federal Institute of Technology]
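A review of measuring and modeling the motor system with machine learning. Machine learning's usefulness for understanding the motor system promises a revolution in how data are collected, measured, and analyzed. Movement science already combines theory and engineering principles elegantly to guide experimental work. The review discusses the growing use of machine learning, from pose estimation, kinematic analysis, dimensionality reduction, and closed-loop feedback to understanding neural correlates and untangling sensorimotor systems, and gives the authors' perspective on new avenues in which markerless motion capture, biomechanical modeling, and neural networks combine into a new platform for hypothesis-driven research.
The utility of machine learning in understanding the motor system is promising a revolution in how to collect, measure, and analyze data. The field of movement science already elegantly incorporates theory and engineering principles to guide experimental work, and in this review we discuss the growing use of machine learning: from pose estimation, kinematic analyses, dimensionality reduction, and closed-loop feedback, to its use in understanding neural correlates and untangling sensorimotor systems. We also give our perspective on new avenues where markerless motion capture combined with biomechanical modeling and neural networks could be a new platform for hypothesis-driven research. https://weibo.com/1402400261/K8dMEq7g5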
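
As a small illustration of the step from pose estimation to kinematic analysis, the sketch below derives a joint angle and a frame-to-frame speed from 2-D keypoints. The keypoint layout, the function names, and the frame rate are hypothetical; this does not reproduce any specific tool discussed in the review.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norms = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

def keypoint_speeds(track, fps):
    """Finite-difference speed of one keypoint, in position units per second."""
    return [math.dist(p, q) * fps for p, q in zip(track, track[1:])]

# Hypothetical shoulder, elbow, and wrist positions from one video frame.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(joint_angle(shoulder, elbow, wrist))

# Hypothetical wrist positions over three consecutive frames at 10 frames/s.
print(keypoint_speeds([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)], fps=10))
```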

3、[LG] Learning Neural Event Functions for Ordinary Differential Equations

R T. Q. Chen, B Amos, M Nickel
[University of Toronto & Facebook AI Research]
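Learning neural event functions for ordinary differential equations. In the setting of solving ordinary differential equations (ODEs), event functions are parameterized by neural networks; they can be chained together and differentiated through. This extends Neural ODEs to implicitly defined termination times and makes it possible to model discrete events in continuous-time systems, such as the conditions for and effects of collisions in physical systems, as well as simulation-based training of temporal point processes with applications to discrete control. Neural Event ODEs can model discrete, instantaneous changes in a continuous-time system without knowing in advance when those changes should occur or how many there should be. Deterministic policies with discrete actions can be trained by exploiting gradients that exist only in the continuous-time setting.
The existing Neural ODE formulation relies on an explicit knowledge of the termination time. We extend Neural ODEs to implicitly defined termination criteria modeled by neural event functions, which can be chained together and differentiated through. Neural Event ODEs are capable of modeling discrete (instantaneous) changes in a continuous-time system, without prior knowledge of when these changes should occur or how many such changes should exist. We test our approach in modeling hybrid discrete- and continuous- systems such as switching dynamical systems and collision in multi-body systems, and we propose simulation-based training of point processes with applications in discrete control. https://weibo.com/1402400261/K8dRyyaxa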
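
The idea of an implicitly defined termination time can be sketched with a bouncing ball: integrate until an event function crosses zero, locate the crossing by root finding, apply an instantaneous state update, and chain the next segment. This covers the forward pass only. The event function here is hand-written rather than a neural network, nothing is differentiated, and the RK4 stepper, bisection search, and restitution constant are assumptions chosen for brevity, not the paper's implementation.

```python
def rk4_step(f, t, z, h):
    """One classical Runge-Kutta step for dz/dt = f(t, z); z is a list of floats."""
    k1 = f(t, z)
    k2 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k1)])
    k3 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k2)])
    k4 = f(t + h, [zi + h * k for zi, k in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def dynamics(t, z):
    """Free fall: z = [height, velocity]."""
    return [z[1], -9.8]

def ground_event(t, z):
    """Event function: the event fires when this value drops to zero."""
    return z[0]

def integrate_until_event(f, event_fn, t, z, dt=0.01, t_max=10.0, tol=1e-10):
    """Integrate until event_fn goes from positive to non-positive.

    Returns (event_time, state_at_event), or (None, state) if no event occurs.
    """
    g_prev = event_fn(t, z)
    while t < t_max:
        z_next = rk4_step(f, t, z, dt)
        g_next = event_fn(t + dt, z_next)
        if g_prev > 0 and g_next <= 0:
            lo, hi = 0.0, dt  # bisect on the partial step length
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if event_fn(t + mid, rk4_step(f, t, z, mid)) > 0:
                    lo = mid
                else:
                    hi = mid
            return t + hi, rk4_step(f, t, z, hi)
        t, z, g_prev = t + dt, z_next, g_next
    return None, z

def simulate_bounces(n_events, height=4.9, restitution=0.5):
    """Chain event-terminated segments; each event reverses and damps velocity."""
    t, z = 0.0, [height, 0.0]
    times = []
    for _ in range(n_events):
        t, z = integrate_until_event(dynamics, ground_event, t, z)
        if t is None:
            break
        times.append(t)
        z = [0.0, -restitution * z[1]]  # instantaneous state update at the event
    return times

print(simulate_bounces(3))
```

Neither the number of bounces nor their times are supplied to the solver; both fall out of the event function, which is the property the abstract highlights.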

4、[CV] Multimodal Motion Prediction with Stacked Transformers

Y Liu, J Zhang, L Fang, Q Jiang, B Zhou
[The Chinese University of Hong Kong & SenseTime Research]
Multimodal motion prediction with stacked Transformers. The paper proposes mmTransformer, a new Transformer framework for multimodal motion prediction. Its network architecture, built from stacked Transformers, models multimodality at the feature level with a fixed set of independent candidate proposals, and a region-based training strategy encourages multimodality in the generated proposals. Experiments on the Argoverse dataset show that the model achieves state-of-the-art motion prediction performance and substantially improves the diversity and accuracy of the predicted trajectories.
Predicting multiple plausible future trajectories of the nearby vehicles is crucial for the safety of autonomous driving. Recent motion prediction approaches attempt to achieve such multimodal motion prediction by implicitly regularizing the feature or explicitly generating multiple candidate proposals. However, it remains challenging since the latent features may concentrate on the most frequent mode of the data while the proposal-based methods depend largely on the prior knowledge to generate and select the proposals. In this work, we propose a novel transformer framework for multimodal motion prediction, termed as mmTransformer. A novel network architecture based on stacked transformers is designed to model the multimodality at feature level with a set of fixed independent proposals. A region-based training strategy is then developed to induce the multimodality of the generated proposals. Experiments on Argoverse dataset show that the proposed model achieves the state-of-the-art performance on motion prediction, substantially improving the diversity and the accuracy of the predicted trajectories. Demo video and code are available at this https URL. https://weibo.com/1402400261/K8dVQ7Fqa
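
One way to read the region-based training strategy is that each fixed proposal is tied to a spatial region, and only the proposals tied to the region containing the ground-truth endpoint receive the regression loss. The sketch below illustrates that reading under loud assumptions: the angular-sector partition, the average-displacement loss, and all function names are hypothetical and are not taken from the paper or its code.

```python
import math

def region_of(endpoint, n_regions):
    """Hypothetical region rule: equal angular sectors around the origin."""
    angle = math.atan2(endpoint[1], endpoint[0]) % (2 * math.pi)
    return min(int(angle / (2 * math.pi / n_regions)), n_regions - 1)

def average_displacement(pred, truth):
    """Mean pointwise Euclidean distance between two trajectories."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)

def region_based_loss(proposals, proposal_regions, truth, n_regions):
    """Score only the proposals assigned to the ground-truth endpoint's region.

    Returns (ground_truth_region, loss); loss is None if the region is empty.
    """
    gt_region = region_of(truth[-1], n_regions)
    in_region = [p for p, r in zip(proposals, proposal_regions) if r == gt_region]
    if not in_region:
        return gt_region, None
    return gt_region, min(average_displacement(p, truth) for p in in_region)

truth = [(1.0, 1.0), (2.0, 2.0)]
proposals = [
    [(1.0, 1.0), (2.0, 3.0)],      # assigned to region 0
    [(1.0, 0.0), (2.0, 0.0)],      # assigned to region 0
    [(-1.0, -1.0), (-2.0, -2.0)],  # assigned to region 2, ignored for this sample
]
proposal_regions = [0, 0, 2]
print(region_based_loss(proposals, proposal_regions, truth, n_regions=4))
```

Because proposals outside the ground-truth region are never pulled toward this sample, each group of proposals stays specialized to its own region, which is one plausible mechanism for keeping the predicted modes diverse.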

5、[CV] Efficient Visual Pretraining with Contrastive Detection

O J. Hénaff, S Koppula, J Alayrac, A v d Oord, O Vinyals, J Carreira
[DeepMind]
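Efficient visual pretraining with contrastive detection. The paper proposes DetCon, a simple yet powerful variant of existing self-supervised learning algorithms such as SimCLR and BYOL. It formulates a new contrastive objective that maximizes the similarity of object-level features across augmented views of a scene, with object regions supplied by a simple unsupervised heuristic. The objective eases the computational burden of self-supervised transfer learning: by using low-level cues to organize images into entities such as objects and background regions, DetCon makes pretraining on large datasets up to 5x more efficient while also improving the accuracy of the learned representations on downstream tasks. Its best model achieves state-of-the-art performance among self-supervised methods pretrained on ImageNet, on par with a recent state-of-the-art approach (SEER) that trains a larger model on a much larger dataset.
Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost however, with state-of-the-art methods requiring an order of magnitude more computation than supervised pretraining. We tackle this computational bottleneck by introducing a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations. This objective extracts a rich learning signal per image, leading to state-of-the-art transfer performance from ImageNet to COCO, while requiring up to 5x less pretraining. In particular, our strongest ImageNet-pretrained model performs on par with SEER, one of the largest self-supervised systems to date, which uses 1000x more pretraining data. Finally, our objective seamlessly handles pretraining on more complex images such as those in COCO, closing the gap with supervised transfer learning from COCO to PASCAL. https://weibo.com/1402400261/K8dZ4telK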
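
The core of an object-level contrastive objective can be sketched in two steps: pool the features inside each region mask, then ask each pooled vector from one view to match the same region in the other view rather than any other region. The sketch below is a simplified, assumption-laden illustration. It omits the projection heads and the negatives drawn from other images, and the function names, toy features, and temperature value are hypothetical rather than DetCon's actual code.

```python
import math

def mask_pool(features, mask_ids, n_masks):
    """Average the feature vectors of all locations that share a mask id."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_masks)]
    counts = [0] * n_masks
    for vec, m in zip(features, mask_ids):
        counts[m] += 1
        for d in range(dim):
            sums[m][d] += vec[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def contrastive_loss(pooled_a, pooled_b, temperature=0.1):
    """Cross-entropy where region m in view A should match region m in view B."""
    a = [normalize(v) for v in pooled_a]
    b = [normalize(v) for v in pooled_b]
    total = 0.0
    for m, va in enumerate(a):
        logits = [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += log_denominator - logits[m]
    return total / len(a)

# Toy example: four spatial locations, two regions (say, object and background).
view_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
view_b = [[2.0, 0.0], [2.0, 0.0], [0.0, 3.0], [0.0, 3.0]]
masks = [0, 0, 1, 1]

pooled_a = mask_pool(view_a, masks, n_masks=2)
pooled_b = mask_pool(view_b, masks, n_masks=2)
print(contrastive_loss(pooled_a, pooled_b))
```

The loss is small when the two views agree region by region and large when the regions are mismatched, so each image contributes one learning signal per region rather than one per image.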

A few other papers worth noting:

[CV] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Sewer-ML: a multi-label dataset and benchmark for sewer defect classification
J B Haurum, T B. Moeslund
[Aalborg University]
https://weibo.com/1402400261/K8e2Qdp4u
