LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics (* marks papers worth particular attention)

1、[CV] DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

Y Zhang, H Ling, J Gao, K Yin, J Lafleche, A Barriuso, A Torralba, S Fidler

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time-consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator! These generated datasets can then be used for training any computer vision architecture just as real datasets are. As only a few images need to be manually segmented, it becomes possible to annotate images in extreme detail and generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts and 32 car parts. Our approach outperforms all semi-supervised baselines significantly and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.
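
The decoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the decoder reads per-pixel features built by upsampling the generator's intermediate feature maps to output resolution and concatenating them (the abstract only says the latent code is decoded), and it swaps in a nearest-centroid classifier for the trained decoder network. All function and class names are hypothetical.

```python
from math import dist

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbor upsampling of an H x W x C feature map (nested lists)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def pixel_features(fmaps, out_h, out_w):
    """Upsample every feature map to out_h x out_w and concatenate them per pixel."""
    ups = [upsample_nearest(f, out_h, out_w) for f in fmaps]
    return [[[v for u in ups for v in u[y][x]] for x in range(out_w)]
            for y in range(out_h)]

class NearestCentroid:
    """Hypothetical stand-in decoder: one mean feature vector per label."""

    def fit(self, feats, labels):
        sums, counts = {}, {}
        for f, lab in zip(feats, labels):
            acc = sums.setdefault(lab, [0.0] * len(f))
            for i, v in enumerate(f):
                acc[i] += v
            counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, f):
        # Assign the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: dist(f, self.centroids[lab]))

def segment(decoder, fmaps, out_h, out_w):
    """Label every output pixel from its concatenated generator features."""
    return [[decoder.predict(f) for f in row] for row in pixel_features(fmaps, out_h, out_w)]

# Toy "generator features": a coarse 2x2 map and a fine 4x4 map (1 channel each).
coarse = [[[0.0], [1.0]], [[0.0], [1.0]]]
fine = [[[0.5] for _ in range(4)] for _ in range(4)]
decoder = NearestCentroid().fit([[0.0, 0.5], [1.0, 0.5]], ["background", "car"])
print(segment(decoder, [coarse, fine], 4, 4)[0])  # → ['background', 'background', 'car', 'car']
```

In this toy case two labeled pixels are enough to segment the whole image; with a real generator the same fitted decoder could label every newly sampled image, which is what turns the GAN into a dataset generator.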

[爱可可AI前沿推介 (AI Frontier Picks, 4.16): Figures 1-5]

2、[CL] Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams, D Kiela
[Facebook AI Research]

A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this, we pre-train MLMs on sentences with randomly shuffled word order, and show that these models still achieve high accuracy after fine-tuning on many downstream tasks — including on tasks specifically designed to be challenging for models that ignore word order. Our models perform surprisingly well according to some parametric syntactic probes, indicating possible deficiencies in how we test representations for syntactic information. Overall, our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
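
The shuffled-order pre-training data can be produced in a few lines. This is a minimal sketch, assuming plain whitespace tokenization and shuffling within each sentence; the chunk size `n` is a hypothetical knob added here (n=1 removes all word order, larger n keeps local n-grams), and the paper's exact shuffling procedure may differ. Shuffling within a sentence keeps the sentence's bag of words, and therefore its sentence-level co-occurrence statistics, while destroying word order, which is the property the experiment isolates.

```python
import random

def shuffle_words(sentence, n=1, seed=None):
    """Shuffle a sentence's word order in chunks of n consecutive words.

    n=1 destroys all word order; a larger n keeps local n-gram order.
    Whitespace tokenization and the `n` knob are assumptions of this sketch.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    words = sentence.split()
    chunks = [words[i:i + n] for i in range(0, len(words), n)]
    random.Random(seed).shuffle(chunks)  # seeded RNG for reproducible corpora
    return " ".join(w for chunk in chunks for w in chunk)

corpus = ["the cat sat on the mat", "masked language models learn from text"]
shuffled = [shuffle_words(s, n=1, seed=i) for i, s in enumerate(corpus)]
```

The shuffled corpus would then replace the natural-order corpus as MLM pre-training input, with fine-tuning done on normal, unshuffled task data.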

3、[CV] Aligning Latent and Image Spaces to Connect the Unconnectable

I Skorokhodov, G Sotnikov, M Elhoseiny

In this work, we develop a method to generate infinite high-resolution images with diverse and complex content. It is based on a perfectly equivariant generator with synchronous interpolations in the image and latent spaces. Latent codes, when sampled, are positioned on the coordinate grid, and each pixel is computed from an interpolation of the nearby style codes. We modify the AdaIN mechanism to work in such a setup and train the generator in an adversarial setting to produce images positioned between any two latent vectors. At test time, this allows for generating complex and diverse infinite images and connecting any two unrelated scenes into a single arbitrarily large panorama. Apart from that, we introduce LHQ: a new dataset of high-resolution nature landscapes. We test the approach on LHQ, LSUN Tower and LSUN Bridge and outperform the baselines by at least 4 times in terms of quality and diversity of the produced infinite images. The project page is located at this https URL.
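
The interpolation scheme can be sketched in one dimension, the natural case for a horizontally growing panorama. This is a minimal sketch under assumptions, not the authors' code: anchors (sampled style codes) sit every `spacing` pixels, a pixel's style is the linear interpolation of its two neighboring anchors, and each style supplies a per-channel scale and bias for a spatially varying AdaIN-style modulation. The abstract does not spell out how AdaIN is modified, so `spatial_adain` and the other names are hypothetical.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def interpolated_styles(anchors, spacing, width):
    """Per-pixel style vectors for a 1-D strip of `width` pixels.

    Anchor k (a sampled style code) sits at x = k * spacing; a pixel between
    two anchors gets the linear interpolation of those two codes.
    """
    if len(anchors) <= (width - 1) // spacing:
        raise ValueError("not enough anchors to cover the strip")
    styles = []
    for x in range(width):
        k, r = divmod(x, spacing)
        nxt = anchors[min(k + 1, len(anchors) - 1)]
        styles.append(lerp(anchors[k], nxt, r / spacing))
    return styles

def spatial_adain(features, styles, eps=1e-5):
    """AdaIN-style modulation with a different style at every pixel.

    features: width x C activations; styles: width x 2C (C scales, then C biases).
    Normalization statistics are taken over this finite strip (a simplification).
    """
    width, c = len(features), len(features[0])
    out = [[0.0] * c for _ in range(width)]
    for ch in range(c):
        col = [features[x][ch] for x in range(width)]
        mean = sum(col) / width
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / width + eps)
        for x in range(width):
            scale, bias = styles[x][ch], styles[x][c + ch]
            out[x][ch] = scale * (col[x] - mean) / std + bias
    return out

# Three anchors, 4 pixels apart; with C = 1 each code is (scale, bias).
anchors = [[1.0, 0.0], [2.0, 4.0], [3.0, 0.0]]
styles = interpolated_styles(anchors, spacing=4, width=9)
feats = [[float(x)] for x in range(9)]
out = spatial_adain(feats, styles)
```

Because a pixel's style depends only on its two nearest anchors, appending a new anchor extends the strip without changing pixels already generated, which is the intuition behind connecting unrelated scenes into one panorama.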

4、[CV] Few-shot Image Generation via Cross-domain Correspondence

U Ojha, Y Li, J Lu, A A Efros, Y J Lee, E Shechtman, R Zhang
[Adobe Research & UC Davis]

Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.
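
One plausible form of the cross-domain distance consistency loss is sketched below. It assumes that, for a batch of latent codes, each instance's cosine similarities to the other instances are turned into a softmax distribution in both the source and the adapted generator, and that the loss is a KL divergence between the two. The KL direction, the use of a single feature vector per instance (rather than several intermediate layers) and all names are assumptions of this sketch; at least two instances are required.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_distributions(feats):
    """For each instance i, a softmax over its cosine similarities to all j != i."""
    dists = []
    for i, fi in enumerate(feats):
        sims = [cosine(fi, fj) for j, fj in enumerate(feats) if j != i]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        dists.append([e / z for e in exps])
    return dists

def distance_consistency_loss(source_feats, target_feats):
    """Sum over instances of KL(p_target_i || p_source_i).

    source_feats[i] and target_feats[i] are features produced from the same
    latent code z_i by the frozen source and the adapted target generator.
    """
    loss = 0.0
    for p, q in zip(similarity_distributions(target_feats),
                    similarity_distributions(source_feats)):
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss

src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(distance_consistency_loss(src, [[2 * a for a in f] for f in src]))  # ~0.0: rescaling keeps relative similarities
print(distance_consistency_loss(src, [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]))  # > 0: similarity structure changed
```

The loss penalizes changes in *relative* similarity rather than absolute feature values, so the target generator is free to change appearance while keeping instances that were distinct in the source distinct in the target.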

5、[CL] Large-Scale Self- and Semi-Supervised Learning for Speech Translation

C Wang, A Wu, J Pino, A Baevski, M Auli, A Conneau
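[Facebook AI]

This work improves speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different, complementary ways, exploring both pretraining and self-training with the large Libri-Light speech audio corpus and language modeling with CommonCrawl. Using wav2vec 2.0 pretraining and self-training, it pushes the limits of self- and semi-supervised learning for speech translation. Without any supervision beyond the CoVoST 2 data, these techniques outperform the previous state of the art by an average of 1.3 BLEU across four language directions. Unsupervised pretraining, self-training and language-model decoding are shown to be complementary, improving on previous methods by 2.6 BLEU. The result is a stronger, simpler baseline for speech translation and evidence that wav2vec 2.0 unsupervised pretraining is effective for the task.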
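
The self-training step follows the generic pseudo-labeling pattern: a teacher trained on labeled data labels the unlabeled inputs, and a student is retrained on the union. This is a minimal sketch assuming a single iteration (as the English abstract states) and a caller-supplied `train_fn` that returns a predictor; the 1-nearest-neighbor learner is a toy for illustration only and has nothing to do with the paper's wav2vec 2.0 models. All names are hypothetical.

```python
def self_train_once(train_fn, labeled, unlabeled):
    """One round of self-training (pseudo-labeling).

    train_fn(pairs) must return a predictor: a callable mapping an input to a label.
    """
    teacher = train_fn(labeled)                     # 1. fit on labeled pairs
    pseudo = [(x, teacher(x)) for x in unlabeled]   # 2. pseudo-label the rest
    student = train_fn(list(labeled) + pseudo)      # 3. refit on the union
    return student, pseudo

# Toy learner for illustration only: 1-nearest-neighbor on numbers.
def train_1nn(pairs):
    pairs = list(pairs)
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 9.0, 4.0]
student, pseudo = self_train_once(train_1nn, labeled, unlabeled)
print(pseudo)        # → [(1.0, 'a'), (9.0, 'b'), (4.0, 'a')]
print(student(5.4))  # → a (the labeled-only teacher would give b)
```

In the ST setting, the inputs would be unlabeled audio and the pseudo-labels would be translations produced by the pretrained and fine-tuned model.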

In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training by using the large Libri-Light speech audio corpus and language modeling with CommonCrawl. Our experiments improve over the previous state of the art by 2.6 BLEU on average on all four considered CoVoST 2 language pairs via a simple recipe of combining wav2vec 2.0 pretraining, a single iteration of self-training and decoding with a language model. Unlike existing work, our approach does not leverage any supervision other than ST data. Code and models will be publicly released.
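
The abstract does not say how the language model enters decoding. A common choice is shallow fusion, where each hypothesis is scored by the ST model's log-probability plus a weighted LM log-probability. The sketch below assumes that, applied as n-best rescoring rather than inside beam search for brevity; `lm_weight` and the scores in the usage are made up for illustration.

```python
def fuse_and_pick(hypotheses, lm_weight=0.5):
    """Pick the hypothesis maximizing log p_ST(y|x) + lm_weight * log p_LM(y).

    hypotheses: list of (text, st_logprob, lm_logprob) tuples, e.g. an n-best
    list from the ST model rescored with an external LM (shallow fusion).
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])[0]

# Made-up scores for illustration.
nbest = [("the house is big", -2.0, -9.0), ("the home is big", -1.8, -12.0)]
print(fuse_and_pick(nbest, lm_weight=0.0))  # → the home is big
print(fuse_and_pick(nbest, lm_weight=0.5))  # → the house is big
```

Setting `lm_weight` to 0 recovers plain ST decoding; the weight is typically tuned on a development set.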

[AS] Non-autoregressive sequence-to-sequence voice conversion

T Hayashi, W Huang, K Kobayashi, T Toda
[TARVO Inc & Nagoya University]

[CL] Modeling Framing in Immigration Discourse on Social Media

J Mendelsohn, C Budak, D Jurgens
[University of Michigan]

[CV] Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

W Lee, W Jung, H Zhang, T Chen, J Y Koh, T Huang, H Yoon, H Lee, S Hong
[KAIST & Google Research & University of Michigan]

[AI] Towards a framework for evaluating the safety, acceptability and efficacy of AI systems for health: an initial synthesis

J Morley, C Morton, K Karpathakis, M Taddeo, L Floridi
[University of Oxford]