Improving Convolutional Networks with Self-Calibrated Convolutions

Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets

Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion

Spatially Attentive Output Layer for Image Classification

AugFPN: Improving Multi-scale Feature Learning for Object Detection

Noise-Aware Fully Webly Supervised Object Detection

Learning a Unified Sample Weighting Network for Object Detection

D2Det: Towards High Quality Object Detection and Instance Segmentation

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

Scale-Equalizing Pyramid Convolution for Object Detection

Revisiting the Sibling Head in Object Detector

Detection in Crowded Scenes: One Proposal, Multiple Predictions

Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

BiDet: An Efficient Binarized Object Detector

Harmonizing Transferability and Discriminability for Adapting Object Detectors

CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

EfficientDet: Scalable and Efficient Object Detection

Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection

What You See is What You Get: Exploiting Visibility for 3D Object Detection

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

Structure Aware Single-stage 3D Object Detection from Point Cloud

IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

3DSSD: Point-based 3D Single Stage Object Detector

Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

DSGN: Deep Stereo Geometry Network for 3D Object Detection

LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

D3S — A Discriminative Single Shot Segmentation Tracker

ROAM: Recurrently Optimizing Tracking Model

Siam R-CNN: Visual Tracking by Re-Detection

Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises

High-Performance Long-Term Tracking with Meta-Updater

AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization

Probabilistic Regression for Visual Tracking

MAST: A Memory-Augmented Self-supervised Tracker

Siamese Box Adaptive Network for Visual Tracking

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

Single-Stage Semantic Segmentation from Image Labels

Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Temporally Distributed Networks for Fast Video Segmentation

Context Prior for Scene Segmentation

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks

Learning Dynamic Routing for Semantic Segmentation

PolarMask: Single Shot Instance Segmentation with Polar Representation

CenterMask: Real-Time Anchor-Free Instance Segmentation

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Deep Snake for Real-Time Instance Segmentation

Mask Encoding for Single Shot Instance Segmentation

Pixel Consensus Voting for Panoptic Segmentation

BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation


A Transductive Approach for Video Object Segmentation

State-Aware Tracker for Real-Time Video Object Segmentation

Learning Fast and Robust Target Models for Video Object Segmentation

Learning Video Object Segmentation from Unlabeled Videos

Densely Connected Search Space for More Flexible Neural Architecture Search

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

Neural Architecture Search for Lightweight Non-Local Networks

Rethinking Performance Estimation in Neural Architecture Search

CARS: Continuous Evolution for Efficient Neural Architecture Search

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Semantically Multi-modal Image Synthesis

Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping

Learning to Cartoonize Using White-box Cartoon Representations

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking

Pose-guided Visible Part Matching for Occluded Person ReID

Weakly Supervised Discriminative Feature Learning with State Information for Person Identification

Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

Grid-GCN for Fast and Scalable Point Cloud Learning

FPConv: Learning Local Flattening for Point Convolution

Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

Learning to Segment 3D Point Clouds in 2D Image Space

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

RPM-Net: Robust Point Matching using Learned Features

Learning Meta Face Recognition in Unseen Domains

FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Distribution-Aware Coordinate Representation for Human Pose Estimation

Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach

Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data

Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

VIBE: Video Inference for Human Body Pose and Shape Estimation

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation

Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Learning Texture Transformer Network for Image Super-Resolution

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

Structure-Preserving Super Resolution with Gradient Guidance

Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy


TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

Space-Time-Aware Multi-Resolution Video Enhancement

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Forward and Backward Information Retention for Accurate Binary Neural Networks

Towards Efficient Model Compression via Learned Global Ranking

HRank: Filter Pruning using High-Rank Feature Map

Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

Intra- and Inter-Action Understanding via Temporal Action Parsing

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

TEA: Temporal Excitation and Aggregation for Action Recognition

X3D: Expanding Architectures for Efficient Video Recognition

Temporal Pyramid Network for Action Recognition

Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

Bi3D: Stereo Depth Estimation via Binary Classifications

AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

3D Packing for Self-Supervised Monocular Depth Estimation

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

EPOS: Estimating 6D Pose of Objects with Symmetries

G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders

CycleISP: Real Image Restoration via Improved Data Synthesis

Detail-recovery Image Deraining via Context Aggregation Networks

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Scene-Adaptive Video Frame Interpolation via Meta-Learning

Softmax Splatting for Video Frame Interpolation

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

Detailed 2D-3D Joint Representation for Human-Object Interaction

Cascaded Human-Object Interaction Recognition

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps

Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance

Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

STEFANN: Scene Text Editor using Font Adaptive Neural Network

Interactive Object Segmentation with Inside-Outside Guidance

Video Panoptic Segmentation

FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset

TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style

Oops! Predicting Unintentional Action in Video

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

Open Compound Domain Adaptation

KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations

AvatarMe: Realistically Renderable 3D Facial Reconstruction “in-the-wild”

Learning to Autofocus

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

Deep Homography Estimation for Dynamic Scenes

Assessing Image Quality Issues for Real-World Problems

PANDA: A Gigapixel-level Human-centric Video Dataset

IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

Learning to Learn Single Domain Generalization

Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

QEBA: Query-Efficient Boundary-Based Blackbox Attack

Equalization Loss for Long-Tailed Object Recognition

Instance-aware Image Colorization

Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Epipolar Transformers

Bringing Old Photos Back to Life

MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask

Self-Supervised Viewpoint Learning from Image Collections

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

Towards Learning Structure via Consensus for Face Segmentation and Parsing

Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging

Lightweight Photometric Stereo for Facial Details Recovery

Footprints and Free Space from a Single Color Image

Self-Supervised Monocular Scene Flow Estimation

Quasi-Newton Solver for Robust Non-Rigid Registration

DeepFLASH: An Efficient Network for Learning-based Medical Image Registration

Self-Supervised Scene De-occlusion

Polarized Reflection Removal with Perfect Alignment in the Wild

Background Matting: The World is Your Green Screen

What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Video Object Grounding using Semantic Roles in Language Description

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization

On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

GhostNet: More Features from Cheap Operations

AdderNet: Do We Really Need Multiplications in Deep Learning?

Deep Image Harmonization via Domain Verification

Blurry Video Frame Interpolation

Extremely Dense Point Correspondences using a Learned Feature Descriptor

Filter Grafting for Deep Neural Networks

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

Detecting Attended Visual Targets in Video

Deep Image Spatial Transformation for Person Image Generation

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications


