Search Results for author: Stephen Lin

Found 65 papers, 36 papers with code

Towards Tokenized Human Dynamics Representation

1 code implementation 22 Nov 2021 Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.

Action Segmentation Action Understanding +4

The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

1 code implementation NeurIPS 2021 Runtao Liu, Zhirong Wu, Stella X. Yu, Stephen Lin

Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.

Contrastive Learning Fine-tuning +2

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation Object Detection

Self-supervised Discovery of Human Actons from Long Kinematic Videos

no code implementations 29 Sep 2021 Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.

Action Understanding Tokenization

ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection

1 code implementation 9 Sep 2021 Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon

A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.

Human-Object Interaction Detection

Video Swin Transformer

5 code implementations 24 Jun 2021 Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

Ranked #1 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +3

Aligning Pretraining for Detection via Object-Level Contrastive Learning

1 code implementation NeurIPS 2021 Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.

Contrastive Learning Object Detection +3

Neural Articulated Radiance Field

1 code implementation ICCV 2021 Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada

We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

32 code implementations ICCV 2021 Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.

Ranked #3 on Semantic Segmentation on FoodSeg103 (using extra training data)

Image Classification Instance Segmentation +2

Instance Localization for Self-supervised Detection Pretraining

1 code implementation CVPR 2021 Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin

The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes.

Classification General Classification +4

Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

1 code implementation 4 Feb 2021 Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.

Instance Segmentation Monocular Depth Estimation +4

Global Context Networks

3 code implementations 24 Dec 2020 Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.

Instance Segmentation

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning

5 code implementations CVPR 2021 Zhenda Xie, Yutong Lin, Zheng Zhang, Yue Cao, Stephen Lin, Han Hu

We argue that the power of contrastive learning has yet to be fully unleashed, as current methods are trained only on instance-level pretext tasks, leading to representations that may be sub-optimal for downstream tasks requiring dense pixel predictions.

Contrastive Learning Object Detection +2

Object-based Illumination Estimation with Rendering-aware Neural Networks

no code implementations ECCV 2020 Xin Wei, Guojun Chen, Yue Dong, Stephen Lin, Xin Tong

With the estimated lighting, virtual objects can be rendered in AR scenarios with shading that is consistent with the real scene, leading to improved realism.

SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

1 code implementation ECCV 2020 Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin

With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference.

3D Human Pose Estimation

Detecting Human-Object Interactions with Action Co-occurrence Priors

1 code implementation 17 Jul 2020 Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon

A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.

Human-Object Interaction Detection

RepPoints V2: Verification Meets Regression for Object Detection

1 code implementation NeurIPS 2020 Yihong Chen, Zheng Zhang, Yue Cao, Li-Wei Wang, Stephen Lin, Han Hu

Though RepPoints provides high performance, we find that its heavy reliance on regression for object localization leaves room for improvement.

Instance Segmentation Object Detection +2

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

1 code implementation ECCV 2020 Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.

Instance Segmentation Object Detection +2

Disentangled Non-Local Neural Networks

4 code implementations ECCV 2020 Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu

This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.

Action Recognition Object Detection +1

Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

1 code implementation ECCV 2020 Zhenda Xie, Zheng Zhang, Xizhou Zhu, Gao Huang, Stephen Lin

In the feature maps of CNNs, there commonly exists considerable spatial redundancy that leads to much repetitive processing.

Cross-Iteration Batch Normalization

2 code implementations CVPR 2021 Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied.

Image Classification Object Detection

Dense RepPoints: Representing Visual Objects with Dense Point Sets

2 code implementations ECCV 2020 Ze Yang, Yinghao Xu, Han Xue, Zheng Zhang, Raquel Urtasun, Li-Wei Wang, Stephen Lin, Han Hu

We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level.

Object Detection

Instance-wise Depth and Motion Learning from Monocular Videos

1 code implementation 19 Dec 2019 Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.

Instance Segmentation Monocular Depth Estimation +3

Leveraging Multi-view Image Sets for Unsupervised Intrinsic Image Decomposition and Highlight Separation

no code implementations 17 Nov 2019 Renjiao Yi, Ping Tan, Stephen Lin

We present an unsupervised approach for factorizing object appearance into highlight, shading, and albedo layers, trained by multi-view real images.

Intrinsic Image Decomposition

Single Image Reflection Removal through Cascaded Refinement

2 code implementations CVPR 2020 Chao Li, Yixiao Yang, Kun He, Stephen Lin, John E. Hopcroft

IBCLN is a cascaded network that iteratively refines the estimates of the transmission and reflection layers so that each can boost the prediction quality of the other, and information across steps of the cascade is transferred using an LSTM.

Community Detection Reflection Removal

Learning Residual Flow as Dynamic Motion from Stereo Videos

no code implementations 16 Sep 2019 Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon

Based on rigid projective geometry, the estimated stereo depth is used to guide the camera motion estimation, and the depth and camera motion are used to guide the residual flow estimation.

Depth And Camera Motion Motion Estimation +4

DPSNet: End-to-end Deep Plane Sweep Stereo

1 code implementation ICLR 2019 Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon

The cost volume is constructed using a differentiable warping process that allows for end-to-end training of the network.

Optical Flow Estimation

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

9 code implementations 25 Apr 2019 Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation.

Instance Segmentation Object Detection +1

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

1 code implementation ICCV 2019 Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance.

Mimicking the In-Camera Color Pipeline for Camera-Aware Object Compositing

no code implementations 27 Mar 2019 Jun Gao, Xiao Li, Li-Wei Wang, Sanja Fidler, Stephen Lin

We present a method for compositing virtual objects into a photograph such that the object colors appear to have been processed by the photo's camera imaging pipeline.

Angle-Closure Detection in Anterior Segment OCT based on Multi-Level Deep Network

no code implementations 10 Feb 2019 Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, Mani Baskaran, Meenakshi Mahesh, Tin Aung, Jiang Liu

A Multi-Level Deep Network (MLDN) is proposed to formulate this learning, which utilizes three particular AS-OCT regions based on clinical priors: the global anterior segment structure, local iris region, and anterior chamber angle (ACA) patch.

Deep Metric Transfer for Label Propagation with Limited Annotated Data

1 code implementation 20 Dec 2018 Bin Liu, Zhirong Wu, Han Hu, Stephen Lin

In this paper, we propose a generic framework that utilizes unlabeled data to aid generalization for all three tasks.

Metric Learning Object Recognition +1

Deformable ConvNets v2: More Deformable, Better Results

18 code implementations CVPR 2019 Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects.

Instance Segmentation Object Detection +1

Explicit Spatiotemporal Joint Relation Learning for Tracking Human Pose

no code implementations 17 Nov 2018 Xiao Sun, Chuankang Li, Stephen Lin

We present a method for human pose tracking that is based on learning spatiotemporal relationships among joints.

Optical Flow Estimation Pose Estimation +1

Recurrent Transformer Networks for Semantic Correspondence

1 code implementation NeurIPS 2018 Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn

Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations.

General Classification Semantic correspondence

An Integral Pose Regression System for the ECCV2018 PoseTrack Challenge

1 code implementation 17 Sep 2018 Xiao Sun, Chuankang Li, Stephen Lin

For the ECCV 2018 PoseTrack Challenge, we present a 3D human pose estimation system based mainly on the integral human pose regression method.

3D Human Pose Estimation

Multi-Context Deep Network for Angle-Closure Glaucoma Screening in Anterior Segment OCT

no code implementations 10 Sep 2018 Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, Baskaran Mani, Meenakshi Mahesh, Tin Aung, Jiang Liu

A major cause of irreversible visual impairment is angle-closure glaucoma, which can be screened through imagery from Anterior Segment Optical Coherence Tomography (AS-OCT).

General Classification

A High-Quality Denoising Dataset for Smartphone Cameras

no code implementations CVPR 2018 Abdelrahman Abdelhamed, Stephen Lin, Michael S. Brown

We propose a systematic procedure for estimating ground truth for noisy images that can be used to benchmark denoising performance for smartphone cameras.

Image Denoising

Faces as Lighting Probes via Unsupervised Deep Highlight Extraction

no code implementations ECCV 2018 Renjiao Yi, Chenyang Zhu, Ping Tan, Stephen Lin

We present a method for estimating detailed scene illumination using human faces in a single image.

Exposure: A White-Box Photo Post-Processing Framework

1 code implementation 27 Sep 2017 Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin

Retouching can significantly elevate the visual appeal of photos, but many casual photographers lack the expertise to do this well.

DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

no code implementations ICCV 2017 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor.

Affine Transformation Semantic correspondence

Radiometric Calibration From Faces in Images

no code implementations CVPR 2017 Chen Li, Stephen Lin, Kun Zhou, Katsushi Ikeuchi

We present a method for radiometric calibration of cameras from a single image that contains a human face.

Specular Highlight Removal in Facial Images

no code implementations CVPR 2017 Chen Li, Stephen Lin, Kun Zhou, Katsushi Ikeuchi

An important practical feature of the proposed method is that the skin color model is utilized in a way that does not require color calibration of the camera.

FC4: Fully Convolutional Color Constancy With Confidence-Weighted Pooling

1 code implementation CVPR 2017 Yuanming Hu, Baoyuan Wang, Stephen Lin

However, the patch-based CNNs that exist for this problem are faced with the issue of estimation ambiguity, where a patch may contain insufficient information to establish a unique or even a limited possible range of illumination colors.

Color Constancy

FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence

1 code implementation CVPR 2017 Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn

The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner.

Semantic correspondence

Image Deblurring Using Smartphone Inertial Sensors

no code implementations CVPR 2016 Zhe Hu, Lu Yuan, Stephen Lin, Ming-Hsuan Yang

Removing image blur caused by camera shake is an ill-posed problem, as both the latent image and the point spread function (PSF) are unknown.

Deblurring Image Deblurring

Deep Self-Convolutional Activations Descriptor for Dense Cross-Modal Correspondence

no code implementations 21 Mar 2016 Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

We present a novel descriptor, called deep self-convolutional activations (DeSCA), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions.

Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability

no code implementations ICCV 2015 Jingwei Huang, Huarong Chen, Bin Wang, Stephen Lin

We present an automatic thumbnail generation technique based on two essential considerations: how well they visually represent the original photograph, and how well the foreground can be recognized after the cropping and downsizing steps of thumbnailing.

Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders

no code implementations ICCV 2015 Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo

With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video.

Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation

no code implementations CVPR 2015 Hyeokhyen Kwon, Yu-Wing Tai, Stephen Lin

Depth maps captured by consumer-level depth cameras such as Kinect are usually degraded by noise, missing values, and quantization.

Dictionary Learning Quantization

Object-Based RGBD Image Co-Segmentation With Mutex Constraint

no code implementations CVPR 2015 Huazhu Fu, Dong Xu, Stephen Lin, Jiang Liu

We present an object-based co-segmentation method that takes advantage of depth data and is able to correctly handle noisy images in which the common foreground object is missing.

A Learning-to-Rank Approach for Image Color Enhancement

no code implementations CVPR 2014 Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang

We present a machine-learned ranking approach for automatically enhancing the color of a photograph.

Learning-To-Rank

Object-based Multiple Foreground Video Co-segmentation

no code implementations CVPR 2014 Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin

We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set.

Bayesian Depth-from-Defocus with Shading Constraints

no code implementations CVPR 2013 Chen Li, Shuochen Su, Yasuyuki Matsushita, Kun Zhou, Stephen Lin

We present a method that enhances the performance of depth-from-defocus (DFD) through the use of shading information.

Depth Estimation

Shading-Based Shape Refinement of RGB-D Images

no code implementations CVPR 2013 Lap-Fai Yu, Sai-Kit Yeung, Yu-Wing Tai, Stephen Lin

We present a shading-based shape refinement algorithm which uses a noisy, incomplete depth map from Kinect to help resolve ambiguities in shape-from-shading.
