1 code implementation • 19 Dec 2022 • Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu
Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language.
no code implementations • 21 Nov 2022 • Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato
Image cropping has progressed tremendously under the data-driven paradigm.
no code implementations • 15 Nov 2022 • Kun He, Chang Liu, Stephen Lin, John E. Hopcroft
Further combination with our feature augmentation techniques, termed LOMA_IF&FO, continues to strengthen the model and outperforms advanced intensity transformation methods for data augmentation.
no code implementations • 3 Nov 2022 • Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao
In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.
Ranked #3 on Action Recognition In Videos on Kinetics-400
no code implementations • 19 Sep 2022 • Sunghwan Hong, Seokju Cho, Seungryong Kim, Stephen Lin
Current state-of-the-art methods are Transformer-based approaches that focus on either feature descriptors or cost volume aggregation.
Ranked #1 on Geometric Matching on HPatches
1 code implementation • 22 Jul 2022 • Sunghwan Hong, Seokju Cho, Jisu Nam, Stephen Lin, Seungryong Kim
However, the tokenization of a correlation map for transformer processing can be detrimental, because the discontinuity at token boundaries reduces the local context available near the token edges and decreases inductive bias.
Ranked #1 on Semantic correspondence on PF-WILLOW
1 code implementation • 20 Jul 2022 • Zhihang Zhong, Xiao Sun, Zhirong Wu, Yinqiang Zheng, Stephen Lin, Imari Sato
Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region.
1 code implementation • 9 Jun 2022 • Zhirong Wu, Zihang Lai, Xiao Sun, Stephen Lin
The paper presents a scalable approach for learning spatially distributed visual representations over individual tokens and a holistic instance representation simultaneously.
no code implementations • 19 Apr 2022 • Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada
We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects, in which no image-pose pairs or foreground masks are used for training.
3 code implementations • 5 Apr 2022 • Jiequan Cui, Yuhui Yuan, Zhisheng Zhong, Zhuotao Tian, Han Hu, Stephen Lin, Jiaya Jia
In this paper, we study the problem of class imbalance in semantic segmentation.
Ranked #18 on Semantic Segmentation on ADE20K
1 code implementation • 12 Mar 2022 • Zhihang Zhong, Mingdeng Cao, Xiao Sun, Zhirong Wu, Zhongyi Zhou, Yinqiang Zheng, Stephen Lin, Imari Sato
In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task.
2 code implementations • CVPR 2022 • Yutong Chen, Fangyun Wei, Xiao Sun, Zhirong Wu, Stephen Lin
Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.
Ranked #2 on Sign Language Translation on CSL-Daily
no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin
Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.
1 code implementation • 22 Nov 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.
1 code implementation • NeurIPS 2021 • Runtao Liu, Zhirong Wu, Stella X. Yu, Stephen Lin
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
Ranked #12 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
1 code implementation • NeurIPS 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai
We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.
no code implementations • 29 Sep 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.
1 code implementation • 9 Sep 2021 • Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.
Ranked #31 on Human-Object Interaction Detection on HICO-DET
12 code implementations • CVPR 2022 • Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.
Ranked #21 on Action Classification on Kinetics-600 (using extra training data)
1 code implementation • NeurIPS 2021 • Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
1 code implementation • ICCV 2021 • Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada
We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.
62 code implementations • ICCV 2021 • Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
Ranked #2 on Image Classification on OmniBenchmark
1 code implementation • CVPR 2021 • Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin
The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes.
1 code implementation • 4 Feb 2021 • Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
3 code implementations • 24 Dec 2020 • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.
Ranked #34 on Instance Segmentation on COCO minival
7 code implementations • CVPR 2021 • Zhenda Xie, Yutong Lin, Zheng Zhang, Yue Cao, Stephen Lin, Han Hu
We argue that the power of contrastive learning has yet to be fully unleashed, as current methods are trained only on instance-level pretext tasks, leading to representations that may be sub-optimal for downstream tasks requiring dense pixel predictions.
no code implementations • ECCV 2020 • Xin Wei, Guojun Chen, Yue Dong, Stephen Lin, Xin Tong
With the estimated lighting, virtual objects can be rendered in AR scenarios with shading that is consistent with the real scene, leading to improved realism.
1 code implementation • ECCV 2020 • Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin
With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference.
Ranked #13 on Monocular 3D Human Pose Estimation on Human3.6M
1 code implementation • 17 Jul 2020 • Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon
A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.
1 code implementation • NeurIPS 2020 • Yihong Chen, Zheng Zhang, Yue Cao, Li-Wei Wang, Stephen Lin, Han Hu
Though RepPoints provides high performance, we find that its heavy reliance on regression for object localization leaves room for improvement.
Ranked #69 on Object Detection on COCO test-dev
1 code implementation • ECCV 2020 • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin
A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.
no code implementations • ICLR 2021 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress.
4 code implementations • ECCV 2020 • Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu
This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.
Ranked #14 on Semantic Segmentation on Cityscapes test
1 code implementation • CVPR 2020 • Yizhuo Zhang, Zhirong Wu, Houwen Peng, Stephen Lin
Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame.
no code implementations • 14 Apr 2020 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
To address this problem, we propose a data-driven approach for learning invariance to backgrounds.
1 code implementation • ECCV 2020 • Zhenda Xie, Zheng Zhang, Xizhou Zhu, Gao Huang, Stephen Lin
In the feature maps of CNNs, there commonly exists considerable spatial redundancy that leads to much repetitive processing.
2 code implementations • CVPR 2021 • Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin
We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied.
Ranked #191 on Object Detection on COCO test-dev
2 code implementations • ECCV 2020 • Ze Yang, Yinghao Xu, Han Xue, Zheng Zhang, Raquel Urtasun, Li-Wei Wang, Stephen Lin, Han Hu
We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level.
1 code implementation • 19 Dec 2019 • Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
no code implementations • 17 Nov 2019 • Renjiao Yi, Ping Tan, Stephen Lin
We present an unsupervised approach for factorizing object appearance into highlight, shading, and albedo layers, trained on multi-view real images.
2 code implementations • CVPR 2020 • Chao Li, Yixiao Yang, Kun He, Stephen Lin, John E. Hopcroft
IBCLN is a cascaded network that iteratively refines the estimates of the transmission and reflection layers such that each estimate boosts the prediction quality of the other, and information across steps of the cascade is transferred using an LSTM.
Ranked #1 on Reflection Removal on SIR^2 (Postcard)
no code implementations • 16 Sep 2019 • Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon
Based on rigid projective geometry, the estimated stereo depth is used to guide the camera motion estimation, and the depth and camera motion are used to guide the residual flow estimation.
no code implementations • 16 Sep 2019 • Seokju Lee, Junsik Kim, Tae-Hyun Oh, Yongseop Jeong, Donggeun Yoo, Stephen Lin, In So Kweon
We postulate that success on this task requires the network to learn semantic and geometric knowledge in the ego-centric view.
1 code implementation • ICLR 2019 • Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon
The cost volume is constructed using a differentiable warping process that allows for end-to-end training of the network.
9 code implementations • 25 Apr 2019 • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation.
Ranked #52 on Instance Segmentation on COCO minival
6 code implementations • ICCV 2019 • Ze Yang, Shaohui Liu, Han Hu, Li-Wei Wang, Stephen Lin
They furthermore do not require the use of anchors to sample a space of bounding boxes.
Ranked #82 on Object Detection on COCO minival
3 code implementations • ICCV 2019 • Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin
The convolution layer has been the dominant feature extractor in computer vision for years.
Ranked #764 on Image Classification on ImageNet
1 code implementation • ICCV 2019 • Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai
Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance.
no code implementations • 27 Mar 2019 • Jun Gao, Xiao Li, Li-Wei Wang, Sanja Fidler, Stephen Lin
We present a method for compositing virtual objects into a photograph such that the object colors appear to have been processed by the photo's camera imaging pipeline.
no code implementations • 10 Feb 2019 • Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, Mani Baskaran, Meenakshi Mahesh, Tin Aung, Jiang Liu
A Multi-Level Deep Network (MLDN) is proposed to formulate this learning, which utilizes three particular AS-OCT regions based on clinical priors: the global anterior segment structure, local iris region, and anterior chamber angle (ACA) patch.
1 code implementation • 20 Dec 2018 • Bin Liu, Zhirong Wu, Han Hu, Stephen Lin
In this paper, we propose a generic framework that utilizes unlabeled data to aid generalization for all three tasks.
21 code implementations • CVPR 2019 • Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
The superior performance of Deformable Convolutional Networks arises from their ability to adapt to the geometric variations of objects.
Ranked #119 on Object Detection on COCO minival
no code implementations • 27 Nov 2018 • Zheng Zhang, Dazhi Cheng, Xizhou Zhu, Stephen Lin, Jifeng Dai
Accurate detection and tracking of objects is vital for effective video understanding.
Ranked #13 on Video Object Detection on ImageNet VID
no code implementations • 17 Nov 2018 • Xiao Sun, Chuankang Li, Stephen Lin
We present a method for human pose tracking that is based on learning spatiotemporal relationships among joints.
1 code implementation • NeurIPS 2018 • Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn
Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations.
1 code implementation • 17 Sep 2018 • Xiao Sun, Chuankang Li, Stephen Lin
For the ECCV 2018 PoseTrack Challenge, we present a 3D human pose estimation system based mainly on the integral human pose regression method.
Ranked #1 on 3D Human Pose Estimation on CHALL H80K
no code implementations • 10 Sep 2018 • Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, Baskaran Mani, Meenakshi Mahesh, Tin Aung, Jiang Liu
A major cause of irreversible visual impairment is angle-closure glaucoma, which can be screened through imagery from Anterior Segment Optical Coherence Tomography (AS-OCT).
no code implementations • CVPR 2018 • Abdelrahman Abdelhamed, Stephen Lin, Michael S. Brown
We propose a systematic procedure for estimating ground truth for noisy images that can be used to benchmark denoising performance for smartphone cameras.
no code implementations • ECCV 2018 • Renjiao Yi, Chenyang Zhu, Ping Tan, Stephen Lin
We present a method for estimating detailed scene illumination using human faces in a single image.
1 code implementation • 27 Sep 2017 • Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin
Retouching can significantly elevate the visual appeal of photos, but many casual photographers lack the expertise to do this well.
no code implementations • ICCV 2017 • Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn
In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor.
no code implementations • CVPR 2017 • Chen Li, Stephen Lin, Kun Zhou, Katsushi Ikeuchi
We present a method for radiometric calibration of cameras from a single image that contains a human face.
no code implementations • CVPR 2017 • Chen Li, Stephen Lin, Kun Zhou, Katsushi Ikeuchi
An important practical feature of the proposed method is that the skin color model is utilized in a way that does not require color calibration of the camera.
1 code implementation • CVPR 2017 • Yuanming Hu, Baoyuan Wang, Stephen Lin
However, the patch-based CNNs that exist for this problem are faced with the issue of estimation ambiguity, where a patch may contain insufficient information to establish a unique or even a limited possible range of illumination colors.
1 code implementation • CVPR 2017 • Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn
The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner.
no code implementations • CVPR 2016 • Zhe Hu, Lu Yuan, Stephen Lin, Ming-Hsuan Yang
Removing image blur caused by camera shake is an ill-posed problem, as both the latent image and the point spread function (PSF) are unknown.
no code implementations • 21 Mar 2016 • Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn
We present a novel descriptor, called deep self-convolutional activations (DeSCA), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions.
1 code implementation • 21 Mar 2016 • Seungryong Kim, Kihong Park, Kwanghoon Sohn, Stephen Lin
We present a method for jointly predicting a depth map and intrinsic images from single-image input.
no code implementations • ICCV 2015 • Jingwei Huang, Huarong Chen, Bin Wang, Stephen Lin
We present an automatic thumbnail generation technique based on two essential considerations: how well the thumbnail visually represents the original photograph, and how well the foreground can be recognized after the cropping and downsizing steps of thumbnailing.
no code implementations • ICCV 2015 • Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo
With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video.
no code implementations • CVPR 2015 • Hyeokhyen Kwon, Yu-Wing Tai, Stephen Lin
Depth maps captured by consumer-level depth cameras such as Kinect are usually degraded by noise, missing values, and quantization.
no code implementations • CVPR 2015 • Chen Li, Kun Zhou, Stephen Lin
We present a method for simulating makeup in a face image.
no code implementations • CVPR 2015 • Huazhu Fu, Dong Xu, Stephen Lin, Jiang Liu
We present an object-based co-segmentation method that takes advantage of depth data and is able to correctly handle noisy images in which the common foreground object is missing.
no code implementations • CVPR 2014 • Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang
We present a machine-learned ranking approach for automatically enhancing the color of a photograph.
no code implementations • CVPR 2014 • Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set.
no code implementations • CVPR 2013 • Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang
Image cropping is a common operation used to improve the visual quality of photographs.
no code implementations • CVPR 2013 • Lap-Fai Yu, Sai-Kit Yeung, Yu-Wing Tai, Stephen Lin
We present a shading-based shape refinement algorithm which uses a noisy, incomplete depth map from Kinect to help resolve ambiguities in shape-from-shading.
no code implementations • CVPR 2013 • Chen Li, Shuochen Su, Yasuyuki Matsushita, Kun Zhou, Stephen Lin
We present a method that enhances the performance of depth-from-defocus (DFD) through the use of shading information.