Search Results for author: Xiaohui Shen

Found 79 papers, 33 papers with code

Detecting and Aligning Faces by Image Retrieval

no code implementations • CVPR 2013 • Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu

In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning.

Attribute Face Alignment +5

Paper
Add Code

Towards Unified Human Parsing and Pose Estimation

no code implementations • CVPR 2014 • Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

We study the problem of human body configuration analysis, more specifically, human parsing and human pose estimation.

Human Parsing Pose Estimation

Paper
Add Code

Efficient Boosted Exemplar-based Face Detection

no code implementations • CVPR 2014 • Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua

Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved.

Face Detection

Paper
Add Code

Deep Human Parsing with Active Template Regression

1 code implementation • 9 Mar 2015 • Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan

The first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters.

Human Parsing Position +1

152

Paper
Code

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Paper
Add Code

Joint Object and Part Segmentation using Deep Learned Potentials

no code implementations • ICCV 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan Yuille

Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision.

Object Segmentation +1

Paper
Add Code

Inner and Inter Label Propagation: Salient Object Detection in the Wild

2 code implementations • 27 May 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price

For most natural images, some boundary superpixels serve as the background labels and the saliency of other superpixels is determined by ranking their similarities to the boundary labels based on an inner propagation scheme.

Computational Efficiency object-detection +4

Paper
Code

A Convolutional Neural Network Cascade for Face Detection

no code implementations • CVPR 2015 • Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua

To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade.

Face Detection

Paper
Add Code

Towards Unified Depth and Semantic Prediction From a Single Image

no code implementations • CVPR 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan L. Yuille

By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction [5].

Depth Estimation Depth Prediction +1

Paper
Add Code

LCNN: Low-level Feature Embedded CNN for Salient Object Detection

no code implementations • 17 Aug 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price

In this paper, we propose a novel deep neural network framework embedded with low-level features (LCNN) for salient object detection in complex images.

object-detection RGB Salient Object Detection +1

Paper
Add Code

Proposal-free Network for Instance-level Object Segmentation

no code implementations • 9 Sep 2015 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, Shuicheng Yan

Instance-level object segmentation is an important yet under-explored task.

Clustering Object +3

Paper
Add Code

STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation

1 code implementation • 10 Sep 2015 • Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Jiashi Feng, Yao Zhao, Shuicheng Yan

Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations.

object-detection RGB Salient Object Detection +4

Paper
Code

Automatic Content-Aware Color and Tone Stylization

no code implementations • CVPR 2016 • Joon-Young Lee, Kalyan Sunkavalli, Zhe Lin, Xiaohui Shen, In So Kweon

We introduce a new technique that automatically generates diverse, visually compelling stylizations for a photograph in an unsupervised manner.

Style Transfer

Paper
Add Code

Reversible Recursive Instance-level Object Segmentation

no code implementations • CVPR 2016 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan

By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.

Denoising Object +2

Paper
Add Code

Semantic Object Parsing with Local-Global Long Short-Term Memory

no code implementations • CVPR 2016 • Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan

The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions.

Memorization Position

Paper
Add Code

Human Parsing With Contextualized Convolutional Neural Network

no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Paper
Add Code

Minimum Barrier Salient Object Detection at 80 FPS

no code implementations • ICCV 2015 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

Powered by this fast MBD transform algorithm, the proposed salient object detection method runs at 80 FPS, and significantly outperforms previous methods with similar speed on four large benchmark datasets, and achieves comparable or better performance than state-of-the-art methods.

Ranked #6 on Video Salient Object Detection on VOS-T (using extra training data)

Object object-detection +2

Paper
Add Code

Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation

no code implementations • ICCV 2015 • Xin Lu, Zhe Lin, Xiaohui Shen, Radomir Mech, James Z. Wang

We propose a deep multi-patch aggregation network training approach, which allows us to train models using multiple patches generated from one image.

Ranked #8 on Aesthetics Quality Assessment on AVA

Aesthetics Quality Assessment Image Quality Estimation

Paper
Add Code

Semantic Object Parsing with Graph LSTM

no code implementations • 23 Mar 2016 • Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan

By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.

Object Superpixels

Paper
Add Code

A Multi-Level Contextual Model For Person Recognition in Photo Albums

no code implementations • CVPR 2016 • Haoxiang Li, Jonathan Brandt, Zhe Lin, Xiaohui Shen, Gang Hua

Our new framework enables efficient use of these complementary multi-level contextual cues to improve overall recognition rates on the photo album person recognition task, as demonstrated through state-of-the-art results on a challenging public dataset.

Person Recognition

Paper
Add Code

Shortlist Selection With Residual-Aware Distance Estimator for K-Nearest Neighbor Search

no code implementations • CVPR 2016 • Jae-Pil Heo, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Sung-Eui Yoon

We have tested the proposed method with the inverted index and multi-index on a diverse set of benchmarks including up to one billion data points with varying dimensions, and found that our method robustly improves the accuracy of shortlists (up to 127% relatively higher) over the state-of-the-art techniques with a comparable or even faster computational cost.

Quantization

Paper
Add Code

Event-Specific Image Importance

no code implementations • CVPR 2016 • Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell

In this paper, we show that the selection of important images is consistent among different viewers, and that this selection process is related to the event type of the album.

Paper
Add Code

Unconstrained Salient Object Detection via Proposal Subset Optimization

1 code implementation • CVPR 2016 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

Our system leverages a Convolutional-Neural-Network model to generate location proposals of salient objects.

Object object-detection +2

Paper
Code

Photo Aesthetics Ranking Network with Attributes and Content Adaptation

2 code implementations • 6 Jun 2016 • Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes

In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics in which the relative ranking of photo aesthetics is directly modeled in the loss function.

Ranked #7 on Aesthetics Quality Assessment on AVA

Aesthetics Quality Assessment

291

Paper
Code

Progressive Attention Networks for Visual Attribute Prediction

1 code implementation • 8 Jun 2016 • Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han

We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images.

Attribute Hard Attention

Paper
Code

Salient Object Subitizing

no code implementations • CVPR 2015 • Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.

Image Retrieval Object +4

Paper
Add Code

Top-down Neural Attention by Excitation Backprop

3 code implementations • 1 Aug 2016 • Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff

We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps.

Paper
Code

SURGE: Surface Regularized Geometry Estimation from a Single Image

no code implementations • NeurIPS 2016 • Peng Wang, Xiaohui Shen, Bryan Russell, Scott Cohen, Brian Price, Alan L. Yuille

This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.

Paper
Add Code

Video Scene Parsing with Predictive Feature Learning

no code implementations • ICCV 2017 • Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan

In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations.

Representation Learning Scene Parsing

Paper
Add Code

Learning to Detect Multiple Photographic Defects

1 code implementation • 6 Dec 2016 • Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes

Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects.

Defect Detection Multi-Task Learning

Paper
Code

Deep Image Harmonization

2 code implementations • CVPR 2017 • Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

Compositing is one of the most common operations in photo editing.

Image Harmonization

149

Paper
Code

Interpretable Structure-Evolving LSTM

no code implementations • CVPR 2017 • Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing

Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.

Small Data Image Classification

Paper
Add Code

Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing

1 code implementation • CVPR 2017 • Ke Gong, Xiaodan Liang, Dongyu Zhang, Xiaohui Shen, Liang Lin

Human parsing has recently attracted a lot of research interest due to its huge application potential.

Ranked #13 on Semantic Segmentation on LIP val

Human Parsing Self-Supervised Learning +1

227

Paper
Code

Recurrent Multimodal Interaction for Referring Image Segmentation

1 code implementation • ICCV 2017 • Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille

In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions.

Image Segmentation Segmentation +1

Paper
Code

Learning to Predict Indoor Illumination from a Single Image

no code implementations • 1 Apr 2017 • Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde

We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene.

Lighting Estimation

Paper
Add Code

Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition

no code implementations • CVPR 2017 • Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, Garrison W. Cottrell

Furthermore, our algorithm can generate descriptions with varied length, benefiting from the separate control of the skeleton and attributes.

Attribute Image Captioning +2

Paper
Add Code

Recognizing and Curating Photo Albums via Event-Specific Image Importance

1 code implementation • 19 Jul 2017 • Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell

Automatic organization of personal photos is a problem with many real-world applications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection.

Vocal Bursts Type Prediction

Paper
Code

FoveaNet: Perspective-aware Urban Scene Parsing

no code implementations • ICCV 2017 • Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng

Thus, they suffer from heterogeneous object scales caused by perspective projection of cameras on actual scenes and inevitably encounter parsing failures on distant objects as well as other boundary and recognition errors.

Scene Parsing

Paper
Add Code

Personalized Image Aesthetics

no code implementations • ICCV 2017 • Jian Ren, Xiaohui Shen, Zhe Lin, Radomir Mech, David J. Foran

To accommodate our study, we first collect two distinct datasets, a large image dataset from Flickr and annotated by Amazon Mechanical Turk, and a small dataset of real personal albums rated by owners.

Active Learning

Paper
Add Code

Learning to Segment Human by Watching YouTube

no code implementations • 4 Oct 2017 • Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

An intuition on human segmentation is that when a human is moving in a video, the video-context (e.g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body.

Human Detection Segmentation +5

Paper
Add Code

Scene Parsing with Global Context Embedding

1 code implementation • ICCV 2017 • Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models.

Scene Parsing

Paper
Code

Predicting Scene Parsing and Motion Dynamics in the Future

no code implementations • NeurIPS 2017 • Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan

The ability to predict the future is important for intelligent systems, e.g., autonomous vehicles and robots, to plan early and make decisions accordingly.

Autonomous Vehicles motion prediction +2

Paper
Add Code

MAttNet: Modular Attention Network for Referring Expression Comprehension

1 code implementation • CVPR 2018 • Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Ranked #7 on Generalized Referring Expression Segmentation on gRefCOCO

Generalized Referring Expression Segmentation Referring Expression +1

292

Paper
Code

Generative Image Inpainting with Contextual Attention

28 code implementations • CVPR 2018 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang

Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.

Image Inpainting

3,158

Paper
Code

Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark

3 code implementations • 5 Apr 2018 • Xiaodan Liang, Ke Gong, Xiaohui Shen, Liang Lin

To further explore and take advantage of the semantic correlation of these two tasks, we propose a novel joint human parsing and pose estimation network to explore efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality.

Ranked #10 on Semantic Segmentation on LIP val

Human Parsing Pose Estimation +1

368

Paper
Code

Towards Interpretable Face Recognition

1 code implementation • ICCV 2019 • Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, Xiaoming Liu

Deep CNNs have been pushing the frontier of visual recognition over the past years.

Face Recognition

Paper
Code

Learning to Understand Image Blur

no code implementations • CVPR 2018 • Shanghang Zhang, Xiaohui Shen, Zhe Lin, Radomír Měch, João P. Costeira, José M. F. Moura

In this paper, we propose a unified framework to estimate a spatially-varying blur map and understand its desirability in terms of image quality at the same time.

Paper
Add Code

Good View Hunting: Learning Photo Composition From Dense View Pairs

no code implementations • CVPR 2018 • Zijun Wei, Jianming Zhang, Xiaohui Shen, Zhe Lin, Radomír Mech, Minh Hoai, Dimitris Samaras

Finding views with good photo composition is a challenging task for machine learning methods.

Image Cropping Transfer Learning

Paper
Add Code

Free-Form Image Inpainting with Gated Convolution

30 code implementations • ICCV 2019 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang

We present a generative image inpainting system to complete images with free-form mask and guidance.

Ranked #3 on Image Inpainting on Places2 val

feature selection Image Inpainting +1

3,158

Paper
Code

A Modulation Module for Multi-task Learning with Applications in Image Retrieval

1 code implementation • ECCV 2018 • Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, Ying Wu

shared parameters.

Image Retrieval Multi-Task Learning +1

Paper
Code

Concept Mask: Large-Scale Segmentation from Semantic Concepts

no code implementations • ECCV 2018 • Yufei Wang, Zhe Lin, Xiaohui Shen, Jianming Zhang, Scott Cohen

Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts.

Image Segmentation Segmentation +1

Paper
Add Code

Compositing-aware Image Search

no code implementations • ECCV 2018 • Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia

We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks.

Image Retrieval Object

Paper
Add Code

Learning to Blend Photos

1 code implementation • ECCV 2018 • Wei-Chih Hung, Jianming Zhang, Xiaohui Shen, Zhe Lin, Joon-Young Lee, Ming-Hsuan Yang

Specifically, given a foreground image and a background image, our proposed method automatically generates a set of blending photos with scores that indicate the aesthetics quality with the proposed quality network and policy network.

Paper
Code

DeepLens: Shallow Depth Of Field From A Single Image

no code implementations • 18 Oct 2018 • Lijun Wang, Xiaohui Shen, Jianming Zhang, Oliver Wang, Zhe Lin, Chih-Yao Hsieh, Sarah Kong, Huchuan Lu

To achieve this, we propose a novel neural network model comprised of a depth prediction module, a lens blur module, and a guided upsampling module.

Depth Estimation Depth Prediction

Paper
Add Code

Sequence-to-Segment Networks for Segment Detection

no code implementations • NeurIPS 2018 • Zijun Wei, Boyu Wang, Minh Hoai Nguyen, Jianming Zhang, Zhe Lin, Xiaohui Shen, Radomir Mech, Dimitris Samaras

Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments.

Temporal Action Proposal Generation Video Summarization

Paper
Add Code

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

1 code implementation • ECCV 2020 • Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille

We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.

object-detection Object Detection +1

Paper
Code

Graphonomy: Universal Human Parsing via Graph Transfer Learning

1 code implementation • CVPR 2019 • Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin

By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.

Human Parsing Transfer Learning

287

Paper
Code

Fashion Editing with Adversarial Parsing Learning

no code implementations • CVPR 2020 • Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.

Generative Adversarial Network Human Parsing +1

Paper
Add Code

EnlightenGAN: Deep Light Enhancement without Paired Supervision

8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?

Ranked #1 on Low-Light Image Enhancement on AFLW (Zhang CVPR 2018 crops)

Generative Adversarial Network Image Restoration +1

1,360

Paper
Code

Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition

1 code implementation • 24 Feb 2020 • Tianlang Chen, Chen Fang, Xiaohui Shen, Yiheng Zhu, Zhili Chen, Jiebo Luo

In this work, we propose a new solution to 3D human pose estimation in videos.

Ranked #12 on Monocular 3D Human Pose Estimation on Human3.6M

Anatomy Monocular 3D Human Pose Estimation

153

Paper
Code

Human Motion Transfer from Poses in the Wild

no code implementations • 7 Apr 2020 • Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen, Jianchao Yang

In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.

Translation

Paper
Add Code

Unsupervised Low-light Image Enhancement with Decoupled Networks

no code implementations • 6 May 2020 • Wei Xiong, Ding Liu, Xiaohui Shen, Chen Fang, Jiebo Luo

In this paper, we tackle the problem of enhancing real-world low-light images with significant noise in an unsupervised fashion.

Image-to-Image Translation Low-Light Image Enhancement

Paper
Add Code

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

no code implementations • ICCV 2021 • Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann

Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.

motion prediction Motion Synthesis

Paper
Add Code

DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues

no code implementations • 30 Mar 2021 • Yifan Wang, Linjie Luo, Xiaohui Shen, Xing Mei

Recently, significant progress has been made in single-view depth estimation thanks to increasingly large and diverse depth datasets.

3D Reconstruction Autonomous Driving +2

Paper
Add Code

Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis

2 code implementations • 12 Apr 2021 • Xiaoyu Xiang, Ding Liu, Xiao Yang, Yiheng Zhu, Xiaohui Shen, Jan P. Allebach

In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data.

Ranked #1 on Sketch-to-Image Translation on Scribble

Domain Adaptation Image-to-Image Translation +1

1,886

Paper
Code

Semantic StyleGAN

no code implementations • 7 Dec 2021 (arXiv:2112.02236v2 [cs.CV]) • Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen

SemanticStyleGAN presents a method where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way.

Disentanglement

Paper
Add Code

Video Salient Object Detection via Contrastive Features and Attention Modules

no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang

Video salient object detection aims to find the most visually distinctive objects in a video.

Contrastive Learning Object +7

Paper
Add Code

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

1 code implementation • CVPR 2022 • Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen

When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images.

Disentanglement Facial Editing +2

248

Paper
Code

Contrastive Masked Autoencoders are Stronger Vision Learners

1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng

The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.

Contrastive Learning Image Classification +3

Paper
Code

R2Former: Unified Retrieval and Reranking Transformer for Place Recognition

1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Retrieval +1

Paper
Code

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

1 code implementation • 19 Jan 2023 • Xiaojie Jin, BoWen Zhang, Weibo Gong, Kai Xu, Xueqing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng

The first is a Temporal Adaptation Module that is incorporated in the video branch to introduce global and local temporal contexts.

Retrieval Text Retrieval +2

Paper
Code

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Retrieval +1

Paper
Add Code

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

1 code implementation • NeurIPS 2023 • Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen

The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining.

Ranked #1 on Open Vocabulary Semantic Segmentation on Cityscapes

Open Vocabulary Panoptic Segmentation Open Vocabulary Semantic Segmentation +1

234

Paper
Code

Towards Open-Ended Visual Recognition with Large Language Model

1 code implementation • 14 Nov 2023 • Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception.

Language Modelling Large Language Model +2

Paper
Code

MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

1 code implementation • 30 Nov 2023 • Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively.

Ranked #1 on Video Panoptic Segmentation on VIPSeg

Object Video Classification +3

Paper
Code

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

1 code implementation • 2 Apr 2024 • Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

To this end, we introduce ViTamin, a new vision model tailored for VLMs.

Paper
Code

COCONut: Modernizing COCO Segmentation

no code implementations • 12 Apr 2024 • Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset.

Panoptic Segmentation Segmentation +1

Paper
Add Code

Video Object Detection via Object-level Temporal Aggregation

no code implementations • ECCV 2020 • Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang

While single-image object detectors can be naively applied to videos in a frame-by-frame fashion, the prediction is often temporally inconsistent.

Object object-detection +2

Paper
Add Code

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Paper
Add Code
