no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann
Lastly, to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary that preserves the global motion patterns in the training data to guide predictions.
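The memory-based dictionary above can be pictured as a key-value store queried by similarity. A minimal sketch, with all names and the retrieval rule (top-k cosine matching with a softmax blend) chosen for illustration rather than taken from the paper:

```python
import numpy as np

class MotionMemory:
    """Hypothetical memory bank of motion patterns built from training data."""

    def __init__(self, keys, values):
        # keys: (N, D) encoded motion contexts; values: (N, D) stored motion patterns
        self.keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
        self.values = values

    def read(self, query, top_k=3):
        q = query / np.linalg.norm(query)
        sims = self.keys @ q                  # cosine similarity to every key
        idx = np.argsort(sims)[-top_k:]       # indices of the top-k matches
        weights = np.exp(sims[idx])
        weights /= weights.sum()              # softmax over the retrieved slots
        return weights @ self.values[idx]     # weighted blend of stored patterns

rng = np.random.default_rng(0)
memory = MotionMemory(rng.normal(size=(100, 16)), rng.normal(size=(100, 16)))
guidance = memory.read(rng.normal(size=16))   # guidance vector for the predictor
print(guidance.shape)  # (16,)
```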
no code implementations • ECCV 2020 • Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang
While single-image object detectors can be naively applied to videos in a frame-by-frame fashion, the prediction is often temporally inconsistent.
no code implementations • 4 Feb 2025 • Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen
Large vision-and-language models (LVLMs) typically treat visual and textual embeddings as homogeneous inputs to a large language model (LLM).
no code implementations • 4 Feb 2025 • Xueqing Deng, Qihang Yu, Ali Athar, Chenglin Yang, Linjie Yang, Xiaojie Jin, Xiaohui Shen, Liang-Chieh Chen
This dataset sets a new benchmark for evaluating models on joint panoptic segmentation and grounded captioning tasks, addressing the need for high-quality, detailed image-text annotations in multi-modal learning.
1 code implementation • 13 Jan 2025 • Dongwon Kim, Ju He, Qihang Yu, Chenglin Yang, Xiaohui Shen, Suha Kwak, Liang-Chieh Chen
Building on this, we introduce a family of text-to-image Masked Generative Models (MaskGen), trained exclusively on open data while achieving comparable performance to models trained on private data.
no code implementations • 24 Dec 2024 • Chenglin Yang, Celong Liu, Xueqing Deng, Dongwon Kim, Xing Mei, Xiaohui Shen, Liang-Chieh Chen
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model FLUX.1-dev using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images.
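Ternary (1.58-bit) quantization maps each weight to {-1, 0, +1} with a scale factor. A minimal sketch of the idea; the thresholding rule and per-tensor scale here are assumptions for illustration, not the paper's exact recipe:

```python
import numpy as np

def quantize_ternary(w):
    """Map float weights to {-1, 0, +1} plus a per-tensor scale (an assumed scheme)."""
    scale = np.abs(w).mean()                          # per-tensor scale factor
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)  # round, then clamp to ternary
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_ternary(w)
print(sorted(set(np.unique(q))))  # subset of [-1, 0, 1]
```

Each weight then needs log2(3) ≈ 1.58 bits of storage, which is where the name comes from.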
1 code implementation • 19 Dec 2024 • Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
Recently, in image generation, VAR proposes scale-wise autoregressive modeling, which extends the next token prediction to the next scale prediction, preserving the 2D structure of images.
Ranked #13 on Image Generation on ImageNet 256x256
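The next-scale prediction idea replaces next-token targets with whole token maps, ordered coarse to fine. A sketch of just the target pyramid, using average pooling as the downsampler (an assumption for illustration; the model itself is omitted):

```python
import numpy as np

def scale_pyramid(grid, scales=(1, 2, 4, 8)):
    """Summarize a token grid at several scales, coarse to fine."""
    h, w = grid.shape
    pyramid = []
    for s in scales:
        fh, fw = h // s, w // s
        # block-average the grid into an (s, s) summary
        pooled = grid.reshape(s, fh, s, fw).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid  # each scale is predicted conditioned on all coarser ones

grid = np.arange(64.0).reshape(8, 8)
pyr = scale_pyramid(grid)
print([p.shape for p in pyr])  # [(1, 1), (2, 2), (4, 4), (8, 8)]
```

Because each step emits an entire map, the 2D structure of the image is preserved at every prediction stage.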
1 code implementation • 1 Nov 2024 • Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
This paper presents Randomized AutoRegressive modeling (RAR) for visual generation, which sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks.
Ranked #4 on Image Generation on ImageNet 256x256
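The randomized-ordering idea can be sketched as building next-token training pairs under a random permutation instead of raster order (RAR additionally anneals the permutation probability toward raster order; that schedule is omitted here):

```python
import numpy as np

def permuted_ar_pairs(tokens, rng):
    """Build next-token (input, target) pairs under a random visitation order."""
    order = rng.permutation(len(tokens))   # random visitation order
    seq = tokens[order]
    inputs, targets = seq[:-1], seq[1:]    # standard next-token shift
    return order, inputs, targets

rng = np.random.default_rng(0)
tokens = np.arange(10)
order, inputs, targets = permuted_ar_pairs(tokens, rng)
# the objective is unchanged, so any language-modeling stack can train on these pairs
```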
1 code implementation • 24 Sep 2024 • Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen
Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models.
Ranked #6 on Image Generation on ImageNet 256x256
1 code implementation • 13 Jun 2024 • Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.
Ranked #12 on Image Generation on ImageNet 256x256
1 code implementation • 11 Jun 2024 • Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen
On the ImageNet 512 x 512 benchmark, TiTok not only outperforms the state-of-the-art diffusion model DiT-XL/2 (gFID 2.74 vs. 3.04), but also reduces the image tokens by 64x, leading to a 410x faster generation process.
Ranked #8 on Image Reconstruction on ImageNet
no code implementations • 4 Jun 2024 • Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen
In the second stage, we leverage the reconstruction ability developed in the first stage to impose the temporal constraints on the video diffusion model.
no code implementations • CVPR 2024 • Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen
By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset.
2 code implementations • CVPR 2024 • Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
To this end, we introduce ViTamin, a new vision model tailored for VLMs.
2 code implementations • 30 Nov 2023 • Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Alan Yuille, Xiaohui Shen, Liang-Chieh Chen
In this work, we present Axial-VS, a general and simple framework that enhances video segmenters by tracking objects along axial trajectories.
Ranked #2 on Video Panoptic Segmentation on VIPSeg
1 code implementation • 14 Nov 2023 • Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception.
1 code implementation • NeurIPS 2023 • Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone maintains its open-vocabulary classification ability and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to input resolutions larger than the one used during contrastive image-text pretraining.
Ranked #1 on Open Vocabulary Semantic Segmentation on Cityscapes
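The single-backbone design can be pictured as one frozen feature map serving two heads: a mask generator and an open-vocabulary classifier that scores mask-pooled features against text embeddings. A sketch of the classification half only; all shapes and the mask-pooling rule are illustrative assumptions:

```python
import numpy as np

def classify_masks(feat, masks, text_emb):
    """Score each mask against text embeddings using frozen backbone features.

    feat: (C, H, W) frozen CLIP-style features; masks: (M, H, W) binary masks;
    text_emb: (K, C) class-name embeddings.
    """
    areas = masks.reshape(len(masks), -1).sum(axis=1, keepdims=True)
    # average the features inside each mask -> (M, C)
    pooled = masks.reshape(len(masks), -1) @ feat.reshape(len(feat), -1).T / areas
    pooled /= np.linalg.norm(pooled, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return pooled @ text.T                 # (M, K) cosine-similarity logits

rng = np.random.default_rng(5)
logits = classify_masks(rng.normal(size=(16, 8, 8)),
                        (rng.random(size=(3, 8, 8)) > 0.5).astype(float),
                        rng.normal(size=(5, 16)))
print(logits.shape)  # (3, 5)
```

Because the backbone is frozen, the mask and classification heads share one feature extraction pass.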
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
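The retrieval step of VPR reduces to nearest-neighbor search over image descriptors: the query inherits the location of its best-matching reference image. A minimal sketch with made-up descriptors and locations:

```python
import numpy as np

def localize(query_desc, ref_descs, ref_locations):
    """Return the location of the reference image most similar to the query."""
    q = query_desc / np.linalg.norm(query_desc)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    best = int(np.argmax(r @ q))           # nearest reference by cosine similarity
    return ref_locations[best]

rng = np.random.default_rng(2)
refs = rng.normal(size=(50, 32))                       # reference database descriptors
locs = [(i * 0.01, i * 0.02) for i in range(50)]       # geo-tag per reference
query = refs[7] + 0.01 * rng.normal(size=32)           # noisy view of reference 7
estimate = localize(query, refs, locs)
```

The research challenge lies in learning descriptors robust to viewpoint and appearance change; the search itself is this simple.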
1 code implementation • CVPR 2024 • Xiaojie Jin, BoWen Zhang, Weibo Gong, Kai Xu, Xueqing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng
The first is a Temporal Adaptation Module that is incorporated in the video branch to introduce global and local temporal contexts.
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
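The momentum encoder is typically maintained as an exponential moving average of the online encoder, so its features drift slowly and provide stable contrastive targets. A sketch of the update, with plain floats standing in for network parameters:

```python
def momentum_update(online_params, momentum_params, m=0.999):
    """EMA update: momentum branch slowly tracks the online branch."""
    return [m * pm + (1.0 - m) * po
            for po, pm in zip(online_params, momentum_params)]

online = [1.0, 2.0]
momentum = [0.0, 0.0]
momentum = momentum_update(online, momentum)
print(momentum)  # approx [0.001, 0.002]
```

With m close to 1, the momentum branch changes only slightly per step, which is what makes its outputs usable as contrastive targets.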
1 code implementation • CVPR 2022 • Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen
When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images.
no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang
Video salient object detection aims to find the most visually distinctive objects in a video.
no code implementations • 7 Dec 2021 • Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen
SemanticStyleGAN presents a method where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way.
2 code implementations • 12 Apr 2021 • Xiaoyu Xiang, Ding Liu, Xiao Yang, Yiheng Zhu, Xiaohui Shen, Jan P. Allebach
In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data.
Ranked #1 on Sketch-to-Image Translation on Scribble
no code implementations • 30 Mar 2021 • Yifan Wang, Linjie Luo, Xiaohui Shen, Xing Mei
Recently, significant progress has been made in single-view depth estimation thanks to increasingly large and diverse depth datasets.
no code implementations • ICCV 2021 • Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann
Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.
no code implementations • 6 May 2020 • Wei Xiong, Ding Liu, Xiaohui Shen, Chen Fang, Jiebo Luo
In this paper, we tackle the problem of enhancing real-world low-light images with significant noise in an unsupervised fashion.
no code implementations • 7 Apr 2020 • Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen, Jianchao Yang
In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
1 code implementation • 24 Feb 2020 • Tianlang Chen, Chen Fang, Xiaohui Shen, Yiheng Zhu, Zhili Chen, Jiebo Luo
In this work, we propose a new solution to 3D human pose estimation in videos.
Ranked #13 on Monocular 3D Human Pose Estimation on Human3.6M
8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang
Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?
no code implementations • CVPR 2020 • Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin
Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.
1 code implementation • CVPR 2019 • Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin
By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.
1 code implementation • ECCV 2020 • Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille
We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.
no code implementations • NeurIPS 2018 • Zijun Wei, Boyu Wang, Minh Hoai Nguyen, Jianming Zhang, Zhe Lin, Xiaohui Shen, Radomir Mech, Dimitris Samaras
Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments.
no code implementations • 18 Oct 2018 • Lijun Wang, Xiaohui Shen, Jianming Zhang, Oliver Wang, Zhe Lin, Chih-Yao Hsieh, Sarah Kong, Huchuan Lu
To achieve this, we propose a novel neural network model comprised of a depth prediction module, a lens blur module, and a guided upsampling module.
no code implementations • ECCV 2018 • Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia
We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks.
1 code implementation • ECCV 2018 • Wei-Chih Hung, Jianming Zhang, Xiaohui Shen, Zhe Lin, Joon-Young Lee, Ming-Hsuan Yang
Specifically, given a foreground image and a background image, our proposed method automatically generates a set of blending photos with scores that indicate the aesthetics quality with the proposed quality network and policy network.
no code implementations • ECCV 2018 • Yufei Wang, Zhe Lin, Xiaohui Shen, Jianming Zhang, Scott Cohen
Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts.
30 code implementations • ICCV 2019 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang
We present a generative image inpainting system to complete images with free-form mask and guidance.
Ranked #3 on Image Inpainting on Places2 val
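Free-form inpainting systems are trained on random brush-stroke masks rather than rectangles. A hypothetical sketch of such mask generation (the stroke statistics here are made up for illustration; the paper defines its own sampling scheme):

```python
import numpy as np

def random_stroke_mask(h=64, w=64, n_points=6, radius=3, rng=None):
    """Draw a random thick polyline as a binary free-form mask."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.uint8)
    ys = rng.integers(radius, h - radius, n_points)
    xs = rng.integers(radius, w - radius, n_points)
    for (y0, x0), (y1, x1) in zip(zip(ys, xs), zip(ys[1:], xs[1:])):
        for t in np.linspace(0.0, 1.0, 50):           # walk along the segment
            cy = int(y0 + t * (y1 - y0))
            cx = int(x0 + t * (x1 - x0))
            mask[cy - radius:cy + radius, cx - radius:cx + radius] = 1
    return mask

mask = random_stroke_mask(rng=np.random.default_rng(4))
print(mask.shape)  # (64, 64)
```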
no code implementations • CVPR 2018 • Shanghang Zhang, Xiaohui Shen, Zhe Lin, Radomír Měch, João P. Costeira, José M. F. Moura
In this paper, we propose a unified framework to estimate a spatially-varying blur map and understand its desirability in terms of image quality at the same time.
no code implementations • CVPR 2018 • Zijun Wei, Jianming Zhang, Xiaohui Shen, Zhe Lin, Radomír Mech, Minh Hoai, Dimitris Samaras
Finding views with good photo composition is a challenging task for machine learning methods.
1 code implementation • ICCV 2019 • Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, Xiaoming Liu
Deep CNNs have been pushing the frontier of visual recognition over past years.
3 code implementations • 5 Apr 2018 • Xiaodan Liang, Ke Gong, Xiaohui Shen, Liang Lin
To further explore and take advantage of the semantic correlation of these two tasks, we propose a novel joint human parsing and pose estimation network to explore efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality.
Ranked #10 on Semantic Segmentation on LIP val
28 code implementations • CVPR 2018 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang
Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.
1 code implementation • CVPR 2018 • Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.
no code implementations • NeurIPS 2017 • Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan
The ability to predict the future is important for intelligent systems, e.g., autonomous vehicles and robots, to plan early and make decisions accordingly.
1 code implementation • ICCV 2017 • Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models.
no code implementations • 4 Oct 2017 • Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
An intuition on human segmentation is that when a human is moving in a video, the video context (e.g., appearance and motion cues) may potentially infer reasonable mask information for the whole human body.
no code implementations • ICCV 2017 • Jian Ren, Xiaohui Shen, Zhe Lin, Radomir Mech, David J. Foran
To accommodate our study, we first collect two distinct datasets, a large image dataset from Flickr and annotated by Amazon Mechanical Turk, and a small dataset of real personal albums rated by owners.
no code implementations • ICCV 2017 • Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng
Thus, they suffer from heterogeneous object scales caused by perspective projection of cameras on actual scenes and inevitably encounter parsing failures on distant objects as well as other boundary and recognition errors.
1 code implementation • 19 Jul 2017 • Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell
Automatic organization of personal photos is a problem with many real-world applications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection.
no code implementations • CVPR 2017 • Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, Garrison W. Cottrell
Furthermore, our algorithm can generate descriptions with varied length, benefiting from the separate control of the skeleton and attributes.
no code implementations • 1 Apr 2017 • Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde
We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene.
1 code implementation • ICCV 2017 • Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille
In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions.
1 code implementation • CVPR 2017 • Ke Gong, Xiaodan Liang, Dongyu Zhang, Xiaohui Shen, Liang Lin
Human parsing has recently attracted a lot of research interests due to its huge application potentials.
Ranked #13 on Semantic Segmentation on LIP val
no code implementations • CVPR 2017 • Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing
Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.
2 code implementations • CVPR 2017 • Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
Compositing is one of the most common operations in photo editing.
1 code implementation • 6 Dec 2016 • Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects.
no code implementations • NeurIPS 2016 • Peng Wang, Xiaohui Shen, Bryan Russell, Scott Cohen, Brian Price, Alan L. Yuille
This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.
no code implementations • ICCV 2017 • Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan
In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations.
3 code implementations • 1 Aug 2016 • Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff
We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps.
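For illustration only: the paper models top-down attention via excitation backprop, but the simpler, related class-activation-map idea conveys the goal of a task-specific attention map: weight final-layer activations by the classifier weights of a target class. This sketch implements that simpler variant, not the paper's method:

```python
import numpy as np

def class_attention_map(features, class_weights):
    """CAM-style map: features (C, H, W) weighted by one class's weights (C,)."""
    cam = np.tensordot(class_weights, features, axes=1)  # (H, W) evidence map
    cam = np.maximum(cam, 0.0)                           # keep excitatory evidence
    return cam / (cam.max() + 1e-8)                      # normalize to [0, 1]

rng = np.random.default_rng(3)
cam = class_attention_map(rng.normal(size=(8, 7, 7)), rng.normal(size=8))
print(cam.shape)  # (7, 7)
```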
no code implementations • CVPR 2015 • Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.
1 code implementation • 8 Jun 2016 • Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images.
2 code implementations • 6 Jun 2016 • Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes
In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics, in which the relative ranking of photo aesthetics is directly modeled in the loss function.
Ranked #7 on Aesthetics Quality Assessment on AVA
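The ranking objective described above is commonly realized as a margin ranking loss over image pairs: the score of the more aesthetic image must exceed the other's by at least a margin. A minimal sketch of that loss (the margin value is an assumption):

```python
def ranking_loss(score_hi, score_lo, margin=1.0):
    """Penalize pairs whose predicted scores are not separated by the margin."""
    return max(0.0, margin - (score_hi - score_lo))

print(ranking_loss(2.0, 0.5))  # 0.0  -> ordered by more than the margin: no loss
print(ranking_loss(0.5, 0.4))  # 0.9  -> too close: penalized
```

Training on relative pairs sidesteps the noisiness of absolute aesthetic scores, since annotators agree more on comparisons than on ratings.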
1 code implementation • CVPR 2016 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
Our system leverages a Convolutional-Neural-Network model to generate location proposals of salient objects.
no code implementations • CVPR 2016 • Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell
In this paper, we show that the selection of important images is consistent among different viewers, and that this selection process is related to the event type of the album.
no code implementations • CVPR 2016 • Jae-Pil Heo, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Sung-Eui Yoon
We have tested the proposed method with the inverted index and multi-index on a diverse set of benchmarks including up to one billion data points with varying dimensions, and found that our method robustly improves the accuracy of shortlists (up to 127% relatively higher) over the state-of-the-art techniques with a comparable or even faster computational cost.
no code implementations • CVPR 2016 • Haoxiang Li, Jonathan Brandt, Zhe Lin, Xiaohui Shen, Gang Hua
Our new framework enables efficient use of these complementary multi-level contextual cues to improve overall recognition rates on the photo album person recognition task, as demonstrated through state-of-the-art results on a challenging public dataset.
no code implementations • 23 Mar 2016 • Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan
By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.
no code implementations • ICCV 2015 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
Powered by this fast MBD transform algorithm, the proposed salient object detection method runs at 80 FPS, and significantly outperforms previous methods with similar speed on four large benchmark datasets, and achieves comparable or better performance than state-of-the-art methods.
Ranked #6 on Video Salient Object Detection on VOS-T (using extra training data)
no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.
no code implementations • ICCV 2015 • Xin Lu, Zhe Lin, Xiaohui Shen, Radomir Mech, James Z. Wang
We propose a deep multi-patch aggregation network training approach, which allows us to train models using multiple patches generated from one image.
Ranked #8 on Aesthetics Quality Assessment on AVA
no code implementations • CVPR 2016 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan
By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.
no code implementations • CVPR 2016 • Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan
The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions.
no code implementations • CVPR 2016 • Joon-Young Lee, Kalyan Sunkavalli, Zhe Lin, Xiaohui Shen, In So Kweon
We introduce a new technique that automatically generates diverse, visually compelling stylizations for a photograph in an unsupervised manner.
1 code implementation • 10 Sep 2015 • Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Jiashi Feng, Yao Zhao, Shuicheng Yan
Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations.
no code implementations • 9 Sep 2015 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, Shuicheng Yan
Instance-level object segmentation is an important yet under-explored task.
1 code implementation • 17 Aug 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price
In this paper, we propose a novel deep neural network framework embedded with low-level features (LCNN) for salient object detection in complex images.
no code implementations • CVPR 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan L. Yuille
By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction [5].
no code implementations • CVPR 2015 • Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua
To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade.
2 code implementations • 27 May 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price
For most natural images, some boundary superpixels serve as the background labels and the saliency of other superpixels are determined by ranking their similarities to the boundary labels based on an inner propagation scheme.
no code implementations • ICCV 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan Yuille
Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision.
no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan
Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.
1 code implementation • 9 Mar 2015 • Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan
The first CNN network is with max-pooling, and designed to predict the template coefficients for each label mask, while the second CNN network is without max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters.
no code implementations • CVPR 2014 • Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua
Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved.
no code implementations • CVPR 2014 • Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
We study the problem of human body configuration analysis, more specifically, human parsing and human pose estimation.
no code implementations • CVPR 2013 • Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
In order to overcome these challenges, we present a novel and robust exemplarbased face detector that integrates image retrieval and discriminative learning.