1 code implementation • 2 Oct 2024 • Yang Cao, Yuanliang Jv, Dan Xu
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation.
no code implementations • 16 Sep 2024 • Fa-Ting Hong, Yunfei Liu, Yu Li, Changyin Zhou, Fei Yu, Dan Xu
Audio-driven talking head synthesis strives to generate lifelike video portraits from provided audio.
no code implementations • 16 Jul 2024 • Jaehyeok Kim, Dongyoon Wee, Dan Xu
To address this problem, we propose a novel approach that models non-rigid motions as radiance residual fields to benefit from more direct color supervision during rendering, and utilizes the rigid radiance fields as a prior to reduce the complexity of the learning process.
no code implementations • 13 Jul 2024 • Fa-Ting Hong, Dan Xu
To this end, we introduce a scale transformation module that can automatically adjust the scale of the driving image to match that of the source image, using the scale-difference information contained in the detected keypoints of the source image and the driving frame.
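A minimal sketch of the keypoint-based scale adjustment idea described above, assuming keypoints are available as (N, 2) arrays; the spread-based scale estimate and the function names are illustrative stand-ins rather than the paper's exact module.

```python
import numpy as np

def estimate_relative_scale(src_kp: np.ndarray, drv_kp: np.ndarray) -> float:
    """Estimate how much larger/smaller the source face is than the driving face.

    Both inputs are (N, 2) arrays of detected keypoints. The scale is taken as the
    ratio of the mean keypoint distance to the respective centroid, one simple
    proxy for face size.
    """
    src_spread = np.linalg.norm(src_kp - src_kp.mean(axis=0), axis=1).mean()
    drv_spread = np.linalg.norm(drv_kp - drv_kp.mean(axis=0), axis=1).mean()
    return src_spread / (drv_spread + 1e-8)

def rescale_driving_keypoints(drv_kp: np.ndarray, scale: float) -> np.ndarray:
    """Rescale driving keypoints about their centroid so they match the source scale."""
    center = drv_kp.mean(axis=0, keepdims=True)
    return (drv_kp - center) * scale + center

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(10, 2)) * 40.0   # source face keypoints (larger face)
    drv = rng.normal(size=(10, 2)) * 20.0   # driving face keypoints (smaller face)
    s = estimate_relative_scale(src, drv)
    aligned = rescale_driving_keypoints(drv, s)
    print("estimated scale:", round(s, 3))
```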
no code implementations • 18 Jun 2024 • Hang Zhou, Dan Xu, Yiding Ji
Reinforcement learning via sequence modeling has shown remarkable promise in autonomous systems, harnessing the power of offline datasets to make informed decisions in simulated environments.
no code implementations • 17 Jun 2024 • Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu
In this paper, we introduce a novel path to general human motion generation by focusing on 2D space.
no code implementations • 4 Jun 2024 • Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu
In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency.
1 code implementation • 2 Jun 2024 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu
3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, thereby enhancing the model's ability to localize more novel objects.
no code implementations • 29 May 2024 • Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.
no code implementations • 27 May 2024 • Zipeng Wang, Dan Xu
To address these challenges, we present Pyramidal 3D Gaussian Splatting (PyGS) with NeRF Initialization.
no code implementations • CVPR 2024 • Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan
Moreover, we introduce a knowledge distillation mechanism to correct the direction of information flow in backward propagation.
no code implementations • CVPR 2024 • Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu
Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications.
no code implementations • 21 Apr 2024 • Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu
This paper tackles the intricate challenge of object removal for updating the radiance field using 3D Gaussian Splatting.
no code implementations • CVPR 2024 • Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei Zhang, Zhenguo Li, Dan Xu
This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.
Ranked #2 on Object Detection on ODinW Full-Shot 13 Tasks
no code implementations • CVPR 2024 • Yingji Zhong, Lanqing Hong, Zhenguo Li, Dan Xu
While existing works mainly consider ray-level consistency to construct 2D learning regularization based on rendered color, depth, or semantics on image planes, in this paper we propose a novel approach that models 3D spatial field consistency to improve NeRF's performance with sparse inputs.
no code implementations • CVPR 2024 • Hanrong Ye, Dan Xu
It designs a joint diffusion and denoising paradigm to model a potential noisy distribution in the task prediction or feature maps and generate rectified outputs for different tasks.
1 code implementation • 10 Mar 2024 • You Zhang, Jin Wang, Liang-Chih Yu, Dan Xu, Xuejie Zhang
Effectively and efficiently adapting a pre-trained language model (PLM) for human-centered text understanding (HCTU) is challenging, since user tokens number in the millions in most personalized applications and lack concrete explicit semantics.
no code implementations • 2 Mar 2024 • Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation.
no code implementations • CVPR 2024 • Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue
Recently, researchers have attempted to improve the genuineness of 3D objects by directly training on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity in 3D datasets.
no code implementations • CVPR 2024 • Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li
This strategy is essential for extending the 3D Gaussian representation to reconstruct the whole scene, rather than synthesizing a static object as in existing methods.
no code implementations • CVPR 2024 • Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li
To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.
no code implementations • 6 Nov 2023 • Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu
On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation.
1 code implementation • NeurIPS 2023 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu
Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature.
1 code implementation • 6 Aug 2023 • Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu
Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens.
Object Localization • Weakly Supervised Semantic Segmentation +1
no code implementations • ICCV 2023 • Yuxin Wang, Wayne Wu, Dan Xu
State-of-the-art methods in this direction typically consider building separate networks for these two tasks (i.e., view synthesis and editing).
1 code implementation • ICCV 2023 • Hanrong Ye, Dan Xu
Furthermore, to establish long-range modeling of the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that updates at each layer and acts as an additional feature expert for dynamic task-specific feature decoding.
1 code implementation • ICCV 2023 • Fa-Ting Hong, Dan Xu
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image.
no code implementations • 16 Jul 2023 • Siwei Yang, Hanrong Ye, Dan Xu
A core design objective is how to effectively model cross-task interactions to achieve comprehensive improvements on different tasks based on their inherent complementarity and consistency.
1 code implementation • 8 Jun 2023 • Hanrong Ye, Dan Xu
We then design a transformer decoder to establish spatial and cross-task interactions globally, and devise a novel UP-Transformer block to gradually increase the resolution of multi-task features and establish cross-task interactions at different scales.
1 code implementation • 10 May 2023 • Fa-Ting Hong, Li Shen, Dan Xu
In this work, we first present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations during training.
no code implementations • CVPR 2023 • Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Hang Xu
This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).
Ranked #5 on Object Detection on ODinW Full-Shot 13 Tasks
1 code implementation • 3 Apr 2023 • Hanrong Ye, Dan Xu
TaskPrompter introduces a new multi-task benchmark based on the Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.
Ranked #1 on Monocular Depth Estimation on Cityscapes 3D
no code implementations • 10 Mar 2023 • Jaehyeok Kim, Dongyoon Wee, Dan Xu
In this paper, we tackle this problem by proposing a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering, and an effective pose-conditioned code query mechanism to finely model the pose-dependent non-rigid motions.
no code implementations • CVPR 2023 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu
Weakly supervised dense object localization (WSDOL) generally relies on Class Activation Mapping (CAM), which exploits the correlation between the class weights of the image classifier and the pixel-level features.
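The CAM mechanism referenced in the entry above is standard: the classifier weights linearly combine the last convolutional feature maps into per-class localization maps. Below is a minimal PyTorch sketch; the toy backbone is a hypothetical stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyCAMNet(nn.Module):
    """Toy classifier: conv features -> global average pooling -> linear classifier."""
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, num_classes, bias=False)

    def forward(self, x):
        f = self.features(x)                          # (B, 128, H, W)
        logits = self.classifier(f.mean(dim=(2, 3)))  # GAP, then linear classification
        return logits, f

def class_activation_maps(feat: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """CAM_c(x, y) = sum_k w_{c,k} * f_k(x, y); feat is (B, K, H, W), weight is (C, K)."""
    return torch.einsum("bkhw,ck->bchw", feat, weight)

if __name__ == "__main__":
    net = TinyCAMNet()
    img = torch.randn(2, 3, 64, 64)
    logits, feat = net(img)
    cams = class_activation_maps(feat, net.classifier.weight)
    print(cams.shape)  # (2, 20, 64, 64): one localization map per class
```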
1 code implementation • 26 Oct 2022 • Jiebao Zhang, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu
We study the attack performance and computational cost of a Hessian-based attack method with a limited number of perturbed pixels.
no code implementations • 20 Sep 2022 • Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Chunjing Xu, Hang Xu
We further design a concept dictionary (with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.
1 code implementation • 13 Jul 2022 • Yuzhang Shang, Dan Xu, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan
Relying on the premise that the performance of a binary neural network can be largely restored by eliminating the quantization error between full-precision weight vectors and their corresponding binary vectors, existing network binarization works frequently adopt the idea of model robustness to reach this objective.
1 code implementation • 12 Jul 2022 • Jiebao Zhang, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu
Adversarial training is a simple and effective defense method to improve the robustness of CNNs to adversarial examples.
1 code implementation • 6 Jul 2022 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan
Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.
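As a generic illustration of the 1-bit quantization mentioned above (not the specific method proposed in this paper), sign-based binarization with the commonly used straight-through estimator can be sketched as follows.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """Binarize to {-1, +1} with a straight-through estimator in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Note: torch.sign maps exact zeros to 0; real implementations handle this explicitly.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (the usual clipped straight-through rule).
        return grad_out * (x.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    """Linear layer whose weights and inputs are binarized on the fly during forward."""
    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        x_bin = BinarizeSTE.apply(x)
        return nn.functional.linear(x_bin, w_bin, self.bias)

if __name__ == "__main__":
    layer = BinaryLinear(16, 4)
    out = layer(torch.randn(8, 16))
    out.sum().backward()                      # gradients flow through the STE
    print(out.shape, layer.weight.grad.shape)
```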
1 code implementation • 26 Mar 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci
This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years.
no code implementations • 20 Mar 2022 • Ren Kai Tan, Chao Qian, Dan Xu, Wenjing Ye
Most models are trained to work with design problems similar to those used for data generation, and require retraining if the design problem changes.
1 code implementation • 15 Mar 2022 • Hanrong Ye, Dan Xu
Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction.
Ranked #1 on Boundary Detection on NYU-Depth V2
1 code implementation • CVPR 2022 • Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
More densely, the depth is also utilized to learn 3D-aware cross-modal (i.e., appearance and depth) attention to guide the generation of motion fields for warping source image representations.
1 code implementation • CVPR 2022 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu
To this end, we propose a Multi-class Token Transformer, termed as MCTformer, which uses multiple class tokens to learn interactions between the class tokens and the patch tokens.
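A minimal sketch of how multiple class tokens can yield class-specific localization maps: given a transformer attention matrix over C class tokens followed by N patch tokens, the class-to-patch attention is averaged over heads and reshaped into per-class maps. The tensor layout below is an assumption for illustration, not MCTformer's exact implementation.

```python
import torch

def class_to_patch_attention(attn: torch.Tensor, num_classes: int, h: int, w: int) -> torch.Tensor:
    """Turn a transformer attention matrix into class-specific localization maps.

    attn: (B, heads, C + N, C + N) attention over C class tokens followed by N patch tokens.
    Returns (B, C, h, w) maps built from the attention of each class token to the patches.
    """
    b = attn.shape[0]
    cls_to_patch = attn[:, :, :num_classes, num_classes:]   # (B, heads, C, N)
    maps = cls_to_patch.mean(dim=1)                         # average over heads -> (B, C, N)
    return maps.reshape(b, num_classes, h, w)

if __name__ == "__main__":
    B, heads, C, h, w = 2, 8, 20, 14, 14
    N = h * w
    attn = torch.softmax(torch.randn(B, heads, C + N, C + N), dim=-1)
    cams = class_to_patch_attention(attn, C, h, w)
    print(cams.shape)  # (2, 20, 14, 14)
```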
1 code implementation • 1 Feb 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci
To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.
1 code implementation • CVPR 2022 • Zhenxing Mi, Di Chang, Dan Xu
The new formulation allows our method to sample only a very small number of depth hypotheses in each step, which is highly memory-efficient and greatly facilitates quick training convergence.
Ranked #8 on 3D Reconstruction on DTU
no code implementations • 29 Sep 2021 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan
Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.
no code implementations • 6 Aug 2021 • Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He
To this end, we propose to untangle domain-invariant crowd and domain-specific background from crowd images, and design a fine-grained domain adaptation method for crowd counting.
no code implementations • 6 Aug 2021 • Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Jing Qin, Dan Xu, Shengfeng He
In this way, we can transfer the original spatial labeling redundancy caused by individual similarities to effective supervision signals on the unlabeled regions.
1 code implementation • 29 Jul 2021 • Yinmin Zhang, Xinzhu Ma, Shuai Yi, Jun Hou, Zhihui Wang, Wanli Ouyang, Dan Xu
In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Ranked #10 on Monocular 3D Object Detection on KITTI Cars Moderate
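A worked illustration of the projective relation that typically underlies geometry-guided monocular depth estimation: under a pinhole model, an object of physical height H_3D whose 2D box spans h_2D pixels lies at depth z ≈ f_y · H_3D / h_2D. The numbers below are only roughly KITTI-like, and the function is a sketch of the general relation, not the paper's model.

```python
def depth_from_projection(f_y: float, obj_height_m: float, box_height_px: float) -> float:
    """Pinhole relation behind geometry-guided monocular depth: z ≈ f_y * H_3D / h_2D.

    f_y: focal length in pixels; obj_height_m: physical object height in metres;
    box_height_px: height of the 2D bounding box in pixels.
    """
    return f_y * obj_height_m / max(box_height_px, 1e-6)

if __name__ == "__main__":
    # Roughly KITTI-like numbers: focal length ~721 px, a 1.53 m tall car spanning 60 px.
    z = depth_from_projection(721.5, 1.53, 60.0)
    print(f"estimated depth: {z:.1f} m")   # ~18.4 m
```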
2 code implementations • 27 Jul 2021 • Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng
In this work, we argue that the features extracted from the pretrained extractor, e.g., I3D, are not WS-TAL task-specific features; thus, feature re-calibration is needed to reduce the task-irrelevant information redundancy.
Weakly Supervised Action Localization • Weakly-supervised Temporal Action Localization +1
1 code implementation • Proceedings of the 29th ACM International Conference on Multimedia 2021 • Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng
In this work, we argue that the features extracted from the pretrained extractor, e.g., I3D, are not WS-TAL task-specific features; thus, feature re-calibration is needed to reduce the task-irrelevant information redundancy.
Weakly-supervised Temporal Action Localization • Weakly Supervised Temporal Action Localization
1 code implementation • ICCV 2021 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu
Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework termed as AuxSegNet, to leverage saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels.
1 code implementation • ICCV 2021 • Jiapeng Tang, Jiabao Lei, Dan Xu, Feiying Ma, Kui Jia, Lei Zhang
To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability to large-scale scenes, generality to novel shapes, and applicability to raw scans in a unified framework.
no code implementations • 5 May 2021 • Dan Xu, Andrea Vedaldi, Joao F. Henriques
We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map.
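A minimal sketch of the view-synthesis step described above: back-project target-view pixels with a predicted depth map, transform them by the predicted relative pose, and re-sample the source image at the projected locations. The shapes and conventions (intrinsics K, target-to-source pose T) are assumptions for illustration, not the paper's exact interface.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, tgt_depth, K, T_tgt_to_src):
    """Re-render the source image from the target view via depth-based reprojection.

    src_img: (B, 3, H, W) source view; tgt_depth: (B, 1, H, W) target-view depth;
    K: (B, 3, 3) intrinsics; T_tgt_to_src: (B, 4, 4) relative pose mapping target-frame
    points into the source frame. Returns the source image warped into the target view.
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Pixel grid in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D in the target camera frame, then move into the source frame.
    cam_pts = torch.linalg.inv(K) @ pix * tgt_depth.reshape(B, 1, -1)           # (B, 3, H*W)
    cam_pts_h = torch.cat([cam_pts, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_pts = (T_tgt_to_src @ cam_pts_h)[:, :3]                                 # (B, 3, H*W)

    # Project into the source image and normalise to [-1, 1] for grid_sample.
    proj = K @ src_pts
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

if __name__ == "__main__":
    B, H, W = 1, 32, 32
    img = torch.rand(B, 3, H, W)
    depth = torch.full((B, 1, H, W), 2.0)
    K = torch.tensor([[[30.0, 0, 16.0], [0, 30.0, 16.0], [0, 0, 1.0]]])
    T = torch.eye(4).unsqueeze(0)          # identity pose: the warp reproduces the input
    out = warp_source_to_target(img, depth, K, T)
    print(torch.allclose(out, img, atol=1e-4))
```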
1 code implementation • CVPR 2021 • Jiapeng Tang, Dan Xu, Kui Jia, Lei Zhang
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
1 code implementation • CVPR 2021 • Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging.
3D Object Detection From Monocular Images • Autonomous Driving +3
1 code implementation • 5 Mar 2021 • Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).
1 code implementation • 10 Feb 2021 • Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll
Although Transformer has made breakthrough success in widespread domains especially in Natural Language Processing (NLP), applying it to time series forecasting is still a great challenge.
no code implementations • 8 Jan 2021 • Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
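As a rough, heavily simplified illustration of attention-gated multi-scale fusion (not the full AG-CRF inference), the sketch below predicts one attention gate per scale, normalizes the gates across scales, and fuses the features with a gate-weighted sum instead of plain averaging or concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMultiScaleFusion(nn.Module):
    """Fuse multi-scale feature maps with learned per-pixel attention gates.

    Each scale predicts a gate; gates are normalised across scales with a softmax,
    and the fused map is the gate-weighted sum of the (resized) per-scale features.
    """
    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        self.gates = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in range(num_scales))

    def forward(self, feats):
        # Resize every scale to the resolution of the first (finest) one.
        target = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                 for f in feats]
        gate_logits = torch.stack([g(f) for g, f in zip(self.gates, feats)], dim=0)
        weights = torch.softmax(gate_logits, dim=0)           # normalise across scales
        return (weights * torch.stack(feats, dim=0)).sum(dim=0)

if __name__ == "__main__":
    fusion = GatedMultiScaleFusion(channels=64, num_scales=3)
    feats = [torch.randn(2, 64, 64, 64), torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 16)]
    print(fusion(feats).shape)   # (2, 64, 64, 64)
```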
1 code implementation • 1 Jan 2021 • Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci
State-of-the-art performances in dense pixel-wise prediction tasks are obtained with specifically designed convolutional networks.
no code implementations • 16 Dec 2020 • Zhichao Wu, Lei Guo, Hao Zhang, Dan Xu
Unsupervised image segmentation aims at assigning pixels with similar features to the same cluster without annotation, which is an important task in computer vision.
no code implementations • 16 Nov 2020 • Sanqing Qu, Guang Chen, Dan Xu, Jinhu Dong, Fan Lu, Alois Knoll
At each time step, this sampling strategy first estimates the current action progression and then decides which temporal ranges should be used to aggregate the optimal supplementary features.
no code implementations • 11 May 2020 • Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang
Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.
2 code implementations • 31 Mar 2020 • Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc van Gool
We propose a novel ECGAN for the challenging semantic image synthesis task.
2 code implementations • CVPR 2020 • Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe
To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details.
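A toy sketch of the local class-specific design described above: each semantic class gets its own sub-generator, whose output is masked by that class's region in the semantic map before the per-class results are combined. The architecture below is purely illustrative, not the paper's network.

```python
import torch
import torch.nn as nn

class LocalClassSpecificGenerator(nn.Module):
    """Toy class-specific local generation guided by a semantic map.

    Each class has its own small sub-generator; its output is masked by that class's
    region in the one-hot semantic map, and the per-class results are summed.
    """
    def __init__(self, num_classes: int, in_ch: int = 3, out_ch: int = 3):
        super().__init__()
        self.sub_generators = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, out_ch, 3, padding=1))
            for _ in range(num_classes)
        )

    def forward(self, x: torch.Tensor, semantic_map: torch.Tensor) -> torch.Tensor:
        # semantic_map: (B, C, H, W) one-hot class layout used to mask each sub-generator.
        out = 0.0
        for c, gen in enumerate(self.sub_generators):
            out = out + gen(x) * semantic_map[:, c:c + 1]
        return out

if __name__ == "__main__":
    B, C, H, W = 2, 5, 32, 32
    labels = torch.randint(0, C, (B, H, W))
    sem = torch.nn.functional.one_hot(labels, C).permute(0, 3, 1, 2).float()
    gen = LocalClassSpecificGenerator(num_classes=C)
    print(gen(torch.randn(B, 3, H, W), sem).shape)   # (2, 3, 32, 32)
```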
2 code implementations • 27 Nov 2019 • Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe
State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data.
Ranked #1 on Facial Expression Translation on CelebA
1 code implementation • 17 Sep 2019 • Andrea Pilzer, Stéphane Lathuilière, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe
Extensive experiments on the publicly available datasets KITTI, Cityscapes and ApolloScape demonstrate the effectiveness of the proposed model which is competitive with other unsupervised deep learning methods for depth prediction.
1 code implementation • ICCV 2019 • Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao
We formulate the CRF graphical model that involves message-passing of feature-feature, feature-prediction, and prediction-prediction, from the coarse scale to the finer scale, to update the features and the corresponding predictions.
no code implementations • 6 Sep 2019 • Dan Xu, Weidi Xie, Andrew Zisserman
In this paper we propose a geometry-aware model for video object detection.
1 code implementation • CVPR 2020 • Li Zhang, Dan Xu, Anurag Arnab, Philip H. S. Torr
We propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph.
no code implementations • 15 Aug 2019 • Mihai Marian Puscas, Dan Xu, Andrea Pilzer, Nicu Sebe
Inspired by the success of adversarial learning, we propose a new end-to-end unsupervised deep learning framework for monocular depth estimation consisting of two Generative Adversarial Networks (GAN), deeply coupled with a structured Conditional Random Field (CRF) model.
Monocular Depth Estimation • Unsupervised Monocular Depth Estimation
1 code implementation • 2 Aug 2019 • Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, Yan Yan
In this work, we propose a novel Cycle In Cycle Generative Adversarial Network (C²GAN) for the task of keypoint-guided image generation.
no code implementations • 14 May 2019 • Hao Tang, Wei Wang, Songsong Wu, Xinya Chen, Dan Xu, Nicu Sebe, Yan Yan
In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute.
3 code implementations • CVPR 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan
In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.
Bird View Synthesis • Cross-View Image-to-Image Translation +1
8 code implementations • 28 Mar 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yan Yan
To handle this limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts for semantic manipulation problems, without using extra data or models.
Ranked #1 on Facial Expression Translation on CelebA
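A minimal sketch of the attention-guided generation idea from the entry above: the generator predicts both a content image and a per-pixel attention mask, and the output keeps unattended regions of the input unchanged (output = mask · content + (1 − mask) · input). The layer sizes below are illustrative, not AGGAN's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionGuidedGenerator(nn.Module):
    """Toy attention-guided translator: predict a content image and an attention mask,
    then blend so that unattended regions of the input stay unchanged."""
    def __init__(self, ch: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU())
        self.to_content = nn.Conv2d(32, ch, 3, padding=1)
        self.to_mask = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        h = self.backbone(x)
        content = torch.tanh(self.to_content(h))
        mask = torch.sigmoid(self.to_mask(h))          # per-pixel attention in [0, 1]
        return mask * content + (1.0 - mask) * x, mask

if __name__ == "__main__":
    g = AttentionGuidedGenerator()
    out, mask = g(torch.rand(2, 3, 64, 64) * 2 - 1)    # inputs scaled to [-1, 1]
    print(out.shape, mask.shape)
```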
1 code implementation • 28 Jan 2019 • Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan
To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.
1 code implementation • 14 Jan 2019 • Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe
State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data.
Generative Adversarial Network • Image-to-Image Translation +1
1 code implementation • 11 Sep 2018 • Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe
In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN).
1 code implementation • 14 Aug 2018 • Hao Tang, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe
Therefore, this task requires a high-level understanding of the mapping between the input source gesture and the output target gesture.
Ranked #1 on Gesture-to-Gesture Translation on NTU Hand Digit
2 code implementations • 28 Jul 2018 • Andrea Pilzer, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe
The proposed architecture consists of two generative sub-networks jointly trained with adversarial learning for reconstructing the disparity map and organized in a cycle such as to provide mutual constraints and supervision to each other.
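One half of such a cycle can be sketched as disparity-based warping: the right image is horizontally re-sampled with a left-view disparity map to reconstruct the left view, and the second half would warp the reconstruction back, providing the mutual supervision mentioned above. The sign convention below (positive disparity shifts sampling to the left) is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right_img: torch.Tensor, disparity: torch.Tensor) -> torch.Tensor:
    """Reconstruct the left view by horizontally sampling the right image with a disparity map.

    right_img: (B, 3, H, W); disparity: (B, 1, H, W) in pixels; positive values shift the
    sampling location to the left, as in standard rectified stereo.
    """
    B, _, H, W = right_img.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    xs = xs.unsqueeze(0).expand(B, -1, -1) - disparity.squeeze(1)   # shifted x coordinates
    ys = ys.unsqueeze(0).expand(B, -1, -1)
    grid = torch.stack([2.0 * xs / (W - 1) - 1.0, 2.0 * ys / (H - 1) - 1.0], dim=-1)
    return F.grid_sample(right_img, grid, align_corners=True)

if __name__ == "__main__":
    right = torch.rand(1, 3, 16, 16)
    disp = torch.zeros(1, 1, 16, 16)                   # zero disparity -> identity warp
    left_rec = warp_right_to_left(right, disp)
    print(torch.allclose(left_rec, right, atol=1e-5))  # True
```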
no code implementations • CVPR 2018 • Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang
Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.
no code implementations • CVPR 2018 • Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Depth estimation and scene parsing are two particularly important tasks in visual scene understanding.
Ranked #15 on Depth Estimation on NYU-Depth V2
1 code implementation • CVPR 2018 • Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci
Recent works have shown the benefit of integrating Conditional Random Fields (CRFs) models into deep architectures for improving pixel-level prediction tasks.
no code implementations • 5 Mar 2018 • Dan Xu, Xavier Alameda-Pineda, Jingkuan Song, Elisa Ricci, Nicu Sebe
In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR).
1 code implementation • 1 Mar 2018 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Depth cues have proven very useful in various computer vision and robotic tasks.
no code implementations • CVPR 2018 • Wei Wang, Xavier Alameda-Pineda, Dan Xu, Pascal Fua, Elisa Ricci, Nicu Sebe
Finally, these landmark sequences are translated into face videos.
no code implementations • NeurIPS 2017 • Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.
2 code implementations • CVPR 2017 • Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.
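One common way to transfer learned representations across modalities, shown here only as a hedged illustration of the general idea rather than this paper's exact transfer scheme, is a feature-mimicking loss that pushes the RGB network's features toward those of a frozen network trained with depth supervision.

```python
import torch
import torch.nn as nn

def feature_transfer_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """L2 feature-mimicking loss: align the RGB network's features with a frozen teacher's."""
    return nn.functional.mse_loss(student_feat, teacher_feat.detach())

if __name__ == "__main__":
    backbone = nn.Conv2d(3, 64, 3, padding=1)          # stand-in for an RGB detection backbone
    teacher = nn.Conv2d(3, 64, 3, padding=1).eval()    # stand-in for a depth-supervised network
    img = torch.rand(2, 3, 32, 32)
    loss = feature_transfer_loss(backbone(img), teacher(img))
    loss.backward()                                    # gradients reach only the student backbone
    print(float(loss))
```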
2 code implementations • CVPR 2017 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
This paper addresses the problem of depth estimation from a single still image.
Ranked #14 on Depth Estimation on NYU-Depth V2
1 code implementation • CVPR 2017 • Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci
In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community.
no code implementations • 18 Nov 2016 • Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Matthew Lease
A recent "third wave" of Neural Network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing.
no code implementations • 23 Aug 2016 • Nannan Li, Dan Xu, Zhenqiang Ying, Zhihao LI, Ge Li
In this paper, we address the problem of searching action proposals in unconstrained video clips.
no code implementations • 6 Oct 2015 • Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, Nicu Sebe
We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes.