Search Results for author: Dan Xu

Found 94 papers, 49 papers with code

3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

1 code implementation 2 Oct 2024 Yang Cao, Yuanliang Jv, Dan Xu

Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation.

Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling

no code implementations 16 Jul 2024 Jaehyeok Kim, Dongyoon Wee, Dan Xu

To address this problem, we propose a novel approach that models non-rigid motions as radiance residual fields, benefiting from more direct color supervision in rendering, and uses the rigid radiance fields as a prior to reduce the complexity of the learning process.

Learning Online Scale Transformation for Talking Head Video Generation

no code implementations 13 Jul 2024 Fa-Ting Hong, Dan Xu

To this end, we introduce a scale transformation module that can automatically adjust the scale of the driving image to fit that of the source image, by using the information of scale difference maintained in the detected keypoints of the source image and the driving frame.

Face Reenactment Video Generation

Sample-efficient Imitative Multi-token Decision Transformer for Generalizable Real World Driving

no code implementations 18 Jun 2024 Hang Zhou, Dan Xu, Yiding Ji

Reinforcement learning via sequence modeling has shown remarkable promise in autonomous systems, harnessing the power of offline datasets to make informed decisions in simulated environments.

Autonomous Driving reinforcement-learning +1

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

no code implementations 17 Jun 2024 YuAn Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

In this paper, we introduce a novel path to general human motion generation by focusing on 2D space.

RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

no code implementations 4 Jun 2024 Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu

In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency.

Edge Detection Text to 3D

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

1 code implementation 2 Jun 2024 Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, and then enhances the model's ability to localize more novel objects.

3D Object Detection cross-modal alignment +3

X-VILA: Cross-Modality Alignment for Large Language Model

no code implementations 29 May 2024 Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.

Instruction Following Language Modelling +1

PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting

no code implementations 27 May 2024 Zipeng Wang, Dan Xu

To address these challenges, we present Pyramidal 3D Gaussian Splatting (PyGS) with NeRF Initialization.

Efficient Multitask Dense Predictor via Binarization

no code implementations CVPR 2024 Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan

Moreover, we introduce a knowledge distillation mechanism to correct the direction of information flow in backward propagation.

Binarization Knowledge Distillation +1

Interactive3D: Create What You Want by Interactive 3D Generation

no code implementations CVPR 2024 Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications.

3D Generation

GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

no code implementations 21 Apr 2024 Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu

This paper tackles the intricate challenge of object removal to update the radiance field using 3D Gaussian Splatting.

3D geometry Monocular Depth Estimation +1

CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs

no code implementations CVPR 2024 Yingji Zhong, Lanqing Hong, Zhenguo Li, Dan Xu

While existing works mainly consider ray-level consistency to construct 2D learning regularization based on rendered color, depth, or semantics on image planes, in this paper we propose a novel approach that models 3D spatial field consistency to improve NeRF's performance with sparse inputs.

Novel View Synthesis

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

no code implementations CVPR 2024 Hanrong Ye, Dan Xu

It designs a joint diffusion and denoising paradigm to model a potential noisy distribution in the task prediction or feature maps and generate rectified outputs for different tasks.

Denoising Scene Understanding

Personalized LoRA for Human-Centered Text Understanding

1 code implementation 10 Mar 2024 You Zhang, Jin Wang, Liang-Chih Yu, Dan Xu, Xuejie Zhang

Effectively and efficiently adapting a pre-trained language model (PLM) for human-centered text understanding (HCTU) is challenging, since user tokens number in the millions in most personalized applications and lack concrete explicit semantics.

Language Modelling Zero-Shot Learning

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

no code implementations 2 Mar 2024 Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation.

Auxiliary Learning Multi-Label Image Classification +5

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

no code implementations CVPR 2024 Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue

Recently, researchers have attempted to improve the genuineness of 3D objects by directly training on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity in 3D datasets.

3D Generation Diversity +2

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

no code implementations CVPR 2024 Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li

This strategy is essential for extending the 3D Gaussian representation to reconstruct the whole scene, rather than only synthesizing a static object as in existing methods.

Pose Tracking Simultaneous Localization and Mapping

Implicit Event-RGBD Neural SLAM

no code implementations CVPR 2024 Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li

To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis

no code implementations 6 Nov 2023 Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu

On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation.

Diversity Image Generation +4

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

1 code implementation NeurIPS 2023 Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, a task that remains largely unexplored in the literature.

3D Object Detection cross-modal alignment +4

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation 6 Aug 2023 Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens.

Object Localization Weakly supervised Semantic Segmentation +1

Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis

no code implementations ICCV 2023 Yuxin Wang, Wayne Wu, Dan Xu

State-of-the-art methods in this direction typically build separate networks for these two tasks (i.e., view synthesis and editing).

Novel View Synthesis

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

1 code implementation ICCV 2023 Hanrong Ye, Dan Xu

Furthermore, to establish long-range modeling of the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that updates at each layer and acts as an additional feature expert for dynamic task-specific feature decoding.

Long-range modeling Multi-Task Learning +1

Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation

1 code implementation ICCV 2023 Fa-Ting Hong, Dan Xu

Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image.

Talking Head Generation Video Generation

Contrastive Multi-Task Dense Prediction

no code implementations 16 Jul 2023 Siwei Yang, Hanrong Ye, Dan Xu

A core design objective is to effectively model cross-task interactions so as to achieve a comprehensive improvement on different tasks based on their inherent complementarity and consistency.

Contrastive Learning Representation Learning

InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding

1 code implementation 8 Jun 2023 Hanrong Ye, Dan Xu

Then, we design a transformer decoder to establish spatial and cross-task interaction globally, and devise a novel UP-Transformer block that gradually increases the resolution of multi-task features and establishes cross-task interaction at different scales.

Decoder Multi-Task Learning +1

DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

1 code implementation 10 May 2023 Fa-Ting Hong, Li Shen, Dan Xu

In this work, we first present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations during training.

3D geometry Generative Adversarial Network +3

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

no code implementations CVPR 2023 Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Hang Xu

This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).

Language Modelling object-detection +1

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation

1 code implementation 3 Apr 2023 Hanrong Ye, Dan Xu

TaskPrompter introduces a new multi-task benchmark based on the Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.

3D Object Detection Autonomous Driving +4

You Only Train Once: Multi-Identity Free-Viewpoint Neural Human Rendering from Monocular Videos

no code implementations 10 Mar 2023 Jaehyeok Kim, Dongyoon Wee, Dan Xu

In this paper, we tackle this problem by proposing a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering, and an effective pose-conditioned code query mechanism to finely model the pose-dependent non-rigid motions.

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization

no code implementations CVPR 2023 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

Weakly supervised dense object localization (WSDOL) generally relies on Class Activation Mapping (CAM), which exploits the correlation between the class weights of the image classifier and the pixel-level features.

Object Localization Representation Learning +2

LP-BFGS attack: An adversarial attack based on the Hessian with limited pixels

1 code implementation 26 Oct 2022 Jiebao Zhang, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu

We study the attack performance and computation cost of the attack method based on the Hessian with a limited number of perturbation pixels.

Adversarial Attack

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

no code implementations 20 Sep 2022 Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Chunjing Xu, Hang Xu

We further design a concept dictionary (with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.

object-detection Open World Object Detection

Lipschitz Continuity Retained Binary Neural Network

1 code implementation 13 Jul 2022 Yuzhang Shang, Dan Xu, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Relying on the premise that the performance of a binary neural network can be largely restored by eliminating the quantization error between full-precision weight vectors and their corresponding binary vectors, existing work on network binarization frequently adopts the idea of model robustness to reach this objective.

Binarization Quantization

Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information

1 code implementation 12 Jul 2022 Jiebao Zhang, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu

Adversarial training is a simple and effective defense method to improve the robustness of CNNs to adversarial examples.

Adversarial Robustness

Network Binarization via Contrastive Learning

1 code implementation 6 Jul 2022 Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan

Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.

Binarization Contrastive Learning +2

Uncertainty-aware Contrastive Distillation for Incremental Semantic Segmentation

1 code implementation 26 Mar 2022 Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci

This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years.

Contrastive Learning Image Classification +5

An Adaptive and Scalable ANN-based Model-Order-Reduction Method for Large-Scale TO Designs

no code implementations 20 Mar 2022 Ren Kai Tan, Chao Qian, Dan Xu, Wenjing Ye

Most models are trained to work with design problems similar to those used for data generation and require retraining if the design problem changes.

Cantilever Beam

InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

1 code implementation 15 Mar 2022 Hanrong Ye, Dan Xu

Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction.

Boundary Detection Human Parsing +6

Depth-Aware Generative Adversarial Network for Talking Head Video Generation

1 code implementation CVPR 2022 Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu

In a denser manner, depth is also used to learn 3D-aware cross-modal (i.e., appearance and depth) attention that guides the generation of motion fields for warping source image representations.

3D geometry Generative Adversarial Network +2

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation CVPR 2022 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

To this end, we propose a Multi-class Token Transformer, termed MCTformer, which uses multiple class tokens to learn interactions between the class tokens and the patch tokens.

Object Object Localization +2

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation

1 code implementation 1 Feb 2022 Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci

To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.

Incremental Learning Semantic Segmentation

Generalized Binary Search Network for Highly-Efficient Multi-View Stereo

1 code implementation CVPR 2022 Zhenxing Mi, Di Chang, Dan Xu

The new formulation allows our method to sample only a very small number of depth hypotheses in each step, which is highly memory efficient and also greatly speeds up training convergence.

3D Reconstruction Depth Estimation +2

Contrastive Mutual Information Maximization for Binary Neural Networks

no code implementations 29 Sep 2021 Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan

Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.

Binarization Contrastive Learning +2

Fine-grained Domain Adaptive Crowd Counting via Point-derived Segmentation

no code implementations 6 Aug 2021 Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He

To this end, we propose to untangle domain-invariant crowd and domain-specific background from crowd images and design a fine-grained domain adaptation method for crowd counting.

Crowd Counting Domain Adaptation +1

Reducing Spatial Labeling Redundancy for Semi-supervised Crowd Counting

no code implementations 6 Aug 2021 Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Jing Qin, Dan Xu, Shengfeng He

In this way, we can transfer the original spatial labeling redundancy caused by individual similarities to effective supervision signals on the unlabeled regions.

Crowd Counting

Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization

2 code implementations 27 Jul 2021 Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng

In this work, we argue that the features extracted from the pretrained extractor, e.g., I3D, are not WS-TAL task-specific features, so feature re-calibration is needed to reduce task-irrelevant information redundancy.

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

1 code implementation ICCV 2021 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu

Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework termed AuxSegNet, which leverages saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels.

Auxiliary Learning Multi-Label Image Classification +6

SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

1 code implementation ICCV 2021 Jiapeng Tang, Jiabao Lei, Dan Xu, Feiying Ma, Kui Jia, Lei Zhang

To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability to large-scale scenes, generality to novel shapes, and applicability to raw scans in a unified framework.

Surface Reconstruction

Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes

no code implementations 5 May 2021 Dan Xu, Andrea Vedaldi, Joao F. Henriques

We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map.

3D geometry Depth Estimation +2

Delving into Localization Errors for Monocular 3D Object Detection

1 code implementation CVPR 2021 Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang

Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging.

3D Object Detection From Monocular Images Autonomous Driving +3

Variational Structured Attention Networks for Deep Visual Representation Learning

1 code implementation 5 Mar 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).

Depth Estimation Representation Learning +1

NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting

1 code implementation 10 Feb 2021 Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll

Although the Transformer has achieved breakthrough success in a wide range of domains, especially Natural Language Processing (NLP), applying it to time series forecasting remains a great challenge.

Time Series Time Series Forecasting

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations 8 Jan 2021 Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

Graph Attention Monocular Depth Estimation +1

Variational Structured Attention Networks for Dense Pixel-Wise Prediction

1 code implementation 1 Jan 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

State-of-the-art performances in dense pixel-wise prediction tasks are obtained with specifically designed convolutional networks.

Unsupervised Image Segmentation using Mutual Mean-Teaching

no code implementations 16 Dec 2020 Zhichao Wu, Lei Guo, Hao Zhang, Dan Xu

Unsupervised image segmentation, an important task in computer vision, aims to assign pixels with similar features to the same cluster without annotation.

Image Segmentation Segmentation +2

LAP-Net: Adaptive Features Sampling via Learning Action Progression for Online Action Detection

no code implementations 16 Nov 2020 Sanqing Qu, Guang Chen, Dan Xu, Jinhu Dong, Fan Lu, Alois Knoll

At each time step, this sampling strategy first estimates the current action progression and then decides which temporal ranges should be used to aggregate the optimal supplementary features.

Online Action Detection

Scope Head for Accurate Localization in Object Detection

no code implementations 11 May 2020 Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.

Object object-detection +2

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

2 code implementations CVPR 2020 Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe

To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as a guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details.

Image Generation Scene Generation

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

2 code implementations 27 Nov 2019 Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe

State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data.

Image-to-Image Translation Translation

Progressive Fusion for Unsupervised Binocular Depth Estimation using Cycled Networks

1 code implementation 17 Sep 2019 Andrea Pilzer, Stéphane Lathuilière, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe

Extensive experiments on the publicly available datasets KITTI, Cityscapes and ApolloScape demonstrate the effectiveness of the proposed model which is competitive with other unsupervised deep learning methods for depth prediction.

Data Augmentation Depth Prediction +2

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection

1 code implementation ICCV 2019 Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao

We formulate the CRF graphical model that involves message-passing of feature-feature, feature-prediction, and prediction-prediction, from the coarse scale to the finer scale, to update the features and the corresponding predictions.

object-detection RGB Salient Object Detection +1

Dynamic Graph Message Passing Networks

1 code implementation CVPR 2020 Li Zhang, Dan Xu, Anurag Arnab, Philip H. S. Torr

We propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph.

Image Classification object-detection +3

Structured Coupled Generative Adversarial Networks for Unsupervised Monocular Depth Estimation

no code implementations 15 Aug 2019 Mihai Marian Puscas, Dan Xu, Andrea Pilzer, Nicu Sebe

Inspired by the success of adversarial learning, we propose a new end-to-end unsupervised deep learning framework for monocular depth estimation consisting of two Generative Adversarial Networks (GAN), deeply coupled with a structured Conditional Random Field (CRF) model.

Monocular Depth Estimation Unsupervised Monocular Depth Estimation

Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

1 code implementation 2 Aug 2019 Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, Yan Yan

In this work, we propose a novel Cycle In Cycle Generative Adversarial Network (C²GAN) for the task of keypoint-guided image generation.

Generative Adversarial Network Image Generation

Expression Conditional GAN for Facial Expression-to-Expression Translation

no code implementations 14 May 2019 Hao Tang, Wei Wang, Songsong Wu, Xinya Chen, Dan Xu, Nicu Sebe, Yan Yan

In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute.

Attribute Facial expression generation +2

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

3 code implementations CVPR 2019 Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan

In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.

Bird View Synthesis Cross-View Image-to-Image Translation +1

Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation

8 code implementations 28 Mar 2019 Hao Tang, Dan Xu, Nicu Sebe, Yan Yan

To handle this limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts for semantic manipulation problems without using extra data or models.

Generative Adversarial Network Translation +1

Attribute-Guided Sketch Generation

1 code implementation 28 Jan 2019 Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.

Attribute Generative Adversarial Network +1

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

1 code implementation 14 Jan 2019 Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe

State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data.

Generative Adversarial Network Image-to-Image Translation +1

Deep Micro-Dictionary Learning and Coding Network

1 code implementation 11 Sep 2018 Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN).

Dictionary Learning

GestureGAN for Hand Gesture-to-Gesture Translation in the Wild

1 code implementation 14 Aug 2018 Hao Tang, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

Therefore, this task requires a high-level understanding of the mapping between the input source gesture and the output target gesture.

Data Augmentation Generative Adversarial Network +2

Unsupervised Adversarial Depth Estimation using Cycled Generative Networks

2 code implementations 28 Jul 2018 Andrea Pilzer, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe

The proposed architecture consists of two generative sub-networks jointly trained with adversarial learning to reconstruct the disparity map, organized in a cycle so as to provide mutual constraints and supervision to each other.

Monocular Depth Estimation

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification

no code implementations CVPR 2018 Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang

Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.

Person Re-Identification

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation

1 code implementation CVPR 2018 Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci

Recent works have shown the benefit of integrating Conditional Random Fields (CRFs) models into deep architectures for improving pixel-level prediction tasks.

Monocular Depth Estimation

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations NeurIPS 2017 Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

Contour Detection

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

2 code implementations CVPR 2017 Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.

Pedestrian Detection

Viraliency: Pooling Local Virality

1 code implementation CVPR 2017 Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci

In our overly connected world, the automatic recognition of virality (the quality of an image or video to be rapidly and widely spread in social networks) is of crucial importance, and has recently awakened the interest of the computer vision community.

Neural Information Retrieval: A Literature Review

no code implementations 18 Nov 2016 Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Matthew Lease

A recent "third wave" of Neural Network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing.

Information Retrieval Retrieval +2

Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

no code implementations 23 Aug 2016 Nannan Li, Dan Xu, Zhenqiang Ying, Zhihao LI, Ge Li

In this paper, we address the problem of searching action proposals in unconstrained video clips.

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

no code implementations 6 Oct 2015 Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, Nicu Sebe

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes.

Anomaly Detection Denoising +1
