no code implementations • COLING 2022 • Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng
On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence.
no code implementations • 20 Dec 2022 • Guanbo Pan, Guo Lu, Zhihao Hu, Dong Xu
Although several content adaptive methods have been proposed by updating the encoder-side components, the adaptability of both latents and the decoder is not well exploited.
no code implementations • 21 Nov 2022 • Weiqi Sun, Rui Su, Qian Yu, Dong Xu
Weakly supervised temporal action localization (WTAL) aims to localize actions in untrimmed videos with only weak supervision information (e.g., video-level labels).
Weakly-supervised Temporal Action Localization
no code implementations • 24 Sep 2022 • Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan
We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer.
1 code implementation • 31 Aug 2022 • ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu
In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.
1 code implementation • 14 Aug 2022 • Chenjian Gao, Qian Yu, Lu Sheng, Yi-Zhe Song, Dong Xu
Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.
no code implementations • CVPR 2022 • Zhihao Hu, Guo Lu, Jinyang Guo, Shan Liu, Wei Jiang, Dong Xu
Previous deep video compression approaches use only a single-scale motion compensation strategy and rarely adopt the mode prediction technique from traditional standards such as H.264/H.265 for both motion and residual compression.
no code implementations • 13 Mar 2022 • Feiyu Wang, Qin Wang, Wen Li, Dong Xu, Luc van Gool
Benefiting from this new perspective, we first propose a new deep semi-supervised learning framework called Semi-supervised Learning by Empirical Distribution Alignment (SLEDA), in which existing technologies from the domain adaptation community can be readily used to address the semi-supervised learning problem by reducing the empirical distribution distance between labeled and unlabeled data.
no code implementations • CVPR 2022 • Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu
Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.
no code implementations • CVPR 2022 • Zhenghao Chen, Guo Lu, Zhihao Hu, Shan Liu, Wei Jiang, Dong Xu
In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views.
no code implementations • CVPR 2022 • Guo Lu, Tianxiong Zhong, Jing Geng, Qiang Hu, Dong Xu
Specifically, given the image in the reference modality (e.g., the infrared image), we use the channel-wise alignment module to produce the aligned features based on the affine transform.
1 code implementation • 30 Nov 2021 • Sen Wang, Hengchao Wang, Fan Jiang, Anqi Wang, Hangwei Liu, Hanbo Zhao, Boyuan Yang, Dong Xu, Yan Zhang, Wei Fan
As the Hi-C links of two adjacent contigs concentrate only at the neighbor ends of the contigs, larger contig size will reduce the power to differentiate adjacent (signal) and non-adjacent (noise) contig linkages, leading to a higher rate of mis-assembly.
1 code implementation • CVPR 2021 • Weichen Zhang, Wen Li, Dong Xu
In this work, we propose a new cross-dataset 3D object detection method named Scale-aware and Range-aware Domain Adaptation Network (SRDAN).
1 code implementation • 14 Jun 2021 • MingHan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu
In this paper, a novel second-order method called NG+ is proposed.
no code implementations • 26 May 2021 • Jinyang Guo, Dong Xu, Guo Lu
Furthermore, to achieve variable bitrate decoding with one single decoder, we propose a bitrate adaptive module to project the representation from a base bitrate to the expected representation at a target bitrate for transmission.
no code implementations • CVPR 2021 • Zhihao Hu, Guo Lu, Dong Xu
In this work, we propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
no code implementations • CVPR 2021 • Zizheng Que, Guo Lu, Dong Xu
In this paper, we propose a two-stage deep learning framework called VoxelContext-Net for both static and dynamic point cloud compression.
1 code implementation • CVPR 2021 • Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu
Inspired by the back-tracing strategy in conventional Hough voting methods, in this work we introduce a new 3D object detection method named Back-tracing Representative Points Network (BRNet), which generatively back-traces the representative points from the vote centers and also revisits complementary seed points around these generated points, so as to better capture the fine local structural features surrounding the potential objects from the raw point clouds.
Ranked #9 on 3D Object Detection on SUN-RGBD val
no code implementations • 26 Mar 2021 • Jiayi Tian, Jing Zhang, Wen Li, Dong Xu
On the other hand, we also design an effective distribution alignment method to reduce the distribution divergence between the virtual domain and the target domain by gradually improving the compactness of the target domain distribution through model learning.
3 code implementations • 19 Jan 2021 • Mingchen Zhuge, Deng-Ping Fan, Nian Liu, Dingwen Zhang, Dong Xu, Ling Shao
We define the concept of integrity at both a micro and macro level.
1 code implementation • ICCV 2021 • Xiaolei Wu, Zhihao Hu, Lu Sheng, Dong Xu
In this work, we propose a new feed-forward arbitrary style transfer method, referred to as StyleFormer, which can simultaneously fulfill fine-grained style diversity and semantic content coherency.
no code implementations • ICCV 2021 • Rui Su, Qian Yu, Dong Xu
Spatio-temporal video grounding (STVG) aims to localize a spatio-temporal tube of a target object in an untrimmed video based on a query sentence.
no code implementations • ICCV 2021 • Lichen Zhao, Daigang Cai, Lu Sheng, Dong Xu
Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world.
no code implementations • 1 Jan 2021 • Eleanor Quint, Dong Xu, Samuel W Flint, Stephen D Scott, Matthew Dwyer
In order to satisfy safety conditions, an agent may be constrained from acting freely.
1 code implementation • CVPR 2021 • Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu
To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.
1 code implementation • 10 Nov 2020 • Zongheng Tang, Yue Liao, Si Liu, Guanbin Li, Xiaojie Jin, Hongxu Jiang, Qian Yu, Dong Xu
HC-STVG is a video grounding task that requires both spatial (where) and temporal (when) localization.
1 code implementation • 7 Nov 2020 • Muhammad Hassan, Yan Wang, Di Wang, Daixi Li, Yanchun Liang, You Zhou, Dong Xu
We collected 100,000 shoeprints of subjects ranging from 7 to 80 years old and used the data to develop a deep learning end-to-end model ShoeNet to analyze age-related patterns and predict age.
no code implementations • 27 Oct 2020 • Hao Ma, Jingbin Liu, Zhirong Hu, Hongyu Qiu, Dong Xu, Zemin Wang, Xiaodong Gong, Sheng Yang
This paper designs a technical route to generate high-quality panoramic images with depth information, which involves two critical research hotspots: fusion of LiDAR and image data, and image stitching.
no code implementations • 27 Oct 2020 • Hao Ma, Jingbin Liu, Keke Liu, Hongyu Qiu, Dong Xu, Zemin Wang, Xiaodong Gong, Sheng Yang
Registration of 3D LiDAR point clouds with optical images is critical in the combination of multi-source data.
no code implementations • ECCV 2020 • Zhihao Hu, Zhenghao Chen, Dong Xu, Guo Lu, Wanli Ouyang, Shuhang Gu
In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder.
1 code implementation • 14 Jul 2020 • Dong Xu, David Shriver, Matthew B. Dwyer, Sebastian Elbaum
The field of verification has advanced due to the interplay of theoretical development and empirical evaluation.
no code implementations • 11 Jul 2020 • Dong Xu, Xiao Huang, Joseph Mango, Xiang Li, Zhenlong Li
We propose a multi-exit evacuation simulation based on Deep Reinforcement Learning (DRL), referred to as MultiExit-DRL, which involves a Deep Neural Network (DNN) framework to facilitate state-to-action mapping.
no code implementations • CVPR 2021 • Ming-Han Yang, Dong Xu, Hongyu Chen, Zaiwen Wen, Mengyun Chen
In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions.
1 code implementation • 10 Jun 2020 • Ming-Han Yang, Dong Xu, Zaiwen Wen, Mengyun Chen, Pengxiang Xu
Experiments on the distributed large-batch training show that the scaling efficiency is quite reasonable.
no code implementations • ECCV 2020 • Guo Lu, Chunlei Cai, Xiaoyun Zhang, Li Chen, Wanli Ouyang, Dong Xu, Zhiyong Gao
Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets.
no code implementations • 15 Mar 2020 • Jinyang Guo, Wanli Ouyang, Dong Xu
To this end, we propose a new strategy to suppress the influence of unimportant features (i.e., the features that will be removed at the next pruning stage).
2 code implementations • 9 Feb 2020 • Jiaheng Liu, Guo Lu, Zhihao Hu, Dong Xu
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
no code implementations • 28 Dec 2019 • Xiao Huang, Dong Xu, Zhenlong Li, Cuizhen Wang
The results of this study prove the possibility of multispectral-to-nighttime translation and further indicate that, with the additional social media data, the generated nighttime imagery can be very similar to the ground-truth imagery.
1 code implementation • 2 Oct 2019 • Eleanor Quint, Dong Xu, Samuel Flint, Stephen Scott, Matthew Dwyer
In order to satisfy safety conditions, an agent may be constrained from acting freely.
no code implementations • 21 Sep 2019 • Zehui Yao, Boyan Zhang, Zhiyong Wang, Wanli Ouyang, Dong Xu, Dagan Feng
For example, given two image domains $X_1$ and $X_2$ with certain attributes, the intersection $X_1 \cap X_2$ denotes a new domain where images possess the attributes from both $X_1$ and $X_2$ domains.
1 code implementation • 6 Aug 2019 • David Shriver, Dong Xu, Sebastian Elbaum, Matthew B. Dwyer
Deep neural networks (DNN) are growing in capability and applicability.
1 code implementation • 26 Jul 2019 • Ming Liu, Dongpeng Liu, Guangyu Sun, Yi Zhao, Duolin Wang, Fangxing Liu, Xiang Fang, Qing He, Dong Xu
Detecting inaccurate smart meters and targeting them for replacement can save significant resources.
no code implementations • CVPR 2019 • Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu
Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models.
no code implementations • 26 May 2019 • Dong Xu, Wu-Jun Li
HAS adopts a hashing strategy to learn a binary matrix representation for each answer, which can dramatically reduce the memory cost for storing the matrix representations of answers.
no code implementations • 26 May 2019 • Dong Xu, Jianhui Ji, Haikuan Huang, Hongbo Deng, Wu-Jun Li
Nevertheless, it is difficult for RNN based models to capture the information about long-range dependency among words in the sentences of questions and answers.
4 code implementations • CVPR 2019 • Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao
Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information.
no code implementations • 27 Sep 2018 • Dong Xu, Eleanor Quint, Zeynep Hakguder, Haluk Dogan, Stephen Scott, Matthew Dwyer
We study the problem of deep reinforcement learning where the agent's action sequences are constrained, e.g., prohibition of dithering or overactuating action sequences that might damage a robot, drone, or other physical device.
no code implementations • ECCV 2018 • Dongang Wang, Wanli Ouyang, Wen Li, Dong Xu
We then train view-specific action classifiers based on the view-specific representation for each view and a view classifier based on the shared representation at lower layers.
1 code implementation • ECCV 2018 • Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Zhiyong Gao, Ming-Ting Sun
In this paper, we model the video artifact reduction task as a Kalman filtering procedure and restore decoded frames through a deep Kalman filtering network.
1 code implementation • CVPR 2018 • Weichen Zhang, Wanli Ouyang, Wen Li, Dong Xu
In this paper, we propose a new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN) through domain-collaborative and domain-adversarial training of neural networks.
no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann
relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
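The bag-and-instance framing above can be illustrated with a minimal sketch. It assumes the standard MIL convention that a bag is as positive as its most positive instance (max pooling); this is a common textbook choice and not necessarily the aggregation used in the paper. The video names, shot scores, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def bag_score(instance_scores: List[float]) -> float:
    """Aggregate per-shot scores into one video-level score.

    Assumption: max pooling, i.e. a bag (video) is as positive as its
    most positive instance (shot).
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


# Hypothetical per-shot relevance scores for two videos (made-up numbers).
videos: Dict[str, List[float]] = {
    "video_a": [0.1, 0.8, 0.3],   # one shot looks relevant to the event
    "video_b": [0.2, 0.1, 0.05],  # no shot looks relevant
}

# Each video is a bag; its shots are the instances.
bag_scores = {name: bag_score(shots) for name, shots in videos.items()}

# Hypothetical decision threshold for the video-level label.
predictions = {name: score >= 0.5 for name, score in bag_scores.items()}
```

Only the video-level label is needed for training under this framing, which is what makes it suitable when shot-level annotations are unavailable.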
no code implementations • 12 Sep 2017 • Chao Fang, Yi Shang, Dong Xu
Results: Here, a very deep neural network, the deep inception-inside-inception networks (Deep3I), is proposed for protein secondary structure prediction and a software tool was implemented using this network.
no code implementations • 19 Jul 2017 • Erbo Li, Hanlin Mo, Dong Xu, Hua Li
In this paper, we propose relative projective differential invariants (RPDIs) which are invariant to general projective transformations.
no code implementations • CVPR 2017 • Dingwen Zhang, Le Yang, Deyu Meng, Dong Xu, Junwei Han
Object segmentation in weakly labelled videos is an interesting yet challenging task, which aims at learning to perform category-specific video object segmentation by only using video-level tags.
no code implementations • 26 Jun 2017 • Jun Liu, Amir Shahroudy, Dong Xu, Alex C. Kot, Gang Wang
Skeleton-based human action recognition has attracted a lot of research attention during the past few years.
Ranked #6 on One-Shot 3D Action Recognition on NTU RGB+D 120
no code implementations • 11 May 2017 • Jing Zhang, Wanqing Li, Philip Ogunbona, Dong Xu
This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition.
no code implementations • 7 Mar 2017 • Erbo Li, Yazhou Huang, Dong Xu, Hua Li
Two fundamental building blocks or generating functions (GFs) for invariants are discovered, which are dot product and vector product of point vectors in Euclidean space.
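As a small illustration of why these two products are natural building blocks, the sketch below checks numerically that both are unchanged when two point vectors are rotated about the origin. It uses the 2D case, where the vector product reduces to a scalar (the determinant of the two vectors); the points and the angle are arbitrary values chosen for the example, and the check covers rotation only, not the full set of transformations treated in the paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate(p: Point, theta: float) -> Point:
    """Rotate a 2D point vector about the origin by theta radians."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def dot(p: Point, q: Point) -> float:
    """Dot product of two 2D point vectors."""
    return p[0] * q[0] + p[1] * q[1]


def cross(p: Point, q: Point) -> float:
    """Scalar 2D analogue of the vector product (determinant of [p, q])."""
    return p[0] * q[1] - p[1] * q[0]


# Arbitrary example points and rotation angle.
p, q = (1.0, 2.0), (3.0, -1.0)
theta = 0.7

rp, rq = rotate(p, theta), rotate(q, theta)

dot_before, dot_after = dot(p, q), dot(rp, rq)
cross_before, cross_after = cross(p, q), cross(rp, rq)

# Both quantities agree before and after rotation, up to rounding error.
dot_is_invariant = math.isclose(dot_before, dot_after)
cross_is_invariant = math.isclose(cross_before, cross_after)
```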
no code implementations • 22 Nov 2016 • Tianrong Rao, Min Xu, Dong Xu
The proposed MldrNet combines deep representations of different levels, i.e., image semantics, image aesthetics, and low-level visual features, to effectively classify the emotion types of different kinds of images, such as abstract paintings and web images.
no code implementations • European Conference on Computer Vision 2016 • Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, Gang Wang
Matching pedestrians across multiple camera views, known as human re-identification, is a challenging problem in visual surveillance.
no code implementations • 24 Jul 2016 • Jun Liu, Amir Shahroudy, Dong Xu, Gang Wang
To handle the noise and occlusion in 3D skeleton data, we introduce a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell.
Ranked #8 on Skeleton Based Action Recognition on SBU
no code implementations • 19 Jun 2016 • Dong Xu, Wu-Jun Li
Hence, these existing models don't put supervision (loss or similarity calculation) at every time step, which will lose some useful information.
no code implementations • CVPR 2016 • Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, Anton Van Den Hengel, Qinfeng Shi
Trace-norm regularization plays an important role in many areas such as machine learning and computer vision.
no code implementations • CVPR 2016 • Wen Li, Dengxin Dai, Mingkui Tan, Dong Xu, Luc van Gool
The SVM+ approach has shown excellent performance in visual recognition tasks for exploiting privileged information in the training data.
no code implementations • 3 Jan 2016 • Tongliang Liu, DaCheng Tao, Dong Xu
Can we obtain dimensionality-dependent generalization bounds for $k$-dimensional coding schemes that are tighter than dimensionality-independent bounds when data is in a finite-dimensional feature space?
no code implementations • ICCV 2015 • Li Niu, Wen Li, Dong Xu
Considering that recent works show that domain generalization capability can be enhanced by fusing multiple SVM classifiers, we build upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples in the source domain each time.
no code implementations • CVPR 2015 • Shijie Xiao, Wen Li, Dong Xu, DaCheng Tao
In this paper, we develop a fast LRR solver called FaLRR, by reformulating LRR as a new optimization problem with regard to factorized data (which is obtained by skinny SVD of the original data matrix).
no code implementations • CVPR 2015 • Huazhu Fu, Dong Xu, Stephen Lin, Jiang Liu
We present an object-based co-segmentation method that takes advantage of depth data and is able to correctly handle noisy images in which the common foreground object is missing.
no code implementations • CVPR 2015 • Li Niu, Wen Li, Dong Xu
In this work, we formulate a new weakly supervised domain generalization problem for the visual recognition task by using loosely labeled web images/videos as training data.
no code implementations • 10 Mar 2015 • Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, Anton Van Den Hengel, Qinfeng Shi
Nuclear-norm regularization plays a vital role in many learning tasks, such as low-rank matrix recovery (MR), and low-rank representation (LRR).
no code implementations • CVPR 2014 • Lin Chen, Wen Li, Dong Xu
In this work, we propose a new framework for recognizing RGB images captured by conventional cameras by leveraging a set of labeled RGB-D data, in which the depth features can be additionally extracted from the depth images.
no code implementations • CVPR 2014 • Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set.
no code implementations • CVPR 2013 • Zinan Zeng, Shijie Xiao, Kui Jia, Tsung-Han Chan, Shenghua Gao, Dong Xu, Yi Ma
Our framework is motivated by the observation that samples from the same class repetitively appear in the collection of ambiguously labeled training images, while they are just ambiguously labeled in each image.
no code implementations • CVPR 2013 • Lin Chen, Lixin Duan, Dong Xu
In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos.
no code implementations • CVPR 2013 • Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen
Spatial-Temporal Face Region Descriptor, STFRD) for images (resp.