Search Results for author: Jiajun Deng

Found 51 papers, 28 papers with code

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations • 18 Mar 2024 • Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason about the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

1 code implementation • 18 Mar 2024 • Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner.

Knowledge Distillation NER +1

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest

no code implementations • 14 Mar 2024 • Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid

In this work, we present PoIFusion, a simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI).

3D Object Detection Object +1

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

Cycle-Consistency Learning for Captioning and Grounding

no code implementations • 23 Dec 2023 • Ning Wang, Jiajun Deng, Mingbo Jia

The proposed framework (1) allows the semi-weakly supervised training of visual grounding; (2) improves the performance of fully supervised visual grounding; (3) yields a general captioning model that can describe arbitrary image regions.

Image Captioning Visual Grounding

Towards Automatic Data Augmentation for Disordered Speech Recognition

no code implementations • 14 Dec 2023 • Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu

Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity.

Data Augmentation Reinforcement Learning (RL) +2

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin

1 code implementation • 18 Nov 2023 • Zichen Yu, Changyong Shu, Jiajun Deng, Kangjie Lu, Zongdai Liu, Jiangyong Yu, Dawei Yang, Hui Li, Yan Chen

We apply the FlashOCC to diverse occupancy prediction baselines on the challenging Occ3D-nuScenes benchmarks and conduct extensive experiments to validate the effectiveness.

3D Object Detection Autonomous Driving +1

131

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations • 24 Oct 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

no code implementations • ICCV 2023 • Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model.

3D Shape Classification Retrieval

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation • ICCV 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

Ranked #5 on Skeleton Based Action Recognition on NTU RGB+D 120

motion prediction Skeleton Based Action Recognition

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection

1 code implementation • ICCV 2023 • Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

These inaccurate high-scoring region proposals will mislead the training of subsequent refinement modules and thus hamper the detection performance.

Object object-detection +1

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

no code implementations • 6 Jul 2023 • Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date.

Speech Dereverberation Speech Separation

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

no code implementations • 27 Jun 2023 • Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu

Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity.

Domain Adaptation speech-recognition +1

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

no code implementations • 26 Jun 2023 • Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies.

speech-recognition Speech Recognition +1

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

no code implementations • 23 Jun 2023 • Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu

Current ASR systems are mainly trained and evaluated at the utterance level.

speech-recognition Speech Recognition

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

1 code implementation • CVPR 2023 • Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.

object-detection Object Detection

Use of Speech Impairment Severity for Dysarthric Speech Recognition

no code implementations • 18 May 2023 • Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.

severity prediction speech-recognition +1

Deep Unrestricted Document Image Rectification

1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To our best knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Ranked #1 on Local Distortion on DocUNet

Local Distortion

340

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

no code implementations • 28 Feb 2023 • Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest TDNN and Conformer ASR systems integrated domain adapted wav2vec2.0 models consistently outperform the standalone wav2vec2.0 models by statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively.

speech-recognition Speech Recognition

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

1 code implementation • 15 Feb 2023 • Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu

Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Recurrent Generic Contour-based Instance Segmentation with Progressive Learning

1 code implementation • 21 Jan 2023 • Hao Feng, Keyi Zhou, Wengang Zhou, Yufei Yin, Jiajun Deng, Qi Sun, Houqiang Li

It maintains a single estimate of the contour that is progressively deformed toward the object boundary.

Ranked #1 on Semantic Contour Prediction on Sbd val

Instance Segmentation Lane Detection +6

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

no code implementations • 13 Jan 2023 • Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang

To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.

3D Object Detection Object +1

3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection

1 code implementation • ICCV 2023 • Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection object-detection

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

1 code implementation • 27 Nov 2022 • Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection Monocular Depth Estimation +1

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

no code implementations • 3 Nov 2022 • Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility.

Data Augmentation Generative Adversarial Network +2

Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection

1 code implementation • 29 Oct 2022 • Yi Wang, Jiajun Deng, Tianzi Wang, Bo Zheng, Shoukang Hu, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delaying further progression.

Alzheimer's Disease Detection Language Modelling +1

Geometric Representation Learning for Document Image Rectification

2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

1 code implementation • 26 Aug 2022 • Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li

In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem.

3D Action Recognition Knowledge Distillation +1

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

no code implementations • 24 Jun 2022 • Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

no code implementations • 23 Jun 2022 • Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.

Alzheimer's Disease Detection Data Augmentation +3

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

no code implementations • 23 Jun 2022 • Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

no code implementations • 15 Jun 2022 • Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation • 14 Jun 2022 • Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

For another, we devise Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

150

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

no code implementations • 13 May 2022 • Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Audio-visual multi-channel speech separation, dereverberation and recognition

no code implementations • 5 Apr 2022 • Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

1 code implementation • 8 Jan 2022 • Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng

State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion

no code implementations • 29 Nov 2021 • Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, Yanyong Zhang

However, this approach often suffers from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance.

3D Object Detection Data Augmentation +2

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

331

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

331

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

1 code implementation • 30 Jul 2021 • Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li

To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.

3D Object Detection object-detection +1

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting

1 code implementation • 6 Jul 2021 • Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, Yu Zhang

As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding.

Autonomous Driving Monocular 3D Object Detection +4

Weakly Supervised Temporal Adjacent Network for Language Grounding

1 code implementation • 30 Jun 2021 • Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li

To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.

Multiple Instance Learning Sentence

Multi-Modal 3D Object Detection in Autonomous Driving: a Survey

no code implementations • 24 Jun 2021 • Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, Yanyong Zhang

In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms.

3D Object Detection Autonomous Driving +3

TransVG: End-to-End Visual Grounding with Transformers

2 code implementations • ICCV 2021 • Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region onto an image.

Ranked #14 on Referring Expression Comprehension on RefCOCO

Referring Expression Comprehension Visual Grounding

150

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

1 code implementation • 31 Jan 2021 • Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields.

Ranked #2 on 3D Object Detection on KITTI Cars Easy val

3D Object Detection Object +1

4,308

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

5 code implementations • 31 Dec 2020 • Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, Houqiang Li

In this paper, we take a slightly different viewpoint -- we find that precise positioning of raw points is not essential for high performance 3D object detection and that the coarse voxel granularity can also offer sufficient detection accuracy.

Ranked #4 on 3D Object Detection on KITTI Cars Moderate val

3D Object Detection object-detection +2

4,308

Masked Contrastive Representation Learning for Reinforcement Learning

1 code implementation • 15 Oct 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li

During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.

Atari Games Contrastive Learning +3

Single Shot Video Object Detector

1 code implementation • 7 Jul 2020 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object object-detection +2

Adaptive Offline Quintuplet Loss for Image-Text Matching

1 code implementation • ECCV 2020 • Tianlang Chen, Jiajun Deng, Jiebo Luo

For each image or text anchor in a training mini-batch, the model is trained to distinguish between a positive and the most confusing negative of the anchor mined from the mini-batch (i.e. online hard negative).

Image-text matching Text Matching

Relation Distillation Networks for Video Object Detection

2 code implementations • ICCV 2019 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.

Object object-detection +3

562
