Search Results for author: Jiajun Deng

Found 51 papers, 28 papers with code

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations 18 Mar 2024 Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason about the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

1 code implementation 18 Mar 2024 Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner.

Knowledge Distillation NER +1

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest

no code implementations 14 Mar 2024 Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid

In this work, we present PoIFusion, a simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI).

3D Object Detection Object +1

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation 29 Feb 2024 Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

Cycle-Consistency Learning for Captioning and Grounding

no code implementations 23 Dec 2023 Ning Wang, Jiajun Deng, Mingbo Jia

The proposed framework (1) allows the semi-weakly supervised training of visual grounding; (2) improves the performance of fully supervised visual grounding; (3) yields a general captioning model that can describe arbitrary image regions.

Image Captioning Visual Grounding

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin

1 code implementation 18 Nov 2023 Zichen Yu, Changyong Shu, Jiajun Deng, Kangjie Lu, Zongdai Liu, Jiangyong Yu, Dawei Yang, Hui Li, Yan Chen

We apply the FlashOCC to diverse occupancy prediction baselines on the challenging Occ3D-nuScenes benchmarks and conduct extensive experiments to validate the effectiveness.

3D Object Detection Autonomous Driving +1

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations 24 Oct 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

no code implementations ICCV 2023 Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model.

3D Shape Classification Retrieval

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations ICCV 2023 Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation ICCV 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

motion prediction Skeleton Based Action Recognition

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection

1 code implementation ICCV 2023 Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

These inaccurate high-scoring region proposals will mislead the training of subsequent refinement modules and thus hamper the detection performance.

Object object-detection +1

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

no code implementations 6 Jul 2023 Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date.

Speech Dereverberation Speech Separation

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

1 code implementation CVPR 2023 Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.

object-detection Object Detection

Use of Speech Impairment Severity for Dysarthric Speech Recognition

no code implementations 18 May 2023 Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.

severity prediction speech-recognition +1

Deep Unrestricted Document Image Rectification

1 code implementation 18 Apr 2023 Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Local Distortion

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

no code implementations 28 Feb 2023 Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest that TDNN and Conformer ASR systems integrated with domain-adapted wav2vec2.0 models consistently outperform the standalone wav2vec2.0 models by statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively.

speech-recognition Speech Recognition

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

1 code implementation 15 Feb 2023 Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu

Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

no code implementations 13 Jan 2023 Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang

To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.

3D Object Detection Object +1

3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection

1 code implementation ICCV 2023 Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection object-detection

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

1 code implementation 27 Nov 2022 Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection Monocular Depth Estimation +1

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

no code implementations 3 Nov 2022 Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility.

Data Augmentation Generative Adversarial Network +2

Geometric Representation Learning for Document Image Rectification

2 code implementations 15 Oct 2022 Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

no code implementations 15 Jun 2022 Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation 14 Jun 2022 Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

For another, we devise Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

no code implementations 13 May 2022 Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Audio-visual multi-channel speech separation, dereverberation and recognition

no code implementations 5 Apr 2022 Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion

no code implementations 29 Nov 2021 Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, Yanyong Zhang

However, this approach often suffers from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance.

3D Object Detection Data Augmentation +2

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations 28 Oct 2021 Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations 25 Oct 2021 Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

1 code implementation 30 Jul 2021 Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li

To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.

3D Object Detection object-detection +1

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting

1 code implementation 6 Jul 2021 Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, Yu Zhang

As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding.

Autonomous Driving Monocular 3D Object Detection +4

Weakly Supervised Temporal Adjacent Network for Language Grounding

1 code implementation 30 Jun 2021 Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li

To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.

Multiple Instance Learning Sentence

Multi-Modal 3D Object Detection in Autonomous Driving: a Survey

no code implementations 24 Jun 2021 Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, Yanyong Zhang

In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms.

3D Object Detection Autonomous Driving +3

TransVG: End-to-End Visual Grounding with Transformers

2 code implementations ICCV 2021 Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region of an image.

Referring Expression Comprehension Visual Grounding

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

5 code implementations 31 Dec 2020 Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, Houqiang Li

In this paper, we take a slightly different viewpoint -- we find that precise positioning of raw points is not essential for high performance 3D object detection and that the coarse voxel granularity can also offer sufficient detection accuracy.

3D Object Detection object-detection +2

Masked Contrastive Representation Learning for Reinforcement Learning

1 code implementation 15 Oct 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li

During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.

Atari Games Contrastive Learning +3

Single Shot Video Object Detector

1 code implementation 7 Jul 2020 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object object-detection +2

Adaptive Offline Quintuplet Loss for Image-Text Matching

1 code implementation ECCV 2020 Tianlang Chen, Jiajun Deng, Jiebo Luo

For each image or text anchor in a training mini-batch, the model is trained to distinguish between a positive and the most confusing negative of the anchor mined from the mini-batch (i.e., online hard negative).

Image-text matching Text Matching

Relation Distillation Networks for Video Object Detection

2 code implementations ICCV 2019 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.

Object object-detection +3
