Search Results for author: Bohan Li

Found 52 papers, 20 papers with code

Challenger: Affordable Adversarial Driving Video Generation

no code implementations • 21 May 2025 • Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao

Generating photorealistic driving videos has seen significant progress recently, but current methods largely focus on ordinary, non-adversarial scenarios.

Autonomous Driving Video Generation

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

no code implementations • 20 May 2025 • Kunyun Wang, Bohan Li, Kai Yu, Minyi Guo, Jieru Zhao

Diffusion models have emerged as a powerful class of generative models across various modalities, including image, video, and audio synthesis.

Audio Synthesis Denoising

UAV-Enabled Joint Sensing, Communication, Powering and Backhaul Transmission in Maritime Monitoring Networks

no code implementations • 18 May 2025 • Bohan Li, Jiahao Liu, Yujun Liang, Qian Li, Haochen Liu, Yaoyuan Zhang, Junsheng Mu, Shahid Mumtaz, Sheng Chen

This paper addresses the challenge of energy-constrained maritime monitoring networks by proposing an unmanned aerial vehicle (UAV)-enabled integrated sensing, communication, powering and backhaul transmission scheme with a tailored time-division duplex frame structure.

Scheduling

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

no code implementations • 19 Mar 2025 • Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao

To address this, we propose DiST-4D, the first disentangled spatiotemporal diffusion framework for 4D driving scene generation, which leverages metric depth as the core geometric representation.

Novel View Synthesis Scene Generation

MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction

no code implementations • 13 Mar 2025 • Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang

Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving.

3DGS 3D Scene Reconstruction +2

Joint Beamforming and Compressed Sensing for Uplink Grant-Free Access

no code implementations • 9 Mar 2025 • Guoqing Xia, Pei Xiao, Bohan Li, Yue Zhang, Huiyu Zhou

Based on this, we further develop a joint adaptive beamforming and subspace pursuit (JABF-SP) algorithm for multiuser detection and data recovery, with a novel sparsity level decision method that does not require accurate knowledge of the noise level.

compressed sensing

U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack

1 code implementation • 1 Mar 2025 • Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang

Recent advancements in Large Language Models (LLMs) have expanded their context windows to unprecedented lengths, sparking debates about the necessity of Retrieval-Augmented Generation (RAG).

Hallucination RAG +2

Recent Advances in Discrete Speech Tokens: A Review

no code implementations • 10 Feb 2025 • Yiwei Guo, Zhihan Li, Hankun Wang, Bohan Li, Chongtian Shao, Hanglei Zhang, Chenpeng Du, Xie Chen, Shujie Liu, Kai Yu

The rapid advancement of speech generation technologies in the era of large language models (LLMs) has established discrete speech tokens as a foundational paradigm for speech representation.

Language Modeling Language Modelling +1

LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

no code implementations • 23 Dec 2024 • Hao Li, Roy Qin, Zhengyu Zou, Diqi He, Bohan Li, Bingquan Dai, Dingwen Zhang, Junwei Han

To this end, we propose a Language-Embedded Surface Field (LangSurf), which accurately aligns the 3D language fields with the surface of objects, facilitating precise 2D and 3D segmentation with text queries and broadly expanding downstream tasks such as removal and editing.

3D Semantic Segmentation Scene Understanding

Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

1 code implementation • 17 Dec 2024 • Bohan Li, Jiannan Guan, Longxu Dou, Yunlong Feng, Dingzirui Wang, Yang Xu, Enbo Wang, Qiguang Chen, Bichen Wang, Xiao Xu, Yimeng Zhang, Libo Qin, Yanyan Zhao, Qingfu Zhu, Wanxiang Che

In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists.

OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

no code implementations • 15 Dec 2024 • Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, XiaoFeng Wang, Zhuang Ma, Baao Xie, Chao Ma, Xiaokang Yang, Wenjun Zeng

Within OccScene, the perception module can be effectively improved with customized and diverse generated scenes, while the perception priors in return enhance the generation performance for mutual benefits.

Mamba Scene Generation

UniScene: Unified Occupancy-centric Driving Scene Generation

no code implementations • CVPR 2025 • Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin

UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.

Autonomous Driving Scene Generation

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

2 code implementations • 23 Oct 2024 • Linger Deng, Yuliang Liu, Bohan Li, Dongliang Luo, Liang Wu, Chengquan Zhang, Pengyuan Lyu, Ziyang Zhang, Gang Zhang, Errui Ding, Yingying Zhu, Xiang Bai

Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity.

Diversity

UAV-Enabled Integrated Sensing and Communication in Maritime Emergency Networks

no code implementations • 26 Aug 2024 • Bohan Li, Jiahao Liu, Yifeng Xiong, Junsheng Mu, Pei Xiao, Sheng Chen

Once the UAV passes the initial operating position, the UAV's trajectory and resource allocation are optimized during the mission period to maximize the end-to-end communication rate under the constraint of minimum sensing QoS.

Integrated sensing and communication ISAC +1

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

no code implementations • 23 Jul 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang

In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task.

Position

On the Effectiveness of Acoustic BPE in Decoder-Only TTS

no code implementations • 4 Jul 2024 • Bohan Li, Feiyu Shen, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

Discretizing speech into tokens and generating them with a decoder-only model has been a promising direction for text-to-speech (TTS) and spoken language modeling (SLM).

Decoder Diversity +4

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

1 code implementation • 2 Jul 2024 • Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng

To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.

3D Semantic Scene Completion valid

Extreme Video Compression with Pre-trained Diffusion Models

1 code implementation • 14 Feb 2024 • Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

The results showcase the potential of exploiting the temporal relations in video data using generative models.

Decoder Image Compression +1

Self-Supervised Dynamic Hypergraph Recommendation based on Hyper-Relational Knowledge Graph

no code implementations • 15 Aug 2023 • Yi Liu, Hongrui Xuan, Bohan Li, Meng Wang, Tong Chen, Hongzhi Yin

However, the long-tail distribution of entities leads to sparsity in supervision signals, which weakens the quality of item representation when utilizing KG enhancement.

Collaborative Filtering Knowledge-Aware Recommendation +2

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception

no code implementations • 22 Jun 2023 • Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).

Depth Estimation Representation Learning

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

no code implementations • 20 Jun 2023 • Lianying Yin, Yijun Wang, Tianyu He, Jinming Liu, Wei Zhao, Bohan Li, Xin Jin, Jianxin Lin

In this paper, we present a novel framework (EMoG) to tackle the above challenges with denoising diffusion models: 1) To alleviate the one-to-many problem, we incorporate emotion clues to guide the generation process, making the generation much easier; 2) To model joint correlation, we propose to decompose the difficult gesture generation into two sub-problems: joint correlation modeling and temporal dynamics modeling.

Denoising Gesture Generation

NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation

1 code implementation • ICCV 2023 • Baao Xie, Bohan Li, Zequn Zhang, Junting Dong, Xin Jin, Jingyu Yang, Wenjun Zeng

They are complementary: the outer navigation identifies global-view semantic directions, and the inner refinement is dedicated to fine-grained attributes.

Disentanglement NeRF

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

no code implementations • 19 Apr 2023 • Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template.

Data Augmentation Few-Shot Learning +1

Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion

1 code implementation • 24 Mar 2023 • Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, XiaoFeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng

However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for the dense prediction task of SSC.

3D Semantic Scene Completion Hallucination +2

Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

no code implementations • 4 Feb 2023 • Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

In contrast, generative methods bring more image diversity in the augmented images but may not preserve semantic consistency, thus incorrectly changing the essential semantics of the original image.

Diversity Image Augmentation +3

Knowledge Enhancement for Contrastive Multi-Behavior Recommendation

no code implementations • 13 Jan 2023 • Hongrui Xuan, Yi Liu, Bohan Li, Hongzhi Yin

In particular, we design the multi-behavior learning module to extract users' personalized behavior information for user-embedding enhancement, and utilize the knowledge graph in the knowledge enhancement module to derive more robust knowledge-aware representations for items.

Contrastive Learning Recommendation Systems +1

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

1 code implementation • 30 Nov 2022 • Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.

Machine Translation Sentence +4

Multi-rate adaptive transform coding for video compression

no code implementations • 25 Oct 2022 • Lyndon R. Duong, Bohan Li, Cheng Chen, Jingning Han

Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression.

Data Compression Quantization +1

MetaPrompting: Learning to Learn Better Prompts

1 code implementation • COLING 2022 • Yutai Hou, Hongyuan Dong, Xinghao Wang, Bohan Li, Wanxiang Che

The prompting method is regarded as one of the crucial advances in few-shot natural language processing.

Meta-Learning

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

3 code implementations • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Recently, most handwritten mathematical expression recognition (HMER) methods adopt the encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.

Decoder Handwritten Mathematical Expression Recognition +1

Heterogeneous graph neural network for power allocation in multicarrier-division duplex cell-free massive MIMO systems

1 code implementation • 1 May 2022 • Bohan Li, Lie-Liang Yang, Robert G Maunder, Songlin Sun, Pei Xiao

In-band full duplex cell-free (CF) systems suffer from severe self-interference and cross-link interference, especially when CF systems are operated in a distributed way.

Graph Neural Network

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

no code implementations • 24 Nov 2021 • Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

As a result, the learning of semantic features is prone to bias toward the limited vocabulary of the training set, which is called vocabulary reliance.

Decoder Scene Text Recognition

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

2 code implementations • 25 Oct 2021 • Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perception quality than previous systems with 16 kHz or 24 kHz sampling rates; the second is to model the variation information in speech through a systematic design, which improves the prosody and naturalness.

Speech Synthesis text-to-speech +1

Data Augmentation Approaches in Natural Language Processing: A Survey

1 code implementation • 5 Oct 2021 • Bohan Li, Yutai Hou, Wanxiang Che

One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data.

Data Augmentation Diversity +1

Follow Your Path: a Progressive Method for Knowledge Distillation

no code implementations • 20 Jul 2021 • Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

Decoder Mixture-of-Experts +3

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation • 20 Apr 2021 • Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.

Decoder text-to-speech +1

AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations • ICLR 2021 • Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.

text-to-speech Text to Speech

Learning from deep model via exploring local targets

no code implementations • 1 Jan 2021 • Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation model

An Adversarial Approach to High-Quality, Sentiment-Controlled Neural Dialogue Generation

no code implementations • 22 Jan 2019 • Xiang Kong, Bohan Li, Graham Neubig, Eduard Hovy, Yiming Yang

In this work, we propose a method for neural dialogue response generation that allows not only generating semantically reasonable responses according to the dialogue history, but also explicitly controlling the sentiment of the response via sentiment labels.

Dialogue Generation Response Generation +1

Multi-Perspective Fusion Network for Commonsense Reading Comprehension

no code implementations • 8 Jan 2019 • Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu

From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion.

Reading Comprehension

Stochastic WaveNet: A Generative Latent Variable Model for Sequential Data

1 code implementation • 15 Jun 2018 • Guokun Lai, Bohan Li, Guoqing Zheng, Yiming Yang

In this paper, we combine the ideas from both stochastic latent variables and dilated convolutions, and propose a new architecture to model sequential data, termed as Stochastic WaveNet, where stochastic latent variables are injected into the WaveNet structure.
