Search Results for author: Bohan Li

Found 32 papers, 14 papers with code

Extreme Video Compression with Pre-trained Diffusion Models

1 code implementation • 14 Feb 2024 • Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

The results showcase the potential of exploiting the temporal relations in video data using generative models.

111


Closed-Loop Unsupervised Representation Disentanglement with $β$-VAE Distillation and Diffusion Probabilistic Feedback

no code implementations • 4 Feb 2024 • Xin Jin, Bohan Li, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li, Tao Yang, Wenjun Zeng

Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks.

Disentanglement Image Manipulation


Self-Supervised Dynamic Hypergraph Recommendation based on Hyper-Relational Knowledge Graph

no code implementations • 15 Aug 2023 • Yi Liu, Hongrui Xuan, Bohan Li, Meng Wang, Tong Chen, Hongzhi Yin

However, the long-tail distribution of entities leads to sparsity in supervision signals, which weakens the quality of item representation when utilizing KG enhancement.

Collaborative Filtering Knowledge-Aware Recommendation +2


One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception

no code implementations • 22 Jun 2023 • Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).

Depth Estimation Representation Learning


EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

no code implementations • 20 Jun 2023 • Lianying Yin, Yijun Wang, Tianyu He, Jinming Liu, Wei Zhao, Bohan Li, Xin Jin, Jianxin Lin

In this paper, we present a novel framework (EMoG) to tackle the above challenges with denoising diffusion models: 1) To alleviate the one-to-many problem, we incorporate emotion clues to guide the generation process, making the generation much easier; 2) To model joint correlation, we propose to decompose the difficult gesture generation into two sub-problems: joint correlation modeling and temporal dynamics modeling.

Denoising Gesture Generation


ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai

It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.

Document AI Entity Linking +1


NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation

1 code implementation • ICCV 2023 • Baao Xie, Bohan Li, Zequn Zhang, Junting Dong, Xin Jin, Jingyu Yang, Wenjun Zeng

They are complementary: the outer navigation identifies global-view semantic directions, and the inner refinement is dedicated to fine-grained attributes.

Disentanglement


MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

no code implementations • 19 Apr 2023 • Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template.

Data Augmentation Few-Shot Learning +1


A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification

no code implementations • 18 Apr 2023 • Yunlong Feng, Bohan Li, Libo Qin, Xiao Xu, Wanxiang Che

Cross-domain text classification aims to adapt models to a target domain that lacks labeled data.

Cross-Domain Text Classification Language Modelling +1


Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion

1 code implementation • 24 Mar 2023 • Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, XiaoFeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng

However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for the dense prediction task of SSC.

3D Semantic Scene Completion Hallucination +2


Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

no code implementations • 4 Feb 2023 • Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

In contrast, generative methods bring more image diversity in the augmented images but may not preserve semantic consistency, thus incorrectly changing the essential semantics of the original image.

Image Augmentation Image Classification +1


Knowledge Enhancement for Contrastive Multi-Behavior Recommendation

no code implementations • 13 Jan 2023 • Hongrui Xuan, Yi Liu, Bohan Li, Hongzhi Yin

In particular, we design the multi-behavior learning module to extract users' personalized behavior information for user-embedding enhancement, and utilize a knowledge graph in the knowledge enhancement module to derive more robust knowledge-aware representations for items.

Contrastive Learning Recommendation Systems +1


VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

1 code implementation • 30 Nov 2022 • Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.

Machine Translation Sentence +4

1,286


Multi-rate adaptive transform coding for video compression

no code implementations • 25 Oct 2022 • Lyndon R. Duong, Bohan Li, Cheng Chen, Jingning Han

Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression.
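As a rough illustration of the transform coding idea described above, the following sketch maps a block of samples to DCT coefficients, quantizes them uniformly, and reconstructs the block. It is a generic textbook pipeline, not the method of the listed paper; the function names, the 1-D block, and the step size are all assumptions made for illustration.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a 1-D list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(coeffs, step):
    """Uniform scalar quantization: this is where information is lost."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]


def encode_decode(block, step):
    """Transform, quantize, and reconstruct one block (hypothetical helper)."""
    levels = quantize(dct(block), step)
    recon = idct(dequantize(levels, step))
    return levels, recon
```

A flat block concentrates all of its energy in the first (DC) coefficient, so most quantized levels are zero and cheap to store; a larger `step` trades reconstruction accuracy for fewer distinct levels.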

Data Compression Quantization +1


MetaPrompting: Learning to Learn Better Prompts

1 code implementation • COLING 2022 • Yutai Hou, Hongyuan Dong, Xinghao Wang, Bohan Li, Wanxiang Che

The prompting method is regarded as one of the crucial advances in few-shot natural language processing.

Meta-Learning


When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

2 code implementations • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.

Optical Character Recognition (OCR)

343


Heterogeneous graph neural network for power allocation in multicarrier-division duplex cell-free massive MIMO systems

no code implementations • 1 May 2022 • Bohan Li, Lie-Liang Yang, Robert G Maunder, Songlin Sun, Pei Xiao

In-band full duplex cell-free (CF) systems suffer from severe self-interference and cross-link interference, especially when CF systems are operated in a distributed way.


Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging

1 code implementation • Findings (ACL) 2022 • Yutai Hou, Cheng Chen, Xianzhen Luo, Bohan Li, Wanxiang Che

Such inverse prompting only requires a one-turn prediction for each slot type and greatly speeds up the prediction.

Few-Shot Learning Sentence


AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations • 1 Apr 2022 • Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis


Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

no code implementations • 24 Nov 2021 • Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance.

Scene Text Recognition


DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

1 code implementation • 25 Oct 2021 • Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates; the second is to model the variation information in speech through a systematic design, which improves prosody and naturalness.

Speech Synthesis

313


Data Augmentation Approaches in Natural Language Processing: A Survey

1 code implementation • 5 Oct 2021 • Bohan Li, Yutai Hou, Wanxiang Che

One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data.

Data Augmentation


Follow Your Path: a Progressive Method for Knowledge Distillation

no code implementations • 20 Jul 2021 • Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei LI

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation


AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.


AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation • 20 Apr 2021 • Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.


AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations • ICLR 2021 • Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.
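A minimal sketch of what conditional layer normalization can look like, assuming the common formulation in which the scale and bias of layer normalization are produced from a speaker embedding by two small linear projections. The snippet above does not spell out these details, so the function names, parameter layout, and dimensions here are hypothetical and are not taken from the AdaSpeech code.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per weight row (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conditional_layer_norm(hidden, speaker_emb, scale_proj, bias_proj, eps=1e-5):
    """Layer-normalize `hidden`, then apply a speaker-dependent scale and bias.

    scale_proj and bias_proj are (weights, bias) pairs for the two linear
    projections; in this sketch they are the only speaker-adaptive parameters.
    """
    mean = sum(hidden) / len(hidden)
    var = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    normed = [(h - mean) / math.sqrt(var + eps) for h in hidden]
    gamma = linear(scale_proj[0], scale_proj[1], speaker_emb)  # per-speaker scale
    beta = linear(bias_proj[0], bias_proj[1], speaker_emb)     # per-speaker bias
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]
```

Under this assumption, adapting to a new voice only needs to update the two small projections and the speaker embedding, which keeps the number of per-speaker parameters low.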

155


Learning from deep model via exploring local targets

no code implementations • 1 Jan 2021 • Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei LI

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation


On the Sentence Embeddings from Pre-trained Language Models

3 code implementations • EMNLP 2020 • Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei LI

Pre-trained contextual representations like BERT have achieved great success in natural language processing.

Ranked #16 on Semantic Textual Similarity on STS16

Language Modelling Semantic Similarity +4

652


A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text

1 code implementation • IJCNLP 2019 • Bohan Li, Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick, Yiming Yang

In this paper, we investigate a simple fix for posterior collapse which yields surprisingly effective results.

Language Modelling Representation Learning


An Adversarial Approach to High-Quality, Sentiment-Controlled Neural Dialogue Generation

no code implementations • 22 Jan 2019 • Xiang Kong, Bohan Li, Graham Neubig, Eduard Hovy, Yiming Yang

In this work, we propose a method for neural dialogue response generation that allows not only generating semantically reasonable responses according to the dialogue history, but also explicitly controlling the sentiment of the response via sentiment labels.

Dialogue Generation Response Generation +1


Multi-Perspective Fusion Network for Commonsense Reading Comprehension

no code implementations • 8 Jan 2019 • Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu

From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion.

Reading Comprehension


Stochastic WaveNet: A Generative Latent Variable Model for Sequential Data

1 code implementation • 15 Jun 2018 • Guokun Lai, Bohan Li, Guoqing Zheng, Yiming Yang

In this paper, we combine ideas from both stochastic latent variables and dilated convolutions, and propose a new architecture to model sequential data, termed Stochastic WaveNet, where stochastic latent variables are injected into the WaveNet structure.

