Search Results for author: Bohan Li

Found 32 papers, 13 papers with code

Extreme Video Compression with Pre-trained Diffusion Models

1 code implementation 14 Feb 2024 Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

The results showcase the potential of exploiting the temporal relations in video data using generative models.

Image Compression Video Compression

Self-Supervised Dynamic Hypergraph Recommendation based on Hyper-Relational Knowledge Graph

no code implementations 15 Aug 2023 Yi Liu, Hongrui Xuan, Bohan Li, Meng Wang, Tong Chen, Hongzhi Yin

However, the long-tail distribution of entities leads to sparsity in supervision signals, which weakens the quality of item representation when utilizing KG enhancement.

Collaborative Filtering Knowledge-Aware Recommendation +2

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception

no code implementations 22 Jun 2023 Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).

Depth Estimation Representation Learning

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

no code implementations 20 Jun 2023 Lianying Yin, Yijun Wang, Tianyu He, Jinming Liu, Wei Zhao, Bohan Li, Xin Jin, Jianxin Lin

In this paper, we present a novel framework (EMoG) to tackle the above challenges with denoising diffusion models: 1) To alleviate the one-to-many problem, we incorporate emotion clues to guide the generation process, making the generation much easier; 2) To model joint correlation, we propose to decompose the difficult gesture generation into two sub-problems: joint correlation modeling and temporal dynamics modeling.

Denoising Gesture Generation

NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation

no code implementations ICCV 2023 Baao Xie, Bohan Li, Zequn Zhang, Junting Dong, Xin Jin, Jingyu Yang, Wenjun Zeng

They are complementary: the outer navigation identifies global-view semantic directions, and the inner refinement is dedicated to fine-grained attributes.

Disentanglement

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

no code implementations 19 Apr 2023 Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template.

Data Augmentation Few-Shot Learning +1

Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

no code implementations 4 Feb 2023 Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

In contrast, generative methods bring more image diversity in the augmented images but may not preserve semantic consistency, thus incorrectly changing the essential semantics of the original image.

Image Augmentation Image Classification +1

Knowledge Enhancement for Contrastive Multi-Behavior Recommendation

no code implementations 13 Jan 2023 Hongrui Xuan, Yi Liu, Bohan Li, Hongzhi Yin

In particular, we design the multi-behavior learning module to extract users' personalized behavior information for user-embedding enhancement, and utilize a knowledge graph in the knowledge enhancement module to derive more robust knowledge-aware representations for items.

Contrastive Learning Recommendation Systems +1

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

1 code implementation 30 Nov 2022 Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.

Machine Translation Sentence +4

Multi-rate adaptive transform coding for video compression

no code implementations 25 Oct 2022 Lyndon R. Duong, Bohan Li, Cheng Chen, Jingning Han

Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression.

Data Compression Quantization +1

MetaPrompting: Learning to Learn Better Prompts

1 code implementation COLING 2022 Yutai Hou, Hongyuan Dong, Xinghao Wang, Bohan Li, Wanxiang Che

The prompting method is regarded as one of the crucial advances in few-shot natural language processing.

Meta-Learning

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

2 code implementations 23 Jul 2022 Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.

Optical Character Recognition (OCR)

Heterogeneous graph neural network for power allocation in multicarrier-division duplex cell-free massive MIMO systems

no code implementations 1 May 2022 Bohan Li, Lie-Liang Yang, Robert G Maunder, Songlin Sun, Pei Xiao

In-band full-duplex cell-free (CF) systems suffer from severe self-interference and cross-link interference, especially when CF systems are operated in a distributed way.

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations 1 Apr 2022 Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

no code implementations 24 Nov 2021 Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

As a result, the learning of semantic features is prone to bias toward the limited vocabulary of the training set, which is called vocabulary reliance.

Scene Text Recognition

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

1 code implementation 25 Oct 2021 Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates; the second is to model the variation information in speech through a systematic design, which improves prosody and naturalness.

Speech Synthesis

Data Augmentation Approaches in Natural Language Processing: A Survey

1 code implementation 5 Oct 2021 Bohan Li, Yutai Hou, Wanxiang Che

One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data.

Data Augmentation

Follow Your Path: a Progressive Method for Knowledge Distillation

no code implementations 20 Jul 2021 Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations 6 Jul 2021 Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

1 code implementation 20 Apr 2021 Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

In adaptation, we use untranscribed speech data for speech reconstruction and only fine-tune the TTS decoder.

AdaSpeech: Adaptive Text to Speech for Custom Voice

2 code implementations ICLR 2021 Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation.

Learning from deep model via exploring local targets

no code implementations 1 Jan 2021 Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li

However, it has been observed that a converged heavy teacher model is strongly constrained for learning a compact student network and could make the optimization subject to poor local optima.

Knowledge Distillation

An Adversarial Approach to High-Quality, Sentiment-Controlled Neural Dialogue Generation

no code implementations 22 Jan 2019 Xiang Kong, Bohan Li, Graham Neubig, Eduard Hovy, Yiming Yang

In this work, we propose a method for neural dialogue response generation that allows not only generating semantically reasonable responses according to the dialogue history, but also explicitly controlling the sentiment of the response via sentiment labels.

Dialogue Generation Response Generation +1

Multi-Perspective Fusion Network for Commonsense Reading Comprehension

no code implementations 8 Jan 2019 Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu

From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion.

Reading Comprehension Test

Stochastic WaveNet: A Generative Latent Variable Model for Sequential Data

1 code implementation 15 Jun 2018 Guokun Lai, Bohan Li, Guoqing Zheng, Yiming Yang

In this paper, we combine ideas from both stochastic latent variables and dilated convolutions, and propose a new architecture to model sequential data, termed Stochastic WaveNet, where stochastic latent variables are injected into the WaveNet structure.
