1 code implementation • Findings (EMNLP) 2021 • Lei He, Suncong Zheng, Tao Yang, Feng Zhang
In this work, we propose to incorporate a knowledge graph (KG), including both entities and relations, into the language learning process to obtain a KG-enhanced pretrained language model, namely KLMo.
no code implementations • 7 Nov 2024 • Yichen Shi, Zhuofu Tao, Yuhao Gao, Tianjia Zhou, Cheng Chang, Yaxing Wang, BingYu Chen, Genhao Zhang, Alvin Liu, Zhiping Yu, Ting-Jung Lin, Lei He
A significant portion of the effort is experience-driven, which makes the automation of AMS circuit design a formidable challenge.
no code implementations • 26 Sep 2024 • Siyi Lu, Lei He, Shengbo Eben Li, Yugong Luo, Jianqiang Wang, Keqiang Li
End-to-end autonomous driving offers a streamlined alternative to the traditional modular pipeline, integrating perception, prediction, and planning within a single framework.
no code implementations • 18 Sep 2024 • Ludan Zhang, Xiaokang Ding, Yuqi Dai, Lei He, Keqiang Li
End-to-end models are emerging as the mainstream in autonomous driving perception.
no code implementations • 17 Sep 2024 • Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li
3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering.
no code implementations • 9 Sep 2024 • Lei He, Qiaoyi Wang, Honglin Sun, Qing Xu, Bolin Gao, Shengbo Eben Li, Jianqiang Wang, Keqiang Li
Visual bird's eye view (BEV) perception, due to its excellent perceptual capabilities, is progressively replacing costly LiDAR-based perception systems, especially in the realm of urban intelligent driving.
no code implementations • 4 Sep 2024 • Jiaqi Li, Pingfan Jia, Jiaxing Chen, Jiaxi Liu, Lei He, Keqiang Li
This paper mainly reviews the local map construction methods with SDMap, including definitions, general processing flow, and datasets.
no code implementations • 28 Aug 2024 • Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie
Rap, a prominent genre of vocal performance, remains underexplored in vocal generation.
no code implementations • 18 Jul 2024 • Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li
Based on prior knowledge about the main composition of the BEV surrounding environment varying with the increase of distance intervals, long-sequence global modeling is utilized to improve the model's understanding and perception of the environment.
no code implementations • 17 Jul 2024 • Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li
Perception is essential for autonomous driving systems.
no code implementations • 15 May 2024 • Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He
Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention.
no code implementations • 7 May 2024 • Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li
Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding.
no code implementations • 22 Apr 2024 • Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li
To the best of our knowledge, this is the first survey specifically focused on the applications of NeRF in the Autonomous Driving domain.
no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation.
1 code implementation • 1 Apr 2024 • Jing Hao, Lei He, Kuo Feng Hung
To address this issue, we propose T-Mamba, which integrates shared positional encoding and frequency-based features into vision mamba to overcome limitations in spatial position preservation and feature enhancement in the frequency domain.
no code implementations • 5 Mar 2024 • Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao
Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.
2 code implementations • 6 Feb 2024 • Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao
For the face forgery detection task, we evaluate GAN-based and diffusion-based data with both visual and acoustic modalities.
no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis especially for the role and out-of-domain scenarios.
no code implementations • 8 Oct 2023 • Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang
Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels.
no code implementations • 20 Sep 2023 • Duarte Rondao, Lei He, Nabil Aouf
Cameras are rapidly becoming the choice for on-board sensors towards space rendezvous due to their small form factor and inexpensive power, mass, and volume costs.
no code implementations • 7 Sep 2023 • Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer
In this work, we present a system that can automatically generate high-quality audiobooks from online e-books.
no code implementations • 6 Sep 2023 • Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao
In this paper, we present MuLanTTS, the Microsoft end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023.
no code implementations • 5 Sep 2023 • Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian
TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech.
no code implementations • 27 Jul 2023 • Chengrui Wei, Meng Yang, Lei He, Nanning Zheng
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.
no code implementations • 26 Jun 2023 • Weinan Song, Yaxuan Zhu, Lei He, YingNian Wu, Jianwen Xie
The components of translator, style encoder, and style generator constitute a diversified image generator.
1 code implementation • 9 Jun 2023 • Jie Gui, Xiaofeng Cong, Lei He, Yuan Yan Tang, James Tin-Yau Kwok
On the one hand, the dehazing task is an ill-posed problem, which means that no unique solution exists.
no code implementations • 7 Jun 2023 • Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li
In an effort to bridge this gap and stimulate future research, this paper presents an exhaustive survey on the utilization of 4D mmWave radar in autonomous driving.
2 code implementations • 18 Apr 2023 • Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian
To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor.
no code implementations • 8 Apr 2023 • Yangyang Guo, Hao Wang, Lei He, Witold Pedrycz, P. N. Suganthan, Yanjie Song
RL-GP adopts ensemble population strategies.
no code implementations • 21 Mar 2023 • Weinan Song, Haoxin Zheng, Dezhan Tu, Chengwen Liang, Lei He
Extensive experiments in simulated and real data show that our model significantly outperforms existing state-of-the-art models without learning from paired images or prior individual knowledge.
1 code implementation • 7 Mar 2023 • Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.
no code implementations • 6 Mar 2023 • Ruiqing Xue, Yanqing Liu, Lei He, Xu Tan, Linquan Liu, Edward Lin, Sheng Zhao
Neural text-to-speech (TTS) generally consists of either a cascaded architecture with a separately optimized acoustic model and vocoder, or an end-to-end architecture with continuous mel-spectrograms or self-extracted speech frames as the intermediate representations that bridge the acoustic model and vocoder. Both suffer from two limitations: 1) the continuous acoustic frames are hard to predict from phonemes alone, and acoustic information such as duration or pitch is also needed to solve the one-to-many problem, which is not easy to scale to large and noisy datasets; 2) to achieve diverse speech output based on continuous speech features, complex VAE or flow-based models are usually required.
7 code implementations • 5 Jan 2023 • Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
In addition, we find that VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt in synthesis.
1 code implementation • 30 Dec 2022 • Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.
1 code implementation • 30 Nov 2022 • Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian
In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.
no code implementations • 16 Nov 2022 • Biwei Cao, Jiuxin Cao, Jie Gui, Jiayun Shen, Bo Liu, Lei He, Yuan Yan Tang, James Tin-Yau Kwok
Such approaches, however, ignore the VE's unique nature of relation inference between the premise and hypothesis.
1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei
However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.
1 code implementation • 14 Jul 2022 • Jiawei Yang, Hanbo Chen, Yuan Liang, Junzhou Huang, Lei He, Jianhua Yao
We first benchmark representative SSL methods for dense prediction tasks in pathology images.
2 code implementations • 5 Jul 2022 • Jiawei Yang, Hanbo Chen, Yu Zhao, Fan Yang, Yao Zhang, Lei He, Jianhua Yao
We evaluate ReMix on two public datasets with two state-of-the-art MIL methods.
no code implementations • 25 Jun 2022 • Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie
In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner.
1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu
Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.
3 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu
In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
no code implementations • 1 Apr 2022 • Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
We model the speaker characteristics systematically to improve the generalization on new speakers.
no code implementations • 12 Mar 2022 • Weinan Song, Gaurav Fotedar, Nima Tajbakhsh, Ziheng Zhou, Lei He, Xiaowei Ding
Furthermore, we take the transfer results as additional training data for fluid segmentation to prove the advantage of our model indirectly, i.e., in the task of data adaptation and augmentation.
no code implementations • 8 Feb 2022 • Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao
In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates inference process into training, to reduce the inference iterations while maintaining high generation quality.
no code implementations • 20 Jan 2022 • J. Yang, Lei He
To further improve the speaker similarity, joint training with a speaker classifier is proposed.
1 code implementation • NeurIPS 2021 • Yuan Liang, Weikun Han, Liang Qiu, Chen Wu, Yiting Shao, Kun Wang, Lei He
In this work, we pioneer the study of deep learning for dental forensic identification based on panoramic radiographs.
no code implementations • 4 Nov 2021 • Zhuofu Tao, Chen Wu, Yuan Liang, Lei He
In this work, we propose LW-GCN, a lightweight FPGA-based accelerator with a software-hardware co-designed process to tackle irregularity in computation and memory access in GCN inference.
1 code implementation • 28 Oct 2021 • Moyun Liu, Youping Chen, Lei He, Yang Zhang, Jingming Xie
To further prove the ability of our method, we test it on the public dataset MS COCO, and the results show that our LF-YOLO has outstanding versatility in detection performance.
2 code implementations • 25 Oct 2021 • Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao
The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate the waveform at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with a 16 kHz or 24 kHz sampling rate; the second is to model the variation information in speech through a systematic design, which improves the prosody and naturalness.
no code implementations • 19 Oct 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
End-to-end TTS requires a large amount of speech/text paired data to cover all necessary knowledge, particularly how to pronounce different words in diverse contexts, so that a neural model may learn such knowledge accordingly.
no code implementations • 30 Aug 2021 • Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He
Different from single object reconstruction from photos, this task has the unique challenge of constructing multiple objects at high resolutions.
no code implementations • 27 Jul 2021 • Shifeng Pan, Lei He
Secondly, in these models the content/text, prosody, and speaker timbre are usually highly entangled; it is therefore not realistic to expect a satisfactory result when freely combining these components, for example when transferring speaking style between speakers.
1 code implementation • 21 Jul 2021 • Jiawei Yang, Yao Zhang, Yuan Liang, Yang Zhang, Lei He, Zhiqiang He
Experiments on the kidney tumor segmentation task demonstrate that TumorCP surpasses the strong baseline by a remarkable margin of 7.12% on tumor Dice.
no code implementations • 14 Jul 2021 • Lei He, Shengjie Jiang, Xiaoqing Liang, Ning Wang, Shiyu Song
Compared to traditional methods based on object detectors, the essential design in our work is a parallel feature difference calculation structure that infers map changes by comparing features extracted from the camera and rasterized images.
no code implementations • 8 Jun 2021 • Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He
Experimental results obtained by the Transformer TTS show that the proposed BERT can extract fine-grained, segment-level prosody, which is complementary to utterance-level prosody to improve the final prosody of the TTS speech.
no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.
no code implementations • 8 Apr 2021 • Fengpeng Yue, Yan Deng, Lei He, Tom Ko
Machine Speech Chain, which integrates both end-to-end (E2E) automatic speech recognition (ASR) and text-to-speech (TTS) into one circle for joint training, has been proven to be effective in data augmentation by leveraging large amounts of unpaired data.
Automatic Speech Recognition (ASR)
2 code implementations • 5 Mar 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
To scale neural speech synthesis to various real-world languages, we present a multilingual end-to-end framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.
no code implementations • 2 Feb 2021 • Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He
Second, we can largely boost the robustness of existing ConvNets, proved by: (i) testing on scans with synthetic pathologies, and (ii) training and evaluation on scans of different scanning setups across datasets.
no code implementations • 19 Jan 2021 • Lei He, Jiwen Lu, Guanghui Wang, Shiyu Song, Jie Zhou
In this paper, we first introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks through an analysis of the imaging process, then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
Ranked #88 on Semantic Segmentation on NYU Depth v2
no code implementations • 23 Dec 2020 • Jiawei Yang, Yuan Liang, Yao Zhang, Weinan Song, Kun Wang, Lei He
The ability of deep learning to predict with uncertainty is recognized as key for its adoption in clinical routines.
no code implementations • 30 Jul 2020 • Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 13 Jun 2020 • Shengyun Peng, Yunxuan Yu, Kun Wang, Lei He
Specifically, a target object is defined by a bounding box center, tracking offset, and object size.
no code implementations • 18 Mar 2020 • Weinan Song, Yuan Liang, Jiawei Yang, Kun Wang, Lei He
In this paper, we propose a framework, named Oral-3D, to reconstruct the 3D oral cavity from a single PX image and prior information of the dental arch.
no code implementations • 19 Feb 2020 • Weinan Song, Yuan Liang, Jiawei Yang, Kun Wang, Lei He
The encoder-decoder network is widely used to learn deep feature representations from pixel-wise annotations in biomedical image analysis.
no code implementations • 7 Jan 2020 • Yinqiu Liu, Kai Qian, Jianli Chen, Kun Wang, Lei He
As an emerging technology, blockchain has achieved great success in numerous application scenarios, from intelligent healthcare to smart cities.
Cryptography and Security Distributed, Parallel, and Cluster Computing 68M14 C.2.2
no code implementations • 10 Oct 2019 • Yuan Liang, Weinan Song, J. P. Dym, Kun Wang, Lei He
Label propagation is a popular technique for anatomical segmentation.
no code implementations • 4 Oct 2019 • Lei He, Arthur Guijt, Mathijs de Weerdt, Lining Xing, Neil Yorke-Smith
Sparrow integrates the exploration ability of BRKGA and the exploitation ability of ALNS.
no code implementations • 31 Aug 2019 • Lei He
Inspired by the great success of convolutional neural networks on structured data such as videos and images, graph neural networks (GNNs) have emerged as a powerful approach to processing non-Euclidean data structures and have proved effective in various application domains such as social networks, e-commerce, and knowledge graphs.
Distributed, Parallel, and Cluster Computing
1 code implementation • 18 Jul 2019 • Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao
Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (the revised version of Tacotron2) with a MOS gap of 0.14 in a challenging test and achieving close to human quality (4.42 vs. 4.49 in MOS) on a general test.
1 code implementation • 3 Jun 2019 • Mutian He, Yan Deng, Lei He
In this paper, we propose a novel stepwise monotonic attention method in sequence-to-sequence acoustic modeling to improve the robustness on out-of-domain inputs.
no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie
However, the autoregressive module training is affected by the exposure bias, or the mismatch between the different distributions of real and predicted data.
no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie
The end-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over the conventional TTS.
no code implementations • 3 Jan 2019 • Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong
In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework.
no code implementations • 13 Dec 2018 • Yan Deng, Lei He, Frank Soong
Neural TTS has shown it can generate high quality synthesized speech.
1 code implementation • 12 Dec 2018 • Liang Qiu, Yuanyi Ding, Lei He
In recent years, models based on Recurrent Neural Networks (RNNs) have been applied to the Slot Filling problem of Spoken Language Understanding and have achieved state-of-the-art performance.
2 code implementations • 11 Dec 2018 • Ya-Jie Zhang, Shifeng Pan, Lei He, Zhen-Hua Ling
In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech synthesis model, to learn the latent representation of speaking styles in an unsupervised manner.
no code implementations • 27 Mar 2018 • Lei He, Guanghui Wang, Zhanyi Hu
In order to learn monocular depth by embedding the focal length, we propose a method to generate a synthetic varying-focal-length dataset from fixed-focal-length datasets, and we implement a simple and effective method to fill the holes in the newly generated images.
no code implementations • COLING 2016 • Lei He, Wei Li, Hai Zhuge
This paper investigates differential topic models (dTM) for summarizing the differences among document groups.
no code implementations • COLING 2016 • Wei Li, Lei He, Hai Zhuge
This paper studies the abstractive multi-document summarization for event-oriented news texts through event information extraction and abstract representation.
no code implementations • 1 Nov 2015 • Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao
Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g., speech utterances or handwritten documents.
4 code implementations • 21 Oct 2015 • Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao
Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g., speech utterances or handwritten documents.
no code implementations • 18 Nov 2014 • Chen Chen, Junzhou Huang, Lei He, Hongsheng Li
The convergence rate of the proposed algorithm is almost the same as that of the traditional IRLS algorithms, that is, exponentially fast.
no code implementations • CVPR 2014 • Chen Chen, Junzhou Huang, Lei He, Hongsheng Li
In this paper, we propose a novel algorithm for structured sparsity reconstruction.