Search Results for author: Lei He

Found 45 papers, 11 papers with code

KLMo: Knowledge Graph Enhanced Pretrained Language Model with Fine-Grained Relationships

1 code implementation Findings (EMNLP) 2021 Lei He, Suncong Zheng, Tao Yang, Feng Zhang

In this work, we propose to incorporate KG (including both entities and relations) into the language learning process to obtain KG-enhanced pretrained Language Model, namely KLMo.

Entity Linking Entity Typing +4

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

no code implementations 9 May 2022 Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis Text-To-Speech Synthesis

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations 1 Apr 2022 Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

no code implementations 8 Feb 2022 Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao

In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates inference process into training, to reduce the inference iterations while maintaining high generation quality.

Denoising

Exploring Forensic Dental Identification with Deep Learning

1 code implementation NeurIPS 2021 Yuan Liang, Weikun Han, Liang Qiu, Chen Wu, Yiting Shao, Kun Wang, Lei He

In this work, we pioneer the study of deep learning for dental forensic identification based on panoramic radiographs.

LW-GCN: A Lightweight FPGA-based Graph Convolutional Network Accelerator

no code implementations 4 Nov 2021 Zhuofu Tao, Chen Wu, Yuan Liang, Lei He

In this work, we propose LW-GCN, a lightweight FPGA-based accelerator with a software-hardware co-designed process to tackle irregularity in computation and memory access in GCN inference.

Quantization

LF-YOLO: A Lighter and Faster YOLO for Weld Defect Detection of X-ray Image

1 code implementation 28 Oct 2021 Moyun Liu, Youping Chen, Lei He, Yang Zhang, Jingming Xie

To further prove the ability of our method, we test it on the public dataset MS COCO, and the results show that our LF-YOLO has outstanding versatility in detection performance.

Defect Detection

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

1 code implementation 25 Oct 2021 Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with a 16 kHz or 24 kHz sampling rate; the second is to model the variation information in speech through a systematic design, which improves the prosody and naturalness.

Speech Synthesis

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

no code implementations 19 Oct 2021 Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

End-to-end TTS suffers from high data requirements as it is difficult for both costly speech corpora to cover all necessary knowledge and neural models to learn the knowledge, hence additional knowledge needs to be injected manually.

X2Teeth: 3D Teeth Reconstruction from a Single Panoramic Radiograph

no code implementations 30 Aug 2021 Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He

Different from single object reconstruction from photos, this task has the unique challenge of constructing multiple objects at high resolutions.

3D Reconstruction Object Reconstruction

Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis

no code implementations 27 Jul 2021 Shifeng Pan, Lei He

Secondly, in these models the content/text, prosody, and speaker timbre are usually highly entangled; it is therefore not realistic to expect a satisfactory result when freely combining these components, such as when transferring speaking style between speakers.

Expressive Speech Synthesis Style Transfer

TumorCP: A Simple but Effective Object-Level Data Augmentation for Tumor Segmentation

1 code implementation 21 Jul 2021 Jiawei Yang, Yao Zhang, Yuan Liang, Yang Zhang, Lei He, Zhiqiang He

Experiments on kidney tumor segmentation task demonstrate that TumorCP surpasses the strong baseline by a remarkable margin of 7.12% on tumor Dice.

Data Augmentation Tumor Segmentation

Diff-Net: Image Feature Difference based High-Definition Map Change Detection for Autonomous Driving

no code implementations 14 Jul 2021 Lei He, Shengjie Jiang, Xiaoqing Liang, Ning Wang, Shiyu Song

Compared to traditional methods based on object detectors, the essential design in our work is a parallel feature difference calculation structure that infers map changes by comparing features extracted from the camera and rasterized images.

Autonomous Driving Change Detection +2

Speech BERT Embedding For Improving Prosody in Neural TTS

no code implementations 8 Jun 2021 Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

Experimental results obtained by the Transformer TTS show that the proposed BERT can extract fine-grained, segment-level prosody, which is complementary to utterance-level prosody to improve the final prosody of the TTS speech.

On Addressing Practical Challenges for RNN-Transducer

no code implementations 27 Apr 2021 Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.

Speech Recognition

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

no code implementations 8 Apr 2021 Fengpeng Yue, Yan Deng, Lei He, Tom Ko

Machine Speech Chain, which integrates both end-to-end (E2E) automatic speech recognition (ASR) and text-to-speech (TTS) into one circle for joint training, has been proven to be effective in data augmentation by leveraging large amounts of unpaired data.

Automatic Speech Recognition Data Augmentation +1

Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis

2 code implementations 5 Mar 2021 Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

To scale neural speech synthesis to various real-world languages, we present a multilingual end-to-end framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.

Speech Synthesis

Atlas-aware ConvNet for Accurate yet Robust Anatomical Segmentation

no code implementations 2 Feb 2021 Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He

Second, we can largely boost the robustness of existing ConvNets, proved by: (i) testing on scans with synthetic pathologies, and (ii) training and evaluation on scans of different scanning setups across datasets.

SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images

no code implementations 19 Jan 2021 Lei He, Jiwen Lu, Guanghui Wang, Shiyu Song, Jie Zhou

In this paper, we first introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks through an analysis of the imaging process, then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption.

Monocular Depth Estimation Multi-Task Learning +2

Exploring Instance-Level Uncertainty for Medical Detection

no code implementations 23 Dec 2020 Jiawei Yang, Yuan Liang, Yao Zhang, Weinan Song, Kun Wang, Lei He

The ability of deep learning to predict with uncertainty is recognized as key for its adoption in clinical routines.

Lung Nodule Detection

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

no code implementations 30 Jul 2020 Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.

Automatic Speech Recognition

Accurate Anchor Free Tracking

no code implementations 13 Jun 2020 Shengyun Peng, Yunxuan Yu, Kun Wang, Lei He

Specifically, a target object is defined by a bounding box center, tracking offset, and object size.

Frame Visual Object Tracking

Oral-3D: Reconstructing the 3D Bone Structure of Oral Cavity from 2D Panoramic X-ray

no code implementations 18 Mar 2020 Weinan Song, Yuan Liang, Jiawei Yang, Kun Wang, Lei He

In this paper, we propose a framework, named Oral-3D, to reconstruct the 3D oral cavity from a single PX image and prior information of the dental arch.

3D Reconstruction

T-Net: Learning Feature Representation with Task-specific Supervision for Biomedical Image Analysis

no code implementations 19 Feb 2020 Weinan Song, Yuan Liang, Jiawei Yang, Kun Wang, Lei He

The encoder-decoder network is widely used to learn deep feature representations from pixel-wise annotations in biomedical image analysis.

Region Proposal Representation Learning

Effective Scaling of Blockchain Beyond Consensus Innovations and Moore's Law

no code implementations 7 Jan 2020 Yinqiu Liu, Kai Qian, Jianli Chen, Kun Wang, Lei He

As an emerging technology, blockchain has achieved great success in numerous application scenarios, from intelligent healthcare to smart cities.

Cryptography and Security Distributed, Parallel, and Cluster Computing 68M14 C.2.2

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

no code implementations 31 Aug 2019 Lei He

Inspired by the great success of convolutional neural networks on structural data like videos and images, the graph neural network (GNN) has emerged as a powerful approach to processing non-Euclidean data structures and has proved powerful in various application domains such as social networks, e-commerce, and knowledge graphs.

Distributed, Parallel, and Cluster Computing

Forward-Backward Decoding for Regularizing End-to-End TTS

1 code implementation 18 Jul 2019 Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao

Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (the revised version of Tacotron2) with a MOS gap of 0.14 in a challenging test and achieving close to human quality (4.42 vs. 4.49 in MOS) on a general test.

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

1 code implementation 3 Jun 2019 Mutian He, Yan Deng, Lei He

In this paper, we propose a novel stepwise monotonic attention method in sequence-to-sequence acoustic modeling to improve the robustness on out-of-domain inputs.

Hard Attention

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

no code implementations 9 Apr 2019 Haohan Guo, Frank K. Soong, Lei He, Lei Xie

The end-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over the conventional TTS.

A New GAN-based End-to-End TTS Training Algorithm

no code implementations 9 Apr 2019 Haohan Guo, Frank K. Soong, Lei He, Lei Xie

However, the autoregressive module training is affected by the exposure bias, or the mismatch between the different distributions of real and predicted data.

Transfer Learning

Feature reinforcement with word embedding and parsing information in neural TTS

no code implementations 3 Jan 2019 Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong

In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework.

Recurrent Neural Networks with Pre-trained Language Model Embedding for Slot Filling Task

1 code implementation 12 Dec 2018 Liang Qiu, Yuanyi Ding, Lei He

In recent years, models based on Recurrent Neural Networks (RNNs) have been applied to the Slot Filling problem of Spoken Language Understanding and have achieved state-of-the-art performance.

Language Modelling Slot Filling +1

Learning latent representations for style control and transfer in end-to-end speech synthesis

2 code implementations 11 Dec 2018 Ya-Jie Zhang, Shifeng Pan, Lei He, Zhen-Hua Ling

In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech synthesis model, to learn the latent representation of speaking styles in an unsupervised manner.

Speech Synthesis Style Transfer

Learning Depth from Single Images with Deep Neural Network Embedding Focal Length

no code implementations 27 Mar 2018 Lei He, Guanghui Wang, Zhanyi Hu

In order to learn monocular depth by embedding the focal length, we propose a method to generate a synthetic varying-focal-length dataset from fixed-focal-length datasets, and a simple and effective method is implemented to fill the holes in the newly generated images.

Depth Estimation Network Embedding +1

Abstractive News Summarization based on Event Semantic Link Network

no code implementations COLING 2016 Wei Li, Lei He, Hai Zhuge

This paper studies the abstractive multi-document summarization for event-oriented news texts through event information extraction and abstract representation.

Abstractive Text Summarization Document Summarization +2

A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding

no code implementations 1 Nov 2015 Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g., speech utterances or handwritten documents.

Chunking Feature Engineering +2

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network

3 code implementations 21 Oct 2015 Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g., speech utterances or handwritten documents.

Part-Of-Speech Tagging POS

Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction

no code implementations 18 Nov 2014 Chen Chen, Junzhou Huang, Lei He, Hongsheng Li

The convergence rate of the proposed algorithm is almost the same as that of the traditional IRLS algorithms, that is, exponentially fast.

Compressive Sensing
