Search Results for author: Xinjian Li

Found 29 papers, 7 papers with code

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

1 code implementation • 30 Jan 2023 • Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data.

Language Modelling

ASR2K: Speech Recognition for Around 2000 Languages without Audio

1 code implementation • 6 Sep 2022 • Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

We achieve 50% CER and 74% WER on the Wilderness dataset with Crubadan statistics only and improve them to 45% CER and 69% WER when using 10000 raw text utterances.

Language Modelling • Speech Recognition

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

no code implementations • 28 Jul 2018 • Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black

We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language.
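
A minimal sketch of this idea, assuming a hypothetical pretrained English acoustic encoder whose frozen hidden activations serve as domain-robust features for a small target-language classifier; all module names and dimensions below are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained English recognizer's encoder. In the paper's
# setting this would be trained on large, clean English data; here we only
# reuse its hidden activations as domain-normalized features.
english_encoder = nn.Sequential(
    nn.Linear(40, 256),   # 40-dim log-mel frames (illustrative)
    nn.ReLU(),
    nn.Linear(256, 256),  # bottleneck-style layer used as the feature
)
english_encoder.eval()    # frozen: acts purely as a feature extractor

# Small target-language model trained on top of the extracted features.
target_phone_classifier = nn.Linear(256, 50)  # 50 target phones (illustrative)

def extract_features(frames: torch.Tensor) -> torch.Tensor:
    """Map raw acoustic frames to domain-normalized features."""
    with torch.no_grad():              # the English encoder is never updated
        return english_encoder(frames)

# Usage on a dummy utterance of shape (num_frames, feature_dim).
utterance = torch.randn(120, 40)
features = extract_features(utterance)
logits = target_phone_classifier(features)  # only this part is trained
```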

Real-time Neural-based Input Method

no code implementations • ICLR 2019 • Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama

The input method is an essential service on every mobile and desktop device, providing text suggestions.

Language Modelling

SHE2: Stochastic Hamiltonian Exploration and Exploitation for Derivative-Free Optimization

no code implementations • ICLR 2019 • Haoyi Xiong, Wenqing Hu, Zhanxing Zhu, Xinjian Li, Yunchao Zhang, Jun Huan

Derivative-free optimization (DFO) using trust region methods is frequently used in machine learning applications such as (hyper-)parameter optimization, where the derivatives of the objective function are not available.

BIG-bench Machine Learning • Text-to-Image Generation
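
As a rough illustration of the trust-region flavor of DFO described above (a generic sketch, not the SHE2 algorithm itself), the following toy optimizer samples candidates inside a trust region, moves to the best one, and grows or shrinks the region based on whether the objective improved:

```python
import numpy as np

def dfo_trust_region(f, x0, radius=1.0, n_samples=20, n_iters=100, seed=0):
    """Toy derivative-free trust-region search: no gradients of f are used."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_iters):
        # Sample candidate points uniformly inside the current trust region.
        candidates = x + radius * rng.uniform(-1, 1, size=(n_samples, x.size))
        values = np.array([f(c) for c in candidates])
        best = values.argmin()
        if values[best] < fx:           # success: accept and expand the region
            x, fx = candidates[best], values[best]
            radius *= 1.5
        else:                           # failure: stay put and shrink the region
            radius *= 0.5
    return x, fx

# Example: minimize a simple quadratic without using its derivatives.
x_opt, f_opt = dfo_trust_region(lambda v: np.sum((v - 3.0) ** 2), x0=[0.0, 0.0])
```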

Phoneme Level Language Models for Sequence Based Low Resource ASR

no code implementations • 20 Feb 2019 • Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze

Building multilingual and crosslingual models helps bring different languages together in a language-universal space.

Language Modelling

The ARIEL-CMU Systems for LoReHLT18

no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation • Translation

Multilingual Speech Recognition with Corpus Relatedness Sampling

no code implementations • 2 Aug 2019 • Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze

For example, the target corpus might benefit more from a corpus in the same domain or a corpus in a closely related language.

Speech Recognition

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System

no code implementations • NeurIPS 2019 • Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze

In this work, we target our attack on the wake-word detection system, jamming the model with some inconspicuous background music to deactivate the VAs while our audio adversary is present.

Real-World Adversarial Attack

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

AlloVera: A Multilingual Allophone Database

no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

Speech Recognition

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

no code implementations • 12 Sep 2020 • Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In this paper, we provide a theoretical explanation that low total correlation of sampled representation cannot guarantee low total correlation of the mean representation.

End-to-end Quantized Training via Log-Barrier Extensions

no code implementations • 1 Jan 2021 • Juncheng B Li, Shuhui Qu, Xinjian Li, Emma Strubell, Florian Metze

Quantization of neural network parameters and activations has emerged as a successful approach to reducing the model size and inference time on hardware that supports native low-precision arithmetic.

Quantization
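
For context on what quantizing parameters and activations means in practice, here is a minimal symmetric int8 fake-quantization sketch; this is a generic illustration of the general technique, not the paper's log-barrier method:

```python
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Round x onto a symmetric int8 grid, then map back to float ("fake" quantization)."""
    scale = np.abs(x).max() / 127.0 + 1e-12      # one scale per tensor
    q = np.clip(np.round(x / scale), -127, 127)  # integer codes in [-127, 127]
    return q * scale                             # dequantized values used downstream

weights = np.random.randn(4, 4).astype(np.float32)
quantized = fake_quantize_int8(weights)
print(np.abs(weights - quantized).max())         # error is bounded by about scale / 2
```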

Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

no code implementations • 7 Nov 2020 • Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black

With the aim of aiding development of spoken dialog systems in low-resource languages, we propose a novel acoustics-based intent recognition system that uses discovered phonetic units for intent classification.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

On Prosody Modeling for ASR+TTS based Voice Conversion

no code implementations • 20 Jul 2021 • Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda

In voice conversion (VC), an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents; these are then used as input by a text-to-speech (TTS) system to generate the converted speech.

Automatic Speech Recognition (ASR) +2
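
The two-stage recognition-synthesis pipeline described above can be summarized in a short sketch; the `asr` and `tts` objects and their `transcribe`/`synthesize` methods are hypothetical interfaces for illustration, not a specific toolkit API:

```python
def convert_voice(source_wav, asr, tts):
    """Two-stage voice conversion: recognize, then re-synthesize.

    `asr` is assumed to expose .transcribe(wav) -> str, and `tts` (conditioned
    on the target speaker) to expose .synthesize(text) -> wav; both are
    placeholder interfaces, not the authors' implementation.
    """
    # Stage 1: reduce the source speech to its linguistic content,
    # discarding the source speaker's identity (and, notably, prosody).
    text = asr.transcribe(source_wav)

    # Stage 2: generate speech with the same content in the target voice.
    # How prosody survives, or is re-modeled, between the two stages is the
    # question this paper examines.
    return tts.synthesize(text)
```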

Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations

no code implementations • 26 Oct 2021 • Junning Liu, Zijie Xia, Yu Lei, Xinjian Li, Xu Wang

For example, when using MTL to model various user behaviors in RS, if we differentiate new users and new items from old ones, there will be a Cartesian-product-style increase in the number of tasks with multi-dimensional relations.

Multi-Task Learning • Recommendation Systems

RTC-VAE: Harnessing the Peculiarity of Total Correlation in Learning Disentangled Representations

no code implementations • 25 Sep 2019 • Ze Cheng, Juncheng B Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In the problem of unsupervised learning of disentangled representations, one of the promising methods is to penalize the total correlation of sampled latent variables.

Disentanglement
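
To make the penalized quantity concrete, here is a small sketch that estimates total correlation under a Gaussian approximation of the latent distribution (sum of marginal entropies minus joint entropy); this is a generic estimator for illustration, not the RTC-VAE objective:

```python
import numpy as np

def gaussian_total_correlation(z: np.ndarray) -> float:
    """Estimate TC(z) = sum_i H(z_i) - H(z) assuming z is jointly Gaussian.

    For a Gaussian with covariance S this reduces to
    0.5 * (sum(log diag(S)) - log det S), which is zero iff the
    dimensions are uncorrelated.
    """
    cov = np.cov(z, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

# Correlated latents have positive TC; independent ones stay near zero.
rng = np.random.default_rng(0)
independent = rng.normal(size=(10000, 5))
mixing = np.eye(5)
mixing[0, 1] = 0.8                      # introduce correlation between two dims
correlated = independent @ mixing
print(gaussian_total_correlation(independent), gaussian_total_correlation(correlated))
```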

Phone Inventories and Recognition for Every Language

no code implementations • LREC 2022 • Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, Shinji Watanabe

Identifying phone inventories is a crucial component in language documentation and the preservation of endangered languages.
