Search Results for author: Xinjian Li

Found 29 papers, 7 papers with code

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

1 code implementation • 30 Jan 2023 • Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data.

Language Modelling

ASR2K: Speech Recognition for Around 2000 Languages without Audio

1 code implementation • 6 Sep 2022 • Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

We achieve 50% CER and 74% WER on the Wilderness dataset with Crubadan statistics only and improve them to 45% CER and 69% WER when using 10000 raw text utterances.

Language Modelling • Speech Recognition

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

no code implementations • 28 Jul 2018 • Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black

We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language.
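
A minimal sketch of this idea, assuming a hypothetical pretrained English acoustic encoder whose frozen hidden activations serve as domain-robust features for a small target-language classifier; all module names and dimensions below are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained English recognizer's encoder. In the paper's
# setting this would be trained on large, clean English data; here we only
# reuse its hidden activations as domain-normalized features.
english_encoder = nn.Sequential(
    nn.Linear(40, 256),   # 40-dim log-mel frames (illustrative)
    nn.ReLU(),
    nn.Linear(256, 256),  # bottleneck-style layer used as the feature
)
english_encoder.eval()    # frozen: acts purely as a feature extractor

# Small target-language model trained on top of the extracted features.
target_phone_classifier = nn.Linear(256, 50)  # 50 target phones (illustrative)

def extract_features(frames: torch.Tensor) -> torch.Tensor:
    """Map raw acoustic frames to domain-normalized features."""
    with torch.no_grad():              # the English encoder is never updated
        return english_encoder(frames)

# Usage on a dummy utterance of shape (num_frames, feature_dim).
utterance = torch.randn(120, 40)
features = extract_features(utterance)
logits = target_phone_classifier(features)  # only this part is trained
```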

Real-time Neural-based Input Method

no code implementations • ICLR 2019 • Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama

The input method is an essential service on every mobile and desktop device, providing text suggestions.

Language Modelling

SHE2: Stochastic Hamiltonian Exploration and Exploitation for Derivative-Free Optimization

no code implementations • ICLR 2019 • Haoyi Xiong, Wenqing Hu, Zhanxing Zhu, Xinjian Li, Yunchao Zhang, Jun Huan

Derivative-free optimization (DFO) using trust region methods is frequently used in machine learning applications such as (hyper-)parameter optimization, where the derivatives of the objective function are not available.

BIG-bench Machine Learning • Text-to-Image Generation
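
As a rough illustration of the trust-region flavor of DFO described above (a generic sketch, not the SHE2 algorithm itself), the following toy optimizer samples candidates inside a trust region, moves to the best one, and grows or shrinks the region based on whether the objective improved:

```python
import numpy as np

def dfo_trust_region(f, x0, radius=1.0, n_samples=20, n_iters=100, seed=0):
    """Toy derivative-free trust-region search: no gradients of f are used."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_iters):
        # Sample candidate points uniformly inside the current trust region.
        candidates = x + radius * rng.uniform(-1, 1, size=(n_samples, x.size))
        values = np.array([f(c) for c in candidates])
        best = values.argmin()
        if values[best] < fx:           # success: accept and expand the region
            x, fx = candidates[best], values[best]
            radius *= 1.5
        else:                           # failure: stay put and shrink the region
            radius *= 0.5
    return x, fx

# Example: minimize a simple quadratic without using its derivatives.
x_opt, f_opt = dfo_trust_region(lambda v: np.sum((v - 3.0) ** 2), x0=[0.0, 0.0])
```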

Phoneme Level Language Models for Sequence Based Low Resource ASR

no code implementations • 20 Feb 2019 • Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze

Building multilingual and crosslingual models helps bring different languages together in a language-universal space.

Language Modelling

The ARIEL-CMU Systems for LoReHLT18

no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation • Translation

Multilingual Speech Recognition with Corpus Relatedness Sampling

no code implementations • 2 Aug 2019 • Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze

For example, the target corpus might benefit more from a corpus in the same domain or a corpus in a closely related language.

Speech Recognition

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System

no code implementations • NeurIPS 2019 • Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze

In this work, we target our attack on the wake-word detection system, jamming the model with some inconspicuous background music to deactivate the VAs while our audio adversary is present.

Real-World Adversarial Attack

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

AlloVera: A Multilingual Allophone Database

no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

Speech Recognition

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

no code implementations • 12 Sep 2020 • Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In this paper, we provide a theoretical explanation that low total correlation of sampled representation cannot guarantee low total correlation of the mean representation.

End-to-end Quantized Training via Log-Barrier Extensions

no code implementations • 1 Jan 2021 • Juncheng B Li, Shuhui Qu, Xinjian Li, Emma Strubell, Florian Metze

Quantization of neural network parameters and activations has emerged as a successful approach to reducing the model size and inference time on hardware that supports native low-precision arithmetic.

Quantization
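
For context on what quantizing parameters and activations means in practice, here is a minimal symmetric int8 fake-quantization sketch; this is a generic illustration of the general technique, not the paper's log-barrier method:

```python
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Round x onto a symmetric int8 grid, then map back to float ("fake" quantization)."""
    scale = np.abs(x).max() / 127.0 + 1e-12      # one scale per tensor
    q = np.clip(np.round(x / scale), -127, 127)  # integer codes in [-127, 127]
    return q * scale                             # dequantized values used downstream

weights = np.random.randn(4, 4).astype(np.float32)
quantized = fake_quantize_int8(weights)
print(np.abs(weights - quantized).max())         # error is bounded by about scale / 2
```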

Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

no code implementations • 7 Nov 2020 • Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black

With the aim of aiding development of spoken dialog systems in low-resource languages, we propose a novel acoustics-based intent recognition system that uses discovered phonetic units for intent classification.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

On Prosody Modeling for ASR+TTS based Voice Conversion

no code implementations • 20 Jul 2021 • Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda

In voice conversion (VC), an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents; these are then used as input by a text-to-speech (TTS) system to generate the converted speech.

Automatic Speech Recognition (ASR) +2
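
The two-stage recognition-synthesis pipeline described above can be summarized in a short sketch; the `asr` and `tts` objects and their `transcribe`/`synthesize` methods are hypothetical interfaces for illustration, not a specific toolkit API:

```python
def convert_voice(source_wav, asr, tts):
    """Two-stage voice conversion: recognize, then re-synthesize.

    `asr` is assumed to expose .transcribe(wav) -> str, and `tts` (conditioned
    on the target speaker) to expose .synthesize(text) -> wav; both are
    placeholder interfaces, not the authors' implementation.
    """
    # Stage 1: reduce the source speech to its linguistic content,
    # discarding the source speaker's identity (and, notably, prosody).
    text = asr.transcribe(source_wav)

    # Stage 2: generate speech with the same content in the target voice.
    # How prosody survives, or is re-modeled, between the two stages is the
    # question this paper examines.
    return tts.synthesize(text)
```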

Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations

no code implementations • 26 Oct 2021 • Junning Liu, Zijie Xia, Yu Lei, Xinjian Li, Xu Wang

For example, when using MTL to model various user behaviors in RS, if we differentiate new users and new items from old ones, there will be a Cartesian-product-style increase in the number of tasks with multi-dimensional relations.

Multi-Task Learning • Recommendation Systems

RTC-VAE: Harnessing the Peculiarity of Total Correlation in Learning Disentangled Representations

no code implementations • 25 Sep 2019 • Ze Cheng, Juncheng B Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In the problem of unsupervised learning of disentangled representations, one of the promising methods is to penalize the total correlation of sampled latent variables.

Disentanglement
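
To make the penalized quantity concrete, here is a small sketch that estimates total correlation under a Gaussian approximation of the latent distribution (sum of marginal entropies minus joint entropy); this is a generic estimator for illustration, not the RTC-VAE objective:

```python
import numpy as np

def gaussian_total_correlation(z: np.ndarray) -> float:
    """Estimate TC(z) = sum_i H(z_i) - H(z) assuming z is jointly Gaussian.

    For a Gaussian with covariance S this reduces to
    0.5 * (sum(log diag(S)) - log det S), which is zero iff the
    dimensions are uncorrelated.
    """
    cov = np.cov(z, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

# Correlated latents have positive TC; independent ones stay near zero.
rng = np.random.default_rng(0)
independent = rng.normal(size=(10000, 5))
mixing = np.eye(5)
mixing[0, 1] = 0.8                      # introduce correlation between two dims
correlated = independent @ mixing
print(gaussian_total_correlation(independent), gaussian_total_correlation(correlated))
```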

Phone Inventories and Recognition for Every Language

no code implementations • LREC 2022 • Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, Shinji Watanabe

Identifying phone inventories is a crucial component in language documentation and the preservation of endangered languages.
