Search Results for author: Andros Tjandra

Found 35 papers, 5 papers with code

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

no code implementations • 10 Jun 2024 • Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu

As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations.

Audiobox: Unified Audio Generation with Natural Language Prompts

no code implementations • 25 Dec 2023 • Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

Research communities have made great progress over the past year in advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.

AudioCaps, Audio Generation

Generative Pre-training for Speech with Flow Matching

no code implementations • 25 Oct 2023 • Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling a data distribution to generate high-fidelity synthetic data.

Speech Enhancement, Speech Synthesis

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

no code implementations • 22 Sep 2023 • Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

In this work, we propose an adaptive masking approach in two scenarios for efficiently pruning a multilingual ASR model, resulting in either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways).

Automatic Speech Recognition (ASR)
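The snippet above doesn't detail the adaptive masking scheme itself; as a rough, generic sketch of the underlying idea of mask-based magnitude pruning (function and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return np.ones_like(weights)
    # threshold at the k-th smallest absolute value; everything below is pruned
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = magnitude_mask(w, sparsity=0.5)
sparse_w = w * mask  # pruned weights; an adaptive scheme re-estimates the mask during training
```

In an adaptive setting, the mask would be recomputed periodically as training proceeds (per language, or shared across languages), rather than fixed once.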

SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain

no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use.

Data Augmentation

Voice-preserving Zero-shot Multiple Accent Conversion

no code implementations • 23 Nov 2022 • Mumin Jin, Prashant Serai, JiLong Wu, Andros Tjandra, Vimal Manohar, Qing He

Most people who have tried to learn a foreign language have experienced difficulty understanding or speaking with a native speaker's accent.

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

no code implementations • 10 Nov 2022 • Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

Later, we use our optimal tokenization strategy to train multiple embedding and output models to further improve our results.

Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

1 code implementation • 29 Mar 2022 • Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

We present Nix-TTS, a lightweight TTS model obtained by applying knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model.

Decoder, Knowledge Distillation
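Nix-TTS distills module-wise, which the snippet doesn't elaborate; as a minimal sketch of the generic knowledge-distillation objective it builds on (Hinton-style temperature-softened KL divergence, not the paper's exact loss):

```python
import numpy as np

def softmax(x, T=1.0):
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

# identical student and teacher logits -> zero distillation loss
loss = distillation_loss(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]]))
```

The temperature T softens both distributions so the student also learns from the teacher's relative probabilities over non-target classes.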

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.

Audio Classification, Representation Learning

Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework

no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.

Automatic Speech Recognition (ASR)

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

We then develop ASR and TTS for these ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data in those languages.

Machine Translation, Speech Recognition

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition

no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

One main reason is that the model must decide the incremental steps and learn transcriptions that align with the current short speech segment.

Automatic Speech Recognition (ASR)

Unsupervised Learning of Disentangled Speech Content and Style Representation

no code implementations • 24 Oct 2020 • Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita

We present an approach for unsupervised learning of speech representations that disentangle content and style.

Decoder, Speaker Recognition

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions.
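The "iterated loss" idea can be illustrated with a toy example: attach a classifier head at intermediate depths and sum the losses (layer depths, sizes, and names below are hypothetical, chosen only to show the structure):

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

# A toy 4-layer network with heads after layers 2 and 4.
x = rng.normal(size=8)
layers = [rng.normal(size=(8, 8)) * 0.1 for _ in range(4)]
heads = {2: rng.normal(size=(8, 3)) * 0.1, 4: rng.normal(size=(8, 3)) * 0.1}
target = 1

h, total_loss = x, 0.0
for i, W in enumerate(layers, start=1):
    h = np.tanh(h @ W)
    if i in heads:  # intermediate (iterated) loss at this depth
        total_loss += cross_entropy(h @ heads[i], target)
```

Training against the summed loss forces intermediate layers to already produce classifiable features, which later layers can then refine.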

Speech-to-speech Translation between Untranscribed Unknown Languages

no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.

Speech-to-Speech Translation, Translation

Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain

no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.

Automatic Speech Recognition (ASR)

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.
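The core of VQ-VAE unit discovery is mapping each continuous speech frame to its nearest codebook entry; as a minimal sketch of that quantization step (the codebook and frame values below are toy data, not from the paper):

```python
import numpy as np

def quantize(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest codebook entry (L2 distance)."""
    # pairwise squared distances: (num_frames, num_codes)
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
frames = np.array([[0.1, -0.1], [0.9, 1.2]])
units = quantize(frames, codebook)  # discrete "units" for the two frames
```

The resulting index sequence is the discrete unit representation; an inverter (the paper's Code2Spec) then maps it back to a spectrogram for synthesis.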


Machine Speech Chain with One-shot Speaker Adaptation

no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.

Automatic Speech Recognition (ASR)

Tensor Decomposition for Compressing Recurrent Neural Network

1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In machine learning, the Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.

Tensor Decomposition
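The snippet doesn't spell out the decomposition; as a simplified stand-in for the idea of replacing a dense RNN weight matrix with a factored form (the paper uses tensor decompositions such as Tensor-Train; this sketch uses a plain truncated SVD to show the parameter savings):

```python
import numpy as np

def low_rank_factors(W: np.ndarray, rank: int):
    """Truncated SVD: W ≈ A @ B with A of shape (m, r) and B of shape (r, n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold singular values into the left factor
    B = Vt[:rank]
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 64))  # an exactly rank-2 weight matrix
A, B = low_rank_factors(W, rank=2)
params_full = W.size                 # 64 * 64 = 4096 parameters
params_compressed = A.size + B.size  # 64*2 + 2*64 = 256 parameters
```

Tensor-Train generalizes this by reshaping W into a higher-order tensor and chaining several small cores, which compresses far more aggressively than a single matrix factorization.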

Sequence-to-Sequence ASR Optimization via Reinforcement Learning

no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.

Automatic Speech Recognition (ASR)

Attention-based Wav2Text with Feature Transfer Learning

no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps raw speech waveforms directly to text transcriptions.

Automatic Speech Recognition (ASR)

Gated Recurrent Neural Tensor Network

no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura

Our proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product.

Language Modelling

Compressing Recurrent Neural Network with Tensor Train

no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems.
