Search Results for author: Sakriani Sakti

Found 56 papers, 9 papers with code

Multi-paraphrase Augmentation to Leverage Neural Caption Translation

no code implementations • IWSLT (EMNLP) 2018 • Johanes Effendi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

In this paper, we investigate and utilize neural paraphrasing to improve translation quality in neural MT (NMT), which has not yet been much explored.

Machine Translation NMT +1

Paper
Add Code

NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022

no code implementations • IWSLT (ACL) 2022 • Ryo Fukuda, Yuka Ko, Yasumasa Kano, Kosuke Doi, Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

This paper describes NAIST’s simultaneous speech translation systems developed for IWSLT 2022 Evaluation Campaign.

Segmentation Simultaneous Speech-to-Text Translation +1

Paper
Add Code

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task

no code implementations • ACL (IWSLT) 2021 • Ryo Fukuda, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in IWSLT 2021 Evaluation Campaign.

Knowledge Distillation Machine Translation +1

Paper
Add Code

Using Spoken Word Posterior Features in Neural Machine Translation

no code implementations • IWSLT (EMNLP) 2018 • Kaho Osamura, Takatomo Kano, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

In this paper, a neural sequence-to-sequence ASR is used as feature processing that is trained to produce word posterior features given spoken utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain

no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

This paper introduces SpeeChain, an open-source Pytorch-based toolkit designed to develop the machine speech chain for large-scale use.

Data Augmentation

Paper
Add Code

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, JENNIFER SANTOSO, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

253

Paper
Code

Instance-level Heterogeneous Domain Adaptation for Limited-labeled Sketch-to-Photo Retrieval

1 code implementation • IEEE Transactions on Multimedia 2020 • Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura

Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i. e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i. e., target domain).

Ranked #1 on Image Retrieval on PKU-Reid

Domain Adaptation Image Retrieval +1

Paper
Code

Actor-identified Spatiotemporal Action Detection --- Detecting Who Is Doing What in Videos

1 code implementation • 27 Aug 2022 • Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura

By using MOT, the spatiotemporal boundary of each actor is obtained and assigned to a unique actor identity.

Action Classification Action Detection +3

Paper
Code

Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition

no code implementations • 1 Jun 2022 • Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura

Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.

blind source separation EEG +1

Paper
Add Code

Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing

no code implementations • 14 May 2022 • Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura

The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce pseudo transcripts for untranscribed speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

1 code implementation • 29 Mar 2022 • Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model.

Knowledge Distillation Neural Architecture Search

221

Paper
Code

Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS

no code implementations • 10 Nov 2020 • Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Add Code

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

We then develop ASR and TTS of ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data removing the need for paired speech-text data of those ethnic languages.

Machine Translation speech-recognition +3

Paper
Add Code

Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework

no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Add Code

Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time

no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

By contrast, humans can listen to what hey speak in real-time, and if there is a delay in hearing, they won't be able to continue speaking.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition

no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

One main reason is because the model needs to decide the incremental steps and learn the transcription that aligns with the current short speech segment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

no code implementations • 12 Oct 2020 • Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels.

Speech Synthesis

Paper
Add Code

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation

no code implementations • 7 Jul 2020 • Fan Yang, Xin Chang, Chenyu Dang, Ziqiang Zheng, Sakriani Sakti, Satoshi Nakamura, Yang Wu

We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) by refinement.

Ranked #1 on Multi-Object Tracking on MOTS20

Multi-Object Tracking Multi-Object Tracking and Segmentation +2

Paper
Add Code

Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge

no code implementations • 24 May 2020 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019.

Speech Synthesis

Paper
Add Code

Emotional Speech Corpus for Persuasive Dialogue System

no code implementations • LREC 2020 • Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti, Satoshi Nakamura

Expressing emotion is known as an efficient way to persuade one{'}s dialogue partner to accept one{'}s claim or proposal.

Paper
Add Code

Using Panoramic Videos for Multi-person Localization and Tracking in a 3D Panoramic Coordinate

1 code implementation • 24 Nov 2019 • Fan Yang, Feiran Li, Yang Wu, Sakriani Sakti, Satoshi Nakamura

3D panoramic multi-person localization and tracking are prominent in many applications, however, conventional methods using LiDAR equipment could be economically expensive and also computationally inefficient due to the processing of point cloud data.

Ranked #1 on Multi-Object Tracking on MOT15_3D (using extra training data)

Multi-Object Tracking

Paper
Code

Speech-to-speech Translation between Untranscribed Unknown Languages

no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.

Speech-to-Speech Translation Translation

Paper
Add Code

Make Skeleton-based Action Recognition Model Smaller, Faster and Better

3 code implementations • arXiv 2019 • Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura

Although skeleton-based action recognition has achieved great success in recent years, most of the existing methods may suffer from a large model size and slow execution speed.

Ranked #1 on Hand Gesture Recognition on DHG-14

Action Recognition Hand Gesture Recognition +1

253

Paper
Code

Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain

no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Add Code

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.

Clustering

Paper
Add Code

The Zero Resource Speech Challenge 2019: TTS without T

no code implementations • 25 Apr 2019 • Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).

Paper
Add Code

End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator

no code implementations • 31 Oct 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In our previous work, we applied a speech chain mechanism as a semi-supervised learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

no code implementations • 22 Jul 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module.

Sequence-To-Sequence Speech Recognition speech-recognition

Paper
Add Code

Unsupervised Counselor Dialogue Clustering for Positive Emotion Elicitation in Neural Dialogue System

no code implementations • WS 2018 • Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura

Positive emotion elicitation seeks to improve user{'}s emotional state through dialogue system interaction, where a chat-based scenario is layered with an implicit goal to address user{'}s emotional needs.

Clustering Emotion Recognition +2

Paper
Add Code

Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas

no code implementations • LREC 2018 • Sashi Novitasari, Quoc Truong Do, Sakriani Sakti, Dessi Lestari, Satoshi Nakamura

Emotion Recognition Speech Recognition +1

Paper
Add Code

Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing

no code implementations • LREC 2018 • Koichiro Yoshino, Yoko Ishikawa, Masahiro Mizukami, Yu Suzuki, Sakriani Sakti, Satoshi Nakamura

Dialogue Management

Paper
Add Code

Machine Speech Chain with One-shot Speaker Adaptation

no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Tensor Decomposition for Compressing Recurrent Neural Network

1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In the machine learning fields, Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.

Tensor Decomposition

Paper
Code

Interactive Image Manipulation with Natural Language Instruction Commands

no code implementations • 23 Feb 2018 • Seitaro Shinagawa, Koichiro Yoshino, Sakriani Sakti, Yu Suzuki, Satoshi Nakamura

We propose an interactive image-manipulation system with natural language instruction, which can generate a target image from a source image and an instruction that describes the difference between the source and the target image.

Image Generation Image Manipulation

Paper
Add Code

Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation

no code implementations • 13 Feb 2018 • Takatomo Kano, Sakriani Sakti, Satoshi Nakamura

Sequence-to-sequence attentional-based neural network architectures have been shown to provide a powerful model for machine translation and speech recognition.

Machine Translation speech-recognition +2

Paper
Add Code

Sequence-to-Sequence ASR Optimization via Reinforcement Learning

no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Attention-based Wav2Text with Feature Transfer Learning

no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we construct the first end-to-end attention-based encoder-decoder model to process directly from raw speech waveform to the text transcription.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Listening while Speaking: Speech Chain by Deep Learning

no code implementations • 16 Jul 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Gated Recurrent Neural Tensor Network

no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura

Our proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product.

Language Modelling

Paper
Add Code

Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

no code implementations • IJCNLP 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

In this paper, we propose a novel attention mechanism that has local and monotonic properties.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Compressing Recurrent Neural Network with Tensor Train

no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Recurrent Neural Network (RNN) are a popular choice for modeling temporal and sequential tasks and achieve many state-of-the-art performance on various complex problems.

Paper
Add Code

Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition

no code implementations • LREC 2016 • Nurul Lubis, R Gomez, y, Sakriani Sakti, Keisuke Nakamura, Koichiro Yoshino, Satoshi Nakamura, Kazuhiro Nakadai

Emotional aspects play a vital role in making human communication a rich and dynamic experience.

Emotion Recognition

Paper
Add Code

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering

no code implementations • WS 2015 • Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Cross-Lingual Question Answering Machine Translation +1

Paper
Add Code

Improving Pivot Translation by Remembering the Pivot

no code implementations • IJCNLP 2015 • Akiva Miura, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Language Modelling Machine Translation +1

Paper
Add Code

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents

no code implementations • IJCNLP 2015 • Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Boundary Detection Machine Translation +1

Paper
Add Code

Ckylark: A More Robust PCFG-LA Parser

1 code implementation • NAACL 2015 • Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Machine Translation Natural Language Inference

Paper
Code

Semantic Parsing of Ambiguous Input through Paraphrasing and Verification

no code implementations • TACL 2015 • Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

We propose a new method for semantic parsing of ambiguous and ungrammatical input, such as search queries.

Language Modelling Semantic Parsing +1

Paper
Add Code

Rule-based Syntactic Preprocessing for Syntax-based Machine Translation

no code implementations • WS 2014 • Yuto Hatakoshi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Machine Translation Translation

Paper
Add Code

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing

no code implementations • COLING 2014 • Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Discriminative Language Models as a Tool for Machine Translation Error Analysis

1 code implementation • COLING 2014 • Koichi Akabe, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Machine Translation Translation

Paper
Code

Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children's Narrative

no code implementations • WS 2014 • Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

Paper
Add Code

Optimizing Segmentation Strategies for Simultaneous Speech Translation

no code implementations • ACL 2014 • Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Machine Translation Speech Recognition +1

Paper
Add Code

Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System

no code implementations • LREC 2014 • Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi, Ryosuke Isotani

This paper outlines the recent development on multilingual medical data and multilingual speech recognition system for network-based speech-to-speech translation in the medical domain.

Machine Translation speech-recognition +5