no code implementations • IWSLT (ACL) 2022 • Ryo Fukuda, Yuka Ko, Yasumasa Kano, Kosuke Doi, Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
This paper describes NAIST’s simultaneous speech translation systems developed for the IWSLT 2022 Evaluation Campaign.
no code implementations • ACL (IWSLT) 2021 • Ryo Fukuda, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in the IWSLT 2021 Evaluation Campaign.
no code implementations • IWSLT (EMNLP) 2018 • Johanes Effendi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
In this paper, we investigate and utilize neural paraphrasing to improve translation quality in neural MT (NMT), which has not yet been much explored.
no code implementations • IWSLT (EMNLP) 2018 • Kaho Osamura, Takatomo Kano, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
In this paper, a neural sequence-to-sequence ASR is used as feature processing that is trained to produce word posterior features given spoken utterances.
Automatic Speech Recognition (ASR)
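Below is a minimal PyTorch sketch of the word-posterior idea this entry describes: instead of passing the ASR 1-best tokens downstream, the per-step softmax posteriors over the ASR vocabulary are used as soft input features. The class name, API, and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PosteriorFeatureBridge(nn.Module):
    """Turn ASR decoder logits into continuous 'word posterior' features
    that a downstream translation encoder can consume (a sketch, not the
    authors' exact architecture)."""

    def __init__(self, asr_vocab_size: int, feature_dim: int):
        super().__init__()
        # Project the posterior distribution into a dense feature vector.
        self.proj = nn.Linear(asr_vocab_size, feature_dim)

    def forward(self, asr_logits: torch.Tensor) -> torch.Tensor:
        # asr_logits: (batch, time, vocab) raw decoder scores from the ASR.
        posteriors = torch.softmax(asr_logits, dim=-1)  # keeps recognition uncertainty
        return self.proj(posteriors)                    # (batch, time, feature_dim)

# Usage: feed the output to an MT encoder instead of embeddings of argmax tokens.
bridge = PosteriorFeatureBridge(asr_vocab_size=8000, feature_dim=512)
features = bridge(torch.randn(2, 30, 8000))
print(features.shape)  # torch.Size([2, 30, 512])
```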
no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use.
1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources.
Automatic Speech Recognition (ASR)
1 code implementation • IEEE Transactions on Multimedia 2020 • Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura
Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i.e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i.e., target domain).
Ranked #1 on Image Retrieval on PKU-Reid
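A hedged sketch of the pre-train/fine-tune transfer this entry describes: an embedding network is first trained on the rich-labeled photo (source) domain and then adapted on the small sketch-to-photo (target) domain. The backbone, head size, and learning rate below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Source-domain stage: train (or load) a retrieval embedding network on
# rich-labeled photo data. Training loops are omitted in this sketch.
backbone = models.resnet18()
backbone.fc = nn.Linear(backbone.fc.in_features, 256)  # retrieval embedding head
# ... pre-train `backbone` on source-domain photo retrieval data here ...

# Target-domain stage: fine-tune the same weights on the limited-labeled
# sketch-to-photo pairs with a small learning rate, so the source-domain
# knowledge is adapted rather than overwritten.
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
```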
1 code implementation • 27 Aug 2022 • Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura
By using MOT, the spatiotemporal boundary of each actor is obtained and assigned to a unique actor identity.
no code implementations • 1 Jun 2022 • Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura
Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.
no code implementations • 14 May 2022 • Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce pseudo transcripts for untranscribed speech.
Automatic Speech Recognition (ASR)
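A minimal sketch of the pseudo-transcription paradigm this entry refers to: a fixed teacher ASR labels untranscribed speech, and the student is trained on an augmented (SpecAugment-style) version of the same utterances. All module names and the `transcribe` interface below are placeholders, not the paper's code.

```python
import torch

def pseudo_label_step(teacher, student, optimizer, untranscribed_batch, augment, loss_fn):
    """One semi-supervised update: the static teacher produces pseudo
    transcripts; the student learns to reproduce them from augmented speech
    (a sketch of the general paradigm, not the paper's exact recipe)."""
    teacher.eval()
    with torch.no_grad():
        pseudo_transcripts = teacher.transcribe(untranscribed_batch)  # hypothetical API

    student.train()
    augmented = augment(untranscribed_batch)  # e.g. SpecAugment time/frequency masking
    logits = student(augmented)
    loss = loss_fn(logits, pseudo_transcripts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```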
1 code implementation • 29 Mar 2022 • Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti
We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model.
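A hedged sketch of the distillation setup described above: a small student TTS is trained to match the outputs of the large, frozen teacher TTS. The loss and module interfaces are illustrative; Nix-TTS also distills intermediate encoder/decoder modules, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher_tts, student_tts, optimizer, text_batch):
    """Train a lightweight student to imitate a large frozen teacher
    (a generic knowledge-distillation sketch, not the paper's exact objective)."""
    teacher_tts.eval()
    with torch.no_grad():
        target_mel = teacher_tts(text_batch)   # teacher acoustic output

    student_mel = student_tts(text_batch)
    loss = F.l1_loss(student_mel, target_mel)  # match the teacher's output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```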
no code implementations • 10 Nov 2020 • Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation.
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
By contrast, humans can listen to what they speak in real time, and if there is a delay in hearing, they cannot continue speaking.
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
One main reason is that the model needs to decide the incremental steps and learn transcriptions that align with the current short speech segment.
Automatic Speech Recognition (ASR)
no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
We then develop ASR and TTS of ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data of those ethnic languages.
no code implementations • 12 Oct 2020 • Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels.
no code implementations • 7 Jul 2020 • Fan Yang, Xin Chang, Chenyu Dang, Ziqiang Zheng, Sakriani Sakti, Satoshi Nakamura, Yang Wu
We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) by refinement.
Ranked #1 on Multi-Object Tracking on MOTS20
Multi-Object Tracking and Segmentation
no code implementations • 24 May 2020 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019.
no code implementations • LREC 2020 • Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti, Satoshi Nakamura
Expressing emotion is known as an efficient way to persuade one's dialogue partner to accept one's claim or proposal.
1 code implementation • 24 Nov 2019 • Fan Yang, Feiran Li, Yang Wu, Sakriani Sakti, Satoshi Nakamura
3D panoramic multi-person localization and tracking are prominent in many applications; however, conventional methods using LiDAR equipment can be economically expensive and computationally inefficient due to the processing of point cloud data.
Ranked #1 on Multi-Object Tracking on MOT15_3D (using extra training data)
no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.
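A minimal sketch of the second step this entry describes: a model that maps source-speech features to a sequence of discrete target-language units (e.g. codebook indices from a quantized autoencoder). For brevity the sketch uses a per-frame classifier rather than the paper's attentional sequence-to-sequence decoder; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpeechToUnitModel(nn.Module):
    """Map source-speech features to discrete target-language units
    (a simplified sketch, not the paper's attentional architecture)."""

    def __init__(self, feat_dim=80, hidden=256, num_units=512):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_units)

    def forward(self, speech_feats):
        # speech_feats: (batch, time, feat_dim) source-language features
        enc, _ = self.encoder(speech_feats)
        return self.classifier(enc)            # (batch, time, num_units) unit logits

model = SpeechToUnitModel()
logits = model(torch.randn(2, 100, 80))
units = logits.argmax(dim=-1)                  # predicted discrete target units
```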
3 code implementations • arXiv 2019 • Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura
Although skeleton-based action recognition has achieved great success in recent years, most of the existing methods may suffer from a large model size and slow execution speed.
Ranked #1 on Hand Gesture Recognition on DHG-14
no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.
Automatic Speech Recognition (ASR)
no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.
no code implementations • 25 Apr 2019 • Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).
no code implementations • 31 Oct 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In our previous work, we applied a speech chain mechanism as a semi-supervised learning method.
Automatic Speech Recognition (ASR)
no code implementations • 22 Jul 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module.
no code implementations • WS 2018 • Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
Positive emotion elicitation seeks to improve the user's emotional state through dialogue system interaction, where a chat-based scenario is layered with an implicit goal to address the user's emotional needs.
no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.
Automatic Speech Recognition (ASR)
1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In the machine learning field, the Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.
no code implementations • 23 Feb 2018 • Seitaro Shinagawa, Koichiro Yoshino, Sakriani Sakti, Yu Suzuki, Satoshi Nakamura
We propose an interactive image-manipulation system with natural language instruction, which can generate a target image from a source image and an instruction that describes the difference between the source and the target image.
no code implementations • 13 Feb 2018 • Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
Sequence-to-sequence attentional-based neural network architectures have been shown to provide a powerful model for machine translation and speech recognition.
no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.
Automatic Speech Recognition (ASR)
no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we construct the first end-to-end attention-based encoder-decoder model to process directly from raw speech waveform to the text transcription.
Automatic Speech Recognition (ASR)
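A hedged sketch of the front end implied by this entry: learned strided convolutions over the raw waveform replace hand-crafted spectral features before the attention-based encoder-decoder. The layer sizes and strides below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class RawWaveformEncoder(nn.Module):
    """Strided 1-D convolutions that turn a raw waveform into frame-level
    representations for an attention-based decoder (an illustrative front
    end, not the paper's exact architecture)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=400, stride=160),  # ~25 ms windows, 10 ms hop at 16 kHz
            nn.ReLU(),
            nn.Conv1d(64, hidden, kernel_size=3, stride=2),
            nn.ReLU(),
        )

    def forward(self, waveform):
        # waveform: (batch, samples) raw 16 kHz audio
        x = self.conv(waveform.unsqueeze(1))   # (batch, hidden, frames)
        return x.transpose(1, 2)               # (batch, frames, hidden) for the decoder

frames = RawWaveformEncoder()(torch.randn(2, 16000))
print(frames.shape)  # torch.Size([2, 48, 256])
```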
no code implementations • 16 Jul 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning.
Automatic Speech Recognition (ASR)
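A minimal sketch of the closed-loop speech chain this entry introduces: ASR and TTS supervise each other on unpaired data, with ASR transcribing TTS output for text-only data and TTS resynthesizing ASR transcripts for speech-only data. The `synthesize`/`transcribe` interfaces are placeholders, not the paper's implementation.

```python
import torch

def speech_chain_step(asr, tts, asr_opt, tts_opt, unpaired_text, unpaired_speech,
                      asr_loss_fn, tts_loss_fn):
    """One closed-loop update (a sketch): each model generates pseudo data
    that trains the other, so unpaired text and unpaired speech both
    contribute gradients."""
    # Text-only data: TTS synthesizes speech, ASR must recover the text.
    with torch.no_grad():
        synth_speech = tts.synthesize(unpaired_text)        # hypothetical API
    asr_loss = asr_loss_fn(asr(synth_speech), unpaired_text)
    asr_opt.zero_grad(); asr_loss.backward(); asr_opt.step()

    # Speech-only data: ASR transcribes, TTS must reconstruct the speech.
    with torch.no_grad():
        pseudo_text = asr.transcribe(unpaired_speech)        # hypothetical API
    tts_loss = tts_loss_fn(tts(pseudo_text), unpaired_speech)
    tts_opt.zero_grad(); tts_loss.backward(); tts_opt.step()
    return asr_loss.item(), tts_loss.item()
```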
no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura
Our proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product.
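A hedged sketch of the tensor-product interaction at the core of this entry: the recurrent pre-activation adds a bilinear term between the input and the previous hidden state. For brevity the sketch omits the LSTM/GRU gating that the paper combines this with; names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class SimpleRNTNCell(nn.Module):
    """Recurrent cell whose pre-activation adds a bilinear (tensor-product)
    interaction between the input and the previous hidden state (a sketch of
    the core idea; the paper's LSTMRNTN/GRURNTN add gating on top)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One bilinear slice per hidden unit: x^T T[k] h_prev.
        self.tensor = nn.Parameter(torch.randn(hidden_size, input_size, hidden_size) * 0.01)
        self.w_x = nn.Linear(input_size, hidden_size)
        self.w_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, h_prev):
        # x: (batch, input_size), h_prev: (batch, hidden_size)
        bilinear = torch.einsum('bi,kij,bj->bk', x, self.tensor, h_prev)
        return torch.tanh(bilinear + self.w_x(x) + self.w_h(h_prev))

cell = SimpleRNTNCell(input_size=32, hidden_size=64)
h = cell(torch.randn(8, 32), torch.zeros(8, 64))
```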
no code implementations • IJCNLP 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose a novel attention mechanism that has local and monotonic properties.
Automatic Speech Recognition (ASR)
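A hedged sketch of the local and monotonic properties this entry names: the attention center is constrained to move forward at every decoding step, and weights are concentrated in a window around it. The position predictor and window shape are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch

def local_monotonic_attention(enc_states, prev_pos, query, pos_net, window=5):
    """Attend only to a local window whose center moves monotonically forward
    (a sketch of the property, not the paper's exact mechanism).

    enc_states: (batch, time, dim); prev_pos: (batch,) previous center;
    query: (batch, dim) decoder state; pos_net: maps query -> predicted shift.
    """
    batch, time, _ = enc_states.shape
    # Monotonicity: new center = old center + non-negative predicted shift.
    pos = prev_pos + torch.nn.functional.softplus(pos_net(query)).squeeze(-1)
    pos = pos.clamp(max=time - 1)

    # Gaussian scores centered at `pos`, effectively zero outside the window.
    frames = torch.arange(time, device=enc_states.device).float().unsqueeze(0)
    scores = -((frames - pos.unsqueeze(1)) ** 2) / (2 * (window / 2) ** 2)
    weights = torch.softmax(scores, dim=-1)                    # (batch, time)
    context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
    return context, pos

pos_net = torch.nn.Linear(256, 1)
ctx, new_pos = local_monotonic_attention(
    torch.randn(2, 50, 256), torch.zeros(2), torch.randn(2, 256), pos_net)
```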
no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems.
no code implementations • LREC 2016 • Nurul Lubis, Randy Gomez, Sakriani Sakti, Keisuke Nakamura, Koichiro Yoshino, Satoshi Nakamura, Kazuhiro Nakadai
Emotional aspects play a vital role in making human communication a rich and dynamic experience.
no code implementations • TACL 2015 • Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
We propose a new method for semantic parsing of ambiguous and ungrammatical input, such as search queries.
no code implementations • LREC 2014 • Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
This makes it possible to compare translation data with simultaneous interpretation data.
no code implementations • LREC 2014 • Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi, Ryosuke Isotani
This paper outlines the recent development of multilingual medical data and a multilingual speech recognition system for network-based speech-to-speech translation in the medical domain.