no code implementations • 26 Jan 2024 • Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai
The demand for accurate object detection in aerial imagery has surged with the widespread use of drones and satellite technology.
no code implementations • 21 Sep 2023 • Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai
Through experiments on the CHiME-3 dataset, we verify that the four BFs have the same peak performance as the upper bound provided by the ideal MWF BF, whereas the optimal mask depends on the adopted BF and differs from the IRM.
no code implementations • 29 May 2023 • Yui Sudo, Kazuya Hata, Kazuhiro Nakadai
End-to-end automatic speech recognition (E2E-ASR) has the potential to improve performance, but a specific issue that needs to be addressed is the difficulty it has in handling enharmonic words: named entities (NEs) with the same pronunciation and part of speech that are spelled differently.
no code implementations • 15 Nov 2021 • Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai
We describe a novel metric-based learning approach that introduces a multimodal framework and uses deep audio and geophone encoders in a Siamese configuration to design an adaptable and lightweight supervised model.
no code implementations • 1 Apr 2021 • Shakeel Muhammad, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai
In this study, we present an intelligent earthquake signal detector that provides added assistance in automating traditional disaster responses.
no code implementations • 7 Nov 2018 • Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya OGATA
By employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system, this study attempts to overcome the present difficulties in everyday environments.
3 code implementations • 3 Jul 2018 • Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya OGATA
However, applying DNNs to generate dance for a piece of music remains challenging because 1) DNNs need to generate large sequences while mapping the music input, 2) DNNs need to constrain the motion beat to the music, and 3) DNNs require a considerable amount of hand-crafted data.
no code implementations • LREC 2016 • Nurul Lubis, Randy Gomez, Sakriani Sakti, Keisuke Nakamura, Koichiro Yoshino, Satoshi Nakamura, Kazuhiro Nakadai
Emotional aspects play a vital role in making human communication a rich and dynamic experience.