no code implementations • 25 May 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
Existing work on sign language translation--that is, translation from sign language videos into sentences in a written language--has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings.
no code implementations • 15 May 2022 • Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu
This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs.
no code implementations • ACL 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before.
1 code implementation • ICLR 2022 • Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed
The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.
Ranked #1 on Lipreading on LRS3-TED (using extra training data)
1 code implementation • 5 Jan 2022 • Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed
Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.
Ranked #1 on Audio-Visual Speech Recognition on LRS3-TED
Audio-Visual Speech Recognition
Automatic Speech Recognition
no code implementations • 23 Nov 2021 • Han Li, Bowen Shi, Wenrui Dai, Yabo Chen, Botao Wang, Yu Sun, Min Guo, Chenlin Li, Junni Zou, Hongkai Xiong
Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton.
no code implementations • 8 Jun 2021 • Bowen Shi, Xiaopeng Zhang, Haohang Xu, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets regardless of their taxonomy labels, and then fine-tuning the pretrained model on a specific dataset as usual.
1 code implementation • CVPR 2021 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.
no code implementations • 26 Aug 2020 • Bowen Shi, Isaac H. Kim
We derive a universal correction to the ground-state entanglement entropy, which is equal to the logarithm of the total quantum dimension of a set of superselection sectors localized on the domain wall.
Strongly Correlated Electrons High Energy Physics - Theory Quantum Physics
1 code implementation • 1 Jul 2020 • Bowen Shi, Shane Settle, Karen Livescu
We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs.
1 code implementation • WS 2020 • Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel
Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution.
no code implementations • 21 Feb 2020 • Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang
We study few-shot acoustic event detection (AED) in this paper.
1 code implementation • 17 Jan 2020 • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong
However, these methods suffer from difficulty in optimizing the network, so the searched network is often unfriendly to hardware.
2 code implementations • ICCV 2019 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu
In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media.
no code implementations • 1 Jul 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.
no code implementations • NIPS Workshop CDNNRIA 2018 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models.
no code implementations • 29 Apr 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
This paper presents our work on training acoustic event detection (AED) models using an unlabeled dataset.
no code implementations • 24 Apr 2019 • Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu
Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.
no code implementations • 26 Oct 2018 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu
As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions.
no code implementations • 9 Oct 2017 • Bowen Shi, Karen Livescu
We introduce a model for fingerspelling recognition that addresses these issues.