Search Results for author: Bowen Shi

Found 20 papers, 7 papers with code

Open-Domain Sign Language Translation Learned from Online Video

no code implementations 25 May 2022 Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

Existing work on sign language translation--that is, translation from sign language videos into sentences in a written language--has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings.

Sign Language Translation, Translation

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

no code implementations 15 May 2022 Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu

This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs.

Representation Learning, Speaker Verification

Searching for fingerspelled content in American Sign Language

no code implementations ACL 2022 Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before.


Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

1 code implementation ICLR 2022 Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed

The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.

 Ranked #1 on Lipreading on LRS3-TED (using extra training data)

Automatic Speech Recognition, Lipreading +2

Robust Self-Supervised Audio-Visual Speech Recognition

1 code implementation 5 Jan 2022 Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.

Audio-Visual Speech Recognition, Automatic Speech Recognition +3

Hierarchical Graph Networks for 3D Human Pose Estimation

no code implementations 23 Nov 2021 Han Li, Bowen Shi, Wenrui Dai, Yabo Chen, Botao Wang, Yu Sun, Min Guo, Chenlin Li, Junni Zou, Hongkai Xiong

Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton.

3D Human Pose Estimation

Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

no code implementations 8 Jun 2021 Bowen Shi, Xiaopeng Zhang, Haohang Xu, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets regardless of their taxonomy labels, and then fine-tuning the pretrained model on a specific dataset as usual.

Semantic Segmentation

Fingerspelling Detection in American Sign Language

1 code implementation CVPR 2021 Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.

Pose Estimation

Domain wall topological entanglement entropy

no code implementations 26 Aug 2020 Bowen Shi, Isaac H. Kim

We derive a universal correction to the ground-state entanglement entropy, which is equal to the logarithm of the total quantum dimension of a set of superselection sectors localized on the domain wall.

Strongly Correlated Electrons, High Energy Physics - Theory, Quantum Physics

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

1 code implementation 1 Jul 2020 Bowen Shi, Shane Settle, Karen Livescu

We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs.

Speech Recognition, Word Embeddings

A Cross-Task Analysis of Text Span Representations

1 code implementation WS 2020 Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution.

Coreference Resolution, Question Answering

Latency-Aware Differentiable Neural Architecture Search

1 code implementation 17 Jan 2020 Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong

However, these methods have difficulty optimizing the network, so the searched network is often unfriendly to hardware.

Neural Architecture Search

Fingerspelling recognition in the wild with iterative visual attention

2 code implementations ICCV 2019 Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu

In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media.

Hand Detection, Sign Language Recognition

Compression of Acoustic Event Detection Models With Quantized Distillation

no code implementations 1 Jul 2019 Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.

Event Detection, Knowledge Distillation +1

Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

no code implementations NIPS Workshop CDNNRIA 2018 Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models.

Event Detection, Quantization

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

no code implementations 24 Apr 2019 Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu

Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.

Visual Grounding

American Sign Language fingerspelling recognition in the wild

no code implementations 26 Oct 2018 Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu

As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions.

Frame, Sign Language Recognition
