Search Results for author: YuHan Shen

Found 3 papers, 2 papers with code

Exploring the Role of Audio in Video Captioning

no code implementations 21 Jun 2023 YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang

Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.

Automatic Speech Recognition (ASR) +2

Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos

1 code implementation CVPR 2022 YuHan Shen, Ehsan Elhamifar

To compute the SRE loss, we develop a flexible transcript prediction (FTP) method that uses the output of the action classifier to find both the length of the transcript and the sequence of actions occurring in an unlabeled video.

Action Segmentation, Weakly-supervised Learning

Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment

1 code implementation CVPR 2021 YuHan Shen, Lu Wang, Ehsan Elhamifar

We address the problem of unsupervised localization of key-steps and feature learning in instructional videos using both visual and language instructions.
