Search Results for author: Zhenhai Zhu

Found 5 papers, 4 papers with code

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences

2 code implementations ACL 2021 Zhenhai Zhu, Radu Soricut

We describe an efficient hierarchical method to compute attention in the Transformer architecture.

Ranked #1 on Language Modelling on One Billion Word (validation perplexity metric)

Tasks: Inductive Bias, Language Modelling
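The abstract describes hierarchical attention, which approximates the full quadratic attention matrix by attending to nearby tokens at full resolution and to distant tokens through coarsened summaries. Below is a minimal two-level sketch of that idea; it is illustrative only and not the paper's exact multi-level algorithm, and the function name, block size, and mean-pooling coarsener are all assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_attention(q, k, v, block=4):
    """Toy two-level hierarchical attention (illustrative sketch, not
    the exact H-Transformer-1D algorithm): each query attends to keys
    in its own block at full resolution, and to mean-pooled summaries
    of every other block. Assumes n is divisible by `block`."""
    n, d = q.shape
    nb = n // block
    kb = k.reshape(nb, block, d).mean(axis=1)  # coarse (pooled) keys
    vb = v.reshape(nb, block, d).mean(axis=1)  # coarse (pooled) values
    out = np.zeros_like(v)
    for b in range(nb):
        sl = slice(b * block, (b + 1) * block)
        other = np.arange(nb) != b
        # fine keys/values for this block, coarse ones for the rest
        keys = np.concatenate([k[sl], kb[other]], axis=0)
        vals = np.concatenate([v[sl], vb[other]], axis=0)
        w = softmax(q[sl] @ keys.T / np.sqrt(d), axis=-1)
        out[sl] = w @ vals
    return out
```

Each query scores `block + nb - 1` keys instead of `n`, so the cost drops from quadratic toward linear as the block size and hierarchy are tuned; with `block == n` the sketch reduces to ordinary full attention.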

Multimodal Pretraining for Dense Video Captioning

1 code implementation Asian Chapter of the Association for Computational Linguistics 2020 Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut

First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.

Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)

Tasks: Dense Video Captioning

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations CoNLL 2019 Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos draw high traffic on video-sharing platforms, and prior work suggests that providing time-stamped subtask annotations (e.g., "heat the oil in the pan") improves user experiences.

Tasks: Automatic Speech Recognition (ASR) +1

Improved Image Captioning via Policy Gradient optimization of SPIDEr

2 code implementations ICCV 2017 Si-Qi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy

Finally, we show that our PG method can optimize any of these metrics, including the proposed SPIDEr metric, which yields image captions that human raters strongly prefer over captions generated by the same model trained to optimize MLE or the COCO metrics.

Tasks: Image Captioning
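The abstract's policy-gradient optimization of a sentence-level metric can be sketched with REINFORCE on a toy captioning policy. Everything below is illustrative: the vocabulary, the overlap-based `reward` stand-in for SPIDEr (the paper's metric averages SPICE and CIDEr), and the per-position independent policy are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy REINFORCE sketch of sentence-level metric optimization.
vocab = ["a", "cat", "dog", "sits"]
logits = np.zeros((3, len(vocab)))  # 3-token "captions", one logit row per position

def reward(caption):
    # Hypothetical stand-in for a sentence-level metric such as SPIDEr:
    # fraction of positions matching a reference caption.
    ref = ["a", "cat", "sits"]
    return sum(w == r for w, r in zip(caption, ref)) / len(ref)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

lr, baseline = 0.5, 0.0
for step in range(300):
    p = softmax(logits)
    idx = [rng.choice(len(vocab), p=p[t]) for t in range(3)]
    caption = [vocab[i] for i in idx]
    r = reward(caption)                      # non-differentiable reward
    baseline = 0.9 * baseline + 0.1 * r      # moving-average baseline
    for t, i in enumerate(idx):              # REINFORCE gradient ascent
        grad = -p[t]
        grad[i] += 1.0
        logits[t] += lr * (r - baseline) * grad
```

The key point the abstract makes carries over even to this toy: because the reward is only evaluated on sampled captions, any metric can be plugged in, differentiable or not, which is what lets the method optimize SPIDEr directly rather than a surrogate likelihood.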
