Search Results for author: Zhenhai Zhu

Found 5 papers, 4 papers with code

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences

2 code implementations ACL 2021 Zhenhai Zhu, Radu Soricut

We describe an efficient hierarchical method to compute attention in the Transformer architecture.

Ranked #1 on Language Modelling on One Billion Word (validation perplexity metric)

Tasks: Inductive Bias, Language Modelling
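The abstract describes hierarchical attention, which approximates the full quadratic attention matrix by attending to nearby tokens at full resolution and to distant tokens through coarsened summaries. Below is a minimal two-level sketch of that idea; it is illustrative only and not the paper's exact multi-level algorithm, and the function name, block size, and mean-pooling coarsener are all assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_attention(q, k, v, block=4):
    """Toy two-level hierarchical attention (illustrative sketch, not
    the exact H-Transformer-1D algorithm): each query attends to keys
    in its own block at full resolution, and to mean-pooled summaries
    of every other block. Assumes n is divisible by `block`."""
    n, d = q.shape
    nb = n // block
    kb = k.reshape(nb, block, d).mean(axis=1)  # coarse (pooled) keys
    vb = v.reshape(nb, block, d).mean(axis=1)  # coarse (pooled) values
    out = np.zeros_like(v)
    for b in range(nb):
        sl = slice(b * block, (b + 1) * block)
        other = np.arange(nb) != b
        # fine keys/values for this block, coarse ones for the rest
        keys = np.concatenate([k[sl], kb[other]], axis=0)
        vals = np.concatenate([v[sl], vb[other]], axis=0)
        w = softmax(q[sl] @ keys.T / np.sqrt(d), axis=-1)
        out[sl] = w @ vals
    return out
```

Each query scores `block + nb - 1` keys instead of `n`, so the cost drops from quadratic toward linear as the block size and hierarchy are tuned; with `block == n` the sketch reduces to ordinary full attention.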

Multimodal Pretraining for Dense Video Captioning

1 code implementation Asian Chapter of the Association for Computational Linguistics 2020 Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut

First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.

Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)

Tasks: Dense Video Captioning

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations CoNLL 2019 Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos draw high traffic on video-sharing platforms, and prior work suggests that providing time-stamped subtask annotations (e.g., "heat the oil in the pan") improves user experiences.

Tasks: Automatic Speech Recognition (ASR) +1

Improved Image Captioning via Policy Gradient optimization of SPIDEr

2 code implementations ICCV 2017 Si-Qi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy

Finally, we show that our PG method can optimize any of these metrics, including the proposed SPIDEr metric, which yields image captions that human raters strongly prefer over captions generated by the same model trained to optimize MLE or the COCO metrics.

Tasks: Image Captioning
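The abstract's policy-gradient optimization of a sentence-level metric can be sketched with REINFORCE on a toy captioning policy. Everything below is illustrative: the vocabulary, the overlap-based `reward` stand-in for SPIDEr (the paper's metric averages SPICE and CIDEr), and the per-position independent policy are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy REINFORCE sketch of sentence-level metric optimization.
vocab = ["a", "cat", "dog", "sits"]
logits = np.zeros((3, len(vocab)))  # 3-token "captions", one logit row per position

def reward(caption):
    # Hypothetical stand-in for a sentence-level metric such as SPIDEr:
    # fraction of positions matching a reference caption.
    ref = ["a", "cat", "sits"]
    return sum(w == r for w, r in zip(caption, ref)) / len(ref)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

lr, baseline = 0.5, 0.0
for step in range(300):
    p = softmax(logits)
    idx = [rng.choice(len(vocab), p=p[t]) for t in range(3)]
    caption = [vocab[i] for i in idx]
    r = reward(caption)                      # non-differentiable reward
    baseline = 0.9 * baseline + 0.1 * r      # moving-average baseline
    for t, i in enumerate(idx):              # REINFORCE gradient ascent
        grad = -p[t]
        grad[i] += 1.0
        logits[t] += lr * (r - baseline) * grad
```

The key point the abstract makes carries over even to this toy: because the reward is only evaluated on sampled captions, any metric can be plugged in, differentiable or not, which is what lets the method optimize SPIDEr directly rather than a surrogate likelihood.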
