Found 39 papers, 10 papers with code

COSMic: A Coherence-Aware Generation Metric for Image Descriptions

no code implementations · 11 Sep 2021 · Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations.

Image Captioning, Text Generation

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences

1 code implementation · ACL 2021 · Zhenhai Zhu, Radu Soricut

We describe an efficient hierarchical method to compute attention in the Transformer architecture.

 Ranked #1 on Language Modelling on One Billion Word (Validation perplexity metric)

Hierarchical Structure, Language Modelling

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning

no code implementations · 28 May 2021 · Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Despite recent advances in its theoretical understanding, there remains a significant gap in the ability of existing PAC-Bayesian theories of meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.

Few-Shot Learning

2.5D Visual Relationship Detection

no code implementations · 26 Apr 2021 · Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.

Depth Estimation, Visual Relationship Detection

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

1 code implementation · CVPR 2021 · Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut

The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.

Image Captioning, Question Answering, and 1 more

Understanding Guided Image Captioning Performance across Domains

1 code implementation · 4 Dec 2020 · Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload.

Image Captioning, Visual Question Answering

Multimodal Pretraining for Dense Video Captioning

1 code implementation · Asian Chapter of the Association for Computational Linguistics 2020 · Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut

First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.

 Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)

Dense Video Captioning

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

no code implementations · 13 Oct 2020 · Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut

Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements.

Text Generation

TeaForN: Teacher-Forcing with N-grams

no code implementations · EMNLP 2020 · Sebastian Goodman, Nan Ding, Radu Soricut

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.

Machine Translation, Translation

Attention that does not Explain Away

no code implementations · 29 Sep 2020 · Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Models based on the Transformer architecture have achieved better accuracy than models based on competing architectures across a large set of tasks.

Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models

no code implementations · 10 Sep 2020 · Khyathi Raghavi Chandu, Piyush Sharma, Soravit Changpinyo, Ashish Thapliyal, Radu Soricut

Training large-scale image captioning (IC) models demands access to a rich and diverse set of training examples, gathered from the wild, often from noisy alt-text data.

Denoising, Image Captioning

Cross-modal Coherence Modeling for Caption Generation

no code implementations · ACL 2020 · Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.

Image Captioning

Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

no code implementations · 15 Jun 2020 · Nicholas Trieu, Sebastian Goodman, Pradyumna Narayana, Kazoo Sone, Radu Soricut

Multi-sentence summarization is a well-studied problem in NLP, while generating image descriptions for a single image is a well-studied problem in computer vision.

Image Captioning, Sentence Summarization

Clue: Cross-modal Coherence Modeling for Caption Generation

no code implementations · 2 May 2020 · Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.

Image Captioning

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

no code implementations · ACL 2020 · Ashish V. Thapliyal, Radu Soricut

The ability of cross-modal language generation tasks such as image captioning to support non-English languages is directly hurt by the trend toward data-hungry models combined with the lack of non-English annotations.

Image Captioning, Text Generation, and 1 more

Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube

1 code implementation · EMNLP 2020 · Jack Hessel, Zhenhai Zhu, Bo Pang, Radu Soricut

Pretraining on unlabelled web videos has quickly become the de facto means of achieving high performance on many video understanding tasks.

Automatic Speech Recognition, Speech Recognition, and 1 more

Reinforcing an Image Caption Generator Using Off-Line Human Feedback

no code implementations · 21 Nov 2019 · Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet the only outcome typically used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset.

Image Captioning

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations · CoNLL 2019 · Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos get high traffic on video sharing platforms, and prior work suggests that providing time-stamped subtask annotations (e.g., "heat the oil in the pan") improves user experiences.

Automatic Speech Recognition, Speech Recognition

Multi-stage Pretraining for Abstractive Summarization

no code implementations · 23 Sep 2019 · Sebastian Goodman, Zhenzhong Lan, Radu Soricut

Neural models for abstractive summarization tend to achieve the best performance in the presence of highly specialized, summarization-specific modeling add-ons such as pointer-generator, coverage modeling, and inference-time heuristics.

Abstractive Text Summarization

Quality Estimation for Image Captions Based on Large-scale Human Evaluations

1 code implementation · NAACL 2021 · Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut

Automatic image captioning has improved significantly over the last few years, but the problem is far from solved, with state-of-the-art models still often producing low-quality captions when used in the wild.

Image Captioning, Model Selection

Informative Image Captioning with External Sources of Information

no code implementations · ACL 2019 · Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact.

Image Captioning

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

1 code implementation · ACL 2018 · Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.

Image Captioning

SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation

no code implementations · NAACL 2018 · Ye Zhang, Nan Ding, Radu Soricut

Supervised training of abstractive language generation models results in learning conditional probabilities over language sequences based on the supervised training signal.

Text Generation

Cold-Start Reinforcement Learning with Softmax Policy Gradient

no code implementations · NeurIPS 2017 · Nan Ding, Radu Soricut

Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.

Image Captioning, Policy Gradient Methods

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

no code implementations · 22 Dec 2016 · Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.

Image Captioning, Multi-Task Learning, and 1 more

Multilingual Word Embeddings using Multigraphs

no code implementations · 14 Dec 2016 · Radu Soricut, Nan Ding

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text.

Machine Translation, Multilingual Word Embeddings, and 3 more

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

1 code implementation · 13 Dec 2016 · Radu Soricut, Nan Ding

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks.

Machine Reading Comprehension

