Search Results for author: Radu Soricut

Found 50 papers, 16 papers with code

CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization

no code implementations · EMNLP 2021 · Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut

One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role.

Answer Generation · Question-Answer-Generation +2

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations · 13 Dec 2022 · Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting · Text-Guided Image Editing

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization

no code implementations · 22 Nov 2022 · Zifan Wang, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut

Recent research in robust optimization has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
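As background for this and the other PAC-Bayesian entries below, one commonly cited McAllester-style form of the bound is sketched here; this is generic context, not taken from the paper, which derives its own task-specific variants:

```latex
% McAllester-style PAC-Bayes bound (one common variant; the listed
% papers derive refinements of this template).
% With probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for all posteriors Q over hypotheses:
\[
  \mathbb{E}_{h \sim Q}\big[L(h)\big]
  \;\le\;
  \mathbb{E}_{h \sim Q}\big[\hat{L}_S(h)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}}
\]
% L is the true risk, \hat{L}_S the empirical risk on S, and P is a
% prior fixed before seeing S. "Direct bound minimization" trains Q
% against the right-hand side itself rather than a surrogate.
```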

Adversarial Robustness

PreSTU: Pre-Training for Scene-Text Understanding

no code implementations · 12 Sep 2022 · Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut

The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability as their training objective.

Image Captioning · Optical Character Recognition +2

Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

no code implementations · 25 May 2022 · Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut

Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets.

Image Captioning · Model Selection +1

All You May Need for VQA are Image Captions

1 code implementation · NAACL 2022 · Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation.

Image Captioning · Question Answering +3

End-to-end Dense Video Captioning as Sequence Generation

no code implementations · COLING 2022 · Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

Ranked #2 on Dense Video Captioning on ViTT (CIDEr metric, using extra training data)

Dense Video Captioning

PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks

1 code implementation · 10 Mar 2022 · Nan Ding, Xi Chen, Tomer Levinboim, Beer Changpinyo, Radu Soricut

With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been gaining increased attention.

Learning Theory · Model Selection +1

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences

2 code implementations · ACL 2021 · Zhenhai Zhu, Radu Soricut

We describe an efficient hierarchical method to compute attention in the Transformer architecture.
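For context on what the hierarchical method speeds up, here is a minimal sketch (my own illustration, not the paper's code) of standard scaled dot-product attention, whose n × n score matrix is the quadratic-cost bottleneck that hierarchical attention reduces:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Standard scaled dot-product attention: materializing the
    # (n, n) score matrix makes time and memory quadratic in the
    # sequence length n, which hierarchical attention avoids.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n, n)
    return softmax(scores, axis=-1) @ V

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)                # (8, 4)
```

The hierarchical method in the paper approximates this computation at multiple resolutions so the full score matrix is never formed.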

Ranked #1 on Language Modelling on One Billion Word (Validation perplexity metric)

Inductive Bias · Language Modelling

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning

no code implementations · NeurIPS 2021 · Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Despite recent advances in its theoretical understanding, there still remains a significant gap in the ability of existing PAC-Bayesian theories on meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.

Few-Shot Learning

2.5D Visual Relationship Detection

1 code implementation · 26 Apr 2021 · Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.

Benchmarking · Depth Estimation +1

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

2 code implementations · CVPR 2021 · Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut

The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.

Image Captioning · Question Answering +1

Understanding Guided Image Captioning Performance across Domains

1 code implementation · CoNLL (EMNLP) 2021 · Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload.

Image Captioning · Informativeness +1

Multimodal Pretraining for Dense Video Captioning

1 code implementation · AACL 2020 · Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut

First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.

Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)

Dense Video Captioning

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

no code implementations · EMNLP (Eval4NLP) 2020 · Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut

Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements.
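To illustrate the batch-centering idea (a sketch under my own assumptions, not the authors' code): contextual embeddings in a batch often share a large common offset that inflates cosine similarities, and subtracting the batch mean removes it before any distance is computed:

```python
import numpy as np

def batch_center(E):
    # Subtract the per-dimension batch mean so the large direction
    # shared by all embeddings cancels before computing similarity.
    return E - E.mean(axis=0, keepdims=True)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
bias = rng.standard_normal(16) * 5.0       # shared offset (toy stand-in)
E = rng.standard_normal((10, 16)) + bias   # "embeddings" = noise + offset
raw_sim = cosine(E[0], E[1])               # inflated by the shared offset
cen = batch_center(E)
centered_sim = cosine(cen[0], cen[1])      # reflects the actual noise
```

With the shared offset dominating, the raw cosine is near 1 regardless of content, while the centered cosine tracks the genuinely varying part of the vectors.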

Text Generation

TeaForN: Teacher-Forcing with N-grams

no code implementations · EMNLP 2020 · Sebastian Goodman, Nan Ding, Radu Soricut

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.
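To make the exposure-bias point concrete, here is a toy teacher-forced decoding loop (my own illustration; the decoder cell and all names are hypothetical, not the paper's model), showing that the gold token, not the model's own prediction, is fed back at each step:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                              # toy vocabulary and state sizes
W = rng.standard_normal((H, V)) * 0.1    # output projection
E = rng.standard_normal((V, H)) * 0.1    # token embeddings

def step(h, token_id):
    # Hypothetical one-step decoder: fold the input token into the
    # state, then score every vocabulary item for the next position.
    h = np.tanh(h + E[token_id])
    return h, h @ W

gold = [1, 3, 2, 4]
h, prev, losses = np.zeros(H), 0, []     # 0 = start-token id
for y in gold:
    h, logits = step(h, prev)
    # cross-entropy of the gold token under the model's softmax
    losses.append(np.log(np.exp(logits).sum()) - logits[y])
    prev = y   # teacher forcing: feed the GOLD token back in; at
               # inference the model sees its own argmax instead --
               # that train/test mismatch is exposure bias
loss = sum(losses) / len(losses)
```

TeaForN's N-gram variant addresses this mismatch by training through several of the model's own future predictions rather than a single gold-conditioned step.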

Machine Translation · News Summarization +1

Attention that does not Explain Away

no code implementations · 29 Sep 2020 · Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.

Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models

no code implementations · COLING 2022 · Khyathi Raghavi Chandu, Piyush Sharma, Soravit Changpinyo, Ashish Thapliyal, Radu Soricut

Training large-scale image captioning (IC) models demands access to a rich and diverse set of training examples, gathered from the wild, often from noisy alt-text data.

Denoising · Image Captioning

Cross-modal Coherence Modeling for Caption Generation

no code implementations · ACL 2020 · Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.

Image Captioning

Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

no code implementations · 15 Jun 2020 · Nicholas Trieu, Sebastian Goodman, Pradyumna Narayana, Kazoo Sone, Radu Soricut

Multi-sentence summarization is a well-studied problem in NLP, while generating image descriptions for a single image is a well-studied problem in Computer Vision.

Image Captioning · Sentence Summarization

Clue: Cross-modal Coherence Modeling for Caption Generation

no code implementations · 2 May 2020 · Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.

Image Captioning

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

no code implementations · ACL 2020 · Ashish V. Thapliyal, Radu Soricut

Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with the lack of non-English annotations.

Image Captioning · Text Generation +1

Reinforcing an Image Caption Generator Using Off-Line Human Feedback

no code implementations · 21 Nov 2019 · Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset.

Image Captioning

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations · CoNLL 2019 · Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos get high traffic on video sharing platforms, and prior work suggests that providing time-stamped, subtask annotations (e.g., "heat the oil in the pan") improves user experiences.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

Multi-stage Pretraining for Abstractive Summarization

no code implementations · 23 Sep 2019 · Sebastian Goodman, Zhenzhong Lan, Radu Soricut

Neural models for abstractive summarization tend to achieve the best performance in the presence of highly specialized, summarization-specific modeling add-ons such as pointer-generator, coverage modeling, and inference-time heuristics.

Abstractive Text Summarization

Quality Estimation for Image Captions Based on Large-scale Human Evaluations

1 code implementation · NAACL 2021 · Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut

Automatic image captioning has improved significantly over the last few years, but the problem is far from solved, with state-of-the-art models still often producing low-quality captions when used in the wild.

Image Captioning · Model Selection

Informative Image Captioning with External Sources of Information

no code implementations · ACL 2019 · Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact.

Image Captioning · Informativeness

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

1 code implementation · ACL 2018 · Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.

Image Captioning

SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation

no code implementations · NAACL 2018 · Ye Zhang, Nan Ding, Radu Soricut

Supervised training of abstractive language generation models results in learning conditional probabilities over language sequences based on the supervised training signal.

Text Generation

Cold-Start Reinforcement Learning with Softmax Policy Gradient

no code implementations · NeurIPS 2017 · Nan Ding, Radu Soricut

Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.

Image Captioning · Policy Gradient Methods +2

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

no code implementations · 22 Dec 2016 · Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.

Image Captioning · Multi-Task Learning +1

Multilingual Word Embeddings using Multigraphs

no code implementations · 14 Dec 2016 · Radu Soricut, Nan Ding

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text.

Machine Translation · Multilingual Word Embeddings +3

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

1 code implementation · 13 Dec 2016 · Radu Soricut, Nan Ding

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks.

Machine Reading Comprehension
