no code implementations • EMNLP 2021 • Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut
One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify whether it is shifts in visual or language features that play the key role.
1 code implementation • 22 Feb 2023 • Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari
We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language.
no code implementations • 13 Dec 2022 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
no code implementations • 22 Nov 2022 • Zifan Wang, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut
Recent research in robust optimization has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
no code implementations • 14 Sep 2022 • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages.
Ranked #1 on Image Captioning on nocaps out-of-domain
1 code implementation • 12 Sep 2022 • Soravit Changpinyo, Linting Xue, Idan Szpektor, Ashish V. Thapliyal, Julien Amelot, Michal Yarom, Xi Chen, Radu Soricut
Visual Question Answering (VQA) has been primarily studied through the lens of the English language.
no code implementations • 12 Sep 2022 • Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut
The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability as their training objective.
no code implementations • 25 May 2022 • Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut
Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets.
1 code implementation • NAACL 2022 • Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation.
no code implementations • COLING 2022 • Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut
Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.
Ranked #2 on Dense Video Captioning on ViTT (CIDEr metric, using extra training data)
1 code implementation • 10 Mar 2022 • Nan Ding, Xi Chen, Tomer Levinboim, Beer Changpinyo, Radu Soricut
With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been gaining increased attention.
1 code implementation • Findings (EMNLP) 2021 • Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani
Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations.
2 code implementations • ACL 2021 • Zhenhai Zhu, Radu Soricut
We describe an efficient hierarchical method to compute attention in the Transformer architecture.
Ranked #1 on Language Modelling on One Billion Word (Validation perplexity metric)
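To make the idea concrete, here is a toy sketch of one way hierarchical pooling can reduce attention cost: score nearby keys exactly, and score distant keys only through block averages. This is an illustrative assumption, not the paper's exact algorithm; `qk_dot` and `hierarchical_scores` are hypothetical names introduced here.

```python
def qk_dot(q, k):
    """Dot product between a query vector and a key vector."""
    return sum(a * b for a, b in zip(q, k))

def hierarchical_scores(q, keys, block=4):
    """Exact scores for the first (local) block of keys; one score per
    averaged block for the remaining, distant keys. The query thus
    produces fewer than len(keys) scores."""
    local = [qk_dot(q, k) for k in keys[:block]]
    coarse = []
    for i in range(block, len(keys), block):
        blk = keys[i:i + block]
        mean = [sum(dim) / len(blk) for dim in zip(*blk)]  # average the block
        coarse.append(qk_dot(q, mean))
    return local + coarse
```

With block size b, a query touches b exact keys plus roughly n/b block summaries instead of all n keys, which is the kind of cost reduction hierarchical attention methods aim for.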
no code implementations • NeurIPS 2021 • Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut
Despite recent advances in its theoretical understanding, there still remains a significant gap in the ability of existing PAC-Bayesian theories on meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
2 code implementations • CVPR 2021 • Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.
Ranked #9 on Image Captioning on nocaps-val-out-domain
no code implementations • ICCV 2021 • Soravit Changpinyo, Jordi Pont-Tuset, Vittorio Ferrari, Radu Soricut
Most existing image retrieval systems use text queries as a way for the user to express what they are looking for.
1 code implementation • CoNLL (EMNLP) 2021 • Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut
Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload.
1 code implementation • AACL 2020 • Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut
First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.
Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)
no code implementations • EMNLP (Eval4NLP) 2020 • Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut
Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements.
no code implementations • EMNLP 2020 • Sebastian Goodman, Nan Ding, Radu Soricut
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.
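The train/test mismatch behind exposure bias can be shown in a minimal sketch (an illustration, not the paper's model): under teacher forcing the decoder consumes the gold previous token at every step, while at inference it must consume its own, possibly erroneous, predictions.

```python
def decode(step, start, length, teacher=None):
    """Roll out `length` tokens. With `teacher`, the gold previous token
    is fed at each step (teacher forcing); without it, the model consumes
    its own predictions, as it must at inference time."""
    tokens, prev = [], start
    for t in range(length):
        nxt = step(prev)                 # model's prediction from the previous token
        tokens.append(nxt)
        prev = teacher[t] if teacher is not None else nxt
    return tokens
```

Because the teacher-forced model never sees its own mistakes during training, errors can compound at inference time; note also that the hard token selection at each step is non-differentiable across timesteps.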
no code implementations • 29 Sep 2020 • Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
no code implementations • COLING 2022 • Khyathi Raghavi Chandu, Piyush Sharma, Soravit Changpinyo, Ashish Thapliyal, Radu Soricut
Training large-scale image captioning (IC) models demands access to a rich and diverse set of training examples, gathered from the wild, often from noisy alt-text data.
no code implementations • ACL 2020 • Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone
We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.
no code implementations • 15 Jun 2020 • Nicholas Trieu, Sebastian Goodman, Pradyumna Narayana, Kazoo Sone, Radu Soricut
Multi-sentence summarization is a well-studied problem in NLP, while generating image descriptions for a single image is a well-studied problem in Computer Vision.
no code implementations • 2 May 2020 • Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone
We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning.
no code implementations • ACL 2020 • Ashish V. Thapliyal, Radu Soricut
Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with the lack of non-English annotations.
1 code implementation • EMNLP 2020 • Jack Hessel, Zhenhai Zhu, Bo Pang, Radu Soricut
Pretraining from unlabelled web videos has quickly become the de-facto means of achieving high performance on many video understanding tasks.
Automatic Speech Recognition (ASR) +2
1 code implementation • ECCV 2020 • Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari
We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing.
Ranked #2 on Image Captioning on Localized Narratives
no code implementations • 21 Nov 2019 • Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut
Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset.
no code implementations • CoNLL 2019 • Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut
Instructional videos get high traffic on video-sharing platforms, and prior work suggests that providing time-stamped subtask annotations (e.g., "heat the oil in the pan") improves user experiences.
Automatic Speech Recognition (ASR) +1
43 code implementations • ICLR 2020 • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
Ranked #1 on Natural Language Inference on QNLI
no code implementations • 23 Sep 2019 • Sebastian Goodman, Zhenzhong Lan, Radu Soricut
Neural models for abstractive summarization tend to achieve the best performance in the presence of highly specialized, summarization-specific modeling add-ons such as pointer-generator, coverage-modeling, and inference-time heuristics.
1 code implementation • NAACL 2021 • Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut
Automatic image captioning has improved significantly over the last few years, but the problem is far from being solved, with state-of-the-art models still often producing low-quality captions when used in the wild.
no code implementations • IJCNLP 2019 • Soravit Changpinyo, Bo Pang, Piyush Sharma, Radu Soricut
Object detection plays an important role in current solutions to vision and language tasks like image captioning and visual question answering.
Ranked #4 on Visual Question Answering on VizWiz 2018
no code implementations • ACL 2019 • Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut
An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact.
1 code implementation • ACL 2018 • Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.
no code implementations • NAACL 2018 • Ye Zhang, Nan Ding, Radu Soricut
Supervised training of abstractive language generation models results in learning conditional probabilities over language sequences based on the supervised training signal.
no code implementations • NeurIPS 2017 • Nan Ding, Radu Soricut
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.
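Sample variance reduction in policy gradients is usually handled with a baseline; the numerical sketch below (an assumption for illustration, not the paper's method) shows that subtracting a constant baseline from the reward leaves the mean of the gradient estimate unchanged while shrinking its variance.

```python
import random
import statistics

def pg_estimate(reward, grad_logp, baseline=0.0):
    """Single-sample REINFORCE-style gradient estimate: (R - b) * grad log pi."""
    return (reward - baseline) * grad_logp

random.seed(0)
# Rewards ~ N(5, 1); grad-log-prob values of +/-1, drawn independently of the
# reward, so subtracting any constant baseline leaves the expectation intact.
samples = [(random.gauss(5.0, 1.0), random.choice([-1.0, 1.0]))
           for _ in range(20000)]
no_base = [pg_estimate(r, g) for r, g in samples]
with_base = [pg_estimate(r, g, baseline=5.0) for r, g in samples]
```

Here the estimator without a baseline has variance dominated by the reward's offset from zero, while centering the reward near its mean collapses that variance without biasing the estimate.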
no code implementations • 22 Dec 2016 • Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut
We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.
no code implementations • 14 Dec 2016 • Radu Soricut, Nan Ding
We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text.
1 code implementation • 13 Dec 2016 • Radu Soricut, Nan Ding
We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks.
no code implementations • TACL 2016 • Manaal Faruqui, Ryan Mcdonald, Radu Soricut
Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language.
no code implementations • WS 2014 • Ondrej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, Aleš Tamchyna