no code implementations • 25 Mar 2024 • Jaywon Koo, Ziyan Yang, Paola Cascante-Bonilla, Baishakhi Ray, Vicente Ordonez
Visual Programming has emerged as an alternative to end-to-end black-box visual reasoning models.
no code implementations • 20 Mar 2024 • Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
We introduce SynGround, a novel framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models to enhance the visual grounding capabilities of a pretrained vision-and-language model.
1 code implementation • 28 Feb 2024 • Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez
We introduce AutoVER, an Autoregressive model for Visual Entity Recognition.
no code implementations • 7 Dec 2023 • Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image.
1 code implementation • ICCV 2023 • Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky
We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic dataset and data generation codebase allowing to generate additional suitable data to improve VLC understanding and compositional reasoning of VL models.
Ranked #68 on Visual Reasoning on Winoground
1 code implementation • CVPR 2023 • James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira
Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4. 5% in average final accuracy.
1 code implementation • 22 Nov 2022 • Paola Cascante-Bonilla, Leonid Karlinsky, James Seale Smith, Yanjun Qi, Vicente Ordonez
Generalized Zero-Shot Learning (GZSL) aims to train a classifier that can generalize to unseen classes, using a set of attributes as auxiliary information, and the visual features extracted from a pre-trained convolutional neural network.
1 code implementation • CVPR 2023 • James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky
This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting.
no code implementations • CVPR 2022 • Paola Cascante-Bonilla, Hui Wu, Letao Wang, Rogerio Feris, Vicente Ordonez
By exploiting 3D and physics simulation platforms, we provide a pipeline to generate synthetic data to expand and replace type-specific questions and answers without risking the exposure of sensitive or personal data that might be present in real images.
no code implementations • 16 Jun 2021 • Paola Cascante-Bonilla, Arshdeep Sekhon, Yanjun Qi, Vicente Ordonez
This paper proposes PatchMix, a data augmentation method that creates new samples by composing patches from pairs of images in a grid-like pattern.
1 code implementation • 16 Jan 2020 • Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez
Pseudo-labeling works by applying pseudo-labels to samples in the unlabeled set by using a model trained on the combination of the labeled samples and any previously pseudo-labeled samples, and iteratively repeating this process in a self-training cycle.
1 code implementation • NeurIPS 2019 • Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez
We show that using multiple rounds of natural language queries as input can be surprisingly effective to find arbitrarily specific images of complex scenes.
2 code implementations • 8 Aug 2019 • Paola Cascante-Bonilla, Kalpathy Sitaraman, Mengjia Luo, Vicente Ordonez
Film media is a rich form of artistic expression.
no code implementations • NAACL 2019 • Paola Cascante-Bonilla, Xuwang Yin, Vicente Ordonez, Song Feng
In this paper we introduce Chat-crowd, an interactive environment for visual layout composition via conversational interactions.