Search Results for author: Artemis Panagopoulou

Found 6 papers, 5 papers with code

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning
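The alignment idea in this entry, at a framework level, is to bridge frozen modality encoders to a frozen LLM through small learned modules whose outputs act as soft prompt tokens. Below is a minimal sketch of that pattern, loosely in the spirit of an instruction-aware Q-Former; all module names and dimensions are illustrative assumptions, not the released X-InstructBLIP code:

```python
# Conceptual sketch: align a frozen modality encoder to a frozen LLM by
# training only a small adapter whose outputs are prepended to the LLM's
# text embeddings as soft prompt tokens.
import torch
import torch.nn as nn

class ModalityToLLMAdapter(nn.Module):
    def __init__(self, enc_dim=1024, llm_dim=4096, num_query_tokens=32):
        super().__init__()
        # Learnable query tokens that attend to the frozen encoder's features.
        self.queries = nn.Parameter(torch.randn(num_query_tokens, enc_dim))
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(enc_dim, llm_dim)  # only the adapter is trained

    def forward(self, encoder_features):
        # encoder_features: (batch, seq, enc_dim) from a frozen modality encoder
        batch = encoder_features.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(q, encoder_features, encoder_features)
        return self.proj(attended)  # (batch, num_query_tokens, llm_dim)

# Usage: prepend the projected tokens to the LLM's text embeddings.
adapter = ModalityToLLMAdapter()
features = torch.randn(2, 257, 1024)    # stand-in for frozen encoder output
soft_prompt = adapter(features)         # (2, 32, 4096)
text_embeds = torch.randn(2, 16, 4096)  # stand-in for LLM token embeddings
llm_input = torch.cat([soft_prompt, text_embeds], dim=1)
```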

I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

1 code implementation • 24 May 2023 • Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan

We propose to solve the task through the collaboration between Large Language Models (LLMs) and Diffusion Models: Instruct GPT-3 (davinci-002) with Chain-of-Thought prompting generates text that represents a visual elaboration of the linguistic metaphor containing the implicit meaning and relevant objects, which is then used as input to the diffusion-based text-to-image models. Using a human-AI collaboration framework, where humans interact both with the LLM and the top-performing diffusion model, we create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.

Visual Entailment
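As a rough sketch of the LLM-to-diffusion hand-off this abstract describes, the snippet below wires a hypothetical chain-of-thought elaboration step to a Stable Diffusion pipeline from the `diffusers` library. The prompt wording, the checkpoint, and the stand-in LLM call are assumptions for illustration, not the paper's exact setup:

```python
# Conceptual sketch of the LLM -> diffusion hand-off (not the authors' code).
from diffusers import StableDiffusionPipeline
import torch

def elaborate_metaphor(metaphor: str) -> str:
    """Hypothetical stand-in for the chain-of-thought LLM call that turns a
    linguistic metaphor into an explicit, depictable scene description."""
    prompt = (
        f"Metaphor: {metaphor}\n"
        "Think step by step: what is the implicit meaning, and which concrete "
        "objects would depict it? Then write one literal scene description."
    )
    # ... send `prompt` to an instruction-tuned LLM and return its completion ...
    return "A candle burning at both ends on a desk covered in paperwork."

# Assumes a CUDA GPU; checkpoint name is an illustrative choice.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
scene = elaborate_metaphor("My job is burning me out.")
image = pipe(scene).images[0]
image.save("visual_metaphor.png")
```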

Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

1 code implementation • 24 Oct 2022 • Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar, Chris Callison-Burch

We propose to extract these properties from images and use them in an ensemble model, in order to complement the information that is extracted from language models.

Property Prediction
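The ensemble idea reads naturally as a concreteness-weighted mixture of text- and image-derived scores: image evidence is trusted more for perceptually concrete properties, text evidence for abstract ones. A toy sketch of that gating, with made-up lookup tables standing in for real model scores (illustrative only, not the released code):

```python
# Toy lookups stand in for real model scores; only the gating logic matters here.
TEXT = {("strawberry", "red"): 0.4, ("strawberry", "sweet"): 0.9}
IMAGE = {("strawberry", "red"): 0.95, ("strawberry", "sweet"): 0.3}
CONCRETENESS = {"red": 0.9, "sweet": 0.2}

def ensemble_score(noun, prop):
    w = CONCRETENESS[prop]  # trust images more for visually concrete properties
    return w * IMAGE[(noun, prop)] + (1 - w) * TEXT[(noun, prop)]

print(ensemble_score("strawberry", "red"))    # image evidence dominates
print(ensemble_score("strawberry", "sweet"))  # text evidence dominates
```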

Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval

no code implementations • 17 Nov 2021 • Yue Yang, Joongwon Kim, Artemis Panagopoulou, Mark Yatskar, Chris Callison-Burch

Schemata are structured representations of complex tasks that can aid artificial intelligence by allowing models to break down complex tasks into intermediate steps.

Retrieval • Video Retrieval
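For readers unfamiliar with the term, a schema here is essentially a goal paired with ordered, language-grounded steps that can be induced and then edited for new tasks. A minimal illustrative data structure (the class and method names are hypothetical, not the paper's implementation):

```python
# Minimal sketch of a task schema: a goal with ordered, editable steps.
from dataclasses import dataclass, field

@dataclass
class Schema:
    goal: str
    steps: list[str] = field(default_factory=list)

    def edit(self, i: int, new_step: str) -> "Schema":
        """Return a copy with step i replaced (adapting a schema to a new task)."""
        steps = list(self.steps)
        steps[i] = new_step
        return Schema(self.goal, steps)

make_tea = Schema("make tea", ["boil water", "steep tea bag", "pour into cup"])
green_tea = make_tea.edit(1, "steep loose green tea leaves")
```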

Visual Goal-Step Inference using wikiHow

1 code implementation • EMNLP 2021 • Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch

Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities.

Multimodal Reasoning • VGSI
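The VGSI task pairs a textual goal with candidate step images and asks which image depicts a correct step. The paper trains its own models on wikiHow data; purely as an illustrative stand-in, the sketch below scores goal-image similarity zero-shot with CLIP via Hugging Face `transformers` (a substitute model, and the image paths are hypothetical):

```python
# Zero-shot goal-to-step-image matching with CLIP, as a stand-in for a
# trained VGSI model.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

goal = "how to plant a tree"
candidates = [Image.open(p) for p in ["dig_hole.jpg", "bake_cake.jpg"]]  # hypothetical files

inputs = processor(text=[goal], images=candidates, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_text  # (1, num_images) goal-to-image similarity
best = logits.argmax(dim=-1).item()
print(f"Predicted step image index: {best}")
```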
