Search Results for author: Michael J. Tarr

Found 15 papers, 7 papers with code

Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers

no code implementations · 7 Oct 2024 · Andrew F. Luo, Jacob Yeung, Rushikesh Zawar, Shaurya Dewan, Margaret M. Henderson, Leila Wehbe, Michael J. Tarr

To overcome the challenge presented by the co-occurrence of multiple categories in natural images, we introduce BrainSAIL (Semantic Attribution and Image Localization), a method for isolating specific neurally-activating visual concepts in images.

Denoising

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought

no code implementations · 20 Jun 2024 · Gabriel Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki

In TEACh, combining fine-tuning and retrieval on ICAL examples outperforms raw human demonstrations and expert examples, achieving a 17.5% increase in goal-condition success.

Action Anticipation, Continual Learning, +6
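Goal-condition success, the metric cited in this entry, is in TEACh-style benchmarks typically computed as the average fraction of per-episode goal conditions satisfied. A minimal sketch under that standard definition (not code from the paper):

```python
def goal_condition_success(satisfied: list[int], total: list[int]) -> float:
    """Average, over episodes, of the fraction of goal conditions satisfied."""
    fractions = [s / t for s, t in zip(satisfied, total)]
    return sum(fractions) / len(fractions)

# Two hypothetical episodes: 2 of 4 conditions met, then 3 of 3 met.
rate = goal_condition_success([2, 3], [4, 3])  # -> 0.75
```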

Reanimating Images using Neural Representations of Dynamic Stimuli

no code implementations · 4 Jun 2024 · Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr

Our approach leverages state-of-the-art video diffusion models to decouple static image representation from motion generation, enabling us to utilize fMRI brain activity for a deeper understanding of human responses to dynamic visual stimuli.

Motion Generation, Optical Flow Estimation

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

no code implementations · 29 Apr 2024 · Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki

Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners: the system retrieves language-program examples relevant to the input instruction and uses them as in-context examples in the LLM prompt, improving the LLM's ability to infer the correct actions and task plans.

Instruction Following
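The retrieval scheme described in this entry can be sketched without committing to any particular LLM: store (instruction, program) pairs in memory, embed the incoming instruction, and splice the nearest stored pairs into the prompt as in-context examples. Everything below, including the toy bag-of-words embedding and the memory contents, is a hypothetical stand-in for the paper's actual components:

```python
import numpy as np

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Toy bag-of-words embedding; a real system would use a learned encoder."""
    words = text.lower().split()
    return np.array([float(words.count(w)) for w in vocab])

def retrieve_examples(query, memory, vocab, k=1):
    """Return the k stored (instruction, program) pairs most similar to the query."""
    q = embed(query, vocab)
    def score(pair):
        v = embed(pair[0], vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return float(q @ v / denom) if denom else 0.0
    return sorted(memory, key=score, reverse=True)[:k]

# Hypothetical memory of language-program examples.
memory = [
    ("put the mug on the table", "pickup(mug); place(table)"),
    ("open the fridge", "open(fridge)"),
]
vocab = ["put", "the", "mug", "on", "table", "open", "fridge"]
best = retrieve_examples("put the cup on the table", memory, vocab, k=1)
```

The retrieved pairs would then be formatted into the planner prompt ahead of the new instruction.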

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

no code implementations · 23 Oct 2023 · Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki

Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.

Prompt Engineering, Retrieval
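The few-shot prompting setup described in this entry amounts to assembling a prompt from instruction-to-program exemplars followed by the new instruction; the primitive names and example pairs below are illustrative assumptions, not the paper's actual API:

```python
def build_fewshot_prompt(examples, instruction):
    """Assemble a few-shot prompt asking a (frozen) LLM to emit a program."""
    parts = ["Map each instruction to a program over the robot's primitives.\n"]
    for instr, program in examples:
        parts.append(f"Instruction: {instr}\nProgram: {program}\n")
    parts.append(f"Instruction: {instruction}\nProgram:")
    return "\n".join(parts)

# Hypothetical exemplars over made-up visuomotor primitives.
examples = [
    ("move the book to the shelf", "goto(book); pickup(book); goto(shelf); place(shelf)"),
    ("throw away the can", "goto(can); pickup(can); goto(bin); place(bin)"),
]
prompt = build_fewshot_prompt(examples, "put the cup in the sink")
```

The LLM's completion after the final "Program:" would then be executed over the robot's visuomotor functions.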

BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

no code implementations · 6 Oct 2023 · Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

Our results show that BrainSCUBA is a promising means for understanding functional preferences in the brain, and provides motivation for further hypothesis-driven investigation of visual cortex.

Image Generation, Language Modeling, +3

Thinking Like an Annotator: Generation of Dataset Labeling Instructions

no code implementations · 24 Jun 2023 · Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan

In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.

Language Modelling, Retrieval
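A common way to select "visually representative" examples for a category, which may or may not match this paper's exact procedure, is to take the items nearest the category centroid in a feature space. A minimal numpy sketch under that assumption:

```python
import numpy as np

def representative_indices(features: np.ndarray, k: int = 1) -> np.ndarray:
    """Indices of the k feature vectors nearest the category centroid."""
    centroid = features.mean(axis=0)
    dists = np.linalg.norm(features - centroid, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical 2-D features for one category; the middle item is most central.
feats = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
idx = representative_indices(feats, k=1)
```

The selected items would then be paired with a text label describing the category.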

Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition

1 code implementation · 5 Apr 2023 · Yuchen Zhou, Michael J. Tarr, Daniel Yurovsky

Based on these results, we conclude that verb acquisition is influenced by all three sources of complexity, but that the variability of visual structure poses the most significant challenge for verb learning.

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

1 code implementation · 21 Jul 2022 · Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.

Object

Learning Neural Acoustic Fields

1 code implementation · 4 Apr 2022 · Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
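The linear time-invariant framing implies that, once an impulse response has been predicted for an emitter-listener pair, rendering audio at the listener reduces to convolving the source signal with that response. A minimal sketch of this final step, with a hand-written impulse response standing in for a NAF prediction:

```python
import numpy as np

def render_at_listener(source: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Apply a (predicted) impulse response to a dry source signal.

    For an LTI system, the received signal is the convolution of the
    emitted signal with the emitter-to-listener impulse response.
    """
    return np.convolve(source, impulse_response)

# Toy example: a unit-impulse source through a direct path plus one echo.
source = np.array([1.0, 0.0, 0.0])
ir = np.array([1.0, 0.0, 0.5])   # direct path, then a delayed, attenuated echo
received = render_at_listener(source, ir)
```

In practice the impulse response would come from the learned neural field evaluated at the query emitter and listener positions.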

AlphaNet: Improving Long-Tail Classification By Combining Classifiers

1 code implementation · 17 Aug 2020 · Nadine Chang, Jayanth Koushik, Aarti Singh, Martial Hebert, Yu-Xiong Wang, Michael J. Tarr

Methods in long-tail learning focus on improving performance for data-poor (rare) classes; however, performance for such classes remains much lower than performance for more data-rich (frequent) classes.

Classification, Long-tail Learning, +1
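The combination step suggested by the title can be sketched as updating a rare class's weight vector with an alpha-weighted combination of classifiers from data-rich classes; the fixed alphas below are illustrative (the paper learns such coefficients), and the details here may differ from the published formulation:

```python
import numpy as np

def combine_classifiers(rare_w, frequent_ws, alphas):
    """Update a rare-class weight vector with an alpha-weighted
    combination of frequent-class classifier weights."""
    frequent_ws = np.asarray(frequent_ws)
    alphas = np.asarray(alphas)
    return rare_w + alphas @ frequent_ws

rare_w = np.array([0.1, 0.0])
frequent_ws = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alphas = [0.5, 0.25]          # illustrative, not learned
new_w = combine_classifiers(rare_w, frequent_ws, alphas)
```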

Learning Intermediate Features of Object Affordances with a Convolutional Neural Network

no code implementations · 20 Feb 2020 · Aria Yuan Wang, Michael J. Tarr

Our ability to interact with the world around us relies on being able to infer what actions objects afford -- often referred to as affordances.

BOLD5000: A public fMRI dataset of 5000 images

3 code implementations · 5 Sep 2018 · Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff

Vision science, particularly machine vision, has been revolutionized by introducing large-scale image datasets and statistical learning approaches.

Diversity, Scene Understanding
