Search Results for author: Gabriel Sarch

Found 8 papers, 3 papers with code

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought

no code implementations • 20 Jun 2024 • Gabriel Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki

In TEACh, combining fine-tuning and retrieval on ICAL examples outperforms raw human demonstrations and expert examples, achieving a 17.5% increase in goal-condition success.

Tasks: Action Anticipation, Continual Learning, +6

Reanimating Images using Neural Representations of Dynamic Stimuli

no code implementations • 4 Jun 2024 • Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr

Our approach leverages state-of-the-art video diffusion models to decouple static image representation from motion generation, enabling us to utilize fMRI brain activity for a deeper understanding of human responses to dynamic visual stimuli.

Tasks: Motion Generation, Optical Flow Estimation

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

no code implementations • 29 Apr 2024 • Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki

Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners: language-program examples relevant to the input instruction are retrieved and included as in-context examples in the LLM prompt, improving the LLM's ability to infer correct actions and task plans.

Tasks: Instruction Following
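The retrieve-then-prompt loop described in the snippet above can be sketched as follows. This is a minimal illustration only: the memory entries, the toy lexical similarity (a stand-in for a learned embedding model), and the prompt format are assumptions, not the paper's actual implementation.

```python
# Hedged sketch of retrieval-augmented in-context prompting for an LLM task planner.
# All example data and the similarity measure are illustrative assumptions.

def jaccard(a: str, b: str) -> float:
    """Toy lexical similarity between two instructions (stand-in for an embedding model)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(memory, instruction, k=2):
    """Return the k language-program examples most similar to the input instruction."""
    return sorted(memory, key=lambda ex: jaccard(ex["instruction"], instruction), reverse=True)[:k]

def build_prompt(memory, instruction, k=2):
    """Place retrieved examples in the prompt as in-context demonstrations, then the query."""
    parts = [
        f"Instruction: {ex['instruction']}\nProgram: {ex['program']}"
        for ex in retrieve(memory, instruction, k)
    ]
    parts.append(f"Instruction: {instruction}\nProgram:")
    return "\n\n".join(parts)

# Hypothetical memory of language-program pairs:
memory = [
    {"instruction": "put the mug in the sink", "program": "pickup('mug'); place('sink')"},
    {"instruction": "turn on the lamp", "program": "toggle_on('lamp')"},
    {"instruction": "put the plate in the sink", "program": "pickup('plate'); place('sink')"},
]
prompt = build_prompt(memory, "put the bowl in the sink")
```

The prompt ends with the new instruction and an empty `Program:` slot, so the LLM completes it by analogy with the retrieved demonstrations.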

ODIN: A Single Model for 2D and 3D Segmentation

1 code implementation • CVPR 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.

Tasks: 3D Instance Segmentation, 3D Semantic Segmentation, +1

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

no code implementations • 23 Oct 2023 • Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki

Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.

Tasks: Prompt Engineering, Retrieval
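An LLM that maps instructions to programs over a robot's visuomotor functions implies a small execution layer on the robot side. The sketch below is an assumed, minimal skill library and dispatcher; the skill names (`goto`, `pickup`, `place`) and the plan format are hypothetical, not the paper's actual API.

```python
# Hedged sketch of executing an LLM-inferred plan over a robot's visuomotor skills.
# Skill names and plan syntax are illustrative assumptions.

class Robot:
    """Minimal stand-in for a visuomotor skill library."""
    def __init__(self):
        self.log = []  # record of executed skills, for inspection
    def goto(self, obj): self.log.append(f"goto:{obj}")
    def pickup(self, obj): self.log.append(f"pickup:{obj}")
    def place(self, recep): self.log.append(f"place:{recep}")

def run_plan(robot, plan):
    """Dispatch each 'skill(arg)' step to a whitelisted method (no eval of LLM output)."""
    allowed = {"goto", "pickup", "place"}
    for step in plan:
        name, _, rest = step.partition("(")
        arg = rest.rstrip(")").strip("'\"")
        if name not in allowed:
            raise ValueError(f"unknown skill: {name}")
        getattr(robot, name)(arg)

# A plan an LLM might emit for "put the mug in the sink":
robot = Robot()
run_plan(robot, ["goto('mug')", "pickup('mug')", "goto('sink')", "place('sink')"])
```

Parsing against a whitelist rather than evaluating LLM output directly is a deliberate safety choice in this sketch: an unknown skill name fails loudly instead of executing arbitrary code.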

3D View Prediction Models of the Dorsal Visual Stream

no code implementations • 4 Sep 2023 • Gabriel Sarch, Hsiao-Yu Fish Tung, Aria Wang, Jacob Prince, Michael Tarr

Deep neural network representations align well with brain activity in the ventral visual stream.

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.

Tasks: Object

Move to See Better: Self-Improving Embodied Object Detection

1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki

Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.

Tasks: Object, Object Detection, +1
