no code implementations • 20 Jun 2024 • Gabriel Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki
In TEACh, combining fine-tuning and retrieval on ICAL examples outperforms raw human demonstrations and expert examples, achieving a 17.5% increase in goal-condition success.
no code implementations • 4 Jun 2024 • Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr
Our approach leverages state-of-the-art video diffusion models to decouple static image representation from motion generation, enabling us to utilize fMRI brain activity for a deeper understanding of human responses to dynamic visual stimuli.
no code implementations • 29 Apr 2024 • Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki
Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners: language-program examples relevant to the input instruction are retrieved and included as in-context examples in the LLM prompt, improving the LLM's inference of correct action and task plans.
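The retrieval-plus-prompting recipe described in this entry can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the embed() encoder, the MEMORY example store, and the prompt format are all hypothetical stand-ins.

```python
# Minimal sketch of memory-augmented LLM task planning via retrieved
# in-context examples. All names (embed, MEMORY, retrieve, build_prompt)
# are hypothetical placeholders, not the authors' code.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical text encoder returning a fixed-size vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

# Memory of (instruction, program) pairs accumulated from past episodes.
MEMORY = [
    ("put the mug in the sink", "goto('mug'); pickup('mug'); goto('sink'); place('sink')"),
    ("turn on the lamp",        "goto('lamp'); toggle_on('lamp')"),
]

def retrieve(instruction: str, k: int = 2):
    """Return the k stored examples most similar to the input instruction."""
    q = embed(instruction)
    return sorted(MEMORY, key=lambda ex: -float(q @ embed(ex[0])))[:k]

def build_prompt(instruction: str) -> str:
    """Prepend retrieved language-program examples as in-context demonstrations."""
    shots = "\n".join(f"Instruction: {i}\nProgram: {p}" for i, p in retrieve(instruction))
    return f"{shots}\nInstruction: {instruction}\nProgram:"

# The assembled prompt would then be sent to the LLM planner.
print(build_prompt("place the cup in the sink"))
```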
1 code implementation • CVPR 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.
Ranked #1 on 3D Instance Segmentation on ScanNet200
no code implementations • 23 Oct 2023 • Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.
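As a rough illustration of this few-shot prompting setup, the sketch below maps a rearrangement instruction to a short program over assumed visuomotor primitives (find, pick_up, place); the primitive names and the llm() stub are illustrative assumptions, not the paper's API.

```python
# Minimal sketch: a frozen LLM maps an instruction to a program over
# visuomotor primitives via few-shot prompting. Primitive names and the
# llm() stub are hypothetical, not the paper's interface.

FEW_SHOT = """\
Instruction: move the book from the table to the shelf
Program:
  pick_up(find('book'))
  place(find('shelf'))

Instruction: put the pillow on the sofa
Program:
  pick_up(find('pillow'))
  place(find('sofa'))
"""

def llm(prompt: str) -> str:
    """Stand-in for a call to a frozen LLM; returns a canned program here."""
    return "pick_up(find('cup'))\nplace(find('cabinet'))"

def instruction_to_program(instruction: str) -> str:
    prompt = f"{FEW_SHOT}\nInstruction: {instruction}\nProgram:\n"
    return llm(prompt)

# The returned text is a program over the robot's visuomotor functions that
# a simple interpreter could execute step by step.
print(instruction_to_program("put the cup in the cabinet"))
```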
no code implementations • 4 Sep 2023 • Gabriel Sarch, Hsiao-Yu Fish Tung, Aria Wang, Jacob Prince, Michael Tarr
Deep neural network representations align well with brain activity in the ventral visual stream.
1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
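One plausible reading of finding (1), sketched under assumptions rather than taken from the paper's pipeline: project a 3D box estimate into each posed RGB-D view to obtain 2D pseudo-boxes that can then fine-tune the 2D detector. The camera conventions and helper below are illustrative only.

```python
# Rough sketch of multi-view pseudo-labeling: project 3D box corners into a
# posed view to get a 2D pseudo-box. Conventions here are assumptions, not
# the paper's pipeline.
import numpy as np

def project_box(corners_world: np.ndarray, K: np.ndarray, world_to_cam: np.ndarray):
    """Project 3D box corners (8x3, world frame) into pixels and return an
    axis-aligned 2D pseudo-box (xmin, ymin, xmax, ymax)."""
    pts = np.hstack([corners_world, np.ones((8, 1))]) @ world_to_cam.T  # to camera frame
    pix = pts[:, :3] @ K.T
    pix = pix[:, :2] / pix[:, 2:3]                                      # perspective divide
    return pix[:, 0].min(), pix[:, 1].min(), pix[:, 0].max(), pix[:, 1].max()

# Usage: a unit cube at the origin, viewed by a camera 3 m away.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pose = np.eye(4); pose[2, 3] = 3.0                   # scene shifted 3 m in front of camera
corners = np.array([[x, y, z] for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
print(project_box(corners, K, pose))                 # 2D pseudo-box to supervise the 2D detector
```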