Search Results for author: Thomas Kollar

Found 14 papers, 9 papers with code

CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

3 code implementations • 3 Mar 2022 • Muhammad Zubair Irshad, Thomas Kollar, Michael Laskey, Kevin Stone, Zsolt Kira

This paper studies the complex task of simultaneous multi-object 3D reconstruction, 6D pose and size estimation from a single-view RGB-D observation.

Ranked #1 on 6D Pose Estimation using RGBD on CAMERA25

3D Reconstruction, 3D Shape Reconstruction, +3

ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

2 code implementations • 27 Jul 2022 • Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon

A novel disentangled shape and appearance database of priors is first learned to embed objects in their respective shape and appearance space.

3D Shape Reconstruction From A Single 2D Image, 6D Pose Estimation, +4

NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

1 code implementation • ICCV 2023 • Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus

NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference.

Ranked #1 on Generalizable Novel View Synthesis on NERDS 360

Generalizable Novel View Synthesis, Novel View Synthesis

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

2 code implementations • 12 Feb 2024 • Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.

Hallucination, Object Localization, +3

Language-Driven Representation Learning for Robotics

2 code implementations • 24 Feb 2023 • Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang

First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.

Contrastive Learning, Imitation Learning, +2

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo

1 code implementation • 30 Jun 2021 • Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan, Mark Tjersland

However, the RGB-D baseline only grasps 35% of the hard (e.g., transparent) objects, while SimNet grasps 95%, suggesting that SimNet can enable robust manipulation of unknown objects, including transparent objects, in unknown environments.

Keypoint Detection, Object, +5

Language models scale reliably with over-training and on downstream tasks

1 code implementation • 13 Mar 2024 • Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

We fit scaling laws that extrapolate in both the number of model parameters and the ratio of training tokens to parameters.

Language Modelling

A Critical Evaluation of AI Feedback for Aligning Large Language Models

1 code implementation • 19 Feb 2024 • Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model.

Instruction Following, reinforcement-learning, +1

CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects

1 code implementation • CVPR 2023 • Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar

We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation.

Object

Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands

no code implementations • 29 Nov 2017 • Thomas Kollar, Stefanie Tellex, Matthew Walter, Albert Huang, Abraham Bachrach, Sachi Hemachandra, Emma Brunskill, Ashis Banerjee, Deb Roy, Seth Teller, Nicholas Roy

Symbolic models capture linguistic structure but have not scaled successfully to handle the diverse language produced by untrained users.

Language Acquisition

The Alexa Meaning Representation Language

no code implementations • NAACL 2018 • Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas

This paper introduces a meaning representation for spoken language understanding.

Spoken Language Understanding

Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World

no code implementations • TACL 2013 • Jayant Krishnamurthy, Thomas Kollar

LSP learns physical representations for both categorical (“blue,” “mug”) and relational (“on”) language, and also learns to compose these representations to produce the referents of entire statements.

Language Acquisition, Question Answering, +1

A Mobile Manipulation System for One-Shot Teaching of Complex Tasks in Homes

no code implementations • 30 Sep 2019 • Max Bajracharya, James Borders, Dan Helmick, Thomas Kollar, Michael Laskey, John Leichty, Jeremy Ma, Umashankar Nagarajan, Akiyoshi Ochiai, Josh Petersen, Krishna Shankar, Kevin Stone, Yutaka Takaoka

We describe a mobile manipulation hardware and software system capable of autonomously performing complex human-level tasks in real homes, after being taught the task with a single demonstration from a person in virtual reality.

Position

SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception

no code implementations • 27 Sep 2022 • Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg

Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling.

Uncertainty Quantification
