Search Results for author: Alexander Sax

Found 14 papers, 9 papers with code

Unifying 2D and 3D Vision-Language Understanding

no code implementations · 13 Mar 2025 · Ayush Jain, Alexander Swerdlow, Yuzhou Wang, Sergio Arnaud, Ada Martin, Alexander Sax, Franziska Meier, Katerina Fragkiadaki

With these innovations, our model achieves state-of-the-art performance across multiple 3D vision-language grounding tasks, demonstrating the potential of transferring advances from 2D vision-language learning to the data-constrained 3D domain.

LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding

no code implementations · 27 Feb 2025 · Ang Cao, Sergio Arnaud, Oleksandr Maksymets, Jianing Yang, Ayush Jain, Sriram Yenamandra, Ada Martin, Vincent-Pierre Berges, Paul McVay, Ruslan Partsey, Aravind Rajeswaran, Franziska Meier, Justin Johnson, Jeong Joon Park, Alexander Sax

Our approach to training 3D vision-language understanding models is to train a feedforward model that makes predictions in 3D, but never requires 3D labels and is supervised only in 2D, using 2D losses and differentiable rendering.

Decoder

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

1 code implementation · CVPR 2025 · Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives.

3D Reconstruction, Camera Pose Estimation (+2 more)

Robustness via Cross-Domain Ensembles

no code implementations · ICCV 2021 · Teresa Yeo, Oğuzhan Fatih Kar, Alexander Sax, Amir Zamir

We present a method for making neural network predictions robust to shifts from the training data distribution.

Prediction

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

2 code implementations · ECCV 2020 · Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.

Imitation Learning, Incremental Learning (+4 more)

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation · 23 Dec 2019 · Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g., the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g., navigating a complex environment)?

Navigate, reinforcement-learning (+3 more)

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations · CVPR 2018 · Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control is cumbersome to do in the physical world, as existing algorithms are too slow to learn efficiently in real time and robots are fragile and costly.

Domain Adaptation, General Reinforcement Learning (+1 more)
