no code implementations • 21 Dec 2024 • Liyan Chen, Gregory P. Meyer, Zaiwei Zhang, Eric M. Wolff, Paul Vernaza
Recent efforts recognize the power of scale in 3D learning (e.g. PTv3) and attention mechanisms (e.g. FlashAttention).
no code implementations • 19 Dec 2024 • Yi Xu, Yuxin Hu, Zaiwei Zhang, Gregory P. Meyer, Siva Karthik Mustikovela, Siddhartha Srinivasa, Eric M. Wolff, Xin Huang
Human drivers rely on commonsense reasoning to navigate diverse and dynamic real-world scenarios.
no code implementations • 2 Oct 2024 • Yunhao Yang, Yuxin Hu, Mao Ye, Zaiwei Zhang, Zhichao Lu, Yi Xu, Ufuk Topcu, Ben Snyder
Multimodal foundation models offer promising advancements for enhancing driving perception systems, but their high computational and financial costs pose challenges.
no code implementations • 23 Sep 2024 • Mao Ye, Gregory P. Meyer, Zaiwei Zhang, Dennis Park, Siva Karthik Mustikovela, Yuning Chai, Eric M. Wolff
We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM).
no code implementations • 29 Aug 2024 • Zaiwei Zhang, Gregory P. Meyer, Zhichao Lu, Ashish Shrivastava, Avinash Ravichandran, Eric M. Wolff
To our knowledge, this work is the first to utilize knowledge distillation with text supervision generated by an off-the-shelf VLM and apply it to vanilla randomly initialized vision encoders.
no code implementations • 4 Oct 2023 • Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang
Compositionality is a common property in many modalities, including natural languages and images, but the compositional generalization of multi-modal models is not well understood.
1 code implementation • 4 Apr 2023 • Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, QiXing Huang
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors.
no code implementations • CVPR 2023 • Zaiwei Zhang, Min Bai, Erran Li
The first task focuses on learning semantic information by sorting local groups of points in the scene into a globally consistent set of semantically meaningful clusters using contrastive learning.
1 code implementation • 24 Jun 2022 • Zhenpei Yang, Zaiwei Zhang, QiXing Huang
Reconstructing 3D objects is an important computer vision task that has wide application in AR/VR.
1 code implementation • CVPR 2022 • Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, QiXing Huang
In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses.
1 code implementation • ICCV 2021 • Haitao Yang, Zaiwei Zhang, Siming Yan, Haibin Huang, Chongyang Ma, Yi Zheng, Chandrajit Bajaj, QiXing Huang
This task is challenging because 3D scenes exhibit diverse patterns, ranging from continuous ones, such as object sizes and the relative poses between pairs of shapes, to discrete patterns, such as occurrence and co-occurrence of objects with symmetrical relationships.
1 code implementation • ICCV 2021 • QiXing Huang, Xiangru Huang, Bo Sun, Zaiwei Zhang, Junfeng Jiang, Chandrajit Bajaj
Our approach builds on an approximation of the as-rigid-as-possible (ARAP) deformation energy.
1 code implementation • ICCV 2021 • Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
Pretraining on large labeled datasets is a prerequisite for achieving good performance in many computer vision tasks, such as 2D object recognition and video classification.
2 code implementations • ECCV 2020 • Zaiwei Zhang, Bo Sun, Haitao Yang, Qi-Xing Huang
We show how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives.
Ranked #3 on 3D Object Detection on ARKitScenes
1 code implementation • 16 May 2019 • Zaiwei Zhang, Xiangru Huang, Qi-Xing Huang, Xiao Zhang, Yuan Li
We formulate this problem as joint learning of multiple copies of the same network architecture and enforce the network weights to be shared across these networks.
1 code implementation • CVPR 2019 • Zaiwei Zhang, Zhenxiao Liang, Lemeng Wu, Xiaowei Zhou, Qi-Xing Huang
Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem across computer vision and many other relevant fields.
no code implementations • 6 Aug 2018 • Zaiwei Zhang, Zhenpei Yang, Chongyang Ma, Linjie Luo, Alexander Huth, Etienne Vouga, Qi-Xing Huang
We show a principled way to train this model by combining discriminator losses for both a 3D object arrangement representation and a 2D image-based representation.