no code implementations • 1 Aug 2024 • Benlin Liu, Yuhao Dong, Yiqin Wang, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna
We introduce Coarse Correspondence, a simple, training-free, effective, and general-purpose visual prompting method to elicit 3D and temporal understanding in multimodal LLMs.
1 code implementation • 17 Jun 2024 • Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case.
no code implementations • 13 Jun 2024 • Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu
We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models.
no code implementations • CVPR 2024 • Meng-Li Shih, Wei-Chiu Ma, Aleksander Holynski, Forrester Cole, Brian L. Curless, Janne Kontkanen
We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF).
no code implementations • 27 May 2024 • Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna
By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.
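As a rough illustration of this translate-then-refilter recipe, here is a minimal Python sketch; `translate_to_english` and `clip_similarity` are hypothetical stand-ins for a translation model and an image-text quality filter, not functions from the paper's codebase.

```python
# Illustrative sketch (not the authors' pipeline): translate captions to
# English, then re-apply an English-based quality filter so non-English
# pairs are no longer discarded merely for failing an English-only filter.
def curate(pairs, translate_to_english, clip_similarity, threshold=0.28):
    """pairs: iterable of (image, caption, language) tuples."""
    kept = []
    for image, caption, lang in pairs:
        text = caption if lang == "en" else translate_to_english(caption)
        if clip_similarity(image, text) >= threshold:  # re-filter on translated text
            kept.append((image, text))
    return kept
```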
no code implementations • 18 Apr 2024 • Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.
1 code implementation • NeurIPS 2023 • Tianhang Cheng, Wei-Chiu Ma, Kaiyu Guan, Antonio Torralba, Shenlong Wang
Our world is full of identical objects (e.g., cans of Coke, cars of the same model).
no code implementations • CVPR 2024 • Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang
Creating high-quality and interactive virtual environments such as games and simulators often involves complex and costly manual modeling processes.
no code implementations • 11 Dec 2023 • Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, Raquel Urtasun
Different outdoor illumination conditions drastically alter the appearance of urban scenes, and they can harm the performance of image-based robot perception systems if not seen during training.
no code implementations • 2 Nov 2023 • Jingkang Wang, Sivabalan Manivasagam, Yun Chen, Ze Yang, Ioan Andrei Bârsan, Anqi Joyce Yang, Wei-Chiu Ma, Raquel Urtasun
To tackle these issues, we present CADSim, which combines part-aware object-class priors via a small set of CAD models with differentiable rendering to automatically reconstruct vehicle geometry, including articulated wheels, with high-quality appearance.
2 code implementations • CVPR 2023 • Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, Raquel Urtasun
Previously recorded driving logs provide a rich resource from which to build these new scenarios, but closed-loop evaluation requires modifying the sensor data to reflect the new scene configuration and the SDV's decisions: actors may be added or removed, and the trajectories of existing actors and of the SDV will differ from the original log.
1 code implementation • CVPR 2023 • Zitian Tang, Wenjie Ye, Wei-Chiu Ma, Hang Zhao
Inferring past human motion from RGB images is challenging due to the inherent uncertainty of the prediction problem.
1 code implementation • 3 Apr 2023 • Zhuoling Li, Chuanrui Zhang, Wei-Chiu Ma, Yipin Zhou, Linyan Huang, Haoqian Wang, SerNam Lim, Hengshuang Zhao
In recent years, transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks.
no code implementations • CVPR 2023 • Yuwen Xiong, Wei-Chiu Ma, Jingkang Wang, Raquel Urtasun
We show that by aligning the representation of a sparse point cloud to that of a dense point cloud, we can densify the sparse point clouds as if they were captured by a real high-density LiDAR, drastically reducing the cost.
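The densification idea above can be written as a representation-alignment objective; in the sketch below, `encoder` and `decoder` are hypothetical modules, and Chamfer distance is one plausible reconstruction loss rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def chamfer(a, b):
    # a: (B, N, 3), b: (B, M, 3) -> symmetric Chamfer distance
    d = torch.cdist(a, b)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def densification_losses(encoder, decoder, sparse_pts, dense_pts):
    z_sparse = encoder(sparse_pts)            # latent code of the sparse scan
    with torch.no_grad():
        z_dense = encoder(dense_pts)          # target latent of the paired dense scan
    align = F.mse_loss(z_sparse, z_dense)     # pull the two representations together
    pred = decoder(z_sparse)                  # (B, M, 3) densified point cloud
    return align + chamfer(pred, dense_pts)
```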
no code implementations • CVPR 2022 • Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba
Similar to classic correspondences, VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.
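For reference, the epipolar constraint that both classic and virtual correspondences must satisfy can be checked with a few lines of numpy; this sketches only the constraint x2^T F x1 = 0, not how virtual correspondences are actually estimated.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental(K1, K2, R, t):
    """(R, t) maps camera-1 coordinates to camera-2 coordinates: X2 = R X1 + t."""
    E = skew(t) @ R                                   # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_residual(F, x1, x2):
    """x1, x2: pixel coordinates (u, v) in view 1 and view 2; ~0 for a valid match."""
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    return float(p2 @ F @ p1)
```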
1 code implementation • CVPR 2022 • Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang
We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance.
4 code implementations • ICCV 2021 • Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey
In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames.
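A heavily simplified conceptual sketch of this joint optimization is shown below, assuming a hypothetical differentiable renderer `nerf_render(field_params, R, t)`; the point is only that per-frame pose parameters (axis-angle rotation and translation) sit in the same optimizer as the scene parameters, so the photometric loss updates both.

```python
import torch

def axis_angle_to_matrix(w):
    """Rodrigues' formula: axis-angle 3-vector -> 3x3 rotation matrix."""
    theta = w.norm() + 1e-8
    k = w / theta
    zero = torch.zeros((), dtype=w.dtype)
    K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                     torch.stack([k[2], zero, -k[0]]),
                     torch.stack([-k[1], k[0], zero])])
    return torch.eye(3) + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)

def joint_step(nerf_render, field_params, rot_vecs, trans, images, optimizer):
    """One joint update of the radiance field and the per-frame camera poses."""
    i = torch.randint(len(images), (1,)).item()
    R = axis_angle_to_matrix(rot_vecs[i])          # pose is a free variable
    pred = nerf_render(field_params, R, trans[i])  # hypothetical differentiable renderer
    loss = (pred - images[i]).pow(2).mean()        # photometric loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```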
no code implementations • ECCV 2020 • Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun
Specifically, at each iteration, the neural network takes the feedback as input and outputs an update to the current estimate.
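Generically, this feedback-and-update loop looks like the following sketch, where `forward_model` and `update_net` are hypothetical placeholders rather than the paper's networks.

```python
def refine(x_init, observation, forward_model, update_net, num_iters=10):
    """Iteratively refine an estimate from feedback; helpers are hypothetical."""
    x = x_init
    for _ in range(num_iters):
        feedback = observation - forward_model(x)  # how far the current estimate is off
        x = x + update_net(feedback, x)            # network predicts an additive update
    return x
```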
no code implementations • 18 Jan 2021 • Shivam Duggal, ZiHao Wang, Wei-Chiu Ma, Sivabalan Manivasagam, Justin Liang, Shenlong Wang, Raquel Urtasun
Reconstructing high-quality 3D objects from sparse, partial observations from a single view is of crucial importance for various applications in computer vision, robotics, and graphics.
no code implementations • CVPR 2018 • Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, Raquel Urtasun
Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks.
Ranked #2 on Semantic Segmentation on S3DIS Area5 (Number of params metric)
no code implementations • CVPR 2021 • Ze Yang, Shenlong Wang, Sivabalan Manivasagam, Zeng Huang, Wei-Chiu Ma, Xinchen Yan, Ersin Yumer, Raquel Urtasun
Constructing and animating humans is an important component for building virtual worlds in a wide variety of applications such as virtual reality or robotics testing in simulation.
no code implementations • 16 Jan 2021 • Namdar Homayounfar, Justin Liang, Wei-Chiu Ma, Raquel Urtasun
Towards this goal, in this paper we propose a bottom-up approach in which, given a single click for each object in a video, we obtain the segmentation masks of these objects across the full video.
no code implementations • CVPR 2018 • Namdar Homayounfar, Wei-Chiu Ma, Shrinidhi Kowshika Lakshmikanth, Raquel Urtasun
In this paper, we tackle the problem of online road network extraction from sparse 3D point clouds.
no code implementations • ICCV 2019 • Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, Raquel Urtasun
One of the fundamental challenges to scale self-driving is being able to create accurate high definition maps (HD maps) with low cost.
no code implementations • CVPR 2019 • Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Shenlong Wang, Raquel Urtasun
Creating high definition maps that contain precise information about the static elements of the scene is of utmost importance for enabling self-driving cars to drive safely.
no code implementations • 16 Nov 2020 • Ze Yang, Siva Manivasagam, Ming Liang, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
We then incorporate the reconstructed pedestrian asset bank into a realistic LiDAR simulation system by performing motion retargeting, and show that the simulated LiDAR data can be used to significantly reduce the amount of annotated real-world data required for visual perception tasks.
no code implementations • ECCV 2020 • Jiayuan Gu, Wei-Chiu Ma, Sivabalan Manivasagam, Wenyuan Zeng, ZiHao Wang, Yuwen Xiong, Hao Su, Raquel Urtasun
3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned.
no code implementations • ECCV 2020 • Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun
We propose a very simple and efficient video compression framework that focuses only on modeling the conditional entropy between frames.
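One way to read this is that the expected code length of a frame given its predecessor is the cross-entropy of a learned conditional distribution; the sketch below illustrates that accounting, with `cond_model` as a hypothetical network predicting per-pixel logits over the 256 possible intensity values.

```python
import math
import torch.nn.functional as F

def bits_per_frame(cond_model, prev_frame, frame):
    """frame, prev_frame: (B, H, W) uint8 intensities; returns expected bits."""
    logits = cond_model(prev_frame)                               # (B, 256, H, W)
    nll = F.cross_entropy(logits, frame.long(), reduction="sum")  # nats
    return nll / math.log(2.0)                                    # convert nats to bits
```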
no code implementations • 30 Jul 2020 • Namdar Homayounfar, Yuwen Xiong, Justin Liang, Wei-Chiu Ma, Raquel Urtasun
Obtaining precise instance segmentation masks is of high importance in many modern applications such as robotic manipulation and autonomous driving.
no code implementations • CVPR 2020 • Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
We first utilize ray casting over the 3D scene and then use a deep neural network to produce deviations from the physics-based simulation, yielding realistic LiDAR point clouds.
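Schematically, this two-stage simulation (physics-based ray casting followed by learned deviations) might look like the sketch below; `raycast`, `deviation_net`, and the `rays` structure are hypothetical placeholders, not LiDARsim's actual interfaces.

```python
import torch

def simulate_lidar(scene, rays, raycast, deviation_net):
    """rays: hypothetical bundle with .origins and .directions, both (N, 3)."""
    ranges = raycast(scene, rays)                     # (N,) physics-based depths
    delta, drop_logits = deviation_net(scene, rays, ranges)
    keep = torch.sigmoid(drop_logits) > 0.5           # learned ray drop-out
    depths = (ranges + delta).unsqueeze(-1)           # learned range deviation
    points = rays.origins + depths * rays.directions  # back-project to 3D points
    return points[keep]
```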
no code implementations • CVPR 2020 • Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Yuwen Xiong, Rui Hu, Raquel Urtasun
In this paper, we propose PolyTransform, a novel instance segmentation algorithm that produces precise, geometry-preserving masks by combining the strengths of prevailing segmentation approaches and modern polygon-based methods.
1 code implementation • ICCV 2019 • Shivam Duggal, Shenlong Wang, Wei-Chiu Ma, Rui Hu, Raquel Urtasun
Our goal is to significantly speed up the runtime of current state-of-the-art stereo algorithms to enable real-time inference.
no code implementations • 8 Aug 2019 • Wei-Chiu Ma, Ignacio Tartavull, Ioan Andrei Bârsan, Shenlong Wang, Min Bai, Gellert Mattyus, Namdar Homayounfar, Shrinidhi Kowshika Lakshmikanth, Andrei Pokrovsky, Raquel Urtasun
In this paper we propose a novel semantic localization algorithm that exploits multiple sensors and has precision on the order of a few centimeters.
no code implementations • CVPR 2019 • Wei-Chiu Ma, Shenlong Wang, Rui Hu, Yuwen Xiong, Raquel Urtasun
In this paper we tackle the problem of scene flow estimation in the context of self-driving.
1 code implementation • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba
Sounds originate from object motions and vibrations of surrounding air.
1 code implementation • CVPR 2018 • Hang Chu, Wei-Chiu Ma, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
On the other hand, 3D convolution wastes a large amount of memory on mostly unoccupied 3D space, since the occupied voxels lie only on the surface visible to the sensor.
no code implementations • ECCV 2018 • Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba
At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.
no code implementations • 23 Jun 2016 • Wei-Chiu Ma, Shenlong Wang, Marcus A. Brubaker, Sanja Fidler, Raquel Urtasun
In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world.
no code implementations • CVPR 2017 • Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani
We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction with game theory, and by using deep learning-based visual analysis to estimate person-specific behavior parameters.
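One simple way to couple agents' plans game-theoretically is iterated best response over a small action set, as in the toy numpy sketch below; this only illustrates the coupling idea and is not the paper's model.

```python
import numpy as np

def best_response_step(positions, goals, iters=10, step=0.5, avoid_w=1.0):
    """positions, goals: (N, 2) arrays; returns each agent's planned next position."""
    plans = positions.copy()
    for _ in range(iters):
        for i in range(len(plans)):
            direction = goals[i] - positions[i]
            direction = direction / (np.linalg.norm(direction) + 1e-8)
            # candidate moves toward the goal at a few step sizes
            candidates = [positions[i] + s * step * direction for s in (0.0, 0.5, 1.0)]
            def cost(p):
                others = np.delete(plans, i, axis=0)
                d = np.linalg.norm(others - p, axis=1)
                # progress toward the goal, penalized by proximity to others' plans
                return np.linalg.norm(p - goals[i]) + avoid_w * np.sum(1.0 / (d + 1e-3))
            plans[i] = min(candidates, key=cost)
    return plans
```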
no code implementations • CVPR 2015 • De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani
Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.
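For background, a determinantal point process (DPP) favors diverse subsets by scoring a set S with det(L_S); the naive greedy selection below illustrates plain DPP-based selection only, not the hierarchical clustering extension proposed in the paper.

```python
import numpy as np

def greedy_dpp(L, k):
    """L: (n, n) PSD similarity kernel; returns indices of k mutually diverse items."""
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for j in range(L.shape[0]):
            if j in selected:
                continue
            idx = selected + [j]
            det = np.linalg.det(L[np.ix_(idx, idx)])  # determinant gain of adding j
            if det > best_det:
                best, best_det = j, det
        selected.append(best)
    return selected
```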