Search Results for author: Wei-Chiu Ma

Found 42 papers, 10 papers with code

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

no code implementations 1 Aug 2024 Benlin Liu, Yuhao Dong, Yiqin Wang, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna

We introduce Coarse Correspondence, a simple, training-free, effective, and general-purpose visual prompting method to elicit 3D and temporal understanding in multimodal LLMs.

Language Modelling Visual Prompting

Task Me Anything

1 code implementation 17 Jun 2024 Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case.

2k Attribute +3

Preserving Identity with Variational Score for General-purpose 3D Editing

no code implementations 13 Jun 2024 Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu

We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models.

Denoising

Multilingual Diversity Improves Vision-Language Representations

no code implementations 27 May 2024 Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.

Diversity Text Retrieval

BLINK: Multimodal Large Language Models Can See but Not Perceive

no code implementations 18 Apr 2024 Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

We introduce Blink, a new benchmark for multimodal large language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.

Depth Estimation Multiple-choice +1

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

no code implementations 15 Apr 2024 Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes.

Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video

no code implementations CVPR 2024 Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

Creating high-quality and interactive virtual environments such as games and simulators often involves complex and costly manual modeling processes.

LightSim: Neural Lighting Simulation for Urban Scenes

no code implementations 11 Dec 2023 Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, Raquel Urtasun

Different outdoor illumination conditions drastically alter the appearance of urban scenes, and they can harm the performance of image-based robot perception systems if not seen during training.

UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation

no code implementations 2 Nov 2023 Yuwen Xiong, Wei-Chiu Ma, Jingkang Wang, Raquel Urtasun

We show that by aligning the representation of a sparse point cloud to that of a dense point cloud, we can densify the sparse point clouds as if they were captured by a real high-density LiDAR, drastically reducing the cost.

CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation

no code implementations 2 Nov 2023 Jingkang Wang, Sivabalan Manivasagam, Yun Chen, Ze Yang, Ioan Andrei Bârsan, Anqi Joyce Yang, Wei-Chiu Ma, Raquel Urtasun

To tackle these issues, we present CADSim, which combines part-aware object-class priors via a small set of CAD models with differentiable rendering to automatically reconstruct vehicle geometry, including articulated wheels, with high-quality appearance.

3D Reconstruction

UniSim: A Neural Closed-Loop Sensor Simulator

2 code implementations CVPR 2023 Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, Raquel Urtasun

Previously recorded driving logs provide a rich resource to build these new scenarios from, but for closed-loop evaluation, we need to modify the sensor data based on the new scene configuration and the self-driving vehicle's (SDV's) decisions, as actors might be added or removed and the trajectories of existing actors and the SDV will differ from the original log.

Learning Compact Representations for LiDAR Completion and Generation

no code implementations CVPR 2023 Yuwen Xiong, Wei-Chiu Ma, Jingkang Wang, Raquel Urtasun

We show that by aligning the representation of a sparse point cloud to that of a dense point cloud, we can densify the sparse point clouds as if they were captured by a real high-density LiDAR, drastically reducing the cost.

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry

no code implementations CVPR 2022 Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba

Similar to classic correspondences, virtual correspondences (VCs) conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.

3D Reconstruction Camera Pose Estimation +2
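The epipolar constraint referenced in the Virtual Correspondence snippet can be checked numerically. This is a minimal sketch, not the paper's method: it builds an essential matrix E = [t]x R from a hypothetical relative pose and confirms that the normalized projections of one 3D point satisfy x2^T E x1 = 0. The constraint depends only on where the point projects, not on its appearance, which is one way to see why a correspondence that is not co-visible can still obey it. The pose, the 3D point, and all helper names are illustrative assumptions.

```python
import math

def matmul(A, B):
    # Plain-Python matrix product for small dense matrices
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def skew(t):
    # Cross-product matrix [t]x, so that skew(t) @ v equals the cross product t x v
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]

def rot_y(theta):
    # Rotation about the camera y-axis by theta radians
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(X):
    # Pinhole projection to homogeneous normalized image coordinates
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(x1, x2, E):
    # x2^T E x1; zero (up to rounding) for a geometrically consistent pair
    Ex1 = matvec(E, x1)
    return sum(a * b for a, b in zip(x2, Ex1))

# Hypothetical relative pose: X2 = R @ X1 + t
R = rot_y(math.radians(60))
t = [1.0, 0.0, 0.5]
E = matmul(skew(t), R)  # essential matrix

X1 = [0.3, -0.2, 4.0]  # hypothetical 3D point in camera-1 coordinates
X2 = [a + b for a, b in zip(matvec(R, X1), t)]
x1, x2 = project(X1), project(X2)
residual = epipolar_residual(x1, x2, E)
```

The residual vanishes because x2 is parallel to R X1 + t, and (R X1 + t) is orthogonal to t x (R X1).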

NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

1 code implementation CVPR 2022 Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance.

Novel View Synthesis

BARF: Bundle-Adjusting Neural Radiance Fields

4 code implementations ICCV 2021 Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey

In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames.

Visual Localization

Deep Feedback Inverse Problem Solver

no code implementations ECCV 2020 Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun

Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.

Pose Estimation
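The feedback loop described in the Deep Feedback Inverse Problem Solver snippet can be sketched as below, under loud assumptions: the forward model f(x) = 2x + sin(x) is hypothetical, and update_fn is a fixed linear rule standing in for the trained network the paper uses. Each iteration computes the feedback (the mismatch between the observation and the forward model applied to the current estimate) and adds the predicted update to the estimate.

```python
import math

def forward(x):
    # Hypothetical forward model; monotonic, so the inverse problem has a unique answer
    return 2.0 * x + math.sin(x)

def update_fn(feedback):
    # Hypothetical stand-in for the learned network: maps feedback to an update
    return 0.3 * feedback

def solve(y, x0=0.0, iters=100):
    x = x0
    for _ in range(iters):
        feedback = y - forward(x)    # mismatch between observation and prediction
        x = x + update_fn(feedback)  # apply the predicted update to the estimate
    return x

y_obs = forward(1.5)  # synthetic observation generated from the true value 1.5
x_hat = solve(y_obs)
```

Because f'(x) lies in [1, 3], the 0.3 gain shrinks the error by a factor of at most 0.7 per step; in the paper a network predicts the update rather than applying a hand-tuned gain.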

Mending Neural Implicit Modeling for 3D Vehicle Reconstruction in the Wild

no code implementations 18 Jan 2021 Shivam Duggal, ZiHao Wang, Wei-Chiu Ma, Sivabalan Manivasagam, Justin Liang, Shenlong Wang, Raquel Urtasun

Reconstructing high-quality 3D objects from sparse, partial observations from a single view is of crucial importance for various applications in computer vision, robotics, and graphics.

3D Object Reconstruction

Deep Parametric Continuous Convolutional Neural Networks

no code implementations CVPR 2018 Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, Raquel Urtasun

Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks.

Ranked #2 on Semantic Segmentation on S3DIS Area5 (Number of params metric)

Motion Estimation Point Cloud Segmentation +1
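Parametric continuous convolution replaces the discrete kernel with a function of the continuous offset between points. A minimal sketch, assuming scalar features, a fixed-radius neighborhood, and a tiny randomly initialized MLP as the kernel g (make_mlp is hypothetical and untrained): h(q) = sum over neighbors p_j within the radius of g(p_j - q) * f_j. This illustrates the shape of the operator, not the paper's implementation.

```python
import math
import random

def make_mlp(in_dim, hidden, out_dim, seed=0):
    # Hypothetical two-layer ReLU MLP with fixed random weights (no training)
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(out_dim)]

    def mlp(v):
        h = [max(0.0, sum(w * x for w, x in zip(row, v)) + b) for row, b in zip(W1, b1)]
        return [sum(w * x for w, x in zip(row, h)) for row in W2]

    return mlp

def continuous_conv(query, points, feats, kernel, radius):
    # Sum kernel(offset) * feature over points within `radius` of the query
    out = 0.0
    for p, f in zip(points, feats):
        offset = [a - b for a, b in zip(p, query)]
        if math.sqrt(sum(o * o for o in offset)) <= radius:
            out += kernel(offset)[0] * f
    return out

kernel = make_mlp(3, 8, 1, seed=0)
points = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.1), (0.5, 0.5, 0.5), (3.0, 0.0, 0.0)]
feats = [1.0, 2.0, -1.0, 5.0]
h = continuous_conv((0.0, 0.0, 0.0), points, feats, kernel, radius=1.0)

# Translate the query and every point by the same vector
shift = (10.0, -3.0, 2.0)
shifted = [tuple(a + s for a, s in zip(p, shift)) for p in points]
h_shifted = continuous_conv(shift, shifted, feats, kernel, radius=1.0)
```

Because the kernel sees only relative offsets, h and h_shifted agree up to floating-point rounding, and the point at distance 3.0 contributes nothing.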

S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling

no code implementations CVPR 2021 Ze Yang, Shenlong Wang, Sivabalan Manivasagam, Zeng Huang, Wei-Chiu Ma, Xinchen Yan, Ersin Yumer, Raquel Urtasun

Constructing and animating humans is an important component for building virtual worlds in a wide variety of applications such as virtual reality or robotics testing in simulation.

VideoClick: Video Object Segmentation with a Single Click

no code implementations 16 Jan 2021 Namdar Homayounfar, Justin Liang, Wei-Chiu Ma, Raquel Urtasun

Towards this goal, in this paper we propose a bottom-up approach where, given a single click for each object in a video, we obtain the segmentation masks of these objects in the full video.

Object Segmentation +4

DAGMapper: Learning to Map by Discovering Lane Topology

no code implementations ICCV 2019 Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, Raquel Urtasun

One of the fundamental challenges to scale self-driving is being able to create accurate high definition maps (HD maps) with low cost.

Convolutional Recurrent Network for Road Boundary Extraction

no code implementations CVPR 2019 Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Shenlong Wang, Raquel Urtasun

Creating high-definition maps that contain precise information about static elements of the scene is of utmost importance for enabling self-driving cars to drive safely.

Self-Driving Cars

Recovering and Simulating Pedestrians in the Wild

no code implementations 16 Nov 2020 Ze Yang, Siva Manivasagam, Ming Liang, Bin Yang, Wei-Chiu Ma, Raquel Urtasun

We then incorporate the reconstructed pedestrian assets bank in a realistic LiDAR simulation system by performing motion retargeting, and show that the simulated LiDAR data can be used to significantly reduce the amount of annotated real-world data required for visual perception tasks.

Data Augmentation motion retargeting

Weakly-supervised 3D Shape Completion in the Wild

no code implementations ECCV 2020 Jiayuan Gu, Wei-Chiu Ma, Sivabalan Manivasagam, Wenyuan Zeng, ZiHao Wang, Yuwen Xiong, Hao Su, Raquel Urtasun

3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned.

Point Cloud Registration Pose Estimation

Conditional Entropy Coding for Efficient Video Compression

no code implementations ECCV 2020 Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun

We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.

MS-SSIM SSIM +1
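The benefit of modeling the conditional entropy between frames can be seen with a toy calculation. An ideal entropy coder spends about H bits per symbol, and conditioning on the previous frame can only lower that bound: H(X_t | X_{t-1}) <= H(X_t). The sketch below computes both quantities from empirical counts on a hypothetical two-frame sequence of quantized pixel values; the paper learns its entropy model, whereas here plain counts stand in for it.

```python
import math
from collections import Counter

def entropy(counter):
    # Shannon entropy in bits of the empirical distribution given by `counter`
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def conditional_entropy(pairs):
    # H(cur | prev) = H(prev, cur) - H(prev), using the empirical joint distribution
    joint = Counter(pairs)
    prev = Counter(p for p, _ in pairs)
    return entropy(joint) - entropy(prev)

# Hypothetical toy "video": frame2 mostly repeats frame1 (one pixel changed)
frame1 = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
frame2 = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 0, 1, 2, 3]
pairs = list(zip(frame1, frame2))

h_marginal = entropy(Counter(frame2))     # bits/pixel coding frame2 on its own
h_conditional = conditional_entropy(pairs)  # bits/pixel coding frame2 given frame1
```

In this toy case, coding frame2 given frame1 needs roughly 0.2 bits per pixel instead of roughly 2, because almost every pixel is predictable from the previous frame.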

LevelSet R-CNN: A Deep Variational Method for Instance Segmentation

no code implementations 30 Jul 2020 Namdar Homayounfar, Yuwen Xiong, Justin Liang, Wei-Chiu Ma, Raquel Urtasun

Obtaining precise instance segmentation masks is of high importance in many modern applications such as robotic manipulation and autonomous driving.

Autonomous Driving Instance Segmentation +2

LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World

no code implementations CVPR 2020 Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun

We first utilize ray casting over the 3D scene and then use a deep neural network to produce deviations from the physics-based simulation, producing realistic LiDAR point clouds.
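The first stage described in the LiDARsim snippet, ray casting over a 3D scene, can be sketched with a flat ground plane standing in for the reconstructed scene. Everything here is an assumption for illustration: the beam layout, the z = 0 plane, and residual_model, a hypothetical placeholder that returns zero where the paper uses a deep network to predict deviations from the physics-based result.

```python
import math

def cast_ray_to_ground(origin, direction):
    # Intersect the ray origin + t * direction with the plane z = 0.
    # With a unit-length direction, t is the range in meters; None means no hit.
    if direction[2] >= 0:
        return None
    t = -origin[2] / direction[2]
    return t if t > 0 else None

def beam_direction(azimuth_deg, elevation_deg):
    # Unit vector for a beam at the given azimuth and elevation
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def residual_model(range_m):
    # Hypothetical placeholder for the learned deviation (meters)
    return 0.0

def simulate_scan(sensor_height, elevations_deg, azimuths_deg):
    origin = (0.0, 0.0, sensor_height)
    scan = []
    for el in elevations_deg:
        for az in azimuths_deg:
            r = cast_ray_to_ground(origin, beam_direction(az, el))
            if r is not None:
                scan.append((az, el, r + residual_model(r)))
    return scan

scan = simulate_scan(2.0, [-30.0, 10.0], [0.0, 90.0])
```

A sensor 2 m above the plane sees the -30 degree beams return at 2 / sin(30 degrees) = 4 m, while the upward +10 degree beams produce no return.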

PolyTransform: Deep Polygon Transformer for Instance Segmentation

no code implementations CVPR 2020 Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Yuwen Xiong, Rui Hu, Raquel Urtasun

In this paper, we propose PolyTransform, a novel instance segmentation algorithm that produces precise, geometry-preserving masks by combining the strengths of prevailing segmentation approaches and modern polygon-based methods.

Listed on the Instance Segmentation leaderboard for Cityscapes test (using extra training data)

Instance Segmentation Segmentation +1

The Sound of Motions

1 code implementation ICCV 2019 Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

1 code implementation CVPR 2018 Hang Chu, Wei-Chiu Ma, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

On the other hand, 3D convolution wastes a large amount of memory on mostly unoccupied 3D space, since the occupied space consists only of the surface visible to the sensor.

3D Semantic Segmentation

Single Image Intrinsic Decomposition without a Single Intrinsic Image

no code implementations ECCV 2018 Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba

At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.

Intrinsic Image Decomposition

Find your Way by Observing the Sun and Other Semantic Cues

no code implementations 23 Jun 2016 Wei-Chiu Ma, Shenlong Wang, Marcus A. Brubaker, Sanja Fidler, Raquel Urtasun

In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world.

Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

no code implementations CVPR 2017 Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani

We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.

Decision Making

How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps

no code implementations CVPR 2015 De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani

Furthermore, we develop a hierarchical extension to the determinantal point process (DPP) clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.

Clustering Online Clustering
