no code implementations • 10 Dec 2024 • Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, Marco Pavone, Yue Wang
Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks.
1 code implementation • 6 Dec 2024 • Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng, Yiming Li
Our results show that Gaussian Splatting is prone to overfitting to training views.
no code implementations • 9 Sep 2024 • Shuhan Tan, Boris Ivanovic, Yuxiao Chen, Boyi Li, Xinshuo Weng, Yulong Cao, Philipp Krähenbühl, Marco Pavone
Simulation stands as a cornerstone for safe and efficient autonomous driving development.
no code implementations • 26 Jul 2024 • Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone
Finally, we establish a benchmark for video captioning and introduce a leaderboard, aiming to accelerate advancements in video understanding, captioning, and data alignment.
no code implementations • 1 Jul 2024 • Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone
The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design.
1 code implementation • 25 May 2024 • Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger
Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving.
no code implementations • 6 May 2024 • Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone
Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios, respectively.
no code implementations • 24 Mar 2024 • Boyi Li, Weixuan Xia
Key insights from our theoretical analysis and empirical findings include: (1) the superior performance of fractional stochastic-volatility models compared to various benchmark models, including those incorporating jumps and stochastic volatility, along with high computational efficiency when utilizing a piecewise kernel, (2) the practical necessity of considering jumps in both price and volatility, along with rough volatility, in pricing and hedging cryptocurrency options, (3) stability of calibrated parameter values in line with stylized facts.
no code implementations • 21 Mar 2024 • Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar
To tackle this issue, we propose the content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.
no code implementations • CVPR 2024 • Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone
Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of autonomous vehicles (AVs).
no code implementations • 19 Jan 2024 • Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in the 3D pose and to the input image in terms of visual similarity.
1 code implementation • CVPR 2024 • Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
Steered by an LLM controller, SLD turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image.
no code implementations • 21 Nov 2023 • Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li
Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a $\textbf{Re}$cursive $\textbf{Vis}$ual $\textbf{E}$xplanation algorithm.
1 code implementation • 3 Nov 2023 • Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
no code implementations • 16 Oct 2023 • Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik
An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution.
no code implementations • 29 Sep 2023 • Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world.
2 code implementations • 23 May 2023 • Long Lian, Boyi Li, Adam Yala, Trevor Darrell
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images according to prompts that require various capabilities, doubling the generation accuracy across four tasks on average.
no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Are multimodal inputs necessary for grammar induction?
no code implementations • 11 Mar 2022 • Arushi Goel, Niveditha Kalavakonda, Nour Karessli, Tejaswi Kasarla, Kathryn Leonard, Boyi Li, Nermin Samet, Ghada Zamzmi
In this paper, we present the details of Women in Computer Vision Workshop - WiCV 2021, organized alongside the virtual CVPR 2021.
1 code implementation • ICLR 2022 • Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl
We present LSeg, a novel model for language-driven semantic image segmentation.
Ranked #1 on Few-Shot Semantic Segmentation on FSS-1000
1 code implementation • ICLR 2022 • Varsha Kishore, Xiangyu Chen, Yan Wang, Boyi Li, Kilian Q Weinberger
Recent attempts at image steganography make use of advances in deep learning to train an encoder-decoder network pair to hide and retrieve secret messages in images.
2 code implementations • 25 Jun 2021 • Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie
In this paper, we propose and explore the problem of image translation for data augmentation.
no code implementations • 11 Jan 2021 • Hazel Doughty, Nour Karessli, Kathryn Leonard, Boyi Li, Carianne Martinez, Azadeh Mobasher, Arsha Nagrani, Srishti Yadav
It provides a voice to a minority (female) group in the computer vision community and focuses on increasing the visibility of these researchers, both in academia and industry.
1 code implementation • CVPR 2021 • Boyi Li, Felix Wu, Ser-Nam Lim, Serge Belongie, Kilian Q. Weinberger
The moments (a.k.a., mean and standard deviation) of latent features are often removed as noise when training image recognition models, to increase stability and reduce training time.
Ranked #32 on Domain Generalization on ImageNet-A
no code implementations • 28 Sep 2019 • Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger
This paper introduces Integrated Triaging, a framework that prunes almost all context in early layers of a network, leaving the remaining (deep) layers to scan only a tiny fraction of the full corpus.
no code implementations • 25 Sep 2019 • Geoff Pleiss, Amauri Souza, Joseph Kim, Boyi Li, Kilian Q. Weinberger
Neural network out-of-distribution (OOD) detection aims to identify when a model is unable to generalize to new inputs, either due to covariate shift or anomalous data.
2 code implementations • NeurIPS 2019 • Boyi Li, Felix Wu, Kilian Q. Weinberger, Serge Belongie
A popular method to reduce the training time of deep neural networks is to normalize activations at each layer.
2 code implementations • 28 Feb 2019 • Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger
In this technical report, we introduce FastFusionNet, an efficient variant of FusionNet [12].
1 code implementation • 12 Dec 2017 • Boyi Li, Wenqi Ren, Dengpan Fu, DaCheng Tao, Dan Feng, Wen-Jun Zeng, Zhangyang Wang
We present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE).
1 code implementation • ICCV 2017 • Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng
This paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net).
Ranked #25 on Image Dehazing on SOTS Outdoor
no code implementations • 12 Sep 2017 • Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng
Furthermore, we build an End-to-End United Video Dehazing and Detection Network (EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model.
2 code implementations • 20 Jul 2017 • Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng
This paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net).