Search Results for author: Boyi Li

Found 32 papers, 14 papers with code

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models

no code implementations 10 Dec 2024 Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, Marco Pavone, Yue Wang

Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks.

3D Reconstruction Pose Estimation

Language-Image Models with 3D Understanding

no code implementations 6 May 2024 Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios, respectively.

Question Answering Visual Question Answering

Crypto Inverse-Power Options and Fractional Stochastic Volatility

no code implementations 24 Mar 2024 Boyi Li, Weixuan Xia

Key insights from our theoretical analysis and empirical findings include: (1) the superior performance of fractional stochastic-volatility models compared to various benchmark models, including those incorporating jumps and stochastic volatility, along with high computational efficiency when utilizing a piecewise kernel, (2) the practical necessity of considering jumps in both price and volatility, along with rough volatility, in pricing and hedging cryptocurrency options, (3) stability of calibrated parameter values in line with stylized facts.

Computational Efficiency

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

no code implementations 21 Mar 2024 Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

To tackle this issue, we propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.

Video Generation

Driving Everywhere with Large Language Model Policy Adaptation

no code implementations CVPR 2024 Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone

Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of autonomous vehicles (AVs).

Autonomous Driving Language Modelling +2

Synthesizing Moving People with 3D Control

no code implementations 19 Jan 2024 Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in the 3D pose and to the input image in terms of visual similarity.

Self-correcting LLM-controlled Diffusion Models

1 code implementation CVPR 2024 Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Steered by an LLM controller, SLD turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image.

Attribute Text-to-Image Generation

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

no code implementations 21 Nov 2023 Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm.

Explanation Generation Visual Question Answering (VQA) +1

Interactive Task Planning with Language Models

no code implementations 16 Oct 2023 Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik

An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution.

Language Modelling Large Language Model +1

LLM-grounded Video Diffusion Models

no code implementations 29 Sep 2023 Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li

We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world.

Language Modelling Large Language Model +1

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

2 code implementations 23 May 2023 Long Lian, Boyi Li, Adam Yala, Trevor Darrell

Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images according to prompts that require various capabilities, doubling the generation accuracy across four tasks on average.

Common Sense Reasoning Language Modelling +3

WiCV 2021: The Eighth Women In Computer Vision Workshop

no code implementations 11 Mar 2022 Arushi Goel, Niveditha Kalavakonda, Nour Karessli, Tejaswi Kasarla, Kathryn Leonard, Boyi Li, Nermin Samet, Ghada Zamzmi

In this paper, we present the details of the Women in Computer Vision Workshop (WiCV 2021), organized alongside the virtual CVPR 2021.

Fixed Neural Network Steganography: Train the images, not the network

1 code implementation ICLR 2022 Varsha Kishore, Xiangyu Chen, Yan Wang, Boyi Li, Kilian Q Weinberger

Recent attempts at image steganography make use of advances in deep learning to train an encoder-decoder network pair to hide and retrieve secret messages in images.

Decoder Image Steganography +1

WiCV 2020: The Seventh Women In Computer Vision Workshop

no code implementations 11 Jan 2021 Hazel Doughty, Nour Karessli, Kathryn Leonard, Boyi Li, Carianne Martinez, Azadeh Mobasher, Arsha Nagrani, Srishti Yadav

It provides a voice to a minority (female) group in the computer vision community and focuses on increasing the visibility of these researchers, both in academia and industry.

On Feature Normalization and Data Augmentation

1 code implementation CVPR 2021 Boyi Li, Felix Wu, Ser-Nam Lim, Serge Belongie, Kilian Q. Weinberger

The moments (a.k.a., mean and standard deviation) of latent features are often removed as noise when training image recognition models, to increase stability and reduce training time.

Data Augmentation Domain Generalization +2

Integrated Triaging for Fast Reading Comprehension

no code implementations 28 Sep 2019 Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger

This paper introduces Integrated Triaging, a framework that prunes almost all context in early layers of a network, leaving the remaining (deep) layers to scan only a tiny fraction of the full corpus.

Computational Efficiency Machine Reading Comprehension +1

Neural Network Out-of-Distribution Detection for Regression Tasks

no code implementations 25 Sep 2019 Geoff Pleiss, Amauri Souza, Joseph Kim, Boyi Li, Kilian Q. Weinberger

Neural network out-of-distribution (OOD) detection aims to identify when a model is unable to generalize to new inputs, either due to covariate shift or anomalous data.

Out-of-Distribution Detection Out of Distribution (OOD) Detection +1

Positional Normalization

2 code implementations NeurIPS 2019 Boyi Li, Felix Wu, Kilian Q. Weinberger, Serge Belongie

A popular method to reduce the training time of deep neural networks is to normalize activations at each layer.

FastFusionNet: New State-of-the-Art for DAWNBench SQuAD

2 code implementations 28 Feb 2019 Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger

In this technical report, we introduce FastFusionNet, an efficient variant of FusionNet [12].

Reading Comprehension Retrieval

Benchmarking Single Image Dehazing and Beyond

1 code implementation 12 Dec 2017 Boyi Li, Wenqi Ren, Dengpan Fu, DaCheng Tao, Dan Feng, Wen-Jun Zeng, Zhangyang Wang

We present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE).

Benchmarking Image Dehazing +1

AOD-Net: All-In-One Dehazing Network

1 code implementation ICCV 2017 Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng

This paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net).

Image Dehazing object-detection +2

End-to-End United Video Dehazing and Detection

no code implementations 12 Sep 2017 Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng

Furthermore, we build an End-to-End United Video Dehazing and Detection Network (EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model.

Image Dehazing object-detection +1

An All-in-One Network for Dehazing and Beyond

2 code implementations 20 Jul 2017 Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng

This paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net).

Image Dehazing object-detection +2
