Search Results for author: Shiyi Lan

Found 27 papers, 18 papers with code

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

no code implementations 17 Jun 2024 Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving an optimal latency-accuracy trade-off at high pruning ratios.

3D Object Detection object-detection
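The latency-constrained selection idea can be illustrated with a toy sketch (this is not the paper's joint channel/layer/block algorithm, and the lookup table and importance scores here are hypothetical): greedily keep the highest-importance units whose measured latency costs still fit within a latency budget.

```python
def prune_under_latency_budget(units, importance, latency_table, budget):
    """Greedily keep the most important units whose cumulative latency
    (looked up from a measured latency table) stays within the budget."""
    # Visit units from most to least important.
    order = sorted(range(len(units)), key=lambda i: importance[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        cost = latency_table[units[i]]
        if total + cost <= budget:
            kept.append(units[i])
            total += cost
    return kept, total
```

Real latency-aware pruning must account for interactions between pruned structures (removing one channel changes the cost of others), which is exactly what a model-wide latency model addresses; this greedy sketch treats costs as independent for clarity.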

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

1 code implementation 2 May 2024 Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning.

Autonomous Driving counterfactual +4

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

no code implementations 31 Jan 2024 Shijia Liao, Shiyi Lan, Arun George Zachariah

The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns.

Ranked #2 on Speech Synthesis on LibriTTS (using extra training data)

Audio Generation Speech Synthesis

Fully Attentional Networks with Self-emerging Token Labeling

1 code implementation ICCV 2023 Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins.

Semantic Segmentation

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

1 code implementation CVPR 2024 Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance.

Attribute

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation 21 Dec 2023 Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning Descriptive +1

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

1 code implementation CVPR 2024 Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.

Autonomous Driving

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

1 code implementation 30 Nov 2023 Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance.

Attribute Compositional Zero-Shot Learning

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

1 code implementation 24 Nov 2023 Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target.

Meta-Learning One-Shot Segmentation +3

ViR: Towards Efficient Vision Retention Backbones

1 code implementation 30 Oct 2023 Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.
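The retention mechanism that ViR builds on (introduced for language models as RetNet) is what makes the dual formulation possible: a parallel form for training and a mathematically equivalent recurrent form for fast inference. A minimal single-head numpy sketch of that equivalence (a generic illustration of retention, not ViR's actual implementation):

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: O = (Q K^T * D) V, where D[t, s] = gamma^(t-s)
    for s <= t and 0 otherwise (causal decay mask)."""
    T = Q.shape[0]
    idx = np.arange(T)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + k_t v_t^T, o_t = q_t S_t.
    Constant memory per step, suited to fast inference."""
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    out = np.zeros((T, V.shape[1]))
    for t in range(T):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out
```

Both forms compute o_t = sum over s <= t of gamma^(t-s) (q_t . k_s) v_s, so they agree to numerical precision; the parallel form trains like attention while the recurrent form runs in O(1) state per token.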

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation 8 Aug 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +3

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

1 code implementation 4 Jul 2023 Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which was held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.

Autonomous Driving Prediction Of Occupancy Grid Maps

Vision Transformers Are Good Mask Auto-Labelers

no code implementations CVPR 2023 Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations.

Instance Segmentation Segmentation +1

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation ICCV 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +3

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

1 code implementation 23 Oct 2022 Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.

Segmentation Semantic Segmentation

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

no code implementations CVPR 2022 Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.
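A toy sketch of the block-level part of such a usage policy (patch and head selection are omitted, and the `score_fns` and threshold here are hypothetical, not AdaViT's learned policy network): each block is executed only when a per-input score clears a threshold, so easy inputs skip more computation.

```python
def adaptive_forward(x, blocks, score_fns, threshold=0.5):
    """Run each transformer block only if its per-input policy score
    exceeds the threshold; skipped blocks pass the input through."""
    for block, score_fn in zip(blocks, score_fns):
        if score_fn(x) > threshold:
            x = block(x)
    return x
```

In the actual framework the policy is learned end-to-end with the backbone (typically made differentiable, e.g. via a Gumbel-softmax-style relaxation), whereas this sketch uses a hard threshold for clarity.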

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

1 code implementation 24 Apr 2021 Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids.

3D Object Detection object-detection +1

InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

no code implementations ECCV 2020 Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

Results show that our framework achieves the state-of-the-art performance with 31 FPS and improves our baseline significantly by 9.0% mAP on the nuScenes test set.

3D Object Detection Autonomous Driving +2

Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN

2 code implementations CVPR 2019 Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis

This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy.

Modeling Local Geometric Structure
