Search Results for author: Shiyi Lan

Found 22 papers, 15 papers with code

Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN

2 code implementations CVPR 2019 Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis

This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy.

Modeling Local Geometric Structure

InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

no code implementations ECCV 2020 Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

Results show that our framework achieves state-of-the-art performance at 31 FPS and improves on our baseline by a significant 9.0% mAP on the nuScenes test set.

3D Object Detection, Autonomous Driving

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

1 code implementation 24 Apr 2021 Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids.

3D Object Detection, Object Detection

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

no code implementations CVPR 2022 Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

To this end, we introduce AdaViT, an adaptive computation framework that learns per-input usage policies deciding which patches, self-attention heads, and transformer blocks to use throughout the backbone, aiming to improve the inference efficiency of vision transformers with minimal loss of accuracy for image recognition.

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

1 code implementation 23 Oct 2022 Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.

Segmentation, Semantic Segmentation

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation ICCV 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection, Autonomous Driving

Vision Transformers Are Good Mask Auto-Labelers

no code implementations CVPR 2023 Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations.

Instance Segmentation, Segmentation

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

1 code implementation 4 Jul 2023 Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.

Autonomous Driving, Prediction Of Occupancy Grid Maps

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation 8 Aug 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection, Autonomous Driving

ViR: Towards Efficient Vision Retention Backbones

1 code implementation 30 Oct 2023 Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

1 code implementation 24 Nov 2023 Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

In-context segmentation aims to segment novel images using a few labeled example images, termed "in-context examples", by exploiting content similarities between the examples and the target.

Meta-Learning, One-Shot Segmentation

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

1 code implementation 30 Nov 2023 Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

With this in mind, we propose a simple yet effective approach to improve the fine-grained understanding of VLMs, achieving significant improvements on SPEC without compromising zero-shot performance.

Attribute, Compositional Zero-Shot Learning

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

no code implementations 4 Dec 2023 Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu

Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.

3D Object Detection, Depth Estimation

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

1 code implementation 5 Dec 2023 Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.

Autonomous Driving

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation 21 Dec 2023 Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning, Descriptive

Fully Attentional Networks with Self-emerging Token Labeling

1 code implementation ICCV 2023 Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins.

Semantic Segmentation

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

no code implementations 31 Jan 2024 Shijia Liao, Shiyi Lan, Arun George Zachariah

The advent of large models marks a new era in machine learning: by leveraging vast datasets to capture and synthesize complex patterns, they significantly outperform smaller models.

Audio Generation, Speech Synthesis
