no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar
Augmenting pretrained language models (LMs) with a vision encoder (e. g., Flamingo) has obtained state-of-the-art results in image-to-text generation.
no code implementations • 10 Jan 2023 • Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar
We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations.
1 code implementation • 23 Oct 2022 • Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar
The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.
no code implementations • CVPR 2022 • Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim
To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.
4 code implementations • ICCV 2021 • Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision.
Box-supervised Instance Segmentation Semantic correspondence +1
1 code implementation • 24 Apr 2021 • Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha
We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids.
Ranked #1 on 3D Object Detection on KITTI Cars Hard val
no code implementations • ECCV 2020 • Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis
Results show that our framework achieves the state-of-the-art performance with 31 FPS and improves our baseline significantly by 9. 0% mAP on the nuScenes test set.
Ranked #319 on 3D Object Detection on nuScenes
1 code implementation • CVPR 2020 • Shiyi Lan, Zhou Ren, Yi Wu, Larry S. Davis, Gang Hua
Object detection is an essential step towards holistic scene understanding.
Ranked #206 on Object Detection on COCO test-dev
2 code implementations • CVPR 2019 • Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis
This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy.
3 code implementations • CVPR 2017 • Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
Objects appear to scale differently in natural images.