no code implementations • CVPR 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.
no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha
Building on the multi-exit model, we perform step-level dynamic early exit during inference: at each individual decoding step, the model may decide to use fewer decoder layers based on its confidence at the current layer.
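The step-level early-exit idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of max softmax probability as the confidence signal, and the fixed threshold are all assumptions.

```python
import torch
import torch.nn as nn

def decode_step_with_early_exit(decoder_layers, exit_heads, hidden, threshold=0.9):
    """One decoding step through a multi-exit decoder.

    After each layer, a per-layer prediction head produces a token
    distribution; if its max probability (confidence) clears the
    threshold, the remaining layers are skipped for this step.
    """
    token, depth = None, 0
    for depth, (layer, head) in enumerate(zip(decoder_layers, exit_heads), start=1):
        hidden = layer(hidden)
        probs = torch.softmax(head(hidden), dim=-1)
        conf, token = probs.max(dim=-1)
        if conf.item() >= threshold:   # confident enough: exit early
            break
    return token, depth
```

Because the decision is made per step, easy tokens can exit after one or two layers while hard tokens still use the full decoder.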
no code implementations • 14 Jul 2022 • Ruizhao Zhu, Pengkai Zhu, Samarth Mishra, Venkatesh Saligrama
An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features.
no code implementations • 17 Apr 2022 • Samarth Mishra, Pengkai Zhu, Venkatesh Saligrama
RPC encodes images by first decomposing them into salient parts, and then encoding each part as a mixture of a small number of prototypes, each representing a certain concept.
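The second stage — encoding a part as a mixture of a few prototypes — can be sketched like this. A hedged illustration only: the function name, dot-product similarity, softmax weighting, and top-k sparsification are assumptions, not the paper's exact formulation.

```python
import numpy as np

def encode_part(part_feat, prototypes, k=2):
    """Encode one part feature as a sparse mixture of prototypes.

    Computes a similarity of the part to every prototype, converts
    similarities to softmax mixture weights, then keeps only the
    top-k prototypes so each part is a mixture of a *small* number
    of concepts.
    """
    sims = prototypes @ part_feat              # similarity to each prototype
    weights = np.exp(sims - sims.max())        # stable softmax
    weights /= weights.sum()
    topk = np.argsort(weights)[-k:]            # indices of the k best prototypes
    sparse = np.zeros_like(weights)
    sparse[topk] = weights[topk]
    sparse /= sparse.sum()                     # renormalize the kept weights
    return sparse                              # mixture coefficients over prototypes
```

The resulting sparse coefficient vector is what makes the encoding interpretable: each part is described by the handful of concepts it activates.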
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
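A minimal sketch of the neighborhood-alignment idea: find each sample's nearest neighbors in the teacher's feature space, then use a contrastive (InfoNCE-style) objective that pulls the corresponding student features toward those same neighbors. The function name, the cosine similarities, and the specific loss form are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cna_loss(student_feats, teacher_feats, k=2, tau=0.1):
    """Encourage the student to reproduce the teacher's neighborhoods.

    For each sample, the k nearest neighbors under the teacher's
    (normalized) features define the positives; a softmax over
    student similarities then scores how well the student keeps
    those same points nearby.
    """
    t = F.normalize(teacher_feats, dim=1)
    s = F.normalize(student_feats, dim=1)
    t_sim = t @ t.t()
    t_sim.fill_diagonal_(-float("inf"))        # exclude self-matches
    nbrs = t_sim.topk(k, dim=1).indices        # teacher-defined neighborhoods
    s_sim = (s @ s.t()) / tau
    s_sim.fill_diagonal_(-float("inf"))
    log_p = F.log_softmax(s_sim, dim=1)
    return -log_p.gather(1, nbrs).mean()       # pull student toward teacher's neighbors
```

Note the loss constrains only the local topology (who is near whom), not the absolute feature values, which is the stated goal of the approach.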
no code implementations • CVPR 2020 • Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama
Zero-shot detection, namely localizing both seen and unseen objects, is increasingly important for large-scale applications with many object classes, since collecting sufficient annotated data with ground-truth bounding boxes is simply not scalable.
no code implementations • 18 Nov 2019 • Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama
Zero-shot detection, namely localizing both seen and unseen objects, is increasingly important for large-scale applications with many object classes, since collecting sufficient annotated data with ground-truth bounding boxes is simply not scalable.
no code implementations • 25 Jan 2019 • Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama
In computer vision applications such as domain adaptation (DA), few-shot learning (FSL), and zero-shot learning (ZSL), we encounter new objects and environments for which too few examples exist to train models from scratch; instead, methods are required that adapt existing models, trained in the presented training environment, to the new scenario.
no code implementations • CVPR 2019 • Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama
To bridge the gap, we propose a novel low-dimensional embedding of visual instances that is "visually semantic."
1 code implementation • 19 Mar 2018 • Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama
While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test-time.