Search Results for author: Shiyuan Huang

Found 7 papers, 3 papers with code

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

no code implementations 5 Jun 2022 Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang

In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information.

Sentence Embeddings

Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting

no code implementations 16 Apr 2022 Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Rama Chellappa, Shih-Fu Chang

We study multimodal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.

Few-Shot Learning, Few-Shot Object Detection, +2

Few-Shot Object Detection with Fully Cross-Transformer

no code implementations CVPR 2022 Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang

Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head.

Few-Shot Object Detection, Metric Learning, +1

Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment

2 code implementations 15 Apr 2021 Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang

To improve the fine-grained few-shot proposal classification, we propose a novel attentive feature alignment method to address the spatial misalignment between the noisy proposals and few-shot classes, thus improving the performance of few-shot object detection.

Few-Shot Learning, Few-Shot Object Detection, +2

Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition

1 code implementation CVPR 2022 Shiyuan Huang, Jiawei Ma, Guangxing Han, Shih-Fu Chang

In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process.

Few-Shot Learning, Open Set Learning

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

no code implementations 10 Dec 2019 Shiyuan Huang, Xudong Lin, Svebor Karaman, Shih-Fu Chang

Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve the inference speed by orders of magnitude.

Action Recognition, Optical Flow Estimation, +1
