Search Results for author: Shiyuan Huang

Found 13 papers, 9 papers with code

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

no code implementations • 10 Dec 2019 • Shiyuan Huang, Xudong Lin, Svebor Karaman, Shih-Fu Chang

Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve the inference speed by orders of magnitude.

Action Recognition • Optical Flow Estimation • +3
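
A rough sketch of the flow-distillation idea behind this paper: a student stream that consumes compressed-domain motion vectors is trained to mimic a frozen optical-flow teacher. The network shapes and the MSE feature loss here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyStream(nn.Module):
    """Stand-in for one stream of a two-stream action recognition network."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64) feature
        )

    def forward(self, x):
        return self.net(x)

teacher = TinyStream(in_ch=2)   # consumes precomputed optical flow
student = TinyStream(in_ch=2)   # consumes motion vectors from the compressed stream
for p in teacher.parameters():  # teacher stays frozen during distillation
    p.requires_grad = False

mv = torch.randn(4, 2, 56, 56)    # motion vectors (cheap to decode)
flow = torch.randn(4, 2, 56, 56)  # optical flow (expensive, training-time only)

distill_loss = nn.functional.mse_loss(student(mv), teacher(flow))
distill_loss.backward()  # only the student receives gradients
```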

Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition

1 code implementation • CVPR 2022 • Shiyuan Huang, Jiawei Ma, Guangxing Han, Shih-Fu Chang

In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process.

Few-Shot Learning • Open Set Learning
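
One way to picture "negative class envision": append a learned negative prototype to the episode's class prototypes, so rejection becomes an extra logit trained jointly with the classifier instead of a hand-tuned threshold. A minimal sketch under assumed shapes, not the paper's model:

```python
import torch
import torch.nn as nn

K, D = 5, 64                       # 5-way episode, 64-d embeddings
protos = torch.randn(K, D)         # class prototypes from the support set
neg_proto = nn.Parameter(torch.randn(1, D))  # learned "negative class" prototype

query = torch.randn(8, D)          # 8 query embeddings
logits = query @ torch.cat([protos, neg_proto], dim=0).T  # (8, K+1)
pred = logits.argmax(dim=1)
is_open_set = pred == K            # prediction K means "unknown class"
```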

Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment

2 code implementations • 15 Apr 2021 • Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang

To improve the fine-grained few-shot proposal classification, we propose a novel attentive feature alignment method to address the spatial misalignment between the noisy proposals and few-shot classes, thus improving the performance of few-shot object detection.

Few-Shot Learning • Few-Shot Object Detection • +3
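
As a loose illustration of attention-based feature alignment (shapes and the affinity-softmax formulation are assumptions, not the paper's code): support features are re-arranged to spatially match each proposal before the two are compared.

```python
import torch

B, C, H, W = 2, 64, 7, 7
proposal = torch.randn(B, C, H, W)   # RoI-pooled proposal features (possibly noisy)
support = torch.randn(B, C, H, W)    # few-shot class feature map

q = proposal.flatten(2).transpose(1, 2)             # (B, HW, C)
k = support.flatten(2)                              # (B, C, HW)
affinity = torch.softmax(q @ k / C ** 0.5, dim=-1)  # (B, HW, HW) spatial attention

v = support.flatten(2).transpose(1, 2)              # (B, HW, C)
aligned = (affinity @ v).transpose(1, 2).reshape(B, C, H, W)  # support, re-arranged
fused = proposal * aligned  # aligned features feed the few-shot classifier
```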

Few-Shot Object Detection with Fully Cross-Transformer

1 code implementation • CVPR 2022 • Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang

Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head.

Few-Shot Object Detection • Metric Learning • +2
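
The cross-transformer idea can be sketched as a single joint attention layer in which query-image tokens and support-image tokens attend to each other (FCT applies this inside both the backbone and the detection head; the sizes below are placeholders):

```python
import torch
import torch.nn as nn

D = 64
attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)

query_tokens = torch.randn(1, 49, D)    # tokens from the query image
support_tokens = torch.randn(1, 25, D)  # tokens from a support image
joint = torch.cat([query_tokens, support_tokens], dim=1)  # (1, 74, D)

fused, _ = attn(joint, joint, joint)    # every token attends across both branches
query_out = fused[:, :49]               # query features, now support-aware
support_out = fused[:, 49:]             # support features, now query-aware
```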

Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

no code implementations • 16 Apr 2022 • Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang

Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning to learn generalizable few-shot and zero-shot object detection models respectively without fine-tuning.

Few-Shot Learning • Few-Shot Object Detection • +3

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

1 code implementation • CVPR 2023 • Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang

We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yield the best performance, even outperforming the state of the art on the iVQA and How2QA datasets without additional training on millions of video-text pairs.

Retrieval • Sentence • +2
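
The core recipe can be approximated in a few lines: turn the video channel into text (e.g., automatically generated captions) and reuse a pretrained contrastive text encoder for retrieval. The captions and the encoder checkpoint below are placeholders, not the paper's exact pipeline:

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any contrastive text model

video_as_text = "a man slices an onion. he heats oil in a pan."  # from a captioner
question = "what does the man cut?"

v = encoder.encode(video_as_text, convert_to_tensor=True)
q = encoder.encode(question, convert_to_tensor=True)
print(util.cos_sim(q, v))  # rank candidate videos/answers by this score
```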

Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy

1 code implementation • 15 Oct 2022 • Shiyuan Huang, Robinson Piramuthu, Shih-Fu Chang, Gunnar A. Sigurdsson

Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model; it learns to extract tiny task-specific features, as few as 10 bits, that are sufficient for answering certain types of questions.

Feature Compression • Question Answering • +1
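
A minimal sketch of a 10-bit feature bottleneck in the spirit of FeatComp: project the backbone feature down to a handful of units, binarize with a straight-through estimator, and project back up. Layer sizes and the estimator are assumptions, not the released code:

```python
import torch
import torch.nn as nn

class FeatComp(nn.Module):
    def __init__(self, in_dim=512, bits=10):
        super().__init__()
        self.down = nn.Linear(in_dim, bits)
        self.up = nn.Linear(bits, in_dim)

    def forward(self, x):
        z = torch.tanh(self.down(x))
        # binarize to {-1, +1}; the straight-through trick keeps gradients flowing
        z_bin = torch.sign(z).detach() + z - z.detach()
        return self.up(z_bin), z_bin

feat = torch.randn(4, 512)      # features from a VideoQA backbone
recon, code = FeatComp()(feat)
print(code.shape)               # torch.Size([4, 10]) -> 10 bits per video
```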

TempCLR: Temporal Alignment Representation with Contrastive Learning

1 code implementation • 28 Dec 2022 • Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang

For long videos, given a paragraph of description whose sentences describe different segments of the video, matching all sentence-clip pairs implicitly aligns the paragraph with the full video.

Contrastive Learning • Dynamic Time Warping • +7
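
The sequence-level matching can be illustrated with a toy dynamic time warping pass over a sentence-clip similarity matrix (a generic DP, not the paper's implementation):

```python
import numpy as np

def dtw(sim):
    """Cumulative alignment cost between sentences (rows) and clips (cols)."""
    n, m = sim.shape
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cost = 1 - similarity; allow match / skip-clip / skip-sentence moves
            cost[i, j] = (1 - sim[i - 1, j - 1]) + min(
                cost[i - 1, j - 1], cost[i, j - 1], cost[i - 1, j])
    return cost[n, m]  # lower = better paragraph-to-video alignment

sents = np.random.randn(3, 16)   # 3 sentence embeddings
clips = np.random.randn(10, 16)  # 10 clip embeddings
sim = sents @ clips.T
sim /= np.linalg.norm(sents, axis=1, keepdims=True) * np.linalg.norm(clips, axis=1)[None, :]
print(dtw(sim))
```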

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

1 code implementation • CVPR 2023 • Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data.

Few-Shot Object Detection • object-detection

Supervised Masked Knowledge Distillation for Few-Shot Transformers

1 code implementation • CVPR 2023 • Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang

Vision Transformers (ViTs) achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features.

Few-Shot Learning • Inductive Bias • +1

Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

no code implementations • 17 Oct 2023 • Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin

Through an extensive set of experiments, we find that ChatGPT's self-explanations perform on par with traditional explanation methods, yet differ substantially from them under various agreement metrics, while being much cheaper to produce (they are generated along with the prediction).

Mathematical Reasoning • Sentiment Analysis
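
One of the simplest agreement metrics of this kind is top-k overlap between two word-attribution explanations; a toy version (illustrative only, the paper evaluates several metrics):

```python
def topk_agreement(attr_a, attr_b, k=3):
    """Fraction of the top-k most important words shared by two explanations."""
    top = lambda a: set(sorted(a, key=a.get, reverse=True)[:k])
    return len(top(attr_a) & top(attr_b)) / k

self_expl = {"great": 0.9, "movie": 0.2, "not": 0.1, "boring": 0.7}  # LLM-generated
occlusion = {"great": 0.8, "movie": 0.1, "not": 0.3, "boring": 0.6}  # traditional
print(topk_agreement(self_expl, occlusion))  # ~0.67: two of the top-3 words agree
```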

Characterizing Video Question Answering with Sparsified Inputs

no code implementations • 27 Nov 2023 • Shiyuan Huang, Robinson Piramuthu, Vicente Ordonez, Shih-Fu Chang, Gunnar A. Sigurdsson

From our experiments, we observe only a 5.2%-5.8% loss of performance with only 10% of the video length, which corresponds to 2-4 frames selected from each video.

Question Answering • Video Question Answering
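
Selecting 2-4 evenly spaced frames from a clip is straightforward; a minimal sketch of the kind of uniform sparsification studied here (the sampling strategy is an assumption, not taken from the paper):

```python
import numpy as np

def sample_frames(num_frames, k):
    """Return k evenly spaced frame indices covering the whole video."""
    return np.linspace(0, num_frames - 1, num=k).round().astype(int)

print(sample_frames(30, 3))  # [ 0 14 29] -> 3 of 30 frames, i.e. 10% of the video
```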
