Search Results for author: Siteng Huang

Found 12 papers, 8 papers with code

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

1 code implementation • 21 Mar 2024 • Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang

In recent years, the application of multimodal large language models (MLLMs) in various fields has achieved remarkable success.

Language Modelling · Large Language Model

Prompt-based Distribution Alignment for Unsupervised Domain Adaptation

1 code implementation • 15 Dec 2023 • Shuanghao Bai, Min Zhang, Wanqi Zhou, Siteng Huang, Zhirong Luan, Donglin Wang, Badong Chen

Therefore, in this paper, we first experimentally demonstrate that unsupervised-trained VLMs can significantly reduce the distribution discrepancy between the source and target domains, thereby improving UDA performance.

Prompt Engineering · Unsupervised Domain Adaptation

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

no code implementations • 27 Nov 2023 • Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle to decouple actions from context features such as appearance.

Text-to-Image Generation

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations • 27 Nov 2023 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system, SimM, that intervenes in the generative process on the fly at inference time.

Text-to-Image Generation

VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

1 code implementation • 3 Sep 2023 • Xuyang Liu, Siteng Huang, Yachen Kang, Honggang Chen, Donglin Wang

Large-scale text-to-image diffusion models have shown impressive capabilities for generative tasks by leveraging strong vision-language alignment from pre-training.

Visual Grounding

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

1 code implementation • 27 Mar 2023 • Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.

Compositional Zero-Shot Learning · Object

VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

1 code implementation • CVPR 2023 • Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang

Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by tuning the backbone with additional heavy modules, which not only imposes a huge computational burden with many more parameters, but also leads to forgetting of knowledge from upstream models.

Cross-Modal Retrieval · Retrieval +1

Reference-Limited Compositional Zero-Shot Learning

1 code implementation • 22 Aug 2022 • Siteng Huang, Qiyao Wei, Donglin Wang

Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives, which is an essential ability for artificial intelligence systems to learn and understand the world.

Compositional Zero-Shot Learning

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation

1 code implementation • 14 Jul 2022 • Min Zhang, Siteng Huang, Wenbin Li, Donglin Wang

To solve this problem, we present a plug-in Hierarchical Tree Structure-aware (HTS) method, which not only learns the relationship of FSL and pretext tasks, but more importantly, can adaptively select and aggregate feature representations generated by pretext tasks to maximize the performance of FSL tasks.

Few-Shot Image Classification · Few-Shot Learning

Reference-Limited Compositional Learning: A Realistic Assessment for Human-level Compositional Generalization

no code implementations • 29 Sep 2021 • Siteng Huang, Qiyao Wei, Donglin Wang

To narrow the considerable gap between artificial and human intelligence, we propose a new task, namely reference-limited compositional learning (RLCL), which reproduces three core challenges to mimic human perception: compositional learning, few-shot, and few referential compositions.

Pareto Self-Supervised Training for Few-Shot Learning

no code implementations • CVPR 2021 • Zhengyu Chen, Jixie Ge, Heshen Zhan, Siteng Huang, Donglin Wang

While few-shot learning (FSL) aims for rapid generalization to new concepts with little supervision, self-supervised learning (SSL) constructs supervisory signals directly computed from unlabeled data.

Auxiliary Learning · Few-Shot Learning +2

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

1 code implementation • 10 Sep 2020 • Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang

However, these approaches only augment the representations of samples with available semantics while ignoring the query set, which forfeits potential improvements and may cause a shift between the combined-modality representation and the pure-visual representation.

Feature Selection · Metric Learning
