1 code implementation • 1 Feb 2025 • Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang
One approach is to compress the diffusion model (DM) to 1 bit, i.e., binarization, which alleviates storage and computation pressure.
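As a rough illustration of what 1-bit binarization means, the sketch below replaces each weight with its sign and keeps one shared scale (the mean absolute value). This sign-plus-scale scheme is a common generic choice and is an assumption here, not the method of the listed paper; the names `binarize` and `dequantize` are hypothetical.

```python
def binarize(weights):
    """Return (signs, scale): signs are +1/-1, scale is the mean |w|."""
    if not weights:
        raise ValueError("weights must be non-empty")
    # One shared full-precision scale keeps the overall magnitude (assumed scheme).
    scale = sum(abs(w) for w in weights) / len(weights)
    # Each weight is stored as a single bit: its sign.
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def dequantize(signs, scale):
    """Reconstruct approximate weights from the 1-bit signs and the scale."""
    return [s * scale for s in signs]


weights = [0.5, -1.5, 0.25, -0.75]
signs, scale = binarize(weights)
approx = dequantize(signs, scale)
```

Storing only the signs plus one scale per group is where the storage saving comes from; the reconstruction error is the price paid for it.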
no code implementations • 20 Nov 2024 • Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai
To advance these approaches, this paper introduces an Organ-Regional Information Driven (ORID) framework which can effectively integrate multi-modal information and reduce the influence of noise from unrelated organs.
1 code implementation • 18 Oct 2024 • Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng, Roy Miles, Ismail Elezi, Jiankang Deng
Then, we conceptualize visual tokens as analogous to a "foreign language" for the LLMs and propose a mixed attention mechanism with bidirectional visual attention and unidirectional textual attention to comprehensively enhance the understanding of visual tokens.
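A minimal sketch of how such a mixed attention mask could be built, assuming visual tokens may attend to every other visual token while all remaining pairs follow the usual causal rule. How visual and textual tokens interact across modalities is an assumption here, and `mixed_attention_mask` is a hypothetical helper, not code from the paper.

```python
def mixed_attention_mask(is_visual):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    is_visual: list of bools, True where the token at that position is visual.
    """
    n = len(is_visual)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i  # unidirectional rule: only current and earlier tokens
            both_visual = is_visual[i] and is_visual[j]  # bidirectional among visual tokens
            row.append(causal or both_visual)
        mask.append(row)
    return mask


# Two visual tokens followed by two text tokens.
mask = mixed_attention_mask([True, True, False, False])
```

In this example the first visual token can see the second one, while each text token sees only itself and the tokens before it.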
no code implementations • 18 Aug 2024 • Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng
In this paper, we introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.
1 code implementation • 24 Jul 2024 • Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng
In this paper, we propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning.
Ranked #1 on Referring Expression Segmentation on RefCOCOg-val (using extra training data)
no code implementations • 19 Jun 2024 • Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng
In this paper, we present a novel facial albedo reconstruction model, HiFiAlbedo, which recovers the albedo map directly from a single image without the need for captured albedo data.
2 code implementations • 11 Jun 2024 • Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng
Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites.
no code implementations • 22 Apr 2024 • Tao Sun, Yuanzi Fu, Kaicheng Yang, Jian Wu, Ziyong Feng
This paper presents the winning solution for the 1st SkatingVerse Challenge.
1 code implementation • 19 Apr 2024 • Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai
In this paper, we propose the Latent Prompt Assist model (LaPA) for medical visual question answering.
1 code implementation • ICCV 2023 • Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
However, the presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
3 code implementations • 12 Apr 2023 • Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
To further enhance the low-dimensional feature representation, we randomly select partial feature dimensions when calculating the similarities between embeddings and class-wise prototypes.
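The random selection of feature dimensions can be sketched as follows, assuming similarity is a plain dot product restricted to a freshly sampled subset of dimensions. The function name `partial_similarity`, the `keep` parameter, and the sampling scheme are illustrative assumptions rather than the paper's implementation.

```python
import random


def partial_similarity(embedding, prototypes, keep, rng):
    """Dot-product similarity to each prototype over `keep` randomly chosen dimensions."""
    # Sample the dimensions once so every prototype is compared on the same subset.
    dims = rng.sample(range(len(embedding)), keep)
    sims = [sum(embedding[d] * proto[d] for d in dims) for proto in prototypes]
    return sims, dims


rng = random.Random(0)  # seeded only to make the sketch reproducible
embedding = [1.0, 2.0, 3.0, 4.0]
prototypes = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]

# Use half of the dimensions, as one would per training step.
sims_half, dims_half = partial_similarity(embedding, prototypes, keep=2, rng=rng)

# Keeping every dimension reduces to the ordinary full dot product.
sims_full, dims_full = partial_similarity(embedding, prototypes, keep=4, rng=rng)
```

Resampling the subset at each step means no single dimension can dominate the similarity, which is one plausible reading of why the selection helps the low-dimensional representation.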
no code implementations • 12 Nov 2022 • Kaicheng Yang, Ruxuan Zhang, Hua Xu, Kai Gao
In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust crossmodal fusion representations directly from the unaligned text and audio sequences.
1 code implementation • ACM Multimedia 2020 • Kaicheng Yang, Hua Xu, Kai Gao
In this paper, we propose the Cross-Modal BERT (CM-BERT), which relies on the interaction of text and audio modality to fine-tune the pre-trained BERT model.
Ranked #4 on Multimodal Sentiment Analysis on MOSI
2 code implementations • 24 Jul 2017 • Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, Filippo Menczer
Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures.
Social and Information Networks • Computers and Society • Physics and Society