no code implementations • 20 Nov 2024 • Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai
To advance these approaches, this paper introduces an Organ-Regional Information Driven (ORID) framework which can effectively integrate multi-modal information and reduce the influence of noise from unrelated organs.
1 code implementation • 18 Oct 2024 • Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng, Roy Miles, Ismail Elezi, Jiankang Deng
Then, we conceptualize visual tokens as analogous to a "foreign language" for the LLMs and propose a mixed attention mechanism with bidirectional visual attention and unidirectional textual attention to comprehensively enhance the understanding of visual tokens.
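As a rough illustration of the mixed attention idea in the snippet above, the following sketch builds a boolean attention mask in plain Python. The exact masking rule is not given in this listing, so the rule used here is an assumption: every token may attend causally to earlier positions, and visual tokens may additionally attend to all other visual tokens, including later ones. The function name `mixed_attention_mask` and the `is_visual` flag list are hypothetical.

```python
from typing import List


def mixed_attention_mask(is_visual: List[bool]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key position k.

    Assumed rule (not taken from the paper):
      * any token may attend to itself and to earlier positions (causal), and
      * a visual token may also attend to every other visual token (bidirectional).
    """
    n = len(is_visual)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            causal = k <= q
            both_visual = is_visual[q] and is_visual[k]
            mask[q][k] = causal or both_visual
    return mask


if __name__ == "__main__":
    # Toy sequence: text, visual, visual, text
    flags = [False, True, True, False]
    for row in mixed_attention_mask(flags):
        print("".join("1" if allowed else "0" for allowed in row))
```

In this toy layout the first visual token can see the second visual token even though it comes later, while the text tokens remain strictly causal.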
no code implementations • 18 Aug 2024 • Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng
In this paper, we introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.
1 code implementation • 2 Aug 2024 • Qian Zhang, Xiangzi Dai, Ninghua Yang, Xiang An, Ziyong Feng, Xingyu Ren
However, the original VAR model is constrained to class-conditioned synthesis, relying solely on textual captions for guidance.
1 code implementation • 24 Jul 2024 • Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng
In this paper, we propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning.
Ranked #1 on Referring Expression Segmentation on RefCOCOg-val (using extra training data)
no code implementations • 19 Jun 2024 • Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng
In this paper, we present a novel facial albedo reconstruction model, HiFiAlbedo, which recovers the albedo map directly from a single image without the need for captured albedo data.
2 code implementations • 11 Jun 2024 • Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng
Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites.
no code implementations • 28 Mar 2024 • Jiaxing Chen, Yuxuan Liu, Dehu Li, Xiang An, Weimo Deng, Ziyong Feng, Yongle Zhao, Yin Xie
P2G utilizes the tool-usage potential of MLLMs to employ expert agents for on-the-fly grounding of reasoning into critical visual and textual elements in images, thereby enabling deliberate reasoning through multimodal prompting.
no code implementations • 20 Mar 2024 • Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng
Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts.
1 code implementation • ICCV 2023 • Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
However, the presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
3 code implementations • 12 Apr 2023 • Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
To further enhance the low-dimensional feature representation, we randomly select partial feature dimensions when calculating the similarities between embeddings and class-wise prototypes.
6 code implementations • 28 Mar 2022 • Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu
In each iteration, positive class centers and a random subset of negative class centers are selected to compute the margin-based softmax loss.
Ranked #1 on Face Recognition on MFR
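The class-center selection described in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the sampling step (keep every positive class present in the batch, then fill the remaining budget with randomly chosen negative classes) and leaves out the margin-based softmax loss itself. The function name `sample_class_centers` and the `sample_rate` parameter are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_class_centers(
    num_classes: int,
    batch_labels: Sequence[int],
    sample_rate: float,
    rng: random.Random,
) -> Tuple[List[int], Dict[int, int]]:
    """Pick the class centers that take part in the loss for one iteration.

    Returns the list of selected class ids and a mapping from each original
    class id to its index inside the selected list.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")

    # Positive classes: every class that appears in the current batch.
    positives = sorted(set(batch_labels))
    positive_set = set(positives)

    # Budget of centers for this iteration; never smaller than the positives.
    budget = max(len(positives), int(num_classes * sample_rate))

    # Negative classes: a random subset of the remaining classes.
    negatives = [c for c in range(num_classes) if c not in positive_set]
    sampled_negatives = rng.sample(negatives, budget - len(positives))

    selected = positives + sorted(sampled_negatives)
    remap = {class_id: index for index, class_id in enumerate(selected)}
    return selected, remap


if __name__ == "__main__":
    rng = random.Random(0)
    selected, remap = sample_class_centers(100, [3, 7, 3, 42], 0.1, rng)
    print(selected)
    print([remap[label] for label in [3, 7, 3, 42]])
```

The remapped labels index into the reduced set of centers, so a loss computed over only the selected centers still sees the correct target for every sample in the batch.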
1 code implementation • CVPR 2022 • Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu
In each iteration, positive class centers and a random subset of negative class centers are selected to compute the margin-based softmax loss.
1 code implementation • 18 Aug 2021 • Jiankang Deng, Jia Guo, Xiang An, Zheng Zhu, Stefanos Zafeiriou
In this workshop, we organize the Masked Face Recognition (MFR) challenge and focus on benchmarking deep face recognition methods in the presence of facial masks.
7 code implementations • 11 Oct 2020 • Xiang An, Xuhan Zhu, Yang Xiao, Lan Wu, Ming Zhang, Yuan Gao, Bin Qin, Debing Zhang, Ying Fu
The experiment demonstrates no loss of accuracy when training with only 10% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks.
Ranked #2 on Face Identification on MegaFace