no code implementations • 19 Dec 2024 • Zhiqiang Tang, Zihan Zhong, Tong He, Gerald Friedland
We curate a benchmark comprising 22 multimodal datasets from diverse real-world applications, encompassing all 4 combinations of the 3 modalities.
no code implementations • 24 Apr 2024 • Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis
AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning.
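As a rough illustration of the intended workflow, here is a minimal sketch using AutoMM's MultiModalPredictor, assuming a pandas DataFrame whose columns mix text, image paths, and tabular features plus a "label" column; the file names, column name, and time limit are assumptions rather than values from the paper.

```python
# Minimal AutoMM sketch: fit a predictor on a mixed-modality DataFrame,
# then predict and evaluate on held-out data.
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_df = pd.read_csv("train.csv")   # hypothetical dataset with a "label" column
test_df = pd.read_csv("test.csv")

predictor = MultiModalPredictor(label="label")
predictor.fit(train_data=train_df, time_limit=600)  # training budget in seconds

predictions = predictor.predict(test_df)
scores = predictor.evaluate(test_df)
print(scores)
```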
1 code implementation • 31 Jan 2024 • Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan
The Segment Anything Model (SAM) stands as a foundational framework for image segmentation.
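For context on how SAM is typically prompted, a minimal sketch using the segment-anything package follows; it illustrates the base model rather than this paper's method, and the checkpoint path, model size, and point coordinates are assumptions.

```python
# Prompt a pretrained SAM with a single foreground point and inspect the masks.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained checkpoint (path and model variant are assumptions).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 image.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single positive point prompt; coordinates are illustrative.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)
```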
no code implementations • 7 Jun 2023 • Yanan Sun, Zihan Zhong, Qi Fan, Chi-Keung Tang, Yu-Wing Tai

Our thorough studies validate that models pre-trained in this way learn rich representations of both modalities, improving their ability to understand how images and text relate to each other.
1 code implementation • 16 May 2023 • Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li
It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer.
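For reference, the sketch below shows the standard soft-label distillation loss that a teacher-student setup typically optimizes; it is a generic illustration, not the teacher-training scheme studied in this paper, and the temperature and weighting values are arbitrary.

```python
# Generic knowledge-distillation loss: KL divergence on temperature-softened
# logits plus cross-entropy on the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-scaled distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example with random tensors standing in for real model outputs.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```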
no code implementations • 25 Jan 2023 • Yunpeng Bai, Zihan Zhong, Chao Dong, Weichen Zhang, Guowei Xu, Chun Yuan
Then, the text input can be mapped directly into the StyleGAN latent space and used to find the semantic shift corresponding to the text description.
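As a loose illustration of that idea, the sketch below maps a text embedding to a shift in a generator's latent space and re-synthesizes; the text encoder, mapper, and generator are random stand-in modules with assumed dimensions, not the paper's networks.

```python
# Sketch: encode a text prompt, map the embedding to a shift in the latent (w)
# space, and apply the shifted latent to the generator to produce the edit.
import torch
import torch.nn as nn

TEXT_DIM, W_DIM = 512, 512

text_encoder = nn.Linear(77, TEXT_DIM)      # stand-in for a CLIP-style text encoder
shift_mapper = nn.Linear(TEXT_DIM, W_DIM)   # maps text embedding -> latent shift
generator = nn.Linear(W_DIM, 3 * 16 * 16)   # stand-in for a StyleGAN synthesis network

tokens = torch.randn(1, 77)                 # placeholder for tokenized text
w = torch.randn(1, W_DIM)                   # latent code of the image being edited

delta_w = shift_mapper(text_encoder(tokens))          # semantic shift derived from text
edited = generator(w + delta_w).view(1, 3, 16, 16)    # re-synthesize with the shifted latent
print(edited.shape)
```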