no code implementations • 13 Jun 2024 • Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang
Recent advancements in image generation have enabled the creation of high-quality images from text conditions.
1 code implementation • 10 Jan 2024 • Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: the data perspective and the feature perspective.
1 code implementation • 21 Dec 2023 • Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks.
2 code implementations • 15 Dec 2023 • Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang
As Archimedes famously said, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." In this study, we propose to use a tiny Language Model (LM), e.g., a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters.
no code implementations • 27 Nov 2023 • Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang
Next, we introduce ChartLlama, a multi-modal large language model trained on the dataset we created.
1 code implementation • ICCV 2023 • Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang
Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt": e.g., the confidence score of an image being "[CLASS]" can be obtained using the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]".
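The prompt-based scoring described above can be sketched numerically. This is a minimal illustration of the mechanism only: the random vectors below stand in for CLIP image/text encoder outputs (a real pipeline would embed the image and the prompt sentences "a photo of a [CLASS]" with the actual encoders), and the temperature value is an assumption.

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: cosine similarity between one image
    embedding and one text embedding per class prompt, softmax-normalized."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)      # one similarity score per class
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

# Hypothetical embeddings standing in for CLIP encoder outputs.
rng = np.random.default_rng(0)
classes = ["cat", "dog", "car"]             # prompts: "a photo of a [CLASS]"
text_embs = rng.normal(size=(3, 512))
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)  # image resembles "dog"
probs = zero_shot_scores(image_emb, text_embs)
print(classes[int(np.argmax(probs))])
```

The class whose prompt embedding is most similar to the image embedding receives the highest confidence, which is the "zero-shot classifier by prompt" idea in the snippet.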
1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key to its performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) can be largely simplified.
no code implementations • IEEE Transactions on Image Processing 2022 • Wencheng Zhu, Yucheng Han, Jiwen Lu, Jie Zhou
Then, we construct a temporal graph by using the aggregated representations of spatial graphs.
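The construction above can be sketched as follows. The snippet only states that frame-level spatial graphs are aggregated and a temporal graph is built over them, so the details here (mean-pooling as the aggregation, cosine-similarity edges plus consecutive-frame links, the threshold value) are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def build_temporal_graph(spatial_node_feats, sim_threshold=0.8):
    """Collapse each frame's spatial graph into one node by mean-pooling its
    node features, then connect frame nodes whose pooled representations are
    similar (assumed cosine threshold), plus consecutive-frame links."""
    frame_feats = np.stack([f.mean(axis=0) for f in spatial_node_feats])
    normed = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sim = normed @ normed.T                   # pairwise cosine similarity
    adj = (sim > sim_threshold).astype(float)
    n = len(frame_feats)
    for t in range(n - 1):                    # always link temporal neighbors
        adj[t, t + 1] = adj[t + 1, t] = 1.0
    np.fill_diagonal(adj, 0.0)                # no self-loops
    return frame_feats, adj

# Hypothetical per-frame spatial graphs: each is (num_objects, feat_dim).
rng = np.random.default_rng(1)
frames = [rng.normal(size=(5, 16)) for _ in range(4)]
feats, adj = build_temporal_graph(frames)
```

`feats` holds one aggregated representation per frame and `adj` is the temporal graph's adjacency matrix, which downstream summarization could then reason over.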
Ranked #1 on Video Summarization on TvSum (using extra training data)