1 code implementation • 30 Oct 2024 • Ziyao Shangguan, Chuhan Li, Yuxuan Ding, Yanan Zheng, Yilun Zhao, Tesca Fitzgerald, Arman Cohan
Our study of existing benchmarks shows that this capability of MFMs is likely overestimated, as many questions can be solved from a single frame, a few frames, or frames presented out of order.
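To make that diagnostic concrete, here is a minimal sketch of how such degraded probes could be constructed (the helper name and sampling choices are our own assumptions, not the paper's protocol):

```python
import random
from typing import List

def make_temporal_probes(frames: List, k: int = 4, seed: int = 0):
    """Build degraded inputs for probing whether a benchmark question
    truly requires ordered, multi-frame reasoning."""
    rng = random.Random(seed)
    single = [frames[len(frames) // 2]]      # one representative frame
    step = max(1, len(frames) // k)
    few = frames[::step][:k]                 # a few evenly spaced frames
    shuffled = frames[:]                     # full set, order destroyed
    rng.shuffle(shuffled)
    return {"single": single, "few": few, "shuffled": shuffled}

# If a model answers correctly from any of these variants, the question
# does not actually test temporal understanding.
```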
1 code implementation • 12 Sep 2024 • Zicheng Duan, Yuxuan Ding, Chenhui Gou, Ziqin Zhou, Ethan Smith, Lingqiao Liu
Zero-shot personalized image generation models aim to produce images that align with both a given text prompt and a subject image, requiring the model to effectively incorporate both sources of guidance.
no code implementations • NeurIPS 2023 • Yuxuan Ding, Chunna Tian, Haoxuan Ding, Lingqiao Liu
The Stable Diffusion model is a prominent text-to-image generation model that takes a text prompt as input, which is encoded with the Contrastive Language-Image Pre-training (CLIP) text encoder.
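A minimal sketch of that conditioning path using the diffusers library (the checkpoint id is an assumption; any Stable Diffusion 1.x checkpoint exposes the same tokenizer and text encoder):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a public Stable Diffusion checkpoint (model id is an assumption).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "a photograph of an astronaut riding a horse"

# Tokenize the prompt and run it through the CLIP text encoder; the
# resulting per-token embeddings condition the denoising U-Net.
tokens = pipe.tokenizer(
    prompt, padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
)
with torch.no_grad():
    text_embeddings = pipe.text_encoder(tokens.input_ids)[0]
print(text_embeddings.shape)  # (1, 77, 768) for SD 1.x
```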
3 code implementations • 18 Jan 2023 • Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
We call the collected dataset the Human ChatGPT Comparison Corpus (HC3).
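A minimal sketch of loading the corpus, assuming it is hosted under Hello-SimpleAI/HC3 on the Hugging Face Hub with the field names shown below:

```python
from datasets import load_dataset

# HC3 pairs human-written and ChatGPT answers to the same questions;
# the "all" config and field names here are assumptions about the hub release.
hc3 = load_dataset("Hello-SimpleAI/HC3", "all")["train"]

example = hc3[0]
print(example["question"])
print(example["human_answers"][0])
print(example["chatgpt_answers"][0])
```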
no code implementations • 21 Sep 2022 • Heng Zhou, Chunna Tian, Zhenxi Zhang, Chengyang Li, Yuxuan Ding, Yongqiang Xie, Zhongbo Li
FRDF utilizes the directional information between object pixels to effectively enhance the intra-class compactness of salient regions.
no code implementations • 25 Jul 2022 • Jingyuan Yang, Jie Li, Leida Li, Xiumei Wang, Yuxuan Ding, Xinbo Gao
In psychology, the Object-Appraisal-Emotion model has demonstrated that each individual's emotion is shaped by his/her subjective appraisal, which in turn is formed by affective memory.
no code implementations • 19 Jul 2022 • Yuxuan Ding, Lingqiao Liu, Chunna Tian, Jingyuan Yang, Haoxuan Ding
The Contrastive Language-Image Pre-training (CLIP) model is a recently proposed large-scale pre-trained model that has attracted increasing attention in the computer vision community.
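CLIP's core operation is scoring image-text pairs in a shared embedding space; a minimal zero-shot sketch with the transformers library (image path and captions are placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Standard public CLIP checkpoint; embeds images and texts into a shared
# space, which is what enables zero-shot transfer.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logits mean the caption matches the image better.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```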
no code implementations • 4 Sep 2021 • Jingyuan Yang, Jie Li, Xiumei Wang, Yuxuan Ding, Xinbo Gao
Then, we design three specific networks, i.e., Global-Net, Semantic-Net, and Expression-Net, to extract distinct emotional features from the different stimuli simultaneously.
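A hypothetical sketch of how such a three-branch design could be wired up in PyTorch (branch names follow the abstract; all layer choices and the fusion scheme are illustrative assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class ThreeStreamEmotionNet(nn.Module):
    """Illustrative sketch only: branch names come from the abstract,
    but the stand-in backbones and late fusion are our assumptions."""

    def __init__(self, feat_dim=256, num_emotions=8):
        super().__init__()
        def branch():  # tiny stand-in backbone for each stimulus stream
            return nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1),
                                 nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.global_net = branch()      # whole-image (scene-level) features
        self.semantic_net = branch()    # object/semantic features
        self.expression_net = branch()  # facial-expression features
        self.classifier = nn.Linear(3 * feat_dim, num_emotions)

    def forward(self, image, face_crop):
        g = self.global_net(image).flatten(1)
        s = self.semantic_net(image).flatten(1)
        e = self.expression_net(face_crop).flatten(1)
        return self.classifier(torch.cat([g, s, e], dim=1))

logits = ThreeStreamEmotionNet()(torch.randn(2, 3, 224, 224),
                                 torch.randn(2, 3, 112, 112))
```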