no code implementations • 26 Nov 2024 • Liyun Zhang, Dian Ding, Yu Lu, Yi-Chao Chen, Guangtao Xue
In this paper, we present a framework, Lantern, that improves the performance of a given vanilla model by prompting large language models with receptive-field-aware attention weighting.
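The snippet does not specify how the receptive-field-aware weighting is computed, so the sketch below is only an illustrative assumption: attention scores are biased by a Gaussian prior over token distance so that nearby tokens (the "receptive field") dominate. The function name, the Gaussian form, and `rf_sigma` are hypothetical, not Lantern's actual mechanism.

```python
# Hypothetical sketch of receptive-field-aware attention weighting.
# The Gaussian distance prior and all names below are illustrative assumptions.
import torch

def rf_aware_attention(q, k, v, rf_sigma=4.0):
    """q, k, v: (seq_len, dim) tensors; rf_sigma controls the receptive-field width."""
    seq_len, dim = q.shape
    scores = q @ k.T / dim ** 0.5                        # standard scaled dot-product scores
    pos = torch.arange(seq_len, dtype=torch.float32)
    dist = (pos[:, None] - pos[None, :]).abs()           # pairwise token distance
    rf_prior = torch.exp(-(dist ** 2) / (2 * rf_sigma ** 2))   # receptive-field prior in (0, 1]
    attn = torch.softmax(scores + rf_prior.log(), dim=-1)      # bias scores toward the receptive field
    return attn @ v

q = k = v = torch.randn(16, 32)
out = rf_aware_attention(q, k, v)
print(out.shape)  # torch.Size([16, 32])
```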
1 code implementation • 9 Oct 2024 • Jiayi Guo, Zan Chen, Yingrui Ji, Liyun Zhang, Daqin Luo, Zhigang Li, Yiqin Shen
Additionally, these frameworks lack interpretability and user engagement during the training process, primarily due to the absence of human-centered design.
no code implementations • 17 Sep 2024 • Xuanmeng Sha, Liyun Zhang, Tomohiro Mashita, Yuki Uranishi
This method generates varied and realistic human facial movements by predicting the 3D vertex trajectory on a 3D facial template with a diffusion policy, rather than generating the face frame by frame.
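As a rough illustration of the idea, the following is a minimal sketch of diffusion-policy-style trajectory prediction: the whole vertex trajectory is denoised as one sequence rather than generated per frame. The denoiser architecture, conditioning features, vertex count, and the simplified reverse loop are all assumptions, not the paper's model.

```python
# Hypothetical sketch: denoise a full 3D vertex trajectory (T frames x V vertices x 3)
# in one pass, diffusion-policy style. All components are placeholders.
import torch
import torch.nn as nn

T, V = 30, 5023            # frames, template vertices (V is illustrative)

class TrajectoryDenoiser(nn.Module):
    def __init__(self, dim=V * 3, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim + 1, 512),
                                 nn.ReLU(),
                                 nn.Linear(512, dim))

    def forward(self, noisy_traj, cond, t):
        # noisy_traj: (T, V*3), cond: (T, cond_dim), t: scalar diffusion step
        t_feat = torch.full((noisy_traj.shape[0], 1), float(t))
        return self.net(torch.cat([noisy_traj, cond, t_feat], dim=-1))

denoiser = TrajectoryDenoiser()
cond = torch.randn(T, 128)                 # e.g. per-frame audio features (assumed)
traj = torch.randn(T, V * 3)               # start the trajectory from noise
for t in reversed(range(10)):              # toy reverse-diffusion loop
    traj = traj - 0.1 * denoiser(traj, cond, t)   # simplified denoising step
vertices = traj.view(T, V, 3)              # 3D vertex trajectory on the template
```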
no code implementations • 23 Jul 2024 • Liyun Zhang
Multimodal Large Language Models (MLLMs) have demonstrated remarkable multimodal emotion recognition capabilities, integrating multimodal cues from visual, acoustic, and linguistic contexts in the video to recognize human emotional states.
no code implementations • 3 Dec 2021 • Liyun Zhang, Photchara Ratsamee, Bowen Wang, Zhaojie Luo, Yuki Uranishi, Manabu Higashida, Haruo Takemura
Panoptic perception (i.e., the foreground instances and background semantics of the image scene) is extracted to align object content codes from the input domain with panoptic-level style codes sampled from the target style space; the result is then refined by a proposed feature masking module that sharpens object boundaries.
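One way to picture the feature-masking idea is below: panoptic instance masks gate a stylized feature map so that style is applied per object region while the rest keeps the content features, which keeps boundaries sharp. The tensor shapes and the simple blend rule are assumptions for illustration, not the paper's module.

```python
# Hypothetical sketch of mask-gated feature blending; not the paper's feature masking module.
import torch

def masked_style_blend(content_feat, styled_feat, instance_masks):
    """content_feat, styled_feat: (C, H, W); instance_masks: (N, H, W) binary masks."""
    fg = instance_masks.sum(dim=0).clamp(max=1.0)        # union of foreground instance regions
    # stylized features inside object regions, content features elsewhere
    return styled_feat * fg + content_feat * (1.0 - fg)

C, H, W, N = 64, 32, 32, 3
content = torch.randn(C, H, W)
styled = torch.randn(C, H, W)
masks = (torch.rand(N, H, W) > 0.7).float()
out = masked_style_blend(content, styled, masks)
print(out.shape)  # torch.Size([64, 32, 32])
```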