no code implementations • 27 May 2024 • Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.
1 code implementation • 23 Apr 2024 • Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang
This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time.
no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.
no code implementations • 18 Apr 2024 • Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo
Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).
no code implementations • 24 Mar 2024 • Yunlong Tang, Daiki Shimada, Jing Bi, Mingqian Feng, Hang Hua, Chenliang Xu
This deficiency hinders LLMs from learning the alignment between time, audio-visual events, and text tokens, thus impairing their ability to temporally localize audio-visual events in videos.
no code implementations • 1 Feb 2024 • Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
Existing methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control such as facial animation and component editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling.
1 code implementation • 21 Mar 2023 • Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
We propose a new joint video and text summarization task.
Ranked #1 on Video Summarization on VideoXum
no code implementations • ICCV 2023 • Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).
1 code implementation • 15 Nov 2022 • Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo
PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).
Ranked #1 on Visual Question Answering on TextVQA test-standard
no code implementations • 12 Jun 2022 • Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo
The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing.
no code implementations • NAACL 2021 • Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo
The brittleness of this process is often reflected by the sensitivity to random seeds.
2 code implementations • NeurIPS 2019 • Ke Wang, Hang Hua, Xiaojun Wan
Unsupervised text attribute transfer automatically transforms a text to alter a specific attribute (e.g., sentiment) without using any parallel data, while simultaneously preserving its attribute-independent content.