1 code implementation • 14 Apr 2025 • Yating Liu, Yaowei Li, Xiangyuan Lan, Wenming Yang, Zimo Liu, Qingmin Liao
Text-based Person Retrieval (TPR) as a multi-modal task, which aims to retrieve the target person from a pool of candidate images given a text description, has recently garnered considerable attention due to the progress of contrastive visual-language pre-trained model.
no code implementations • 17 Mar 2025 • Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou
Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools.
no code implementations • 13 Dec 2024 • Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Yuexian Zou, Qiang Xu
Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods.
1 code implementation • 12 Dec 2024 • Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen
Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.
no code implementations • 4 Dec 2024 • Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, WenBo Hu, Xiaoyu Li, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan
Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data.
no code implementations • 21 Jun 2024 • Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan
To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image.
1 code implementation • 14 Jun 2024 • Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang
Real-time detection and prediction of extreme weather protect human lives and infrastructure.
1 code implementation • 30 Jan 2024 • Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou
To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.
1 code implementation • 18 Sep 2023 • Yating Liu, Yaowei Li, Zimo Liu, Wenming Yang, YaoWei Wang, Qingmin Liao
Text-based Person Retrieval (TPR) aims to retrieve the target person images given a textual query.
no code implementations • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou
Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both ground truth and other moments, \ie semantic overlapping; (2) only a few moments in the video are annotated, \ie sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learned inconsistent video representations.
no code implementations • CVPR 2023 • Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.
no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou
Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.
no code implementations • 8 Aug 2022 • Jiawei Li, Chenxi Lan, Xinyi Zhang, Bolin Jiang, Yuqiu Xie, Naiqi Li, Yan Liu, Yaowei Li, Enze Huo, Bin Chen
To make a step forward, this paper outlines an automatic annotation system called SsaA, working in a self-supervised learning manner, for continuously making the online visual inspection in the manufacturing automation scenarios.