Search Results for author: Yaowei Li

Found 14 papers, 5 papers with code

UP-Person: Unified Parameter-Efficient Transfer Learning for Text-based Person Retrieval

1 code implementation • 14 Apr 2025 • Yating Liu, Yaowei Li, Xiangyuan Lan, Wenming Yang, Zimo Liu, Qingmin Liao

Text-based Person Retrieval (TPR), a multi-modal task that aims to retrieve the target person from a pool of candidate images given a text description, has recently garnered considerable attention due to the progress of contrastive visual-language pre-trained models.

Person Retrieval Retrieval +3

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

no code implementations • 17 Mar 2025 • Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou

Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools.

Computational Efficiency Data Augmentation +2

BrushEdit: All-In-One Image Inpainting and Editing

no code implementations • 13 Dec 2024 • Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Yuexian Zou, Qiang Xu

Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods.

All Image Inpainting

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

1 code implementation • 12 Dec 2024 • Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.

Image Animation

Image Conductor: Precision Control for Interactive Video Synthesis

no code implementations • 21 Jun 2024 • Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan

To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image.

Object

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

1 code implementation • 30 Jan 2024 • Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou

To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.

Diversity Image-text Retrieval +1

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

no code implementations • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou

Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both ground truth and other moments, i.e., semantic overlapping; (2) only a few moments in the video are annotated, i.e., the sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations.

Contrastive Learning Video Grounding

Efficient Multimodal Fusion via Interactive Prompting

no code implementations • CVPR 2023 • Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang

Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.

Sentence Triplet

Exploiting Auxiliary Caption for Video Grounding

no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

Contrastive Learning Dense Video Captioning +2

SIAD: Self-supervised Image Anomaly Detection System

no code implementations • 8 Aug 2022 • Jiawei Li, Chenxi Lan, Xinyi Zhang, Bolin Jiang, Yuqiu Xie, Naiqi Li, Yan Liu, Yaowei Li, Enze Huo, Bin Chen

To make a step forward, this paper outlines an automatic annotation system called SsaA, which works in a self-supervised learning manner to perform continuous online visual inspection in manufacturing automation scenarios.

Anomaly Detection Cloud Computing +1
