no code implementations • 27 Nov 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan
InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data.
no code implementations • 24 Nov 2024 • You Li, Fan Ma, Yi Yang
A Uni-Controlled Image Generation Module is then developed to create high-quality, controllable synthetic images conditioned on the generated layouts.
no code implementations • 24 Nov 2024 • You Li, Fan Ma, Yi Yang
In this paper, we introduce Imagined Proxy for CIR (IP-CIR), a training-free method that creates a proxy image aligned with the query image and text description, enhancing query representation in the retrieval process.
Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRR
no code implementations • 14 Nov 2024 • Chutian Meng, Fan Ma, Jiaxu Miao, Chi Zhang, Yi Yang, Yueting Zhuang
We use GPT-4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content.
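A minimal sketch of this bridging step, assuming the OpenAI Python SDK's vision-capable chat API (the model name and prompt wording below are illustrative stand-ins, not the paper's):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def image_to_prompt(image_url: str) -> str:
    """Ask a GPT-4V-class model to turn a reference image into a T2I prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as a detailed text-to-image prompt."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return resp.choices[0].message.content

prompt = image_to_prompt("https://example.com/reference.jpg")
# `prompt` can then condition any off-the-shelf T2I model.
```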
no code implementations • 1 Aug 2024 • Honglei Miao, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang
Despite growing interest in T2M, few methods focus on safeguarding these models against adversarial attacks, with existing work on text-to-image models proving insufficient for the unique motion domain.
no code implementations • 13 Jul 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang
In this paper, Score Distillation Sampling (SDS) is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term.
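A worked form of this split in standard SDS notation (a sketch, not verbatim from the paper; $s$ is the classifier-free guidance scale and $w(t)$ the usual timestep weighting):

```latex
% CFG-guided noise prediction:
% \hat{\epsilon}_\phi = \epsilon_\phi(x_t;t) + s\,[\epsilon_\phi(x_t;y,t) - \epsilon_\phi(x_t;t)]
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi - \epsilon\big)\,
      \tfrac{\partial x_t}{\partial \theta} \right]
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\Big(
      \underbrace{\epsilon_\phi(x_t;t) - \epsilon}_{\text{reconstruction}}
      \;+\; s\,\underbrace{\big[\epsilon_\phi(x_t;y,t) - \epsilon_\phi(x_t;t)\big]}_{\text{classifier-free guidance}}
    \Big)\tfrac{\partial x_t}{\partial \theta} \right]
```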
1 code implementation • 2 Jul 2024 • Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang
Lastly, we introduce the Consistent-MIG algorithm to enhance the iterative MIG ability of MIGC and MIGC++.
1 code implementation • CVPR 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang
We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
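A minimal sketch of such an iterative, medoid-based coarsening loop (illustrative only; the clustering and scoring functions are hypothetical stand-ins, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

def medoid(points):
    """Return the member of `points` closest to the cluster mean."""
    center = points.mean(axis=0)
    return points[np.argmin(np.linalg.norm(points - center, axis=1))]

def coarsen(feats, n_clusters, keep_top):
    """Cluster nodes, score clusters, keep the best, return their medoids."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    clusters = [feats[labels == k] for k in range(n_clusters) if (labels == k).any()]
    # Hypothetical score: intra-cluster cohesion (tighter clusters score higher).
    scores = [-np.linalg.norm(c - c.mean(axis=0), axis=1).mean() for c in clusters]
    best = np.argsort(scores)[-keep_top:]
    return np.stack([medoid(clusters[k]) for k in best])

feats = np.random.randn(512, 64)            # e.g. per-residue node embeddings
while len(feats) > 8:                       # iterate to a hierarchical summary
    feats = coarsen(feats,
                    n_clusters=max(8, len(feats) // 4),
                    keep_top=max(4, len(feats) // 8))
```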
no code implementations • CVPR 2024 • Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
1 code implementation • CVPR 2024 • Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang
The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.
1 code implementation • CVPR 2024 • Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang
Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.
1 code implementation • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang
Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting appealing appearances.
1 code implementation • CVPR 2024 • Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang
Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in Stable Diffusion (SD).
Ranked #1 on Conditional Text-to-Image Synthesis on COCO-MIG
1 code implementation • CVPR 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang
Moreover, we introduce a 3D facial prior to equip our model with flexible, 3D-consistent control over the human head.
no code implementations • CVPR 2024 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation, especially when the relative distance between visual and text tokens is longer.
Ranked #2 on Question Answering on NExT-QA (Open-ended VideoQA)
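One plausible way to realize this distance effect (a sketch under our assumption that rotary position embeddings are simply skipped for text-to-visual attention so that visual tokens never decay with distance; this illustration is not code from the paper):

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, pos, base=10000.0):
    """NeoX-style rotary position embedding for features of shape (T, d)."""
    d = x.shape[-1]
    inv = base ** (-torch.arange(0, d, 2, dtype=torch.float) / d)
    ang = pos[:, None] * inv[None, :]                 # (T, d/2)
    cos = torch.cat([ang.cos(), ang.cos()], dim=-1)   # (T, d)
    sin = torch.cat([ang.sin(), ang.sin()], dim=-1)
    return x * cos + rotate_half(x) * sin

d = 64
q = torch.randn(4, d)        # text queries at positions 8..11
k_vis = torch.randn(8, d)    # visual keys (positions intentionally unused)
k_txt = torch.randn(4, d)    # text keys at positions 8..11
pos = torch.arange(8, 12, dtype=torch.float)

# Text-to-text attention keeps rotary positions as usual...
scores_txt = apply_rope(q, pos) @ apply_rope(k_txt, pos).T
# ...but text-to-visual attention uses unrotated q and k, so every text
# token sits at the same "distance" from every visual token.
scores_vis = q @ k_vis.T
attn = torch.cat([scores_vis, scores_txt], dim=-1).div(d ** 0.5).softmax(dim=-1)
```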
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization, which matches the subset of texts with the video features.
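As an illustration, the two sub-tasks could be realized with two lightweight heads over frame and sentence features (a hypothetical PyTorch sketch, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LocalizationHeads(nn.Module):
    """Illustrative heads for moment retrieval and text localization."""
    def __init__(self, dim=512):
        super().__init__()
        self.boundary = nn.Linear(dim, 2)   # per-frame start/end logits

    def forward(self, video_feats, text_feats, sim_thresh=0.0):
        # video_feats: (T, d) frame features; text_feats: (N, d) sentence features
        # Moment retrieval: fuse the pooled text query into each frame and
        # predict start/end boundary logits over the timeline.
        query = text_feats.mean(dim=0, keepdim=True)                # (1, d)
        start_logits, end_logits = self.boundary(video_feats * query).unbind(-1)
        # Text localization: score each sentence against the pooled video
        # features and keep the matching subset.
        sim = text_feats @ video_feats.mean(dim=0)                  # (N,)
        return start_logits, end_logits, sim > sim_thresh

heads = LocalizationHeads(dim=512)
starts, ends, matched = heads(torch.randn(100, 512), torch.randn(5, 512))
```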
1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan
Although UniTrack (Wang et al., 2021) demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.
no code implementations • 3 Apr 2020 • Hao Wang, Cheng Deng, Fan Ma, Yi Yang
Actor and action video segmentation with language queries aims to segment out the objects referred to by the expression in the video.
Ranked #10 on Referring Expression Segmentation on J-HMDB
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on Weakly Supervised Action Localization on BEOID
no code implementations • ICML 2017 • Fan Ma, Deyu Meng, Qi Xie, Zina Li, Xuanyi Dong
During the co-training process, labels of unlabeled instances in the training pool are very likely to be false, especially in the initial training rounds; yet the standard co-training algorithm operates in a “draw without replacement” manner and never removes these falsely labeled instances from training.
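A minimal sketch of the contrast (illustrative only, not the paper's self-paced algorithm; assumes integer class labels 0..C-1): re-scoring the whole pool every round lets a falsely pseudo-labeled instance drop back out, unlike draw-without-replacement co-training.

```python
import numpy as np

def co_train(model_a, model_b, X_l, y_l, X_u, rounds=10, thresh=0.9):
    """Co-training where pseudo-labels are re-drawn every round, so early
    mistakes can be revoked instead of staying in the training set forever."""
    X_train, y_train = X_l, y_l
    for _ in range(rounds):
        model_a.fit(X_train, y_train)   # per-view feature splits omitted
        model_b.fit(X_train, y_train)
        # Re-score the entire unlabeled pool each round rather than
        # permanently consuming instances once they have been drawn.
        p_a, p_b = model_a.predict_proba(X_u), model_b.predict_proba(X_u)
        conf_a, conf_b = p_a.max(axis=1), p_b.max(axis=1)
        keep = (conf_a > thresh) | (conf_b > thresh)
        pseudo = np.where(conf_a >= conf_b, p_a.argmax(axis=1), p_b.argmax(axis=1))
        # Next round trains on ground truth plus only the currently
        # confident pseudo-labels; stale false labels drop back out.
        X_train = np.concatenate([X_l, X_u[keep]])
        y_train = np.concatenate([y_l, pseudo[keep]])
    return model_a, model_b
```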
1 code implementation • 26 Jun 2017 • Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng
Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.
Ranked #1 on Weakly Supervised Object Detection on MS COCO