Search Results for author: Fan Ma

Found 23 papers, 10 papers with code

InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation

no code implementations • 27 Nov 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan

InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data.

Motion Generation

AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks

no code implementations • 24 Nov 2024 • You Li, Fan Ma, Yi Yang

A Uni-Controlled Image Generation Module is then developed to create high-quality synthetic images that are controllable and based on the generated layouts.

Few-Shot Object Detection, Image Generation +5

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

no code implementations • 24 Nov 2024 • You Li, Fan Ma, Yi Yang

In this paper, we introduce Imagined Proxy for CIR (IP-CIR), a training-free method that creates a proxy image aligned with the query image and text description, enhancing query representation in the retrieval process.

Image Retrieval, Retrieval +1

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models

no code implementations • 14 Nov 2024 • Chutian Meng, Fan Ma, Jiaxu Miao, Chi Zhang, Yi Yang, Yueting Zhuang

We use GPT4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content.

Image Generation

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion

no code implementations • 1 Aug 2024 • Honglei Miao, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang

Despite growing interest in T2M, few methods focus on safeguarding these models against adversarial attacks, with existing work on text-to-image models proving insufficient for the unique motion domain.

Adversarial Text, Motion Generation

VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

no code implementations • 13 Jul 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang

In this paper, SDS is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term.
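
As a hedged illustration of the decomposition mentioned above, the following uses standard Score Distillation Sampling and classifier-free guidance notation. All symbols are assumptions introduced here, not taken from this listing: $\epsilon_\phi$ is the pretrained noise predictor, $y$ the text prompt, $\varnothing$ the empty prompt, $s$ the guidance scale, $w(t)$ a timestep weight, and $x = g(\theta)$ the rendered image. The paper's own definitions and weights may differ.

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right],
\qquad
\hat{\epsilon}_\phi(x_t; y, t) = \epsilon_\phi(x_t; \varnothing, t) + s\,\bigl(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\bigr)
\]

\[
\hat{\epsilon}_\phi(x_t; y, t) - \epsilon
= \underbrace{\bigl(\epsilon_\phi(x_t; \varnothing, t) - \epsilon\bigr)}_{\text{reconstruction-style term}}
+ s\,\underbrace{\bigl(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\bigr)}_{\text{classifier-free guidance term}}
\]

Under these assumptions the guidance scale $s$ sets the relative weight of the two terms in the sum.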

3D Generation, Text to 3D

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

1 code implementation • 2 Jul 2024 • Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang

Lastly, we introduced the Consistent-MIG algorithm to enhance the iterative MIG ability of MIGC and MIGC++.

Attribute, Image Generation +1

Clustering for Protein Representation Learning

1 code implementation • CVPR 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
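
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `medoid` and `select_medoids`, the Euclidean distance, and the fixed scores are all assumptions made here (the listing does not say how clusters are scored or how distances are defined).

```python
from math import dist


def medoid(points):
    """Return the member with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def select_medoids(clusters, scores, k):
    """Keep the k highest-scoring clusters and return one medoid per kept cluster."""
    ranked = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    return [medoid(clusters[i]) for i in ranked[:k]]


# Hypothetical toy data: three clusters of 2D points with made-up scores.
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(5.0, 5.0), (5.0, 6.0)],
    [(9.0, 9.0), (9.0, 10.0), (9.0, 11.0)],
]
scores = [0.9, 0.1, 0.7]

# The surviving medoids would be the nodes passed to the next clustering round.
medoids = select_medoids(clusters, scores, k=2)
print(medoids)
```

Repeating this step on the returned medoids would give the iterative, hierarchical behaviour the abstract describes, under the assumptions stated above.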

Clustering, Protein Folding +1

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

1 code implementation • CVPR 2024 • Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang

The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.

Attribute, Image Retrieval +3

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

1 code implementation • CVPR 2024 • Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

1 code implementation • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang

Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting appealing appearances.

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

1 code implementation • CVPR 2024 • Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).

Attribute, Conditional Text-to-Image Synthesis +1

CapHuman: Capture Your Moments in Parallel Universes

1 code implementation • CVPR 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.

Image Generation

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

no code implementations • CVPR 2024 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.

Hallucination, Position +3

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

no code implementations • 12 Dec 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.

Hallucination, Position +2

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering, Text Retrieval +5

Temporal Perceiving Video-Language Pre-training

no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization, which matches the subset of texts with the video features.

Contrastive Learning, Moment Retrieval +7

Unified Transformer Tracker for Object Tracking

1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack \cite{wang2021different} demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking, Object

Self-Paced Co-training

no code implementations • ICML 2017 • Fan Ma, Deyu Meng, Qi Xie, Zina Li, Xuanyi Dong

During the co-training process, labels of unlabeled instances in the training pool are very likely to be false, especially in the initial training rounds, while the standard co-training algorithm draws instances in a “draw without replacement” manner and does not remove these falsely labeled instances from training.

Few-Example Object Detection with Model Communication

1 code implementation • 26 Jun 2017 • Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.

Object, object-detection
