no code implementations • 30 Mar 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang
We select the highest-scoring clusters and use their medoid nodes for the next round of clustering, repeating this process until we obtain a hierarchical and informative representation of the protein.
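The select-medoids-and-recluster loop described above can be sketched generically. Note this is a toy illustration, not the paper's method: the cluster assignment (random seeds plus nearest-neighbor), the scoring function, and the distance are all placeholder assumptions standing in for the paper's learned components.

```python
import random

def medoid(points, members, dist):
    # The medoid is the member minimizing total distance to all other members.
    return min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))

def hierarchical_medoids(points, k, score, dist, rounds=3, seed=0):
    """Repeatedly cluster, keep the top-scoring clusters, and recurse on their medoids."""
    rng = random.Random(seed)
    active = list(range(len(points)))  # indices still in play at this level
    levels = [active]
    for _ in range(rounds):
        if len(active) <= k:
            break
        # Toy assignment: pick k random seeds, assign each node to its nearest seed.
        seeds = rng.sample(active, k)
        clusters = {s: [] for s in seeds}
        for i in active:
            nearest = min(seeds, key=lambda s: dist(points[i], points[s]))
            clusters[nearest].append(i)
        # Keep only the highest-scoring clusters and carry their medoids forward.
        ranked = sorted(clusters.values(), key=score, reverse=True)
        kept = ranked[: max(1, k // 2)]
        active = [medoid(points, c, dist) for c in kept if c]
        levels.append(active)
    return levels
```

Each entry of `levels` is one tier of the hierarchy, from all nodes down to a few representative medoids.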
no code implementations • 29 Mar 2024 • Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang
Reconstructing viewed images from human brain activity bridges human and computer vision through the brain-computer interface.
no code implementations • 24 Mar 2024 • Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang
The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.
1 code implementation • 22 Mar 2024 • Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang
Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.
no code implementations • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Specifically, we incorporate FLAME into both the 3D representation and the score distillation: 1) FLAME-based 3D Gaussian splatting, which drives the 3D Gaussian points by rigging each point to a FLAME mesh.
1 code implementation • 8 Feb 2024 • Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang
Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).
Ranked #1 on Conditional Text-to-Image Synthesis on COCO-MIG
1 code implementation • 1 Feb 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang
Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
no code implementations • 12 Dec 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation, especially when the relative distance between visual and text tokens is long.
Ranked #6 on Zero-Shot Video Question Answer on MSRVTT-QA
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts the start and end boundaries in a video given a text description, and text localization, which matches a subset of the texts with the video features.
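The moment-retrieval half of this task — choosing start and end boundaries for a query — can be illustrated with a minimal stand-in: given per-frame text-video similarity scores, pick the span with the highest mean score. This is only a scoring sketch; the paper uses a learned boundary head, not this exhaustive search, and `frame_scores` is an assumed input.

```python
def retrieve_moment(frame_scores, min_len=1):
    """Return the (start, end) frame span with the highest mean similarity.

    frame_scores: per-frame similarity of the video to the text query.
    min_len: shortest span (in frames) allowed as a moment.
    """
    n = len(frame_scores)
    # Prefix sums let us evaluate any span's mean in O(1).
    prefix = [0.0]
    for s in frame_scores:
        prefix.append(prefix[-1] + s)
    best, best_span = float("-inf"), (0, min_len - 1)
    for start in range(n):
        for end in range(start + min_len - 1, n):
            mean = (prefix[end + 1] - prefix[start]) / (end - start + 1)
            if mean > best:
                best, best_span = mean, (start, end)
    return best_span
```

With `min_len=1` this degenerates to the single best frame, which is why practical systems score whole spans (or regress boundaries) rather than individual frames.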
1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan
Although UniTrack (Wang et al., 2021) demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit large-scale tracking datasets for training and performs poorly on single object tracking.
no code implementations • 3 Apr 2020 • Hao Wang, Cheng Deng, Fan Ma, Yi Yang
Actor and action video segmentation with language queries aims to segment the objects referred to by the expression in the video.
Ranked #10 on Referring Expression Segmentation on J-HMDB
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on Weakly Supervised Action Localization on BEOID
no code implementations • ICML 2017 • Fan Ma, Deyu Meng, Qi Xie, Zina Li, Xuanyi Dong
During the co-training process, the labels assigned to unlabeled instances in the training pool are very likely to be false, especially in the initial training rounds; yet the standard co-training algorithm draws instances in a “without replacement” manner and never removes these falsely labeled instances from training.
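The distinction can be sketched as a pseudo-labeling loop: instead of permanently moving selected instances out of the pool, the full pool is re-scored every round, so an instance mislabeled early is dropped once its confidence falls below the threshold. This is a hedged illustration of the "with replacement" idea, not the paper's algorithm; the `confidence` callback is a hypothetical stand-in for the two view classifiers' agreement.

```python
def cotrain_with_replacement(pool, gold, confidence, rounds=3, thresh=0.8):
    """Draw-with-replacement pseudo-labeling loop.

    pool: unlabeled instances; gold: the trusted labeled set.
    confidence(x, gold, pseudo) -> (label, score) is a placeholder for the
    two view classifiers' joint prediction and agreement score.
    """
    pseudo = {}
    for _ in range(rounds):
        pseudo = {}  # discard last round's selections and re-draw from the full pool
        for x in pool:
            label, score = confidence(x, gold, pseudo)
            if score >= thresh:
                pseudo[x] = label
        # (A real implementation would retrain both view classifiers on
        #  gold + pseudo here before the next round.)
    return pseudo
```

Because `pseudo` is rebuilt each round, an early false positive simply stops being selected once the (retrained) classifiers become less confident about it — the behavior standard without-replacement co-training cannot recover.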
1 code implementation • 26 Jun 2017 • Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng
Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.
Ranked #1 on Weakly Supervised Object Detection on MS COCO