1 code implementation • 30 May 2024 • Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
Learning a skill generally relies on both the practical experience of a doer and the insightful high-level guidance of an instructor.
no code implementations • 13 May 2024 • Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai
To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code.
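The two-stage strategy above can be sketched as sequential fine-tuning passes. This is an illustrative outline only: the `finetune` helper and the dataset names are hypothetical stand-ins, not the authors' actual training code.

```python
# Hypothetical stand-in for a fine-tuning pass; a real pipeline would
# invoke a trainer (e.g., on Llama-2 weights) rather than tag strings.
def finetune(model: str, dataset: str) -> str:
    """Return a label for the model after training on `dataset`."""
    return f"{model}+{dataset}"

base = "llama-2"
# Stage 1: fine-tune on pure chain-of-thought (CoT) data.
intermediate = finetune(base, "cot_data")
# Stage 2: continue training on code-nested data to obtain the final model.
mumath_code = finetune(intermediate, "code_nested_data")
print(mumath_code)  # → llama-2+cot_data+code_nested_data
```

The key design point is that Stage 2 starts from the Stage-1 intermediate model rather than from the base model, so the code-nested data builds on CoT ability already acquired.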
1 code implementation • 9 May 2024 • Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, WangMeng Zuo
In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.
1 code implementation • 9 Apr 2024 • Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen
The proposed Historical Prediction Attention, together with the Agent Attention and Mode Attention, is further formulated as the Triple Factorized Attention module, serving as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance and generates accurate and stable future trajectories.
1 code implementation • 26 Dec 2023 • Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios.
1 code implementation • 19 Dec 2023 • Yufei Cai, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hu Han, WangMeng Zuo
To decouple irrelevant attributes (i.e., background and pose) from the subject embedding, we further present several attribute mappers that encode each image as several image-specific subject-unrelated embeddings.
no code implementations • 21 Aug 2023 • Changzhen Li, Jie Zhang, Yang Wei, Zhilong Ji, Jinfeng Bai, Shiguang Shan
Vision Transformers have achieved great success in computer vision, delivering exceptional performance across various tasks.
no code implementations • 21 Aug 2023 • Zhuang Liu, Ye Yuan, Zhilong Ji, Jinfeng Bai, Xiang Bai
Then we design a semantic aware module (SAM), which projects the visual and classification feature into semantic space.
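The idea of projecting two kinds of features into a common semantic space can be sketched with plain linear maps. This is a minimal numpy illustration under assumed dimensions; the projection matrices and the cosine-similarity alignment score are assumptions for the sketch, not the paper's exact module.

```python
import numpy as np

rng = np.random.default_rng(0)
visual_feat = rng.standard_normal(16)  # e.g., backbone feature
class_feat = rng.standard_normal(8)    # e.g., classification feature

# Learned-in-practice projections; random here for illustration.
W_v = rng.standard_normal((4, 16)) * 0.1  # visual -> semantic space
W_c = rng.standard_normal((4, 8)) * 0.1   # classification -> semantic space

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sem_v = W_v @ visual_feat
sem_c = W_c @ class_feat
score = cosine(sem_v, sem_c)  # alignment measured in the shared space
print(-1.0 <= score <= 1.0)   # cosine similarity is bounded
```

Mapping both modalities into one space is what makes a single alignment score (here, cosine similarity) meaningful.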
1 code implementation • 27 Jun 2023 • Yuchen Su, Zhineng Chen, Zhiwen Shao, Yuning Du, Zhilong Ji, Jinfeng Bai, Yong Zhou, Yu-Gang Jiang
Next, we propose a dual assignment scheme for speed acceleration.
1 code implementation • CVPR 2023 • Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo
Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from an input semantic map.
1 code implementation • 9 Apr 2023 • Zhongqi Wang, Jie Zhang, Zhilong Ji, Jinfeng Bai, Shiguang Shan
The style aggregator module, meanwhile, generates paintings in a style corresponding to a reference image.
1 code implementation • ICCV 2023 • Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo
In addition to their unprecedented ability in imaginary creation, large text-to-image models are expected to incorporate customized concepts in image generation.
1 code implementation • 27 Dec 2022 • Zhiwei Hu, Bo Chen, Yuan Gao, Zhilong Ji, Jinfeng Bai
The task of referring video object segmentation aims to segment, in the frames of a given video, the object to which the referring expressions refer.
no code implementations • 27 Dec 2022 • Bo Chen, Zhiwei Hu, Zhilong Ji, Jinfeng Bai, WangMeng Zuo
The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image.
1 code implementation • CVPR 2023 • Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
Nonetheless, visual data (e.g., images) is by default a prerequisite for learning prompts in existing methods.
no code implementations • 30 Oct 2022 • Zhuang Liu, Zhichao Zhao, Ye Yuan, Zhi Qiao, Jinfeng Bai, Zhilong Ji
In this technical report, we briefly introduce the solution of our team "summer" for Atmospheric Turbulence Mitigation in the UG$^2$+ Challenge in CVPR 2022.
no code implementations • 18 Oct 2022 • Jiajun Zhang, BoYu Chen, Zhilong Ji, Jinfeng Bai, Zonghai Hu
This paper describes the approach we have taken in the challenge.
3 code implementations • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai
Recently, most handwritten mathematical expression recognition (HMER) methods adopt the encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.
1 code implementation • 18 Jul 2022 • Yabo Zhang, Mingshuai Yao, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, WangMeng Zuo
In this paper, we present a novel one-shot generative domain adaption method, i.e., DiFa, for diverse generation and faithful adaptation.
2 code implementations • CVPR 2022 • Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai
In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
1 code implementation • ICCV 2021 • Yuxiang Wei, Yupeng Shi, Xiao Liu, Zhilong Ji, Yuan Gao, Zhongqin Wu, WangMeng Zuo
It simply encourages the output variations caused by perturbations along different latent dimensions to be orthogonal, and the Jacobian with respect to the input is calculated to represent these variations.
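The orthogonality idea above can be sketched numerically: perturb each latent dimension, collect the induced output variations as Jacobian columns, and penalize non-orthogonality between those columns. This is a minimal numpy sketch with a toy linear generator, not the paper's network or loss; the finite-difference Jacobian stands in for backprop-computed derivatives.

```python
import numpy as np

def jacobian(G, z, eps=1e-4):
    """Finite-difference Jacobian of G at z; column i is the output
    variation induced by perturbing latent dimension i."""
    cols = []
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((G(z + dz) - G(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)

def orthogonality_penalty(J):
    """Sum of squared off-diagonal entries of J^T J; zero exactly when
    the Jacobian columns (per-dimension variations) are orthogonal."""
    gram = J.T @ J
    off = gram - np.diag(np.diag(gram))
    return float(np.sum(off ** 2))

# Toy generator whose Jacobian columns are orthogonal by construction.
W = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
G = lambda z: W @ z
J = jacobian(G, np.zeros(2))
print(orthogonality_penalty(J) < 1e-8)  # → True
```

In training, such a penalty would be added to the generator loss so that each latent dimension controls an independent direction of output variation.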
no code implementations • 5 Aug 2021 • Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu
In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., "dog $\to$ play $\to$ ball" in the semantic structure for an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities.
1 code implementation • 5 Jul 2021 • Xin Cai, BoYu Chen, Jiabei Zeng, Jiajun Zhang, Yunjia Sun, Xiao Wang, Zhilong Ji, Xiao Liu, Xilin Chen, Shiguang Shan
This paper presents a method for gaze estimation according to face images.
no code implementations • 2 Jul 2021 • Pengcheng Wang, Lingqiao Ji, Zhilong Ji, Yuan Gao, Xiao Liu
In this technical report, we briefly introduce the solution of our team "TAL-ai" for (semi-)supervised face detection in low-light conditions in the UG$^2$+ Challenge in CVPR 2021.
no code implementations • 21 Apr 2020 • Pengcheng Wang, ZiHao Wang, Zhilong Ji, Xiao Liu, Songfan Yang, Zhongqin Wu
This paper introduces our approach to the EmotioNet Challenge 2020.