no code implementations • 19 Dec 2024 • Yuchen Su, Zhineng Chen, Yongkun Du, Zhilong Ji, Kai Hu, Jinfeng Bai, Xieping Gao
Based on this, we introduce a position-supervised classification loss to guide the task-aligned learning of ERRNet.
1 code implementation • 2 Oct 2024 • Kailai Feng, Yabo Zhang, Haodong Yu, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, WangMeng Zuo
Artistic typography is a technique to visualize the meaning of an input character in an imaginative and readable manner.
1 code implementation • 30 May 2024 • Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
The inferred results of LLMs are used as restarting points for the next stage of gradient optimization.
1 code implementation • 13 May 2024 • Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai
To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: in Stage-1, we finetune Llama-2 on pure CoT data to obtain an intermediate model, which is then trained on the code-nested data in Stage-2 to produce the final MuMath-Code.
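The staged schedule can be illustrated with a toy gradient-descent model: train on one dataset, then continue from those weights on a second dataset. The `finetune` function, the datasets, and the learning rate below are illustrative stand-ins, not the paper's actual training code.

```python
import numpy as np

def finetune(weights, data, lr=0.1, steps=50):
    """One training stage: plain gradient descent on squared error
    for a one-parameter linear model y = w * x."""
    w = weights.copy()
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w[0] * x - y) * x   # d/dw of (w*x - y)^2
            w[0] -= lr * grad
    return w

# Stage-1: "CoT-style" data teaches the base mapping (here y = 2x).
cot_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
# Stage-2: "code-nested" data continues from the Stage-1 weights (y = 3x).
code_data = [(1.0, 3.0), (2.0, 6.0)]

w0 = np.zeros(1)                # base model
w1 = finetune(w0, cot_data)     # intermediate model after Stage-1
w2 = finetune(w1, code_data)    # final model after Stage-2
```

The point of the two stages is the hand-off: Stage-2 starts from the Stage-1 parameters rather than from scratch, so the intermediate model's knowledge is the initialization for the second dataset.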
1 code implementation • 9 May 2024 • Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, WangMeng Zuo
In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both high identity fidelity and flexible editability.
1 code implementation • CVPR 2024 • Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen
The proposed Historical Prediction Attention together with the Agent Attention and Mode Attention is further formulated as the Triple Factorized Attention module, serving as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance, and generates accurate and stable future trajectories.
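The factorized design can be sketched generically: instead of one quadratic attention over all agent-mode-time tokens, attention is applied along one axis at a time. The tensor shapes and the plain scaled-dot-product form below are simplifying assumptions, not HPNet's actual layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis,
    with identity Q/K/V projections for brevity."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

# Trajectory features: [agents, modes, historical steps, feature dim].
feats = np.random.default_rng(0).normal(size=(4, 6, 10, 16))

# Factorized attention: attend along each axis separately rather than
# flattening all 4*6*10 tokens into a single attention matrix.
h = self_attention(feats)                                    # over history
h = np.swapaxes(self_attention(np.swapaxes(h, 0, 2)), 0, 2)  # over agents
h = np.swapaxes(self_attention(np.swapaxes(h, 1, 2)), 1, 2)  # over modes

assert h.shape == feats.shape
```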
1 code implementation • 26 Dec 2023 • Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios.
1 code implementation • 19 Dec 2023 • Yufei Cai, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hu Han, WangMeng Zuo
To decouple irrelevant attributes (i.e., background and pose) from the subject embedding, we further present several attribute mappers that encode each image as several image-specific subject-unrelated embeddings.
1 code implementation • 29 Nov 2023 • Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang
While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect simply via zero-shot prompting.
1 code implementation • 6 Sep 2023 • Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang
Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools.
no code implementations • 21 Aug 2023 • Changzhen Li, Jie Zhang, Yang Wei, Zhilong Ji, Jinfeng Bai, Shiguang Shan
Vision Transformers have achieved great success in computer vision, delivering exceptional performance across various tasks.
2 code implementations • 27 Jun 2023 • Yuchen Su, Zhineng Chen, Zhiwen Shao, Yuning Du, Zhilong Ji, Jinfeng Bai, Yong Zhou, Yu-Gang Jiang
Next, we propose a dual assignment scheme for speed acceleration.
1 code implementation • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai
Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application.
1 code implementation • CVPR 2023 • Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo
Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from an input semantic map.
1 code implementation • 9 May 2023 • Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang
In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification for the first time.
Ranked #1 on Scene Text Recognition on SVT-P
1 code implementation • 9 Apr 2023 • Zhongqi Wang, Jie Zhang, Zhilong Ji, Jinfeng Bai, Shiguang Shan
The style aggregator module generates paintings in a style corresponding to a reference image.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence 2023 • Yijie Lin, Yuanbiao Gou, Xiaotian Liu, Jinfeng Bai, Jiancheng Lv, Xi Peng
In this article, we propose a unified framework to solve the following two challenging problems in incomplete multi-view representation learning: i) how to learn a consistent representation unifying different views, and ii) how to recover the missing views.
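The two goals stated above can be sketched as a combined objective: a consistency term that pulls the views' representations together, and a recovery term that scores reconstruction of a missing view only where ground truth exists. The variable names and the specific squared-error form are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def dual_objective(z1, z2, x2, x2_hat, observed):
    """Illustrative combination of the two goals:
    (i) consistency: embeddings of the two views should agree;
    (ii) recovery: reconstruction error on view 2 is measured only
        for samples where that view was actually observed."""
    consistency = np.mean((z1 - z2) ** 2)
    mask = observed.astype(float)[:, None]          # 1 = view 2 present
    denom = max(mask.sum() * x2.shape[1], 1.0)
    recovery = np.sum(mask * (x2 - x2_hat) ** 2) / denom
    return consistency + recovery

rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
x2 = rng.normal(size=(5, 3))
x2_hat = x2 + 0.1                     # imperfect reconstruction
observed = np.array([1, 1, 0, 1, 0])  # samples 2 and 4 are missing view 2
loss = dual_objective(z1, z2, x2, x2_hat, observed)
```

The mask is the key detail: predictions for missing views contribute nothing to the loss, so the model can hallucinate them freely during training while still being penalized wherever supervision exists.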
2 code implementations • ICCV 2023 • Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo
In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to incorporate customized concepts in image generation.
1 code implementation • 27 Dec 2022 • Zhiwei Hu, Bo Chen, Yuan Gao, Zhilong Ji, Jinfeng Bai
The task of referring video object segmentation aims to segment, in the frames of a given video, the object to which the referring expression refers.
no code implementations • 27 Dec 2022 • Bo Chen, Zhiwei Hu, Zhilong Ji, Jinfeng Bai, WangMeng Zuo
The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image.
1 code implementation • CVPR 2023 • Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo
Nonetheless, visual data (e.g., images) is by default a prerequisite for learning prompts in existing methods.
no code implementations • 30 Oct 2022 • Zhuang Liu, Zhichao Zhao, Ye Yuan, Zhi Qiao, Jinfeng Bai, Zhilong Ji
In this technical report, we briefly introduce the solution of our team "summer" for Atmospheric Turbulence Mitigation in the UG$^2$+ Challenge at CVPR 2022.
no code implementations • 18 Oct 2022 • Jiajun Zhang, BoYu Chen, Zhilong Ji, Jinfeng Bai, Zonghai Hu
This paper describes the approach we have taken in the challenge.
no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios in automatic speech recognition, due to code-switching between multiple languages and its frequent occurrence in daily life.
3 code implementations • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai
Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.
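The encoder-decoder-with-attention pattern described here can be sketched as a single decoding step: score every encoder location against the decoder state, pool a context vector, and predict the next markup token. The shapes, the dot-product scoring, and the output head are simplified assumptions, not any specific HMER model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(enc_feats, dec_state, W_out):
    """One attention-based decoding step over flattened image features."""
    scores = enc_feats @ dec_state              # one score per location
    alpha = softmax(scores)                     # attention weights
    context = alpha @ enc_feats                 # pooled image context
    logits = W_out @ np.concatenate([dec_state, context])
    return int(np.argmax(logits)), alpha

rng = np.random.default_rng(0)
enc_feats = rng.normal(size=(36, 32))   # e.g. a 6x6 CNN feature map, flattened
dec_state = rng.normal(size=32)
W_out = rng.normal(size=(100, 64))      # vocabulary of 100 markup symbols
token, alpha = decode_step(enc_feats, dec_state, W_out)
```

At each step the attention weights `alpha` effectively point at the region of the formula image being transcribed, which is what lets the decoder emit markup symbols in order without explicit symbol segmentation.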
1 code implementation • 18 Jul 2022 • Yabo Zhang, Mingshuai Yao, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, WangMeng Zuo
In this paper, we present a novel one-shot generative domain adaptation method, i.e., DiFa, for diverse generation and faithful adaptation.
no code implementations • 27 Jun 2022 • Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai
Using the TALCS corpus, we conduct ASR experiments with two popular speech recognition toolkits, ESPnet and Wenet, to build baseline systems.
1 code implementation • 1 Mar 2022 • Yuting Nie, Junhong Zhao, Wei-Qiang Zhang, Jinfeng Bai
It has a profound impact on the multilingual interoperability of an intelligent speech system.