no code implementations • 27 Mar 2025 • Jingye Chen, Yuzhong Zhao, Yupan Huang, Lei Cui, Li Dong, Tengchao Lv, Qifeng Chen, Furu Wei
Recent advances in generative models have significantly impacted game generation.
no code implementations • CVPR 2025 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen
This work focuses on open-domain 4D avatarization, with the purpose of creating a 4D avatar from a portrait image in an arbitrary style.
no code implementations • 23 Dec 2024 • Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen
Directly applying image VAEs to individual frames in isolation can result in temporal inconsistencies and suboptimal compression rates due to a lack of temporal compression.
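A small worked calculation makes the compression-rate point concrete. It assumes a hypothetical image VAE with 8x spatial downsampling and 4 latent channels, and a hypothetical video VAE that adds 4x temporal downsampling. These factors are illustrative only and are not taken from the paper.

```python
# Hypothetical downsampling factors for illustration; real VAEs differ.
def latent_elements(frames, height, width, channels, spatial=8, temporal=1):
    """Number of latent values for a clip under the given downsampling factors."""
    return (frames // temporal) * (height // spatial) * (width // spatial) * channels


frames, height, width = 16, 256, 256
pixels = frames * height * width * 3  # RGB input values

# Image VAE applied frame by frame: the time axis is not compressed at all.
per_frame = latent_elements(frames, height, width, channels=4)
# Assumed video VAE that also compresses time by 4x.
video = latent_elements(frames, height, width, channels=4, temporal=4)

print(pixels / per_frame)  # 48.0
print(pixels / video)      # 192.0
```

Per-frame encoding leaves the frame axis untouched, so any redundancy between neighboring frames is left in the latent. An image VAE has no way to exploit it.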
no code implementations • 7 Aug 2024 • Kien T. Pham, Jingye Chen, Qifeng Chen
We present TALE, a novel training-free framework that harnesses the generative capabilities of text-to-image diffusion models to address cross-domain image composition, a task that focuses on flawlessly incorporating user-specified objects into a designated visual context regardless of domain disparity.
1 code implementation • 29 May 2024 • Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen
With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning.
no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
The diffusion model has proven to be a powerful generative model in recent years, yet generating visual text with it remains a challenge.
Ranked #6 on Image Generation on TextAtlasEval
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
In this paper, we present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.
no code implementations • 24 Nov 2022 • Haiyang Yu, Jingye Chen, Bin Li, Xiangyang Xue
In this paper, we represent each Chinese character as a stroke tree, organized according to its radical structure, to fully exploit the merits of both the radical and stroke levels.
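A stroke tree of this kind can be sketched as a small recursive data structure. This is a minimal illustration, not the paper's implementation. The structure labels (`left-right`, `radical:木`) and the five stroke categories (horizontal, vertical, left-falling, dot, turning) are assumptions chosen for the example. Flattening the tree depth-first recovers the character's stroke sequence, while the internal nodes keep the radical layout.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Five basic stroke categories (assumed labels for illustration).
H, V, L, D, T = "horizontal", "vertical", "left-falling", "dot", "turning"


@dataclass
class Node:
    """Internal node: a structure label (e.g. a radical layout) over its components."""
    label: str
    children: List["Tree"] = field(default_factory=list)


# A tree is either an internal Node or a leaf holding one stroke category.
Tree = Union[Node, str]


def strokes(tree: Tree) -> List[str]:
    """Flatten a stroke tree into its stroke sequence, depth-first and left to right."""
    if isinstance(tree, str):
        return [tree]
    seq: List[str] = []
    for child in tree.children:
        seq.extend(strokes(child))
    return seq


def radicals(tree: Tree) -> List[str]:
    """Collect the labels of internal nodes in pre-order."""
    if isinstance(tree, str):
        return []
    out = [tree.label]
    for child in tree.children:
        out.extend(radicals(child))
    return out


# Hypothetical trees: 木 as a radical of four strokes, 林 as 木 beside 木.
mu = Node("radical:木", [H, V, L, D])
lin = Node("left-right", [mu, mu])

print(len(strokes(lin)))  # 8
print(radicals(lin))      # ['left-right', 'radical:木', 'radical:木']
```

Keeping both views in one tree means a model can be supervised at the radical level (internal nodes) and at the stroke level (leaves) from the same annotation.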
1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
The surge of pre-training has recently driven the rapid development of document understanding.
Ranked #8 on Semantic entity labeling on FUNSD
1 code implementation • 30 Dec 2021 • Haiyang Yu, Jingye Chen, Bin Li, Jianqi Ma, Mengnan Guan, Xixi Xu, Xiaocong Wang, Shaobo Qu, Xiangyang Xue
The experimental results indicate that baselines perform worse on Chinese text recognition (CTR) datasets than on English datasets, because Chinese text has characteristics quite different from those of the Latin alphabet.
2 code implementations • 13 Dec 2021 • Jingye Chen, Haiyang Yu, Jianqi Ma, Bin Li, Xiangyang Xue
However, the recognition of low-resolution scene text images remains a challenge.
1 code implementation • 3 Dec 2021 • Jingye Chen, Jieneng Chen, Zongwei Zhou, Bin Li, Alan Yuille, Yongyi Lu
However, these approaches formulated skin cancer diagnosis as a simple classification task, overlooking the potential benefit of lesion segmentation.
8 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
Text recognition is a long-standing research problem for document digitalization.
Ranked #1 on Handwritten Text Recognition on IAM (line-level) (using extra training data)
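Line-level handwritten text recognition on IAM is commonly scored by character error rate (CER): the edit distance between predicted and reference transcriptions, divided by the reference length. The sketch below computes a corpus-level CER with a standard Levenshtein distance. It is a generic illustration, and the exact normalization used by any particular leaderboard should be checked against that benchmark.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


# One dropped "t" in a 16-character line.
print(cer(["handwritten text"], ["handwriten text"]))  # 0.0625
```

Summing edits and characters over the whole set, rather than averaging per-line rates, keeps long lines from being under-weighted.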
1 code implementation • 22 Jun 2021 • Jingye Chen, Bin Li, Xiangyang Xue
Inspired by the fact that humans who have learned the stroke orders of some characters can generalize to writing characters they have never seen before, we propose a stroke-based method that decomposes each character into a sequence of strokes, the most basic units of Chinese characters.
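Here is one way to see why stroke decomposition helps with unseen characters. If a model outputs a stroke sequence, a character can be recognized by looking that sequence up in a lexicon, so adding a new character only requires adding its stroke sequence, not new training images. The sketch below assumes the common five-category stroke coding (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning) and a tiny hypothetical lexicon. It is not the paper's decoding procedure.

```python
from collections import defaultdict
from typing import Dict, List

# Stroke codes: 1 horizontal, 2 vertical, 3 left-falling, 4 dot/right-falling, 5 turning.
# A tiny hypothetical lexicon mapping characters to stroke-code sequences.
LEXICON: Dict[str, str] = {
    "十": "12",
    "大": "134",
    "天": "1134",
    "木": "1234",
    "林": "12341234",
    "人": "34",
    "八": "34",
}


def build_index(lexicon: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the lexicon: stroke sequence -> all characters that share it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for char, seq in lexicon.items():
        index[seq].append(char)
    return dict(index)


def decode(pred_seq: str, index: Dict[str, List[str]]) -> List[str]:
    """Return every character whose stroke sequence exactly matches the prediction."""
    return index.get(pred_seq, [])


index = build_index(LEXICON)
print(decode("134", index))  # ['大']
print(decode("34", index))   # ['人', '八'] (ambiguous sequence)

# "Unseen" character: extending the lexicon needs no change to the stroke predictor.
index = build_index({**LEXICON, "土": "121"})
print(decode("121", index))  # ['土']
```

Exact lookup is brittle when the predicted sequence contains errors, and shared sequences such as those of 人 and 八 need an extra disambiguation step. How the paper handles these cases is not shown here.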
1 code implementation • CVPR 2021 • Jingye Chen, Bin Li, Xiangyang Xue
Image super-resolution, which is often regarded as a preprocessing procedure of scene text recognition, aims to recover the realistic features from a low-resolution text image.
Ranked #4 on Optical Character Recognition (OCR) on Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Tasks: Image Super-Resolution, Optical Character Recognition (OCR)
no code implementations • 29 Mar 2019 • Jieneng Chen, Jingye Chen, Ruiming Zhang, Xiaobin Hu
Motivated by the large body of research on the human brain and reinforcement learning, scientists have investigated how robots can autonomously tackle complex tasks, such as self-driving agent control, in a human-like way.