Search Results for author: Jingtuo Liu

Found 36 papers, 17 papers with code

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

2 code implementations · 12 Apr 2021 · Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations involved, which guarantees high efficiency.
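
As a rough sketch of the point-gathering idea (tensor shapes, names, and the greedy decode below are assumptions for illustration, not the authors' released code), per-point character logits can be gathered along a predicted text center line and CTC-decoded without any NMS or RoI step:

```python
import torch

def pg_ctc_decode(char_logits, center_points, blank_id=0):
    """Gather per-point character logits along a text center line and
    greedily CTC-decode them (collapse repeats, drop blanks).

    char_logits:   (C, H, W) character classification map (assumed layout)
    center_points: (N, 2) long tensor of (y, x) points ordered along the line
    """
    ys, xs = center_points[:, 0], center_points[:, 1]
    gathered = char_logits[:, ys, xs].t()      # (N, C) logits per center point
    ids = gathered.argmax(dim=1).tolist()      # greedy per-point labels

    decoded, prev = [], blank_id
    for i in ids:
        if i != blank_id and i != prev:        # CTC collapse rule
            decoded.append(i)
        prev = i
    return decoded
```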

Ranked #1 on Scene Text Detection on ICDAR 2015 (Accuracy metric)

Optical Character Recognition (OCR) Scene Text Detection +1

PyramidBox: A Context-assisted Single Shot Face Detector

5 code implementations · ECCV 2018 · Xu Tang, Daniel K. Du, Zeqiang He, Jingtuo Liu

This paper proposes a novel context-assisted single shot face detector, named PyramidBox, to handle the hard face detection problem.

Face Detection

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

1 code implementation · 22 Nov 2022 · Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage.

Talking Face Generation

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

1 code implementation · 16 Sep 2019 · Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.

Scene Text Detection Scene Text Recognition +2

MobileFaceSwap: A Lightweight Framework for Video Face Swapping

1 code implementation · 11 Jan 2022 · Zhiliang Xu, Zhibin Hong, Changxing Ding, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding

In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by dynamically adjusting the model parameters according to the identity information.
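
A hedged illustration of the "identity-aware dynamic network" idea (the module, shapes, and sigmoid modulation are hypothetical, not the released model): an identity embedding predicts a per-channel scaling of a convolution's weights, so one lightweight network adapts to each source identity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityModulatedConv(nn.Module):
    """Conv whose weights are scaled per output channel by a vector
    predicted from an identity embedding (illustrative only, batch size 1)."""
    def __init__(self, in_ch, out_ch, id_dim=512, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.to_scale = nn.Linear(id_dim, out_ch)   # predicts the modulation

    def forward(self, x, id_embed):
        # id_embed: (1, id_dim) identity feature; x: (1, in_ch, H, W)
        scale = torch.sigmoid(self.to_scale(id_embed))   # (1, out_ch)
        w = self.weight * scale.view(-1, 1, 1, 1)        # modulate conv weights
        return F.conv2d(x, w, padding=1)
```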

Face Swapping Knowledge Distillation

Learning Generalized Spoof Cues for Face Anti-spoofing

6 code implementations · 8 May 2020 · Haocheng Feng, Zhibin Hong, Haixiao Yue, Yang Chen, Keyao Wang, Junyu Han, Jingtuo Liu, Errui Ding

In this paper, we reformulate FAS from an anomaly detection perspective and propose a residual-learning framework to learn the discriminative live-spoof differences, which are defined as the spoof cues.
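
A minimal sketch of the residual spoof-cue idea (the loss below is a simplification with assumed names; the paper's full objective includes additional terms): a generator predicts a residual cue map, live faces are regressed toward an all-zero cue, and spoof cues are left unconstrained, casting FAS as anomaly detection.

```python
import torch

def spoof_cue_loss(cue_map, labels):
    """cue_map: (B, C, H, W) predicted residual spoof cues.
    labels:  (B,) with 1 for live, 0 for spoof.
    Only live samples are pushed toward a zero cue map; spoof cues stay free."""
    live = labels.float()
    cue_energy = (cue_map ** 2).mean(dim=(1, 2, 3))   # per-sample cue magnitude
    return (live * cue_energy).sum() / live.sum().clamp(min=1)
```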

Anomaly Detection Face Anti-Spoofing

Editing Text in the Wild

2 code implementations · 8 Aug 2019 · Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.
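
The three-module layout reads roughly as the composition below (a skeleton with placeholder sub-modules, not the released SRNet): text conversion renders the target text in the source style, background inpainting erases the original text, and a fusion module blends the two.

```python
import torch.nn as nn

class SRNetSketch(nn.Module):
    """Skeleton of the three-module pipeline described above (illustrative)."""
    def __init__(self, text_conversion, background_inpainting, fusion):
        super().__init__()
        self.text_conversion = text_conversion            # styled target text
        self.background_inpainting = background_inpainting  # text-free background
        self.fusion = fusion                              # blend text onto background

    def forward(self, source_image, target_text_render):
        styled_text = self.text_conversion(source_image, target_text_render)
        background = self.background_inpainting(source_image)
        return self.fusion(styled_text, background)
```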

Image Inpainting Image-to-Image Translation +1

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

1 code implementation · 20 Sep 2019 · He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts.

Entity Extraction using GAN Optical Character Recognition (OCR)

ACFNet: Attentional Class Feature Network for Semantic Segmentation

1 code implementation · ICCV 2019 · Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, Errui Ding

Recent works have made great progress in semantic segmentation by exploiting richer context, most of which are designed from a spatial perspective.

Segmentation Semantic Segmentation

Few-Shot Font Generation by Learning Fine-Grained Local Styles

2 code implementations · CVPR 2022 · Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Instead of explicitly disentangling global or component-wise modeling, the cross-attention mechanism can attend to the right local styles in the reference glyphs and aggregate the reference styles into a fine-grained style representation for the given content glyphs.
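
A minimal sketch of the cross-attention aggregation described above (dimensions and the module name are assumptions): content-glyph tokens act as queries over local reference-glyph tokens, so each content location attends to the relevant local styles.

```python
import torch.nn as nn

class LocalStyleAggregator(nn.Module):
    """Cross-attention: content glyph tokens query reference glyph tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, content_tokens, reference_tokens):
        # content_tokens:   (B, N_content, dim) features of the content glyph
        # reference_tokens: (B, N_ref, dim)     local features of reference glyphs
        style, _ = self.attn(query=content_tokens,
                             key=reference_tokens,
                             value=reference_tokens)
        return style   # fine-grained style aligned to the content layout
```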

Font Generation

PyramidBox++: High Performance Detector for Finding Tiny Face

4 code implementations · 31 Mar 2019 · Zhihang Li, Xu Tang, Junyu Han, Jingtuo Liu, Ran He

With the rapid development of deep convolutional neural networks, face detection has made great progress in recent years.

Data Augmentation Face Detection +1

Biphasic Learning of GANs for High-Resolution Image-to-Image Translation

no code implementations · 14 Apr 2019 · Jie Cao, Huaibo Huang, Yi Li, Jingtuo Liu, Ran He, Zhenan Sun

In this work, we present a novel training framework for GANs, namely biphasic learning, to achieve image-to-image translation in multiple visual domains at $1024^2$ resolution.

Image-to-Image Translation Mutual Information Estimation +2

Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning

no code implementations · ICCV 2019 · Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu

Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data.

HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

no code implementations · 19 Dec 2019 · Yang Liu, Xu Tang, Xiang Wu, Junyu Han, Jingtuo Liu, Errui Ding

In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly compensates outer faces with high-quality anchors.
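
A rough sketch of the anchor-compensation step (thresholds, top-k, and the bookkeeping tensors are assumptions; box_iou comes from torchvision): faces that matched too few anchors at assignment time are compensated with unmatched anchors whose regressed boxes overlap them strongly.

```python
import torch
from torchvision.ops import box_iou

def compensate_outer_faces(regressed_boxes, anchor_matched, face_boxes,
                           face_num_matched, iou_thresh=0.5, top_k=3):
    """Assign extra positives to faces with too few matched anchors,
    using the IoU of the *regressed* anchor boxes (illustrative only)."""
    extra = []
    free = ~anchor_matched                               # anchors not yet matched
    ious = box_iou(regressed_boxes[free], face_boxes)    # (A_free, num_faces)
    free_idx = torch.nonzero(free).squeeze(1)
    for f in range(face_boxes.size(0)):
        if face_num_matched[f] >= top_k:
            continue                                     # already well covered
        need = top_k - int(face_num_matched[f])
        cand = ious[:, f]
        scores, order = cand.topk(min(need, cand.numel()))
        keep = order[scores > iou_thresh]
        extra.append((f, free_idx[keep]))
    return extra    # list of (face index, compensated anchor indices)
```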

Face Detection Multi-Task Learning +2

Learning Global Structure Consistency for Robust Object Tracking

no code implementations · 26 Aug 2020 · Bi Li, Chengquan Zhang, Zhibin Hong, Xu Tang, Jingtuo Liu, Junyu Han, Errui Ding, Wenyu Liu

Unlike many existing trackers that focus on modeling only the target, in this work, we consider the transient variations of the whole scene.

Object Visual Object Tracking

Real Image Super Resolution Via Heterogeneous Model Ensemble using GP-NAS

no code implementations · 2 Sep 2020 · Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding

With advances in deep neural networks (DNNs), recent state-of-the-art (SOTA) image super-resolution (SR) methods have achieved impressive performance using deep residual networks with dense skip connections.

Image Super-Resolution Neural Architecture Search

FaceController: Controllable Attribute Editing for Face in the Wild

no code implementations · 23 Feb 2021 · Zhiliang Xu, Xiyu Yu, Zhibin Hong, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

By simply employing some existing and easily obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.

Ranked #1 on Face Swapping on FaceForensics++ (FID metric)

Attribute Disentanglement +1

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

no code implementations · CVPR 2022 · Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics.

Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Contrastive Learning Cross-Modal Retrieval +1

Few-Shot Head Swapping in the Wild

no code implementations · CVPR 2022 · Changyong Shu, Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.

Face Swapping

TRUST: An Accurate and End-to-End Table structure Recognizer Using Splitting-based Transformers

no code implementations · 31 Aug 2022 · Zengyuan Guo, Yuechen Yu, Pengyuan Lv, Chengquan Zhang, Haojie Li, Zhihui Wang, Kun Yao, Jingtuo Liu, Jingdong Wang

The Vertex-based Merging Module aggregates local contextual information between adjacent basic grids, enabling it to accurately merge basic grids that belong to the same spanning cell.
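
One way to read the merging step (a hedged sketch; the merge-probability inputs and union-find grouping are illustrative, not the paper's exact formulation): given predicted probabilities that adjacent basic grids belong together, grids are grouped into spanning cells.

```python
def merge_basic_grids(merge_right, merge_down, rows, cols, thresh=0.5):
    """merge_right[r][c]: prob that grid (r, c) merges with (r, c+1)
    merge_down[r][c]:  prob that grid (r, c) merges with (r+1, c)
    Returns one cell id per basic grid (union-find grouping, illustrative)."""
    parent = list(range(rows * cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols and merge_right[r][c] > thresh:
                union(i, i + 1)
            if r + 1 < rows and merge_down[r][c] > thresh:
                union(i, i + cols)
    return [find(i) for i in range(rows * cols)]
```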

Table Recognition

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations · 27 Sep 2022 · Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, so that the generator's advantages can be used to optimize identity similarity.

Face Swapping

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

no code implementations · 9 Dec 2022 · Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike

This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.
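
A hedged sketch of that masking step (the mask ratio, crop layout, and channel stacking are assumptions): the lower part of an aligned target face crop is blanked out, and the model inpaints it conditioned on a reference frame and audio features.

```python
import torch

def build_masked_input(target_frame, reference_frame, mask_ratio=0.5):
    """target_frame, reference_frame: (B, 3, H, W) aligned face crops.
    Zeros out the lower `mask_ratio` of the target (the mouth region in an
    aligned crop) and stacks it with the reference as conditioning input."""
    masked = target_frame.clone()
    h = target_frame.size(2)
    masked[:, :, int(h * (1 - mask_ratio)):, :] = 0.0   # hide the mouth area
    return torch.cat([masked, reference_frame], dim=1)  # (B, 6, H, W)
```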

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations · CVPR 2023 · Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation

no code implementations · 30 Jul 2023 · Jinbo Wu, Xiaobo Gao, Xing Liu, Zhengyang Shen, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding

In this paper, we study Text-to-3D content generation leveraging 2D diffusion priors to enhance the quality and detail of the generated 3D models.

3D Generation Noise Estimation +1

Accelerating Vision Transformers Based on Heterogeneous Attention Patterns

no code implementations · 11 Oct 2023 · Deli Yu, Teng Xi, Jianwei Li, Baopu Li, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

On one hand, different images share more similar attention patterns in early layers than later layers, indicating that the dynamic query-by-key self-attention matrix may be replaced with a static self-attention matrix in early layers.
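
A minimal sketch of that static-attention replacement (a hypothetical module for a fixed token count, not the paper's implementation): an input-independent, learned attention matrix mixes tokens in early layers, so no query-key product is computed at inference.

```python
import torch
import torch.nn as nn

class StaticAttention(nn.Module):
    """Replaces query-by-key attention with a learned, input-independent
    token-mixing matrix for a fixed sequence length (illustrative)."""
    def __init__(self, num_tokens, dim):
        super().__init__()
        self.attn_logits = nn.Parameter(torch.zeros(num_tokens, num_tokens))
        self.proj_v = nn.Linear(dim, dim)
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (B, num_tokens, dim)
        attn = self.attn_logits.softmax(dim=-1)        # static attention map
        v = self.proj_v(x)
        return self.proj_out(attn @ v)                 # (N, N) @ (B, N, D) broadcasts
```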

Dimensionality Reduction
