Search Results for author: Junyu Han

Found 53 papers, 28 papers with code

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations26 Mar 2024 Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

1 code implementation ICCV 2023 Jiaming Li, Xiangru Lin, Wei zhang, Xiao Tan, YingYing Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module.

object-detection Object Detection +1

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

1 code implementation30 Nov 2023 Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang

The LRDiff framework constructs an image-rendering process with multiple layers, each of which applies the vision guidance to instructively estimate the denoising direction for a single object.

Denoising Image Generation

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

1 code implementation NeurIPS 2023 Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang

To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image.

2D Pose Estimation Attribute +3

Accelerating Vision Transformers Based on Heterogeneous Attention Patterns

no code implementations11 Oct 2023 Deli Yu, Teng Xi, Jianwei Li, Baopu Li, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

On one hand, different images share more similar attention patterns in early layers than later layers, indicating that the dynamic query-by-key self-attention matrix may be replaced with a static self-attention matrix in early layers.

Dimensionality Reduction

Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

2 code implementations ICCV 2023 Huan Liu, Qiang Chen, Zichang Tan, Jiang-Jiang Liu, Jian Wang, Xiangbo Su, Xiaolong Li, Kun Yao, Junyu Han, Errui Ding, Yao Zhao, Jingdong Wang

State-of-the-art solutions adopt the DETR-like framework, and mainly develop the complex decoder, e. g., regarding pose estimation as keypoint box detection and combining with human detection in ED-Pose, hierarchically predicting with pose decoder and joint (keypoint) decoder in PETR.

Human Detection Multi-Person Pose Estimation

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

no code implementations24 Jul 2023 Beiya Dai, Xing Li, Qunyi Xie, Yulin Li, Xiameng Qin, Chengquan Zhang, Kun Yao, Junyu Han

To produce a comprehensive evaluation of MataDoc, we propose a novel benchmark ArbDoc, mainly consisting of document images with arbitrary boundaries in four typical scenarios.

document understanding Optical Character Recognition (OCR)

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

3 code implementations CVPR 2023 Jiacheng Zhang, Xiangru Lin, Wei zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage.

Object object-detection +3

Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection

1 code implementation CVPR 2023 Chang Liu, Weiming Zhang, Xiangru Lin, Wei zhang, Xiao Tan, Junyu Han, Xiaomao Li, Errui Ding, Jingdong Wang

It employs a "divide-and-conquer" strategy and separately exploits positives for the classification and localization task, which is more robust to the assignment ambiguity.

Dense Object Detection Object +3

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

1 code implementation1 Mar 2023 Yuechen Yu, Yulin Li, Chengquan Zhang, Xiaoqiang Zhang, Zengyuan Guo, Xiameng Qin, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Compared to the masked multi-modal modeling methods for document image understanding that rely on both the image and text modalities, StrucTexTv2 models image-only input and potentially deals with more application scenarios free from OCR pre-processing.

Document Image Classification Language Modelling +3

Graph Contrastive Learning for Skeleton-based Action Recognition

1 code implementation26 Jan 2023 Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (\textit{SkeletonGCL}) to explore the \textit{global} context across all sequences.

Action Recognition Contrastive Learning +2

CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision

no code implementations ICCV 2023 Shuo Li, Yue He, Weiming Zhang , Wei zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang

Current state-of-the-art semi-supervised semantic segmentation (SSSS) methods typically adopt pseudo labeling and consistency regularization between multiple learners with different perturbations.

Semi-Supervised Semantic Segmentation

Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

no code implementations arXiv 2022 Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO.

Object object-detection +1

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations27 Sep 2022 Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, thus the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

2 code implementations ICCV 2023 Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.

Data Augmentation Object +2

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

no code implementations1 Jun 2022 Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Specifically, we transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder using a proposed masked image-language modeling scheme.

Language Modelling Optical Character Recognition (OCR) +1

Few-Shot Font Generation by Learning Fine-Grained Local Styles

2 code implementations CVPR 2022 Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Instead of explicitly disentangling global or component-wise modeling, the cross-attention mechanism can attend to the right local styles in the reference glyphs and aggregate the reference styles into a fine-grained style representation for the given content glyphs.

Font Generation

Few-Shot Head Swapping in the Wild

no code implementations CVPR 2022 Changyong Shu, Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.

Face Swapping

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

no code implementations CVPR 2022 Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics.

Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Contrastive Learning Cross-Modal Retrieval +1

MobileFaceSwap: A Lightweight Framework for Video Face Swapping

1 code implementation11 Jan 2022 Zhiliang Xu, Zhibin Hong, Changxing Ding, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding

In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by dynamically adjusting the model parameters according to the identity information.

Face Swapping Knowledge Distillation

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

2 code implementations12 Apr 2021 Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations involved, which guarantees high efficiency.

 Ranked #1 on Scene Text Detection on ICDAR 2015 (Accuracy metric)

Optical Character Recognition (OCR) Scene Text Detection +1

FaceController: Controllable Attribute Editing for Face in the Wild

no code implementations23 Feb 2021 Zhiliang Xu, Xiyu Yu, Zhibin Hong, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

By simply employing some existing and easy-obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.

 Ranked #1 on Face Swapping on FaceForensics++ (FID metric)

Attribute Disentanglement +1

Real Image Super Resolution Via Heterogeneous Model Ensemble using GP-NAS

no code implementations2 Sep 2020 Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding

With advancement in deep neural network (DNN), recent state-of-the-art (SOTA) image superresolution (SR) methods have achieved impressive performance using deep residual network with dense skip connections.

Image Super-Resolution Neural Architecture Search

Learning Global Structure Consistency for Robust Object Tracking

no code implementations26 Aug 2020 Bi Li, Chengquan Zhang, Zhibin Hong, Xu Tang, Jingtuo Liu, Junyu Han, Errui Ding, Wenyu Liu

Unlike many existing trackers that focus on modeling only the target, in this work, we consider the \emph{transient variations of the whole scene}.

Object Visual Object Tracking

Learning Generalized Spoof Cues for Face Anti-spoofing

6 code implementations8 May 2020 Haocheng Feng, Zhibin Hong, Haixiao Yue, Yang Chen, Keyao Wang, Junyu Han, Jingtuo Liu, Errui Ding

In this paper, we reformulate FAS in an anomaly detection perspective and propose a residual-learning framework to learn the discriminative live-spoof differences which are defined as the spoof cues.

Anomaly Detection Face Anti-Spoofing

HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

no code implementations19 Dec 2019 Yang Liu, Xu Tang, Xiang Wu, Junyu Han, Jingtuo Liu, Errui Ding

In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly helps outer faces compensate with high-quality anchors.

Face Detection Multi-Task Learning +2

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

1 code implementation20 Sep 2019 He guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding

Extracting entity from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts.

Entity Extraction using GAN Optical Character Recognition (OCR)

ACFNet: Attentional Class Feature Network for Semantic Segmentation

1 code implementation ICCV 2019 Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, Errui Ding

Recent works have made great progress in semantic segmentation by exploiting richer context, most of which are designed from a spatial perspective.

Segmentation Semantic Segmentation

Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning

no code implementations ICCV 2019 Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu

Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data.

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

1 code implementation16 Sep 2019 Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.

Scene Text Detection Scene Text Recognition +2

An End-to-end Video Text Detector with Online Tracking

no code implementations20 Aug 2019 Hongyuan Yu, Chengquan Zhang, Xuan Li, Junyu Han, Errui Ding, Liang Wang

Most existing methods attempt to enhance the performance of video text detection by cooperating with video text tracking, but treat these two tasks separately.

Text Detection

Editing Text in the Wild

2 code implementations8 Aug 2019 Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.

Image Inpainting Image-to-Image Translation +1

PyramidBox++: High Performance Detector for Finding Tiny Face

4 code implementations31 Mar 2019 Zhihang Li, Xu Tang, Junyu Han, Jingtuo Liu, Ran He

With the rapid development of deep convolutional neural network, face detection has made great progress in recent years.

Data Augmentation Face Detection +1

Detecting Text in the Wild with Deep Character Embedding Network

no code implementations2 Jan 2019 Jiaming Liu, Chengquan Zhang, Yipeng Sun, Junyu Han, Errui Ding

However, text in the wild is usually perspectively distorted or curved, which can not be easily tackled by existing approaches.

Clustering Text Detection

TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

no code implementations24 Dec 2018 Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, Errui Ding

Reading text from images remains challenging due to multi-orientation, perspective distortion and especially the curved nature of irregular text.

Optical Character Recognition (OCR) Text Detection

WordSup: Exploiting Word Annotations for Character based Text Detection

no code implementations ICCV 2017 Han Hu, Chengquan Zhang, Yuxuan Luo, Yuzhuo Wang, Junyu Han, Errui Ding

When applied in scene text detection, we are thus able to train a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e. g. ICDAR15 and COCO-text.

Math Scene Text Detection +1

Cannot find the paper you are looking for? You can Submit a new open access paper.