Search Results for author: Shijian Lu

Found 130 papers, 44 papers with code

Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis

no code implementations • ECCV 2020 • Siyuan Yang, Jun Liu, Shijian Lu, Meng Hwa Er, Alex C. Kot

The proposed network exploits joint-aware features that are crucial for both tasks, with which gesture recognition and 3D hand pose estimation boost each other to learn highly discriminative features and models.

3D Hand Pose Estimation Gesture Recognition


AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation

no code implementations • ECCV 2020 • Xiaobing Zhang, Shijian Lu, Haigang Gong, Zhipeng Luo, Ming Liu

Online knowledge distillation, which jointly learns teacher and student models or an ensemble of student models simultaneously and collaboratively, has attracted increasing interest recently.

Knowledge Distillation Transfer Learning


MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

no code implementations • 19 Apr 2024 • Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-frequency or low-frequency components.

Mixed Reality


Efficient Test-Time Adaptation of Vision-Language Models

no code implementations • 27 Mar 2024 • Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing

TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys.

Pseudo Label Test-time Adaptation

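As a rough illustration of the cache described in the TDA entry above, the following Python sketch stores test-sample features as keys and pseudo labels as values, with one bounded queue per label. The names `PseudoLabelCache` and `shots_per_class`, the first-in-first-out eviction, and the average cosine-similarity lookup are all assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import defaultdict, deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class PseudoLabelCache:
    """Toy key-value cache: feature vectors are keys, pseudo labels are values."""

    def __init__(self, shots_per_class=3):
        # Assumption: one bounded FIFO queue per pseudo label, so each label
        # keeps at most `shots_per_class` of its most recent features.
        self.queues = defaultdict(lambda: deque(maxlen=shots_per_class))

    def add(self, feature, pseudo_label):
        """Cache a test-sample feature under its pseudo label."""
        self.queues[pseudo_label].append(list(feature))

    def predict(self, feature):
        """Return the label whose cached keys are most similar on average."""
        best_label, best_score = None, -math.inf
        for label, keys in self.queues.items():
            score = sum(cosine(feature, k) for k in keys) / len(keys)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The bounded queues keep memory use fixed as test samples stream in, which is the property the word "lightweight" suggests; how entries are actually selected and evicted in TDA is not stated in this listing.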

Masked AutoDecoder is Effective Multi-Task Vision Generalist

1 code implementation • 12 Mar 2024 • Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu

Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction.


StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

no code implementations • 12 Mar 2024 • Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu

We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps).

Style Transfer


FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

no code implementations • 11 Mar 2024 • Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing

3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis.

Novel View Synthesis


Weakly Supervised Monocular 3D Detection with a Single-View Image

no code implementations • 29 Feb 2024 • Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu

We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively without any 3D annotations or other training data.

Object Localization Self-Knowledge Distillation +1


DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

no code implementations • 27 Feb 2024 • Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao

We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt.


LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

no code implementations • 7 Feb 2024 • Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu

This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general.

Image Classification object-detection +1


Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model

no code implementations • 6 Feb 2024 • Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters.

Image Segmentation Semantic Segmentation


Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

1 code implementation • 16 Jan 2024 • Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu

Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars.

Cross-Domain Few-Shot


DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

no code implementations • 13 Jan 2024 • Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.

3D Object Detection object-detection +2


Domain Adaptation for Large-Vocabulary Object Detectors

no code implementations • 13 Jan 2024 • Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

Large-vocabulary object detectors (LVDs) aim to detect objects of many categories; they learn super objectness features and can locate objects accurately when applied to various downstream data.

Domain Adaptation Knowledge Graphs +2


Learning to Prompt Segment Anything Models

no code implementations • 9 Jan 2024 • Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu, Eric Xing

SAMs work with two types of prompts including spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work together to prompt SAMs to segment anything on downstream datasets.

Image Segmentation Segmentation +1


Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

no code implementations • 27 Dec 2023 • Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Shijian Lu

Traditional computer vision generally solves each single task independently by a dedicated model with the task instruction implicitly designed in the model architecture, giving rise to two limitations: (1) it leads to task-specific models, which require multiple models for different tasks and restrict the potential synergies from diverse tasks; (2) it leads to a pre-defined and fixed model interface that has limited interactivity and adaptability in following users' task instructions.

Instruction Following


Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

2 code implementations • 28 Nov 2023 • Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing

Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned.

Hallucination Object

8,950


AI-Generated Images as Data Source: The Dawn of Synthetic Era

1 code implementation • 3 Oct 2023 • Zuhao Yang, Fangneng Zhan, Kunhao Liu, Muyu Xu, Shijian Lu

The advancement of visual intelligence is intrinsically tethered to the availability of large-scale data.

134


Noise-Tolerant Unsupervised Adapter for Vision-Language Models

no code implementations • 26 Sep 2023 • Eman Ali, Dayan Guan, Shijian Lu, Abdulmotaleb Elsaddik

NtUA works as a key-value cache that formulates visual features and predicted pseudo-labels of the few-shot unlabelled target samples as key-value pairs.

Image Classification Knowledge Distillation +2


Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

2 code implementations • NeurIPS 2023 • Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu

Such semantic misalignment circulates in pre-training, leading to inferior zero-shot performance in dense predictions due to insufficient visual concepts captured in textual representations.

Segmentation Semantic Segmentation +1


Domain Generalization via Balancing Training Difficulty and Model Capability

no code implementations • ICCV 2023 • Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu

Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model.

Data Augmentation Domain Generalization


Pose-Free Neural Radiance Fields via Implicit Pose Regularization

no code implementations • ICCV 2023 • Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Kunhao Liu, Rongliang Wu, Xiaoqin Zhang, Ling Shao, Shijian Lu

However, as the pose estimator is trained with only rendered images, the pose estimation is usually biased or inaccurate for real images due to the domain gap between real images and rendered images, leading to poor robustness for the pose estimation of real images and further local minima in joint optimization.

Novel View Synthesis Pose Estimation


Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory

no code implementations • ICCV 2023 • Jingyi Zhang, Jiaxing Huang, Xueying Jiang, Shijian Lu

However, the source predictions of target data are often noisy and training with them is prone to learning collapses.

Image Classification Memorization +4


WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields

no code implementations • ICCV 2023 • Muyu Xu, Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Xiaoqin Zhang, Christian Theobalt, Ling Shao, Shijian Lu

Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation.

Novel View Synthesis


One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching

no code implementations • 14 Jul 2023 • Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot

The first is multi-scale matching which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously.

Action Recognition


Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation

no code implementations • 29 Jun 2023 • Jiaxing Huang, Jingyi Zhang, Han Qiu, Sheng Jin, Shijian Lu

Traditional domain adaptation assumes the same vocabulary across source and target domains, and it often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.

Unsupervised Domain Adaptation


A Survey of Label-Efficient Deep Learning for 3D Point Clouds

1 code implementation • 31 May 2023 • Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu

We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area.

Data Augmentation Efficient Exploration +2


Weakly Supervised 3D Open-vocabulary Segmentation

1 code implementation • NeurIPS 2023 • Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research.

Segmentation


Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations

no code implementations • 18 Apr 2023 • Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang, Shijian Lu

To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network that can model the variational facial animation distribution conditioned upon the input audio and autoregressively convert the audio signals into a facial animation sequence.

Talking Face Generation


Self-Supervised 3D Action Representation Learning with Skeleton Cloud Colorization

no code implementations • 18 Apr 2023 • Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Yongjian Hu, Alex C. Kot

We investigate self-supervised representation learning and design a novel skeleton cloud colorization technique that is capable of learning spatial and temporal skeleton representations from unlabeled skeleton sequence data.

Colorization Representation Learning +2


POCE: Pose-Controllable Expression Editing

no code implementations • 18 Apr 2023 • Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Shengcai Liao, Shijian Lu

POCE achieves more accessible and realistic pose-controllable expression editing by mapping face images into UV space, where facial expressions and head poses can be disentangled and edited separately.


Face Transformer: Towards High Fidelity and Accurate Face Swapping

no code implementations • 5 Apr 2023 • Kaiwen Cui, Rongliang Wu, Fangneng Zhan, Shijian Lu

Face swapping aims to generate swapped images that fuse the identity of source faces and the attributes of target faces.

Face Swapping Vocal Bursts Intensity Prediction


3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

1 code implementation • CVPR 2023 • Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

In addition, we design a domain randomization technique that alternatively randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that can improve 3DSS under various adverse weather effectively.

3D Semantic Segmentation Autonomous Driving


Vision-Language Models for Vision Tasks: A Survey

1 code implementation • 3 Apr 2023 • Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm.

Benchmarking Knowledge Distillation +1

1,742


StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

1 code implementation • CVPR 2023 • Kunhao Liu, Fangneng Zhan, YiWen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer.

Style Transfer

133


Modeling Continuous Motion for 3D Point Cloud Object Tracking

no code implementations • 14 Mar 2023 • Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Zhonghua Wu, Qingyi Tao, Lewei Lu, Shijian Lu

The task of 3D single object tracking (SOT) with LiDAR point clouds is crucial for various applications, such as autonomous driving and robotics.

3D Single Object Tracking Autonomous Driving +2


Regularized Vector Quantization for Tokenized Image Synthesis

no code implementations • CVPR 2023 • Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu

The first is a prior distribution regularization which measures the discrepancy between a prior token distribution and the predicted token distribution to avoid codebook collapse and low codebook utilization.

Image Generation Quantization

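To make the idea of a prior distribution regularization in the entry above concrete, the sketch below compares how often each codebook token is actually used against a prior over tokens. The choice of KL divergence as the discrepancy, the uniform prior in the test values, and the helper names `token_usage` and `kl_to_prior` are assumptions for illustration; the listing does not say which measure the paper uses.

```python
import math
from collections import Counter


def token_usage(token_ids, codebook_size):
    """Empirical distribution of predicted token ids over the codebook."""
    counts = Counter(token_ids)
    total = len(token_ids)
    return [counts.get(i, 0) / total for i in range(codebook_size)]


def kl_to_prior(predicted, prior):
    """KL(predicted || prior); entries with zero predicted mass contribute 0."""
    return sum(p * math.log(p / q) for p, q in zip(predicted, prior) if p > 0)
```

Under these assumptions the penalty is zero when every token is used equally often and grows as usage concentrates on a few tokens, reaching log(K) for a codebook of size K when a single token is used exclusively, which is the collapse case the regularizer is meant to discourage.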

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

no code implementations • CVPR 2023 • Yi Yu, YuFei Wang, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

Extensive experiments show that with our trained trigger injection models and simple modification of encoder parameters (of the compression model), the proposed attack can successfully inject several backdoors with corresponding triggers in a single image compression model.

Backdoor Attack Face Recognition +2


DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

no code implementations • 15 Dec 2022 • Zhipeng Luo, Changqing Zhou, Gongjie Zhang, Shijian Lu

3D object detection with surround-view images is an essential task for autonomous driving.

3D Object Detection Autonomous Driving +2


Domain Adaptive Scene Text Detection via Subcategorization

no code implementations • 1 Dec 2022 • Zichen Tian, Chuhui Xue, Jingyi Zhang, Shijian Lu

We study domain adaptive scene text detection, a largely neglected yet very meaningful task that aims for optimal transfer of labelled scene text images while handling unlabelled images in various new domains.

Scene Text Detection Text Detection


Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

no code implementations • CVPR 2023 • Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, Shijian Lu

Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors.

Object object-detection +1


Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

1 code implementation • 10 Aug 2022 • Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu, Yueru Luo, Haiyu Zhao, Ziwei Liu, Shijian Lu

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.

3D Object Tracking Autonomous Driving +3

116


TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

no code implementations • 4 Aug 2022 • Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Tianrui Liu, Shijian Lu, Liang Pan

3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics.

3D Object Detection Autonomous Driving +3


Latent Multi-Relation Reasoning for GAN-Prior based Image Super-Resolution

no code implementations • 4 Aug 2022 • Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Rongliang Wu, Xiaoqin Zhang, Shijian Lu

In addition, stochastic noises fed to the generator are employed for unconditional detail generation, which tends to produce unfaithful details that compromise the fidelity of the generated SR image.

Attribute Code Generation +3


Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

1 code implementation • 30 Jul 2022 • Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, Eric P. Xing

Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes.

Few-Shot Object Detection Meta-Learning +2

366


PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds

2 code implementations • 30 Jul 2022 • Aoran Xiao, Jiaxing Huang, Dayan Guan, Kaiwen Cui, Shijian Lu, Ling Shao

The first is scene-level swapping which exchanges point cloud sectors of two LiDAR scans that are cut along the azimuth axis.

Ranked #2 on 3D Unsupervised Domain Adaptation on SynLiDAR-to-SemanticKITTI

3D Object Detection 3D Unsupervised Domain Adaptation +3

478

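The scene-level swapping mentioned in the PolarMix entry above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: points are plain `(x, y, z)` tuples, the azimuth is measured with `atan2(y, x)`, the sector is a half-open angular range, and the function names are hypothetical. Per-point labels, which would travel with their points in a real pipeline, are left out.

```python
import math


def azimuth(point):
    """Azimuth angle of a point (x, y, z) in radians, in [-pi, pi]."""
    x, y, _ = point
    return math.atan2(y, x)


def in_sector(point, start, end):
    """True if the point's azimuth lies in the half-open sector [start, end)."""
    return start <= azimuth(point) < end


def swap_sectors(scan_a, scan_b, start, end):
    """Exchange the points of two scans that fall inside one azimuth sector."""
    keep_a = [p for p in scan_a if not in_sector(p, start, end)]
    keep_b = [p for p in scan_b if not in_sector(p, start, end)]
    cut_a = [p for p in scan_a if in_sector(p, start, end)]
    cut_b = [p for p in scan_b if in_sector(p, start, end)]
    # Each scan keeps its own points outside the sector and receives
    # the other scan's points from inside the sector.
    return keep_a + cut_b, keep_b + cut_a
```

Because the cut follows the azimuth angle, each swapped sector is a wedge around the sensor, so the total number of points across the two scans is unchanged and only their assignment to a scan differs.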

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

1 code implementation • 28 Jul 2022 • Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, Eric P. Xing

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.

Object object-detection +1

287


Contextual Text Block Detection towards Scene Text Understanding

no code implementations • 26 Jul 2022 • Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.

text-classification Text Classification +2


Auto-regressive Image Synthesis with Integrated Quantization

no code implementations • 21 Jul 2022 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Changgong Zhang, Shijian Lu

Extensive experiments over multiple conditional image generation tasks show that our method achieves superior diverse image generation performance qualitatively and quantitatively as compared with the state-of-the-art.

Conditional Image Generation Inductive Bias +1


Towards Counterfactual Image Manipulation via CLIP

1 code implementation • 6 Jul 2022 • Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao

In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing.

counterfactual Image Manipulation


Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

1 code implementation • 6 Jul 2022 • Yun Xing, Dayan Guan, Jiaxing Huang, Shijian Lu

Specifically, we design cross-frame pseudo labelling to provide pseudo supervision from previous video frames while learning from the augmented current video frames.

Segmentation Semantic Segmentation +2


VMRF: View Matching Neural Radiance Fields

no code implementations • 6 Jul 2022 • Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu

With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images.

Novel View Synthesis


UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration

no code implementations • CVPR 2023 • Jingyi Zhang, Jiaxing Huang, Xiaoqin Zhang, Shijian Lu

Domain adaptive panoptic segmentation aims to mitigate data annotation challenge by leveraging off-the-shelf annotated data in one or multiple related source domains.

Ranked #2 on Domain Adaptation on Panoptic SYNTHIA-to-Cityscapes

Domain Adaptation Instance Segmentation +3


Marginal Contrastive Correspondence for Guided Image Generation

no code implementations • CVPR 2022 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Changgong Zhang

We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.

Contrastive Learning Image Generation +2


Fourier Document Restoration for Robust Document Dewarping and Recognition

1 code implementation • CVPR 2022 • Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents which are prone to errors while dealing with documents with irregular distortions or large variations in depth.


Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation

1 code implementation • CVPR 2022 • Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

We build the balanced subclass distributions by clustering pixels of each original class into multiple subclasses of similar sizes, which provide class-balanced pseudo supervision to regularize the class-biased segmentation.

Segmentation Semi-Supervised Semantic Segmentation


Modulated Contrast for Versatile Image Synthesis

1 code implementation • CVPR 2022 • Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Rongliang Wu, Shijian Lu

Perceiving the similarity between images has been a long-standing and fundamental problem underlying various visual generation tasks.

Contrastive Learning Image Generation


Accelerating DETR Convergence via Semantic-Aligned Matching

1 code implementation • CVPR 2022 • Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, Shijian Lu

First, it projects object queries into the same embedding space as encoded image features, where the matching can be accomplished efficiently with aligned semantics.

Object object-detection +1

287


Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

no code implementations • 8 Mar 2022 • Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai

Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations.

Optical Character Recognition Optical Character Recognition (OCR) +2


Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey

1 code implementation • 28 Feb 2022 • Aoran Xiao, Jiaxing Huang, Dayan Guan, Xiaoqin Zhang, Shijian Lu, Ling Shao

The convergence of point cloud and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale and densely-labelled point cloud data.

Autonomous Driving Representation Learning

180


Dual Learning Music Composition and Dance Choreography

no code implementations • 28 Jan 2022 • Shuang Wu, Zhenguang Li, Shijian Lu, Li Cheng

Music and dance have always co-existed as pillars of human activities, contributing immensely to the cultural, social, and entertainment functions in virtually all societies.


Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

1 code implementation • 30 Dec 2021 • Zhenguang Liu, Shuang Wu, Shuyuan Jin, Shouling Ji, Qi Liu, Shijian Lu, Li Cheng

One aspect that has been overlooked so far is that how we represent the skeletal pose has a critical impact on the prediction results.

motion prediction


Multimodal Image Synthesis and Editing: The Generative AI Era

2 code implementations • 27 Dec 2021 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing

With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years.

Image Generation

753


PTTR: Relational 3D Point Cloud Object Tracking with Transformer

1 code implementation • CVPR 2022 • Changqing Zhou, Zhipeng Luo, Yueru Luo, Tianrui Liu, Liang Pan, Zhongang Cai, Haiyu Zhao, Shijian Lu

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.

3D Object Tracking Object +3

116


Music-to-Dance Generation with Optimal Transport

no code implementations • 3 Dec 2021 • Shuang Wu, Shijian Lu, Li Cheng

We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.

Retrieval Unity


Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data

1 code implementation • NeurIPS 2021 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

To this end, we design an innovative historical contrastive learning (HCL) technique that exploits historical source hypothesis to make up for the absence of source data in UMA.

Contrastive Learning Unsupervised Domain Adaptation


GenCo: Generative Co-training for Generative Adversarial Networks with Limited Data

1 code implementation • 4 Oct 2021 • Kaiwen Cui, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Fangneng Zhan, Shijian Lu

Specifically, we design GenCo, a Generative Co-training network that mitigates the discriminator over-fitting issue by introducing multiple complementary discriminators that provide diverse supervision from multiple distinctive views in training.

Data Augmentation Image Generation


Contextual Text Detection

no code implementations • 29 Sep 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang

This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.

Text Detection


Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning

no code implementations • ICCV 2021 • Siyuan Yang, Jun Liu, Shijian Lu, Meng Hwa Er, Alex C. Kot

We investigate unsupervised representation learning for skeleton action recognition, and design a novel skeleton cloud colorization technique that is capable of learning skeleton representations from unlabeled skeleton sequence data.

3D Action Recognition Colorization +1


WaveFill: A Wavelet-based Generation Network for Image Inpainting

1 code implementation • ICCV 2021 • Yingchen Yu, Fangneng Zhan, Shijian Lu, Jianxiong Pan, Feiying Ma, Xuansong Xie, Chunyan Miao

This paper presents WaveFill, a wavelet-based inpainting network that decomposes images into multiple frequency bands and fills the missing regions in each frequency band separately and explicitly.

Image Inpainting


Domain Adaptive Video Segmentation via Temporal Consistency Regularization

1 code implementation • ICCV 2021 • Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR) for consecutive frames of target-domain videos.

Segmentation Unsupervised Domain Adaptation +1


Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation • ICCV 2021 • Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1


Transfer Learning from Synthetic to Real LiDAR Point Cloud for Semantic Segmentation

1 code implementation • 12 Jul 2021 • Aoran Xiao, Jiaxing Huang, Dayan Guan, Fangneng Zhan, Shijian Lu

Extensive experiments show that SynLiDAR provides a high-quality data source for studying 3D transfer and the proposed PCT achieves superior point cloud translation consistently across the three setups.

Ranked #3 on 3D Unsupervised Domain Adaptation on SynLiDAR-to-SemanticKITTI

3D Unsupervised Domain Adaptation Data Augmentation +5

114


FBC-GAN: Diverse and Flexible Image Synthesis via Foreground-Background Composition

no code implementations • 7 Jul 2021 • Kaiwen Cui, Gongjie Zhang, Fangneng Zhan, Jiaxing Huang, Shijian Lu

Generative Adversarial Networks (GANs) have become the de-facto standard in image synthesis.

Image Generation Object


Bi-level Feature Alignment for Versatile Image Translation and Manipulation

2 code implementations • 7 Jul 2021 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Aoran Xiao, Shijian Lu, Chunyan Miao

This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence.

Image Generation Translation

Paper
Code

Learning Disentangled Representation Implicitly via Transformer for Occluded Person Re-Identification

no code implementations • 6 Jul 2021 • Mengxi Jia, Xinhua Cheng, Shijian Lu, Jian Zhang

To better eliminate interference from occlusions, we design a contrast feature learning technique (CFL) for better separation of occlusion features and discriminative ID features.

Person Re-Identification Representation Learning

Paper
Add Code

Blind Image Super-Resolution via Contrastive Representation Learning

no code implementations • 1 Jul 2021 • Jiahui Zhang, Shijian Lu, Fangneng Zhan, Yingchen Yu

Extensive experiments on synthetic datasets and real images show that the proposed CRL-SR can handle multi-modal and spatially variant degradation effectively under blind settings and it also outperforms state-of-the-art SR methods qualitatively and quantitatively.

Contrastive Learning Image Super-Resolution +1

Paper
Add Code

Sparse Needlets for Lighting Estimation with Spherical Transport Loss

no code implementations • ICCV 2021 • Fangneng Zhan, Changgong Zhang, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

Accurate lighting estimation is challenging yet critical to many computer vision and computer graphics tasks such as high-dynamic-range (HDR) relighting.

Lighting Estimation

Paper
Add Code

Unbalanced Feature Transport for Exemplar-based Image Translation

no code implementations • CVPR 2021 • Fangneng Zhan, Yingchen Yu, Kaiwen Cui, Gongjie Zhang, Shijian Lu, Jianxiong Pan, Changgong Zhang, Feiying Ma, Xuansong Xie, Chunyan Miao

In addition, we design a semantic-activation normalization scheme that injects style features of exemplars into the image translation process successfully.

Image-to-Image Translation Semantic Segmentation +1

Paper
Add Code

Domain Consistency Regularization for Unsupervised Multi-source Domain Adaptive Classification

no code implementations • 16 Jun 2021 • Zhipeng Luo, Xiaobing Zhang, Shijian Lu, Shuai Yi

Compared with single-source unsupervised domain adaptation (SUDA), domain shift in MUDA exists not only between the source and target domains but also among multiple source domains.

Classification Multi-Source Unsupervised Domain Adaptation +2

Paper
Add Code

Spectral Unsupervised Domain Adaptation for Visual Recognition

no code implementations • CVPR 2022 • Jingyi Zhang, Jiaxing Huang, Zichen Tian, Shijian Lu

Second, it introduces multi-view spectral learning that learns useful unsupervised representations by maximizing mutual information among multiple ST-generated spectral views of each target sample.

Image Classification object-detection +3

Paper
Add Code

Semi-Supervised Domain Adaptation via Adaptive and Progressive Feature Alignment

no code implementations • 5 Jun 2021 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

We position the few labeled target samples as references that gauge the similarity between source and target features and guide adaptive inter-domain alignment for learning more similar source features.

Domain Adaptation Image Classification +4

Paper
Add Code

Category Contrast for Unsupervised Domain Adaptation in Visual Tasks

1 code implementation • CVPR 2022 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu, Ling Shao

In this work, we explore the idea of instance contrastive learning in unsupervised domain adaptation (UDA) and propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks.

Contrastive Learning Representation Learning +1

Paper
Code

RDA: Robust Domain Adaptation via Fourier Adversarial Attacking

1 code implementation • ICCV 2021 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

With FAA-generated samples, the training can continue the 'random walk' and drift into an area with a flat loss landscape, leading to more robust domain adaptation.

Unsupervised Domain Adaptation

Paper
Code

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

no code implementations • 18 May 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai

The first task focuses on image-to-character (I2C) mapping, which detects a set of character candidates from images based on different alignments of visual features in a non-sequential way.

Scene Text Recognition

Paper
Add Code

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

no code implementations • 26 Apr 2021 • Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jianxiong Pan, Kaiwen Cui, Shijian Lu, Feiying Ma, Xuansong Xie, Chunyan Miao

With image-level attention, transformers can model long-range dependencies and generate diverse contents with autoregressive modeling of pixel-sequence distributions.

Image Inpainting Language Modelling

Paper
Add Code

DA-DETR: Domain Adaptive Detection Transformer with Information Fusion

no code implementations • CVPR 2023 • Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Xiaoqin Zhang, Shijian Lu

DA-DETR introduces a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.

Domain Adaptation Object +3

Paper
Add Code

Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection

no code implementations • 28 Mar 2021 • Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, Shijian Lu

In addition, the synthesized defect samples demonstrate their effectiveness in training better defect inspection networks.

Vocal Bursts Intensity Prediction

Paper
Add Code

MLAN: Multi-Level Adversarial Network for Domain Adaptive Semantic Segmentation

no code implementations • 24 Mar 2021 • Jiaxing Huang, Dayan Guan, Shijian Lu, Aoran Xiao

Recent progress in domain adaptive semantic segmentation demonstrates the effectiveness of adversarial learning (AL) in unsupervised domain adaptation.

Image-to-Image Translation Semantic Segmentation +2

Paper
Add Code

Meta-DETR: Image-Level Few-Shot Object Detection with Inter-Class Correlation Exploitation

2 code implementations • 22 Mar 2021 • Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu

Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks.

Ranked #7 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection Meta-Learning +2

366

Paper
Code

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

1 code implementation • CVPR 2021 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

The inter-task regularization exploits the complementary nature of instance segmentation and semantic segmentation and uses it as a constraint for better feature alignment across domains.

Ranked #2 on Domain Adaptation on Panoptic SYNTHIA-to-Mapillary

Domain Adaptation Instance Segmentation +2

Paper
Code

FSDR: Frequency Space Domain Randomization for Domain Generalization

1 code implementation • CVPR 2021 • Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

It has been studied widely by domain randomization that transfers source images to different styles in spatial space for learning domain-agnostic features.

Domain Generalization

Paper
Code

FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation

1 code implementation • 1 Mar 2021 • Aoran Xiao, Xiaofei Yang, Shijian Lu, Dayan Guan, Jiaxing Huang

Specifically, we design a residual dense block with multiple receptive fields as a building block in the encoder which preserves detailed information in each modality and learns hierarchical modality-specific and fused features effectively.

Ranked #23 on 3D Semantic Segmentation on SemanticKITTI

3D Semantic Segmentation Point Cloud Segmentation +2

Paper
Code

Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links

no code implementations • 1 Mar 2021 • Chuhui Xue, Shijian Lu, Steven Hoi

Detection and recognition of scene texts of arbitrary shapes remain a grand challenge due to the super-rich text shape variation in text line orientations, lengths, curvatures, etc.

Scene Text Detection Text Detection

Paper
Add Code

Uncertainty-Aware Unsupervised Domain Adaptation in Object Detection

3 code implementations • 27 Feb 2021 • Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu, Yanpeng Cao

Specifically, we design an uncertainty metric that assesses the alignment of each sample and adjusts the strength of adversarial learning for well-aligned and poorly-aligned samples adaptively.

Object object-detection +2

Paper
Code

GMLight: Lighting Estimation via Geometric Distribution Approximation

1 code implementation • 20 Feb 2021 • Fangneng Zhan, Yingchen Yu, Changgong Zhang, Rongliang Wu, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

This paper presents Geometric Mover's Light (GMLight), a lighting estimation framework that employs a regression network and a generative projector for effective illumination estimation.

Lighting Estimation regression

160

Paper
Code

EMLight: Lighting Estimation via Spherical Distribution Approximation

no code implementations • 21 Dec 2020 • Fangneng Zhan, Changgong Zhang, Yingchen Yu, Yuan Chang, Shijian Lu, Feiying Ma, Xuansong Xie

Motivated by the Earth Mover distance, we design a novel spherical mover's loss that guides accurate regression of light distribution parameters by taking advantage of the subtleties of spherical distribution.

Lighting Estimation regression

Paper
Add Code

Adversarial Image Composition with Auxiliary Illumination

no code implementations • 17 Sep 2020 • Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma, Xuansong Xie

State-of-the-art methods strive to harmonize the composed image by adapting the style of foreground objects to be compatible with the background image, whereas the potential shadow of foreground objects within the composed image, which is critical to composition realism, is largely neglected.

Paper
Add Code

LEED: Label-Free Expression Editing via Disentanglement

no code implementations • ECCV 2020 • Rongliang Wu, Shijian Lu

Recent studies on facial expression editing have obtained very promising progress.

Attribute Disentanglement

Paper
Add Code

Towards Realistic 3D Embedding via View Alignment

no code implementations • 14 Jul 2020 • Changgong Zhang, Fangneng Zhan, Shijian Lu, Feiying Ma, Xuansong Xie

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition, which generates new images by automatically embedding foreground objects of interest into background images.

Paper
Add Code

Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation

1 code implementation • ECCV 2020 • Jiaxing Huang, Shijian Lu, Dayan Guan, Xiaobing Zhang

Recent advances in unsupervised domain adaptation for semantic segmentation have shown great potential to relieve the demand for expensive per-pixel annotations.

Relation Segmentation +2

Paper
Code

Multiple Expert Brainstorming for Domain Adaptive Person Re-identification

2 code implementations • ECCV 2020 • Yunpeng Zhai, Qixiang Ye, Shijian Lu, Mengxi Jia, Rongrong Ji, Yonghong Tian

The best-performing deep neural models are often ensembles of multiple base-level networks; nevertheless, ensemble learning with respect to domain adaptive person re-ID remains unexplored.

Domain Adaptive Person Re-Identification Ensemble Learning +1

103

Paper
Code

A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification

no code implementations • 3 Jul 2020 • Mengxi Jia, Yunpeng Zhai, Shijian Lu, Siwei Ma, Jian Zhang

RGB-Infrared (IR) cross-modality person re-identification (re-ID), which aims to search an IR image in RGB gallery or vice versa, is a challenging task due to the large discrepancy between IR and RGB modalities.

Cross-Modality Person Re-identification Person Re-Identification

Paper
Add Code

AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-identification

no code implementations • CVPR 2020 • Yunpeng Zhai, Shijian Lu, Qixiang Ye, Xuebo Shan, Jie Chen, Rongrong Ji, Yonghong Tian

Domain adaptive person re-identification (re-ID) is a challenging task, especially when person identities in target domains are unknown.

Ranked #8 on Unsupervised Domain Adaptation on Duke to Market

Clustering Domain Adaptive Person Re-Identification +2

Paper
Add Code

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

no code implementations • 19 Mar 2020 • Zongxian Li, Qixiang Ye, Chong Zhang, Jingjing Liu, Shijian Lu, Yonghong Tian

In this work, we propose a Self-Guided Adaptation (SGA) model, which aims at aligning feature representations and transferring object detection models across domains while considering the instantaneous alignment difficulty.

object-detection Object Detection +1

Paper
Add Code

Cascade EF-GAN: Progressive Facial Expression Editing with Local Focuses

no code implementations • CVPR 2020 • Rongliang Wu, Gongjie Zhang, Shijian Lu, Tao Chen

Recent advances in Generative Adversarial Nets (GANs) have shown remarkable improvements for facial expression editing.

Paper
Add Code

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

2 code implementations • CVPR 2020 • Kai Wang, Xiaojiang Peng, Jianfei Yang, Shijian Lu, Yu Qiao

Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators.

Facial Expression Recognition Facial Expression Recognition (FER)

404

Paper
Code

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.

Line Detection Task 2

Paper
Add Code

GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition

no code implementations • ICCV 2019 • Fangneng Zhan, Chuhui Xue, Shijian Lu

Recent adversarial learning research has achieved very impressive progress for modelling cross-domain data shifts in appearance space but its counterpart in modelling cross-domain shifts in geometry space lags far behind.

Domain Adaptation Scene Text Detection +1

Paper
Add Code

Coupled-Projection Residual Network for MRI Super-Resolution

no code implementations • 12 Jul 2019 • Chun-Mei Feng, Kai Wang, Shijian Lu, Yong Xu, Heng Kong, Ling Shao

The deep sub-network learns from the residuals of the high-frequency image information, where multiple residual blocks are cascaded to magnify the MRI images at the last network layer.

Super-Resolution

Paper
Add Code

Hierarchy Composition GAN for High-fidelity Image Synthesis

no code implementations • 12 May 2019 • Fangneng Zhan, Jiaxing Huang, Shijian Lu

Despite the rapid progress of generative adversarial networks (GANs) in image synthesis in recent years, existing image synthesis approaches work in either the geometry domain or the appearance domain alone, which often introduces various synthesis artifacts.

Image Generation Vocal Bursts Intensity Prediction

Paper
Add Code

CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery

1 code implementation • 3 Mar 2019 • Gongjie Zhang, Shijian Lu, Wei Zhang

This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images.

Novel Object Detection Object +2

Paper
Code

Scene Text Synthesis for Efficient and Effective Deep Network Training

no code implementations • 26 Jan 2019 • Changgong Zhang, Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique: using our synthesized images in deep network training achieves similar or even better scene text detection and scene text recognition performance compared with using real images.

Image Generation Scene Text Detection +2

Paper
Add Code

MSR: Multi-Scale Shape Regression for Scene Text Detection

no code implementations • 9 Jan 2019 • Chuhui Xue, Shijian Lu, Wei Zhang

State-of-the-art scene text detection techniques predict quadrilateral boxes that are prone to localization errors while dealing with straight or curved text lines of different orientations and lengths in scenes.

regression Scene Text Detection +1

Paper
Add Code

ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification

no code implementations • CVPR 2019 • Fangneng Zhan, Shijian Lu

Automated recognition of texts in scenes has been a research challenge for years, largely due to the arbitrary variation of text appearances in perspective distortion, text line curvature, text styles and different types of imaging artifacts.

Scene Text Recognition

Paper
Add Code

Spatial Fusion GAN for Image Synthesis

no code implementations • CVPR 2019 • Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Recent advances in generative adversarial networks (GANs) have shown great potential in realistic image synthesis, whereas most existing works address synthesis realism in either appearance space or geometry space but few in both.

Image Generation

Paper
Add Code

A pooling based scene text proposal technique for scene text reading in the wild

no code implementations • 25 Nov 2018 • Dinh NguyenVan, Shijian Lu, Shangxuan Tian, Nizar Ouarti, Mounir Mokhtari

Automatic reading of texts in scenes has attracted increasing interest in recent years, as texts often carry rich semantic information that is useful for scene understanding.

Scene Understanding Text Spotting

Paper
Add Code

Attention Driven Person Re-identification

no code implementations • 13 Oct 2018 • Fan Yang, Ke Yan, Shijian Lu, Huizhu Jia, Xiaodong Xie, Wen Gao

Person re-identification (ReID) is a challenging task due to arbitrary human pose variations, background clutter, etc.

Person Re-Identification

Paper
Add Code

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

no code implementations • ECCV 2018 • Chuhui Xue, Shijian Lu, Fangneng Zhan

This paper presents a scene text detection technique that exploits bootstrapping and text border semantics for accurate localization of texts in scenes.

Scene Text Detection Text Detection

Paper
Add Code

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

no code implementations • ECCV 2018 • Fangneng Zhan, Shijian Lu, Chuhui Xue

This paper presents a novel image synthesis technique that aims to generate a large amount of annotated scene text images for training accurate and robust scene text detection and recognition models.

Image Generation Scene Text Detection +2

Paper
Add Code

WeText: Scene Text Detection under Weak Supervision

no code implementations • ICCV 2017 • Shangxuan Tian, Shijian Lu, Chongshou Li

With a "light" supervised model trained on a small fully annotated dataset, we explore semi-supervised and weakly supervised learning on a large unannotated dataset and a large weakly annotated dataset, respectively.

Scene Text Detection Text Detection +1

Paper
Add Code

TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal

no code implementations • ICCV 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu

Recently, the regression-based object detectors and long-term recurrent convolutional network (LRCN) have demonstrated superior performance in human action detection and recognition.

Action Detection regression

Paper
Add Code

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai

This report introduces RCTW, a new competition that focuses on Chinese text reading.

valid

2,893

Paper
Code

YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

no code implementations • 26 Jun 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu, Yonghong Tian, Xian-Bin Cao

In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal video tube that potentially locates one human action.

Optical Flow Estimation regression

Paper
Add Code

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

no code implementations • 12 Jun 2017 • Artsiom Ablavatski, Shijian Lu, Jianfei Cai

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition.

Object Recognition

Paper
Add Code

WordFence: Text Detection in Natural Images with Border Awareness

no code implementations • 15 May 2017 • Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai

In recent years, text recognition has achieved remarkable success in recognizing scanned document text.

Semantic Segmentation Text Detection

Paper
Add Code

Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition

no code implementations • CVPR 2016 • Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu

RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide application scenarios.

Image Segmentation Object Recognition +3

Paper
Add Code

Text Flow: A Unified Text Detection System in Natural Scene Images

no code implementations • ICCV 2015 • Shangxuan Tian, Yifeng Pan, Chang Huang, Shijian Lu, Kai Yu, Chew Lim Tan

With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively.

Scene Text Detection Text Detection +1

Paper
Add Code

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations • 16 Jul 2015 • Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object object-detection +1

Paper
Add Code

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations • 3 Feb 2015 • Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision.

Image Segmentation Segmentation +1

Paper
Add Code
