Search Results for author: Wengang Zhou

Found 118 papers, 62 papers with code

Wavelet-Based Dual-Branch Network for Image Demoiréing

no code implementations ECCV 2020 Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Aleš Leonardis, Wengang Zhou, Qi Tian

When smartphone cameras are used to take photos of digital screens, usually moire patterns result, severely degrading photo quality.

Image Restoration Rain Removal

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation 29 Feb 2024 Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

Sinkhorn Distance Minimization for Knowledge Distillation

1 code implementation 27 Feb 2024 Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions.

Knowledge Distillation
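
As a rough illustration of the Sinkhorn distance named in the entry above, the sketch below runs generic Sinkhorn iterations between two small discrete distributions. It is not the SinKD implementation; the function name `sinkhorn_distance`, the regularization strength `eps`, the iteration count, and the toy cost matrix are all assumptions chosen for illustration.

```python
import math


def sinkhorn_distance(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport cost between discrete distributions p and q.

    p, q  : lists of non-negative weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from bin i of p to bin j of q
    eps   : entropic regularization strength (hypothetical default)
    """
    n, m = len(p), len(q)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan implied by the scaling vectors.
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


# Toy example: two class distributions over 2 classes, unit cost between classes.
teacher = [0.9, 0.1]
student = [0.1, 0.9]
unit_cost = [[0.0, 1.0], [1.0, 0.0]]
distance = sinkhorn_distance(teacher, student, unit_cost)
```

In this toy setup 0.8 of the probability mass has to move between the two classes, so the returned value is close to 0.8; identical distributions give a value close to 0.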

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

no code implementations 25 Feb 2024 Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li

In this work, we propose to imitate the human behaviour of "getting closer to confirm" when distinguishing objects from a distance.

Navigate

Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding

no code implementations 15 Jan 2024 Qi Sun, Xiao Cui, Wengang Zhou, Houqiang Li

In this study, we tackle the challenge of classifying the object category in point clouds, which previous works like PointCLIP struggle to address due to the inherent limitations of the CLIP architecture.

Point Cloud Classification Robust classification +1

DanZero+: Dominating the GuanDan Game through Reinforcement Learning

1 code implementation 5 Dec 2023 Youpeng Zhao, Yudong Lu, Jian Zhao, Wengang Zhou, Houqiang Li

The utilization of artificial intelligence (AI) in card games has been a well-explored subject within AI research for an extensive period.

Card Games reinforcement-learning

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

1 code implementation 22 Nov 2023 Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.

document understanding Instruction Following +3

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

no code implementations 20 Nov 2023 Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution.

document understanding Language Modelling +2

Progressive Recurrent Network for Shadow Removal

no code implementations 1 Nov 2023 Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li

To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).

Image Shadow Removal Shadow Removal

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations 24 Oct 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

no code implementations 19 Aug 2023 Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

However, existing advanced algorithms are limited to effectively utilizing the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks within the context of text-rich scenarios have not been sufficiently explored.

Instruction Following Text Detection +1

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations ICCV 2023 Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Text-Only Training for Visual Storytelling

no code implementations 17 Aug 2023 Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation.

Informativeness Visual Storytelling

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation ICCV 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

motion prediction Skeleton Based Action Recognition

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection

1 code implementation ICCV 2023 Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

These inaccurate high-scoring region proposals will mislead the training of subsequent refinement modules and thus hamper the detection performance.

Object object-detection +1

Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video

no code implementations 8 Aug 2023 Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, Houqiang Li

Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g., self- and mutual occlusion and similar textures.

AltFreezing for More General Video Face Forgery Detection

1 code implementation CVPR 2023 Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li

In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection.

Data Augmentation

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations 9 Jun 2023 Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

1 code implementation 3 Jun 2023 Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li

Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings.

Contrastive Learning Multi-agent Reinforcement Learning +2

Detect Any Shadow: Segment Anything for Video Shadow Detection

1 code implementation 26 May 2023 Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li

Segment anything model (SAM) has achieved great success in the field of natural image segmentation.

Image Segmentation Semantic Segmentation +1

Hybrid and Collaborative Passage Reranking

1 code implementation 16 May 2023 Zongmeng Zhang, Wengang Zhou, Jiaxin Shi, Houqiang Li

In passage retrieval system, the initial passage retrieval results may be unsatisfactory, which can be refined by a reranking scheme.

Passage Retrieval Retrieval

O-GNN: Incorporating Ring Priors into Molecular Modeling

1 code implementation ICLR 2023 Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.

Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)

Graph Regression Molecular Property Prediction +3

DocMAE: Document Image Rectification via Self-supervised Representation Learning

1 code implementation 20 Apr 2023 Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu

Tremendous efforts have been made on document image rectification, but how to learn effective representation of such distorted images is still under-explored.

Representation Learning Self-Supervised Learning

Deep Unrestricted Document Image Rectification

1 code implementation 18 Apr 2023 Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To our best knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Local Distortion

Learning Transferable Pedestrian Representation from Multimodal Information Supervision

1 code implementation 12 Apr 2023 Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

Recent researches on unsupervised person re-identification (reID) have demonstrated that pre-training on unlabeled person images achieves superior performance on downstream reID tasks than pre-training on ImageNet.

Attribute Contrastive Learning +3

HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

no code implementations CVPR 2023 Zhiyang Guo, Wengang Zhou, Min Wang, Li Li, Houqiang Li

We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views.

DIRE for Diffusion-Generated Image Detection

1 code implementation ICCV 2023 Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li

We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data.

Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation

no code implementations ICCV 2023 Xinyue Huo, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Currently, a popular UDA framework lies in self-training which endows the model with two-fold abilities: (i) learning reliable semantics from the labeled images in the source domain, and (ii) adapting to the target domain via generating pseudo labels on the unlabeled images.

Semantic Segmentation Unsupervised Domain Adaptation

BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization

no code implementations 10 Feb 2023 Weichao Zhao, Hezhen Hu, Wengang Zhou, Jiaxin Shi, Houqiang Li

In this work, we are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition (SLR) model.

Pseudo Label Sign Language Recognition

Asymmetric Feature Fusion for Image Retrieval

no code implementations CVPR 2023 Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

Then, a dynamic mixer is introduced to aggregate these features into a compact embedding for efficient search.

Image Retrieval Retrieval

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

no code implementations 28 Nov 2022 YiXuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network.

Attribute Image Generation +1

Hand-Object Interaction Image Generation

no code implementations 28 Nov 2022 Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li

In this work, we are dedicated to a new task, i.e., hand-object interaction image generation, which aims to conditionally generate the hand-object image under the given hand, object and their interaction status.

Image Generation Object

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation 22 Nov 2022 Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.

Denoising Image Generation +1

DanZero: Mastering GuanDan Game with Reinforcement Learning

no code implementations 31 Oct 2022 Yudong Lu, Jian Zhao, Youpeng Zhao, Wengang Zhou, Houqiang Li

We compare it with 8 baseline AI programs which are based on heuristic rules and the results reveal the outstanding performance of DanZero.

Card Games reinforcement-learning +1

Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding

no code implementations Findings (EMNLP) 2021 Yuechen Wang, Wengang Zhou, Houqiang Li

In this work, we propose a novel candidate-free framework: Fine-grained Semantic Alignment Network (FSAN), for weakly supervised TLG.

Sentence

UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior

1 code implementation 15 Oct 2022 Yonghui Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

To this end, we propose UDoc-GAN, the first framework to address the problem of document illumination correction under the unpaired setting.

Geometric Representation Learning for Document Image Rectification

2 code implementations 15 Oct 2022 Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations 23 Aug 2022 Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Unified 2D and 3D Pre-Training of Molecular Representations

1 code implementation 14 Jul 2022 Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.

Graph Generation Molecular Property Prediction +3

Semantic Image Synthesis via Diffusion Models

2 code implementations 30 Jun 2022 Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Denoising Image Generation

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation 14 Jun 2022 Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

For another, we devise Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

Stabilizing Voltage in Power Distribution Networks via Multi-Agent Reinforcement Learning with Transformer

1 code implementation 8 Jun 2022 Minrui Wang, Mingxiao Feng, Wengang Zhou, Houqiang Li

Utilizing MARL algorithms to coordinate multiple control units in the grid, which is able to handle rapid changes of power systems, has been widely studied in active voltage control task recently.

Multi-agent Reinforcement Learning reinforcement-learning +2

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

1 code implementation 8 May 2022 Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks.

Continuous Control Q-Learning +1

Multi-Target Active Object Tracking with Monte Carlo Tree Search and Target Motion Modeling

no code implementations 7 May 2022 Zheng Chen, Jian Zhao, Mingyu Yang, Wengang Zhou, Houqiang Li

In this work, we are dedicated to multi-target active object tracking (AOT), where there are multiple targets as well as multiple cameras in the environment.

Multi-agent Reinforcement Learning Object Tracking

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

no code implementations 5 May 2022 Mingyu Yang, Jian Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li

In this way, agents dealing with the same subtask share their learning of specific abilities and different subtasks correspond to different specific abilities.

Multi-agent Reinforcement Learning reinforcement-learning +3

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

1 code implementation 6 Apr 2022 Youpeng Zhao, Jian Zhao, Xunhan Hu, Wengang Zhou, Houqiang Li

Recent years have witnessed the great breakthrough of deep reinforcement learning (DRL) in various perfect and imperfect information games.

CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning

1 code implementation 16 Mar 2022 Jian Zhao, Xunhan Hu, Mingyu Yang, Wengang Zhou, Jiangcheng Zhu, Houqiang Li

In this way, CTDS balances the full utilization of global observation during training and the feasibility of decentralized execution for online inference.

Multi-agent Reinforcement Learning reinforcement-learning +3

MVP: Multimodality-guided Visual Pre-training

no code implementations 10 Mar 2022 Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Recently, masked image modeling (MIM) has become a promising direction for visual pre-training.

Language Modelling

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

no code implementations 9 Feb 2022 Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.

Direct Molecular Conformation Generation

1 code implementation 3 Feb 2022 Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu

Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.

Molecular Docking

Contextual Similarity Distillation for Asymmetric Image Retrieval

no code implementations CVPR 2022 Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, Qi Tian

To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output feature compatible with that of large gallery model, which is crucial with asymmetric retrieval.

Image Retrieval Retrieval

Learning Token-based Representation for Image Retrieval

1 code implementation 12 Dec 2021 Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li

Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention.

Image Retrieval Retrieval

Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling

1 code implementation 29 Oct 2021 Yiheng Liu, Wengang Zhou, Qiaokang Xie, Houqiang Li

To this end, we propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling, in which we only need to know the locations of the cameras.

Scene Labeling Unsupervised Person Re-Identification

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations 28 Oct 2021 Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

Contextual Similarity Aggregation with Self-attention for Visual Re-ranking

1 code implementation NeurIPS 2021 Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

Since our re-ranking model is not directly involved with the visual feature used in the initial retrieval, it is ready to be applied to retrieval result lists obtained from various retrieval algorithms.

Content-Based Image Retrieval Data Augmentation +2

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations 25 Oct 2021 Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition

no code implementations ICCV 2021 Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li

To validate the effectiveness of our method on SLR, we perform extensive experiments on four public benchmark datasets, i.e., NMFs-CSL, SLR500, MSASL and WLASL.

Ranked #1 on Sign Language Recognition on WLASL100 (using extra training data)

Self-Supervised Learning Sign Language Recognition

Multi-Agent Reinforcement Learning with Shared Resource in Inventory Management

no code implementations 29 Sep 2021 Mingxiao Feng, Guozi Liu, Li Zhao, Lei Song, Jiang Bian, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

We consider inventory management (IM) problem for a single store with a large number of SKUs (stock keeping units) in this paper, where we need to make replenishment decisions for each SKU to balance its supply and demand.

Management Multi-agent Reinforcement Learning +2

Heredity-aware Child Face Image Generation with Latent Space Disentanglement

no code implementations 25 Aug 2021 Xiao Cui, Wengang Zhou, Yang Hu, Weilun Wang, Houqiang Li

The main idea is to disentangle the latent space of a pre-trained generation model and precisely control the face attributes of child images with clear semantics.

Disentanglement Image Generation

Joint Inductive and Transductive Learning for Video Object Segmentation

1 code implementation ICCV 2021 Yunyao Mao, Ning Wang, Wengang Zhou, Houqiang Li

In this work, we propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.

Object Semantic Segmentation +3

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

1 code implementation 30 Jul 2021 Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li

To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.

3D Object Detection object-detection +1

Weakly Supervised Temporal Adjacent Network for Language Grounding

1 code implementation 30 Jun 2021 Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li

To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.

Multiple Instance Learning Sentence

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation

no code implementations CVPR 2021 Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability in extracting knowledge from unlabeled data to assist learning from labeled data.

Continual Learning Image Segmentation +3

Dual-view Molecule Pre-training

1 code implementation 17 Jun 2021 Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.

Molecular Property Prediction Property Prediction +2

Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task

no code implementations 1 Jun 2021 Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

By simply pulling the different augmented views of each image together or other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models.

Self-Supervised Learning

TransVG: End-to-End Visual Grounding with Transformers

2 code implementations ICCV 2021 Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region onto an image.

Referring Expression Comprehension Visual Grounding

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

1 code implementation CVPR 2021 Ning Wang, Wengang Zhou, Jie Wang, Houqiang Li

In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers.

Object Video Object Tracking +2

IOT: Instance-wise Layer Reordering for Transformer Structures

1 code implementation ICLR 2021 Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.

Abstractive Text Summarization Code Generation +2

Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval

1 code implementation ICCV 2021 Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

To this end, we propose a novel deep local feature learning architecture to simultaneously focus on multiple discriminative local patterns in an image.

Image Retrieval Metric Learning +1

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

5 code implementations 31 Dec 2020 Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, Houqiang Li

In this paper, we take a slightly different viewpoint -- we find that precise positioning of raw points is not essential for high performance 3D object detection and that the coarse voxel granularity can also offer sufficient detection accuracy.

3D Object Detection object-detection +2

Contrastive Transformation for Self-supervised Correspondence Learning

1 code implementation 9 Dec 2020 Ning Wang, Wengang Zhou, Houqiang Li

It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).

Self-Supervised Learning Semantic Segmentation +3

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

no code implementations 27 Nov 2020 Zhenxun Yuan, Xiao Song, Lei Bai, Wengang Zhou, Zhe Wang, Wanli Ouyang

As a special design of this transformer, the information encoded in the encoder is different from that in the decoder, i.e., the encoder encodes temporal-channel information of multiple frames while the decoder decodes the spatial-channel information for the current frame in a voxel-wise manner.

3D Object Detection Autonomous Driving +3

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

no code implementations 19 Nov 2020 Xinyue Huo, Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Hao Li, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignored spatial information which is often crucial for visual representation.

Contrastive Learning Data Augmentation +1

Masked Contrastive Representation Learning for Reinforcement Learning

1 code implementation 15 Oct 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li

During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.

Atari Games Contrastive Learning +3

Boosting Continuous Sign Language Recognition via Cross Modality Augmentation

no code implementations 11 Oct 2020 Junfu Pu, Wengang Zhou, Hezhen Hu, Houqiang Li

Continuous sign language recognition (SLR) deals with unaligned video-text pair and uses the word error rate (WER), i.e., edit distance, as the main evaluation metric.

Sentence Sign Language Recognition
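
The word error rate mentioned in the entry above is conventionally computed as the word-level edit distance divided by the reference length. The sketch below shows that generic definition; the function names and the example sentences are illustrative assumptions and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution and one deletion over four reference words.
wer = word_error_rate("a b c d", "a x c")  # 2 / 4 = 0.5
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.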

Global-local Enhancement Network for NMFs-aware Sign Language Recognition

no code implementations 24 Aug 2020 Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li

Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), i.e., facial expression, mouth shapes, etc.

Sign Language Recognition

Wavelet-Based Dual-Branch Network for Image Demoireing

1 code implementation 14 Jul 2020 Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, Qi Tian

When smartphone cameras are used to take photos of digital screens, usually moire patterns result, severely degrading photo quality.

Demoire Image Restoration +1

Single Shot Video Object Detector

1 code implementation 7 Jul 2020 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object object-detection +2

Cascaded Regression Tracking: Towards Online Hard Distractor Discrimination

no code implementations 18 Jun 2020 Ning Wang, Wengang Zhou, Qi Tian, Houqiang Li

In the second stage, a discrete sampling based ridge regression is designed to double-check the remaining ambiguous hard samples, which serves as an alternative of fully-connected layers and benefits from the closed-form solver for efficient learning.

regression Visual Tracking

Incorporating BERT into Neural Machine Translation

3 code implementations ICLR 2020 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used as fine-tuning instead of contextual embedding for downstream language understanding tasks, in NMT, our preliminary exploration of using BERT as contextual embedding is better than using for fine-tuning.

Natural Language Understanding NMT +5

An End-to-End Foreground-Aware Network for Person Re-Identification

no code implementations 25 Oct 2019 Yiheng Liu, Wengang Zhou, Jianzhuang Liu, Guo-Jun Qi, Qi Tian, Houqiang Li

By presenting a target attention loss, the pedestrian features extracted from the foreground branch become more insensitive to the backgrounds, which greatly reduces the negative impacts of changing backgrounds on matching an identical across different camera views.

Person Re-Identification

Relation Distillation Networks for Video Object Detection

2 code implementations ICCV 2019 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.

Object object-detection +3

Real-Time Correlation Tracking via Joint Model Compression and Transfer

1 code implementation 23 Jul 2019 Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li

In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.

Computational Efficiency Image Classification +4

Online Filter Clustering and Pruning for Efficient Convnets

no code implementations 28 May 2019 Zhengguang Zhou, Wengang Zhou, Richang Hong, Houqiang Li

Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters directly on a pre-trained network, which limits the achievable acceleration.

Clustering

Progressive Learning of Low-Precision Networks

no code implementations 28 May 2019 Zhengguang Zhou, Wengang Zhou, Xutao Lv, Xuan Huang, Xiaoyu Wang, Houqiang Li

Recent years have witnessed the great advance of deep learning in a variety of vision tasks.

Soft Contextual Data Augmentation for Neural Machine Translation

1 code implementation ACL 2019 Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xue-Qi Cheng, Tie-Yan Liu

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited.

Data Augmentation Language Modelling +3

Spatial and Temporal Mutual Promotion for Video-based Person Re-identification

1 code implementation 26 Dec 2018 Yiheng Liu, Zhenxun Yuan, Wengang Zhou, Houqiang Li

How to explore the abundant spatial-temporal information in video sequences is the key to solving this problem.

Video-Based Person Re-Identification

Affinity Derivation and Graph Merge for Instance Segmentation

1 code implementation ECCV 2018 Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu

We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance.

Instance Segmentation Semantic Segmentation +1

Multi-Cue Correlation Filters for Robust Visual Tracking

1 code implementation CVPR 2018 Ning Wang, Wengang Zhou, Qi Tian, Richang Hong, Meng Wang, Houqiang Li

By combining different types of features, our approach constructs multiple experts through Discriminative Correlation Filter (DCF) and each of them tracks the target independently.

Visual Tracking

Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition

no code implementations 8 May 2018 Yunfeng Wang, Wengang Zhou, Qilin Zhang, Houqiang Li

Visual attributes in individual video frames, such as the presence of characteristic objects and scenes, offer substantial information for action recognition in videos.

Action Recognition In Videos Attribute +4

Low-Latency Human Action Recognition with Weighted Multi-Region Convolutional Neural Network

no code implementations 8 May 2018 Yunfeng Wang, Wengang Zhou, Qilin Zhang, Xiaotian Zhu, Houqiang Li

Termed "Weighted Multi-Region Convolutional Neural Network" (WMR ConvNet), the proposed system is LSTM-free, and is based on 2D ConvNet that does not require the accumulation of video frames for 3D ConvNet filtering.

Action Recognition Chunking +2

Video-based Sign Language Recognition without Temporal Segmentation

no code implementations 30 Jan 2018 Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li

Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data.

Segmentation Sentence +1

Recent Advance in Content-based Image Retrieval: A Literature Survey

no code implementations 19 Jun 2017 Wengang Zhou, Houqiang Li, Qi Tian

The explosive increase and ubiquitous accessibility of visual data on the Web have led to the prosperity of research activity in image search or retrieval.

Content-Based Image Retrieval Retrieval

Picking Deep Filter Responses for Fine-Grained Image Recognition

no code implementations CVPR 2016 Xiaopeng Zhang, Hongkai Xiong, Wengang Zhou, Weiyao Lin, Qi Tian

Recognizing fine-grained sub-categories such as birds and dogs is extremely challenging due to the highly localized and subtle differences in some specific parts.

Fine-Grained Image Recognition

SOM: Semantic Obviousness Metric for Image Quality Assessment

no code implementations CVPR 2015 Peng Zhang, Wengang Zhou, Lei Wu, Houqiang Li

We propose to extract two types of features, one to measure the semantic obviousness of the image and the other to discover local characteristics.

Image Quality Estimation No-Reference Image Quality Assessment +1

Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval

no code implementations CVPR 2014 Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian

Albeit simple, Bayes merging can be well applied in various merging tasks, and consistently improves the baselines on multi-vocabulary merging.

Image Retrieval Quantization +1