Search Results for author: Jinqiao Wang

Found 52 papers, 19 papers with code

Fast Segment Anything

1 code implementation • 21 Jun 2023 • Xu Zhao, Wenchao Ding, Yongqi An, Yinglong Du, Tao Yu, Min Li, Ming Tang, Jinqiao Wang

In this paper, we propose a speed-up alternative method for this fundamental task with comparable performance.

Ranked #4 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Edge Detection Image Segmentation +6

6,812

Paper
Code

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

1 code implementation • 29 Aug 2023 • Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks.

Anomaly Detection In-Context Learning

532

Paper
Code

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

2 code implementations • 1 Jul 2021 • Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.

Ranked #1 on Image Retrieval on Localized Narratives

Audio to Text Retrieval Cross-Modal Retrieval +3

334

Paper
Code

DPT: Deformable Patch-based Transformer for Visual Recognition

1 code implementation • 30 Jul 2021 • Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang

To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches.

Ranked #17 on Semantic Segmentation on DensePASS

Image Classification object-detection +2

144

Paper
Code

ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

1 code implementation • 2 Nov 2023 • Jianghao Chen, Pu Jian, Tengxiao Xi, Dongyi Yi, Qianlong Du, Chenglin Ding, Guibo Zhu, Chengqing Zong, Jinqiao Wang, Jiajun Zhang

Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text ChineseWebText, which consists of 1.42 TB and each text is associated with a quality score, facilitating the LLM researchers to choose the data according to the desired quality thresholds.

111

Paper
Code

Identity-Guided Human Semantic Parsing for Person Re-Identification

1 code implementation • ECCV 2020 • Kuan Zhu, Haiyun Guo, Zhiwei Liu, Ming Tang, Jinqiao Wang

In this paper, we propose the identity-guided human semantic parsing approach (ISP) to locate both the human body parts and personal belongings at pixel-level for aligned person re-ID only with person identity labels.

Ranked #40 on Person Re-Identification on DukeMTMC-reID

Clustering Human Parsing +3

Paper
Code

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

2 code implementations • 28 Sep 2022 • Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang

Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.

Multi-Label Classification Object +2

Paper
Code

Adaptive Class Suppression Loss for Long-Tail Object Detection

1 code implementation • CVPR 2021 • Tong Wang, Yousong Zhu, Chaoyang Zhao, Wei Zeng, Jinqiao Wang, Ming Tang

To address the problem of long-tail distribution for the large vocabulary object detection task, existing methods usually divide the whole categories into several groups and treat each group with different strategies.

Object object-detection +1

Paper
Code

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

1 code implementation • 24 Nov 2023 • Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang

More importantly, we present Griffon, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules.

Referring Expression Referring Expression Comprehension

Paper
Code

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

1 code implementation • 14 Mar 2024 • Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang

Large Vision Language Models have achieved fine-grained object perception, but the limitation of image resolution remains a significant obstacle to surpass the performance of task-specific experts in complex and dense scenarios.

Object Object Counting +3

Paper
Code

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

3 code implementations • ICCV 2017 • Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, Hanqing Lu

To fully explore the local and global properties, in this paper, we propose a novel fully convolutional network, named as CoupleNet, to couple the global structure with local parts for object detection.

Ranked #5 on Object Detection on PASCAL VOC 2007

Object object-detection +3

Paper
Code

PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification

1 code implementation • 8 Mar 2022 • Kuan Zhu, Haiyun Guo, Tianyi Yan, Yousong Zhu, Jinqiao Wang, Ming Tang

PASS learns to match the output of the local views and global views on the same [PART].

Image Classification Person Re-Identification +1

Paper
Code

ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection

1 code implementation • CVPR 2023 • Yongqi An, Xu Zhao, Tao Yu, Haiyun Guo, Chaoyang Zhao, Ming Tang, Jinqiao Wang

However, previous unsupervised deep learning BGS algorithms perform poorly in sophisticated scenarios such as shadows or night lights, and they cannot detect objects outside the pre-defined categories.

Foreground Segmentation Object +2

Paper
Code

Fast Deep Matting for Portrait Animation on Mobile Phone

1 code implementation • 26 Jul 2017 • Bingke Zhu, Yingying Chen, Jinqiao Wang, Si Liu, Bo Zhang, Ming Tang

Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which does not need any interaction and can realize real-time matting with 15 fps.

Image Matting Video Editing

Paper
Code

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

1 code implementation • 2 Dec 2022 • Fangxun Shu, Biaolong Chen, Yue Liao, Shuwen Xiao, Wenyu Sun, Xiaobo Li, Yousong Zhu, Jinqiao Wang, Si Liu

Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency.

Ranked #36 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval Text Retrieval +1

Paper
Code

Fluctuation-based Adaptive Structured Pruning for Large Language Models

1 code implementation • 19 Dec 2023 • Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang

Being retraining-free is important for LLM pruning methods.

Network Pruning

Paper
Code

Task Decoupled Knowledge Distillation For Lightweight Face Detectors

1 code implementation • 14 Oct 2020 • Xiaoqing Liang, Xu Zhao, Chaoyang Zhao, Nanfei Jiang, Ming Tang, Jinqiao Wang

This method decouples the distillation task of face detection into two subtasks, i.e., the classification distillation subtask and the regression distillation subtask.

Face Detection Knowledge Distillation +1

Paper
Code

Pruning-aware Sparse Regularization for Network Pruning

1 code implementation • 18 Jan 2022 • Nanfei Jiang, Xu Zhao, Chaoyang Zhao, Yongqi An, Ming Tang, Jinqiao Wang

MaskSparsity imposes the fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than all the filters of the model.

Network Pruning

Paper
Code

Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition

1 code implementation • 25 Feb 2023 • Jinzhao Luo, Lu Zhou, Guibo Zhu, Guojing Ge, Beiying Yang, Jinqiao Wang

Most current methods adopt graph convolutional network (GCN) for topology modeling, but GCN-based methods are limited in long-distance correlation modeling and generalizability.

Action Recognition Skeleton Based Action Recognition

Paper
Code

High-speed Tracking with Multi-kernel Correlation Filters

no code implementations • CVPR 2018 • Ming Tang, Bin Yu, Fan Zhang, Jinqiao Wang

In this paper, we will introduce the MKL into KCF in a different way than MKCF.

Vocal Bursts Intensity Prediction

Paper
Add Code

Fast Kernelized Correlation Filters without Boundary Effect

no code implementations • 17 Jun 2018 • Ming Tang, Linyu Zheng, Bin Yu, Jinqiao Wang

To achieve the fast training and detection, a set of cyclic bases is introduced to construct the filter.

Visual Tracking

Paper
Add Code

On the Relations of Correlation Filter Based Trackers and Struck

no code implementations • 25 Nov 2017 • Jinqiao Wang, Ming Tang, Linyu Zheng, Jiayi Feng

In recent years, two types of trackers, namely correlation filter based tracker (CF tracker) and structured output tracker (Struck), have exhibited the state-of-the-art performance.

Relation

Paper
Add Code

Reading Scene Text with Attention Convolutional Sequence Modeling

no code implementations • 13 Sep 2017 • Yunze Gao, Yingying Chen, Jinqiao Wang, Hanqing Lu

Reading text in the wild is a challenging task in the field of computer vision.

Scene Text Recognition

Paper
Add Code

Joint Background Reconstruction and Foreground Segmentation via A Two-stage Convolutional Neural Network

no code implementations • 24 Jul 2017 • Xu Zhao, Yingying Chen, Ming Tang, Jinqiao Wang

In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes.

Foreground Segmentation Segmentation

Paper
Add Code

Recurrent Calibration Network for Irregular Text Recognition

no code implementations • 18 Dec 2018 • Yunze Gao, Yingying Chen, Jinqiao Wang, Zhen Lei, Xiao-Yu Zhang, Hanqing Lu

In this paper, we propose a novel Recurrent Calibration Network (RCN) for irregular scene text recognition.

Irregular Text Recognition Scene Text Recognition

Paper
Add Code

Learning Adaptive Receptive Fields for Deep Image Parsing Network

no code implementations • CVPR 2017 • Zhen Wei, Yao Sun, Jinqiao Wang, Hanjiang Lai, Si Liu

In this paper, we introduce a novel approach to regulate receptive field in deep image parsing network automatically.

Face Parsing

Paper
Add Code

Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging

no code implementations • ICCV 2015 • Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui

The development of deep learning has empowered machines with comparable capability of recognizing limited image categories to human beings.

Paper
Add Code

Semantic Alignment: Finding Semantically Consistent Ground-truth for Facial Landmark Detection

no code implementations • CVPR 2019 • Zhiwei Liu, Xiangyu Zhu, Guosheng Hu, Haiyun Guo, Ming Tang, Zhen Lei, Neil M. Robertson, Jinqiao Wang

Despite this, we notice that the semantic ambiguity greatly degrades the detection performance.

Ranked #1 on Face Alignment on 300W (NME_inter-pupil (%, Full) metric)

Face Alignment Facial Landmark Detection

Paper
Add Code

Learning Feature Embeddings for Discriminant Model based Tracking

no code implementations • ECCV 2020 • Linyu Zheng, Ming Tang, Yingying Chen, Jinqiao Wang, Hanqing Lu

After observing that the features used in most online discriminatively trained trackers are not optimal, in this paper, we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking.

Visual Tracking

Paper
Add Code

Occlusion-Aware Siamese Network for Human Pose Estimation

no code implementations • ECCV 2020 • Lu Zhou, Yingying Chen, Yunze Gao, Jinqiao Wang, Hanqing Lu

To overcome the defects caused by the erasing operation, we perform feature reconstruction to recover the information destroyed by occlusion and details lost in cleaning procedure.

Pose Estimation

Paper
Add Code

Large Batch Optimization for Object Detection: Training COCO in 12 Minutes

no code implementations • ECCV 2020 • Tong Wang, Yousong Zhu, Chaoyang Zhao, Wei Zeng, Yao-Wei Wang, Jinqiao Wang, Ming Tang

Most of existing object detectors usually adopt a small training batch size (~16), which severely hinders the whole community from exploring large-scale datasets due to the extremely long training procedure.

object-detection Object Detection

Paper
Add Code

Adaptive Variance Based Label Distribution Learning For Facial Age Estimation

no code implementations • ECCV 2020 • Xin Wen, Biying Li, Haiyun Guo, Zhiwei Liu, Guosheng Hu, Ming Tang, Jinqiao Wang

Some existing methods adopt distribution learning to tackle this issue by exploiting the semantic correlation between age labels.

Ranked #6 on Age Estimation on MORPH album2 (Caucasian)

Age Estimation Meta-Learning +1

Paper
Add Code

Blended Grammar Network for Human Parsing

no code implementations • ECCV 2020 • Xiaomei Zhang, Yingying Chen, Bingke Zhu, Jinqiao Wang, Ming Tang

Although human parsing has made great progress, it still faces a challenge, i.e., how to extract the whole foreground from similar or cluttered scenes effectively.

Human Parsing

Paper
Add Code

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations • 2 Apr 2021 • Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, Ming Tang

In this paper, we introduce an alignment scheme in Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at patch-level.

Human Parsing Image Classification +3

Paper
Add Code

MST: Masked Self-Supervised Transformer for Visual Representation

no code implementations • NeurIPS 2021 • Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.

Language Modelling Masked Language Modeling +3

Paper
Add Code

Improving Multiple Object Tracking With Single Object Tracking

no code implementations • CVPR 2021 • Linyu Zheng, Ming Tang, Yingying Chen, Guibo Zhu, Jinqiao Wang, Hanqing Lu

Despite considerable similarities between multiple object tracking (MOT) and single object tracking (SOT) tasks, modern MOT methods have not benefited from the development of SOT ones to achieve satisfactory performance.

Multiple Object Tracking Object +2

Paper
Add Code

High-Performance Discriminative Tracking With Transformers

no code implementations • ICCV 2021 • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu

End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares based regression.

Object Visual Tracking +1

Paper
Add Code

Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation

no code implementations • 24 Dec 2021 • Zhiwei Liu, Xiangyu Zhu, Lu Yang, Xiang Yan, Ming Tang, Zhen Lei, Guibo Zhu, Xuetao Feng, Yan Wang, Jinqiao Wang

In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.

Ranked #65 on 3D Human Pose Estimation on 3DPW (MPJPE metric)

3D human pose and shape estimation 3D Reconstruction

Paper
Add Code

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

no code implementations • CVPR 2022 • Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

Furthermore, our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing, and surpass current self-supervised object detection methods on COCO dataset, demonstrating its universality and potential.

Image Classification Object +4

Paper
Add Code

C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection

no code implementations • CVPR 2022 • Tong Wang, Yousong Zhu, Yingying Chen, Chaoyang Zhao, Bin Yu, Jinqiao Wang, Ming Tang

The decision boundary between any two categories is the angular bisector of their weight vectors.

object-detection Object Detection

Paper
Add Code

Plug-and-Play Pseudo Label Correction Network for Unsupervised Person Re-identification

no code implementations • 14 Jun 2022 • Tianyi Yan, Kuan Zhu, Haiyun Guo, Guibo Zhu, Ming Tang, Jinqiao Wang

Clustering-based methods, which alternate between the generation of pseudo labels and the optimization of the feature extraction network, play a dominant role in both unsupervised learning (USL) and unsupervised domain adaptive (UDA) person re-identification (Re-ID).

Clustering Pseudo Label +1

Paper
Add Code

Transfering Low-Frequency Features for Domain Adaptation

no code implementations • 31 Aug 2022 • Zhaowen Li, Xu Zhao, Chaoyang Zhao, Ming Tang, Jinqiao Wang

Previous unsupervised domain adaptation methods did not handle the cross-domain problem from the perspective of frequency for computer vision.

Image Classification object-detection +2

Paper
Add Code

Efficient Masked Autoencoders with Self-Consistency

no code implementations • 28 Feb 2023 • Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

However, its high random mask ratio would result in two serious problems: 1) the data are not efficiently exploited, which brings inefficient pre-training (e.g., 1600 epochs for MAE vs. 300 epochs for the supervised), and 2) the high uncertainty and inconsistency of the pre-trained model, i.e., the prediction of the same patch may be inconsistent under different mask rounds.

Language Modelling Masked Language Modeling +3

Paper
Add Code

FreConv: Frequency Branch-and-Integration Convolutional Networks

no code implementations • 10 Apr 2023 • Zhaowen Li, Xu Zhao, Peigeng Ding, Zongxin Gao, Yuting Yang, Ming Tang, Jinqiao Wang

In the high-frequency branch, a derivative-filter-like architecture is designed to extract the high-frequency information while a light extractor is employed in the low-frequency branch because the low-frequency information is usually redundant.

Paper
Add Code

IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network

no code implementations • 26 Sep 2023 • Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks.

Infrared And Visible Image Fusion

Paper
Add Code

SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion

no code implementations • 26 Sep 2023 • Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in the fusion images, i.e., yielding edge-blurring effect or unrecognizable for object detectors.

Infrared And Visible Image Fusion

Paper
Add Code

Surgical Temporal Action-aware Network with Sequence Regularization for Phase Recognition

no code implementations • 21 Nov 2023 • Zhen Chen, Yuhao Zhai, Jun Zhang, Jinqiao Wang

Specifically, we propose an efficient multi-scale surgical temporal action (MS-STA) module, which integrates visual features with spatial and temporal knowledge of surgical actions at the cost of 2D networks.

Surgical phase recognition

Paper
Add Code

Continual Instruction Tuning for Large Multimodal Models

no code implementations • 27 Nov 2023 • Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang

2) Are the existing three classes of continual learning methods still applicable to the continual instruction tuning of LMMs?

Continual Learning

Paper
Add Code

Mitigating Hallucination in Visual Language Models with Visual Supervision

no code implementations • 27 Nov 2023 • Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang

Large vision-language models (LVLMs) suffer from hallucination a lot, generating responses that apparently contradict to the image content occasionally.

Hallucination

Paper
Add Code

PFDM: Parser-Free Virtual Try-on via Diffusion Model

no code implementations • 5 Feb 2024 • Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang

Virtual try-on can significantly improve the garment shopping experiences in both online and in-store scenarios, attracting broad interest in computer vision.

Virtual Try-on

Paper
Add Code

BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

no code implementations • 29 Feb 2024 • Guojing Ge, Qi Song, Guibo Zhu, Yuting Zhang, Jinglu Chen, Miao Xin, Ming Tang, Jinqiao Wang

Blind face restoration is a challenging task due to the unknown and complex degradation.

Blind Face Restoration Blocking

Paper
Add Code

Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

no code implementations • 16 Apr 2024 • Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

This limitation restricts the capabilities of pretrained VLMs and can result in incorrect predictions in downstream tasks.

Paper
Add Code
