Search Results for author: Jing Zhang

Found 341 papers, 192 papers with code

LLMTune: Accelerate Database Knob Tuning with Large Language Models

1 code implementation • 17 Apr 2024 • Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li

Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads.

Language Modelling • Large Language Model

21,493

Importance Weighted Adversarial Nets for Partial Domain Adaptation

1 code implementation • CVPR 2018 • Jing Zhang, Zewei Ding, Wanqing Li, Philip Ogunbona

This paper proposes an importance weighted adversarial nets-based method for unsupervised domain adaptation, specifically for partial domain adaptation, where the target domain has fewer classes than the source domain.

Partial Domain Adaptation • Transfer Learning • +2 more

3,166

AlignBench: Benchmarking Chinese Alignment of Large Language Models

2 code implementations • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment.

Benchmarking

1,273

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

5 code implementations • 26 Apr 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose.

Ranked #1 on Pose Estimation on COCO test-dev

2D Human Pose Estimation • Keypoint Detection

1,184

ViTPose++: Vision Transformer for Generic Body Pose Estimation

1 code implementation • 7 Dec 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose.

Ranked #1 on Animal Pose Estimation on AP-10K (using extra training data)

2D Human Pose Estimation • Animal Pose Estimation • +1 more

1,184

Bridging Composite and Real: Towards End-to-end Deep Image Matting

1 code implementation • 30 Oct 2020 • Jizhizi Li, Jing Zhang, Stephen J. Maybank, DaCheng Tao

Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images along with their manually labeled alpha mattes to serve as a test bed for evaluating matting models' generalization ability on real-world images.

Ranked #2 on Image Matting on AM-2K

Image Matting • Semantic Segmentation

905

GMFlow: Learning Optical Flow via Global Matching

4 code implementations • CVPR 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated by the pipeline of cost volumes with convolutions for flow regression, which is inherently limited to local correlations and thus struggles to address the long-standing challenge of large displacements.

Ranked #8 on Optical Flow Estimation on Spring

Optical Flow Estimation • regression

899

Unifying Flow, Stereo and Depth Estimation

1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Ranked #1 on Optical Flow Estimation on Sintel-clean

Optical Flow Estimation • Stereo Depth Estimation • +1 more

899

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

759

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

2 code implementations • NeurIPS 2023 • Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.

Instance Segmentation • Object • +4 more

642

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation • 29 Nov 2023 • Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given a generated image that failed because of malformed hands, we utilize ControlNet modules to re-inject correct hand information.

642

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

6 code implementations • 21 Feb 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependencies using the self-attention mechanism.

Ranked #2 on Image Classification on ImageNet ReaL

Image Classification • Inductive Bias

513

Audio-Visual Segmentation

1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

432

Audio-Visual Segmentation with Semantics

1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation • Semantic Segmentation • +1 more

432

An Empirical Study of Remote Sensing Pretraining

2 code implementations • 6 Apr 2022 • Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao

To this end, we train different networks from scratch with the help of MillionAID, the largest RS scene recognition dataset to date, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.

Ranked #1 on Aerial Scene Classification on UCM (80% as trainset)

Aerial Scene Classification • Building change detection for remote sensing images • +5 more

415

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

2 code implementations • 8 Aug 2022 • Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, DaCheng Tao, Liangpei Zhang

Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability.

Ranked #1 on Aerial Scene Classification on AID (50% as trainset)

Aerial Scene Classification • Few-Shot Learning • +2 more

415

Deep Learning for Camera Calibration and Beyond: A Survey

1 code implementation • 19 Mar 2023 • Kang Liao, Lang Nie, Shujuan Huang, Chunyu Lin, Jing Zhang, Yao Zhao, Moncef Gabbouj, DaCheng Tao

In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, analyzing their strengths and limitations.

Camera Calibration

401

Deep Automatic Natural Image Matting

1 code implementation • 15 Jul 2021 • Jizhizi Li, Jing Zhang, DaCheng Tao

To address the problem, a novel end-to-end matting network is proposed, which can predict a generalized trimap for any image of the above types as a unified semantic representation.

Ranked #2 on Image Matting on AIM-500

Image Matting

372

Uncertainty Inspired RGB-D Saliency Detection

4 code implementations • 7 Sep 2020 • Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Saleh, Sadegh Aliakbarian, Nick Barnes

Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution.

Ranked #1 on RGB-D Salient Object Detection on LFSD

Decoder • RGB-D Salient Object Detection • +2 more

321

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

4 code implementations • 17 Jun 2020 • Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, Jie Tang

Graph representation learning has emerged as a powerful technique for addressing real-world problems.

Contrastive Learning • General Classification • +5 more

319

Privacy-Preserving Portrait Matting

1 code implementation • 29 Apr 2021 • Jizhizi Li, Sihan Ma, Jing Zhang, DaCheng Tao

We systematically evaluate both trimap-free and trimap-based matting methods on P3M-10k and find that existing matting methods show different generalization capabilities when following the Privacy-Preserving Training (PPT) setting, i.e., training on face-blurred images and testing on arbitrary images.

Ranked #3 on Image Matting on P3M-10k

Image Matting • Privacy Preserving

277

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

2 code implementations • NeurIPS 2021 • Yufei Xu, Qiming Zhang, Jing Zhang, DaCheng Tao

Nevertheless, vision transformers treat an image as a 1D sequence of visual tokens, lacking an intrinsic inductive bias (IB) in modeling local visual structures and dealing with scale variance.

Ranked #2 on Video Object Segmentation on DAVIS 2017

Image Classification • Inductive Bias • +2 more

240

DUT: Learning Video Stabilization by Simply Watching Unstable Videos

2 code implementations • 30 Nov 2020 • Yufei Xu, Jing Zhang, Stephen J. Maybank, DaCheng Tao

In this paper, we attempt to tackle the video stabilization problem in a deep unsupervised learning manner, which borrows the divide-and-conquer idea from traditional stabilizers while leveraging the representation power of DNNs to handle the challenges in real-world scenarios.

Homography Estimation • Video Stabilization

227

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

1 code implementation • CVPR 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.

Ranked #1 on Text Spotting on Total-Text (using extra training data)

Decoder • Scene Text Detection • +3 more

226

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

1 code implementation • 31 May 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

Ranked #1 on Text Spotting on Inverse-Text

Decoder • Scene Text Detection • +2 more

226

RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

1 code implementation • 12 Feb 2023 • Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen

Due to the structural property of SQL queries, the seq2seq model is responsible for parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords).

Ranked #1 on Semantic Parsing on spider

Decoder • Language Modelling • +3 more

219

Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

2 code implementations • 10 Apr 2024 • Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

Detecting non-factual content is a longstanding goal for increasing the trustworthiness of large language model (LLM) generations.

Question Answering

208

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

1 code implementation • NeurIPS 2023 • XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis.

199

Referring Image Matting

1 code implementation • CVPR 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting.

Ranked #1 on Referring Image Matting (RefMatte-RW100) on RefMatte

Domain Generalization • Image Matting • +5 more

198

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

1 code implementation • 11 Jul 2022 • Sen Zhang, Jing Zhang, DaCheng Tao

Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.

Monocular Depth Estimation • Motion Estimation • +1 more

192

UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders

1 code implementation • CVPR 2020 • Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Sadat Saleh, Tong Zhang, Nick Barnes

In this paper, we propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.

Ranked #4 on RGB-D Salient Object Detection on LFSD

RGB-D Salient Object Detection • Saliency Detection • +1 more

174

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

1 code implementation • 5 Jul 2023 • Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight

Overall, KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation and provides useful insights applicable to the field of semantic segmentation as a whole.

Segmentation • Tumor Segmentation

171

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

1 code implementation • 10 Jul 2022 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, DaCheng Tao

However, these methods, built upon the detection transformer framework, might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the human reading order, which, from our observation, impedes detection robustness.

Ranked #3 on Scene Text Detection on SCUT-CTW1500

Inductive Bias • Scene Text Detection • +1 more

159

Deep Image Matting: A Comprehensive Survey

1 code implementation • 10 Apr 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing.

Image Matting • Referring Image Matting

156

Progressive LiDAR Adaptation for Road Detection

1 code implementation • 2 Apr 2019 • Zhe Chen, Jing Zhang, DaCheng Tao

To this end, LiDAR sensor data can be incorporated to improve visual image-based road detection, because LiDAR data is less susceptible to visual noise.

155

VSA: Learning Varied-Size Window Attention in Vision Transformers

2 code implementations • 18 Apr 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Attention within windows has been widely explored in vision transformers to balance performance, computational complexity, and memory footprint.

Instance Segmentation • Object Detection • +1 more

147

Weakly-Supervised Salient Object Detection via Scribble Annotations

1 code implementation • CVPR 2020 • Jing Zhang, Xin Yu, Aixuan Li, Peipei Song, Bowen Liu, Yuchao Dai

In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations.

Edge Detection • Object • +3 more

144

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

1 code implementation • NeurIPS 2019 • Qiming Zhang, Jing Zhang, Wei Liu, DaCheng Tao

Although there has been progress in matching the marginal distributions between two domains, the classifier favors source-domain features and makes incorrect predictions on the target domain due to category-agnostic feature alignment.

Ranked #24 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Semantic Segmentation • Synthetic-to-Real Translation • +1 more

138

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

1 code implementation • 31 Jan 2024 • Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.

Ranked #1 on Hierarchical Text Segmentation on HierText

Hierarchical Text Segmentation • Segmentation • +1 more

132

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

4 code implementations • 28 Aug 2021 • Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, DaCheng Tao

The experimental results provide sound empirical evidence on the superiority of learning from diverse animal species in terms of both accuracy and generalization ability.

Animal Pose Estimation • Domain Generalization • +1 more

124

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

4 code implementations • 12 Jun 2022 • Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, DaCheng Tao

Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.

Animal Pose Estimation • Domain Generalization • +1 more

124

Vision Transformer with Quadrangle Attention

1 code implementation • 27 Mar 2023 • Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.

object-detection • Object Detection • +2 more

123

P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds

1 code implementation • ICCV 2023 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jiawei Liu, Chaoyue Xing, Jing Zhang, Nick Barnes

Point cloud completion aims to recover the complete shape based on a partial observation.

Point Cloud Completion

122

ISNet: Shape Matters for Infrared Small Target Detection

1 code implementation • CVPR 2022 • Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo

The TOAA block computes low-level information with an attention mechanism in both row and column directions and fuses it with high-level information to capture the shape characteristics of targets and suppress noise.

Management

110

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

1 code implementation • AAAI 2022 2021 • Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du

Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity.

Ranked #10 on Scene Text Recognition on ICDAR2015 (using extra training data)

Language Modelling • Scene Text Recognition

105

SVGDreamer: Text Guided SVG Generation with Diffusion Model

1 code implementation • 27 Dec 2023 • XiMing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity.

Vector Graphics

102

Pruning Self-attentions into Convolutional Layers in Single Path

3 code implementations • 23 Nov 2021 • Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, DaCheng Tao, Bohan Zhuang

Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers.

Ranked #18 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs • Inductive Bias • +1 more

100

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

1 code implementation • 20 Mar 2024 • Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.

Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)

Aerial Scene Classification • Building change detection for remote sensing images • +13 more

Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

1 code implementation • 27 Jul 2021 • Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, DaCheng Tao

In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains.

Decoder • Domain Adaptation • +3 more

A Comprehensive Survey and Taxonomy on Single Image Dehazing Based on Deep Learning

1 code implementation • 7 Jun 2021 • Jie Gui, Xiaofeng Cong, Yuan Cao, Wenqi Ren, Jun Zhang, Jing Zhang, Jiuxin Cao, DaCheng Tao

With the development of convolutional neural networks, hundreds of deep learning based dehazing methods have been proposed.

Image Dehazing • Single Image Dehazing

Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering

1 code implementation • ACL 2022 • Jing Zhang, Xiaokang Zhang, Jifan Yu, Jian Tang, Jie Tang, Cuiping Li, Hong Chen

Recent works on knowledge base question answering (KBQA) retrieve subgraphs for easier reasoning.

Knowledge Base Question Answering • Retrieval

SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection

1 code implementation • 6 Jan 2022 • Chen Chen, Zhe Chen, Jing Zhang, DaCheng Tao

We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects.

3D Object Detection • object-detection

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

1 code implementation • 19 Apr 2023 • Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao

This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.

Motion Compensation • Simultaneous Localization and Mapping

Towards Data-Efficient Detection Transformers

2 code implementations • 17 Mar 2022 • Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao

Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.

Rethinking Portrait Matting with Privacy Preserving

1 code implementation • 31 Mar 2022 • Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, DaCheng Tao

P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain some useful findings about model generalization ability under the privacy-preserving training (PPT) setting.

Ranked #1 on Image Matting on P3M-10k

Domain Generalization • Image Matting • +1 more

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

1 code implementation • 16 Jul 2022 • Haimei Zhao, Jing Zhang, Sen Zhang, DaCheng Tao

A naive way is to accomplish them independently in a sequential or parallel manner, but this has many drawbacks, e.g., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts.

Autonomous Driving • Depth Estimation • +3 more

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation • 26 Feb 2024 • Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation • Domain Adaptation • +2 more

Simultaneously Localize, Segment and Rank the Camouflaged Objects

1 code implementation • CVPR 2021 • Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Bowen Liu, Nick Barnes, Deng-Ping Fan

With the above understanding of camouflaged objects, we present the first ranking-based COD network (Rank-Net) to simultaneously localize, segment, and rank camouflaged objects.

object-detection • Object Detection

Towards Deeper Understanding of Camouflaged Object Detection

1 code implementation • 23 May 2022 • Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan

With the above understanding of camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment, and rank camouflaged objects, indicating the conspicuousness level of camouflage.

Object • object-detection • +1 more

Progressive One-shot Human Parsing

1 code implementation • 22 Dec 2020 • Haoyu He, Jing Zhang, Bhavani Thuraisingham, DaCheng Tao

In this paper, we devise a novel Progressive One-shot Parsing network (POPNet) to address two critical challenges, i.e., testing bias and small sizes.

Human Parsing • Metric Learning • +1 more

End-to-end One-shot Human Parsing

1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing • Metric Learning • +1 more

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

1 code implementation • 3 Aug 2021 • Bo Du, Jian Ye, Jing Zhang, Juhua Liu, DaCheng Tao

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues, i.e., 1) fracture detections at the gaps in a text instance; and 2) inaccurate detections of arbitrary-shaped text instances with diverse background context.

Ranked #5 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection • Text Detection

Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection

1 code implementation • CVPR 2023 • Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, Chongyang Zhang

In this way, our model can form a more explicit and discriminative decision boundary to distinguish known and also unseen anomalies from normal samples more effectively.

Ranked #3 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Contrastive Learning • Supervised Anomaly Detection

Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images

1 code implementation • 13 Aug 2021 • Lei Ding, Haitao Guo, Sicong Liu, Lichao Mou, Jing Zhang, Lorenzo Bruzzone

Recent studies indicate that the SCD can be modeled through a triple-branch Convolutional Neural Network (CNN), which contains two temporal branches and a change branch.

Change Detection

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

2 code implementations • 20 Nov 2022 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Mingyu Gao, Jing Zhang

Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware prior into seed point features to effectively form strong feature representation for geometric positions of the seed points, thus providing more robust and accurate cues for offset learning.

3D Single Object Tracking Object Tracking +1

Paper
Code

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

2 code implementations • 23 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang

By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment.

3D Single Object Tracking Object Tracking

Paper
Code

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

1 code implementation • 5 Sep 2023 • Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha

The spatial information indicating objects' spatial adjacency across consecutive frames is crucial for effective object tracking.

3D Single Object Tracking Autonomous Driving +2

Paper
Code

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

1 code implementation • 27 Nov 2019 • Haoyu He, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we propose a novel GRAph PYramid Mutual Learning (Grapy-ML) method to address the cross-dataset human parsing problem, where the annotations are at different granularities.

Human Parsing Semantic Segmentation

Paper
Code

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

1 code implementation • 28 Feb 2023 • Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang

We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters that is capable of knowledge-grounded conversation in Chinese, using a search engine to access Internet knowledge.

Dialogue Evaluation Dialogue Generation +2

Paper
Code

One-Shot Affordance Detection

2 code implementations • 28 Jun 2021 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.

4k Affordance Detection

Paper
Code

One-Shot Object Affordance Detection in the Wild

1 code implementation • 8 Aug 2021 • Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, DaCheng Tao

To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.

Action Recognition Affordance Detection +3

Paper
Code

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

1 code implementation • ICCV 2021 • Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.

Ranked #5 on Thermal Image Segmentation on RGB-T-Glass-Segmentation

Saliency Detection Thermal Image Segmentation

Paper
Code

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

Paper
Code

Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images

1 code implementation • 29 Jun 2021 • Lei Ding, Dong Lin, Shaofu Lin, Jing Zhang, Xiaojie Cui, Yuebin Wang, Hao Tang, Lorenzo Bruzzone

To overcome this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of high-resolution remote sensing images (HR RSIs).

Image Cropping Semantic Segmentation

Paper
Code

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao

The Depth-aware Video Panoptic Segmentation (DVPS) is a new challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.

Ranked #1 on Depth-aware Video Panoptic Segmentation on SemKITTI-DVPS

Depth-aware Video Panoptic Segmentation Depth Estimation +4

Paper
Code

Salient Objects in Clutter

2 code implementations • 7 May 2021 • Deng-Ping Fan, Jing Zhang, Gang Xu, Ming-Ming Cheng, Ling Shao

This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.

Image Augmentation Object +4

Paper
Code

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Ranked #21 on Semantic Segmentation on ADE20K

Decoder Semantic Segmentation

Paper
Code

Nighttime Dehazing with a Synthetic Benchmark

1 code implementation • 10 Aug 2020 • Jing Zhang, Yang Cao, Zheng-Jun Zha, DaCheng Tao

To address this issue, we propose a novel synthetic method called 3R to simulate nighttime hazy images from daytime clear images, which first reconstructs the scene geometry, then simulates the light rays and object reflectance, and finally renders the haze effects.

Decoder

Paper
Code

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks; it tunes only a small number of parameters while freezing the vast majority, which eases the storage burden and optimization difficulty.

Paper
Code

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

1 code implementation • 28 Mar 2024 • Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios.

Language Modelling Large Language Model

Paper
Code

Dense Uncertainty Estimation

1 code implementation • 13 Oct 2021 • Jing Zhang, Yuchao Dai, Mochu Xiang, Deng-Ping Fan, Peyman Moghadam, Mingyi He, Christian Walder, Kaihao Zhang, Mehrtash Harandi, Nick Barnes

Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks. The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing.

Decision Making

Paper
Code

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

2 code implementations • CVPR 2021 • Aixuan Li, Jing Zhang, Yunqiu Lv, Bowen Liu, Tong Zhang, Yuchao Dai

Visual salient object detection (SOD) aims at finding the salient object(s) that attract human attention, while camouflaged object detection (COD), on the contrary, intends to discover the camouflaged object(s) that are hidden in their surroundings.

Object object-detection +2

Paper
Code

Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline

1 code implementation • 24 Sep 2022 • Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan

We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer.

Question Answering Visual Question Answering

Paper
Code

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

1 code implementation • 13 Jan 2023 • Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, DaCheng Tao

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.

Self-Supervised Learning

Paper
Code

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

1 code implementation • 17 Aug 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.

Image Segmentation Segmentation +1

Paper
Code

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

2 code implementations • 22 Dec 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.

Segmentation Semantic Segmentation

Paper
Code

Weakly Supervised Video Salient Object Detection

1 code implementation • CVPR 2021 • Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han

Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.

Object object-detection +4

Paper
Code

DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification

2 code implementations • 19 Apr 2023 • Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.

Hyperspectral Image Classification Image Generation

Paper
Code

Recurrent Glimpse-based Decoder for Detection with Transformer

1 code implementation • CVPR 2022 • Zhe Chen, Jing Zhang, DaCheng Tao

Then, a glimpse-based decoder is introduced to provide refined detection results based on both the glimpse features and the attention modeling outputs of the previous stage.

Ranked #1 on Object Detection on MS COCO (GFlops metric)

Decoder Object Detection

Paper
Code

Generative Transformer for Accurate and Reliable Salient Object Detection

2 code implementations • 20 Apr 2021 • Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes

For the former, we apply a transformer to a deterministic model and explain that its effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks.

Attribute Camouflaged Object Segmentation +8

Paper
Code

Learning Affordance Grounding from Exocentric Images

2 code implementations • CVPR 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower an agent with such ability, this paper proposes a task of affordance grounding from the exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.

Human-Object Interaction Detection Object +1

Paper
Code

Grounded Affordance from Exocentric View

2 code implementations • 28 Aug 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.

Human-Object Interaction Detection Object +1

Paper
Code

ReAct: Temporal Action Detection with Relational Queries

1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

Moreover, we propose two losses to facilitate and stabilize the training of action classification.

Ranked #15 on Temporal Action Localization on THUMOS’14

Action Classification Action Detection +5

Paper
Code

FAMED-Net: A Fast and Accurate Multi-scale End-to-end Dehazing Network

1 code implementation • 11 Jun 2019 • Jing Zhang, DaCheng Tao

Single image dehazing is a critical image pre-processing step for subsequent high-level computer vision tasks.

Computational Efficiency Image Dehazing +1

Paper
Code

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge

1 code implementation • NeurIPS 2019 • Tingting Qiao, Jing Zhang, Duanqing Xu, DaCheng Tao

Given a text description, we immediately imagine an overall visual impression using this prior and, based on this, we draw a picture by progressively adding more and more details.

Text-to-Image Generation

Paper
Code

Pre-Training Graph Neural Networks for Cold-Start Users and Items Representation

1 code implementation • 13 Dec 2020 • Bowen Hao, Jing Zhang, Hongzhi Yin, Cuiping Li, Hong Chen

The cold-start problem is a fundamental challenge for recommendation tasks.

Paper
Code

RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?

2 code implementations • 24 Nov 2021 • Yufei Xu, Qiming Zhang, Jing Zhang, DaCheng Tao

In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL).

Contrastive Learning

Paper
Code

GLT-T++: Global-Local Transformer for 3D Siamese Tracking with Ranking Loss

1 code implementation • 1 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Xudong Lv, Mingyu Gao, Jing Zhang

Incorporating this transformer-based voting scheme into 3D RPN, a novel Siamese method dubbed GLT-T is developed for 3D single object tracking on point clouds.

3D Single Object Tracking Object Tracking +1

Paper
Code

Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit

1 code implementation • 23 Feb 2023 • Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, Jie Tang

The toolkit is at https://github.com/THUDM/WhoIsWho.

Data Integration

Paper
Code

Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

1 code implementation • 10 Dec 2022 • Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs).

Ranked #1 on Change Detection on SECOND

Change Detection

Paper
Code

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

1 code implementation • 29 Jun 2023 • Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao

Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.

Paper
Code

Graph Contrastive Learning for Anomaly Detection

2 code implementations • 17 Aug 2021 • Bo Chen, Jing Zhang, Xiaokang Zhang, Yuxiao Dong, Jian Song, Peng Zhang, Kaibo Xu, Evgeny Kharlamov, Jie Tang

To achieve the contrastive objective, we design a graph neural network encoder that can infer and further remove suspicious links during message passing, as well as learn the global context of the input graph.

Anomaly Detection Binary Classification +2

Paper
Code

Panther: Fast Top-k Similarity Search in Large Networks

2 code implementations • 10 Apr 2015 • Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, Juanzi Li

The algorithm is based on a novel idea of random path, and an extended method is also presented, to enhance the structural similarity when two vertices are completely disconnected.

Social and Information Networks

Paper
Code

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

1 code implementation • CVPR 2023 • Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao

Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.

Animal Pose Estimation Contrastive Learning

Paper
Code

MirrorGAN: Learning Text-to-image Generation by Redescription

2 code implementations • CVPR 2019 • Tingting Qiao, Jing Zhang, Duanqing Xu, DaCheng Tao

Generating an image from a given text description has two goals: visual realism and semantic consistency.

Ranked #8 on Text-to-Image Generation on CUB (Inception score metric)

Sentence Text-to-Image Generation

Paper
Code

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization

1 code implementation • ICCV 2021 • Yufei Xu, Jing Zhang, DaCheng Tao

However, since the view outside the boundary is not available during warping, the resulting holes around the boundary of the stabilized frame must be discarded (i.e., cropped) to maintain visual consistency, which leads to a tradeoff between stability and cropping ratio.

Video Stabilization

Paper
Code

BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation

1 code implementation • 6 Apr 2022 • Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao

Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.

Clustering Pseudo Label +1

Paper
Code

Towards High Performance Human Keypoint Detection

1 code implementation • 3 Feb 2020 • Jing Zhang, Zhe Chen, DaCheng Tao

Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance.

Ranked #5 on Pose Estimation on COCO test-dev

Human Detection Keypoint Detection +1

Paper
Code

Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

1 code implementation • 10 Feb 2023 • Jie Zhou, Qian Yu, Chuan Luo, Jing Zhang

In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS).

Multi-Task Learning Recommendation Systems

Paper
Code

FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis

3 code implementations • CVPR 2022 • Yu Feng, Benteng Ma, Jing Zhang, Shanshan Zhao, Yong Xia, DaCheng Tao

However, designing a unified backdoor attack (BA) method that can be applied to various medical image analysis (MIA) systems is challenging due to the diversity of imaging modalities (e.g., X-ray, CT, and MRI) and analysis tasks (e.g., classification, detection, and segmentation).

Artifact Detection Backdoor Attack +6

Paper
Code

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

1 code implementation • ICCV 2023 • Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.

Hyperspectral Image Super-Resolution Image Super-Resolution

Paper
Code

Data-driven Estimation of Origin-Destination Demand and User Cost Functions for the Optimization of Transportation Networks

2 code implementations • 29 Oct 2016 • Jing Zhang, Sepideh Pourazarm, Christos G. Cassandras, Ioannis Ch. Paschalidis

In earlier work (Zhang et al., 2016) we used actual traffic data from the Eastern Massachusetts transportation network in the form of spatial average speeds and road segment flow capacities in order to estimate Origin-Destination (OD) flow demand matrices for the network.

Systems and Control 90B06

Paper
Code

Data-Driven Estimation of Travel Latency Cost Functions via Inverse Optimization in Multi-Class Transportation Networks

2 code implementations • 11 Mar 2017 • Jing Zhang, Ioannis Ch. Paschalidis

We develop a method to estimate from data travel latency cost functions in multi-class transportation networks, which accommodate different types of vehicles with very different characteristics (e.g., cars and trucks).

Systems and Control Optimization and Control 90C33, 90C90, 90C30

Paper
Code

LineaRE: Simple but Powerful Knowledge Graph Embedding for Link Prediction

1 code implementation • 21 Apr 2020 • Yanhui Peng, Jing Zhang

Specifically, we regard knowledge graph embedding as a simple linear regression task, where a relation is modeled as a linear function of two low-dimensional vector-represented entities with two weight vectors and a bias vector.

Ranked #1 on Link Prediction on FB15k

Knowledge Graph Embedding Knowledge Graphs +1

Paper
Code
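The LineaRE entry above describes each relation as a linear function linking a head entity vector and a tail entity vector through two weight vectors and a bias vector. As a rough illustration only, the sketch below assumes the form w1 ∘ h + b ≈ w2 ∘ t (element-wise products) scored with a negative L1 distance; the function names are hypothetical and this is not the authors' released code.

```python
from typing import List, Sequence


def linear_relation_residual(
    h: Sequence[float], t: Sequence[float],
    w1: Sequence[float], w2: Sequence[float], b: Sequence[float],
) -> List[float]:
    """Element-wise residual w1*h + b - w2*t for one (head, relation, tail) triple.

    Assumption: the relation maps entities with element-wise weights plus a bias,
    so a perfectly plausible triple has a zero residual.
    """
    if not (len(h) == len(t) == len(w1) == len(w2) == len(b)):
        raise ValueError("all vectors must share the same dimension")
    return [w1_i * h_i + b_i - w2_i * t_i
            for h_i, t_i, w1_i, w2_i, b_i in zip(h, t, w1, w2, b)]


def linear_relation_score(
    h: Sequence[float], t: Sequence[float],
    w1: Sequence[float], w2: Sequence[float], b: Sequence[float],
) -> float:
    """Plausibility score: negative L1 norm of the residual (assumed distance).

    Higher (closer to zero) means the triple fits the linear relation better.
    """
    return -sum(abs(r) for r in linear_relation_residual(h, t, w1, w2, b))


if __name__ == "__main__":
    w1, w2, b = [2.0, 1.0], [1.0, 2.0], [0.0, 0.0]
    # Tail satisfies w1*h + b == w2*t exactly, so the score is zero.
    print(linear_relation_score([1.0, 2.0], [2.0, 1.0], w1, w2, b))
    # A different tail leaves a residual of [1.0, 0.0], so the score is -1.0.
    print(linear_relation_score([1.0, 2.0], [1.0, 1.0], w1, w2, b))
```

In a trained model the entity vectors and the per-relation (w1, w2, b) would be learned parameters; here they are fixed toy values so the arithmetic can be checked by hand.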

Confidence-Aware Learning for Camouflaged Object Detection

1 code implementation • 22 Jun 2021 • Jiawei Liu, Jing Zhang, Nick Barnes

Then, we concatenate it with the input image and feed the result to the confidence estimation network to produce a one-channel confidence map. We generate dynamic supervision for the confidence estimation network, representing the agreement of the camouflage prediction with the ground-truth camouflage map.

Object object-detection +1

Paper
Code

RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

1 code implementation • 3 Jul 2023 • Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision.

Image Segmentation Referring Expression +4

Paper
Code

Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models

2 code implementations • 27 Feb 2017 • Jing Zhang, Ioannis Ch. Paschalidis

Under Markovian assumptions, we leverage a Central Limit Theorem (CLT) for the empirical measure in the test statistic of the composite hypothesis Hoeffding test so as to establish weak convergence results for the test statistic, and, thereby, derive a new estimator for the threshold needed by the test.

Anomaly Detection Two-sample testing

Paper
Code

DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation

1 code implementation • 20 Jul 2021 • Li Gao, Jing Zhang, Lefei Zhang, DaCheng Tao

In addition, feature-level alignment is carried out by aligning the feature maps of the source and target images from student network using a weighted maximum mean discrepancy loss.

Ranked #18 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes

Semantic Segmentation Synthetic-to-Real Translation +1

Paper
Code

Auto Learning Attention

1 code implementation • NeurIPS 2020 • Benteng Ma, Jing Zhang, Yong Xia, DaCheng Tao

Attention modules have been shown to be effective in strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both operations sequentially.

Image Classification Keypoint Detection +2

Paper
Code

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

1 code implementation • 18 Jul 2022 • Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li

Since data augmentation strategies have largely alleviated training instability, further improving the generative performance of data-efficient GANs (DE-GANs) has become a research hotspot.

Contrastive Learning Data Augmentation

Paper
Code

Deep Interest Highlight Network for Click-Through Rate Prediction in Trigger-Induced Recommendation

1 code implementation • 5 Feb 2022 • Qijie Shen, Hong Wen, Wanjie Tao, Jing Zhang, Fuyu Lv, Zulong Chen, Zhao Li

In many classical e-commerce platforms, personalized recommendation has been proven to be of great business value, which can improve user satisfaction and increase the revenue of platforms.

Click-Through Rate Prediction

Paper
Code

Toward Real-world Single Image Deraining: A New Benchmark and Beyond

1 code implementation • 11 Jun 2022 • Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, DaCheng Tao

To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.

Domain Generalization Image Restoration +2

Paper
Code

SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection

2 code implementations • 29 Mar 2023 • Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, DaCheng Tao

Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance.

3D Object Detection Knowledge Distillation +1

Paper
Code

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

1 code implementation • 27 Mar 2024 • Xiaofeng Cong, Jie Gui, Jing Zhang, JunMing Hou, Hao Shen

There are two distinctions between nighttime and daytime haze.

Image Dehazing Pseudo Label

Paper
Code

Understanding WeChat User Preferences and "Wow" Diffusion

1 code implementation • 4 Mar 2021 • Fanjin Zhang, Jie Tang, Xueyi Liu, Zhenyu Hou, Yuxiao Dong, Jing Zhang, Xiao Liu, Ruobing Xie, Kai Zhuang, Xu Zhang, Leyu Lin, Philip S. Yu

"Top Stories" is a novel friend-enhanced recommendation engine in WeChat, in which users can read articles based on preferences of both their own and their friends.

Graph Representation Learning Social and Information Networks

Paper
Code

An Energy-Based Prior for Generative Saliency

1 code implementation • 19 Apr 2022 • Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.

object-detection RGB-D Salient Object Detection +3

Paper
Code

HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese

1 code implementation • ACL 2022 • Daniel Zhang-li, Jing Zhang, Jifan Yu, Xiaokang Zhang, Peng Zhang, Jie Tang, Juanzi Li

We investigate the usage of entity linking (EL) in downstream tasks and present the first modularized EL toolkit for easy task adaptation.

Entity Linking Question Answering

Paper
Code

RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation

1 code implementation • ICCV 2023 • Zhexiong Wan, Yuxin Mao, Jing Zhang, Yuchao Dai

Recently, methods that fuse RGB images and point clouds have been proposed to jointly estimate 2D optical flow and 3D scene flow.

Optical Flow Estimation Scene Flow Estimation

Paper
Code

Recommending Courses in MOOCs for Jobs: An Auto Weak Supervision Approach

1 code implementation • 28 Dec 2020 • Bowen Hao, Jing Zhang, Cuiping Li, Hong Chen, Hongzhi Yin

On the one hand, the framework enables training multiple supervised ranking models upon the pseudo labels produced by multiple unsupervised ranking models.

Paper
Code

Invertible Attention

1 code implementation • 16 Jun 2021 • Jiajun Zha, Yiran Zhong, Jing Zhang, Richard Hartley, Liang Zheng

Attention has been proved to be an efficient mechanism to capture long-range dependencies.

Image Reconstruction

Paper
Code

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation • 6 Sep 2023 • Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation, with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.

Contrastive Learning Denoising +5

Paper
Code

Transmission-Guided Bayesian Generative Model for Smoke Segmentation

1 code implementation • 2 Mar 2023 • Siyuan Yan, Jing Zhang, Nick Barnes

To effectively model the two types of uncertainty, we introduce a Bayesian generative model to simultaneously estimate the posterior distribution of model parameters and its predictions.

Image Dehazing Image Segmentation +2

Paper
Code

Scalable Mask Annotation for Video Text Spotting

1 code implementation • 2 May 2023 • Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, DaCheng Tao

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames.

Text Spotting

Paper
Code

Watermarking for Out-of-distribution Detection

1 code implementation • 27 Oct 2022 • Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han

Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.

Ranked #20 on Out-of-Distribution Detection on ImageNet-1k vs Places

Out-of-Distribution Detection

Paper
Code

Forest Fire Clustering for Single-cell Sequencing with Iterative Label Propagation and Parallelized Monte Carlo Simulation

1 code implementation • 22 Mar 2021 • Zhanlin Chen, Jeremy Goldwasser, Philip Tuckman, Jason Liu, Jing Zhang, Mark Gerstein

Here, we introduce Forest Fire Clustering, an efficient and interpretable method for cell-type discovery from single-cell data.

Clustering

Paper
Code

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

1 code implementation • 13 Jan 2024 • Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, DaCheng Tao

In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.

Text Detection Text Spotting

Paper
Code

Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images

1 code implementation • 28 Aug 2021 • Lefei Zhang, Meng Lan, Jing Zhang, DaCheng Tao

In this paper, we propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.

Road Segmentation Unsupervised Domain Adaptation

Paper
Code

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

1 code implementation • 31 Aug 2022 • ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.

Point Cloud Registration

Paper
Code

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation • ICCV 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality into a modality-shared representation and a modality-specific representation.

Attribute Representation Learning

Paper
Code

Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

1 code implementation • 1 Feb 2024 • Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao

In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.

Ensemble Learning In-Context Learning

Paper
Code

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

1 code implementation • 19 Sep 2022 • Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

Paper
Code

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

1 code implementation • 6 Jun 2023 • Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.

Object object-detection +3

Paper
Code

CODE: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking

2 code implementations • 14 Dec 2020 • Bo Chen, Jing Zhang, Xiaokang Zhang, Xiaobin Tang, Lingfan Cai, Hong Chen, Cuiping Li, Peng Zhang, Jie Tang

In this paper, we propose CODE, which first pre-trains an expert linking model by contrastive learning on AMiner such that it can capture the representation and matching patterns of experts without supervised signals, then it is fine-tuned between AMiner and external sources to enhance the model's transferability in an adversarial manner.

Active Learning Contrastive Learning +2

Paper
Code
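
The snippet says CODE pre-trains by contrastive learning without supervised signals. As a rough, non-authoritative illustration of that family of objectives, the sketch below implements a generic InfoNCE loss over a batch; the function names, the cosine similarity, the temperature of 0.1 and the toy vectors are assumptions, not details taken from the CODE paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchor i should match positive i against every other positive in the batch."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        # Negative log-softmax of the matching pair, computed stably with log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_z)
    return total / len(anchors)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(anchors, aligned) < info_nce(anchors, swapped))  # True
```

Correctly paired embeddings yield a lower loss than mismatched ones, which is the signal such pre-training exploits in place of labels.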

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.

Vision and Language Navigation

Paper
Code

MPMQA: Multimodal Question Answering on Product Manuals

1 code implementation • 19 Apr 2023 • Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers.

Question Answering Sentence

Paper
Code
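
The first MPMQA subtask (retrieving the most related pages) can be pictured as top-k ranking of page embeddings against a question embedding. This is a minimal sketch under assumed inputs: the embeddings are given as plain vectors, cosine similarity is used as the score, and `retrieve_pages` is a hypothetical helper; the answer-generation stage is not shown.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_pages(question_vec: List[float], page_vecs: Dict[int, List[float]], k: int = 2) -> List[int]:
    """Return the ids of the k pages whose embeddings are most similar to the question."""
    ranked = sorted(page_vecs.items(), key=lambda kv: cosine(question_vec, kv[1]), reverse=True)
    return [page_id for page_id, _ in ranked[:k]]


if __name__ == "__main__":
    pages = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.7, 0.7, 0.0]}
    print(retrieve_pages([1.0, 0.2, 0.0], pages, k=2))  # [1, 3]
```

Only the retrieved pages would then be passed on to the answer generator, which keeps the second stage small regardless of manual length.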

GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

1 code implementation • 6 Dec 2021 • Weixuan Sun, Jing Zhang, Zheyuan Liu, Yiran Zhong, Nick Barnes

To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels.

Ranked #19 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Paper
Code
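
For reference, a standard CAM for one class is a weighted sum of the final convolutional feature maps, using that class's classifier weights, which can then be thresholded into a pixel-level pseudo label. The sketch below shows only this classic CAM, not GETAM's transformer-attention variant; the ReLU plus max normalization and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List

Map = List[List[float]]


def class_activation_map(features: List[Map], weights: List[float]) -> Map:
    """CAM for one class: sum_k w_k * f_k(y, x) over K feature maps, then ReLU and scaling to [0, 1]."""
    h, w = len(features[0]), len(features[0][0])
    cam = [
        [max(0.0, sum(wk * fk[y][x] for wk, fk in zip(weights, features))) for x in range(w)]
        for y in range(h)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def pseudo_label(cam: Map, threshold: float = 0.5) -> List[List[int]]:
    """Binary foreground mask for the class; the threshold value is a hypothetical choice."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]
    cam = class_activation_map(feats, [1.0, 0.5])
    print(cam)                # [[1.0, 1.0], [0.0, 0.0]]
    print(pseudo_label(cam))  # [[1, 1], [0, 0]]
```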

Learning to Learn Better for Video Object Segmentation

1 code implementation • 5 Dec 2022 • Meng Lan, Jing Zhang, Lefei Zhang, DaCheng Tao

Recently, the joint learning framework (JOINT) integrates matching based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS).

Object Semantic Segmentation +2

Paper
Code

Latent-based Diffusion Model for Long-tailed Recognition

1 code implementation • 6 Apr 2024 • Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue.

Denoising Transfer Learning

Paper
Code

Wide-angle Image Rectification: A Survey

1 code implementation • 30 Oct 2020 • Jinlong Fan, Jing Zhang, Stephen J. Maybank, DaCheng Tao

In this paper, we comprehensively survey progress in wide-angle image rectification from transformation models to rectification methods.

3D Reconstruction Autonomous Driving

Paper
Code

RU-Net: Regularized Unrolling Network for Scene Graph Generation

1 code implementation • CVPR 2022 • Xin Lin, Changxing Ding, Jing Zhang, Yibing Zhan, DaCheng Tao

Scene graph generation (SGG) aims to detect objects and predict the relationships between each pair of objects.

Denoising Graph Generation +2

Paper
Code

Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

1 code implementation • 27 Oct 2021 • Weixuan Sun, Jing Zhang, Nick Barnes

To solve this, most existing approaches follow a multi-training pipeline to refine CAMs for better pseudo-labels, which includes: 1) re-training the classification model to generate CAMs; 2) post-processing CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels.

Ranked #22 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Segmentation Weakly supervised Semantic Segmentation +1

Paper
Code

Memory-Gated Recurrent Networks

1 code implementation • 24 Dec 2020 • Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang

The essence of multivariate sequential learning is how to extract dependencies in data.

Time Series Time Series Analysis

Paper
Code

Chain of Thought Prompting Elicits Knowledge Augmentation

1 code implementation • 4 Jul 2023 • Dingjun Wu, Jing Zhang, Xinmei Huang

The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models.

Retrieval

Paper
Code

Siamese Network with Interactive Transformer for Video Object Segmentation

1 code implementation • 28 Dec 2021 • Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang

Semi-supervised video object segmentation (VOS) refers to segmenting the target object in remaining frames given its annotation in the first frame, which has been actively studied in recent years.

Decoder Object +3

Paper
Code

From heavy rain removal to detail restoration: A faster and better network

1 code implementation • 7 May 2022 • Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen

This approach comprises two key modules, a rain streaks removal network (R²Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images.

Rain Removal

Paper
Code

Energy-Based Residual Latent Transport for Unsupervised Point Cloud Completion

1 code implementation • 13 Nov 2022 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jing Zhang, Nick Barnes

Unsupervised point cloud completion aims to infer the whole geometry of a partial object observation without requiring partial-complete correspondence.

Decoder Point Cloud Completion

Paper
Code

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

Paper
Code

Training A Small Emotional Vision Language Model for Visual Art Comprehension

1 code implementation • 17 Mar 2024 • Jing Zhang, Liang Zheng, Dan Guo, Meng Wang

This paper develops small vision language models to understand visual art: given an artwork, the model aims to identify its emotion category and explain this prediction in natural language.

Language Modelling

Paper
Code

Energy-Based Generative Cooperative Saliency Prediction

1 code implementation • 25 Jun 2021 • Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes

In this paper, to model the uncertainty of visual saliency, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over the saliency map given an input image, and treating the saliency prediction as a sampling process from the learned distribution.

Saliency Prediction

Paper
Code

Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers

1 code implementation • 5 Jan 2022 • Wenju Sun, Qingyong Li, Jing Zhang, Danyu Wang, Wen Wang, Yangli-ao Geng

DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE cannot only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks.

Class Incremental Learning Incremental Learning +1

Paper
Code

Human-imperceptible, Machine-recognizable Images

1 code implementation • 6 Jun 2023 • Fusheng Hao, Fengxiang He, Yikai Wang, Fuxiang Wu, Jing Zhang, Jun Cheng, DaCheng Tao

Massive amounts of human-related data are collected to train neural networks for computer vision tasks.

Image Classification object-detection +2

Paper
Code

FHA-Kitchens: A Novel Dataset for Fine-Grained Hand Action Recognition in Kitchen Scenes

1 code implementation • 19 Jun 2023 • Ting Zhe, YongQian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao

We represent the action information in each hand interaction region as a triplet, resulting in a total of 878 action triplets.

Action Recognition Domain Generalization +3

Paper
Code

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

1 code implementation • 25 Dec 2023 • Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang

Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior.

Animal Pose Estimation Benchmarking +3

Paper
Code

Open-World Semi-Supervised Learning for Node Classification

1 code implementation • 18 Mar 2024 • Yanling Wang, Jing Zhang, Lingxi Zhang, Lixin Liu, Yuxiao Dong, Cuiping Li, Hong Chen, Hongzhi Yin

Open-world semi-supervised learning (Open-world SSL) for node classification, which classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community.

Classification Contrastive Learning +2

Paper
Code

TAVGBench: Benchmarking Text to Audible-Video Generation

1 code implementation • 22 Apr 2024 • Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.

Benchmarking Contrastive Learning +1

Paper
Code
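
As a quick sanity check on the TAVGBench figures quoted in the snippet (over 1.7 million clips totalling 11.8 thousand hours), dividing the total duration by the clip count gives the average clip length. Because the clip count is stated as a lower bound, the result is an upper estimate:

```latex
\bar{t} \lesssim \frac{11.8 \times 10^{3}\,\text{h} \times 3600\,\text{s/h}}{1.7 \times 10^{6}\ \text{clips}}
        = \frac{4.248 \times 10^{7}\,\text{s}}{1.7 \times 10^{6}\ \text{clips}}
        \approx 25\ \text{s per clip}
```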

JarKA: Modeling Attribute Interactions for Cross-lingual Knowledge Alignment

1 code implementation • 29 Oct 2019 • Bo Chen, Jing Zhang, Xiaobin Tang, Hong Chen, Cuiping Li

Attribute

Paper
Code

Learning structure-aware semantic segmentation with image-level supervision

1 code implementation • 15 Apr 2021 • Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes

Within this pipeline, the class activation map (CAM) is obtained and further processed to serve as a pseudo label to train the semantic segmentation model in a fully-supervised manner.

Boundary Detection Common Sense Reasoning +4

Paper
Code

Multi-grained Hypergraph Interest Modeling for Conversational Recommendation

1 code implementation • 4 May 2023 • Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.

Recommendation Systems

Paper
Code

Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning

1 code implementation • CVPR 2023 • Wenju Sun, Qingyong Li, Jing Zhang, Wen Wang, Yangli-ao Geng

BMKP decouples the functions of learning and knowledge remembering via a bilevel-memory design: a working memory responsible for adaptive model learning, to ensure plasticity; a long-term memory in charge of enduringly storing the knowledge incorporated within the learned model, to guarantee stability.

Incremental Learning

Paper
Code

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

1 code implementation • 5 Jun 2023 • Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

We show the equivalence of the Gibbs distribution to a message-passing algorithm via the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP).

Bayesian Inference Singing Voice Synthesis

Paper
Code
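
One well-known property of the Gumbel distribution that results of this kind build on is the Gumbel-max trick: adding independent standard Gumbel noise to logits and taking the argmax samples exactly from the softmax (Gibbs) distribution over those logits. The sketch below only checks this property empirically; it is not the paper's message-passing or BDP algorithm, and the logits and sample count are arbitrary assumptions.

```python
import math
import random
from collections import Counter
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def gumbel_noise(rng: random.Random) -> float:
    """Standard Gumbel sample by inverse CDF: -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); resample to avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits: List[float], rng: random.Random) -> int:
    """Index of the largest Gumbel-perturbed logit; distributed as softmax(logits)."""
    perturbed = [l + gumbel_noise(rng) for l in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.0]
    n = 100_000
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n))
    print([round(p, 3) for p in softmax(logits)])             # [0.665, 0.245, 0.09]
    print([round(counts[i] / n, 3) for i in range(len(logits))])  # close to the line above
```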

Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging

1 code implementation • 15 Dec 2022 • Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei Jin, Jing Zhang, DaCheng Tao, Truong Nguyen

Ultrasound is progressing toward becoming an affordable and versatile solution to medical imaging.

Pose Estimation

Paper
Code

Leverage Interactive Affinity for Affordance Learning

1 code implementation • CVPR 2023 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.

Human-Object Interaction Detection Object

Paper
Code

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

1 code implementation • 7 Jul 2023 • Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.

Contrastive Learning Image Reconstruction +4

Paper
Code

Transferable Attack for Semantic Segmentation

1 code implementation • 31 Jul 2023 • Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai

We analyze the performance of semantic segmentation models w.r.t.

Data Augmentation Segmentation +1

Paper
Code

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

1 code implementation • 12 Nov 2023 • Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley

We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct and realistic interpolations given an image pair.

Image Generation Image Morphing

Paper
Code

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

1 code implementation • 2 Apr 2024 • Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a KB.

Question Generation Question-Generation

Paper
Code

Registration of multi-view point sets under the perspective of expectation-maximization

1 code implementation • 18 Feb 2020 • Jihua Zhu, Jing Zhang, Huimin Lu, Zhongyu Li

Registration of multi-view point sets is a prerequisite for 3D model reconstruction.

Paper
Code

Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

1 code implementation • NeurIPS 2023 • Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Rethinking Polyp Segmentation from an Out-of-Distribution Perspective

1 code implementation • 13 Jun 2023 • Ge-Peng Ji, Jing Zhang, Dylan Campbell, Huan Xiong, Nick Barnes

Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach.

Segmentation Self-Supervised Learning

Paper
Code

Neural Operators for Delay-Compensating Control of Hyperbolic PIDEs

1 code implementation • 21 Jul 2023 • Jie Qi, Jing Zhang, Miroslav Krstic

The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input.

Operator learning

Paper
Code

Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering

1 code implementation • 20 Dec 2023 • Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang

These selected pairs are constrained to have larger similarity values than the mismatched pairs.

Audio-visual Question Answering Audio-Visual Question Answering (AVQA) +4

Paper
Code
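
The constraint in the snippet (selected pairs must have larger similarity values than mismatched pairs) is commonly enforced with a margin-based hinge ranking loss. The sketch below shows that generic form; the pairing of each positive with one negative, the 0.2 margin and the function name are assumptions, and the paper's actual objective may differ.

```python
from typing import List


def ranking_hinge_loss(pos_sims: List[float], neg_sims: List[float], margin: float = 0.2) -> float:
    """Mean hinge loss that is zero once each matched pair beats its mismatched pair by `margin`."""
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_sims, neg_sims)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The first pair already satisfies the margin; the second violates it and is penalized.
    print(ranking_hinge_loss([0.9, 0.3], [0.1, 0.4]))  # approximately 0.15
```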

Human Keypoint Detection by Progressive Context Refinement

1 code implementation • 27 Oct 2019 • Jing Zhang, Zhe Chen, DaCheng Tao

Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance of person instances.

Human Detection Keypoint Detection +1

Paper
Code

Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models

1 code implementation • 12 Dec 2021 • Yu Feng, Jing Zhang, Xiaokang Zhang, Lemao Liu, Cuiping Li, Hong Chen

Embedding-based methods are popular for Knowledge Base Question Answering (KBQA), but few current models have numerical reasoning skills and thus struggle to answer ordinal constrained questions.

Data Augmentation Knowledge Base Question Answering

Paper
Code

Model Calibration in Dense Classification with Adaptive Label Perturbation

1 code implementation • ICCV 2023 • Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes

To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image.

Binary Classification Classification +1

Paper
Code
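
ASLP learns a separate label perturbation level for each training image. The sketch below does not show how that level is learned; it only illustrates, under the assumption that the perturbation is label smoothing toward the uniform distribution, how one given per-image level `alpha` could be applied to every pixel of a dense label map. Both helper names are hypothetical.

```python
from typing import List


def perturbed_target(label: int, num_classes: int, alpha: float) -> List[float]:
    """Soft target (1 - alpha) * one_hot(label) + alpha / num_classes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + alpha / num_classes for k in range(num_classes)]


def perturb_label_map(label_map: List[List[int]], num_classes: int, alpha: float) -> List[List[List[float]]]:
    """Apply one per-image alpha to every pixel of an H x W label map."""
    return [[perturbed_target(c, num_classes, alpha) for c in row] for row in label_map]


if __name__ == "__main__":
    print(perturbed_target(1, 2, 0.2))             # roughly [0.1, 0.9]
    print(perturb_label_map([[0, 1]], 2, 0.0))     # alpha = 0 leaves one-hot targets
```

With `alpha = 0` the targets stay one-hot, so an image assigned zero perturbation is trained exactly as in standard cross-entropy.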

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation • 7 Aug 2023 • Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

Paper
Code

Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction

1 code implementation • 8 Jan 2024 • Yihao Li, Philippe Zhang, Yubo Tan, Jing Zhang, Zhihan Wang, Weili Jiang, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec, Mostafa El Habib Daho

As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy.

Classification Contrastive Learning +3

Paper
Code

Multi-Level Deep Cascade Trees for Conversion Rate Prediction in Recommendation System

no code implementations • 24 May 2018 • Hong Wen, Jing Zhang, Quan Lin, Keping Yang, Pipei Huang

The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability.

Click-Through Rate Prediction Ensemble Learning

Paper

Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective

no code implementations • CVPR 2018 • Jing Zhang, Tong Zhang, Yuchao Dai, Mehrtash Harandi, Richard Hartley

Such supervision, while labor-intensive and not always possible, tends to hinder the generalization ability of the learned models.

Benchmarking Saliency Prediction +1

Paper

Unsupervised Domain Adaptation: A Multi-task Learning-based Method

no code implementations • 25 Mar 2018 • Jing Zhang, Wanqing Li, Philip Ogunbona

This paper presents a novel multi-task learning-based method for unsupervised domain adaptation.

Multi-Task Learning Unsupervised Domain Adaptation

Paper

Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images

no code implementations • 19 Jan 2018 • Jing Zhang, Yang Cao, Yang Wang, Chenglin Wen, Chang Wen Chen

Specifically, we propose to randomly shuffle the pixels in the original images and use the shuffled image as input, so that the CNN focuses more on the statistical properties.

Color Constancy Image Dehazing

Paper
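
Randomly shuffling pixels removes spatial structure while keeping every pixel value, so per-image statistics such as color histograms and channel means are unchanged. This minimal sketch assumes one global random permutation of an RGB image stored as nested lists; the snippet does not say whether the paper shuffles globally or within local regions, and `shuffle_pixels` is a hypothetical helper.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def shuffle_pixels(image: List[List[Pixel]], seed: int = 0) -> List[List[Pixel]]:
    """Randomly permute pixel positions; the multiset of pixel values is preserved."""
    h, w = len(image), len(image[0])
    flat = [px for row in image for px in row]
    random.Random(seed).shuffle(flat)  # seeded for reproducibility
    return [flat[r * w:(r + 1) * w] for r in range(h)]


if __name__ == "__main__":
    img = [[(i, j, 0) for j in range(3)] for i in range(2)]
    out = shuffle_pixels(img, seed=42)
    print(sorted(p for r in out for p in r) == sorted(p for r in img for p in r))  # True
```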

Unfolding Hidden Barriers by Active Enhanced Sampling

no code implementations • 21 May 2017 • Jing Zhang, Ming Chen

We introduce an active learning scheme that consists of a parametric CV learner based on deep neural network and a CV-based enhanced sampler.

Active Learning

Paper

Deep Edge-Aware Saliency Detection

no code implementations • 15 Aug 2017 • Jing Zhang, Yuchao Dai, Fatih Porikli, Mingyi He

There has been profound progress in visual saliency thanks to deep learning architectures; however, there still exist three major challenges that hinder the detection performance for scenes with complex compositions, multiple salient objects, and salient objects of diverse scales.

Descriptive Saliency Detection

Paper

Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

no code implementations • 11 May 2017 • Jing Zhang, Wanqing Li, Philip Ogunbona, Dong Xu

This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition.

Transfer Learning

Paper

Integrated Deep and Shallow Networks for Salient Object Detection

no code implementations • 2 Jun 2017 • Jing Zhang, Bo Li, Yuchao Dai, Fatih Porikli, Mingyi He

Then the results from the deep FCNN and RBD are concatenated and fed into a shallow network that maps the concatenated feature maps to saliency maps.

Object object-detection +3

Paper

Joint Geometrical and Statistical Alignment for Visual Domain Adaptation

no code implementations • CVPR 2017 • Jing Zhang, Wanqing Li, Philip Ogunbona

This paper presents a novel unsupervised domain adaptation method for cross-domain visual recognition.

Ranked #5 on Domain Adaptation on Office-Caltech

Unsupervised Domain Adaptation

Paper

Nighttime Haze Removal with Illumination Correction

no code implementations • 5 Jun 2016 • Jing Zhang, Yang Cao, Zengfu Wang

ii) Then it achieves a color-balance result by performing a color correction step after estimating the color characteristics of the incident light.

Paper

RGB-D-based Action Recognition Datasets: A Survey

no code implementations • 21 Jan 2016 • Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang

Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010.

Action Recognition Temporal Action Localization

Paper

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

no code implementations • 20 Jan 2015 • Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona

The results show that our approach can achieve state-of-the-art results on the individual datasets and without dramatic performance degradation on the Combined Dataset.

Action Recognition Temporal Action Localization

Paper

Towards Practical Visual Search Engine within Elasticsearch

no code implementations • 23 Jun 2018 • Cun Mu, Jun Zhao, Guang Yang, Jing Zhang, Zheng Yan

In this paper, we describe our end-to-end content-based image retrieval system built upon Elasticsearch, a well-known and popular textual search engine.

Content-Based Image Retrieval Retrieval

Paper

Robust Tracking via Weighted Online Extreme Learning Machine

no code implementations • 26 Jul 2018 • Jing Zhang, Huibing Wang, Yong-Gong Ren

Therefore, our tracking method can fully learn both the target object and background information to enhance tracking performance. It is evaluated on 20 challenging image sequences with different attributes, including illumination, occlusion, deformation, etc., and achieves better effectiveness and robustness than several state-of-the-art methods.

Classification General Classification +2

Paper