Search Results for author: Jing Zhang

Found 337 papers, 190 papers with code

P-INT: A Path-based Interaction Model for Few-shot Knowledge Graph Completion

no code implementations • Findings (EMNLP) 2021 • Jingwen Xu, Jing Zhang, Xirui Ke, Yuxiao Dong, Hong Chen, Cuiping Li, Yongbin Liu

Its general process is to first encode the implicit relation of an entity pair and then match the relation of a query entity pair with the relations of the reference entity pairs.

Knowledge Graph Completion Relation

Long-range Sequence Modeling with Predictable Sparse Attention

no code implementations • ACL 2022 • Yimeng Zhuang, Jing Zhang, Mei Tu

(2) A sparse attention matrix estimation module, which predicts dominant elements of an attention matrix based on the output of the previous hidden state cross module.

Math

SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification

no code implementations • SemEval (NAACL) 2022 • Jing Zhang, Yujin Wang

Online misogyny meme detection is an image/text multimodal classification task; the complicated relation between image and text challenges an intelligent system’s modality fusion learning capability.

A Pretraining Numerical Reasoning Model for Ordinal Constrained Question Answering on Knowledge Base

no code implementations • Findings (EMNLP) 2021 • Yu Feng, Jing Zhang, Gaole He, Wayne Xin Zhao, Lemao Liu, Quan Liu, Cuiping Li, Hong Chen

Knowledge Base Question Answering (KBQA) is to answer natural language questions posed over knowledge bases (KBs).

Knowledge Base Question Answering

HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese

1 code implementation • ACL 2022 • Daniel Zhang-li, Jing Zhang, Jifan Yu, Xiaokang Zhang, Peng Zhang, Jie Tang, Juanzi Li

We investigate the usage of entity linking (EL) in downstream tasks and present the first modularized EL toolkit for easy task adaptation.

Entity Linking Question Answering

LLMTune: Accelerate Database Knob Tuning with Large Language Models

no code implementations • 17 Apr 2024 • Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li

Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads.

Language Modelling Large Language Model

Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

2 code implementations • 10 Apr 2024 • Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

Detecting non-factual content is a long-standing goal for increasing the trustworthiness of large language model (LLM) generations.

Question Answering

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

no code implementations • 8 Apr 2024 • Haimei Zhao, Jing Zhang, Zhuo Chen, Shanshan Zhao, DaCheng Tao

We devote UniMix to two main setups: 1) unsupervised domain adaptation, adapting the model from the clear weather source domain to the adverse weather target domain; 2) domain generalization, learning a model that generalizes well to unseen scenes in adverse weather.

Autonomous Driving Domain Generalization +2

Latent-based Diffusion Model for Long-tailed Recognition

no code implementations • 6 Apr 2024 • Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue.

Denoising Transfer Learning

RaFE: Generative Radiance Fields Restoration

no code implementations • 4 Apr 2024 • Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach using Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images.

3D Reconstruction Novel View Synthesis

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

no code implementations • 4 Apr 2024 • Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination.

counterfactual Counterfactual Reasoning +2

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

1 code implementation • 2 Apr 2024 • Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from KB.

Question Generation Question-Generation

Compressing Large Language Models by Streamlining the Unimportant Layer

no code implementations • 28 Mar 2024 • Xiaodong Chen, Yuxuan Hu, Jing Zhang

Based on this phenomenon, we propose LLM-Streamline, which consists of two parts: layer pruning, where we remove a set of consecutive layers with the lowest importance in the model according to the target sparsity; and layer replacement, where we train a lightweight model to substitute the pruned layers, thereby mitigating the performance degradation caused by pruning.

Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems

no code implementations • 28 Mar 2024 • Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, BingYi Jing

In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance the recommendation model learning.

Collaborative Filtering Recommendation Systems

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

1 code implementation • 28 Mar 2024 • Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios.

Language Modelling Large Language Model

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

1 code implementation • 27 Mar 2024 • Xiaofeng Cong, Jie Gui, Jing Zhang, JunMing Hou, Hao Shen

There are two distinctions between nighttime and daytime haze.

Image Dehazing Pseudo Label

Contact-aware Human Motion Generation from Textual Descriptions

no code implementations • 23 Mar 2024 • Sihan Ma, Qiong Cao, Jing Zhang, DaCheng Tao

This paper addresses the problem of generating 3D interactive human motion from text.

Motion Synthesis

Learning Gaussian Representation for Eye Fixation Prediction

no code implementations • 21 Mar 2024 • Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes

Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

1 code implementation • 20 Mar 2024 • Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.

Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)

Aerial Scene Classification Building change detection for remote sensing images +13

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

no code implementations • 19 Mar 2024 • Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng

Most importantly, the LUWA dataset provides an underexplored opportunity for vision and learning communities and complements existing image classification problems on common objects.

Few-Shot Learning Image Classification

Open-World Semi-Supervised Learning for Node Classification

1 code implementation • 18 Mar 2024 • Yanling Wang, Jing Zhang, Lingxi Zhang, Lixin Liu, Yuxiao Dong, Cuiping Li, Hong Chen, Hongzhi Yin

Open-world semi-supervised learning (Open-world SSL) for node classification, which classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community.

Classification Contrastive Learning +2

Training A Small Emotional Vision Language Model for Visual Art Comprehension

1 code implementation • 17 Mar 2024 • Jing Zhang, Liang Zheng, Dan Guo, Meng Wang

This paper develops small vision language models to understand visual art, which, given an art work, aims to identify its emotion category and explain this prediction with natural language.

Language Modelling

Reverse That Number! Decoding Order Matters in Arithmetic Learning

no code implementations • 9 Mar 2024 • Daniel Zhang-li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao, Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li

Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations.

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation • 26 Feb 2024 • Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation Domain Adaptation +2

Question Calibration and Multi-Hop Modeling for Temporal Question Answering

no code implementations • 20 Feb 2024 • Chao Xue, Di Liang, Pengfei Wang, Jing Zhang

In the real world, many facts contained in KGs are time-constrained; thus, temporal KGQA has received increasing attention.

Knowledge Graphs Multi-hop Question Answering +1

LogicPrpBank: A Corpus for Logical Implication and Equivalence

1 code implementation • 14 Feb 2024 • Zhexiong Liu, Jing Zhang, Jiaying Lu, Wenjing Ma, Joyce C Ho

Logic reasoning has been critically needed in problem-solving and decision-making.

Decision Making

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

no code implementations • 8 Feb 2024 • Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu

In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.

Quantization

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

no code implementations • 7 Feb 2024 • Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAVs), where the target is usually far from the camera and often moves significantly relative to the camera.

Autonomous Driving Object Tracking +1

Large Language Model for Table Processing: A Survey

no code implementations • 4 Feb 2024 • Weizheng Lu, Jiaming Zhang, Jing Zhang, Yueguo Chen

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables.

Fact Verification Language Modelling +2

Are Synthetic Time-series Data Really not as Good as Real Data?

no code implementations • 1 Feb 2024 • Fanzhe Fu, Junru Chen, Jing Zhang, Carl Yang, Lvbin Ma, Yang Yang

Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.

Representation Learning Time Series

Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

1 code implementation • 1 Feb 2024 • Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao

In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.

Ensemble Learning In-Context Learning

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

1 code implementation • 31 Jan 2024 • Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.

Ranked #1 on Hierarchical Text Segmentation on HierText

Hierarchical Text Segmentation Segmentation +1

Data-Free Generalized Zero-Shot Learning

no code implementations • 28 Jan 2024 • Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.

Generalized Zero-Shot Learning Zero-shot Generalization

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

1 code implementation • 13 Jan 2024 • Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, DaCheng Tao

In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.

Text Detection Text Spotting

Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction

1 code implementation • 8 Jan 2024 • Yihao Li, Philippe Zhang, Yubo Tan, Jing Zhang, Zhihan Wang, Weili Jiang, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec, Mostafa El Habib Daho

As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy.

Classification Contrastive Learning +3

Robust single-particle cryo-EM image denoising and restoration

no code implementations • 2 Jan 2024 • Jing Zhang, Tengfei Zhao, Shiyu Hu, Xin Zhao

Cryo-electron microscopy (cryo-EM) has achieved near-atomic level resolution of biomolecules by reconstructing 2D micrographs.

Image Denoising

SVGDreamer: Text Guided SVG Generation with Diffusion Model

1 code implementation • 27 Dec 2023 • XiMing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity.

Vector Graphics

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

1 code implementation • 27 Dec 2023 • Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without requiring a prior on the category distribution.

3D Semantic Segmentation Point Cloud Segmentation +1

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

no code implementations • 25 Dec 2023 • Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang

Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior.

Animal Pose Estimation Benchmarking +3

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

2 code implementations • 22 Dec 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.

Segmentation Semantic Segmentation

Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering

1 code implementation • 20 Dec 2023 • Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang

These selected pairs are constrained to have larger similarity values than the mismatched pairs.

Audio-visual Question Answering Audio-Visual Question Answering (AVQA) +4

LaViP: Language-Grounded Visual Prompts

no code implementations • 18 Dec 2023 • Nilakshan Kunananthaseelan, Jing Zhang, Mehrtash Harandi

We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks.

Few-Shot Learning Transfer Learning +1

Encoder-minimal and Decoder-minimal Framework for Remote Sensing Image Dehazing

1 code implementation • 13 Dec 2023 • Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen

Haze obscures remote sensing images, hindering valuable information extraction.

Image Dehazing

AlignBench: Benchmarking Chinese Alignment of Large Language Models

1 code implementation • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment.

Benchmarking

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation • 29 Nov 2023 • Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given a generated failed image due to malformed hands, we utilize ControlNet modules to re-inject such correct hand information.

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

no code implementations • 27 Nov 2023 • Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang

Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts.

Attribute Out-of-Distribution Generalization

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

no code implementations • 26 Nov 2023 • Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal.

Stochastic Optimization

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

1 code implementation • 12 Nov 2023 • Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley

We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct and realistic interpolations given an image pair.

Image Generation Image Morphing

PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning

no code implementations • 7 Nov 2023 • Hao Liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo

In this paper, we first point out that the unification of task objectives and adaptation for task difficulty are critical for bridging the gap between time series masked reconstruction and forecasting.

Representation Learning Self-Supervised Learning +1

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation • ICCV 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.

Attribute Representation Learning

RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation

1 code implementation • ICCV 2023 • Zhexiong Wan, Yuxin Mao, Jing Zhang, Yuchao Dai

Recently, RGB image and point cloud fusion methods have been proposed to jointly estimate 2D optical flow and 3D scene flow.

Optical Flow Estimation Scene Flow Estimation

Decoding trust: A reinforcement learning perspective

no code implementations • 26 Sep 2023 • Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, Li Chen

In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future.

Decision Making Q-Learning +1

Diversifying Question Generation over Knowledge Base via External Natural Questions

no code implementations • 23 Sep 2023 • Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen

The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity.

Natural Questions Question Answering +2

Multi-dimension Queried and Interacting Network for Stereo Image Deraining

1 code implementation • 19 Sep 2023 • Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen

This module leverages dimension-wise queries that are independent of the input features and employs global context-aware attention (GCA) to capture essential features while avoiding the entanglement of redundant or irrelevant information.

Rain Removal

Decompose Semantic Shifts for Composed Image Retrieval

no code implementations • 18 Sep 2023 • Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang

Composed image retrieval is a type of image retrieval task where the user provides a reference image as a starting point and specifies a text on how to shift from the starting point to the desired target image.

Image Retrieval Retrieval

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation • 6 Sep 2023 • Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.

Contrastive Learning Denoising +5

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

1 code implementation • 5 Sep 2023 • Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha

The spatial information indicating objects' spatial adjacency across consecutive frames is crucial for effective object tracking.

3D Single Object Tracking Autonomous Driving +2

Transformer Compression via Subspace Projection

no code implementations • 31 Aug 2023 • Yuxuan Hu, Jing Zhang, Chen Zhao, Cuiping Li, Hong Chen

By projecting the whole transformer model into a subspace, we enable matrix operations between the weight matrices in the model and features in a reduced-dimensional space, leading to significant reductions in model parameters and computing resources.

PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

no code implementations • 24 Aug 2023 • Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.

Language Modelling Segmentation

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

1 code implementation • 17 Aug 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.

Image Segmentation Segmentation +1

Gradient-Based Markov Chain Monte Carlo for MIMO Detection

no code implementations • 12 Aug 2023 • Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

However, optimal MIMO detection is associated with a complexity that grows exponentially with the MIMO dimensions and quickly becomes impractical.

Bayesian Inference

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation • 7 Aug 2023 • Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

Transferable Attack for Semantic Segmentation

1 code implementation • 31 Jul 2023 • Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai

We analyze the performance of semantic segmentation models wrt.

Data Augmentation Segmentation +1

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

no code implementations • 31 Jul 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.

Contrastive Learning Denoising +2

P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds

1 code implementation • ICCV 2023 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jiawei Liu, Chaoyue Xing, Jing Zhang, Nick Barnes

Point cloud completion aims to recover the complete shape based on a partial observation.

Point Cloud Completion

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

1 code implementation • ICCV 2023 • Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.

Hyperspectral Image Super-Resolution Image Super-Resolution

Model Calibration in Dense Classification with Adaptive Label Perturbation

1 code implementation • ICCV 2023 • Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes

To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image.

Binary Classification Classification +1

Neural Operators for Delay-Compensating Control of Hyperbolic PIDEs

1 code implementation • 21 Jul 2023 • Jie Qi, Jing Zhang, Miroslav Krstic

The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input.

Operator learning

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

no code implementations • 19 Jul 2023 • Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models.

Monocular Depth Estimation

Joint Salient Object Detection and Camouflaged Object Detection via Uncertainty-aware Learning

no code implementations • 10 Jul 2023 • Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai

In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient.

Attribute Contrastive Learning +5

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

1 code implementation • 7 Jul 2023 • Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.

Contrastive Learning Image Reconstruction +4

Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications

no code implementations • 6 Jul 2023 • Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Yiwei Fu, Dylan Campbell, Jaskirat Singh, Tianyu Wang

This paper begins with a description of methods for estimating image probability density functions that reflect the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space; not every pattern of pixels is an image.

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

1 code implementation • 5 Jul 2023 • Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight

Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.

Segmentation Tumor Segmentation

171

Chain of Thought Prompting Elicits Knowledge Augmentation

1 code implementation • 4 Jul 2023 • Dingjun Wu, Jing Zhang, Xinmei Huang

The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models.

Retrieval

RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

1 code implementation • 3 Jul 2023 • Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision.

Image Segmentation Referring Expression +4

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

1 code implementation • 29 Jun 2023 • Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao

Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.

FC-KBQA: A Fine-to-Coarse Composition Framework for Knowledge Base Question Answering

no code implementations • 26 Jun 2023 • Lingxi Zhang, Jing Zhang, Yanling Wang, Shulin Cao, Xinmei Huang, Cuiping Li, Hong Chen, Juanzi Li

The generalization problem on KBQA has drawn considerable attention.

Knowledge Base Question Answering

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

1 code implementation • NeurIPS 2023 • XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis.

193

FHA-Kitchens: A Novel Dataset for Fine-Grained Hand Action Recognition in Kitchen Scenes

1 code implementation • 19 Jun 2023 • Ting Zhe, YongQian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao

We represent the action information in each hand interaction region as a triplet, resulting in a total of 878 action triplets.

Action Recognition Domain Generalization +3

Rethinking Polyp Segmentation from an Out-of-Distribution Perspective

1 code implementation • 13 Jun 2023 • Ge-Peng Ji, Jing Zhang, Dylan Campbell, Huan Xiong, Nick Barnes

Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach.

Segmentation Self-Supervised Learning

Human-imperceptible, Machine-recognizable Images

1 code implementation • 6 Jun 2023 • Fusheng Hao, Fengxiang He, Yikai Wang, Fuxiang Wu, Jing Zhang, Jun Cheng, DaCheng Tao

Massive human-related data is collected to train neural networks for computer vision tasks.

Image Classification object-detection +2

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

1 code implementation • 6 Jun 2023 • Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.

Object object-detection +3

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

1 code implementation • 5 Jun 2023 • Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

We propose a unified approach to obtain structured sparse optimal paths in the latent space of a variational autoencoder (VAE) using dynamic programming and Gumbel propagation.

Bayesian Inference Singing Voice Synthesis

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

2 code implementations • 31 May 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

Ranked #1 on Text Spotting on Inverse-Text

Scene Text Detection Text Detection +1

222

MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion

no code implementations • 20 May 2023 • Ming Xu, Jing Zhang

In this framework, we first develop an embedding module that contains a sampling algorithm (MGWalk) and an encoder network to learn latent representation for each road segment.

Graph Learning Learning-To-Rank

Multi-grained Hypergraph Interest Modeling for Conversational Recommendation

1 code implementation • 4 May 2023 • Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.

Recommendation Systems

Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

no code implementations • 3 May 2023 • Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages.

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

2 code implementations • NeurIPS 2023 • Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.

Instance Segmentation Object +4

629

Scalable Mask Annotation for Video Text Spotting

1 code implementation • 2 May 2023 • Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, DaCheng Tao

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames.

Text Spotting

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

2 code implementations • 23 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang

By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment.

3D Single Object Tracking Object Tracking

MPMQA: Multimodal Question Answering on Product Manuals

1 code implementation • 19 Apr 2023 • Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers.

Question Answering Sentence

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

1 code implementation • 19 Apr 2023 • Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao

This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.

Motion Compensation Simultaneous Localization and Mapping

DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification

2 code implementations • 19 Apr 2023 • Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.

Hyperspectral Image Classification Image Generation

Cold-Start based Multi-Scenario Ranking Model for Click-Through Rate Prediction

no code implementations • 16 Apr 2023 • Peilin Chen, Hong Wen, Jing Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang

Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users.

Click-Through Rate Prediction

UVA: Towards Unified Volumetric Avatar for View Synthesis, Pose rendering, Geometry and Texture Editing

no code implementations • 14 Apr 2023 • Jinlong Fan, Jing Zhang, DaCheng Tao

Experiments on multiple human avatars demonstrate that our UVA achieves competitive results in novel view synthesis and novel pose rendering while enabling local and independent editing of geometry and appearance.

Novel View Synthesis

Deep Image Matting: A Comprehensive Survey

1 code implementation • 10 Apr 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Image matting refers to extracting precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing.

Image Matting Referring Image Matting

152

Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

no code implementations • 4 Apr 2023 • Qijie Shen, Hong Wen, Jing Zhang, Qi Rao

Specifically, SIE extracts the user's short-term interests by integrating three interest encoders: query-dependent, target-dependent, and causal-dependent. The resulting representation is passed to the LIE module, which captures the user's long-term interests through an attention mechanism with respect to the short-term interests from the SIE module.

Click-Through Rate Prediction Disentanglement

GLT-T++: Global-Local Transformer for 3D Siamese Tracking with Ranking Loss

1 code implementation • 1 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Xudong Lv, Mingyu Gao, Jing Zhang

Incorporating this transformer-based voting scheme into 3D RPN, a novel Siamese method dubbed GLT-T is developed for 3D single object tracking on point clouds.

3D Single Object Tracking Object Tracking +1

SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection

2 code implementations • 29 Mar 2023 • Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, DaCheng Tao

Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance.

3D Object Detection Knowledge Distillation +1

Vision Transformer with Quadrangle Attention

1 code implementation • 27 Mar 2023 • Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.

object-detection Object Detection +2

117

LPFF: A Portrait Dataset for Face Generators Across Large Poses

no code implementations • ICCV 2023 • Yiqian Wu, Jing Zhang, Hongbo Fu, Xiaogang Jin

To better validate our pose-conditional 3D-aware generators, we develop a new FID measure to evaluate the 3D-level performance.

3D Reconstruction

A Survey on Class Imbalance in Federated Learning

no code implementations • 21 Mar 2023 • Jing Zhang, Chuanwen Li, Jianzhong Qi, Jiayuan He

We first introduce various types of class imbalance in federated learning, after which we review existing methods for estimating the extent of class imbalance without access to the actual data, thereby preserving data privacy.

Federated Learning

Deep Learning for Camera Calibration and Beyond: A Survey

1 code implementation • 19 Mar 2023 • Kang Liao, Lang Nie, Shujuan Huang, Chunyu Lin, Jing Zhang, Yao Zhao, Moncef Gabbouj, DaCheng Tao

In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, by analyzing their strengths and limitations.

Camera Calibration

392

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing the storage burden and optimization difficulty.

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.

Vision and Language Navigation

Transmission-Guided Bayesian Generative Model for Smoke Segmentation

1 code implementation • 2 Mar 2023 • Siyuan Yan, Jing Zhang, Nick Barnes

To effectively model the two types of uncertainty, we introduce a Bayesian generative model to simultaneously estimate the posterior distribution of model parameters and its predictions.

Image Dehazing Image Segmentation +2

OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao

Automated machine learning (AutoML) seeks to build ML models with minimal human effort.

AutoML

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

1 code implementation • 28 Feb 2023 • Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang

We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access knowledge on the Internet.

Dialogue Evaluation Dialogue Generation +2

Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

no code implementations • 24 Feb 2023 • Chao Xue, Di Liang, Sirui Wang, Wei Wu, Jing Zhang

To alleviate this problem, we propose a novel Dual Path Modeling Framework to enhance the model's ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics.

Sentence

Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit

1 code implementation • 23 Feb 2023 • Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, Jie Tang

The toolkit is at https://github.com/THUDM/WhoIsWho.

Data Integration

RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

1 code implementation • 12 Feb 2023 • Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen

Due to the structural property of the SQL queries, the seq2seq model takes the responsibility of parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords).

Ranked #1 on Semantic Parsing on spider

Language Modelling Semantic Parsing +2

211

Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

1 code implementation • 10 Feb 2023 • Jie Zhou, Qian Yu, Chuan Luo, Jing Zhang

In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS).

Multi-Task Learning Recommendation Systems

AniPixel: Towards Animatable Pixel-Aligned Human Avatar

no code implementations • 7 Feb 2023 • Jinlong Fan, Jing Zhang, Zhi Hou, DaCheng Tao

In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending.

3D Scene Reconstruction

Audio-Visual Segmentation with Semantics

1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

430

Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

1 code implementation • NeurIPS 2023 • Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points.

reinforcement-learning Reinforcement Learning (RL)

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

1 code implementation • 13 Jan 2023 • Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, DaCheng Tao

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.

Self-Supervised Learning

Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning

1 code implementation • CVPR 2023 • Wenju Sun, Qingyong Li, Jing Zhang, Wen Wang, Yangli-ao Geng

BMKP decouples the functions of learning and knowledge remembering via a bilevel-memory design: a working memory responsible for adaptive model learning, to ensure plasticity; and a long-term memory in charge of enduringly storing the knowledge incorporated within the learned model, to guarantee stability.

Incremental Learning

Domain Specified Optimization for Deployment Authorization

no code implementations • ICCV 2023 • Haotian Wang, Haoang Chi, Wenjing Yang, Zhipeng Lin, Mingyang Geng, Long Lan, Jing Zhang, DaCheng Tao

As a complement to SDPA, we also propose Target-Combined Deployment Authorization (TPDA), where unauthorized domains are partially accessible, and simplify the DSO method to a perturbation operation on the pseudo predictions, referred to as Target-Dependent Domain-Specified Optimization (TDSO).

Leverage Interactive Affinity for Affordance Learning

1 code implementation • CVPR 2023 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.

Human-Object Interaction Detection Object

Modeling the Distributional Uncertainty for Salient Object Detection Models

no code implementations • CVPR 2023 • Xinyu Tian, Jing Zhang, Mochu Xiang, Yuchao Dai

Most of the existing salient object detection (SOD) models focus on improving the overall model performance, without explicitly explaining the discrepancy between the training and testing distributions.

Long-tail Learning Object +3

Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging

1 code implementation • 15 Dec 2022 • Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei Jin, Jing Zhang, DaCheng Tao, Truong Nguyen

Ultrasound is progressing toward becoming an affordable and versatile solution to medical imaging.

Pose Estimation

Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

1 code implementation • 10 Dec 2022 • Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs).

Ranked #1 on Change Detection on SECOND

Change Detection

ViTPose++: Vision Transformer for Generic Body Pose Estimation

1 code implementation • 7 Dec 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose.

Ranked #1 on Animal Pose Estimation on AP-10K (using extra training data)

2D Human Pose Estimation Animal Pose Estimation +1

1,162

Learning to Learn Better for Video Object Segmentation

1 code implementation • 5 Dec 2022 • Meng Lan, Jing Zhang, Lefei Zhang, DaCheng Tao

Recently, the joint learning framework (JOINT) integrated matching-based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS).

Object Semantic Segmentation +2

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

no code implementations • 24 Nov 2022 • Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Sagar Verma, Siddharth Gupta, Shishir Muralidhara, Niharika Hegde, Daitao Xing, Nikolaos Evangeliou, Anthony Tzes, Vojtěch Bartl, Jakub Špaňhel, Adam Herout, Neelanjan Bhowmik, Toby P. Breckon, Shivanand Kundargi, Tejas Anvekar, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudengudi, Arpita Vats, Yang Song, Delong Liu, Yonglin Li, Shuman Li, Chenhao Tan, Long Lan, Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi, Hsiang-Wei Huang, Cheng-Yen Yang, Jenq-Neng Hwang, Pyong-Kun Kim, Kwangju Kim, Kyoungoh Lee, Shuai Jiang, Haiwen Li, Zheng Ziqiang, Tuan-Anh Vu, Hai Nguyen-Truong, Sai-Kit Yeung, Zhuang Jia, Sophia Yang, Chih-Chung Hsu, Xiu-Yu Hou, Yu-An Jhang, Simon Yang, Mau-Tsuen Yang

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation, and (iv) USV-based Maritime Obstacle Detection.

Object object-detection +2

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

2 code implementations • 20 Nov 2022 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Mingyu Gao, Jing Zhang

Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware prior into seed point features to effectively form strong feature representation for geometric positions of the seed points, thus providing more robust and accurate cues for offset learning.

3D Single Object Tracking Object Tracking +1

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

2 code implementations • CVPR 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.

Ranked #1 on Text Spotting on Total-Text (using extra training data)

Scene Text Detection Text Detection +2

222

Energy-Based Residual Latent Transport for Unsupervised Point Cloud Completion

1 code implementation • 13 Nov 2022 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jing Zhang, Nick Barnes

Unsupervised point cloud completion aims to infer the whole geometry of a partial object observation without requiring partial-complete correspondence.

Point Cloud Completion

Unifying Flow, Stereo and Depth Estimation

1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Ranked #1 on Optical Flow Estimation on Sintel-clean

Optical Flow Estimation Stereo Depth Estimation +1

883

Rethinking Hierarchies in Pre-trained Plain Vision Transformer

no code implementations • 3 Nov 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

Self-supervised pre-training of vision transformers (ViTs) via masked image modeling (MIM) has proven very effective.

Watermarking for Out-of-distribution Detection

1 code implementation • 27 Oct 2022 • Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han

Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.

Ranked #20 on Out-of-Distribution Detection on ImageNet-1k vs Places

Out-of-Distribution Detection

Adversarial Purification with the Manifold Hypothesis

no code implementations • 26 Oct 2022 • Zhaoyuan Yang, Zhiwei Xu, Jing Zhang, Richard Hartley, Peter Tu

In this work, we formulate a novel framework for adversarial robustness using the manifold hypothesis.

Adversarial Robustness Variational Inference

Oscillatory cooperation prevalence emerges from misperception

no code implementations • 17 Oct 2022 • Jing Zhang, Zhao Li, Jiqiang Zhang, Lin Ma, Guozhong Zheng, Li Chen

Here we show that oscillatory behaviors naturally emerge if incomplete information is incorporated into the cooperation evolution of a non-Markov model.

DP-TrajGAN: A privacy-aware trajectory generation model with differential privacy

no code implementations • Future Generation Computer Systems 2022 • Jing Zhang, Qihan Huang, Yirui Huang, Qian Ding, Pei-Wei Tsai

Open Data Processing Services (ODPS) offers vast storage capacity and excellent efficiency, which collects and stores a lot of data.

Generative Adversarial Network Privacy Preserving

Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline

1 code implementation • 24 Sep 2022 • Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan

We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer.

Question Answering Visual Question Answering

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

1 code implementation • 19 Sep 2022 • Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

1 code implementation • 31 Aug 2022 • ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.

Point Cloud Registration

Grounded Affordance from Exocentric View

2 code implementations • 28 Aug 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.

Human-Object Interaction Detection Object +1

Robust control problems of BSDEs coupled with value functions

no code implementations • 23 Aug 2022 • Zhou Yang, Jing Zhang, Chao Zhou

A robust control problem is considered in this paper, where the controlled stochastic differential equations (SDEs) include ambiguity parameters and their coefficients satisfy non-Lipschitz continuous and non-linear growth conditions, and the objective function is expressed as a backward stochastic differential equation (BSDE) with the generator depending on the value function.

Generalised Co-Salient Object Detection

no code implementations • 20 Aug 2022 • Jiawei Liu, Jing Zhang, Ruikai Cui, Kaihao Zhang, Weihao Li, Nick Barnes

We propose a new setting that relaxes an assumption in the conventional Co-Salient Object Detection (CoSOD) setting by allowing the presence of "noisy images" which do not show the shared co-salient object.

Co-Salient Object Detection Object +3

Transformer Networks for Predictive Group Elevator Control

no code implementations • 15 Aug 2022 • Jing Zhang, Athanasios Tsiligkaridis, Hiroshi Taguchi, Arvind Raghunathan, Daniel Nikovski

We propose a Predictive Group Elevator Scheduler that uses predictive information about passenger arrivals from a Transformer-based destination predictor and a linear regression model that predicts the remaining time to destinations.

regression

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

2 code implementations • 8 Aug 2022 • Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, DaCheng Tao, Liangpei Zhang

Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability.

Ranked #1 on Aerial Scene Classification on AID (50% as trainset)

Aerial Scene Classification Few-Shot Learning +2

414

Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data

no code implementations • 28 Jul 2022 • Hai Yang, Yuhang Sheng, Yi Jiang, Xiaoyang Fang, Dongdong Li, Jing Zhang, Zhe Wang

In addition, Subtype-Former also achieved outstanding results in pan-cancer subtyping, which can help analyze the commonalities and differences across various cancer types at the molecular level.

Survival Analysis

MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

no code implementations • 20 Jul 2022 • Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He

We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers.

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

1 code implementation • 18 Jul 2022 • Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li

Since data augmentation strategies have largely alleviated the training instability, how to further improve the generative performance of DE-GANs becomes a hotspot.

Contrastive Learning Data Augmentation

Paper
Code

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

1 code implementation • 16 Jul 2022 • Haimei Zhao, Jing Zhang, Sen Zhang, DaCheng Tao

A naive way is to accomplish them independently in a sequential or parallel manner, but there are many drawbacks, i.e., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts.

Autonomous Driving Depth Estimation +3

Paper
Code

Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection

no code implementations • 14 Jul 2022 • Zhe Chen, Jing Zhang, Yufei Xu, DaCheng Tao

Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF) which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance.

object-detection Object Detection

Paper
Add Code

ReAct: Temporal Action Detection with Relational Queries

1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

Moreover, we propose two losses to facilitate and stabilize the training of action classification.

Ranked #15 on Temporal Action Localization on THUMOS’14

Action Classification Action Detection +4

Paper
Code

Audio-Visual Segmentation

1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

430

Paper
Code

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

1 code implementation • 11 Jul 2022 • Sen Zhang, Jing Zhang, DaCheng Tao

Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.

Monocular Depth Estimation Motion Estimation +1

191

Paper
Code

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

3 code implementations • 10 Jul 2022 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, DaCheng Tao

However, these methods built upon detection transformer framework might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the reading order of humans, which impedes the detection robustness from our observation.

Ranked #3 on Scene Text Detection on SCUT-CTW1500

Inductive Bias Scene Text Detection +1

222

Paper
Code

A State Transition Model for Mobile Notifications via Survival Analysis

no code implementations • 7 Jul 2022 • Yiping Yuan, Jing Zhang, Shaunak Chatterjee, Shipeng Yu, Romer Rosales

In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.

Decision Making Survival Analysis

Paper
Add Code

Re-weighting Negative Samples for Model-Agnostic Matching

no code implementations • 6 Jul 2022 • Jiazhen Lou, Hong Wen, Fuyu Lv, Jing Zhang, Tengfei Yuan, Zhao Li

Recommender Systems (RS), as an efficient tool for discovering items of interest to users from a very large corpus, have attracted more and more attention from academia and industry.

Multi-Task Learning Recommendation Systems

Paper
Add Code

Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection

1 code implementation • CVPR 2023 • Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, Chongyang Zhang

In this way, our model can form a more explicit and discriminative decision boundary to distinguish known and also unseen anomalies from normal samples more effectively.

Ranked #3 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Contrastive Learning Supervised Anomaly Detection

Paper
Code

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

1 code implementation • CVPR 2023 • Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao

Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.

Animal Pose Estimation Contrastive Learning

Paper
Code

Knowledge Learning with Crowdsourcing: A Brief Review and Systematic Perspective

no code implementations • 19 Jun 2022 • Jing Zhang

Big data have the characteristics of enormous volume, high velocity, diversity, value-sparsity, and uncertainty, which make learning knowledge from them full of challenges.

Paper
Add Code

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

4 code implementations • 12 Jun 2022 • Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, DaCheng Tao

Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.

Animal Pose Estimation Domain Generalization +1

123

Paper
Code

Toward Real-world Single Image Deraining: A New Benchmark and Beyond

1 code implementation • 11 Jun 2022 • Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, DaCheng Tao

To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.

Domain Generalization Image Restoration +2

Paper
Code

Referring Image Matting

1 code implementation • CVPR 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting.

Ranked #1 on Referring Image Matting (RefMatte-RW100) on RefMatte

Domain Generalization Image Matting +5

197

Paper
Code

Towards Deeper Understanding of Camouflaged Object Detection

1 code implementation • 23 May 2022 • Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan

With the above understanding about camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment, and rank camouflaged objects, indicating the conspicuousness level of camouflage.

Object object-detection +1

Paper
Code

Salient Object Detection via Bounding-box Supervision

no code implementations • 11 May 2022 • Mengqi He, Jing Zhang, Wenxin Yu

However, as a large amount of background is excluded, the foreground bounding box region contains a less complex background, making it possible to perform handcrafted features-based saliency detection with only the cropped foreground region.

Object object-detection +3

Paper
Add Code

From heavy rain removal to detail restoration: A faster and better network

1 code implementation • 7 May 2022 • Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen

This approach comprises two key modules, a rain streaks removal network (R$^2$Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images.

Rain Removal

Paper
Code

RU-Net: Regularized Unrolling Network for Scene Graph Generation

1 code implementation • CVPR 2022 • Xin Lin, Changxing Ding, Jing Zhang, Yibing Zhan, DaCheng Tao

Scene graph generation (SGG) aims to detect objects and predict the relationships between each pair of objects.

Denoising Graph Generation +2

Paper
Code

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao

Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.

Knowledge Distillation

Paper
Add Code

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

5 code implementations • 26 Apr 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose.

Ranked #1 on Pose Estimation on COCO test-dev (using extra training data)

2D Human Pose Estimation Keypoint Detection

1,162

Paper
Code

Learning to Purification for Unsupervised Person Re-identification

no code implementations • 21 Apr 2022 • Long Lan, Xiao Teng, Jing Zhang, Xiang Zhang, DaCheng Tao

To purify the label noise, we propose to take advantage of the knowledge of the teacher model in an offline scheme.

Knowledge Distillation Unsupervised Person Re-Identification

Paper
Add Code

An Energy-Based Prior for Generative Saliency

1 code implementation • 19 Apr 2022 • Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.

object-detection RGB-D Salient Object Detection +3

Paper
Code

VSA: Learning Varied-Size Window Attention in Vision Transformers

2 code implementations • 18 Apr 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Attention within windows has been widely explored in vision transformers to balance the performance, computation complexity, and memory footprint.

Instance Segmentation Object Detection +1

148

Paper
Code

A Comprehensive Survey on Data-Efficient GANs in Image Generation

no code implementations • 18 Apr 2022 • Ziqiang Li, Beihao Xia, Jing Zhang, Chaoyue Wang, Bin Li

Generative Adversarial Networks (GANs) have achieved remarkable achievements in image synthesis.

Image Generation

Paper
Add Code

An Empirical Study of Remote Sensing Pretraining

2 code implementations • 6 Apr 2022 • Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao

To this end, we train different networks from scratch with the help of the largest RS scene recognition dataset up to now -- MillionAID, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNN) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.

Ranked #1 on Aerial Scene Classification on UCM (80% as trainset)

Aerial Scene Classification Building change detection for remote sensing images +5

414

Paper
Code

BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation

1 code implementation • 6 Apr 2022 • Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao

Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.

Clustering Pseudo Label +1

Paper
Code

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Ranked #21 on Semantic Segmentation on ADE20K

Semantic Segmentation

Paper
Code

Rethinking Portrait Matting with Privacy Preserving

1 code implementation • 31 Mar 2022 • Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, DaCheng Tao

P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain some useful findings about model generalization ability under the privacy preserving training (PPT) setting.

Ranked #1 on Image Matting on P3M-10k

Domain Generalization Image Matting +1

Paper
Code

A Roadmap for Big Model

no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.

Language Modelling Machine Translation +1

Paper
Add Code

Learning Affordance Grounding from Exocentric Images

2 code implementations • CVPR 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower an agent with such ability, this paper proposes a task of affordance grounding from an exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.

Human-Object Interaction Detection Object +1

Paper
Code

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

no code implementations • 18 Mar 2022 • Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) the MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report.

Descriptive Image Captioning +1

Paper
Add Code

Towards Data-Efficient Detection Transformers

2 code implementations • 17 Mar 2022 • Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao

Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.

Paper
Code

Information-Theoretic Odometry Learning

no code implementations • 11 Mar 2022 • Sen Zhang, Jing Zhang, DaCheng Tao

In this paper, we propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation, a crucial component of many robotics and vision tasks such as navigation and virtual reality where relative camera poses are required in real time.

Paper
Add Code

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

no code implementations • 11 Mar 2022 • Sen Zhang, Jing Zhang, DaCheng Tao

In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases.

Monocular Visual Odometry

Paper
Add Code

Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering

1 code implementation • ACL 2022 • Jing Zhang, Xiaokang Zhang, Jifan Yu, Jian Tang, Jie Tang, Cuiping Li, Hong Chen

Recent works on knowledge base question answering (KBQA) retrieve subgraphs for easier reasoning.

Knowledge Base Question Answering Retrieval

Paper
Code

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

4 code implementations • 21 Feb 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism.

Ranked #2 on Image Classification on ImageNet ReaL

Image Classification Inductive Bias

512

Paper
Code

Deep Interest Highlight Network for Click-Through Rate Prediction in Trigger-Induced Recommendation

1 code implementation • 5 Feb 2022 • Qijie Shen, Hong Wen, Wanjie Tao, Jing Zhang, Fuyu Lv, Zulong Chen, Zhao Li

In many classical e-commerce platforms, personalized recommendation has been proven to be of great business value, which can improve user satisfaction and increase the revenue of platforms.

Click-Through Rate Prediction

Paper
Code

SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection

1 code implementation • 6 Jan 2022 • Chen Chen, Zhe Chen, Jing Zhang, DaCheng Tao

We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects.

3D Object Detection object-detection

Paper
Code

Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers

1 code implementation • 5 Jan 2022 • Wenju Sun, Qingyong Li, Jing Zhang, Danyu Wang, Wen Wang, Yangli-ao Geng

DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks.

Class Incremental Learning Incremental Learning +1

Paper
Code

ISNet: Shape Matters for Infrared Small Target Detection

1 code implementation • CVPR 2022 • Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo

The TOAA block calculates the low-level information with an attention mechanism in both row and column directions and fuses it with the high-level information to capture the shape characteristics of targets and suppress noise.

Management

107

Paper
Code

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

no code implementations • CVPR 2022 • Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.

Attribute Dense Captioning +1

Paper
Add Code

Siamese Network with Interactive Transformer for Video Object Segmentation

1 code implementation • 28 Dec 2021 • Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang

Semi-supervised video object segmentation (VOS) refers to segmenting the target object in remaining frames given its annotation in the first frame, which has been actively studied in recent years.

Object Semantic Segmentation +2

Paper
Code

Semi-supervised Salient Object Detection with Effective Confidence Estimation

no code implementations • 28 Dec 2021 • Jiawei Liu, Jing Zhang, Nick Barnes

We study semi-supervised salient object detection, with access to a small number of labeled samples and a large number of unlabeled samples.

Object object-detection +3

Paper
Add Code

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

no code implementations • 27 Dec 2021 • Xiaofeng Pan, Ming Li, Jing Zhang, Keren Yu, Luping Wang, Hong Wen, Chengjun Mao, Bo Cao

At last, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction.

Meta-Learning

Paper
Add Code

Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction

no code implementations • NeurIPS 2021 • Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li

In this paper, we take a step further by proposing a novel generative vision transformer with latent variables following an informative energy-based prior for salient object detection.

Ranked #3 on Thermal Image Segmentation on RGB-T-Glass-Segmentation

object-detection RGB-D Salient Object Detection +3

Paper
Add Code

MOEF: Modeling Occasion Evolution in Frequency Domain for Promotion-Aware Click-Through Rate Prediction

1 code implementation • 27 Dec 2021 • Xiaofeng Pan, Yibin Shen, Jing Zhang, Xu He, Yang Huang, Hong Wen, Chengjun Mao, Bo Cao

In this paper, we propose a novel CTR model named MOEF for recommendations under frequent changes of occasions.

Click-Through Rate Prediction Recommendation Systems +2

Paper
Code

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

1 code implementation • AAAI 2022 2021 • Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du

Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity.

Ranked #9 on Scene Text Recognition on ICDAR2015 (using extra training data)

Language Modelling Scene Text Recognition

105

Paper
Code

Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models

1 code implementation • 12 Dec 2021 • Yu Feng, Jing Zhang, Xiaokang Zhang, Lemao Liu, Cuiping Li, Hong Chen

Embedding-based methods are popular for Knowledge Base Question Answering (KBQA), but few current models have numerical reasoning skills and thus struggle to answer ordinal constrained questions.

Data Augmentation Knowledge Base Question Answering

Paper
Code

Recurrent Glimpse-based Decoder for Detection with Transformer

1 code implementation • CVPR 2022 • Zhe Chen, Jing Zhang, DaCheng Tao

Then, a glimpse-based decoder is introduced to provide refined detection results based on both the glimpse features and the attention modeling outputs of the previous stage.

Ranked #1 on Object Detection on MS COCO (GFlops metric)

Object Detection

Paper
Code

GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

1 code implementation • 6 Dec 2021 • Weixuan Sun, Jing Zhang, Zheyuan Liu, Yiran Zhong, Nick Barnes

To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel level pseudo labels.

Ranked #19 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Paper
Code

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

758

Paper
Code

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao

The Depth-aware Video Panoptic Segmentation (DVPS) is a new challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.

Ranked #1 on Depth-aware Video Panoptic Segmentation on SemKITTI-DVPS

Depth-aware Video Panoptic Segmentation Depth Estimation +4

Paper
Code

A Multi-Strategy based Pre-Training Method for Cold-Start Recommendation

no code implementations • 4 Dec 2021 • Bowen Hao, Hongzhi Yin, Jing Zhang, Cuiping Li, Hong Chen

In terms of the pretext task, in addition to considering the intra-correlations of users and items through the embedding reconstruction task, we add an embedding contrastive learning task to capture inter-correlations of users and items.

Contrastive Learning Meta-Learning +1

Paper
Add Code
