Search Results for author: Jing Zhang

Found 337 papers, 190 papers with code

P-INT: A Path-based Interaction Model for Few-shot Knowledge Graph Completion

no code implementations Findings (EMNLP) 2021 Jingwen Xu, Jing Zhang, Xirui Ke, Yuxiao Dong, Hong Chen, Cuiping Li, Yongbin Liu

Its general process is to first encode the implicit relation of an entity pair and then match the relation of a query entity pair with the relations of the reference entity pairs.

Knowledge Graph Completion Relation

Long-range Sequence Modeling with Predictable Sparse Attention

no code implementations ACL 2022 Yimeng Zhuang, Jing Zhang, Mei Tu

(2) A sparse attention matrix estimation module, which predicts dominant elements of an attention matrix based on the output of the previous hidden state cross module.

Math

SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification

no code implementations SemEval (NAACL) 2022 Jing Zhang, Yujin Wang

Online misogyny meme detection is an image/text multimodal classification task; the complicated relation of image and text challenges the intelligent system’s modality fusion learning capability.

HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese

1 code implementation ACL 2022 Daniel Zhang-li, Jing Zhang, Jifan Yu, Xiaokang Zhang, Peng Zhang, Jie Tang, Juanzi Li

We investigate the usage of entity linking (EL) in downstream tasks and present the first modularized EL toolkit for easy task adaptation.

Entity Linking Question Answering

LLMTune: Accelerate Database Knob Tuning with Large Language Models

no code implementations 17 Apr 2024 Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li

Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads.

Language Modelling Large Language Model

Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

2 code implementations 10 Apr 2024 Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

Detecting non-factual content is a longstanding goal to increase the trustworthiness of large language model (LLM) generations.

Question Answering

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

no code implementations 8 Apr 2024 Haimei Zhao, Jing Zhang, Zhuo Chen, Shanshan Zhao, DaCheng Tao

We devote UniMix to two main setups: 1) unsupervised domain adaptation, adapting the model from the clear weather source domain to the adverse weather target domain; 2) domain generalization, learning a model that generalizes well to unseen scenes in adverse weather.

Autonomous Driving Domain Generalization +2

Latent-based Diffusion Model for Long-tailed Recognition

no code implementations 6 Apr 2024 Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue.

Denoising Transfer Learning

RaFE: Generative Radiance Fields Restoration

no code implementations 4 Apr 2024 Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach using Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images.

3D Reconstruction Novel View Synthesis

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

no code implementations 4 Apr 2024 Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination.

counterfactual Counterfactual Reasoning +2

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

1 code implementation 2 Apr 2024 Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a KB.

Question Generation Question-Generation

Compressing Large Language Models by Streamlining the Unimportant Layer

no code implementations 28 Mar 2024 Xiaodong Chen, Yuxuan Hu, Jing Zhang

Based on this phenomenon, we propose LLM-Streamline, which consists of two parts: layer pruning, where we remove a set of consecutive layers with the lowest importance in the model according to the target sparsity; and layer replacement, where we train a lightweight model to substitute the pruned layers, thereby mitigating the performance degradation caused by pruning.

Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems

no code implementations 28 Mar 2024 Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, BingYi Jing

In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance the recommendation model learning.

Collaborative Filtering Recommendation Systems

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

1 code implementation 28 Mar 2024 Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios.

Language Modelling Large Language Model

Contact-aware Human Motion Generation from Textual Descriptions

no code implementations 23 Mar 2024 Sihan Ma, Qiong Cao, Jing Zhang, DaCheng Tao

This paper addresses the problem of generating 3D interactive human motion from text.

Motion Synthesis

Learning Gaussian Representation for Eye Fixation Prediction

no code implementations 21 Mar 2024 Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes

Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

1 code implementation 20 Mar 2024 Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.

Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)

Aerial Scene Classification Building change detection for remote sensing images +13

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

no code implementations 19 Mar 2024 Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng

Most importantly, the LUWA dataset provides an underexplored opportunity for vision and learning communities and complements existing image classification problems on common objects.

Few-Shot Learning Image Classification

Open-World Semi-Supervised Learning for Node Classification

1 code implementation 18 Mar 2024 Yanling Wang, Jing Zhang, Lingxi Zhang, Lixin Liu, Yuxiao Dong, Cuiping Li, Hong Chen, Hongzhi Yin

Open-world semi-supervised learning (Open-world SSL) for node classification, which classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community.

Classification Contrastive Learning +2

Training A Small Emotional Vision Language Model for Visual Art Comprehension

1 code implementation 17 Mar 2024 Jing Zhang, Liang Zheng, Dan Guo, Meng Wang

This paper develops small vision language models to understand visual art, which, given an artwork, aim to identify its emotion category and explain this prediction with natural language.

Language Modelling

Reverse That Number! Decoding Order Matters in Arithmetic Learning

no code implementations 9 Mar 2024 Daniel Zhang-li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao, Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li

Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations.

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation 26 Feb 2024 Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation Domain Adaptation +2

Question Calibration and Multi-Hop Modeling for Temporal Question Answering

no code implementations 20 Feb 2024 Chao Xue, Di Liang, Pengfei Wang, Jing Zhang

In the real world, many facts contained in KGs are time-constrained; thus, temporal KGQA has received increasing attention.

Knowledge Graphs Multi-hop Question Answering +1

LogicPrpBank: A Corpus for Logical Implication and Equivalence

1 code implementation 14 Feb 2024 Zhexiong Liu, Jing Zhang, Jiaying Lu, Wenjing Ma, Joyce C Ho

Logic reasoning has been critically needed in problem-solving and decision-making.

Decision Making

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

no code implementations 8 Feb 2024 Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu

In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.

Quantization

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

no code implementations 7 Feb 2024 Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera.

Autonomous Driving Object Tracking +1

Large Language Model for Table Processing: A Survey

no code implementations 4 Feb 2024 Weizheng Lu, Jiaming Zhang, Jing Zhang, Yueguo Chen

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables.

Fact Verification Language Modelling +2

Are Synthetic Time-series Data Really not as Good as Real Data?

no code implementations 1 Feb 2024 Fanzhe Fu, Junru Chen, Jing Zhang, Carl Yang, Lvbin Ma, Yang Yang

Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.

Representation Learning Time Series

Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

1 code implementation 1 Feb 2024 Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao

In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.

Ensemble Learning In-Context Learning

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

1 code implementation 31 Jan 2024 Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.

Hierarchical Text Segmentation Segmentation +1

Data-Free Generalized Zero-Shot Learning

no code implementations 28 Jan 2024 Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.

Generalized Zero-Shot Learning Zero-shot Generalization

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

1 code implementation 13 Jan 2024 Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, DaCheng Tao

In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.

Text Detection Text Spotting

Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction

1 code implementation 8 Jan 2024 Yihao Li, Philippe Zhang, Yubo Tan, Jing Zhang, Zhihan Wang, Weili Jiang, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec, Mostafa El Habib Daho

As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy.

Classification Contrastive Learning +3

Robust single-particle cryo-EM image denoising and restoration

no code implementations 2 Jan 2024 Jing Zhang, Tengfei Zhao, Shiyu Hu, Xin Zhao

Cryo-electron microscopy (cryo-EM) has achieved near-atomic level resolution of biomolecules by reconstructing 2D micrographs.

Image Denoising

SVGDreamer: Text Guided SVG Generation with Diffusion Model

1 code implementation 27 Dec 2023 XiMing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity.

Vector Graphics

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

1 code implementation 27 Dec 2023 Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without the need for a prior on the category distribution.

3D Semantic Segmentation Point Cloud Segmentation +1

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

no code implementations 25 Dec 2023 Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang

Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior.

Animal Pose Estimation Benchmarking +3

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

2 code implementations 22 Dec 2023 Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.

Segmentation Semantic Segmentation

LaViP: Language-Grounded Visual Prompts

no code implementations 18 Dec 2023 Nilakshan Kunananthaseelan, Jing Zhang, Mehrtash Harandi

We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks.

Few-Shot Learning Transfer Learning +1

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation 29 Nov 2023 Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given a generated image that fails due to malformed hands, we utilize ControlNet modules to re-inject such correct hand information.

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

no code implementations 27 Nov 2023 Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang

Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts.

Attribute Out-of-Distribution Generalization

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

no code implementations 26 Nov 2023 Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal.

Stochastic Optimization

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation 22 Nov 2023 Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation 13 Nov 2023 Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

1 code implementation 12 Nov 2023 Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley

We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct and realistic interpolations given an image pair.

Image Generation Image Morphing

PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning

no code implementations 7 Nov 2023 Hao Liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo

In this paper, we first point out that the unification of task objectives and adaptation for task difficulty are critical for bridging the gap between time series masked reconstruction and forecasting.

Representation Learning Self-Supervised Learning +1

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation ICCV 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.

Attribute Representation Learning

Decoding trust: A reinforcement learning perspective

no code implementations 26 Sep 2023 Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, Li Chen

In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future.

Decision Making Q-Learning +1

Diversifying Question Generation over Knowledge Base via External Natural Questions

no code implementations 23 Sep 2023 Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen

The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity.

Natural Questions Question Answering +2

Multi-dimension Queried and Interacting Network for Stereo Image Deraining

1 code implementation 19 Sep 2023 Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen

This module leverages dimension-wise queries that are independent of the input features and employs global context-aware attention (GCA) to capture essential features while avoiding the entanglement of redundant or irrelevant information.

Rain Removal

Decompose Semantic Shifts for Composed Image Retrieval

no code implementations 18 Sep 2023 Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang

Composed image retrieval is a type of image retrieval task where the user provides a reference image as a starting point and specifies a text on how to shift from the starting point to the desired target image.

Image Retrieval Retrieval

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation 6 Sep 2023 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

The pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.

Contrastive Learning Denoising +5

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

1 code implementation 5 Sep 2023 Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha

The spatial information indicating objects' spatial adjacency across consecutive frames is crucial for effective object tracking.

3D Single Object Tracking Autonomous Driving +2

Transformer Compression via Subspace Projection

no code implementations 31 Aug 2023 Yuxuan Hu, Jing Zhang, Chen Zhao, Cuiping Li, Hong Chen

By projecting the whole transformer model into a subspace, we enable matrix operations between the weight matrices in the model and features in a reduced-dimensional space, leading to significant reductions in model parameters and computing resources.

PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

no code implementations 24 Aug 2023 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.

Language Modelling Segmentation

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

1 code implementation 17 Aug 2023 Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.

Image Segmentation Segmentation +1

Gradient-Based Markov Chain Monte Carlo for MIMO Detection

no code implementations 12 Aug 2023 Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

However, optimal MIMO detection is associated with a complexity that grows exponentially with the MIMO dimensions and quickly becomes impractical.

Bayesian Inference

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation 7 Aug 2023 Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

no code implementations 31 Jul 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.

Contrastive Learning Denoising +2

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

1 code implementation ICCV 2023 Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.

Hyperspectral Image Super-Resolution Image Super-Resolution

Model Calibration in Dense Classification with Adaptive Label Perturbation

1 code implementation ICCV 2023 Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes

To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image.

Binary Classification Classification +1

Neural Operators for Delay-Compensating Control of Hyperbolic PIDEs

1 code implementation 21 Jul 2023 Jie Qi, Jing Zhang, Miroslav Krstic

The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input.

Operator learning

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

no code implementations 19 Jul 2023 Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models.

Monocular Depth Estimation

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

1 code implementation 7 Jul 2023 Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.

Contrastive Learning Image Reconstruction +4

Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications

no code implementations 6 Jul 2023 Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Yiwei Fu, Dylan Campbell, Jaskirat Singh, Tianyu Wang

This paper begins with a description of methods for estimating image probability density functions that reflect the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space: not every pattern of pixels is an image.

Chain of Thought Prompting Elicits Knowledge Augmentation

1 code implementation 4 Jul 2023 Dingjun Wu, Jing Zhang, Xinmei Huang

The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models.

Retrieval

RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

1 code implementation 3 Jul 2023 Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision.

Image Segmentation Referring Expression +4

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

1 code implementation 29 Jun 2023 Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao

Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

1 code implementation NeurIPS 2023 XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis.

FHA-Kitchens: A Novel Dataset for Fine-Grained Hand Action Recognition in Kitchen Scenes

1 code implementation 19 Jun 2023 Ting Zhe, YongQian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao

We represent the action information in each hand interaction region as a triplet, resulting in a total of 878 action triplets.

Action Recognition Domain Generalization +3

Rethinking Polyp Segmentation from an Out-of-Distribution Perspective

1 code implementation 13 Jun 2023 Ge-Peng Ji, Jing Zhang, Dylan Campbell, Huan Xiong, Nick Barnes

Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach.

Segmentation Self-Supervised Learning

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

1 code implementation 6 Jun 2023 Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.

Object object-detection +3

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

1 code implementation 5 Jun 2023 Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

We propose a unified approach to obtain structured sparse optimal paths in the latent space of a variational autoencoder (VAE) using dynamic programming and Gumbel propagation.

Bayesian Inference Singing Voice Synthesis

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

2 code implementations 31 May 2023 Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

Scene Text Detection Text Detection +1

MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion

no code implementations 20 May 2023 Ming Xu, Jing Zhang

In this framework, we first develop an embedding module that contains a sampling algorithm (MGWalk) and an encoder network to learn latent representation for each road segment.

Graph Learning Learning-To-Rank

Multi-grained Hypergraph Interest Modeling for Conversational Recommendation

1 code implementation 4 May 2023 Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.

Recommendation Systems

Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

no code implementations 3 May 2023 Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages.

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

2 code implementations NeurIPS 2023 Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.

Instance Segmentation Object +4

Scalable Mask Annotation for Video Text Spotting

1 code implementation 2 May 2023 Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, DaCheng Tao

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames.

Text Spotting

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

2 code implementations 23 Apr 2023 Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang

By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment.

3D Single Object Tracking Object Tracking

MPMQA: Multimodal Question Answering on Product Manuals

1 code implementation 19 Apr 2023 Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers.

Question Answering Sentence

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

1 code implementation 19 Apr 2023 Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao

This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.

Motion Compensation Simultaneous Localization and Mapping

DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification

2 code implementations 19 Apr 2023 Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.

Hyperspectral Image Classification Image Generation

Cold-Start based Multi-Scenario Ranking Model for Click-Through Rate Prediction

no code implementations 16 Apr 2023 Peilin Chen, Hong Wen, Jing Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang

Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users.

Click-Through Rate Prediction

UVA: Towards Unified Volumetric Avatar for View Synthesis, Pose rendering, Geometry and Texture Editing

no code implementations 14 Apr 2023 Jinlong Fan, Jing Zhang, DaCheng Tao

Experiments on multiple human avatars demonstrate that our UVA achieves competitive results in novel view synthesis and novel pose rendering while enabling local and independent editing of geometry and appearance.

Novel View Synthesis

Deep Image Matting: A Comprehensive Survey

1 code implementation 10 Apr 2023 Jizhizi Li, Jing Zhang, DaCheng Tao

Image matting refers to extracting precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing.

Image Matting Referring Image Matting

Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

no code implementations 4 Apr 2023 Qijie Shen, Hong Wen, Jing Zhang, Qi Rao

Specifically, SIE extracts the user's short-term interests by integrating three fundamental interest encoders, namely the query-dependent, target-dependent, and causal-dependent interest encoders. The resulting representation is then delivered to the LIE module, which captures the user's long-term interests through an attention mechanism with respect to the short-term interests from SIE.

Click-Through Rate Prediction Disentanglement

GLT-T++: Global-Local Transformer for 3D Siamese Tracking with Ranking Loss

1 code implementation 1 Apr 2023 Jiahao Nie, Zhiwei He, Yuxiang Yang, Xudong Lv, Mingyu Gao, Jing Zhang

Incorporating this transformer-based voting scheme into 3D RPN, a novel Siamese method dubbed GLT-T is developed for 3D single object tracking on point clouds.

3D Single Object Tracking Object Tracking +1

SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection

2 code implementations 29 Mar 2023 Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, DaCheng Tao

Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance.

3D Object Detection Knowledge Distillation +1

Vision Transformer with Quadrangle Attention

1 code implementation 27 Mar 2023 Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.

object-detection Object Detection +2

LPFF: A Portrait Dataset for Face Generators Across Large Poses

no code implementations ICCV 2023 Yiqian Wu, Jing Zhang, Hongbo Fu, Xiaogang Jin

To better validate our pose-conditional 3D-aware generators, we develop a new FID measure to evaluate the 3D-level performance.

3D Reconstruction

A Survey on Class Imbalance in Federated Learning

no code implementations 21 Mar 2023 Jing Zhang, Chuanwen Li, Jianzhong Qi, Jiayuan He

We first introduce various types of class imbalance in federated learning, after which we review existing methods for estimating the extent of class imbalance without the need of knowing the actual data to preserve data privacy.

Federated Learning

Deep Learning for Camera Calibration and Beyond: A Survey

1 code implementation 19 Mar 2023 Kang Liao, Lang Nie, Shujuan Huang, Chunyu Lin, Jing Zhang, Yao Zhao, Moncef Gabbouj, DaCheng Tao

In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, by analyzing their strengths and limitations.

Camera Calibration

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation ICCV 2023 Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing the storage burden and optimization difficulty.

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation 2 Mar 2023 Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.

Vision and Language Navigation

Transmission-Guided Bayesian Generative Model for Smoke Segmentation

1 code implementation 2 Mar 2023 Siyuan Yan, Jing Zhang, Nick Barnes

To effectively model the two types of uncertainty, we introduce a Bayesian generative model to simultaneously estimate the posterior distribution of model parameters and its predictions.

Image Dehazing Image Segmentation +2

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

1 code implementation 28 Feb 2023 Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang

We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access Internet knowledge.

Dialogue Evaluation Dialogue Generation +2

Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

no code implementations 24 Feb 2023 Chao Xue, Di Liang, Sirui Wang, Wei Wu, Jing Zhang

To alleviate this problem, we propose a novel Dual Path Modeling Framework to enhance the model's ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics.

Sentence

RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

1 code implementation 12 Feb 2023 Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen

Due to the structural property of the SQL queries, the seq2seq model takes the responsibility of parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords).

Language Modelling Semantic Parsing +2

Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

1 code implementation 10 Feb 2023 Jie Zhou, Qian Yu, Chuan Luo, Jing Zhang

In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS).

Multi-Task Learning Recommendation Systems

AniPixel: Towards Animatable Pixel-Aligned Human Avatar

no code implementations 7 Feb 2023 Jinlong Fan, Jing Zhang, Zhi Hou, DaCheng Tao

In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending.

3D Scene Reconstruction

Audio-Visual Segmentation with Semantics

1 code implementation 30 Jan 2023 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

1 code implementation NeurIPS 2023 Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points.

reinforcement-learning Reinforcement Learning (RL)

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

1 code implementation 13 Jan 2023 Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, DaCheng Tao

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.

Self-Supervised Learning

Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning

1 code implementation CVPR 2023 Wenju Sun, Qingyong Li, Jing Zhang, Wen Wang, Yangli-ao Geng

BMKP decouples the functions of learning and knowledge remembering via a bilevel-memory design: a working memory responsible for adaptive model learning, to ensure plasticity; and a long-term memory in charge of enduringly storing the knowledge incorporated within the learned model, to guarantee stability.

Incremental Learning

Domain Specified Optimization for Deployment Authorization

no code implementations ICCV 2023 Haotian Wang, Haoang Chi, Wenjing Yang, Zhipeng Lin, Mingyang Geng, Long Lan, Jing Zhang, DaCheng Tao

As a complement to SDPA, we also propose Target-Combined Deployment Authorization (TPDA), where unauthorized domains are partially accessible, and simplify the DSO method to a perturbation operation on the pseudo predictions, referred to as Target-Dependent Domain-Specified Optimization (TDSO).

Leverage Interactive Affinity for Affordance Learning

1 code implementation CVPR 2023 Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.

Human-Object Interaction Detection Object

Modeling the Distributional Uncertainty for Salient Object Detection Models

no code implementations CVPR 2023 Xinyu Tian, Jing Zhang, Mochu Xiang, Yuchao Dai

Most of the existing salient object detection (SOD) models focus on improving the overall model performance, without explicitly explaining the discrepancy between the training and testing distributions.

Long-tail Learning Object +3

Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

1 code implementation 10 Dec 2022 Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs).

Change Detection

ViTPose++: Vision Transformer for Generic Body Pose Estimation

1 code implementation 7 Dec 2022 Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose.

 Ranked #1 on Animal Pose Estimation on AP-10K (using extra training data)

2D Human Pose Estimation Animal Pose Estimation +1

Learning to Learn Better for Video Object Segmentation

1 code implementation 5 Dec 2022 Meng Lan, Jing Zhang, Lefei Zhang, DaCheng Tao

Recently, the joint learning framework (JOINT) integrates matching-based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS).

Object Semantic Segmentation +2

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

no code implementations 24 Nov 2022 Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Sagar Verma, Siddharth Gupta, Shishir Muralidhara, Niharika Hegde, Daitao Xing, Nikolaos Evangeliou, Anthony Tzes, Vojtěch Bartl, Jakub Špaňhel, Adam Herout, Neelanjan Bhowmik, Toby P. Breckon, Shivanand Kundargi, Tejas Anvekar, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudengudi, Arpita Vats, Yang Song, Delong Liu, Yonglin Li, Shuman Li, Chenhao Tan, Long Lan, Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi, Hsiang-Wei Huang, Cheng-Yen Yang, Jenq-Neng Hwang, Pyong-Kun Kim, Kwangju Kim, Kyoungoh Lee, Shuai Jiang, Haiwen Li, Zheng Ziqiang, Tuan-Anh Vu, Hai Nguyen-Truong, Sai-Kit Yeung, Zhuang Jia, Sophia Yang, Chih-Chung Hsu, Xiu-Yu Hou, Yu-An Jhang, Simon Yang, Mau-Tsuen Yang

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection.

Object object-detection +2

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

2 code implementations 20 Nov 2022 Jiahao Nie, Zhiwei He, Yuxiang Yang, Mingyu Gao, Jing Zhang

Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware prior into seed point features to effectively form strong feature representation for geometric positions of the seed points, thus providing more robust and accurate cues for offset learning.

3D Single Object Tracking Object Tracking +1

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

2 code implementations CVPR 2023 Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.

 Ranked #1 on Text Spotting on Total-Text (using extra training data)

Scene Text Detection Text Detection +2

Energy-Based Residual Latent Transport for Unsupervised Point Cloud Completion

1 code implementation 13 Nov 2022 Ruikai Cui, Shi Qiu, Saeed Anwar, Jing Zhang, Nick Barnes

Unsupervised point cloud completion aims to infer the whole geometry of a partial object observation without requiring partial-complete correspondence.

Point Cloud Completion

Unifying Flow, Stereo and Depth Estimation

1 code implementation 10 Nov 2022 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Optical Flow Estimation Stereo Depth Estimation +1

Rethinking Hierarchies in Pre-trained Plain Vision Transformer

no code implementations 3 Nov 2022 Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

Self-supervised pre-training of vision transformers (ViTs) via masked image modeling (MIM) has proven very effective.

Watermarking for Out-of-distribution Detection

1 code implementation 27 Oct 2022 Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han

Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.

Out-of-Distribution Detection

Adversarial Purification with the Manifold Hypothesis

no code implementations 26 Oct 2022 Zhaoyuan Yang, Zhiwei Xu, Jing Zhang, Richard Hartley, Peter Tu

In this work, we formulate a novel framework for adversarial robustness using the manifold hypothesis.

Adversarial Robustness Variational Inference

Oscillatory cooperation prevalence emerges from misperception

no code implementations 17 Oct 2022 Jing Zhang, Zhao Li, Jiqiang Zhang, Lin Ma, Guozhong Zheng, Li Chen

Here we show that oscillatory behaviors naturally emerge if incomplete information is incorporated into the cooperation evolution of a non-Markov model.

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

1 code implementation 19 Sep 2022 Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

1 code implementation 31 Aug 2022 ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.

Point Cloud Registration

Grounded Affordance from Exocentric View

2 code implementations 28 Aug 2022 Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.

Human-Object Interaction Detection Object +1

Robust control problems of BSDEs coupled with value functions

no code implementations 23 Aug 2022 Zhou Yang, Jing Zhang, Chao Zhou

This paper considers a robust control problem in which the controlled stochastic differential equations (SDEs) include ambiguity parameters, their coefficients satisfy non-Lipschitz continuous and non-linear growth conditions, and the objective function is expressed as a backward stochastic differential equation (BSDE) whose generator depends on the value function.

Generalised Co-Salient Object Detection

no code implementations 20 Aug 2022 Jiawei Liu, Jing Zhang, Ruikai Cui, Kaihao Zhang, Weihao Li, Nick Barnes

We propose a new setting that relaxes an assumption in the conventional Co-Salient Object Detection (CoSOD) setting by allowing the presence of "noisy images" which do not show the shared co-salient object.

Co-Salient Object Detection Object +3

Transformer Networks for Predictive Group Elevator Control

no code implementations 15 Aug 2022 Jing Zhang, Athanasios Tsiligkaridis, Hiroshi Taguchi, Arvind Raghunathan, Daniel Nikovski

We propose a Predictive Group Elevator Scheduler that uses predictive information on passenger arrivals from a Transformer-based destination predictor and a linear regression model that predicts the remaining time to destinations.

regression

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

2 code implementations 8 Aug 2022 Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, DaCheng Tao, Liangpei Zhang

Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability.

Aerial Scene Classification Few-Shot Learning +2

Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data

no code implementations 28 Jul 2022 Hai Yang, Yuhang Sheng, Yi Jiang, Xiaoyang Fang, Dongdong Li, Jing Zhang, Zhe Wang

In addition, Subtype-Former also achieved outstanding results in pan-cancer subtyping, which can help analyze the commonalities and differences across various cancer types at the molecular level.

Survival Analysis

MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

no code implementations 20 Jul 2022 Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He

We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers.

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

1 code implementation 18 Jul 2022 Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li

Since data augmentation strategies have largely alleviated the training instability, how to further improve the generative performance of DE-GANs becomes a hotspot.

Contrastive Learning Data Augmentation

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

1 code implementation 16 Jul 2022 Haimei Zhao, Jing Zhang, Sen Zhang, DaCheng Tao

A naive way is to accomplish them independently in a sequential or parallel manner, but there are many drawbacks, i.e., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts.

Autonomous Driving Depth Estimation +3

Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection

no code implementations 14 Jul 2022 Zhe Chen, Jing Zhang, Yufei Xu, DaCheng Tao

Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF) which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance.

object-detection Object Detection

Audio-Visual Segmentation

1 code implementation 11 Jul 2022 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

3 code implementations 10 Jul 2022 Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, DaCheng Tao

However, these methods, built upon the detection transformer framework, might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the reading order of humans, which, from our observation, impedes detection robustness.

Inductive Bias Scene Text Detection +1

A State Transition Model for Mobile Notifications via Survival Analysis

no code implementations 7 Jul 2022 Yiping Yuan, Jing Zhang, Shaunak Chatterjee, Shipeng Yu, Romer Rosales

In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.

Decision Making Survival Analysis

Re-weighting Negative Samples for Model-Agnostic Matching

no code implementations 6 Jul 2022 Jiazhen Lou, Hong Wen, Fuyu Lv, Jing Zhang, Tengfei Yuan, Zhao Li

Recommender Systems (RS), an efficient tool for discovering items of interest to users from a very large corpus, have attracted more and more attention from academia and industry.

Multi-Task Learning Recommendation Systems

Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection

1 code implementation CVPR 2023 Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, Chongyang Zhang

In this way, our model can form a more explicit and discriminative decision boundary to distinguish known and also unseen anomalies from normal samples more effectively.

Ranked #3 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Contrastive Learning Supervised Anomaly Detection

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

1 code implementation CVPR 2023 Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao

Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.

Animal Pose Estimation Contrastive Learning

Knowledge Learning with Crowdsourcing: A Brief Review and Systematic Perspective

no code implementations 19 Jun 2022 Jing Zhang

Big data are characterized by enormous volume, high velocity, diversity, value-sparsity, and uncertainty, which make learning knowledge from them full of challenges.

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

4 code implementations 12 Jun 2022 Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, DaCheng Tao

Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.

Animal Pose Estimation Domain Generalization +1

Toward Real-world Single Image Deraining: A New Benchmark and Beyond

1 code implementation 11 Jun 2022 Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, DaCheng Tao

To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.

Domain Generalization Image Restoration +2

Referring Image Matting

1 code implementation CVPR 2023 Jizhizi Li, Jing Zhang, DaCheng Tao

Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting.

Domain Generalization Image Matting +5

Towards Deeper Understanding of Camouflaged Object Detection

1 code implementation 23 May 2022 Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan

With the above understanding about camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment, and rank camouflaged objects, indicating the conspicuousness level of camouflage.

Object object-detection +1

Salient Object Detection via Bounding-box Supervision

no code implementations 11 May 2022 Mengqi He, Jing Zhang, Wenxin Yu

However, as a large amount of background is excluded, the foreground bounding box region contains a less complex background, making it possible to perform handcrafted features-based saliency detection with only the cropped foreground region.

Object object-detection +3

From heavy rain removal to detail restoration: A faster and better network

1 code implementation 7 May 2022 Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen

This approach comprises two key modules, a rain streaks removal network (R$^2$Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images.

Rain Removal

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

no code implementations CVPR 2022 Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao

Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.

Knowledge Distillation

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

5 code implementations 26 Apr 2022 Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose.

 Ranked #1 on Pose Estimation on COCO test-dev (using extra training data)

2D Human Pose Estimation Keypoint Detection

An Energy-Based Prior for Generative Saliency

1 code implementation 19 Apr 2022 Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.

object-detection RGB-D Salient Object Detection +3

VSA: Learning Varied-Size Window Attention in Vision Transformers

2 code implementations 18 Apr 2022 Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Attention within windows has been widely explored in vision transformers to balance the performance, computation complexity, and memory footprint.

Instance Segmentation Object Detection +1

A Comprehensive Survey on Data-Efficient GANs in Image Generation

no code implementations 18 Apr 2022 Ziqiang Li, Beihao Xia, Jing Zhang, Chaoyue Wang, Bin Li

Generative Adversarial Networks (GANs) have achieved remarkable results in image synthesis.

Image Generation

An Empirical Study of Remote Sensing Pretraining

2 code implementations 6 Apr 2022 Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao

To this end, we train different networks from scratch with the help of the largest RS scene recognition dataset up to now -- MillionAID, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNN) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.

Aerial Scene Classification Building change detection for remote sensing images +5

BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation

1 code implementation 6 Apr 2022 Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao

Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.

Clustering Pseudo Label +1

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations CVPR 2023 Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Semantic Segmentation

Rethinking Portrait Matting with Privacy Preserving

1 code implementation 31 Mar 2022 Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, DaCheng Tao

P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain some useful findings about model generalization ability under the privacy-preserving training (PPT) setting.

Domain Generalization Image Matting +1

Learning Affordance Grounding from Exocentric Images

2 code implementations CVPR 2022 Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.

Human-Object Interaction Detection Object +1

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

no code implementations 18 Mar 2022 Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) the MGT module effectively uses the multi-grained features and the Transformer framework to generate the long medical report.

Descriptive Image Captioning +1

Towards Data-Efficient Detection Transformers

2 code implementations 17 Mar 2022 Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao

Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.

Information-Theoretic Odometry Learning

no code implementations 11 Mar 2022 Sen Zhang, Jing Zhang, DaCheng Tao

In this paper, we propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation, a crucial component of many robotics and vision tasks such as navigation and virtual reality where relative camera poses are required in real time.

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

no code implementations 11 Mar 2022 Sen Zhang, Jing Zhang, DaCheng Tao

In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases.

Monocular Visual Odometry

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

4 code implementations 21 Feb 2022 Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism.

Image Classification Inductive Bias

Deep Interest Highlight Network for Click-Through Rate Prediction in Trigger-Induced Recommendation

1 code implementation 5 Feb 2022 Qijie Shen, Hong Wen, Wanjie Tao, Jing Zhang, Fuyu Lv, Zulong Chen, Zhao Li

In many classical e-commerce platforms, personalized recommendation has been proven to be of great business value, which can improve user satisfaction and increase the revenue of platforms.

Click-Through Rate Prediction

SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection

1 code implementation 6 Jan 2022 Chen Chen, Zhe Chen, Jing Zhang, DaCheng Tao

We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects.

3D Object Detection object-detection

Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers

1 code implementation 5 Jan 2022 Wenju Sun, Qingyong Li, Jing Zhang, Danyu Wang, Wen Wang, Yangli-ao Geng

DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks.

Class Incremental Learning Incremental Learning +1

ISNet: Shape Matters for Infrared Small Target Detection

1 code implementation CVPR 2022 Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo

The TOAA block computes low-level information with an attention mechanism in both the row and column directions and fuses it with the high-level information to capture the shape characteristics of targets and suppress noise.

Management

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

no code implementations CVPR 2022 Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.

Attribute Dense Captioning +1

Siamese Network with Interactive Transformer for Video Object Segmentation

1 code implementation 28 Dec 2021 Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang

Semi-supervised video object segmentation (VOS) refers to segmenting the target object in remaining frames given its annotation in the first frame, which has been actively studied in recent years.

Object Semantic Segmentation +2

Semi-supervised Salient Object Detection with Effective Confidence Estimation

no code implementations 28 Dec 2021 Jiawei Liu, Jing Zhang, Nick Barnes

We study semi-supervised salient object detection, with access to a small number of labeled samples and a large number of unlabeled samples.

Object object-detection +3

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

no code implementations 27 Dec 2021 Xiaofeng Pan, Ming Li, Jing Zhang, Keren Yu, Luping Wang, Hong Wen, Chengjun Mao, Bo Cao

At last, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction.

Meta-Learning

Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction

no code implementations NeurIPS 2021 Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li

In this paper, we take a step further by proposing a novel generative vision transformer with latent variables following an informative energy-based prior for salient object detection.

object-detection RGB-D Salient Object Detection +3

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

1 code implementation AAAI 2022 2021 Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du

Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity.

Ranked #9 on Scene Text Recognition on ICDAR2015 (using extra training data)

Language Modelling Scene Text Recognition

Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models

1 code implementation 12 Dec 2021 Yu Feng, Jing Zhang, Xiaokang Zhang, Lemao Liu, Cuiping Li, Hong Chen

Embedding-based methods are popular for Knowledge Base Question Answering (KBQA), but few current models have numerical reasoning skills and thus struggle to answer ordinal constrained questions.

Data Augmentation Knowledge Base Question Answering

Recurrent Glimpse-based Decoder for Detection with Transformer

1 code implementation CVPR 2022 Zhe Chen, Jing Zhang, DaCheng Tao

Then, a glimpse-based decoder is introduced to provide refined detection results based on both the glimpse features and the attention modeling outputs of the previous stage.

Ranked #1 on Object Detection on MS COCO (GFlops metric)

Object Detection

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations 6 Dec 2021 Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

A Multi-Strategy based Pre-Training Method for Cold-Start Recommendation

no code implementations 4 Dec 2021 Bowen Hao, Hongzhi Yin, Jing Zhang, Cuiping Li, Hong Chen

In terms of the pretext task, in addition to considering the intra-correlations of users and items through the embedding reconstruction task, we add an embedding contrastive learning task to capture the inter-correlations of users and items.

Contrastive Learning Meta-Learning +1
