Search Results for author: Jing Zhang

Found 371 papers, 202 papers with code

P-INT: A Path-based Interaction Model for Few-shot Knowledge Graph Completion

no code implementations Findings (EMNLP) 2021 Jingwen Xu, Jing Zhang, Xirui Ke, Yuxiao Dong, Hong Chen, Cuiping Li, Yongbin Liu

Its general process is to first encode the implicit relation of an entity pair and then match the relation of a query entity pair with the relations of the reference entity pairs.

Knowledge Graph Completion Relation

SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification

no code implementations SemEval (NAACL) 2022 Jing Zhang, Yujin Wang

Online misogyny meme detection is an image/text multimodal classification task; the complicated relation of image and text challenges the intelligent system’s modality fusion learning capability.

HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese

1 code implementation ACL 2022 Daniel Zhang-li, Jing Zhang, Jifan Yu, Xiaokang Zhang, Peng Zhang, Jie Tang, Juanzi Li

We investigate the usage of entity linking (EL) in downstream tasks and present the first modularized EL toolkit for easy task adaptation.

Entity Linking Question Answering

Long-range Sequence Modeling with Predictable Sparse Attention

no code implementations ACL 2022 Yimeng Zhuang, Jing Zhang, Mei Tu

(2) A sparse attention matrix estimation module, which predicts dominant elements of an attention matrix based on the output of the previous hidden state cross module.

Math

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

1 code implementation 10 Jul 2024 Mingjin Zhang, YuChun Wang, Jie Guo, Yunsong Li, Xinbo Gao, Jing Zhang

The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks.

Decoder Image Segmentation +1

Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces

no code implementations 8 Jul 2024 Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas.

Heterogeneous Graph Contrastive Learning with Spectral Augmentation

no code implementations 30 Jun 2024 Jing Zhang, Xiaoqian Jiang, Yingjie Xie, Cangqi Zhou

The proposed model learns an adaptive topology augmentation scheme through the heterogeneous graph itself, disrupting the structural information of the heterogeneous graph in the spectrum dimension, and ultimately improving the learning effect of the model.

Contrastive Learning Data Augmentation +1

UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

no code implementations 29 Jun 2024 Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5 mm, it is challenging to locate and segment the facial nerve in CT scans.

Segmentation

AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

no code implementations 28 Jun 2024 Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong liu, Lin Liu

Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling.

Image Segmentation Pseudo Label +3

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

no code implementations 21 Jun 2024 Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.

A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering

no code implementations 20 Jun 2024 Lingxi Zhang, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

In order to improve the generalization capabilities of KBQA models, extensive research has embraced a retrieve-then-reason framework to retrieve relevant evidence for logical expression generation.

Knowledge Base Question Answering Language Modelling +1

PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

no code implementations 20 Jun 2024 Sihan Ma, Jing Zhang, Qiong Cao, DaCheng Tao

We evaluated 60 representative models, including top-down, bottom-up, heatmap-based, regression-based, and classification-based methods, across three datasets for human and animal pose estimation.

Animal Pose Estimation Autonomous Driving +1

Is Your HD Map Constructor Reliable under Sensor Corruptions?

no code implementations 18 Jun 2024 Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, HUI ZHANG, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

These insights provide a pathway for developing more reliable HD map construction methods, which are essential for the advancement of autonomous driving technology.

Autonomous Driving Data Augmentation

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

1 code implementation 17 Jun 2024 Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, DaCheng Tao, Liangpei Zhang

To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA.

R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models

1 code implementation 17 Jun 2024 Shangqing Tu, Yuanchun Wang, Jifan Yu, Yuyang Xie, Yaran Shi, Xiaozhi Wang, Jing Zhang, Lei Hou, Juanzi Li

In this paper, we address the challenges of evaluating RALLMs by introducing the R-Eval toolkit, a Python toolkit designed to streamline the evaluation of different RAG workflows in conjunction with LLMs.

RAG Retrieval

Nemotron-4 340B Technical Report

1 code implementation 17 Jun 2024 Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick Legresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.

Synthetic Data Generation

Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

1 code implementation 17 Jun 2024 Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

To address this, we propose an efficient MIM method, termed SelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness.

Catalytic evolution of cooperation in a population with behavioural bimodality

no code implementations 17 Jun 2024 Anhui Sheng, Jing Zhang, Guozhong Zheng, Jiqiang Zhang, Weiran Cai, Li Chen

Specifically, we incorporate Q-learning and Tit-for-Tat (TFT) rules into our toy model, where the prisoner's dilemma game is played, and we investigate the impact of the mode mixture on the evolution of cooperation.

Q-Learning

Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models

no code implementations 8 Jun 2024 Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu

Despite the remarkable advance of large language models (LLMs), the prevalence of non-factual responses remains a common issue.

Transfer Learning

3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

no code implementations 6 Jun 2024 Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang

Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.

Object Position +4

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

no code implementations 31 May 2024 Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed.

Image Classification Image Generation +1

Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

1 code implementation 31 May 2024 Linjiajie Fang, Ruoxue Liu, Jing Zhang, Wenjia Wang, Bing-Yi Jing

In this paper, we propose Diffusion Actor-Critic (DAC) that formulates the Kullback-Leibler (KL) constraint policy iteration as a diffusion noise regression problem, enabling direct representation of target policies as diffusion models.

D4RL Reinforcement Learning (RL)

FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction

no code implementations 25 May 2024 Lan Wu, Xuebin Wang, Ruijuan Chu, Guangyi Liu, Yingchun Chen, Jing Zhang, Linyu Wang

To address the above issues, we propose FlightPatchNet, a multi-scale patch network with differential coding for flight trajectory prediction.

Trajectory Prediction

A Solution-based LLM API-using Methodology for Academic Information Seeking

1 code implementation 24 May 2024 Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang

Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts.

LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation

1 code implementation 16 May 2024 Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du

Experimental results in classification and dense prediction tasks show that LeMeViT has a significant $1.7 \times$ speedup, fewer parameters, and competitive performance compared to the baseline models, and achieves a better trade-off between efficiency and performance.

OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

no code implementations 12 May 2024 Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang

Accurately reconstructing the global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems.

Inductive Bias

TAVGBench: Benchmarking Text to Audible-Video Generation

1 code implementation 22 Apr 2024 Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.

Benchmarking Contrastive Learning +1

LLMTune: Accelerate Database Knob Tuning with Large Language Models

1 code implementation 17 Apr 2024 Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li

Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads.

Language Modelling Large Language Model

Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

1 code implementation 10 Apr 2024 Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

Detecting non-factual content is a longstanding goal to increase the trustworthiness of large language model (LLM) generations.

Question Answering

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

no code implementations CVPR 2024 Haimei Zhao, Jing Zhang, Zhuo Chen, Shanshan Zhao, DaCheng Tao

We devote UniMix to two main setups: 1) unsupervised domain adaption, adapting the model from the clear weather source domain to the adverse weather target domain; 2) domain generalization, learning a model that generalizes well to unseen scenes in adverse weather.

Autonomous Driving Domain Generalization +2

Latent-based Diffusion Model for Long-tailed Recognition

1 code implementation 6 Apr 2024 Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue.

Denoising Transfer Learning

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

no code implementations 4 Apr 2024 Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to generate more informative responses, have also proven effective in reducing hallucination.

counterfactual Counterfactual Reasoning +2

RaFE: Generative Radiance Fields Restoration

no code implementations 4 Apr 2024 Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach using Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images.

3D Reconstruction Novel View Synthesis

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

1 code implementation 2 Apr 2024 Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from KB.

Question Generation Question-Generation

Streamlining Redundant Layers to Compress Large Language Models

no code implementations 28 Mar 2024 Xiaodong Chen, Yuxuan Hu, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

This paper introduces LLM-Streamline, a novel layer pruning approach for large language models.

Model Compression

Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems

no code implementations 28 Mar 2024 Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, BingYi Jing

In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance the recommendation model learning.

Collaborative Filtering Recommendation Systems

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

1 code implementation 28 Mar 2024 Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios.

Language Modelling Large Language Model

Contact-aware Human Motion Generation from Textual Descriptions

no code implementations 23 Mar 2024 Sihan Ma, Qiong Cao, Jing Zhang, DaCheng Tao

This paper addresses the problem of generating 3D interactive human motion from text.

Motion Synthesis

Learning Gaussian Representation for Eye Fixation Prediction

no code implementations 21 Mar 2024 Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes

Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

no code implementations CVPR 2024 Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng

Most importantly, the LUWA dataset provides an underexplored opportunity for vision and learning communities and complements existing image classification problems on common objects.

Few-Shot Learning Image Classification

Open-World Semi-Supervised Learning for Node Classification

1 code implementation 18 Mar 2024 Yanling Wang, Jing Zhang, Lingxi Zhang, Lixin Liu, Yuxiao Dong, Cuiping Li, Hong Chen, Hongzhi Yin

Open-world semi-supervised learning (Open-world SSL) for node classification, which classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community.

Classification Contrastive Learning +2

Training A Small Emotional Vision Language Model for Visual Art Comprehension

2 code implementations 17 Mar 2024 Jing Zhang, Liang Zheng, Meng Wang, Dan Guo

This paper develops small vision language models to understand visual art, which, given an art work, aims to identify its emotion category and explain this prediction with natural language.

Language Modelling

Reverse That Number! Decoding Order Matters in Arithmetic Learning

no code implementations 9 Mar 2024 Daniel Zhang-li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao, Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li

Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations.

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation 26 Feb 2024 Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation Domain Adaptation +2

Question Calibration and Multi-Hop Modeling for Temporal Question Answering

no code implementations 20 Feb 2024 Chao Xue, Di Liang, Pengfei Wang, Jing Zhang

In the real world, many facts contained in KGs are time-constrained; thus, temporal KGQA has received increasing attention.

Knowledge Graphs Multi-hop Question Answering +1

LogicPrpBank: A Corpus for Logical Implication and Equivalence

1 code implementation 14 Feb 2024 Zhexiong Liu, Jing Zhang, Jiaying Lu, Wenjing Ma, Joyce C Ho

Logic reasoning has been critically needed in problem-solving and decision-making.

Decision Making

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

no code implementations 8 Feb 2024 Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu

In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.

Quantization

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

no code implementations 7 Feb 2024 Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera.

Autonomous Driving Object Tracking +1

Large Language Model for Table Processing: A Survey

no code implementations 4 Feb 2024 Weizheng Lu, Jiaming Zhang, Jing Zhang, Yueguo Chen

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables.

Fact Verification Language Modelling +2

Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

1 code implementation 1 Feb 2024 Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao

In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.

Ensemble Learning In-Context Learning

Are Synthetic Time-series Data Really not as Good as Real Data?

no code implementations 1 Feb 2024 Fanzhe Fu, Junru Chen, Jing Zhang, Carl Yang, Lvbin Ma, Yang Yang

Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.

Representation Learning Time Series

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

1 code implementation 31 Jan 2024 Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.

Hierarchical Text Segmentation Segmentation +1

Data-Free Generalized Zero-Shot Learning

no code implementations 28 Jan 2024 Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.

Generalized Zero-Shot Learning Zero-shot Generalization

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

1 code implementation 13 Jan 2024 Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, DaCheng Tao

In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.

Text Detection Text Spotting

Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction

1 code implementation 8 Jan 2024 Yihao Li, Philippe Zhang, Yubo Tan, Jing Zhang, Zhihan Wang, Weili Jiang, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec, Mostafa El Habib Daho

As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy.

Classification Contrastive Learning +3

Robust single-particle cryo-EM image denoising and restoration

no code implementations 2 Jan 2024 Jing Zhang, Tengfei Zhao, Shiyu Hu, Xin Zhao

Cryo-electron microscopy (cryo-EM) has achieved near-atomic level resolution of biomolecules by reconstructing 2D micrographs.

Image Denoising

SVGDreamer: Text Guided SVG Generation with Diffusion Model

1 code implementation CVPR 2024 XiMing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity.

Diversity Vector Graphics

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

1 code implementation 27 Dec 2023 Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without the need for a prior on the category distribution.

3D Semantic Segmentation Point Cloud Segmentation +1

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

1 code implementation 25 Dec 2023 Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang

Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior.

Animal Pose Estimation Benchmarking +3

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

2 code implementations 22 Dec 2023 Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.

Segmentation Semantic Segmentation

LaViP:Language-Grounded Visual Prompts

no code implementations 18 Dec 2023 Nilakshan Kunananthaseelan, Jing Zhang, Mehrtash Harandi

We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks.

Few-Shot Learning Transfer Learning +1

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation 29 Nov 2023 Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given a generated failed image due to malformed hands, we utilize ControlNet modules to re-inject such correct hand information.

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

no code implementations CVPR 2024 Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang

Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts.

Attribute Out-of-Distribution Generalization

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

no code implementations 26 Nov 2023 Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal.

Stochastic Optimization

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation 22 Nov 2023 Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation 13 Nov 2023 Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

1 code implementation 12 Nov 2023 Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley

We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct and realistic interpolations given an image pair.

Diversity Image Generation +1

PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning

no code implementations 7 Nov 2023 Hao liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo

In this paper, we first point out that the unification of task objectives and adaptation for task difficulty are critical for bridging the gap between time series masked reconstruction and forecasting.

Decoder Representation Learning +2

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation ICCV 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.

Attribute Representation Learning

Decoding trust: A reinforcement learning perspective

no code implementations 26 Sep 2023 Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, Li Chen

In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future.

Decision Making Q-Learning +1

Diversifying Question Generation over Knowledge Base via External Natural Questions

no code implementations 23 Sep 2023 Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen

The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity.

Diversity Natural Questions +3

Multi-dimension Queried and Interacting Network for Stereo Image Deraining

no code implementations 19 Sep 2023 Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen

This module leverages dimension-wise queries that are independent of the input features and employs global context-aware attention (GCA) to capture essential features while avoiding the entanglement of redundant or irrelevant information.

Rain Removal

Decompose Semantic Shifts for Composed Image Retrieval

no code implementations 18 Sep 2023 Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang

Composed image retrieval is a type of image retrieval task where the user provides a reference image as a starting point and specifies a text on how to shift from the starting point to the desired target image.

Image Retrieval Retrieval

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation 6 Sep 2023 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.

Contrastive Learning Denoising +5

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

1 code implementation 5 Sep 2023 Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha

3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.

3D Single Object Tracking Autonomous Driving +2

$\rm SP^3$: Enhancing Structured Pruning via PCA Projection

no code implementations 31 Aug 2023 Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen

Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency.

PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

no code implementations 24 Aug 2023 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.

Language Modelling Segmentation

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

1 code implementation 17 Aug 2023 Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.

Image Segmentation Segmentation +1

Gradient-Based Markov Chain Monte Carlo for MIMO Detection

no code implementations 12 Aug 2023 Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

However, optimal MIMO detection is associated with a complexity that grows exponentially with the MIMO dimensions and quickly becomes impractical.

Bayesian Inference

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation 7 Aug 2023 Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

no code implementations 31 Jul 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.

Contrastive Learning Denoising +2

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

1 code implementation ICCV 2023 Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.

Hyperspectral Image Super-Resolution Image Super-Resolution

Model Calibration in Dense Classification with Adaptive Label Perturbation

1 code implementation ICCV 2023 Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes

To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image.

Binary Classification Classification +1

Neural Operators for PDE Backstepping Control of First-Order Hyperbolic PIDE with Recycle and Delay

1 code implementation 21 Jul 2023 Jie Qi, Jing Zhang, Miroslav Krstic

The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input.

Operator learning

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

no code implementations 19 Jul 2023 Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models.

Monocular Depth Estimation

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

1 code implementation 7 Jul 2023 Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.

Contrastive Learning Image Reconstruction +4

Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications

no code implementations 6 Jul 2023 Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Yiwei Fu, Dylan Campbell, Jaskirat Singh, Tianyu Wang

This paper begins with a description of methods for estimating image probability density functions that reflect the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space; not every pattern of pixels is an image.

Chain of Thought Prompting Elicits Knowledge Augmentation

1 code implementation 4 Jul 2023 Dingjun Wu, Jing Zhang, Xinmei Huang

The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models.

Retrieval

RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

1 code implementation 3 Jul 2023 Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision.

Image Segmentation Referring Expression +4

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

1 code implementation 29 Jun 2023 Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao

Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

1 code implementation NeurIPS 2023 XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis.

FHA-Kitchens: A Novel Dataset for Fine-Grained Hand Action Recognition in Kitchen Scenes

1 code implementation 19 Jun 2023 Ting Zhe, YongQian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao

We represent the action information in each hand interaction region as a triplet, resulting in a total of 878 action triplets.

Action Recognition Domain Generalization +3

Rethinking Polyp Segmentation from an Out-of-Distribution Perspective

1 code implementation 13 Jun 2023 Ge-Peng Ji, Jing Zhang, Dylan Campbell, Huan Xiong, Nick Barnes

Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach.

Segmentation Self-Supervised Learning

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

1 code implementation 6 Jun 2023 Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.

Object object-detection +3

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

2 code implementations 5 Jun 2023 Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP).

Bayesian Inference Singing Voice Synthesis

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

1 code implementation 31 May 2023 Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

Decoder Scene Text Detection +2

MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion

no code implementations 20 May 2023 Ming Xu, Jing Zhang

The identification of important nodes with strong propagation capabilities in road networks is a vital topic in urban planning.

Diversity Graph Learning +1

Multi-grained Hypergraph Interest Modeling for Conversational Recommendation

1 code implementation 4 May 2023 Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.

Recommendation Systems

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

2 code implementations NeurIPS 2023 Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.

Instance Segmentation Object +4

Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

no code implementations 3 May 2023 Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages.

Scalable Mask Annotation for Video Text Spotting

1 code implementation 2 May 2023 Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, DaCheng Tao

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames.

Text Spotting

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

2 code implementations 23 Apr 2023 Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang

By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment.

3D Single Object Tracking Object Tracking

DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification

2 code implementations 19 Apr 2023 Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.

Hyperspectral Image Classification Image Generation

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

1 code implementation 19 Apr 2023 Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao

This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.

Motion Compensation Simultaneous Localization and Mapping

MPMQA: Multimodal Question Answering on Product Manuals

1 code implementation 19 Apr 2023 Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers.

Question Answering Sentence

Cold-Start based Multi-Scenario Ranking Model for Click-Through Rate Prediction

no code implementations 16 Apr 2023 Peilin Chen, Hong Wen, Jing Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang

Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users.

Click-Through Rate Prediction

UVA: Towards Unified Volumetric Avatar for View Synthesis, Pose rendering, Geometry and Texture Editing

no code implementations 14 Apr 2023 Jinlong Fan, Jing Zhang, DaCheng Tao

Experiments on multiple human avatars demonstrate that our UVA achieves competitive results in novel view synthesis and novel pose rendering while enabling local and independent editing of geometry and appearance.

Novel View Synthesis

Deep Image Matting: A Comprehensive Survey

1 code implementation 10 Apr 2023 Jizhizi Li, Jing Zhang, DaCheng Tao

Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing.

Image Matting Referring Image Matting

Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

no code implementations 4 Apr 2023 Qijie Shen, Hong Wen, Jing Zhang, Qi Rao

Specifically, SIE extracts the user's short-term interests by integrating three fundamental interest encoders (query-dependent, target-dependent, and causal-dependent) and passes the resulting representation to the LIE module, which captures the user's long-term interests through an attention mechanism over the short-term interests from the SIE module.

Click-Through Rate Prediction Disentanglement

GLT-T++: Global-Local Transformer for 3D Siamese Tracking with Ranking Loss

1 code implementation 1 Apr 2023 Jiahao Nie, Zhiwei He, Yuxiang Yang, Xudong Lv, Mingyu Gao, Jing Zhang

Incorporating this transformer-based voting scheme into 3D RPN, a novel Siamese method dubbed GLT-T is developed for 3D single object tracking on point clouds.

3D Single Object Tracking Object Tracking +1

SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection

2 code implementations 29 Mar 2023 Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, DaCheng Tao

Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance.

3D Object Detection Knowledge Distillation +1

Vision Transformer with Quadrangle Attention

1 code implementation 27 Mar 2023 Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.

object-detection Object Detection +2

LPFF: A Portrait Dataset for Face Generators Across Large Poses

no code implementations ICCV 2023 Yiqian Wu, Jing Zhang, Hongbo Fu, Xiaogang Jin

To better validate our pose-conditional 3D-aware generators, we develop a new FID measure to evaluate the 3D-level performance.

3D Reconstruction