Search Results for author: Yang Yang

Found 385 papers, 125 papers with code

A Progressive Framework for Role-Aware Rumor Resolution

1 code implementation COLING 2022 Lei Chen, Guanying Li, Zhongyu Wei, Yang Yang, Baohua Zhou, Qi Zhang, Xuanjing Huang

Existing works on rumor resolution have shown great potential in recognizing word appearance and user participation.

DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing

no code implementations 11 Jul 2024 Minghang Zhou, Tianyu Li, Chaofan Qiao, Dongyu Xie, Guoqing Wang, Ningjuan Ruan, Lin Mei, Yang Yang

Inspired by the efficiency and lower complexity of Mamba in long sequence tasks, we propose Disparity-guided Multispectral Mamba (DMM), a multispectral oriented object detection framework comprised of a Disparity-guided Cross-modal Fusion Mamba (DCFM) module, a Multi-scale Target-aware Attention (MTA) module, and a Target-Prior Aware (TPA) auxiliary task.

Computational Efficiency object-detection +2

Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

1 code implementation 11 Jul 2024 Juren Li, Fanzhe Fu, Ran Wei, Yifei Sun, Zeyu Lai, Ning Song, Xin Chen, Yang Yang

Pathogenic chromosome abnormalities are very common among the general population.

Ternary Spike-based Neuromorphic Signal Processing System

no code implementations 7 Jul 2024 Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie Wei, Malu Zhang, Yang Yang

QT-SNN, compatible with ternary spike trains from the TAE method, quantizes both membrane potentials and synaptic weights to reduce memory requirements while maintaining performance.

Quantization

The Solution for the AIGC Inference Performance Optimization Competition

no code implementations 6 Jul 2024 Sishun Pan, Haonan Xu, Zhonghua Wan, Yang Yang

In recent years, the rapid advancement of large-scale pre-trained language models based on transformer architectures has revolutionized natural language processing tasks.

Computational Efficiency Marketing

The Solution for Language-Enhanced Image New Category Discovery

no code implementations 6 Jul 2024 Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition.

Contrastive Learning Diversity +1

The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition

no code implementations 6 Jul 2024 Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang

This paper presents a data-free, parameter-isolation-based continual learning algorithm we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition.

Continual Learning

The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge

no code implementations 6 Jul 2024 Longfei Huang, Feng Yu, Zhihao Guan, Zhonghua Wan, Yang Yang

Recent studies have enhanced the zero-shot performance of multimodal base models in referring expression comprehension tasks by introducing visual prompts.

Referring Expression Referring Expression Comprehension

Multimodal Classification via Modal-Aware Interactive Enhancement

no code implementations 5 Jul 2024 Qing-Yuan Jiang, Zhouyang Chi, Yang Yang

Due to the notorious modality imbalance problem, multimodal learning (MML) leads to the phenomenon of optimization imbalance, thus struggling to achieve satisfactory performance.

Classification

The Solution for the GAIIC2024 RGB-TIR object detection Challenge

no code implementations 4 Jul 2024 Xiangyu Wu, Jinling Xu, Longfei Huang, Yang Yang

This report introduces a solution to the task of RGB-TIR object detection from the perspective of unmanned aerial vehicles.

Object object-detection +1

The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA

no code implementations 2 Jul 2024 Hailiang Zhang, Dian Chao, Zhihao Guan, Yang Yang

To tackle this issue, we propose an alternative two-stage approach: (1) First, we leverage the VALOR model to answer questions based on video information.

Grounded Video Question Answering Object Tracking +3

The Solution for The PST-KDD-2024 OAG-Challenge

no code implementations 2 Jul 2024 Shupeng Zhong, Xinger Li, Shushan Jin, Yang Yang

In this paper, we introduce the second-place solution in the KDD-2024 OAG-Challenge paper source tracing track.

First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1

no code implementations 1 Jul 2024 Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu

The retrieval augmentation constructs a mini-knowledge base, enriching the input information of the model, while the similarity bucket further perceives the noise information within the mini-knowledge base, guiding the model to generate higher-quality diagnostic reports based on the similarity prompts.

Denoising Language Modelling +2

Complementary Fusion of Deep Network and Tree Model for ETA Prediction

no code implementations 1 Jul 2024 Yurui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang

Estimated time of arrival (ETA) is a very important factor in the transportation system.

Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation

no code implementations 27 Jun 2024 Yuying Li, Gaoyang Liu, Yang Yang, Chen Wang

Retrieval-Augmented Generation (RAG) is a state-of-the-art technique that enhances Large Language Models (LLMs) by retrieving relevant knowledge from an external, non-parametric database.

RAG Retrieval

Poisoned LangChain: Jailbreak LLMs by LangChain

no code implementations 26 Jun 2024 Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang

Building on this, we further design a novel method of indirect jailbreak attack, termed Poisoned-LangChain (PLC), which leverages a poisoned external knowledge base to interact with large language models, thereby causing the large models to generate malicious non-compliant dialogues. We tested this method on six different large language models across three major categories of jailbreak issues.

RAG Retrieval

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

1 code implementation 25 Jun 2024 Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions.

Diversity Math +1

The Championship-Winning Solution for the 5th CLVISION Challenge 2024

no code implementations 24 Jun 2024 Sishun Pan, Tingmin Li, Yang Yang

Our approach uses Winning Subnetworks to allocate independent parameter spaces for each task, addressing the catastrophic forgetting problem in class incremental learning, and employs three training strategies (supervised classification learning, unsupervised contrastive learning, and pseudo-label classification learning) to fully utilize the information in both labeled and unlabeled data, enhancing the classification performance of each subnetwork.

Classification Class Incremental Learning +3

Q-SNNs: Quantized Spiking Neural Networks

no code implementations 19 Jun 2024 Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence.

Quantization

The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

no code implementations 18 Jun 2024 Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

This misalignment hinders the zero-shot performance of VLM and the application of fine-tuning methods based on pseudo-labels.

Few-Shot Object Detection Language Modelling +4

ECGMamba: Towards Efficient ECG Classification with BiSSM

no code implementations 14 Jun 2024 Yupeng Qiang, Xunde Dong, Xiuling Liu, Yang Yang, Yihai Fang, Jianhong Dou

ECGMamba is based on the innovative Mamba-based block, which incorporates a range of time series modeling techniques to enhance performance while maintaining the efficiency of inference.

Classification ECG Classification

Neuro-Symbolic Temporal Point Processes

no code implementations 6 Jun 2024 Yang Yang, Chao Yang, Boyang Li, Yinghao Fu, Shuang Li

Our goal is to efficiently discover a compact set of temporal logic rules to explain irregular events of interest.

Point Processes

W-Net: A Facial Feature-Guided Face Super-Resolution Network

no code implementations 2 Jun 2024 Hao Liu, Yang Yang, Yunxia Liu

We use this parsing map as an attention prior, effectively integrating information from both the parsing map and LR images.

Super-Resolution

Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction

no code implementations 29 May 2024 Xuehao Gao, Yang Yang, Yang Wu, Shaoyi Du, Guo-Jun Qi

Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention.

Human motion prediction motion prediction

Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

1 code implementation 28 May 2024 Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme.

Computational Efficiency Computed Tomography (CT) +1

Data Augmentation for Text-based Person Retrieval Using Large Language Models

no code implementations 20 May 2024 Zheng Li, Lijia Si, Caili Guo, Yang Yang, Qiushi Cao

LLM-DA uses LLMs to rewrite the text in the current TPR dataset, achieving high-quality expansion of the dataset concisely and efficiently.

Data Augmentation Person Retrieval +4

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

1 code implementation 13 May 2024 Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li

In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families.

Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

no code implementations 11 May 2024 Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models.

Learning Theory

TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

no code implementations 11 May 2024 Xiangyu Wu, Qing-Yuan Jiang, Yang Yang, Yi-Feng Wu, Qing-Guo Chen, Jianfeng Lu

Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities.

Diversity Multi-Label Image Classification +1

Exploring Correlations of Self-Supervised Tasks for Graphs

1 code implementation 7 May 2024 Taoran Fang, Wei Zhou, Yifei Sun, Kaiqiao Han, Lvbin Ma, Yang Yang

Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations.

Multi-Task Learning Self-Supervised Learning

Modality Prompts for Arbitrary Modality Salient Object Detection

no code implementations 6 May 2024 Nianchang Huang, Yang Yang, Qiang Zhang, Jungong Han, Jin Huang

A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD, i.e., more diverse modality discrepancies caused by varying modality types that need to be processed, and dynamic fusion design caused by an uncertain number of modalities present in the inputs of the multimodal fusion strategy.

Object object-detection +2

Salient Object Detection From Arbitrary Modalities

1 code implementation 6 May 2024 Nianchang Huang, Yang Yang, Ruida Xi, Qiang Zhang, Jungong Han, Jin Huang

The most prominent characteristics of AM SOD are that the modality types and modality numbers will be arbitrary or dynamically changed.

Object object-detection +3

Low-light Object Detection

no code implementations 6 May 2024 Pengpeng Li, Haowei Gu, Yang Yang

In this competition, we employed a model fusion approach to achieve object detection results close to those of real images.

Clustering Object +2

Technical report on target classification in SAR track

no code implementations 3 May 2024 Haonan Xu, Han Yinan, Haotian Si, Yang Yang

This report proposes a robust method for classifying oceanic and atmospheric phenomena using synthetic aperture radar (SAR) imagery.

Classification Data Augmentation +1

Solution for Authenticity Identification of Typical Target Remote Sensing Images

no code implementations 3 May 2024 Yipeng Lin, Xinger Li, Yang Yang

In this paper, we propose a basic RGB single-mode model based on weakly supervised training under pseudo labels, which performs high-precision authenticity identification under multi-scene typical target remote sensing images.

Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

1 code implementation 30 Apr 2024 Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access.

Denoising Diversity

BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

no code implementations 30 Apr 2024 Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes.

Irregular Time Series Time Series

Policy Gradient-Driven Noise Mask

1 code implementation 29 Apr 2024 Mehmet Can Yavuz, Yang Yang

As a reinforcement learning algorithm, our approach employs a dual-component system comprising a very light-weight policy network that learns to sample conditional noise using a differentiable beta distribution and a classifier network.

Image Augmentation

Retrieval-Oriented Knowledge for Click-Through Rate Prediction

no code implementations 28 Apr 2024 Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved & aggregated representations in a decomposition-reconstruction paradigm.

Click-Through Rate Prediction Contrastive Learning +2

Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

no code implementations 27 Apr 2024 Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang

Motivated by this, in this article, we propose collaborative edge training, a novel training mechanism that orchestrates a group of trusted edge devices as a resource pool for expedited, sustainable big AI model training at the edge.

Edge-computing Scheduling

Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

no code implementations 20 Apr 2024 Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong

Our model assumptions are verified by the real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.

AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation

1 code implementation 20 Apr 2024 Yang Yang, Shunyi Zheng

The advancement of deep learning has driven notable progress in remote sensing semantic segmentation.

Computational Efficiency Image Segmentation +1

AccidentBlip2: Accident Detection With Multi-View MotionBlip2

1 code implementation 18 Apr 2024 Yihua Shao, Hongyi Cai, Xinwei Long, Weiyi Lang, Zhe Wang, Haoran Wu, Yan Wang, Jiayi Yin, Yang Yang, Yisheng Lv, Zhen Lei

The inference capabilities of neural networks using cameras limit the accuracy of accident detection in complex transportation systems.

Language Modelling Large Language Model +2

VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication

1 code implementation 15 Apr 2024 Xun Yuan, Yang Yang, Prosanta Gope, Aryan Pasikhani, Biplab Sikdar

Nevertheless, in some scenarios, it has been found that the attributes needed to train an AI model belong to different parties, and they cannot share the raw data for synthetic data publication due to privacy regulations.

Generative Adversarial Network Vertical Federated Learning

MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes

no code implementations 14 Apr 2024 Xin-Chun Li, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, Yang Yang, De-Chuan Zhan

For better model personalization, we point out that the hard-won personalized models are not well exploited and propose "inherited private model" to store the personalization experience.

Federated Learning

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

no code implementations 12 Apr 2024 Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang

According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS).

JobFormer: Skill-Aware Job Recommendation with Semantic-Enhanced Transformer

no code implementations 5 Apr 2024 Zhihao Guan, Jia-Qi Yang, Yang Yang, HengShu Zhu, Wenjie Li, Hui Xiong

Moreover, we adopt a two-stage learning strategy for skill-aware recommendation, in which we utilize the skill distribution to guide JD representation learning in the recall stage, and then combine the user profiles for final prediction in the ranking stage.

Click-Through Rate Prediction Representation Learning

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

1 code implementation CVPR 2024 Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters.

Image Classification

High-Resolution Image Translation Model Based on Grayscale Redefinition

no code implementations 26 Mar 2024 Xixian Wu, Dian Chao, Yang Yang

Image-to-image translation is a technique that focuses on transferring images from one domain to another while maintaining the essential content representations.

Image-to-Image Translation Translation

Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

no code implementations 26 Mar 2024 Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji

To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera.

Motion Detection Point Tracking +2

Semi-Supervised Image Captioning Considering Wasserstein Graph Matching

no code implementations 26 Mar 2024 Yang Yang

Image captioning can automatically generate captions for the given images, and the key challenge is to learn a mapping function from visual features to natural language features.

Data Augmentation Graph Matching +2

Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI

no code implementations 26 Mar 2024 Shengdong Xu, Zhouyang Chi, Yang Yang

To address this issue, we propose a simple yet effective approach called single-multi modal with Emotion-Cultural specific prompt (ECSP), which uses single-modal messages to enhance the performance of multimodal models and a well-designed prompt to reduce the problem of cultural differences.

Diversity XLM-R

NJUST-KMG at TRAC-2024 Tasks 1 and 2: Offline Harm Potential Identification

no code implementations 26 Mar 2024 Jingyuan Wang, Shengdong Xu, Yang Yang

This report provides a detailed description of the method that we proposed for the TRAC-2024 Offline Harm Potential Identification task, which comprises two sub-tasks.

Contrastive Learning

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

no code implementations 25 Mar 2024 Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang

We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs.

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

1 code implementation 20 Mar 2024 Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng

However, in the real-world where test ground truths are not provided, it is non-trivial to find out whether bounding boxes are accurate, thus preventing us from assessing the detector generalization ability.

object-detection Object Detection +1

Positioning Using Wireless Networks: Applications, Recent Progress and Future Challenges

no code implementations 18 Mar 2024 Yang Yang, Mingzhe Chen, Yufei Blankenship, Jemin Lee, Zabih Ghassemlooy, Julian Cheng, Shiwen Mao

The purpose of this paper is to provide a comprehensive overview of existing works and new trends in the field of positioning techniques from both the academic and industrial perspectives.

Path-GPTOmic: A Balanced Multi-modal Learning Framework for Survival Outcome Prediction

no code implementations 18 Mar 2024 Hongxiao Wang, Yang Yang, Zhuo Zhao, Pengfei Gu, Nishchal Sapkota, Danny Z. Chen

For predicting cancer survival outcomes, standard approaches in clinical research are often based on two main modalities: pathology images for observing cell morphology features, and genomic (e.g., bulk RNA-seq) for quantifying gene expressions.

Survival Prediction

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

1 code implementation 18 Mar 2024 Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts.

An upper bound of the mutation probability in the genetic algorithm for general 0-1 knapsack problem

no code implementations 17 Mar 2024 Yang Yang

As an important part of genetic algorithms (GAs), mutation operators are widely used in evolutionary algorithms to solve NP-hard problems because they can increase the diversity of individuals in the population.

Diversity Evolutionary Algorithms +1

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

no code implementations 15 Mar 2024 Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.

Depth Estimation Semantic Segmentation +1

SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

no code implementations 9 Mar 2024 Ming Zheng, Yang Yang, Zhi-Hang Zhao, Shan-Chao Gan, Yang Chen, Si-Kai Ni, Yang Lu

Among current oversampling methods based on generative networks, GAN-based methods can capture the true distribution of the data but suffer from mode collapse and training instability; in oversampling methods based on denoising diffusion probabilistic models, the U-Net used for the reverse diffusion process is not applicable to tabular data, and although an MLP can replace the U-Net, its simple structure leads to poor noise removal.

Denoising

Towards Efficient and Effective Unlearning of Large Language Models for Recommendation

1 code implementation 6 Mar 2024 Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu

However, in order to protect user privacy and optimize utility, it is also crucial for LLMRec to intentionally forget specific user data, which is generally referred to as recommendation unlearning.

World Knowledge

Event-Driven Learning for Spiking Neural Networks

no code implementations 1 Mar 2024 Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li

Specifically, we introduce two novel event-driven learning methods: the spike-timing-dependent event-driven (STD-ED) and membrane-potential-dependent event-driven (MPD-ED) algorithms.

Can GNN be Good Adapter for LLMs?

2 code implementations 20 Feb 2024 Xuanwen Huang, Kaiqiao Han, Yang Yang, Dezheng Bao, Quanjin Tao, Ziwei Chai, Qi Zhu

In terms of efficiency, the GNN adapter introduces only a few trainable parameters and can be trained with low computation costs.

Graph Neural Network Node Classification +3

Brant-2: Foundation Model for Brain Signals

no code implementations 15 Feb 2024 Zhizhang Yuan, Daoze Zhang, Junru Chen, Gefei Gu, Yang Yang

Foundational models benefit from pre-training on large amounts of unlabeled data and enable strong performance in a wide variety of applications with a small amount of labeled data.

Graph-Skeleton: ~1% Nodes are Sufficient to Represent Billion-Scale Graph

1 code implementation 14 Feb 2024 Linfeng Cao, Haoran Deng, Yang Yang, Chunping Wang, Lei Chen

In this paper, we argue that properly fetching and condensing the background nodes from massive web graph data might be a more economical shortcut to tackle the obstacles fundamentally.

Feature Correlation Graph Mining +1

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation 6 Feb 2024 Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML Language Modelling

Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

no code implementations 3 Feb 2024 Yiling Kuang, Chao Yang, Yang Yang, Shuang Li

In the M-step, we update both the rule set and model parameters to enhance the likelihood function's lower bound.

Point Processes

One Graph Model for Cross-domain Dynamic Link Prediction

no code implementations 3 Feb 2024 Xuanwen Huang, Wei Chow, Yang Wang, Ziwei Chai, Chunping Wang, Lei Chen, Yang Yang

Extensive experiments on eight untrained graphs demonstrate that DyExpert achieves state-of-the-art performance in cross-domain link prediction.

Dynamic Link Prediction

Are Synthetic Time-series Data Really not as Good as Real Data?

no code implementations 1 Feb 2024 Fanzhe Fu, Junru Chen, Jing Zhang, Carl Yang, Lvbin Ma, Yang Yang

Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.

Representation Learning Time Series

Binaural Angular Separation Network

no code implementations 16 Jan 2024 Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones.

Robust Semi-Supervised Learning for Self-learning Open-World Classes

1 code implementation 15 Jan 2024 Wenjuan Xi, Xin Song, Weili Guo, Yang Yang

Existing semi-supervised learning (SSL) methods assume that labeled and unlabeled data share the same class space.

Open-World Semi-Supervised Learning Self-Learning

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

1 code implementation 10 Jan 2024 Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks.

Benchmarking

StreamVC: Real-Time Low-Latency Voice Conversion

no code implementations 5 Jan 2024 Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech.

Speech Synthesis Voice Conversion

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation

1 code implementation 4 Jan 2024 Xuehao Gao, Yang Yang, Zhenyu Xie, Shaoyi Du, Zhongqian Sun, Yang Wu

The whole text-driven human motion synthesis problem is then divided into multiple abstraction levels and solved with a multi-stage generation framework with a cascaded latent diffusion model: an initial generator first generates the coarsest human motion guess from a given text description; then, a series of successive generators gradually enrich the motion details based on the textual description and the previous synthesized results.

Motion Synthesis

OFDM-Based Digital Semantic Communication with Importance Awareness

no code implementations 4 Jan 2024 Chuanhong Liu, Caili Guo, Yang Yang, Wanli Ni, Tony Q. S. Quek

Based on semantic importance, we formulate a sub-carrier and bit allocation problem to maximize communication performance.

Semantic Communication

Ensemble Diversity Facilitates Adversarial Transferability

1 code implementation CVPR 2024 Bowen Tang, Zheng Wang, Yi Bin, Qi Dou, Yang Yang, Heng Tao Shen

With the advent of ensemble-based attacks, the transferability of generated adversarial examples is elevated by a noticeable margin, despite many methods only employing superficial integration yet ignoring the diversity between ensemble models.

Diversity reinforcement-learning

Fine-tuning Graph Neural Networks by Preserving Graph Generative Patterns

1 code implementation 21 Dec 2023 Yifei Sun, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, Lei Chen

In this paper, we identify the fundamental cause of structural divergence as the discrepancy of generative patterns between the pre-training and downstream graphs.

Graph Mining Transfer Learning

Towards Fair Graph Federated Learning via Incentive Mechanisms

1 code implementation 20 Dec 2023 Chenglu Pan, Jiarong Xu, Yue Yu, Ziqi Yang, Qingbiao Wu, Chunping Wang, Lei Chen, Yang Yang

Extensive experiments show that our model achieves the best trade-off between accuracy and the fairness of model gradient, as well as superior payoff fairness.

Fairness Federated Learning +1

Generalized Damping Torque Analysis of Ultra-Low Frequency Oscillation in the Jerk Space

no code implementations7 Dec 2023 Yichen Zhou, Yang Yang, Tao Zhou, Yonggang Li

A multi-information variable is constructed to transform the system into a new state space, where it is found that the jerk dynamics of the turbine-generator cascaded system is a second-order differential equation.

A WINNER+ Based 3-D Non-Stationary Wideband MIMO Channel Model

no code implementations1 Dec 2023 Ji Bian, Jian Sun, Cheng-Xiang Wang, Rui Feng, Jie Huang, Yang Yang, Minggao Zhang

In this paper, a three-dimensional (3-D) non-stationary wideband multiple-input multiple-output (MIMO) channel model based on the WINNER+ channel model is proposed.

KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model

1 code implementation20 Nov 2023 Lei Geng, Xu Yan, Ziqiang Cao, Juntao Li, Wenjie Li, Sujian Li, Xinjie Zhou, Yang Yang, Jun Zhang

We build a biomedical multilingual corpus by incorporating knowledge alignments at three granularities (entity, fact, and passage levels) into monolingual corpora.

Relation XLM-R

Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks

1 code implementation NeurIPS 2023 Jiarong Xu, Renhong Huang, Xin Jiang, Yuxuan Cao, Carl Yang, Chunping Wang, Yang Yang

The proposed pre-training pipeline is called the data-active graph pre-training (APT) framework, and is composed of a graph selector and a pre-training model.

Technical Note: Feasibility of translating 3.0T-trained Deep-Learning Segmentation Models Out-of-the-Box on Low-Field MRI 0.55T Knee-MRI of Healthy Controls

no code implementations26 Oct 2023 Rupsa Bhattacharjee, Zehra Akkaya, Johanna Luitjens, Pan Su, Yang Yang, Valentina Pedoia, Sharmila Majumdar

The current study assesses the performance of standard in-practice bone and cartilage segmentation algorithms at 0.55T, both qualitatively and quantitatively, in terms of comparing segmentation performance, areas of improvement, and compartment-wise cartilage thickness values between 0.55T and 3.0T.

Image Segmentation Segmentation +1

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation

no code implementations24 Oct 2023 Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang

Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.

Autonomous Driving Scene Understanding

Non-Autoregressive Sentence Ordering

1 code implementation19 Oct 2023 Yi Bin, Wenhao Shi, Bin Ji, Jipeng Zhang, Yujuan Ding, Yang Yang

Existing sentence ordering approaches generally employ encoder-decoder frameworks with the pointer net to recover the coherence by recurrently predicting each sentence step-by-step.

Decoder Sentence +1

Solving Math Word Problems with Reexamination

1 code implementation14 Oct 2023 Yi Bin, Wenhao Shi, Yujuan Ding, Yang Yang, See-Kiong Ng

Math word problem (MWP) solving aims to understand the descriptive math problem and calculate the result, for which previous efforts are mostly devoted to upgrading different technical modules.

Descriptive Math

Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic Reasoning Task 2023

no code implementations10 Oct 2023 Xiangyu Wu, Yang Yang, Shengdong Xu, Yifeng Wu, QingGuo Chen, Jianfeng Lu

At the data level, inspired by the challenge paper, we categorized the whole questions into eight types and utilized the llama-2-chat model to directly generate the type for each question in a zero-shot manner.

Decoder object-detection +4

GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

1 code implementation9 Oct 2023 Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, Yang Yang

This synergy equips LLMs with the ability to proficiently interpret and reason on graph data, harnessing the superior expressive power of graph learning models.

Graph Learning Language Modelling +1

Towards Scalable Wireless Federated Learning: Challenges and Solutions

no code implementations8 Oct 2023 Yong Zhou, Yuanming Shi, Haibo Zhou, Jingjing Wang, Liqun Fu, Yang Yang

The explosive growth of smart devices (e.g., mobile phones, vehicles, drones) with sensing, communication, and computation capabilities gives rise to an unprecedented amount of data.

Federated Learning Privacy Preserving

Twin Graph-based Anomaly Detection via Attentive Multi-Modal Learning for Microservice System

1 code implementation7 Oct 2023 Jun Huang, Yang Yang, Hang Yu, Jianguo Li, Xiao Zheng

The MST graph provides a virtual representation of the status and scheduling relationships among service instances of a real-world microservice system.

Anomaly Detection Scheduling

Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design

1 code implementation6 Oct 2023 Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Yang Yang, Lei LI

In this paper, we propose NAEPro, a model to jointly design Protein sequence and structure based on automatically detected functional sites.

CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis

no code implementations6 Oct 2023 Xiaoxiao Sun, Xingjian Leng, Zijian Wang, Yang Yang, Zi Huang, Liang Zheng

Analyzing model performance in various unseen environments is a critical research problem in the machine learning community.

Benchmarking Domain Generalization +1

Joint Design of Protein Sequence and Structure based on Motifs

no code implementations4 Oct 2023 Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei LI

Designing novel proteins with desired functions is crucial in biology and chemistry.

Decoder

Learning to Generate Lumped Hydrological Models

1 code implementation18 Sep 2023 Yang Yang, Ting Fong May Chui

Overall, this study demonstrates that the hydrological behavior of a catchment can be effectively described using a small number of latent variables, and that well-fitting hydrologic model functions can be reconstructed from these variables.

How to Generate Popular Post Headlines on Social Media?

no code implementations18 Sep 2023 Zhouxiang Fang, Min Yu, Zhendong Fu, Boning Zhang, Xuanwen Huang, Xiaoqi Tang, Yang Yang

Observation results demonstrate that trends and personal styles are widespread in headlines on social media and contribute significantly to posts' popularity.

Headline Generation

Cross-Utterance Conditioned VAE for Speech Generation

no code implementations8 Sep 2023 Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Speech Synthesis

SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search

no code implementations15 Aug 2023 Wen Zan, Yaopeng Han, Xiaotian Jiang, Yao Xiao, Yang Yang, Dayao Chen, Sheng Chen

At pretraining stage, we propose an effective pretraining method that employs both query and multiple fields of document as inputs, including an effective information compression method for lengthy fields.

Language Modelling

Routing Recovery for UAV Networks with Deliberate Attacks: A Reinforcement Learning based Approach

no code implementations14 Aug 2023 Sijie He, Ziye Jia, Chao Dong, Wei Wang, Yilu Cao, Yang Yang, Qihui Wu

The unmanned aerial vehicle (UAV) network has become popular in recent years due to its various applications.

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

1 code implementation8 Aug 2023 Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30k, respectively.

Cross-Modal Retrieval Image Retrieval +1

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

1 code implementation8 Aug 2023 Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively.

Image-text matching Representation Learning +1

A Novel DDPM-based Ensemble Approach for Energy Theft Detection in Smart Grids

no code implementations30 Jul 2023 Xun Yuan, Yang Yang, Asif Iqbal, Prosanta Gope, Biplab Sikdar

To address these challenges, several unsupervised ETD methods have been proposed, focusing on learning the normal patterns from honest users, specifically the reconstruction of input.

Denoising

MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals

no code implementations15 Jun 2023 Donghong Cai, Junru Chen, Yang Yang, Teng Liu, Yafeng Li

Intuitively, brain signals, generated by the firing of neurons, are transmitted among different connecting structures in the human brain.

EEG Seizure Detection +1

Accelerating Dynamic Network Embedding with Billions of Parameter Updates to Milliseconds

1 code implementation15 Jun 2023 Haoran Deng, Yang Yang, Jiahe Li, Haoyang Cai, ShiLiang Pu, Weihao Jiang

Network embedding, a graph representation learning method illustrating network topology by mapping nodes into lower-dimension vectors, has difficulty accommodating the ever-changing dynamic graphs in practice.

Graph Reconstruction Graph Representation Learning +3

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

1 code implementation11 Jun 2023 Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang

Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices.

General Knowledge Knowledge Distillation +1

Probabilistic Multi-Dimensional Classification

1 code implementation10 Jun 2023 Vu-Linh Nguyen, Yang Yang, Cassio de Campos

We propose a formal framework for probabilistic MDC in which learning an optimal multi-dimensional classifier can be decomposed, without loss of generality, into learning a set of (smaller) single-variable multi-class probabilistic classifiers and a directed acyclic graph.

Classification

COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

1 code implementation8 Jun 2023 Jia-Qi Yang, Chenglei Dai, Dan Ou, Dongshuai Li, Ju Huang, De-Chuan Zhan, Xiaoyi Zeng, Yang Yang

Even if the performance of cross-modal prediction tasks is excellent, it is challenging to provide significant information gain for the downstream models.

Attribute Click-Through Rate Prediction +1

A Novel Correlation-optimized Deep Learning Method for Wind Speed Forecast

1 code implementation3 Jun 2023 Yang Yang, Jin Lang, Jian Wu, Yanyan Zhang, Xiang Zhao

Finally, the effectiveness of the proposed method is verified by three wind prediction cases from a wind farm in Liaoning, China.

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning

1 code implementation1 Jun 2023 Shengqin Jiang, Yaoyu Fang, Haokui Zhang, Qingshan Liu, Yuankai Qi, Yang Yang, Peng Wang

Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data.

Incremental Learning Knowledge Distillation +1

Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism

no code implementations31 May 2023 Haoxuan Xu, Songning Lai, Xianyang Li, Yang Yang

To address these issues, we propose a cross-domain Car Detection Model with integrated convolutional block Attention mechanism (CDMA) that we apply to car recognition for autonomous driving and other areas.

Autonomous Driving object-detection +1

PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

no code implementations30 May 2023 Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan

While transformer-based pre-trained language models (PLMs) have dominated a number of NLP applications, these models are heavy to deploy and expensive to use.

Quantization

Deep Neural Networks in Video Human Action Recognition: A Review

no code implementations25 May 2023 Zihan Wang, Yang Yang, Zhi Liu, Yifan Zheng

This review covers multiple recently proposed research works and compares the advantages and disadvantages of the derived deep learning frameworks, rather than those of machine learning frameworks.

Action Recognition Optical Flow Estimation +1

Breaking the Curse of Quality Saturation with User-Centric Ranking

no code implementations24 May 2023 Zhuokai Zhao, Yang Yang, Wenyu Wang, Chihuang Liu, Yu Shi, Wenjie Hu, Haotian Zhang, Shuang Yang

A key puzzle in search, ads, and recommendation is that the ranking model can only utilize a small portion of the vastly available user interaction data.

Faster Video Moment Retrieval with Point-Level Supervision

no code implementations23 May 2023 Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen

Existing VMR methods suffer from two defects: (1) massive expensive temporal annotations are required to obtain satisfying performance; (2) complicated cross-modal interaction modules are deployed, which lead to high computational cost and low efficiency for the retrieval process.

Moment Retrieval Natural Language Queries +1

Task-agnostic Distillation of Encoder-Decoder Language Models

no code implementations21 May 2023 Chen Zhang, Yang Yang, Jingang Wang, Dawei Song

Finetuning pretrained language models (LMs) has enabled appealing performance on a diverse array of tasks.

Abstractive Text Summarization Decoder

Lifting the Curse of Capacity Gap in Distilling Language Models

1 code implementation20 May 2023 Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song

However, when the capacity gap between the teacher and the student is large, a curse of capacity gap appears, invoking a deficiency in distilling LMs.

Knowledge Distillation

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

1 code implementation8 May 2023 Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

For evaluating the possible expression variants, we design a path-based metric to evaluate the partial accuracy of expressions of a unified tree.

Math valid

MrTF: Model Refinery for Transductive Federated Learning

1 code implementation7 May 2023 Xin-Chun Li, Yang Yang, De-Chuan Zhan

We propose a novel learning paradigm named transductive federated learning (TFL) to simultaneously consider the structural information of the to-be-inferred data.

Federated Learning

A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images

no code implementations30 Apr 2023 Zhe Chen, Yang Yang, Anne Bettens, Youngho Eun, Xiaofeng Wu

In our framework, by making the best use of the hardware parameters of the sensor that captures real-world space images, we first develop a high-fidelity RSO simulator that can generate various realistic space images.

Benchmarking object-detection +1

Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

1 code implementation23 Apr 2023 Bo Li, Gexiang Fang, Yang Yang, Quansen Wang, Wei Ye, Wen Zhao, Shikun Zhang

The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately.

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

no code implementations19 Apr 2023 Yang Yang, Weijie Ma, Hao Chen, Linlin Ou, Xinyi Yu

The combination of LiDAR and camera modalities is proven to be necessary and typical for 3D object detection according to recent studies.

3D Object Detection Depth Estimation +1

CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval

no code implementations15 Apr 2023 Yang Yang, Zhongtian Fu, Xiangyu Wu, Wenjie Li

To address this challenge, in this paper, we experimentally observe that the vision-language divergence may cause the existence of strong and weak modalities, and that hard cross-modal consistency cannot guarantee that strong modal instances' relationships are unaffected by the weak modality, so these relationships are perturbed even when consistent representations are learned. To this end, we propose a novel and direct Coordinated Vision-Language Retrieval method (dubbed CoVLR), which aims to study and alleviate the desynchrony problem between the cross-modal alignment and single-modal cluster-preserving tasks.

cross-modal alignment Cross-Modal Retrieval +2

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

1 code implementation CVPR 2023 Yuhui Wu, Chen Pan, Guoqing Wang, Yang Yang, Jiwei Wei, Chongyi Li, Heng Tao Shen

To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.

Low-Light Image Enhancement Semantic Segmentation

$\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus

no code implementations6 Apr 2023 Hadi AlZayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements.

Deblurring

When to Pre-Train Graph Neural Networks? From Data Generation Perspective!

1 code implementation29 Mar 2023 Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang

All convex combinations of graphon bases give rise to a generator space, from which graphs generated form the solution space for those downstream data that can benefit from pre-training.

Learning a Deep Color Difference Metric for Photographic Images

1 code implementation CVPR 2023 Haoyu Chen, Zhihua Wang, Yang Yang, Qilin Sun, Kede Ma

Most well-established and widely used color difference (CD) metrics are handcrafted and subject-calibrated against uniformly colored patches, which do not generalize well to photographic images characterized by natural scene complexities.

ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding

1 code implementation23 Mar 2023 Ziyang Lu, Yunqiang Pei, Guoqing Wang, Yang Yang, Zheng Wang, Heng Tao Shen

Despite their effectiveness, existing methods suffer from the difficulty of low recognition accuracy in cases of multiple adjacent objects with similar appearances. To address this issue, this work intuitively introduces the human-robot interaction as a cue to facilitate the development of 3D visual grounding.

3D visual grounding

Explainable Semantic Communication for Text Tasks

no code implementations22 Mar 2023 Chuanhong Liu, Caili Guo, Yang Yang, Wanli Ni, Yanquan Zhou, Lei LI, Tony Q. S. Quek

In this paper, we propose a triplet-based explainable semantic communication (TESC) scheme for representing text semantics efficiently.