Search Results for author: Yanfeng Wang

Found 172 papers, 98 papers with code

VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation

no code implementations • 5 Apr 2025 • Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang

Speech large language models (LLMs) have emerged as a prominent research focus in speech processing.

COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking

1 code implementation • 2 Apr 2025 • Chunhui Zhang, Li Liu, Jialin Gao, Xin Sun, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang

In this work, we propose COST, a contrastive one-stage transformer fusion framework for VL tracking, aiming to learn semantically consistent and unified VL representations.

cross-modal alignment Object +1

RARE: Retrieval-Augmented Reasoning Modeling

1 code implementation • 30 Mar 2025 • Zhengren Wang, Jiayang Yu, Dongsheng Ma, Zhe Chen, Yu Wang, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Weinan E, Linpeng Tang, Wentao Zhang

Domain-specific intelligence demands specialized knowledge and sophisticated reasoning for problem-solving, posing significant challenges for large language models (LLMs) that struggle with knowledge hallucination and inadequate reasoning capabilities under constrained parameter budgets.

Hallucination Memorization +1

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

no code implementations • 24 Mar 2025 • Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang

Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD) trade-off during training, leading to performance degradation and increased model redundancy.

3DGS Quantization

ChatBEV: A Visual Language Model that Understands BEV Maps

no code implementations • 18 Mar 2025 • Qingyao Xu, Siheng Chen, Guang Chen, Yanfeng Wang, Ya Zhang

Traffic scene understanding is essential for intelligent transportation systems and autonomous driving, ensuring safe and efficient vehicle operation.

Autonomous Driving Language Modeling +4

FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

1 code implementation • 7 Mar 2025 • Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang

FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments.

Benchmarking Federated Learning

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases

no code implementations • 6 Mar 2025 • Pengcheng Qiu, Chaoyi Wu, Shuyu Liu, Weike Zhao, Zhuoxia Chen, Hongfei Gu, Chuanjin Peng, Ya Zhang, Yanfeng Wang, Weidi Xie

Notably, open-source models like DeepSeek-R1 are narrowing the gap with proprietary systems, highlighting their potential to drive accessible and equitable advancements in healthcare.

Benchmarking Diagnostic

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models

no code implementations • 5 Mar 2025 • YiQiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang

The reliability of large language models remains a critical challenge, particularly due to their susceptibility to hallucinations and factual inaccuracies during text generation.

Hallucination Text Generation

M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging

no code implementations • 27 Feb 2025 • Jinghao Feng, Qiaoyu Zheng, Chaoyi Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging.

Auto Debugging

Contrast-Unity for Partially-Supervised Temporal Sentence Grounding

no code implementations • 18 Feb 2025 • Haicheng Wang, Chen Ju, Weixiong Lin, Chaofan Ma, Shuai Xiao, Ya Zhang, Yanfeng Wang

Temporal sentence grounding aims to detect event timestamps described by the natural language query from given untrimmed videos.

Contrastive Learning Denoising +3

FedMobileAgent: Training Mobile Agents Using Decentralized Self-Sourced Data from Diverse Users

no code implementations • 5 Feb 2025 • Wenhao Wang, Zijie Yu, William Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang

To tackle these challenges, we propose FedMobileAgent, a collaborative framework that trains mobile agents using self-sourced data from diverse users.

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages

1 code implementation • 24 Jan 2025 • Jia Yu, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He

This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.

Diversity

MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking

1 code implementation • 21 Jan 2025 • Shuyang Jiang, Yusheng Liao, Zhe Chen, Ya Zhang, Yanfeng Wang, Yu Wang

In this work, we present a deployable, small-scale medical language model, MedS$^3$, designed for long-chain reasoning in clinical tasks using a self-evolution paradigm.

Multiple-choice

Active Sampling for Node Attribute Completion on Graphs

no code implementations • 14 Jan 2025 • Benyuan Liu, Xu Chen, Yanfeng Wang, Ya Zhang, Zhi Cao, Ivor Tsang

Node attribute, a type of crucial information for graph analysis, may be partially or completely missing for certain nodes in real world applications.

Attribute Graph Learning

Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications

no code implementations • 5 Jan 2025 • Zhe Chen, Yusheng Liao, Shuyang Jiang, Pingjie Wang, YiQiu Guo, Yanfeng Wang, Yu Wang

Large language models (LLMs) hold promise for addressing healthcare challenges but often generate hallucinations due to limited integration of medical knowledge.

RAG Retrieval

A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis

2 code implementations • 17 Dec 2024 • Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ruifen Wang, LiFeng Wang, Xin Sun, Kun Sun, Ya Zhang, Yanfeng Wang, Weidi Xie

To derive more nuanced image and text representations, we propose a novel knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups instead of unstructured image-text pairs.

Diagnostic Specificity +1

VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

no code implementations • 16 Dec 2024 • Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang

In this paper, we propose VRVVC, a novel end-to-end joint optimization variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance.

NeRF Quantization +1

Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal

no code implementations • 15 Dec 2024 • Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang

Multimodal large language models (MLLMs) excel at multimodal perception and understanding, yet their tendency to generate hallucinated or inaccurate responses undermines their trustworthiness.

Can Modern LLMs Act as Agent Cores in Radiology Environments?

1 code implementation • 12 Dec 2024 • Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains.

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

1 code implementation • 4 Dec 2024 • HaoNing Wu, Ziheng Zhao, Ya Zhang, Weidi Xie, Yanfeng Wang

Medical image segmentation has recently demonstrated impressive progress with deep neural networks, yet the heterogeneous modalities and scarcity of mask annotations limit the development of segmentation models on unannotated modalities.

Image Generation Image Segmentation +4

Towards Universal Soccer Video Understanding

1 code implementation • 2 Dec 2024 • Jiayuan Rao, HaoNing Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie

As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world.

Action Classification Sports Understanding +1

MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking

1 code implementation • 24 Nov 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

Night unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight-optimized methods demonstrating suboptimal performance in low-light conditions, limiting the utility of UAV applications.

Image Enhancement Mamba +1

Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

1 code implementation • 2 Nov 2024 • Ziqing Fan, Shengchao Hu, YuHang Zhou, Li Shen, Ya Zhang, Yanfeng Wang, DaCheng Tao

The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction.

Meta-Learning

ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents

3 code implementations • 23 Oct 2024 • Yusheng Liao, Shuyang Jiang, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have shown promising potential in the medical domain, assisting with tasks like clinical note generation and patient communication.

Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation

1 code implementation • 18 Oct 2024 • Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen

Post-training is essential for enabling large language models (LLMs) to follow human instructions.

Data Quality Control in Federated Instruction-tuning of Large Language Models

no code implementations • 15 Oct 2024 • Yaxin Du, Rui Ye, Fengting Yuchi, Wanru Zhao, Jingjing Qu, Yanfeng Wang, Siheng Chen

To address this gap, we propose a new framework of federated instruction tuning of LLMs with data quality control (FedDQC), which measures data quality to facilitate the subsequent filtering and hierarchical training processes.

Federated Learning Privacy Preserving

CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios

1 code implementation • 4 Oct 2024 • Zetian Ouyang, Yishuai Qiu, LinLin Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He

With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly.

Clinical Knowledge Diagnostic

LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models

2 code implementations • 29 Sep 2024 • Haolin Li, YuHang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya Zhang, Yanfeng Wang

To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.

3D Medical Imaging Segmentation Medical Image Classification

Towards Underwater Camouflaged Object Tracking: Benchmark and Baselines

2 code implementations • 25 Sep 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods and SAM- and SAM2-based trackers in challenging underwater environments.

Object Video Segmentation +2

Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models

no code implementations • 11 Sep 2024 • Rui Ye, Rui Ge, Yuchi Fengting, Jingyi Chai, Yanfeng Wang, Siheng Chen

Federated instruction tuning enables multiple clients to collaboratively fine-tune a shared large language model (LLM) that can follow humans' instructions without directly sharing raw data.

Language Modelling Large Language Model +1

Self-supervised Anomaly Detection Pretraining Enhances Long-tail ECG Diagnosis

1 code implementation • 30 Aug 2024 • Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

Current computer-aided ECG diagnostic systems struggle with the underdetection of rare but critical cardiac anomalies due to the imbalanced nature of ECG datasets.

Diagnostic Self-Supervised Anomaly Detection +2

Towards Evaluating and Building Versatile Large Language Models for Medicine

1 code implementation • 22 Aug 2024 • Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie

To promote further advancements in the application of LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard for MedS-Bench, whose test set we plan to update regularly to track progress and enhance the adaptation of general LLMs to the medical domain.

Multiple-choice named-entity-recognition +2

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

1 code implementation • 20 Aug 2024 • HaoNing Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.

Denoising Scheduling +1

Decoding Linguistic Representations of Human Brain

no code implementations • 30 Jul 2024 • Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang

Language, as an information medium created by advanced organisms, has always been a concern of neuroscience regarding how it is represented in the brain.

Brain Computer Interface

AutoRG-Brain: Grounded Report Generation for Brain MRI

no code implementations • 23 Jul 2024 • Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

To address these challenges, we initiate a series of work on grounded Automatic Report Generation (AutoRG), starting from the brain MRI interpretation system, which supports the delineation of brain structures, the localization of anomalies, and the generation of well-organized findings.

Anomaly Localization Anomaly Segmentation

Reconstruct the Pruned Model without Any Retraining

no code implementations • 18 Jul 2024 • Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost.

Common Sense Reasoning model

HPC: Hierarchical Progressive Coding Framework for Volumetric Video

no code implementations • 12 Jul 2024 • Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission.

NeRF

Reprogramming Distillation for Medical Foundation Models

1 code implementation • 9 Jul 2024 • YuHang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang

However, due to the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), the real-world computation and speed constraints, it might not be straightforward to apply medical foundation models in the downstream scenarios.

Knowledge Distillation parameter-efficient fine-tuning +1

MatchTime: Towards Automatic Soccer Game Commentary Generation

1 code implementation • 26 Jun 2024 • Jiayuan Rao, HaoNing Wu, Chang Liu, Yanfeng Wang, Weidi Xie

Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience.

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

3 code implementations • 25 Jun 2024 • Yusheng Liao, Shuyang Jiang, Zhe Chen, Yanfeng Wang, Yu Wang

Based on this two-stage paradigm, we proposed a Medical LLM through decoupling Clinical Alignment and Knowledge Aggregation (MedCare), which is designed to achieve state-of-the-art (SOTA) performance on over 20 medical tasks, as well as SOTA results on specific medical alignment tasks.

Diversity Natural Language Understanding

RaTEScore: A Metric for Radiology Report Generation

3 code implementations • 24 Jun 2024 • Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

This paper introduces a novel, entity-aware metric, termed as Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models.

Diagnostic Entity Embeddings +4

Self-Localized Collaborative Perception

no code implementations • 18 Jun 2024 • Zhenyang Ni, Zixing Lei, Yifan Lu, Dingju Wang, Chen Feng, Yanfeng Wang, Siheng Chen

However, existing collaborative perception systems heavily rely on precise localization systems to establish a consistent spatial coordinate system between agents.

Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

1 code implementation • 17 Jun 2024 • Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis.

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

no code implementations • 15 Jun 2024 • Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen

Federated learning (FL) enables multiple parties to collaboratively fine-tune a large language model (LLM) without the need for direct data sharing.

Federated Learning Language Modelling +2

Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

1 code implementation • 14 Jun 2024 • YuHang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Training a unified model to take multiple targets into account is a trend towards artificial general intelligence.

Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

1 code implementation • 13 Jun 2024 • Chaoqin Huang, Haoyan Guan, Aofan Jiang, Ya Zhang, Michael Spratling, Xinchao Wang, Yanfeng Wang

At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features.

Anomaly Detection Representation Learning

Diversified Batch Selection for Training Acceleration

1 code implementation • 7 Jun 2024 • Feng Hong, Yueming Lyu, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Yanfeng Wang

The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption.

Diversity

FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

3 code implementations • 7 Jun 2024 • Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

Addressing this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM community.

Federated Learning

WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark

1 code implementation • 30 May 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps.

Knowledge Distillation Object Tracking

TAIA: Large Language Models are Out-of-Distribution Data Learners

1 code implementation • 30 May 2024 • Shuyang Jiang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang

However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution.

Math

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

1 code implementation • 29 May 2024 • Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang

This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts.

Domain Generalization parameter-efficient fine-tuning

Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

1 code implementation • NeurIPS 2023 • Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes a part of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms.

Federated Learning

Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping

1 code implementation • 29 May 2024 • Ziqing Fan, Jiangchao Yao, Ruipeng Zhang, Lingjuan Lyu, Ya Zhang, Yanfeng Wang

Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several explorations, e.g., FedProx, MOON and FedDyn, to alleviate this problem.

Federated Learning

Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

1 code implementation • 29 May 2024 • Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang

However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training.

Federated Learning

HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

1 code implementation • 28 May 2024 • Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, DaCheng Tao

However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance.

Management Meta-Learning +2

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

1 code implementation • CVPR 2024 • Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

Prior approaches to leverage such data mainly consider the application of uni-modal noisy label learning without amending the impact on both cross-modal and intra-modal geometrical structures in multimodal learning.

Cross-modal retrieval with noisy correspondence

Q-value Regularized Transformer for Offline Reinforcement Learning

2 code implementations • 27 May 2024 • Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, DaCheng Tao

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state.

D4RL Offline RL +4

Language-Driven Interactive Traffic Trajectory Generation

1 code implementation • 24 May 2024 • Junkai Xia, Chenxin Xu, Qingyao Xu, Chen Xie, Yanfeng Wang, Siheng Chen

To produce interactive traffic trajectories, we propose a code-to-trajectory decoder with interaction-aware feature aggregation that synergizes vehicle interactions with the environmental map and the vehicle moves.

Decoder

JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

no code implementations • 23 May 2024 • Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to facilitate volumetric videos.

Feature Compression NeRF

Awesome Multi-modal Object Tracking

5 code implementations • 23 May 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality.

Autonomous Driving Knowledge Distillation +6

Robust Collaborative Perception without External Localization and Clock Devices

1 code implementation • 5 May 2024 • Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang

To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals.

Graph Neural Network

Low-Rank Knowledge Decomposition for Medical Foundation Models

1 code implementation • CVPR 2024 • YuHang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

The popularity of large-scale pre-training has promoted the development of medical foundation models.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

1 code implementation • 25 Apr 2024 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets.

Segmentation Sentence +2

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

no code implementations • 23 Apr 2024 • Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang

The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising.

Denoising Open Vocabulary Action Recognition

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

1 code implementation • 18 Apr 2024 • Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors.

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

1 code implementation • 15 Apr 2024 • Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology.

Cross-Modal Retrieval Language Modeling +5

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

2 code implementations • 13 Apr 2024 • Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field.

Diversity Language Modeling +4

Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

no code implementations • 7 Apr 2024 • Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies.

Anomaly Localization Diagnostic +4

ReMamber: Referring Image Segmentation with Mamba Twister

1 code implementation • 26 Mar 2024 • Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang

In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block.

Image Segmentation Mamba +1

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations • 21 Mar 2024 • Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

Diversity Script Generation +3

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

1 code implementation • CVPR 2024 • Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang

Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains.

Anomaly Classification Anomaly Detection +1

Audio-Visual Segmentation via Unlabeled Frame Exploitation

no code implementations • CVPR 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

NFs, temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.

Diversity valid

Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

3 code implementations • 13 Mar 2024 • Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored.

Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning

no code implementations • 11 Mar 2024 • Shuo Tang, Rui Ye, Chenxin Xu, Xiaowen Dong, Siheng Chen, Yanfeng Wang

In this paper, we propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.

Computational Efficiency Graph structure learning

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

no code implementations • 7 Mar 2024 • Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research.

Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

no code implementations • 28 Feb 2024 • Yusheng Liao, Yanfeng Wang, Yu Wang

Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).

Contrastive Learning Machine Translation +2

Towards Building Multilingual Language Model for Medicine

1 code implementation • 21 Feb 2024 • Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions.

Domain Adaptation Language Modeling +3

M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation

no code implementations • 19 Feb 2024 • Hongcheng Liu, Pingjie Wang, Yu Wang, Yanfeng Wang

Video-grounded dialogue generation (VDG) requires the system to generate a fluent and accurate answer based on multimodal knowledge.

counterfactual Dialogue Generation +1

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

1 code implementation CVPR 2024 Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang

Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering.

Autonomous Driving Language Modeling +3

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

no code implementations8 Feb 2024 Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen

Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation.

An Extensible Framework for Open Heterogeneous Collaborative Perception

1 code implementation25 Jan 2024 Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen

In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost?

FedRSU: Federated Learning for Scene Flow Estimation on Roadside Units

1 code implementation23 Jan 2024 Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang

In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation.

Autonomous Vehicles Federated Learning +2

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

1 code implementation15 Jan 2024 Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation CVPR 2024 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.

Text-to-Image Generation Visual Storytelling

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

1 code implementation28 Dec 2023 Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

Our main contributions are threefold: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; then, we build up the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans from 72 segmentation datasets, across 497 classes, with careful standardization on both image scans and label space; (ii) for architecture design, we propose to inject medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating superior or comparable performance to 72 specialist models, i.e., nnU-Nets, U-Mamba or SwinUNETR, trained on each dataset/subsets.

All Anatomy +7

Multi-Sentence Grounding for Long-term Instructional Video

no code implementations21 Dec 2023 Zeqian Li, Qirui Chen, Tengda Han, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we aim to establish an automatic, scalable pipeline for denoising the large-scale instructional dataset and construct a high-quality video-text dataset with multiple descriptive steps supervision, named HowToStep.

Denoising Descriptive +5

MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models

no code implementations20 Dec 2023 Yan Cai, LinLin Wang, Ye Wang, Gerard de Melo, Ya zhang, Yanfeng Wang, Liang He

The emergence of various medical large language models (LLMs) in the medical domain has highlighted the need for unified evaluation standards, as manual evaluation of LLMs proves to be time-consuming and labor-intensive.

Clinical Knowledge Diagnostic

Hypergraph Transformer for Semi-Supervised Classification

1 code implementation18 Dec 2023 Zexi Liu, Bohan Tang, Ziyuan Ye, Xiaowen Dong, Siheng Chen, Yanfeng Wang

Hypergraphs play a pivotal role in the modelling of data featuring higher-order relations involving more than two entities.

Classification Node Classification +1

UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification

1 code implementation18 Dec 2023 Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya zhang, Yanfeng Wang

Vision-Language Pre-training (VLP), which utilizes multi-modal information to promote training efficiency and effectiveness, has achieved great success in vision recognition of natural domains and shown promise in medical imaging diagnosis for Chest X-Rays (CXRs).

X-ray Classification

Fake It Till Make It: Federated Learning with Consensus-Oriented Generation

no code implementations10 Dec 2023 Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang

FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from global model to local model based on the generated data to mitigate over-fitting the original heterogeneous dataset.

Federated Learning Knowledge Distillation

Federated Learning Empowered by Generative Content

no code implementations10 Dec 2023 Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.

Diversity Federated Learning +1

Combating Representation Learning Disparity with Geometric Harmonization

1 code implementation NeurIPS 2023 Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya zhang, Bo Han, Yanfeng Wang

Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios.

Representation Learning Self-Supervised Learning

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

1 code implementation15 Oct 2023 Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya zhang, Yanfeng Wang, Weidi Xie

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.

Anatomy Computed Tomography (CT) +2

Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning

no code implementations7 Oct 2023 Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang

In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.

Hallucination In-Context Learning +1

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

1 code implementation13 Sep 2023 Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya zhang, Yanfeng Wang

Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed.

Diagnostic

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

1 code implementation20 Aug 2023 Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

1 code implementation ICCV 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies.

Human motion prediction Human Pose Forecasting +1

Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

no code implementations17 Aug 2023 Feng Hong, Tianjie Dai, Jiangchao Yao, Ya zhang, Yanfeng Wang

Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature.

Data Augmentation Multi-Label Classification +1

Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection

no code implementations9 Aug 2023 Chaoqin Huang, Aofan Jiang, Ya zhang, Yanfeng Wang

Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection.

Anomaly Detection Defect Detection +1

Joint-Relation Transformer for Multi-Person Motion Prediction

1 code implementation ICCV 2023 Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.

motion prediction Prediction +1

Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection

1 code implementation3 Aug 2023 Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya zhang, Yanfeng Wang

To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics.

Anomaly Detection Diagnostic +1

Balanced Destruction-Reconstruction Dynamics for Memory-replay Class Incremental Learning

1 code implementation3 Aug 2023 YuHang Zhou, Jiangchao Yao, Feng Hong, Ya zhang, Yanfeng Wang

By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction.

class-incremental learning Class Incremental Learning +1

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

no code implementations25 Jul 2023 Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.

Decoder Segmentation

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

no code implementations7 Jul 2023 Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang

This approach achieves feature integration in a unified backbone, removing the need for carefully-designed fusion modules and resulting in a more effective and efficient VL tracking framework.

All

Multi-Modal Prototypes for Open-World Semantic Segmentation

no code implementations5 Jul 2023 Yuhuan Yang, Chaofan Ma, Chen Ju, Fei Zhang, Jiangchao Yao, Ya zhang, Yanfeng Wang

To be specific, unlike the straightforward combination of bi-modal clues, we decompose the high-level language information as multi-aspect prototypes and aggregate the low-level visual information as more semantic prototypes, on the basis of which a fine-grained complementary fusion makes the multi-modal prototypes more powerful and accurate to promote the prediction.

Segmentation Semantic Segmentation

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation24 Jun 2023 HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation Video Frame Interpolation

Zero-shot Composed Text-Image Retrieval

1 code implementation12 Jun 2023 Yikun Liu, Jiangchao Yao, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.

Image Retrieval Retrieval +1

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations9 Jun 2023 Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation1 Jun 2023 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

FedDisco: Federated Learning with Discrepancy-Aware Collaboration

1 code implementation30 May 2023 Rui Ye, Mingkai Xu, Jianyu Wang, Chenxin Xu, Siheng Chen, Yanfeng Wang

However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights.

Federated Learning

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

2 code implementations17 May 2023 Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya zhang, Yanfeng Wang, Weidi Xie

Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images.

Benchmarking Diagnostic +6

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

1 code implementation27 Apr 2023 Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain; this involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.

Language Modeling Language Modelling +2

Collaboration Helps Camera Overtake LiDAR in 3D Detection

1 code implementation CVPR 2023 Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.

Depth Estimation

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations21 Mar 2023 Chen Ju, Zeqian Li, Peisen Zhao, Ya zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot and few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even those not seen at training time.

Action Classification Temporal Action Localization

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

2 code implementations CVPR 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle.

Human Pose Forecasting motion prediction +3

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

1 code implementation CVPR 2023 Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, Yanfeng Wang

The core of the proposed LED is to leverage a trainable leapfrog initializer to directly learn an expressive multi-modal distribution of future trajectories, which skips a large number of denoising steps, significantly accelerating inference speed.

Denoising model +2

Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning

no code implementations19 Mar 2023 Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya zhang

Interactive segmentation has recently been explored to effectively and efficiently harvest high-quality segmentation masks by iteratively incorporating user hints.

Image Segmentation Interactive Segmentation +6

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

no code implementations17 Mar 2023 Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya zhang, Yanfeng Wang

However, the challenges exist as there is one structural difference between generative and discriminative models, which limits the direct use.

Object Object Discovery +1

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

no code implementations CVPR 2023 Shaoheng Fang, Zi Wang, Yiqi Zhong, Junhao Ge, Siheng Chen, Yanfeng Wang

Second, a spatial-temporal pyramid transformer is introduced to comprehensively extract multi-scale BEV features and predict future BEV states with the support of spatial-temporal priors.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

1 code implementation27 Feb 2023 Xiaoman Zhang, Chaoyi Wu, Ya zhang, Yanfeng Wang, Weidi Xie

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.

Natural Language Understanding Representation Learning

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

no code implementations20 Feb 2023 Zihan Zhao, Yu Wang, Yanfeng Wang

Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.

Multimodal Emotion Recognition

Long-Tailed Partial Label Learning via Dynamic Rebalancing

1 code implementation10 Feb 2023 Feng Hong, Jiangchao Yao, Zhihan Zhou, Ya zhang, Yanfeng Wang

The straightforward combination of LT and PLL, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, and the performance of PLL is severely influenced in long-tailed context.

Partial Label Learning

Open-vocabulary Object Segmentation with Diffusion Models

1 code implementation ICCV 2023 Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.

Image Segmentation Object +3

Integrating features from lymph node stations for metastatic lymph node detection

no code implementations9 Jan 2023 Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya zhang

The branch targets to solve a closely related task on the LN station level, i.e., classifying whether an LN station contains metastatic LN or not, so as to learn representations for LN stations.

Computed Tomography (CT)

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

no code implementations5 Jan 2023 Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis Self-Supervised Learning +1

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

no code implementations ICCV 2023 Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis Triplet

Federated Domain Generalization With Generalization Adjustment

1 code implementation CVPR 2023 Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya zhang, Qi Tian, Yanfeng Wang

Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients possibly with domain shift.

Domain Generalization Fairness +1

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation14 Dec 2022 Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya zhang, Qi Tian

However, in addition to previous explorations for improvement in federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models in more heterogeneous conditions.

Federated Learning

Robust Collaborative 3D Object Detection in Presence of Pose Errors

1 code implementation14 Nov 2022 Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, Yanfeng Wang

Collaborative 3D object detection exploits information exchange among multiple agents to enhance accuracy of object detection in presence of sensor impairments such as occlusion.

3D Object Detection Object +2

Unrolled Graph Learning for Multi-Agent Collaboration

no code implementations31 Oct 2022 Enpei Zhang, Shuo Tang, Xiaowen Dong, Siheng Chen, Yanfeng Wang

To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to collaborators' model for better performance.

Graph Learning Rolling Shutter Correction

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

1 code implementation27 Oct 2022 Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Image Segmentation Language Modelling +4

Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation

no code implementations18 Oct 2022 Yangheng Zhao, Jun Wang, Xiaolong Li, Yue Hu, Ce Zhang, Yanfeng Wang, Siheng Chen

Instead of learning a single prototype for each class, in this paper, we propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.

3D Semantic Segmentation Scene Understanding +1

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations7 Oct 2022 Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya zhang, Yanfeng Wang

Existing models on super-resolution are often specialized for one scale, fundamentally limiting their use in practical scenarios.

Super-Resolution

Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations23 Aug 2022 Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

1 code implementation11 Jul 2022 Zihan Zhao, Yanfeng Wang, Yu Wang

The research and applications of multimodal emotion recognition have become increasingly popular recently.

Multimodal Emotion Recognition Transfer Learning

Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

1 code implementation11 Jul 2022 Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya zhang, Siheng Chen

Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.

regression Task 2 +1

Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition

1 code implementation29 Jun 2022 Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang

To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).

speech-recognition Speech Recognition

Contrastive Learning with Boosted Memorization

1 code implementation25 May 2022 Zhihan Zhou, Jiangchao Yao, Yanfeng Wang, Bo Han, Ya zhang

Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method.

Contrastive Learning Memorization +2

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization

no code implementations13 May 2022 Chaoqin Huang, Qinwei Xu, Yanfeng Wang, Yu Wang, Ya zhang

To extend the reconstruction-based anomaly detection architecture to the localized anomalies, we propose a self-supervised learning approach through random masking and then restoring, named Self-Supervised Masking (SSM) for unsupervised anomaly detection and localization.

Anomaly Localization Defect Detection +3

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

no code implementations25 Aug 2021 Maosen Li, Siheng Chen, Yangheng Zhao, Ya zhang, Yanfeng Wang, Qi Tian

The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.

Decoder Graph Neural Network +1

Cooperative Learning for Noisy Supervision

no code implementations11 Aug 2021 Hao Wu, Jiangchao Yao, Ya zhang, Yanfeng Wang

Learning with noisy labels has gained enormous interest in the robust deep learning area.

Learning with noisy labels

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

no code implementations5 Aug 2021 Shixiang Feng, YuHang Zhou, Xiaoman Zhang, Ya zhang, Yanfeng Wang

A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network.

Knowledge Distillation Organ Segmentation +1

A Fourier-based Framework for Domain Generalization

1 code implementation CVPR 2021 Qinwei Xu, Ruipeng Zhang, Ya zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data.

Data Augmentation Domain Generalization

H2O: A Benchmark for Visual Human-human Object Handover Analysis

no code implementations ICCV 2021 Ruolin Ye, Wenqiang Xu, Zhendong Xue, Tutian Tang, Yanfeng Wang, Cewu Lu

Besides, we also report the hand and object pose errors with existing baselines and show that the dataset can serve as the video demonstrations for robot imitation learning on the handover task.

Imitation Learning Object

Collaborative Label Correction via Entropy Thresholding

no code implementations31 Mar 2021 Hao Wu, Jiangchao Yao, Jiajie Wang, Yinru Chen, Ya zhang, Yanfeng Wang

Deep neural networks (DNNs) have the capacity to fit extremely noisy labels; nonetheless, they tend to learn data with clean labels first and then memorize those with noisy labels.

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations ICCV 2021 Chen Ju, Peisen Zhao, Siheng Chen, Ya zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation

1 code implementation LREC 2022 Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

Previous research for adapting a general neural machine translation (NMT) model into a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.

Autonomous Vehicles Diversity +4

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations15 Dec 2020 Chen Ju, Peisen Zhao, Ya zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Prediction Weakly Supervised Action Localization

Privileged Knowledge Distillation for Online Action Detection

no code implementations18 Nov 2020 Peisen Zhao, Lingxi Xie, Ya zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Knowledge Distillation Online Action Detection

SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

no code implementations13 Oct 2020 Xiaoman Zhang, Shixiang Feng, YuHang Zhou, Ya zhang, Yanfeng Wang

We demonstrate the effectiveness of our methods on two downstream tasks: i) Brain tumor segmentation, ii) Pancreas tumor segmentation.

Brain Tumor Segmentation Segmentation +3

Defending Adversarial Attacks by Correcting logits

no code implementations26 Jun 2019 Yifeng Li, Lingxi Xie, Ya zhang, Rui Zhang, Yanfeng Wang, Qi Tian

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.
