Search Results for author: Yanfeng Wang

Found 105 papers, 45 papers with code

The Sogou-TIIC Speech Translation System for IWSLT 2018

no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia

Our final average result on speech translation is 31.02 BLEU.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Low-Rank Knowledge Decomposition for Medical Foundation Models

no code implementations • 26 Apr 2024 • YuHang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

The popularity of large-scale pre-training has promoted the development of medical foundation models.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

no code implementations • 25 Apr 2024 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets.

Segmentation Sentence +2

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

no code implementations • 23 Apr 2024 • Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang

The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising.

Denoising Open Vocabulary Action Recognition

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

no code implementations • 18 Apr 2024 • Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALL-E 3, has revolutionized content creation across diverse sectors.

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

1 code implementation • 15 Apr 2024 • Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology.

Cross-Modal Retrieval Language Modelling +4

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

2 code implementations • 13 Apr 2024 • Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field.

Language Modelling Large Language Model +2

Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

no code implementations • 7 Apr 2024 • Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies.

Self-Supervised Anomaly Detection Self-Supervised Learning +2

ReMamber: Referring Image Segmentation with Mamba Twister

no code implementations • 26 Mar 2024 • Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang

Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks.

Image Segmentation Semantic Segmentation

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations • 21 Mar 2024 • Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

speech-recognition Speech Recognition +1

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

1 code implementation • 19 Mar 2024 • Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang

Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains.

Anomaly Classification Anomaly Detection

Audio-Visual Segmentation via Unlabeled Frame Exploitation

no code implementations • 17 Mar 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

Neighboring frames (NFs), temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.

Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

2 code implementations • 13 Mar 2024 • Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored.

Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning

no code implementations • 11 Mar 2024 • Shuo Tang, Rui Ye, Chenxin Xu, Xiaowen Dong, Siheng Chen, Yanfeng Wang

In this paper, we propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.

Computational Efficiency Graph structure learning

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

no code implementations • 7 Mar 2024 • Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research.

Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

no code implementations • 1 Mar 2024 • Heyang Liu, Yu Wang, Yanfeng Wang

The end-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

no code implementations • 28 Feb 2024 • Yusheng Liao, Yanfeng Wang, Yu Wang

Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).

Contrastive Learning Machine Translation +2

Towards Building Multilingual Language Model for Medicine

1 code implementation • 21 Feb 2024 • Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions.

Language Modelling Question Answering

M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation

no code implementations • 19 Feb 2024 • Hongcheng Liu, Pingjie Wang, Yu Wang, Yanfeng Wang

Video-grounded dialogue generation (VDG) requires the system to generate a fluent and accurate answer based on multimodal knowledge.

counterfactual Dialogue Generation +1

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

no code implementations • 18 Feb 2024 • YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

Structured data offers a sophisticated mechanism for the organization of information.

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

1 code implementation • 10 Feb 2024 • Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, Siheng Chen

Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields.

Federated Learning Instruction Following +1

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

no code implementations • 8 Feb 2024 • Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen

Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse.

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

1 code implementation • 8 Feb 2024 • Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang

Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent rendering of assets.

Autonomous Driving Language Modelling +2

An Extensible Framework for Open Heterogeneous Collaborative Perception

1 code implementation • 25 Jan 2024 • Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen

In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost?

FedRSU: Federated Learning for Scene Flow Estimation on Roadside Units

no code implementations • 23 Jan 2024 • Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang

In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation.

Autonomous Vehicles Federated Learning +2

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

1 code implementation • 15 Jan 2024 • Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

no code implementations • 28 Dec 2023 • Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Our main contributions are threefold: (i) on data construction, we combine multiple knowledge sources to construct a multi-modal medical knowledge tree; then we build a large-scale segmentation dataset for training by collecting over 11K 3D medical image scans from 31 segmentation datasets, with careful standardization of both the visual scans and the label space; (ii) on model training, we formulate a universal segmentation model that can be prompted by inputting medical terminologies in text form.

Representation Learning Segmentation +1

Large-scale Long-tailed Disease Diagnosis on Radiology Images

1 code implementation • 26 Dec 2023 • Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this study, we aim to investigate the problem of large-scale, large-vocabulary disease classification for radiologic images, which can be formulated as a multi-modal, multi-anatomy, multi-label, long-tailed classification.

Anatomy

A Strong Baseline for Temporal Video-Text Alignment

no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporally aligning video and text in instructional videos: specifically, given a long-term video and associated text sentences, our goal is to determine their corresponding timestamps in the video.

Descriptive Language Modelling +3

MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models

no code implementations • 20 Dec 2023 • Yan Cai, LinLin Wang, Ye Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He

The emergence of various medical large language models (LLMs) in the medical domain has highlighted the need for unified evaluation standards, as manual evaluation of LLMs proves to be time-consuming and labor-intensive.

Clinical Knowledge

UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification

1 code implementation • 18 Dec 2023 • Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Vision-Language Pre-training (VLP), which utilizes multi-modal information to improve training efficiency and effectiveness, has achieved great success in visual recognition in natural domains and shown promise in medical imaging diagnosis for Chest X-Rays (CXRs).

Hypergraph Transformer for Semi-Supervised Classification

1 code implementation • 18 Dec 2023 • Zexi Liu, Bohan Tang, Ziyuan Ye, Xiaowen Dong, Siheng Chen, Yanfeng Wang

Hypergraphs play a pivotal role in the modelling of data featuring higher-order relations involving more than two entities.

Classification Node Classification +1

Federated Learning Empowered by Generative Content

no code implementations • 10 Dec 2023 • Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.

Federated Learning Privacy Preserving

Fake It Till Make It: Federated Learning with Consensus-Oriented Generation

no code implementations • 10 Dec 2023 • Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang

FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from the global model to the local model based on the generated data to mitigate over-fitting to the original heterogeneous dataset.

Federated Learning Knowledge Distillation

Combating Representation Learning Disparity with Geometric Harmonization

1 code implementation • NeurIPS 2023 • Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Bo Han, Yanfeng Wang

Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios.

Representation Learning Self-Supervised Learning

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

1 code implementation • 15 Oct 2023 • Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.

Anatomy Computed Tomography (CT) +2

Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning

no code implementations • 7 Oct 2023 • Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang

In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.

Hallucination In-Context Learning +1

MSG-BART: Multi-granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-grounded Dialogue Generation

no code implementations • 26 Sep 2023 • Hongcheng Liu, Zhe Chen, Hui Li, Pingjie Wang, Yanfeng Wang, Yu Wang

Generating dialogue grounded in videos requires a high level of understanding and reasoning about the visual scenes in the videos.

Dialogue Generation Language Modelling

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

1 code implementation • 13 Sep 2023 • Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang

Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, and a range of computer-aided artificial intelligence methods have been proposed for it.

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models

no code implementations • 5 Sep 2023 • Yusheng Liao, Yutong Meng, Hongcheng Liu, Yanfeng Wang, Yu Wang

A medical consultation training set is further constructed to improve the consultation ability of LLMs.

Multiple-choice

AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

no code implementations • NeurIPS 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang, Yanfeng Wang

The results show the superior performance of attribute decomposition-aggregation.

Attribute Open Vocabulary Semantic Segmentation +1

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering

Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

no code implementations • 17 Aug 2023 • Feng Hong, Tianjie Dai, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature.

Data Augmentation Multi-Label Classification

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

1 code implementation • ICCV 2023 • Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies.

Human motion prediction motion prediction

Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection

no code implementations • 9 Aug 2023 • Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang

Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection.

Anomaly Detection Defect Detection +1

Joint-Relation Transformer for Multi-Person Motion Prediction

1 code implementation • ICCV 2023 • Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.

motion prediction Relation

Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data

1 code implementation • 4 Aug 2023 • Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this study, we aim to initiate the development of a Radiology Foundation Model, termed RadFM.

Question Answering Visual Question Answering

Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection

1 code implementation • 3 Aug 2023 • Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics.

Anomaly Detection

Balanced Destruction-Reconstruction Dynamics for Memory-replay Class Incremental Learning

1 code implementation • 3 Aug 2023 • YuHang Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Yanfeng Wang

By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction.

Class Incremental Learning Incremental Learning

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

no code implementations • 25 Jul 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.

Segmentation

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

no code implementations • 7 Jul 2023 • Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang

This approach achieves feature integration in a unified backbone, removing the need for carefully-designed fusion modules and resulting in a more effective and efficient VL tracking framework.

Multi-Modal Prototypes for Open-Set Semantic Segmentation

no code implementations • 5 Jul 2023 • Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang, Yanfeng Wang

In this paper, we define a unified setting termed open-set semantic segmentation (O3S), which aims to learn seen and unseen semantics from both visual examples and textual names.

Segmentation Semantic Segmentation

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation • 24 Jun 2023 • HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation Video Frame Interpolation

Zero-shot Composed Text-Image Retrieval

1 code implementation • 12 Jun 2023 • Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRR

Image Retrieval Retrieval +1

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations • 9 Jun 2023 • Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

FedDisco: Federated Learning with Discrepancy-Aware Collaboration

1 code implementation • 30 May 2023 • Rui Ye, Mingkai Xu, Jianyu Wang, Chenxin Xu, Siheng Chen, Yanfeng Wang

However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights.

Federated Learning

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

2 code implementations • 17 May 2023 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images with vital clinically relevant information.

Ranked #1 on Medical Visual Question Answering on PMC-VQA

Generative Visual Question Answering Language Modelling +4

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

1 code implementation • 27 Apr 2023 • Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.

Language Modelling Natural Language Understanding +1

Collaboration Helps Camera Overtake LiDAR in 3D Detection

1 code implementation • CVPR 2023 • Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.

Depth Estimation

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under low-shot (zero-shot and few-shot) scenarios, with the goal of detecting and classifying action instances from arbitrary categories, including those not seen at training time, within untrimmed videos.

Action Classification Temporal Action Localization

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

1 code implementation • CVPR 2023 • Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle.

Ranked #1 on Human Pose Forecasting on Human3.6M

Human Pose Forecasting motion prediction +2

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

1 code implementation • CVPR 2023 • Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, Yanfeng Wang

The core of the proposed LED is to leverage a trainable leapfrog initializer to directly learn an expressive multi-modal distribution of future trajectories, which skips a large number of denoising steps, significantly accelerating inference speed.

Denoising Trajectory Prediction

Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning

no code implementations • 19 Mar 2023 • Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya Zhang

Interactive segmentation has recently been explored to effectively and efficiently harvest high-quality segmentation masks by iteratively incorporating user hints.

Image Segmentation Interactive Segmentation +5

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

no code implementations • 17 Mar 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

However, challenges exist because of a structural difference between generative and discriminative models, which limits direct use.

Object Object Discovery +1

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

no code implementations • CVPR 2023 • Shaoheng Fang, Zi Wang, Yiqi Zhong, Junhao Ge, Siheng Chen, Yanfeng Wang

Second, a spatial-temporal pyramid transformer is introduced to comprehensively extract multi-scale BEV features and predict future BEV states with the support of spatial-temporal priors.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

1 code implementation • CVPR 2023 • Zhixin Wang, Xiaoyun Zhang, Ziying Zhang, Huangjie Zheng, Mingyuan Zhou, Ya Zhang, Yanfeng Wang

However, it is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.

Blind Face Restoration Denoising

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

1 code implementation • 13 Mar 2023 • Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

Foundation models trained on large-scale datasets have recently surged in popularity in CV and NLP.

Ranked #3 on Medical Visual Question Answering on PMC-VQA

Image Classification Medical Visual Question Answering +3

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

1 code implementation • 27 Feb 2023 • Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.

Natural Language Understanding Representation Learning

K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging

no code implementations • 22 Feb 2023 • Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie

In this paper, we consider the problem of disease diagnosis.

Anatomy Contrastive Learning +2

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

no code implementations • 20 Feb 2023 • Zihan Zhao, Yu Wang, Yanfeng Wang

Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.

Multimodal Emotion Recognition

Long-Tailed Partial Label Learning via Dynamic Rebalancing

1 code implementation • 10 Feb 2023 • Feng Hong, Jiangchao Yao, Zhihan Zhou, Ya Zhang, Yanfeng Wang

The straightforward combination of LT and PLL, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, and the performance of PLL is severely degraded in a long-tailed context.

Partial Label Learning

Open-vocabulary Object Segmentation with Diffusion Models

1 code implementation • ICCV 2023 • Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.

Image Segmentation Object +3

Integrating features from lymph node stations for metastatic lymph node detection

no code implementations • 9 Jan 2023 • Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang

The branch aims to solve a closely related task at the LN station level, i.e., classifying whether an LN station contains a metastatic LN or not, so as to learn representations for LN stations.

Computed Tomography (CT)

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

no code implementations • 5 Jan 2023 • Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis Self-Supervised Learning

Federated Domain Generalization With Generalization Adjustment

1 code implementation • CVPR 2023 • Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya zhang, Qi Tian, Yanfeng Wang

Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients, possibly under domain shift.

Domain Generalization Fairness +1

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

no code implementations • ICCV 2023 • Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis

Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization

no code implementations • CVPR 2023 • Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya zhang, Jianlong Chang, Yanfeng Wang, Qi Tian

As a result, the dual-branch complementarity is effectively fused to promote a strong alliance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation • 14 Dec 2022 • Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya zhang, Qi Tian

However, beyond previous efforts to improve federated averaging, our analysis shows that another critical bottleneck is the poorer optima reached by client models under more heterogeneous conditions.

Federated Learning

Robust Collaborative 3D Object Detection in Presence of Pose Errors

1 code implementation • 14 Nov 2022 • Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, Yanfeng Wang

Collaborative 3D object detection exploits information exchange among multiple agents to enhance the accuracy of object detection in the presence of sensor impairments such as occlusion.

3D Object Detection Object +2

Unrolled Graph Learning for Multi-Agent Collaboration

no code implementations • 31 Oct 2022 • Enpei Zhang, Shuo Tang, Xiaowen Dong, Siheng Chen, Yanfeng Wang

To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to their collaborators' models for better performance.

Graph Learning Rolling Shutter Correction

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

1 code implementation • 27 Oct 2022 • Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Image Segmentation Language Modelling +3

Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation

no code implementations • 18 Oct 2022 • Yangheng Zhao, Jun Wang, Xiaolong Li, Yue Hu, Ce Zhang, Yanfeng Wang, Siheng Chen

Instead of learning a single prototype for each class, in this paper, we propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.

Ranked #17 on 3D Semantic Segmentation on SemanticKITTI

3D Semantic Segmentation Scene Understanding +1

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations • 7 Oct 2022 • Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya zhang, Yanfeng Wang

Existing super-resolution models are often specialized for a single scale, which fundamentally limits their use in practical scenarios.

Super-Resolution

Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations • 23 Aug 2022 • Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

no code implementations • 11 Jul 2022 • Zihan Zhao, Yanfeng Wang, Yu Wang

Research on and applications of multimodal emotion recognition have become increasingly popular in recent years.

Multimodal Emotion Recognition Transfer Learning

Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

no code implementations • 11 Jul 2022 • Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya zhang, Siheng Chen

Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.

regression Task 2 +1

Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition

1 code implementation • 29 Jun 2022 • Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang

To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).

speech-recognition Speech Recognition

K-Space Transformer for Undersampled MRI Reconstruction

1 code implementation • 14 Jun 2022 • Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya zhang

This paper considers the problem of undersampled MRI reconstruction.

Inductive Bias MRI Reconstruction

Contrastive Learning with Boosted Memorization

1 code implementation • 25 May 2022 • Zhihan Zhou, Jiangchao Yao, Yanfeng Wang, Bo Han, Ya zhang

Unlike previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method.

Contrastive Learning Memorization +2

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization

no code implementations • 13 May 2022 • Chaoqin Huang, Qinwei Xu, Yanfeng Wang, Yu Wang, Ya zhang

To extend the reconstruction-based anomaly detection architecture to the localized anomalies, we propose a self-supervised learning approach through random masking and then restoring, named Self-Supervised Masking (SSM) for unsupervised anomaly detection and localization.

Defect Detection Medical Diagnosis +2

Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning

1 code implementation • 7 Dec 2021 • Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng Wang

Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images.

Data Augmentation Transfer Learning

Self-supervised Tumor Segmentation through Layer Decomposition

no code implementations • 7 Sep 2021 • Xiaoman Zhang, Weidi Xie, Chaoqin Huang, Yanfeng Wang, Ya zhang, Xin Chen, Qi Tian

In this paper, we target self-supervised representation learning for zero-shot tumor segmentation.

Brain Tumor Segmentation Data Augmentation +5

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

no code implementations • 25 Aug 2021 • Maosen Li, Siheng Chen, Yangheng Zhao, Ya zhang, Yanfeng Wang, Qi Tian

The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.

motion prediction

Cooperative Learning for Noisy Supervision

no code implementations • 11 Aug 2021 • Hao Wu, Jiangchao Yao, Ya zhang, Yanfeng Wang

Learning with noisy labels has gained enormous interest in the robust deep learning area.

Learning with noisy labels

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

no code implementations • 5 Aug 2021 • Shixiang Feng, YuHang Zhou, Xiaoman Zhang, Ya zhang, Yanfeng Wang

A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network.

Knowledge Distillation Organ Segmentation +1

A Fourier-based Framework for Domain Generalization

1 code implementation • CVPR 2021 • Qinwei Xu, Ruipeng Zhang, Ya zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on test data whose distribution differs from that of the training data.

Data Augmentation Domain Generalization

H2O: A Benchmark for Visual Human-human Object Handover Analysis

no code implementations • ICCV 2021 • Ruolin Ye, Wenqiang Xu, Zhendong Xue, Tutian Tang, Yanfeng Wang, Cewu Lu

In addition, we report the hand and object pose errors of existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task.

Imitation Learning Object

Collaborative Label Correction via Entropy Thresholding

no code implementations • 31 Mar 2021 • Hao Wu, Jiangchao Yao, Jiajie Wang, Yinru Chen, Ya zhang, Yanfeng Wang

Deep neural networks (DNNs) have the capacity to fit extremely noisy labels; nonetheless, they tend to learn data with clean labels first and then memorize those with noisy labels.

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation

1 code implementation • LREC 2022 • Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

Previous research for adapting a general neural machine translation (NMT) model into a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.

Autonomous Vehicles Domain Adaptation +3

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Ranked #3 on Weakly Supervised Action Localization on BEOID

Weakly Supervised Action Localization

Privileged Knowledge Distillation for Online Action Detection

no code implementations • 18 Nov 2020 • Peisen Zhao, Lingxi Xie, Ya zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Ranked #11 on Online Action Detection on TVSeries

Knowledge Distillation Online Action Detection

SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

no code implementations • 13 Oct 2020 • Xiaoman Zhang, Shixiang Feng, YuHang Zhou, Ya zhang, Yanfeng Wang

We demonstrate the effectiveness of our methods on two downstream tasks: i) brain tumor segmentation and ii) pancreas tumor segmentation.

Brain Tumor Segmentation Segmentation +3

Defending Adversarial Attacks by Correcting logits

no code implementations • 26 Jun 2019 • Yifeng Li, Lingxi Xie, Ya zhang, Rui Zhang, Yanfeng Wang, Qi Tian

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.
