Awesome Multi-modal Object Tracking

no code implementations 23 May 2024 Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality.

Autonomous Driving Knowledge Distillation +4

JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

no code implementations 23 May 2024 Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to facilitate volumetric videos.

Feature Compression

Robust Collaborative Perception without External Localization and Clock Devices

no code implementations 5 May 2024 Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Chen Feng, Siheng Chen, Yanfeng Wang

To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals.

Low-Rank Knowledge Decomposition for Medical Foundation Models

no code implementations 26 Apr 2024 YuHang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

The popularity of large-scale pre-training has promoted the development of medical foundation models.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

no code implementations 25 Apr 2024 Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets.

Segmentation Sentence +2

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

no code implementations 23 Apr 2024 Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang

The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising.

Denoising Open Vocabulary Action Recognition

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

no code implementations 18 Apr 2024 Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors.

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

1 code implementation 15 Apr 2024 Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology.

Cross-Modal Retrieval Language Modelling +4

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

2 code implementations 13 Apr 2024 Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field.

Language Modelling Large Language Model +2

Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

no code implementations 7 Apr 2024 Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies.

Self-Supervised Anomaly Detection Self-Supervised Learning +2

ReMamber: Referring Image Segmentation with Mamba Twister

no code implementations 26 Mar 2024 Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang

Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks.

Image Segmentation Semantic Segmentation

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations 21 Mar 2024 Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

Speech Recognition +1

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

1 code implementation 19 Mar 2024 Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang

Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains.

Anomaly Classification Anomaly Detection

Audio-Visual Segmentation via Unlabeled Frame Exploitation

no code implementations 17 Mar 2024 Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

NFs, temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.


Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

2 code implementations 13 Mar 2024 Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored.

Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning

no code implementations 11 Mar 2024 Shuo Tang, Rui Ye, Chenxin Xu, Xiaowen Dong, Siheng Chen, Yanfeng Wang

In this paper, we propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.

Computational Efficiency Graph structure learning

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

no code implementations 7 Mar 2024 Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research.

Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

no code implementations 28 Feb 2024 Yusheng Liao, Yanfeng Wang, Yu Wang

Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).

Contrastive Learning Machine Translation +2

Towards Building Multilingual Language Model for Medicine

1 code implementation 21 Feb 2024 Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions.

Language Modelling Question Answering

M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation

no code implementations 19 Feb 2024 Hongcheng Liu, Pingjie Wang, Yu Wang, Yanfeng Wang

Video-grounded dialogue generation (VDG) requires the system to generate a fluent and accurate answer based on multimodal knowledge.

counterfactual Dialogue Generation +1

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

1 code implementation 8 Feb 2024 Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang

Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering.

Autonomous Driving Language Modelling +2

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

no code implementations 8 Feb 2024 Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen

Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse.

An Extensible Framework for Open Heterogeneous Collaborative Perception

1 code implementation 25 Jan 2024 Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen

In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost?

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

1 code implementation 15 Jan 2024 Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

no code implementations 28 Dec 2023 Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this study, we focus on building up a model that aims to Segment Anything in medical scenarios, driven by Text prompts, termed as SAT.

Anatomy Representation Learning +2

Large-scale Long-tailed Disease Diagnosis on Radiology Images

1 code implementation 26 Dec 2023 Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this study, we aim to investigate the problem of large-scale, large-vocabulary disease classification for radiologic images, which can be formulated as a multi-modal, multi-anatomy, multi-label, long-tailed classification.


A Strong Baseline for Temporal Video-Text Alignment

no code implementations 21 Dec 2023 Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporally aligning the video and texts from instructional videos. Specifically, given a long-term video and associated text sentences, our goal is to determine their corresponding timestamps in the video.

Descriptive Language Modelling +3

MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models

no code implementations 20 Dec 2023 Yan Cai, LinLin Wang, Ye Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He

The emergence of various medical large language models (LLMs) in the medical domain has highlighted the need for unified evaluation standards, as manual evaluation of LLMs proves to be time-consuming and labor-intensive.

Clinical Knowledge

Hypergraph Transformer for Semi-Supervised Classification

1 code implementation 18 Dec 2023 Zexi Liu, Bohan Tang, Ziyuan Ye, Xiaowen Dong, Siheng Chen, Yanfeng Wang

Hypergraphs play a pivotal role in the modelling of data featuring higher-order relations involving more than two entities.

Classification Node Classification +1

UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification

1 code implementation 18 Dec 2023 Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Vision-Language Pre-training (VLP), which utilizes multi-modal information to promote training efficiency and effectiveness, has achieved great success in vision recognition of natural domains and shown promise in medical imaging diagnosis for Chest X-Rays (CXRs).

Federated Learning Empowered by Generative Content

no code implementations 10 Dec 2023 Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.

Federated Learning Privacy Preserving

Fake It Till Make It: Federated Learning with Consensus-Oriented Generation

no code implementations 10 Dec 2023 Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang

FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from the global model to the local model based on the generated data to mitigate over-fitting to the original heterogeneous dataset.

Federated Learning Knowledge Distillation

Combating Representation Learning Disparity with Geometric Harmonization

1 code implementation NeurIPS 2023 Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Bo Han, Yanfeng Wang

Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios.

Representation Learning Self-Supervised Learning

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

1 code implementation 15 Oct 2023 Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.

Anatomy Computed Tomography (CT) +2

Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning

no code implementations 7 Oct 2023 Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang

In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.

Hallucination In-Context Learning +1

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

1 code implementation 13 Sep 2023 Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang

Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed.

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

1 code implementation 20 Aug 2023 Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering

Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

no code implementations 17 Aug 2023 Feng Hong, Tianjie Dai, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature.

Data Augmentation Multi-Label Classification

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

1 code implementation ICCV 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies.

Human motion prediction motion prediction

Joint-Relation Transformer for Multi-Person Motion Prediction

1 code implementation ICCV 2023 Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.

motion prediction Relation

Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection

no code implementations 9 Aug 2023 Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang

Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection.

Anomaly Detection Defect Detection +1

Balanced Destruction-Reconstruction Dynamics for Memory-replay Class Incremental Learning

1 code implementation 3 Aug 2023 YuHang Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Yanfeng Wang

By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction.

Class Incremental Learning Incremental Learning

Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection

1 code implementation 3 Aug 2023 Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics.

Anomaly Detection

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

no code implementations 25 Jul 2023 Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.

Decoder Segmentation

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

no code implementations 7 Jul 2023 Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang

This approach achieves feature integration in a unified backbone, removing the need for carefully-designed fusion modules and resulting in a more effective and efficient VL tracking framework.

Multi-Modal Prototypes for Open-Set Semantic Segmentation

no code implementations 5 Jul 2023 Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang, Yanfeng Wang

In this paper, we define a unified setting termed as open-set semantic segmentation (O3S), which aims to learn seen and unseen semantics from both visual examples and textual names.

Segmentation Semantic Segmentation

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation 24 Jun 2023 HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation Video Frame Interpolation

Zero-shot Composed Text-Image Retrieval

1 code implementation 12 Jun 2023 Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.

Image Retrieval Retrieval +1

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations 9 Jun 2023 Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation 1 Jun 2023 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

FedDisco: Federated Learning with Discrepancy-Aware Collaboration

1 code implementation 30 May 2023 Rui Ye, Mingkai Xu, Jianyu Wang, Chenxin Xu, Siheng Chen, Yanfeng Wang

However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights.

Federated Learning

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

2 code implementations 17 May 2023 Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial in efficiently interpreting medical images with vital clinic-relevant information.

Generative Visual Question Answering Language Modelling +4

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

1 code implementation 27 Apr 2023 Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.

Language Modelling Natural Language Understanding +1

Collaboration Helps Camera Overtake LiDAR in 3D Detection

1 code implementation CVPR 2023 Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.

Depth Estimation

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations 21 Mar 2023 Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even those not seen at training time.

Action Classification Temporal Action Localization

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

1 code implementation CVPR 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle.

Human Pose Forecasting motion prediction +2

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

1 code implementation CVPR 2023 Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, Yanfeng Wang

The core of the proposed LED is to leverage a trainable leapfrog initializer to directly learn an expressive multi-modal distribution of future trajectories, which skips a large number of denoising steps, significantly accelerating inference speed.

Denoising Trajectory Prediction

Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning

no code implementations 19 Mar 2023 Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya Zhang

Interactive segmentation has recently been explored to effectively and efficiently harvest high-quality segmentation masks by iteratively incorporating user hints.

Image Segmentation Interactive Segmentation +5

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

no code implementations CVPR 2023 Shaoheng Fang, Zi Wang, Yiqi Zhong, Junhao Ge, Siheng Chen, Yanfeng Wang

Second, a spatial-temporal pyramid transformer is introduced to comprehensively extract multi-scale BEV features and predict future BEV states with the support of spatial-temporal priors.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

no code implementations 17 Mar 2023 Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

However, challenges exist, as there is a structural difference between generative and discriminative models, which limits direct use.

Object Object Discovery +1

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

1 code implementation 27 Feb 2023 Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.

Natural Language Understanding Representation Learning

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

no code implementations 20 Feb 2023 Zihan Zhao, Yu Wang, Yanfeng Wang

Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.

Multimodal Emotion Recognition

Long-Tailed Partial Label Learning via Dynamic Rebalancing

1 code implementation 10 Feb 2023 Feng Hong, Jiangchao Yao, Zhihan Zhou, Ya Zhang, Yanfeng Wang

The straightforward combination of LT and PLL, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, and the performance of PLL is severely influenced in the long-tailed context.

Partial Label Learning

Open-vocabulary Object Segmentation with Diffusion Models

1 code implementation ICCV 2023 Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of a segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.

Image Segmentation Object +3

Integrating features from lymph node stations for metastatic lymph node detection

no code implementations 9 Jan 2023 Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang

The branch aims to solve a closely related task on the LN station level, i.e., classifying whether an LN station contains metastatic LN or not, so as to learn representations for LN stations.

Computed Tomography (CT)

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

no code implementations 5 Jan 2023 Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis Self-Supervised Learning

Federated Domain Generalization With Generalization Adjustment

1 code implementation CVPR 2023 Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, Yanfeng Wang

Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients possibly with domain shift.

Domain Generalization Fairness +1

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

no code implementations ICCV 2023 Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation 14 Dec 2022 Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya Zhang, Qi Tian

However, in addition to previous explorations for improvement in federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models in more heterogeneous conditions.

Federated Learning

Robust Collaborative 3D Object Detection in Presence of Pose Errors

1 code implementation 14 Nov 2022 Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, Yanfeng Wang

Collaborative 3D object detection exploits information exchange among multiple agents to enhance the accuracy of object detection in the presence of sensor impairments such as occlusion.

3D Object Detection Object +2

Unrolled Graph Learning for Multi-Agent Collaboration

no code implementations 31 Oct 2022 Enpei Zhang, Shuo Tang, Xiaowen Dong, Siheng Chen, Yanfeng Wang

To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to collaborators' model for better performance.

Graph Learning Rolling Shutter Correction

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

1 code implementation 27 Oct 2022 Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya Zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Image Segmentation Language Modelling +3

Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation

no code implementations 18 Oct 2022 Yangheng Zhao, Jun Wang, Xiaolong Li, Yue Hu, Ce Zhang, Yanfeng Wang, Siheng Chen

Instead of learning a single prototype for each class, in this paper, we propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.

3D Semantic Segmentation Scene Understanding +1

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations 7 Oct 2022 Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

Existing super-resolution models are often specialized for one scale, fundamentally limiting their use in practical scenarios.


Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations 23 Aug 2022 Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

no code implementations 11 Jul 2022 Zihan Zhao, Yanfeng Wang, Yu Wang

The research and applications of multimodal emotion recognition have become increasingly popular recently.

Multimodal Emotion Recognition Transfer Learning

Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

no code implementations 11 Jul 2022 Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya Zhang, Siheng Chen

Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.

regression Task 2 +1

Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition

1 code implementation 29 Jun 2022 Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang

To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).

speech-recognition Speech Recognition

Contrastive Learning with Boosted Memorization

1 code implementation 25 May 2022 Zhihan Zhou, Jiangchao Yao, Yanfeng Wang, Bo Han, Ya Zhang

Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method.

Contrastive Learning Memorization +2

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization

no code implementations 13 May 2022 Chaoqin Huang, Qinwei Xu, Yanfeng Wang, Yu Wang, Ya Zhang

To extend the reconstruction-based anomaly detection architecture to localized anomalies, we propose a self-supervised learning approach based on random masking followed by restoration, named Self-Supervised Masking (SSM), for unsupervised anomaly detection and localization.

Defect Detection Medical Diagnosis +2

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

no code implementations 25 Aug 2021 Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.

Decoder motion prediction

Cooperative Learning for Noisy Supervision

no code implementations 11 Aug 2021 Hao Wu, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Learning with noisy labels has gained enormous interest in the area of robust deep learning.

Learning with noisy labels

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

no code implementations 5 Aug 2021 Shixiang Feng, YuHang Zhou, Xiaoman Zhang, Ya Zhang, Yanfeng Wang

A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network.

Knowledge Distillation Organ Segmentation +1

A Fourier-based Framework for Domain Generalization

1 code implementation CVPR 2021 Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data.

Data Augmentation Domain Generalization

H2O: A Benchmark for Visual Human-human Object Handover Analysis

no code implementations ICCV 2021 Ruolin Ye, Wenqiang Xu, Zhendong Xue, Tutian Tang, Yanfeng Wang, Cewu Lu

We also report the hand and object pose errors with existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task.

Imitation Learning Object

Collaborative Label Correction via Entropy Thresholding

no code implementations 31 Mar 2021 Hao Wu, Jiangchao Yao, Jiajie Wang, Yinru Chen, Ya Zhang, Yanfeng Wang

Deep neural networks (DNNs) have the capacity to fit extremely noisy labels; nonetheless, they tend to learn data with clean labels first and then memorize those with noisy labels.

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations ICCV 2021 Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation

1 code implementation LREC 2022 Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.

Autonomous Vehicles Domain Adaptation +3

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations 15 Dec 2020 Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Weakly Supervised Action Localization

Privileged Knowledge Distillation for Online Action Detection

no code implementations 18 Nov 2020 Peisen Zhao, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Knowledge Distillation Online Action Detection

SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

no code implementations 13 Oct 2020 Xiaoman Zhang, Shixiang Feng, YuHang Zhou, Ya Zhang, Yanfeng Wang

We demonstrate the effectiveness of our methods on two downstream tasks: i) brain tumor segmentation and ii) pancreas tumor segmentation.

Brain Tumor Segmentation Segmentation +3

Defending Adversarial Attacks by Correcting logits

no code implementations 26 Jun 2019 Yifeng Li, Lingxi Xie, Ya Zhang, Rui Zhang, Yanfeng Wang, Qi Tian

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.

