Guiding Text-to-Image Diffusion Model Towards Grounded Generation

no code implementations · 12 Jan 2023 · Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to augment a pre-trained text-to-image diffusion model with the ability to ground open-vocabulary objects, i.e., to simultaneously generate images and segmentation masks for the corresponding visual entities described in the text prompt.

Integrating features from lymph node stations for metastatic lymph node detection

no code implementations · 9 Jan 2023 · Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang

An additional branch addresses a closely related task at the LN-station level, i.e., classifying whether an LN station contains a metastatic LN or not, so as to learn representations for LN stations.

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

no code implementations · 5 Jan 2023 · Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of enhancing self-supervised visual-language pre-training (VLP) with medical-specific knowledge by exploiting the paired image-text reports produced in daily radiological practice.

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation · 14 Dec 2022 · Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya Zhang, Qi Tian

Beyond previous efforts to improve federated averaging, our analysis identifies another critical bottleneck: client models reach poorer optima under more heterogeneous conditions.
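
For orientation, the federated averaging (FedAvg) step the excerpt refers to can be sketched as a sample-weighted mean of client parameters. This is a minimal illustration assuming each client model is a flat list of floats; the helper name `fedavg` is hypothetical, and the sketch shows plain FedAvg, not the skip aggregation proposed in FedSkip.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Plain FedAvg: weight each client's parameters by its share of the training samples."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, num_samples):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


# Two clients; the second holds three times as much data, so it dominates the average.
print(fedavg([[1.0, 0.0], [5.0, 4.0]], [100, 300]))  # → [4.0, 3.0]
```

Weighting by local sample count is the usual FedAvg choice, so clients with more data pull the global model further toward their own optimum.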

Robust Collaborative 3D Object Detection in Presence of Pose Errors

1 code implementation · 14 Nov 2022 · Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, Yanfeng Wang

Collaborative 3D object detection exploits information exchange among multiple agents to improve detection accuracy in the presence of sensor impairments such as occlusion.

Unrolled Graph Learning for Multi-Agent Collaboration

no code implementations · 31 Oct 2022 · Enpei Zhang, Shuo Tang, Xiaowen Dong, Siheng Chen, Yanfeng Wang

We propose a distributed multi-agent learning model inspired by human collaboration, in which agents autonomously detect suitable collaborators and refer to those collaborators' models for better performance.

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

no code implementations · 27 Oct 2022 · Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya Zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation

no code implementations · 18 Oct 2022 · Yangheng Zhao, Jun Wang, Xiaolong Li, Yue Hu, Ce Zhang, Yanfeng Wang, Siheng Chen

Instead of learning a single prototype for each class, in this paper, we propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations · 7 Oct 2022 · Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

Existing super-resolution models are often specialized for a single scale, which fundamentally limits their use in practical scenarios.


Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations · 23 Aug 2022 · Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications, such as photography and autonomous driving.

Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

no code implementations · 11 Jul 2022 · Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya Zhang, Siheng Chen

Further, we apply the proposed framework as a plugin module to current SOTA multi-agent multi-modal forecasting systems, enabling them to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task and 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.
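
The ranking-and-selection step in item 2) can be illustrated in a few lines. This sketch assumes the forecaster already outputs one scalar uncertainty per candidate trajectory and that a lower value means a more reliable prediction; how that uncertainty is estimated is the paper's contribution and is not reproduced here. The names `rank_by_uncertainty` and `select_best` are hypothetical.

```python
from typing import List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]  # sequence of (x, y) waypoints


def rank_by_uncertainty(predictions: Sequence[Trajectory], uncertainties: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from least to most uncertain."""
    if len(predictions) != len(uncertainties):
        raise ValueError("one uncertainty score per prediction is required")
    return sorted(range(len(predictions)), key=lambda k: uncertainties[k])


def select_best(predictions: Sequence[Trajectory], uncertainties: Sequence[float]) -> Trajectory:
    """Pick the candidate with the lowest estimated uncertainty."""
    return predictions[rank_by_uncertainty(predictions, uncertainties)[0]]


candidates = [
    [(0.0, 0.0), (1.0, 0.0)],   # mode 0: go straight
    [(0.0, 0.0), (1.0, 1.0)],   # mode 1: turn left
    [(0.0, 0.0), (1.0, -1.0)],  # mode 2: turn right
]
scores = [0.8, 0.2, 0.5]
print(rank_by_uncertainty(candidates, scores))  # → [1, 2, 0]
print(select_best(candidates, scores))          # → [(0.0, 0.0), (1.0, 1.0)]
```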

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

no code implementations · 11 Jul 2022 · Zihan Zhao, Yanfeng Wang, Yu Wang

Research on and applications of multimodal emotion recognition have become increasingly popular in recent years.

Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition

1 code implementation · 29 Jun 2022 · Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang

To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).

 Ranked #1 on Speech Recognition on AISHELL-1 (CER metric)
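
The CER figures above follow the usual definition: the character-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference characters. A minimal dynamic-programming sketch, with hypothetical function names and an invented Mandarin example sentence:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref character r
                curr[j - 1] + 1,         # insertion of hyp character h
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substitution (气 -> 汽) and one deletion (很) over 6 reference characters: CER = 2/6.
print(cer("今天天气很好", "今天天汽好"))
```

Real toolkits typically normalize text (for example, stripping punctuation) before scoring, so reported numbers also depend on that preprocessing.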

Contrastive Learning with Boosted Memorization

1 code implementation · 25 May 2022 · Zhihan Zhou, Jiangchao Yao, Yanfeng Wang, Bo Han, Ya Zhang

Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method.

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization

no code implementations · 13 May 2022 · Chaoqin Huang, Qinwei Xu, Yanfeng Wang, Yu Wang, Ya Zhang

To extend the reconstruction-based anomaly detection architecture to localized anomalies, we propose a self-supervised learning approach based on random masking and subsequent restoration, named Self-Supervised Masking (SSM), for unsupervised anomaly detection and localization.
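
A minimal sketch of the masking-and-restoring idea, assuming grayscale images stored as nested lists: random square patches are hidden, a restoration model fills them back in, and the per-pixel reconstruction error serves as the anomaly map. The function names are hypothetical, and `mean_fill` is a toy stand-in for a trained restoration network; this is not the SSM architecture or its masking schedule.

```python
import random
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[bool]]


def random_patch_mask(h: int, w: int, patch: int, ratio: float, rng: random.Random) -> Mask:
    """Hide a random fraction `ratio` of the non-overlapping patch x patch blocks (True = hidden)."""
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    chosen = rng.sample(blocks, k=round(ratio * len(blocks)))
    mask = [[False] * w for _ in range(h)]
    for r0, c0 in chosen:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                mask[r][c] = True
    return mask


def anomaly_map(img: Image, mask: Mask, restore: Callable[[Image, Mask], Image]) -> Image:
    """Per-pixel anomaly score: |input - restoration of the masked input|."""
    masked = [[0.0 if m else v for v, m in zip(row, mrow)] for row, mrow in zip(img, mask)]
    restored = restore(masked, mask)
    return [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(img, restored)]


def mean_fill(masked: Image, mask: Mask) -> Image:
    """Toy restorer (stand-in for a trained network): fill hidden pixels with the visible mean."""
    visible = [v for row, mrow in zip(masked, mask) for v, m in zip(row, mrow) if not m]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [[fill if m else v for v, m in zip(row, mrow)] for row, mrow in zip(masked, mask)]


# 4x4 normal image with one defective pixel at (1, 1); hide the top-left 2x2 block.
img = [[1.0] * 4 for _ in range(4)]
img[1][1] = 9.0
mask = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
amap = anomaly_map(img, mask, mean_fill)
print(amap[1][1], amap[3][3])  # → 8.0 0.0
```

Because only hidden pixels are restored, an anomaly is scored only when some mask covers it, so a practical scheme would need several masks that together cover the image.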

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

no code implementations · 25 Aug 2021 · Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.

Cooperative Learning for Noisy Supervision

no code implementations · 11 Aug 2021 · Hao Wu, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Learning with noisy labels has attracted enormous interest in robust deep learning research.

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

no code implementations · 5 Aug 2021 · Shixiang Feng, YuHang Zhou, Xiaoman Zhang, Ya Zhang, Yanfeng Wang

A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network.
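
To make the teacher/student split concrete, here is one way to fuse K single-organ (binary) teacher predictions into a multi-class soft target for the student at a single pixel, plus the soft cross-entropy the student could be trained with. The fusion rule (background = probability that no teacher fires, then renormalize) is an assumption for illustration, not necessarily the MS-KD distillation loss; all names are hypothetical.

```python
import math
from typing import List, Sequence


def fuse_teacher_targets(organ_probs: Sequence[float]) -> List[float]:
    """Combine K binary teacher outputs for one pixel into a (K+1)-class soft target.

    Index 0 is background; index k (1..K) is organ k. Background takes the
    probability that no teacher fires (assumed rule); scores are then renormalized.
    """
    background = math.prod(1.0 - p for p in organ_probs)
    scores = [background, *organ_probs]
    total = sum(scores)
    return [s / total for s in scores]


def soft_cross_entropy(target: Sequence[float], student_logits: Sequence[float]) -> float:
    """Cross-entropy between a soft target and the student's softmax output (log-sum-exp stabilized)."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return -sum(t * (z - log_z) for t, z in zip(target, student_logits))


# Organ-1 teacher is confident, organ-2 teacher sees nothing at this pixel.
target = fuse_teacher_targets([0.9, 0.0])
print([round(x, 3) for x in target])  # → [0.1, 0.9, 0.0]
print(soft_cross_entropy(target, [0.0, 2.0, -1.0]))
```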

A Fourier-based Framework for Domain Generalization

1 code implementation · CVPR 2021 · Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on test data drawn from a distribution different from that of the training data.

H2O: A Benchmark for Visual Human-human Object Handover Analysis

no code implementations · ICCV 2021 · Ruolin Ye, Wenqiang Xu, Zhendong Xue, Tutian Tang, Yanfeng Wang, Cewu Lu

In addition, we report hand and object pose errors for existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task.

Collaborative Label Correction via Entropy Thresholding

no code implementations · 31 Mar 2021 · Hao Wu, Jiangchao Yao, Jiajie Wang, Yinru Chen, Ya Zhang, Yanfeng Wang

Deep neural networks (DNNs) have the capacity to fit even extremely noisy labels; nonetheless, they tend to learn data with clean labels first and only then memorize those with noisy labels.
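
The memorization effect described above is what makes confidence-based relabeling plausible. The sketch below uses predictive entropy as the confidence measure: a label is replaced by the model's prediction only when the entropy falls below a threshold. The threshold value, the function names, and the single-model rule are assumptions; the paper's collaborative scheme is not reproduced here.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def correct_labels(labels: Sequence[int], probs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Replace a label with the model's argmax only when the prediction is confident (low entropy)."""
    corrected = []
    for y, p in zip(labels, probs):
        if entropy(p) < threshold:
            corrected.append(max(range(len(p)), key=p.__getitem__))
        else:
            corrected.append(y)  # too uncertain: keep the given (possibly noisy) label
    return corrected


labels = [0, 0, 1]
probs = [
    [0.05, 0.95],  # confident in class 1, so label 0 is corrected
    [0.55, 0.45],  # near-uniform: entropy too high, label kept
    [0.02, 0.98],  # confident and already agrees with the label
]
print(correct_labels(labels, probs, threshold=0.3))  # → [1, 0, 1]
```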

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations · ICCV 2021 · Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation

1 code implementation · LREC 2022 · Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations · 15 Dec 2020 · Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Privileged Knowledge Distillation for Online Action Detection

no code implementations · 18 Nov 2020 · Peisen Zhao, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.
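
For reference, the classic temperature-scaled distillation loss of Hinton et al. (KL divergence between softened teacher and student distributions, scaled by T^2) can be written as below. This is the generic formulation, assuming per-sample logit lists; it does not model the privileged offline information or any other design specific to the paper, and the names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax (max-subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Sequence[float], student_logits: Sequence[float], temperature: float = 4.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # offline teacher's soft targets
    q = softmax(student_logits, temperature)  # online student's predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0 (student matches teacher)
print(kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))  # positive: distributions disagree
```

The T^2 factor keeps gradient magnitudes roughly comparable across temperatures, which is why it usually accompanies the softened KL term.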

SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

no code implementations · 13 Oct 2020 · Xiaoman Zhang, Shixiang Feng, YuHang Zhou, Ya Zhang, Yanfeng Wang

We demonstrate the effectiveness of our methods on two downstream tasks: (i) brain tumor segmentation and (ii) pancreas tumor segmentation.

Defending Adversarial Attacks by Correcting logits

no code implementations · 26 Jun 2019 · Yifeng Li, Lingxi Xie, Ya Zhang, Rui Zhang, Yanfeng Wang, Qi Tian

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.

