no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia
Our final average result on speech translation is 31.02 BLEU.
Automatic Speech Recognition
no code implementations • 5 Apr 2025 • Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang
Speech large language models (LLMs) have emerged as a prominent research focus in speech processing.
1 code implementation • 2 Apr 2025 • Chunhui Zhang, Li Liu, Jialin Gao, Xin Sun, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang
In this work, we propose COST, a contrastive one-stage transformer fusion framework for VL tracking, aiming to learn semantically consistent and unified VL representations.
1 code implementation • 30 Mar 2025 • Zhengren Wang, Jiayang Yu, Dongsheng Ma, Zhe Chen, Yu Wang, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Weinan E, Linpeng Tang, Wentao Zhang
Domain-specific intelligence demands specialized knowledge and sophisticated reasoning for problem-solving, posing significant challenges for large language models (LLMs) that struggle with knowledge hallucination and inadequate reasoning capabilities under constrained parameter budgets.
no code implementations • 28 Mar 2025 • Zhihan Zhou, Feng Hong, Jiaan Luo, Jiangchao Yao, Dongsheng Li, Bo Han, Ya zhang, Yanfeng Wang
We propose LIT, an advancement of visual instruction tuning (VIT).
no code implementations • 24 Mar 2025 • Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang
Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD) trade-off during training, leading to performance degradation and increased model redundancy.
no code implementations • 18 Mar 2025 • Qingyao Xu, Siheng Chen, Guang Chen, Yanfeng Wang, Ya zhang
Traffic scene understanding is essential for intelligent transportation systems and autonomous driving, ensuring safe and efficient vehicle operation.
1 code implementation • 7 Mar 2025 • Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang
FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments.
no code implementations • 6 Mar 2025 • Pengcheng Qiu, Chaoyi Wu, Shuyu Liu, Weike Zhao, Zhuoxia Chen, Hongfei Gu, Chuanjin Peng, Ya zhang, Yanfeng Wang, Weidi Xie
Notably, open-source models like DeepSeek-R1 are narrowing the gap with proprietary systems, highlighting their potential to drive accessible and equitable advancements in healthcare.
no code implementations • 5 Mar 2025 • YiQiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya zhang, Yanfeng Wang, Yu Wang
The reliability of large language models remains a critical challenge, particularly due to their susceptibility to hallucinations and factual inaccuracies during text generation.
no code implementations • 27 Feb 2025 • Jinghao Feng, Qiaoyu Zheng, Chaoyi Wu, Ziheng Zhao, Ya zhang, Yanfeng Wang, Weidi Xie
In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging.
no code implementations • 18 Feb 2025 • Haicheng Wang, Chen Ju, Weixiong Lin, Chaofan Ma, Shuai Xiao, Ya zhang, Yanfeng Wang
Temporal sentence grounding aims to detect event timestamps described by the natural language query from given untrimmed videos.
no code implementations • 5 Feb 2025 • Wenhao Wang, Zijie Yu, William Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang
To tackle these challenges, we propose FedMobileAgent, a collaborative framework that trains mobile agents using self-sourced data from diverse users.
1 code implementation • 24 Jan 2025 • JIA YU, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He
This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.
1 code implementation • 21 Jan 2025 • Shuyang Jiang, Yusheng Liao, Zhe Chen, Ya zhang, Yanfeng Wang, Yu Wang
In this work, we present a deployable, small-scale medical language model designed for long-chain reasoning in clinical tasks using a self-evolution paradigm.
no code implementations • 14 Jan 2025 • Benyuan Liu, Xu Chen, Yanfeng Wang, Ya zhang, Zhi Cao, Ivor Tsang
Node attributes, a crucial type of information for graph analysis, may be partially or completely missing for certain nodes in real-world applications.
no code implementations • 5 Jan 2025 • Zhe Chen, Yusheng Liao, Shuyang Jiang, Pingjie Wang, YiQiu Guo, Yanfeng Wang, Yu Wang
Large language models (LLMs) hold promise for addressing healthcare challenges but often generate hallucinations due to limited integration of medical knowledge.
2 code implementations • 17 Dec 2024 • Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ruifen Wang, LiFeng Wang, Xin Sun, Kun Sun, Ya zhang, Yanfeng Wang, Weidi Xie
To derive more nuanced image and text representations, we propose a novel knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups instead of unstructured image-text pairs.
no code implementations • 16 Dec 2024 • Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang
In this paper, we propose VRVVC, a novel end-to-end joint optimization variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance.
no code implementations • 15 Dec 2024 • Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang
Multimodal large language models (MLLMs) excel at multimodal perception and understanding, yet their tendency to generate hallucinated or inaccurate responses undermines their trustworthiness.
1 code implementation • 12 Dec 2024 • Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya zhang, Yanfeng Wang, Weidi Xie
Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains.
1 code implementation • 4 Dec 2024 • HaoNing Wu, Ziheng Zhao, Ya zhang, Weidi Xie, Yanfeng Wang
Medical image segmentation has recently demonstrated impressive progress with deep neural networks, yet the heterogeneous modalities and scarcity of mask annotations limit the development of segmentation models on unannotated modalities.
1 code implementation • 2 Dec 2024 • Jiayuan Rao, HaoNing Wu, Hao Jiang, Ya zhang, Yanfeng Wang, Weidi Xie
As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world.
1 code implementation • 2 Dec 2024 • Yikun Liu, Pingan Chen, Jiayin Cai, XiaoLong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie
With the rapid advancement of multimodal information retrieval, increasingly complex retrieval tasks have emerged.
1 code implementation • 24 Nov 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang
Night unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight-optimized methods demonstrating suboptimal performance in low-light conditions, limiting the utility of UAV applications.
1 code implementation • 2 Nov 2024 • Ziqing Fan, Shengchao Hu, YuHang Zhou, Li Shen, Ya zhang, Yanfeng Wang, DaCheng Tao
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction.
3 code implementations • 23 Oct 2024 • Yusheng Liao, Shuyang Jiang, Yanfeng Wang, Yu Wang
Large Language Models (LLMs) have shown promising potential in the medical domain, assisting with tasks like clinical note generation and patient communication.
1 code implementation • 18 Oct 2024 • Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen
Post-training is essential for enabling large language models (LLMs) to follow human instructions.
no code implementations • 15 Oct 2024 • Yaxin Du, Rui Ye, Fengting Yuchi, Wanru Zhao, Jingjing Qu, Yanfeng Wang, Siheng Chen
To address this gap, we propose a new framework of federated instruction tuning of LLMs with data quality control (FedDQC), which measures data quality to facilitate the subsequent filtering and hierarchical training processes.
1 code implementation • 8 Oct 2024 • Wenhao Wang, Xiaoyu Liang, Rui Ye, Jingyi Chai, Siheng Chen, Yanfeng Wang
The success of large language models (LLMs) has enabled many parties to fine-tune LLMs on their own private data.
1 code implementation • 4 Oct 2024 • Zetian Ouyang, Yishuai Qiu, LinLin Wang, Gerard de Melo, Ya zhang, Yanfeng Wang, Liang He
With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly.
2 code implementations • 29 Sep 2024 • Haolin Li, YuHang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya zhang, Yanfeng Wang
To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
3D Medical Imaging Segmentation
Medical Image Classification
2 code implementations • 25 Sep 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang
Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods and SAM- and SAM2-based trackers in challenging underwater environments.
no code implementations • 11 Sep 2024 • Rui Ye, Rui Ge, Yuchi Fengting, Jingyi Chai, Yanfeng Wang, Siheng Chen
Federated instruction tuning enables multiple clients to collaboratively fine-tune a shared large language model (LLM) that can follow humans' instructions without directly sharing raw data.
1 code implementation • 30 Aug 2024 • Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya zhang, Yanfeng Wang
Current computer-aided ECG diagnostic systems struggle with the underdetection of rare but critical cardiac anomalies due to the imbalanced nature of ECG datasets.
1 code implementation • 22 Aug 2024 • Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya zhang, Yanfeng Wang, Weidi Xie
To promote further advancements in the application of LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard for MedS-Bench, whose test set we plan to update regularly to track progress and enhance the adaptation of general LLMs to the medical domain.
1 code implementation • 20 Aug 2024 • HaoNing Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya zhang, Yanfeng Wang
Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.
1 code implementation • 16 Aug 2024 • Hongcheng Liu, Yusheng Liao, Siqv Ou, Yuhao Wang, Heyang Liu, Yanfeng Wang, Yu Wang
The application of Multi-modal Large Language Models (MLLMs) in clinical medical scenarios remains underexplored.
no code implementations • 30 Jul 2024 • Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang
How language, an information medium created by advanced organisms, is represented in the brain has long been a concern of neuroscience.
no code implementations • 23 Jul 2024 • Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li
To address these challenges, we initiate a series of work on grounded Automatic Report Generation (AutoRG), starting from the brain MRI interpretation system, which supports the delineation of brain structures, the localization of anomalies, and the generation of well-organized findings.
no code implementations • 18 Jul 2024 • Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang
Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost.
no code implementations • 12 Jul 2024 • Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya zhang, Yanfeng Wang
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission.
1 code implementation • 9 Jul 2024 • YuHang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya zhang, Yanfeng Wang
However, due to the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), as well as real-world computation and speed constraints, it might not be straightforward to apply medical foundation models in downstream scenarios.
1 code implementation • 26 Jun 2024 • Jiayuan Rao, HaoNing Wu, Chang Liu, Yanfeng Wang, Weidi Xie
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience.
3 code implementations • 25 Jun 2024 • Yusheng Liao, Shuyang Jiang, Zhe Chen, Yanfeng Wang, Yu Wang
Based on this two-stage paradigm, we propose a Medical LLM through decoupling Clinical Alignment and Knowledge Aggregation (MedCare), which is designed to achieve state-of-the-art (SOTA) performance on over 20 medical tasks, as well as SOTA results on specific medical alignment tasks.
3 code implementations • 24 Jun 2024 • Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models.
no code implementations • 18 Jun 2024 • Zhenyang Ni, Zixing Lei, Yifan Lu, Dingju Wang, Chen Feng, Yanfeng Wang, Siheng Chen
However, existing collaborative perception systems heavily rely on precise localization systems to establish a consistent spatial coordinate system between agents.
1 code implementation • 17 Jun 2024 • Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang
In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis.
no code implementations • 15 Jun 2024 • Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen
Federated learning (FL) enables multiple parties to collaboratively fine-tune a large language model (LLM) without the need for direct data sharing.
1 code implementation • 14 Jun 2024 • YuHang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya zhang, Yanfeng Wang
Training a unified model to take multiple targets into account is a trend towards artificial general intelligence.
1 code implementation • 13 Jun 2024 • Chaoqin Huang, Haoyan Guan, Aofan Jiang, Ya zhang, Michael Spratling, Xinchao Wang, Yanfeng Wang
At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features.
1 code implementation • 7 Jun 2024 • Feng Hong, Yueming Lyu, Jiangchao Yao, Ya zhang, Ivor W. Tsang, Yanfeng Wang
The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption.
3 code implementations • 7 Jun 2024 • Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen
Addressing this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM community.
1 code implementation • 30 May 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang
Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps.
1 code implementation • 30 May 2024 • Shuyang Jiang, Yusheng Liao, Ya zhang, Yanfeng Wang, Yu Wang
However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution.
1 code implementation • 29 May 2024 • Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya zhang, Yanfeng Wang
This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts.
1 code implementation • NeurIPS 2023 • Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya zhang, Yanfeng Wang
Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes a part of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms.
1 code implementation • 29 May 2024 • Ziqing Fan, Jiangchao Yao, Ruipeng Zhang, Lingjuan Lyu, Ya zhang, Yanfeng Wang
Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several explorations, e.g., FedProx, MOON and FedDyn, to alleviate this problem.
1 code implementation • 29 May 2024 • Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya zhang, Masashi Sugiyama, Yanfeng Wang
However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training.
1 code implementation • 28 May 2024 • Shengchao Hu, Ziqing Fan, Li Shen, Ya zhang, Yanfeng Wang, DaCheng Tao
However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance.
1 code implementation • CVPR 2024 • Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya zhang, Yanfeng Wang
Prior approaches to leverage such data mainly consider the application of uni-modal noisy label learning without amending the impact on both cross-modal and intra-modal geometrical structures in multimodal learning.
2 code implementations • 27 May 2024 • Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya zhang, Yanfeng Wang, DaCheng Tao
Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state.
1 code implementation • 24 May 2024 • Junkai Xia, Chenxin Xu, Qingyao Xu, Chen Xie, Yanfeng Wang, Siheng Chen
To produce interactive traffic trajectories, we propose a code-to-trajectory decoder with interaction-aware feature aggregation that synergizes vehicle interactions with the environmental map and the vehicle moves.
no code implementations • 23 May 2024 • Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya zhang, Yanfeng Wang
Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to facilitate volumetric videos.
5 code implementations • 23 May 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang
To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality.
no code implementations • 22 May 2024 • Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatele, Kaouther Mouhebe, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H. Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R. Roth
The correct interpretation of breast density is important in the assessment of breast cancer risk.
1 code implementation • 5 May 2024 • Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang
To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals.
1 code implementation • CVPR 2024 • YuHang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya zhang, Yanfeng Wang
The popularity of large-scale pre-training has promoted the development of medical foundation models.
1 code implementation • 25 Apr 2024 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya zhang, Yanfeng Wang, Weidi Xie
We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets.
no code implementations • 23 Apr 2024 • Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang
The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising.
1 code implementation • 18 Apr 2024 • Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang
The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALL·E 3, has revolutionized content creation across diverse sectors.
1 code implementation • 15 Apr 2024 • Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya zhang, Weidi Xie, Yanfeng Wang
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology.
2 code implementations • 13 Apr 2024 • Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang
Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field.
no code implementations • 7 Apr 2024 • Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya zhang, Yanfeng Wang
We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies.
1 code implementation • 26 Mar 2024 • Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya zhang, Yanfeng Wang
In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block.
no code implementations • 21 Mar 2024 • Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang
Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.
1 code implementation • CVPR 2024 • Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya zhang, Xinchao Wang, Yanfeng Wang
Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains.
no code implementations • CVPR 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya zhang, Yanfeng Wang
NFs, temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.
3 code implementations • 13 Mar 2024 • Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang
Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored.
no code implementations • 11 Mar 2024 • Shuo Tang, Rui Ye, Chenxin Xu, Xiaowen Dong, Siheng Chen, Yanfeng Wang
In this paper, we propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
no code implementations • 7 Mar 2024 • Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang
In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research.
no code implementations • 1 Mar 2024 • Heyang Liu, Yu Wang, Yanfeng Wang
The end-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks.
Automatic Speech Recognition
no code implementations • 28 Feb 2024 • Yusheng Liao, Yanfeng Wang, Yu Wang
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).
1 code implementation • 21 Feb 2024 • Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya zhang, Yanfeng Wang, Weidi Xie
The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions.
no code implementations • 19 Feb 2024 • Hongcheng Liu, Pingjie Wang, Yu Wang, Yanfeng Wang
Video-grounded dialogue generation (VDG) requires the system to generate a fluent and accurate answer based on multimodal knowledge.
no code implementations • 18 Feb 2024 • YiQiu Guo, Yuchen Yang, Ya zhang, Yu Wang, Yanfeng Wang
Structured data offers a sophisticated mechanism for the organization of information.
3 code implementations • 10 Feb 2024 • Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, Siheng Chen
Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields.
1 code implementation • CVPR 2024 • Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang
Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering.
no code implementations • 8 Feb 2024 • Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation.
1 code implementation • 25 Jan 2024 • Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen
In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost?
1 code implementation • 23 Jan 2024 • Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang
In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation.
1 code implementation • 15 Jan 2024 • Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang
We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception.
1 code implementation • CVPR 2024 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.
1 code implementation • 28 Dec 2023 • Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
Our main contributions are threefold: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; then, we build up the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans from 72 segmentation datasets, across 497 classes, with careful standardization on both image scans and label space; (ii) for architecture design, we propose to inject medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating superior or comparable performance to 72 specialist models, i.e., nnU-Nets, U-Mamba or SwinUNETR, trained on each dataset/subset.
1 code implementation • 26 Dec 2023 • Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Lisong Dai, Hengyu Guan, Yuehua Li, Ya zhang, Yanfeng Wang, Weidi Xie
Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics.
no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya zhang, Yanfeng Wang, Weidi Xie
In this paper, we aim to establish an automatic, scalable pipeline for denoising the large-scale instructional dataset and construct a high-quality video-text dataset with multiple descriptive steps supervision, named HowToStep.
no code implementations • 20 Dec 2023 • Yan Cai, LinLin Wang, Ye Wang, Gerard de Melo, Ya zhang, Yanfeng Wang, Liang He
The emergence of various large language models (LLMs) in the medical domain has highlighted the need for unified evaluation standards, as manual evaluation of LLMs proves to be time-consuming and labor-intensive.
1 code implementation • 18 Dec 2023 • Zexi Liu, Bohan Tang, Ziyuan Ye, Xiaowen Dong, Siheng Chen, Yanfeng Wang
Hypergraphs play a pivotal role in the modelling of data featuring higher-order relations involving more than two entities.
1 code implementation • 18 Dec 2023 • Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya zhang, Yanfeng Wang
Vision-Language Pre-training (VLP), which utilizes multi-modal information to improve training efficiency and effectiveness, has achieved great success in visual recognition of natural domains and shown promise in medical imaging diagnosis for chest X-rays (CXRs).
no code implementations • 10 Dec 2023 • Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang
FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from the global model to the local model based on the generated data to mitigate over-fitting to the original heterogeneous dataset.
no code implementations • 10 Dec 2023 • Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang
In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
1 code implementation • NeurIPS 2023 • Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya zhang, Bo Han, Yanfeng Wang
Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios.
1 code implementation • 15 Oct 2023 • Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya zhang, Yanfeng Wang, Weidi Xie
Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.
no code implementations • 7 Oct 2023 • Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang
In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
no code implementations • 26 Sep 2023 • Hongcheng Liu, Zhe Chen, Hui Li, Pingjie Wang, Yanfeng Wang, Yu Wang
Generating dialogue grounded in videos requires a high level of understanding and reasoning about the visual scenes in the videos.
1 code implementation • 13 Sep 2023 • Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya zhang, Yanfeng Wang
Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed.
no code implementations • 5 Sep 2023 • Yusheng Liao, Yutong Meng, Hongcheng Liu, Yanfeng Wang, Yu Wang
A medical consultation training set is further constructed to improve the consultation ability of LLMs.
1 code implementation • NeurIPS 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya zhang, Yanfeng Wang
The results show the superior performance of attribute decomposition-aggregation.
1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.
1 code implementation • ICCV 2023 • Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang
To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies.
Ranked #4 on Human Pose Forecasting on Human3.6M
no code implementations • 17 Aug 2023 • Feng Hong, Tianjie Dai, Jiangchao Yao, Ya zhang, Yanfeng Wang
Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature.
no code implementations • 9 Aug 2023 • Chaoqin Huang, Aofan Jiang, Ya zhang, Yanfeng Wang
Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection.
1 code implementation • ICCV 2023 • Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya zhang, Yanfeng Wang
Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.
1 code implementation • 4 Aug 2023 • Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
In this study, we aim to initiate the development of a Radiology Foundation Model, termed RadFM.
1 code implementation • 3 Aug 2023 • Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya zhang, Yanfeng Wang
To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics.
1 code implementation • 3 Aug 2023 • YuHang Zhou, Jiangchao Yao, Feng Hong, Ya zhang, Yanfeng Wang
By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction.
no code implementations • 25 Jul 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya zhang
The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.
no code implementations • 7 Jul 2023 • Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang
This approach achieves feature integration in a unified backbone, removing the need for carefully-designed fusion modules and resulting in a more effective and efficient VL tracking framework.
no code implementations • 5 Jul 2023 • Yuhuan Yang, Chaofan Ma, Chen Ju, Fei Zhang, Jiangchao Yao, Ya zhang, Yanfeng Wang
To be specific, unlike the straightforward combination of bi-modal clues, we decompose the high-level language information as multi-aspect prototypes and aggregate the low-level visual information as more semantic prototypes, on the basis of which a fine-grained complementary fusion makes the multi-modal prototypes more powerful and accurate to promote the prediction.
1 code implementation • 24 Jun 2023 • HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya zhang, Yanfeng Wang
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
1 code implementation • 12 Jun 2023 • Yikun Liu, Jiangchao Yao, Ya zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
Ranked #6 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRR
no code implementations • 9 Jun 2023 • Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian
Specifically, a Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.
1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
1 code implementation • 30 May 2023 • Rui Ye, Mingkai Xu, Jianyu Wang, Chenxin Xu, Siheng Chen, Yanfeng Wang
However, based on our empirical observations and theoretical analysis, we find that the dataset size alone is not an optimal indicator, and that the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights.
2 code implementations • 17 May 2023 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya zhang, Yanfeng Wang, Weidi Xie
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images.
Ranked #1 on Medical Visual Question Answering on PMC-VQA
1 code implementation • 27 Apr 2023 • Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.
1 code implementation • CVPR 2023 • Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang
Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.
no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of temporal action localization under low-shot (zero-shot & few-shot) scenarios, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even those not seen at training time.
2 code implementations • CVPR 2023 • Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang
In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle.
Ranked #1 on Human Pose Forecasting on HARPER
1 code implementation • CVPR 2023 • Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, Yanfeng Wang
The core of the proposed LED is to leverage a trainable leapfrog initializer to directly learn an expressive multi-modal distribution of future trajectories, which skips a large number of denoising steps, significantly accelerating inference speed.
no code implementations • 19 Mar 2023 • Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya zhang
Interactive segmentation has recently been explored to effectively and efficiently harvest high-quality segmentation masks by iteratively incorporating user hints.
no code implementations • 17 Mar 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya zhang, Yanfeng Wang
However, challenges exist because a structural difference between generative and discriminative models limits their direct use.
no code implementations • CVPR 2023 • Shaoheng Fang, Zi Wang, Yiqi Zhong, Junhao Ge, Siheng Chen, Yanfeng Wang
Second, a spatial-temporal pyramid transformer is introduced to comprehensively extract multi-scale BEV features and predict future BEV states with the support of spatial-temporal priors.
Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)
1 code implementation • CVPR 2023 • Zhixin Wang, Xiaoyun Zhang, Ziying Zhang, Huangjie Zheng, Mingyuan Zhou, Ya zhang, Yanfeng Wang
However, it is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
2 code implementations • 13 Mar 2023 • Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya zhang, Yanfeng Wang, Weidi Xie
Foundation models trained on large-scale datasets have recently surged in CV and NLP.
Ranked #3 on Medical Visual Question Answering on PMC-VQA
1 code implementation • 27 Feb 2023 • Xiaoman Zhang, Chaoyi Wu, Ya zhang, Yanfeng Wang, Weidi Xie
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.
no code implementations • 22 Feb 2023 • Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya zhang, Weidi Xie
In this paper, we consider the problem of disease diagnosis.
no code implementations • 20 Feb 2023 • Zihan Zhao, Yu Wang, Yanfeng Wang
Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.
1 code implementation • 10 Feb 2023 • Feng Hong, Jiangchao Yao, Zhihan Zhou, Ya zhang, Yanfeng Wang
The straightforward combination of LT and PLL, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, and the performance of PLL is severely influenced in a long-tailed context.
1 code implementation • ICCV 2023 • Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of a segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.
no code implementations • 9 Jan 2023 • Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya zhang
The branch aims to solve a closely related task on the LN station level, i.e., classifying whether an LN station contains metastatic LN or not, so as to learn representations for LN stations.
no code implementations • 5 Jan 2023 • Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.
no code implementations • ICCV 2023 • Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.
1 code implementation • CVPR 2023 • Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya zhang, Qi Tian, Yanfeng Wang
Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients possibly with domain shift.
no code implementations • CVPR 2023 • Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya zhang, Jianlong Chang, Yanfeng Wang, Qi Tian
As a result, the dual-branch complementarity is effectively fused to promote a strong alliance.
Weakly-supervised Temporal Action Localization
1 code implementation • 14 Dec 2022 • Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya zhang, Qi Tian
However, in addition to previous explorations for improvement in federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models in more heterogeneous conditions.
1 code implementation • 14 Nov 2022 • Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, Yanfeng Wang
Collaborative 3D object detection exploits information exchange among multiple agents to enhance the accuracy of object detection in the presence of sensor impairments such as occlusion.
no code implementations • 31 Oct 2022 • Enpei Zhang, Shuo Tang, Xiaowen Dong, Siheng Chen, Yanfeng Wang
To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to collaborators' models for better performance.
1 code implementation • 27 Oct 2022 • Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya zhang, Weidi Xie
When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.
no code implementations • 18 Oct 2022 • Yangheng Zhao, Jun Wang, Xiaolong Li, Yue Hu, Ce Zhang, Yanfeng Wang, Siheng Chen
Instead of learning a single prototype for each class, in this paper, we propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.
Ranked #19 on 3D Semantic Segmentation on SemanticKITTI
no code implementations • 7 Oct 2022 • Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya zhang, Yanfeng Wang
Existing super-resolution models are often specialized for one scale, fundamentally limiting their use in practical scenarios.
no code implementations • 23 Aug 2022 • Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian
Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.
1 code implementation • 11 Jul 2022 • Zihan Zhao, Yanfeng Wang, Yu Wang
The research and applications of multimodal emotion recognition have become increasingly popular recently.
1 code implementation • 11 Jul 2022 • Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya zhang, Siheng Chen
Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.
1 code implementation • 29 Jun 2022 • Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang
To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).
1 code implementation • 14 Jun 2022 • Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya zhang
This paper considers the problem of undersampled MRI reconstruction.
1 code implementation • 25 May 2022 • Zhihan Zhou, Jiangchao Yao, Yanfeng Wang, Bo Han, Ya zhang
Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method.
no code implementations • 13 May 2022 • Chaoqin Huang, Qinwei Xu, Yanfeng Wang, Yu Wang, Ya zhang
To extend the reconstruction-based anomaly detection architecture to localized anomalies, we propose a self-supervised learning approach through random masking and then restoring, named Self-Supervised Masking (SSM), for unsupervised anomaly detection and localization.
1 code implementation • 7 Dec 2021 • Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng Wang
Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images.
no code implementations • 7 Sep 2021 • Xiaoman Zhang, Weidi Xie, Chaoqin Huang, Yanfeng Wang, Ya zhang, Xin Chen, Qi Tian
In this paper, we target self-supervised representation learning for zero-shot tumor segmentation.
no code implementations • 25 Aug 2021 • Maosen Li, Siheng Chen, Yangheng Zhao, Ya zhang, Yanfeng Wang, Qi Tian
The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.
no code implementations • 11 Aug 2021 • Hao Wu, Jiangchao Yao, Ya zhang, Yanfeng Wang
Learning with noisy labels has gained enormous interest in the robust deep learning area.
no code implementations • 5 Aug 2021 • Shixiang Feng, YuHang Zhou, Xiaoman Zhang, Ya zhang, Yanfeng Wang
A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network.
1 code implementation • CVPR 2021 • Qinwei Xu, Ruipeng Zhang, Ya zhang, Yanfeng Wang, Qi Tian
Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data.
no code implementations • ICCV 2021 • Ruolin Ye, Wenqiang Xu, Zhendong Xue, Tutian Tang, Yanfeng Wang, Cewu Lu
Besides, we also report the hand and object pose errors with existing baselines and show that the dataset can serve as the video demonstrations for robot imitation learning on the handover task.
no code implementations • 31 Mar 2021 • Hao Wu, Jiangchao Yao, Jiajie Wang, Yinru Chen, Ya zhang, Yanfeng Wang
Deep neural networks (DNNs) have the capacity to fit extremely noisy labels; nonetheless, they tend to learn data with clean labels first and then memorize those with noisy labels.
no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya zhang, Yanfeng Wang, Qi Tian
Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
1 code implementation • LREC 2022 • Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen
Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.
no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya zhang, Yanfeng Wang, Qi Tian
Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Ranked #3 on Weakly Supervised Action Localization on BEOID
no code implementations • 18 Nov 2020 • Peisen Zhao, Lingxi Xie, Ya zhang, Yanfeng Wang, Qi Tian
Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.
Ranked #11 on Online Action Detection on TVSeries
no code implementations • 13 Oct 2020 • Xiaoman Zhang, Shixiang Feng, YuHang Zhou, Ya zhang, Yanfeng Wang
We demonstrate the effectiveness of our methods on two downstream tasks: i) Brain tumor segmentation, ii) Pancreas tumor segmentation.
no code implementations • 26 Jun 2019 • Yifeng Li, Lingxi Xie, Ya zhang, Rui Zhang, Yanfeng Wang, Qi Tian
Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.