no code implementations • 14 Oct 2024 • Zhengwei Yang, Yuke Li, Qiang Sun, Basura Fernando, Heng Huang, Zheng Wang
Most existing studies on few-shot learning focus on unimodal settings, where models are trained to generalize on unseen data using only a small number of labeled examples from the same modality.
no code implementations • 15 Aug 2024 • Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu
To address this issue, we introduce an effective framework and a novel learning task named cross-modal denoising (CMD) that enhances cross-modal interaction and achieves finer-level cross-modal alignment.
no code implementations • 15 Aug 2024 • Lifeng Zhou, Yuke Li
We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment.
1 code implementation • CVPR 2024 • Xinyao Li, Yuke Li, Zhekai Du, Fengling Li, Ke Lu, Jingjing Li
In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation.
no code implementations • 20 Feb 2024 • Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellotti, Donglai Wei
Few-shot action recognition aims to quickly adapt a pre-trained model to novel data with a distribution shift using only a limited number of samples.
1 code implementation • 10 Jan 2024 • Qian Wu, Ruoxuan Cui, Yuke Li, Haoqi Zhu
Action recognition in videos poses a challenge due to its high computational cost, especially for Joint Space-Time video transformers (Joint VT).
no code implementations • 22 Dec 2023 • Yuke Li, Lixiong Chen, Guangyi Chen, Ching-Yao Chan, Kun Zhang, Stefano Anzellotti, Donglai Wei
In order to predict a pedestrian's trajectory in a crowd accurately, one has to take into account her/his underlying socio-temporal interactions with other pedestrians consistently.
no code implementations • 1 Nov 2023 • Zhanwen Liu, Nan Yang, Yang Wang, Yuke Li, Xiangmo Zhao, Fei-Yue Wang
To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection.
no code implementations • 26 Oct 2023 • Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie
This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions.
no code implementations • 27 Sep 2023 • Yuhang Liu, Boyi Sun, Yuke Li, Yuzheng Hu, Fei-Yue Wang
It uses a graph-attention Transformer to extract domain-specific features for each agent, coupled with a cross-attention mechanism for the final fusion.
1 code implementation • 15 Sep 2023 • Rui Deng, Qian Wu, Yuke Li, Haoran Fu
To address these issues, we propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations.
1 code implementation • ICCV 2023 • Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song
Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.
Domain Generalization • Generalizable Person Re-identification
no code implementations • 1 Jun 2023 • Yuting Yang, Yuke Li, Binbin Du
Specifically, the top-layer hidden representations of the streaming and non-streaming modes at the same frame are regarded as a positive pair, encouraging the representation of the streaming mode to be close to its non-streaming counterpart.
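The positive-pair idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE-style loss, the use of other frames as negatives, the cosine similarity, the temperature value, and every name below (`cosine`, `frame_contrastive_loss`, the toy vectors) are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def frame_contrastive_loss(streaming, non_streaming, temperature=0.1):
    """Mean InfoNCE-style loss over frames.

    streaming[t] and non_streaming[t] form the positive pair for frame t.
    Assumption (not stated in the source): non-streaming representations
    at other frames serve as negatives.
    """
    total = 0.0
    for t, anchor in enumerate(streaming):
        logits = [cosine(anchor, target) / temperature for target in non_streaming]
        # Numerically stable log-sum-exp over all candidate frames.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        # Negative log-probability of picking the same-frame counterpart.
        total += log_denom - logits[t]
    return total / len(streaming)


# Toy top-layer representations for three frames (hypothetical values).
non_streaming = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]   # streaming close to its counterpart
shuffled = [aligned[1], aligned[2], aligned[0]]    # streaming frames misaligned

loss_aligned = frame_contrastive_loss(aligned, non_streaming)
loss_shuffled = frame_contrastive_loss(shuffled, non_streaming)
```

Under these assumptions the loss is lower when each streaming frame resembles its same-frame non-streaming counterpart than when the frames are misaligned, which is the behaviour the abstract describes as the training signal.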
1 code implementation • 10 Nov 2022 • Rui Deng, Qian Wu, Yuke Li
In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning.
no code implementations • 25 May 2022 • Yuting Yang, Yuke Li, Binbin Du
CTC-based automatic speech recognition (ASR) models without an external language model usually lack the capacity to model conditional dependencies and textual interactions.
Automatic Speech Recognition (ASR)
no code implementations • 24 May 2022 • Yuting Yang, Binbin Du, Yuke Li
Thus, using only written Chinese characters as modeling units is insufficient to capture speech features.
Automatic Speech Recognition (ASR)
no code implementations • 3 Dec 2021 • Yuting Yang, Binbin Du, Yingxin Zhang, Wenxuan Wang, Yuke Li
We propose a Mandarin keyword spotting (KWS) system with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism and the introduction of syllable modeling units (S).
Automatic Speech Recognition (ASR)
no code implementations • 1 Dec 2021 • Yue Wang, Xu Jia, Lu Zhang, Yuke Li, James Elder, Huchuan Lu
TFFM conducts a sufficient feature fusion by integrating features from multiple scales and two modalities over all positions simultaneously.
no code implementations • 29 Sep 2021 • Yuke Li, Kenneth Li, Pin Wang, Donglai Wei, Hanspeter Pfister, Ching-Yao Chan
Non-stationary causal structures are prevalent in real-world physical systems.
no code implementations • 3 Jul 2020 • Yue Wang, Yuke Li, James H. Elder, Huchuan Lu, Runmin Wu, Lu Zhang
Evaluation on seven RGB-D datasets demonstrates that, even without saliency ground truth for RGB-D datasets and using only the RGB data of RGB-D datasets at inference, our semi-supervised system performs favorably against state-of-the-art fully-supervised RGB-D saliency detection methods, which use saliency ground truth for RGB-D datasets at training and depth data at inference, on the two largest testing datasets.
no code implementations • 27 Nov 2019 • Yue Wang, Yuke Li, James H. Elder, Runmin Wu, Huchuan Lu
We address this problem by introducing a Class-Conditional Domain Adaptation method (CCDA).
no code implementations • CVPR 2019 • Yuke Li
A policy is then generated by taking the sampled latent decision into account to predict the future.