Search Results for author: Yuke Li

Found 22 papers, 5 papers with code

Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework

no code implementations • 14 Oct 2024 • Zhengwei Yang, Yuke Li, Qiang Sun, Basura Fernando, Heng Huang, Zheng Wang

Most existing studies on few-shot learning focus on unimodal settings, where models are trained to generalize on unseen data using only a small number of labeled examples from the same modality.

Few-Shot Learning Transfer Learning

Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

no code implementations • 15 Aug 2024 • Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

To address this issue, we introduce an effective framework and a novel learning task named cross-modal denoising (CMD) to enhance cross-modal interaction to achieve finer-level cross-modal alignment.

cross-modal alignment Denoising +2

Coarse-to-fine Alignment Makes Better Speech-image Retrieval

no code implementations • 15 Aug 2024 • Lifeng Zhou, Yuke Li

We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment.

cross-modal alignment Image Retrieval +1

Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

no code implementations • 20 Feb 2024 • Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellotti, Donglai Wei

Few-shot action recognition aims at quickly adapting a pre-trained model to novel data with a distribution shift using only a limited number of samples.

Decoder Few-Shot action recognition +3

HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition

1 code implementation • 10 Jan 2024 • Qian Wu, Ruoxuan Cui, Yuke Li, Haoqi Zhu

Action recognition in videos poses a challenge due to its high computational cost, especially for Joint Space-Time video transformers (Joint VT).

Action Recognition In Videos Token Reduction +1

Learning Socio-Temporal Graphs for Multi-Agent Trajectory Prediction

no code implementations • 22 Dec 2023 • Yuke Li, Lixiong Chen, Guangyi Chen, Ching-Yao Chan, Kun Zhang, Stefano Anzellotti, Donglai Wei

In order to predict a pedestrian's trajectory in a crowd accurately, one has to take into account her/his underlying socio-temporal interactions with other pedestrians consistently.

Trajectory Prediction

Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion

no code implementations • 1 Nov 2023 • Zhanwen Liu, Nan Yang, Yang Wang, Yuke Li, Xiangmo Zhao, Fei-Yue Wang

To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection.

Object object-detection +2

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

no code implementations • 26 Oct 2023 • Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions.

Contrastive Learning Expressive Speech Synthesis

HPL-ViT: A Unified Perception Framework for Heterogeneous Parallel LiDARs in V2V

no code implementations • 27 Sep 2023 • Yuhang Liu, Boyi Sun, Yuke Li, Yuzheng Hu, Fei-Yue Wang

It uses a graph-attention Transformer to extract domain-specific features for each agent, coupled with a cross-attention mechanism for the final fusion.

Autonomous Driving Diversity +1

Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

1 code implementation • 15 Sep 2023 • Rui Deng, Qian Wu, Yuke Li, Haoran Fu

To address these issues, we propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations.

Retrieval Video Classification +1

Part-Aware Transformer for Generalizable Person Re-identification

1 code implementation • ICCV 2023 • Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song

Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.

Domain Generalization Generalizable Person Re-identification

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

no code implementations • 1 Jun 2023 • Yuting Yang, Yuke Li, Binbin Du

Specifically, the top-layer hidden representations at the same frame of the streaming and non-streaming modes are regarded as a positive pair, encouraging the representation of the streaming mode to be close to its non-streaming counterpart.

Contrastive Learning speech-recognition +1

3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval

1 code implementation • 10 Nov 2022 • Rui Deng, Qian Wu, Yuke Li

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning.

Retrieval Self-Supervised Learning +4

Improving CTC-based ASR Models with Gated Interlayer Collaboration

no code implementations • 25 May 2022 • Yuting Yang, Yuke Li, Binbin Du

CTC-based automatic speech recognition (ASR) models without an external language model usually lack the capacity to model conditional dependencies and textual interactions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

no code implementations • 24 May 2022 • Yuting Yang, Binbin Du, Yuke Li

Thus only considering the writing of Chinese characters as modeling units is insufficient to capture speech features.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

BBS-KWS: The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge

no code implementations • 3 Dec 2021 • Yuting Yang, Binbin Du, Yingxin Zhang, Wenxuan Wang, Yuke Li

We propose a Mandarin keyword spotting (KWS) system with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism and the introduction of syllable modeling units (S).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Transformer-based Network for RGB-D Saliency Detection

no code implementations • 1 Dec 2021 • Yue Wang, Xu Jia, Lu Zhang, Yuke Li, James Elder, Huchuan Lu

TFFM conducts a sufficient feature fusion by integrating features from multiple scales and two modalities over all positions simultaneously.

Saliency Detection

Synergistic saliency and depth prediction for RGB-D saliency detection

no code implementations • 3 Jul 2020 • Yue Wang, Yuke Li, James H. Elder, Huchuan Lu, Runmin Wu, Lu Zhang

Evaluation on seven RGB-D datasets demonstrates that, even without saliency ground truth for RGB-D datasets and using only the RGB data of RGB-D datasets at inference, our semi-supervised system performs favorably against state-of-the-art fully-supervised RGB-D saliency detection methods that use saliency ground truth for RGB-D datasets at training and depth data at inference on the two largest testing datasets.

Depth Estimation Depth Prediction +1

Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes

no code implementations • CVPR 2019 • Yuke Li

A policy is then generated by taking the sampled latent decision into account to predict the future.
