no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of temporally aligning video and text from instructional videos: given a long-term video and its associated text sentences, our goal is to determine their corresponding timestamps in the video.
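(The paper's alignment model is not reproduced here; the snippet below is only a rough sketch of the task interface, assuming hypothetical precomputed frame and sentence embeddings, where each sentence is matched to the video window it is most similar to.)

```python
import numpy as np

def align_sentences_to_video(frame_emb, sent_emb, window=8):
    """Toy alignment: for each sentence, pick the start frame whose
    window of frames is most similar to the sentence embedding.
    frame_emb: (T, D) per-frame features; sent_emb: (S, D) per-sentence features.
    Returns one (start, end) frame span per sentence."""
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    sim = s @ f.T                                   # (S, T) cosine similarities
    spans = []
    for row in sim:
        # score each candidate window by its mean frame similarity
        scores = [row[t:t + window].mean()
                  for t in range(max(1, len(row) - window + 1))]
        start = int(np.argmax(scores))
        spans.append((start, start + window))
    return spans
```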
no code implementations • 12 Dec 2023 • Chen Ju, Haicheng Wang, Zeqian Li, Xu Chen, Zhonghua Zhai, Weilin Huang, Shuai Xiao
Vision-Language Large Models (VLMs) have become a primary backbone of AI due to their impressive performance.
no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot and few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, including categories not seen at training time.
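(Purely as an illustrative sketch of the zero-shot setting, not the paper's method: a candidate segment can be classified by comparing its visual embedding against text embeddings of category names, CLIP-style. All names and shapes here are assumptions.)

```python
import numpy as np

def classify_segment_zero_shot(segment_emb, class_name_embs, class_names):
    """Assign a candidate video segment to the action category whose
    text embedding is most similar (cosine), without any training
    examples of that category."""
    seg = segment_emb / np.linalg.norm(segment_emb)
    txt = class_name_embs / np.linalg.norm(class_name_embs, axis=1, keepdims=True)
    scores = txt @ seg                      # one similarity score per category
    best = int(np.argmax(scores))
    return class_names[best], float(scores[best])
```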
no code implementations • 10 Nov 2022 • Zeqian Li, Keyu Qiu, Chenxu Jiao, Wen Zhu, Haoran Tang
This paper describes a French dialect recognition system that distinguishes between different regional French dialects.
no code implementations • 27 Apr 2022 • Zeqian Li, Yuwei Wang, Kexun Chen, Zhibin Yu
To demonstrate the practicality of the pruning method, we select the YOLOv5 model for experiments and provide a dataset of outdoor obstacles to show the effect of the pruned model.
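(The paper's specific pruning criterion is not shown here; a minimal sketch of generic magnitude-based channel pruning in PyTorch, with a hypothetical keep ratio, might look like the following.)

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep only the output channels of a conv layer with the largest
    L1 weight norms; a generic magnitude-based pruning step."""
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2, 3))   # one norm per output channel
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        keep = torch.topk(norms, n_keep).indices.sort().values
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned
```

In a full detector such as YOLOv5, the layers consuming this output would also need their input channels adjusted by propagating the kept indices through the network.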
1 code implementation • 9 Sep 2021 • Zeqian Li, Xinlu He, Jacob Whitehill
We consider a novel clustering task in which clusters can have compositional relationships, e.g., one cluster contains images of rectangles, one contains images of circles, and a third (compositional) cluster contains images with both objects.
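(Not the paper's algorithm; just a toy illustration of the cluster structure it targets, where each cluster is labeled by a set of primitive concepts and a compositional cluster is the union of primitive label sets. The concept names are made up.)

```python
# each cluster is described by the set of primitive concepts its images contain
clusters = {
    "A": {"rectangle"},                # images containing only rectangles
    "B": {"circle"},                   # images containing only circles
    "C": {"rectangle", "circle"},      # compositional: both objects present
}

def is_compositional(label, primitive_sets):
    """A cluster is compositional if its label set equals the union of
    two distinct primitive clusters' label sets."""
    return any(label == primitive_sets[i] | primitive_sets[j]
               for i in range(len(primitive_sets))
               for j in range(i + 1, len(primitive_sets)))

primitive_sets = [clusters["A"], clusters["B"]]
print(is_compositional(clusters["C"], primitive_sets))   # True
```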
1 code implementation • 22 Oct 2020 • Zeqian Li, Jacob Whitehill
We propose a new method for speaker diarization that can handle overlapping speech from two or more speakers.
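(The linked code contains the authors' actual model; the fragment below is only a sketch of why overlap-aware diarization needs multi-label, per-frame output rather than a single speaker per frame. The threshold and speaker count are illustrative assumptions.)

```python
import numpy as np

def frames_to_speaker_sets(posteriors, threshold=0.5):
    """Convert per-frame speaker posteriors of shape (T, num_speakers)
    into a per-frame set of active speakers; several speakers can be
    active in the same frame when speech overlaps."""
    return [set(np.flatnonzero(frame >= threshold).tolist())
            for frame in posteriors]

# e.g. in the second frame, speakers 0 and 1 talk simultaneously
post = np.array([[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]])
print(frames_to_speaker_sets(post))   # [{0}, {0, 1}, {1}]
```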
no code implementations • 11 Feb 2020 • Zeqian Li, Michael C. Mozer, Jacob Whitehill
We present a compositional embedding framework that infers not just a single class per input image, but a set of classes, in the setting of one-shot learning.
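(Not the paper's architecture; only a sketch of the underlying idea that a query embedding is scored against compositions of one-shot reference embeddings. Using the mean as the composition function is an assumption made here for illustration.)

```python
import itertools
import numpy as np

def infer_class_set(query_emb, ref_embs, max_set_size=2):
    """Score every candidate subset of reference classes by composing
    their one-shot reference embeddings (here: the mean) and comparing
    to the query; return the best-scoring subset.
    ref_embs: dict mapping class name -> embedding vector."""
    names = list(ref_embs)
    q = query_emb / np.linalg.norm(query_emb)
    best_set, best_score = None, -np.inf
    for k in range(1, max_set_size + 1):
        for subset in itertools.combinations(names, k):
            comp = np.mean([ref_embs[n] for n in subset], axis=0)
            comp = comp / np.linalg.norm(comp)
            score = float(q @ comp)
            if score > best_score:
                best_set, best_score = set(subset), score
    return best_set, best_score
```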
no code implementations • 25 Sep 2019 • Zeqian Li, Jacob Whitehill
We explore the idea of compositional set embeddings that can be used to infer not just a single class, but the set of classes associated with the input data (e.g., image, video, audio signal).