1 code implementation • 9 Feb 2025 • Xudong Yang, Yizhang Zhu, Nan Tang, Yuyu Luo
Conventional multi-modal multi-label emotion recognition (MMER) from videos typically assumes full availability of visual, textual, and acoustic modalities.
no code implementations • 30 Jan 2025 • Yejing Wang, Chi Zhang, Xiangyu Zhao, Qidong Liu, Maolin Wang, Xuewei Tao, Zitao Liu, Xing Shi, Xudong Yang, Ling Zhong, Wei Lin
Delivering superior search services is crucial for enhancing customer experience and driving revenue growth.
1 code implementation • 26 Dec 2024 • Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo
To effectively train AskChart, we design a three-stage training strategy to align visual and textual modalities for learning robust visual-textual representations and optimizing the learning of the MoE layer.
no code implementations • 20 Sep 2023 • Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei zhang, Xiaobo Zhang, Yuan Cheng, Lei Yang, Qing Wang, Furong Xu, Tan Pan, Wei Chu
SSAN is based on two newly proposed modules in video retrieval: (1) An efficient Self-supervised Keyframe Extraction (SKE) module to reduce redundant frame features, (2) A robust Similarity Pattern Detection (SPD) module for temporal alignment.
1 code implementation • CVPR 2023 • Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu
For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data.
2 code implementations • 23 Nov 2022 • Sifeng He, Yue He, Minlong Lu, Chen Jiang, Xudong Yang, Feng Qian, Xiaobo Zhang, Lei Yang, Jiandong Zhang
Previous methods typically start from frame-to-frame similarity matrix generated by cosine similarity between frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on similarity matrix under temporal constraints.
1 code implementation • CVPR 2022 • Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, Jingxiong Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo Zhang, Lei Yang
In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset.