Search Results for author: Xudong Yang

Found 8 papers, 5 papers with code

RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition

1 code implementation • 9 Feb 2025 • Xudong Yang, Yizhang Zhu, Nan Tang, Yuyu Luo

Conventional multi-modal multi-label emotion recognition (MMER) from videos typically assumes full availability of visual, textual, and acoustic modalities.

Contrastive Learning, Emotion Recognition (+1 more)

Behavior Modeling Space Reconstruction for E-Commerce Search

No code implementations • 30 Jan 2025 • Yejing Wang, Chi Zhang, Xiangyu Zhao, Qidong Liu, Maolin Wang, Xuewei Tao, Zitao Liu, Xing Shi, Xudong Yang, Ling Zhong, Wei Lin

Delivering superior search services is crucial for enhancing customer experience and driving revenue growth.

AskChart: Universal Chart Understanding through Textual Enhancement

1 code implementation • 26 Dec 2024 • Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo

To effectively train AskChart, we design a three-stage training strategy to align visual and textual modalities for learning robust visual-textual representations and optimizing the learning of the MoE layer.

Chart Understanding

Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval

No code implementations • 20 Sep 2023 • Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei Zhang, Xiaobo Zhang, Yuan Cheng, Lei Yang, Qing Wang, Furong Xu, Tan Pan, Wei Chu

SSAN is based on two newly proposed modules for video retrieval: (1) an efficient Self-supervised Keyframe Extraction (SKE) module that reduces redundant frame features, and (2) a robust Similarity Pattern Detection (SPD) module for temporal alignment.

Retrieval, Video Retrieval

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

1 code implementation • CVPR 2023 • Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian, Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu

For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data.

Image Retrieval, Retrieval

TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision

2 code implementations • 23 Nov 2022 • Sifeng He, Yue He, Minlong Lu, Chen Jiang, Xudong Yang, Feng Qian, Xiaobo Zhang, Lei Yang, Jiandong Zhang

Previous methods typically start from a frame-to-frame similarity matrix, computed as the cosine similarity between frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on that matrix under temporal constraints.
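The baseline described in this snippet can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not TransVCL's method: the frame features are toy vectors, and `diagonal_runs` is a hypothetical, deliberately naive stand-in for boundary detection that only thresholds runs along slope-1 diagonals (i.e. it assumes the copied segment plays at the same speed in both videos).

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit length (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def similarity_matrix(query: List[Vector], reference: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between query frame i and reference frame j."""
    q = [l2_normalize(f) for f in query]
    r = [l2_normalize(f) for f in reference]
    return [[sum(a * b for a, b in zip(qi, rj)) for rj in r] for qi in q]


def diagonal_runs(sim: List[List[float]], threshold: float = 0.9,
                  min_len: int = 2) -> List[Tuple[int, int, int, int]]:
    """Hypothetical naive detector: find runs of consecutive cells on each
    slope-1 diagonal whose similarity is >= threshold.

    Returns (query_start, query_end, ref_start, ref_end), all inclusive.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    runs = []

    def flush(run):
        # Keep a run only if it is long enough to count as a copied segment.
        if len(run) >= min_len:
            runs.append((run[0][0], run[-1][0], run[0][1], run[-1][1]))

    for offset in range(-(n - 1), m):  # offset = j - i identifies a diagonal
        run = []
        for i in range(n):
            j = i + offset
            if not 0 <= j < m:
                continue
            if sim[i][j] >= threshold:
                run.append((i, j))
            else:
                flush(run)
                run = []
        flush(run)  # close a run that reaches the edge of the matrix
    return runs


# Toy example: query frames 0..2 copy reference frames 1..3.
reference = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
query = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
sim = similarity_matrix(query, reference)
print(diagonal_runs(sim, threshold=0.9, min_len=2))
```

With these toy features the only high-similarity run lies on the diagonal offset by one frame, so the sketch reports a single segment: query frames 0 to 2 matching reference frames 1 to 3. Real systems replace the fixed threshold and diagonal scan with learned detection and refinement, which is the step TransVCL targets.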

Retrieval, Video Retrieval
