no code implementations • 6 Aug 2024 • Ruixiang Zhao, Jian Jia, Yan Li, Xuehan Bai, Quan Chen, Han Li, Peng Jiang, Xirong Li
While Automatic Speech Recognition (ASR) text derived from short or live-stream videos is readily accessible, how to de-noise this excessively noisy text for multimodal representation learning remains largely unexplored.
Automatic Speech Recognition (ASR)
no code implementations • 7 May 2024 • Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai
Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items.
no code implementations • 15 Mar 2024 • Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, Jing Liu
We condense the retrieved knowledge passages from two perspectives.
4 code implementations • CVPR 2023 • Weihua Chen, Xianzhe Xu, Jian Jia, Hao Luo, Yaohua Wang, Fan Wang, Rong Jin, Xiuyu Sun
Unlike the existing self-supervised learning methods, prior knowledge from human images is utilized in SOLIDER to build pseudo semantic labels and import more semantic information into the learned representation.
Ranked #1 on Person Search on PRW
no code implementations • 5 Jan 2023 • Fei He, Haoyang Zhang, Naiyu Gao, Jian Jia, Yanhu Shan, Xin Zhao, Kaiqi Huang
When such a pair is used to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object.
no code implementations • 2 Dec 2022 • Jian Jia, Fei He, Naiyu Gao, Xiaotang Chen, Kaiqi Huang
The framework is distinguished by a feature disentanglement module, which contains learnable semantic queries and a Semantic Spatial Cross-Attention (SSCA) module.
no code implementations • 22 Jul 2022 • Fei He, Naiyu Gao, Jian Jia, Xin Zhao, Kaiqi Huang
The proposed QueryProp contains two propagation strategies: 1) query propagation is performed from sparse key frames to dense non-key frames to reduce the redundant computation on non-key frames; 2) query propagation is performed from previous key frames to the current key frame to improve feature representation by temporal context modeling.
1 code implementation • CVPR 2022 • Naiyu Gao, Fei He, Jian Jia, Yanhu Shan, Haoyang Zhang, Xin Zhao, Kaiqi Huang
To overcome these limitations, we propose a unified framework for the depth-aware panoptic segmentation (DPS) task by applying a dynamic convolution technique to both the panoptic segmentation (PS) and depth prediction tasks.
no code implementations • 2 Mar 2022 • Xiaokun Feng, Xiaotang Chen, Jian Jia, Kaiqi Huang
A sandplay image, an important carrier for psychoanalysis, is a visual scene that the client constructs by selecting and placing sand objects (e.g., sand, rivers, human figures, animals, vegetation, and buildings).
no code implementations • ICCV 2021 • Jian Jia, Xiaotang Chen, Kaiqi Huang
To fully exploit inter-image relations and aggregate human priors during model learning, we construct a Spatial and Semantic Consistency (SSC) framework that consists of two complementary regularizations to achieve spatial and semantic consistency for each attribute.
1 code implementation • 8 Jul 2021 • Jian Jia, Houjing Huang, Xiaotang Chen, Kaiqi Huang
Second, based on the proposed definition, we expose the limitations of the existing datasets, which violate academic norms and are inconsistent with the essential requirements of practical industrial applications.
2 code implementations • 25 May 2020 • Jian Jia, Houjing Huang, Wenjie Yang, Xiaotang Chen, Kaiqi Huang
Although various methods have been proposed to advance pedestrian attribute recognition, a crucial problem with existing datasets is often neglected: a large number of identical pedestrian identities appear in both the train and test sets, which is inconsistent with practical applications.
Ranked #3 on Pedestrian Attribute Recognition on PETA