1 code implementation • 10 Sep 2024 • Zehao Wang, Haobo Yue, Zhicheng Zhang, Da Mu, Jin Tang, Jianqin Yin
Sound Event Detection (SED) plays a vital role in comprehending and perceiving acoustic scenes.
no code implementations • 9 Aug 2024 • Da Mu, Zhicheng Zhang, Haobo Yue, Zehao Wang, Jin Tang, Jianqin Yin
In the Sound Event Localization and Detection (SELD) task, Transformer-based models have demonstrated impressive capabilities.
no code implementations • 29 Jul 2024 • Guoliang Xu, Jianqin Yin, Feng Zhou, Yonghao Dang
Thus, we propose ActivityCLIP, a plug-and-play method that mines the text information contained in action labels to supplement the image information and enhance group activity recognition.
no code implementations • 12 Jun 2024 • Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang
Conversely, depth information can effectively represent motion information related to facial structure changes and is not affected by lighting.
no code implementations • 13 May 2024 • Yuanyuan Jiang, Jianqin Yin
Specifically, we propose a TSG+ module to transfer the image-text matching knowledge from CLIP models to our region-text matching process without corresponding ground-truth labels.
Audio-Visual Question Answering (AVQA) +5
1 code implementation • 24 Apr 2024 • Lizhi Wang, Feng Zhou, Bo Yu, Pu Cao, Jianqin Yin
Moreover, to reconstruct the unseen portions of the target, we propose a novel target replenishment technique driven by large-scale generative diffusion priors.
1 code implementation • 22 Apr 2024 • Yonghao Dang, Jianqin Yin, Liyuan Liu, Pengxiang Ding, Yuan Sun, Yanzhu Hu
Multi-person pose estimation (MPPE) presents a formidable yet crucial challenge in computer vision.
no code implementations • 4 Apr 2024 • Pengxiang Ding, Jianqin Yin
However, motion coordination, a global joint relation reflecting the simultaneous cooperation of all joints, is usually weakened because it is learned progressively and asynchronously from part to whole.
1 code implementation • 10 Jan 2024 • Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang
Recently, 2D convolution has been found inadequate for sound event detection (SED).
no code implementations • 31 Dec 2023 • Ruoqi Yin, Jianqin Yin
Specifically, the Transformer-based stream integrates 3D convolutions with multi-head self-attention to learn inter-token correlations, while for the CNN-based stream we propose a new multi-branch CNN framework that automatically learns joint spatio-temporal features from skeleton sequences.
no code implementations • 25 Dec 2023 • Feng Zhou, Jianqin Yin, Peiyang Li
In the second stage, we allow the keypoints to further emphasize the retained critical image features.
2 code implementations • 23 Dec 2023 • Shaojie Zhang, Jianqin Yin, Yonghao Dang
Furthermore, to explicitly exploit the latent data distributions, we apply contrastive learning to the attentive features, modeling cross-sequence semantic relations by pulling together the features of positive pairs and pushing apart those of negative pairs.
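The pull/push objective described above can be illustrated with a generic InfoNCE-style contrastive loss. The following is a minimal NumPy sketch of that standard formulation, not the paper's implementation; the function name, shapes, and temperature value are assumptions:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Contrastive loss: pull each anchor toward its positive pair,
    push it away from all other (negative) samples in the batch."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature          # (N, N) similarity matrix
    # Diagonal entries are positive pairs; off-diagonals are negatives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_probs[idx, idx].mean()
```

Minimizing this loss drives the diagonal (positive-pair) similarities up relative to the rest of the batch, which is the "pulling together / pushing apart" behavior the abstract describes.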
no code implementations • 17 Nov 2023 • Zhicheng Zhang, Xueyao Sun, Yonghao Dang, Jianqin Yin
On the challenging COCO dataset, the proposed method enables a binary neural network to achieve 70.8 mAP, better than most of the lightweight full-precision networks tested.
2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.
2 code implementations • 30 Aug 2023 • Shaojie Zhang, Jianqin Yin, Yonghao Dang, Jiajun Fu
Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition.
1 code implementation • 21 Jun 2023 • Chengxu Duan, Zhicheng Zhang, Xiaoli Liu, Yonghao Dang, Jianqin Yin
Specifically, we introduce a novel adaptable scheme that facilitates the attack to suit the scale of the target pose and two physical constraints to enhance the naturalness of the adversarial example.
1 code implementation • 21 May 2023 • Yuanyuan Jiang, Jianqin Yin
Recent works rely on elaborate target-agnostic parsing of audio-visual scenes for spatial grounding while mistreating audio and video as separate entities for temporal grounding.
Audio-Visual Question Answering (AVQA) +3
no code implementations • 18 Apr 2023 • Guoliang Xu, Jianqin Yin
MLP-T is used to model the temporal relation between different frames for each actor.
no code implementations • 17 Apr 2023 • Binglu Ren, Jianqin Yin
To solve these two problems, we present a new concept, Voxel Region (VR), which is obtained by projecting the sparse local point clouds in each voxel dynamically.
no code implementations • 27 Mar 2023 • Ruoqi Yin, Jianqin Yin
In this paper, we focus on the bottom-up paradigm in multi-person pose estimation (MPPE).
no code implementations • 13 Mar 2023 • Jiajun Fu, Yonghao Dang, Ruoqi Yin, Shaojie Zhang, Feng Zhou, Wending Zhao, Jianqin Yin
This technical report describes our first-place solution to the pose estimation challenge at ECCV 2022 Visual Perception for Navigation in Human Environments Workshop.
no code implementations • 21 Feb 2023 • Chao Qi, Jianqin Yin, Jinghang Xu, Pengxiang Ding
This work introduces a new task of instance-incremental scene graph generation: given a point cloud of a scene, represent it as a graph and automatically add novel instances as they appear.
no code implementations • 31 Dec 2022 • Xiaofa Liu, Jianqin Yin, Yuan Sun, Zhicheng Zhang, Jin Tang
Unlike most existing methods, which rely on offline feature generation, our method directly takes frames as input and models motion evolution on two different temporal scales. We thereby avoid both the complexity of two-stage modeling and the insufficient temporal and spatial information of a single scale.
1 code implementation • 11 Oct 2022 • Yuanyuan Jiang, Jianqin Yin, Yonghao Dang
In contrast to existing methods, we propose a novel video-level semantic consistency guidance network for the AVE localization task.
no code implementations • 22 Jul 2022 • Yonghao Dang, Jianqin Yin, Shaojie Zhang, Jiping Liu, Yanzhu Hu
In this work, we propose a plug-and-play kinematics modeling module (KMM) to explicitly model temporal correlations between joints across different frames by calculating their temporal similarity.
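The idea of scoring temporal similarity between joints across frames can be sketched generically. The following is an illustrative NumPy fragment under assumed (T, J, C) feature shapes; the function name and layout are assumptions, not the paper's KMM:

```python
import numpy as np

def temporal_joint_similarity(x):
    """x: (T, J, C) joint features over T frames.
    Returns (J, T, T) cosine-similarity maps, one per joint,
    measuring how alike that joint's feature is across frame pairs."""
    x = np.transpose(x, (1, 0, 2))                       # (J, T, C)
    x = x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    return x @ np.transpose(x, (0, 2, 1))                # (J, T, T)
```

Such per-joint similarity maps could then serve as attention-style weights over time, which is the role the abstract describes for explicit temporal-correlation modeling.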
2 code implementations • 9 May 2022 • Wei Dai, Rui Liu, Tianyi Wu, Min Wang, Jianqin Yin, Jun Liu
Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies by using dissimilar imaging equipment.
no code implementations • 8 May 2022 • Tingxiu Chen, Jianqin Yin, Jin Tang
In recent years, audio-visual event localization has attracted much attention.
1 code implementation • 4 Apr 2022 • Jiajun Fu, Fuxing Yang, Yonghao Dang, Xiaoli Liu, Jianqin Yin
The key of DSTD-GC is constrained dynamic correlation modeling, which explicitly parameterizes the common static constraints as a spatial/temporal vanilla adjacency matrix shared by all frames/joints and dynamically extracts correspondence variances for each frame/joint with an adjustment modeling function.
1 code implementation • 23 Jan 2022 • Xiaoli Liu, Jianqin Yin, Di Guo, Huaping Liu
Next, we build a bi-directional semantic graph for the teacher network and a single-directional semantic graph for the student network to model rich ASCK among partial videos.
no code implementations • 5 Dec 2021 • Chao Qi, Jianqin Yin
Specifically, NSA-MC dropout samples the model many times in a space-dependent way, outputting a point-wise distribution by aggregating the stochastic inference results of neighbors.
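The underlying Monte Carlo dropout mechanism — keeping dropout active at test time and aggregating repeated stochastic passes — can be sketched as follows. This is plain MC dropout on a toy linear layer, not the space-dependent NSA-MC variant, and all names and shapes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_with_dropout(x, w, p=0.5):
    """One stochastic forward pass: dropout stays active at test time."""
    mask = rng.random(x.shape) >= p
    return (x * mask / (1 - p)) @ w   # inverted dropout keeps the expectation

def mc_dropout_predict(x, w, n_samples=50):
    """Run the model n_samples times; the mean is the prediction and
    the variance is a per-output uncertainty estimate."""
    outs = np.stack([forward_with_dropout(x, w) for _ in range(n_samples)])
    return outs.mean(axis=0), outs.var(axis=0)
```

The variance returned here plays the role of the point-wise uncertainty; NSA-MC replaces the many independent passes with neighborhood aggregation to cut that sampling cost.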
no code implementations • 20 Nov 2021 • Mingshuai Dong, Shimin Wei, Jianqin Yin, Xiuli Yu
We also design a target feature attention mechanism that, according to the semantic information, guides the model to focus on the features of the target object for grasp prediction.
no code implementations • 15 Jul 2021 • Xunli Zeng, Jianqin Yin
This jigsaw method can better model the occlusion relationship and use the occlusion context information, which is important for amodal segmentation.
no code implementations • 8 Jul 2021 • Pengxiang Ding, Jianqin Yin
This is far from enough for current approaches in real-world scenarios: without an evaluation of the prediction, people cannot know how to interact with the machine, and unreliable predictions may mislead the machine into harming the human.
1 code implementation • 8 Jul 2021 • Yonghao Dang, Jianqin Yin, Shaojie Zhang
Moreover, the JRE can infer invisible joints according to the relationship between joints, which is beneficial for the model to locate occluded joints.
no code implementations • 20 May 2021 • Pengxiang Ding, Junying Wang, Jianqin Yin
However, the global coordination of all joints, which reflects the balance property of human motion, is usually weakened because it is learned progressively and asynchronously from part to whole.
no code implementations • 11 Apr 2021 • Jin Tang, Jin Zhang, Jianqin Yin
In this paper, we propose a novel temporal fusion (TF) module to fuse the two-stream joints' information to predict human motion, including a temporal concatenation and a reinforcement trajectory spatial-temporal (TST) block, specifically designed to keep prediction temporal consistency.
no code implementations • 20 Jan 2021 • Mingshuai Dong, Shimin Wei, Xiuli Yu, Jianqin Yin
MASK is a segmented image that only contains the pixels of the target object.
Robotics
no code implementations • 23 Dec 2020 • Jin Liu, Jianqin Yin
A lightweight framework based on multi-grained trajectory graph convolutional networks is proposed for habit-unrelated human motion prediction.
no code implementations • 11 Oct 2020 • Xiaoli Liu, Jianqin Yin
Predicting future human motion is critical for intelligent robots to interact with humans in the real world, and human motion has the nature of multi-granularity.
1 code implementation • 25 May 2020 • Xiaoli Liu, Jianqin Yin, Huaping Liu, Jun Liu
In contrast to prior works, we improve the multi-order modeling ability of human motion systems for more accurate predictions by building a deep state-space model (DeepSSM).
no code implementations • 15 Mar 2020 • Jianqin Yin, Yanchun Wu, Huaping Liu, Yonghao Dang, Zhiyi Liu, Jun Liu
Our contributions are two-fold: 1) we present the important insight that deep features extracted for action recognition can well model the self-similarity periodicity of a repetitive action.
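This self-similarity insight can be illustrated directly: a repetitive action produces a periodic stripe pattern in the frame-to-frame similarity matrix, and the stripe spacing is the period. A minimal NumPy sketch with assumed per-frame feature shapes (not the paper's code):

```python
import numpy as np

def self_similarity(feats):
    """feats: (T, C) per-frame deep features. Returns the (T, T)
    cosine self-similarity matrix; repetition shows up as periodic
    off-diagonal stripes at multiples of the action period."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return f @ f.T
```

Counting repetitions then reduces to estimating the dominant off-diagonal period of this matrix, e.g. via a Fourier transform or autocorrelation along its diagonals.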
no code implementations • 15 Oct 2019 • Xiaoli Liu, Jianqin Yin, Jin Liu, Pengxiang Ding, Jun Liu, Huaping Liu
The global temporal co-occurrence features represent the relationship whereby different subsequences of a complex motion sequence appear simultaneously; our proposed TrajectoryNet obtains them automatically by reorganizing the temporal information as the depth dimension of the input tensor.
no code implementations • arXiv:1909.01818, 2019 • Xiaoli Liu, Jianqin Yin, Huaping Liu, Yilong Yin
Specifically, a skeletal representation is proposed by transforming the joint coordinate sequence into an image sequence, which can model the different correlations of different joints.
Ranked #1 on Pose Prediction on Filtered NTU RGB+D
no code implementations • 29 Aug 2019 • Yonghao Dang, Fuxing Yang, Jianqin Yin
We propose in this paper a deep-wide network (DWnet) which combines the deep structure with the broad learning system (BLS) to recognize actions.