1 code implementation • 12 Oct 2023 • Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song
Inspired by this observation, we introduce Spatially Unidimensional Self-Attention (SUSA), a lightweight and powerful alternative to the pointwise (1x1) convolution, which is the main computational bottleneck in the depthwise separable 3x3 convolution.
no code implementations • 23 Aug 2023 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Beitao Chen, Jingkuan Song, Heng Tao Shen
Existing methods for multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts.
no code implementations • 27 Aug 2022 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Jingkuan Song
Unlike mainstream methods, RepParser solves multiple human parsing in a new single-stage manner, without resorting to person detection or post-grouping. To this end, RepParser decouples the parsing pipeline into instance-aware kernel generation and part-aware human parsing, which are responsible for instance separation and instance-specific part segmentation, respectively.
1 code implementation • 30 Jun 2022 • Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song
Specifically, each GCN model in ACFL not only learns action representations from single-form skeletons, but also adaptively mimics useful representations derived from other forms of skeletons.
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen
In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part "shorts" cannot have the attribute "hoodie" or "lining").
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang
Human densepose estimation, which aims to establish dense correspondences between 2D pixels of the human body and a 3D human body template, is a key technique for enabling machines to understand people in images.
no code implementations • 5 Nov 2021 • Xuanhan Wang, Xiaojia Chen, Lianli Gao, Lechao Chen, Jingkuan Song
Despite dramatic progress in video classification research, a severe problem facing the community is that the detailed understanding of human actions is ignored.
1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song
The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.