no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.
no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.
1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie
Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.
no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).
1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
1 code implementation • ICCV 2023 • jianqi ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang
Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.
1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie
By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.
1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.
1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang
The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.
Ranked #13 on Video Instance Segmentation on YouTube-VIS 2021
1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Human pose estimation is a challenging task due to its structured data sequence nature.
Ranked #74 on 3D Human Pose Estimation on Human3.6M
2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie
Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.
3 code implementations • ICCV 2023 • Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang
More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning.
Ranked #5 on Skeleton Based Action Recognition on N-UCLA
1 code implementation • 27 Jul 2022 • Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang
For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.
Ranked #9 on Action Recognition on Diving-48
1 code implementation • 15 Jun 2022 • Yuxuan Zhou, Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Lei Zhang, Margret Keuper, Xiansheng Hua
Unlike convolutional inductive biases, which are forced to focus exclusively on hard-coded local regions, our proposed SPs are learned by the model itself and take a variety of spatial relations into account.
Ranked #153 on Image Classification on ImageNet
1 code implementation • ICCV 2021 • Xi Yang, Wangmeng Xiang, Hui Zeng, Lei Zhang
Existing VSR methods are mostly trained and evaluated on synthetic datasets, where the LR videos are uniformly downsampled from their high-resolution (HR) counterparts by some simple operators (e. g., bicubic downsampling).
no code implementations • 8 Jul 2018 • Yong Wang, Xiao-Ming Wu, Qimai Li, Jiatao Gu, Wangmeng Xiang, Lei Zhang, Victor O. K. Li
The key issue of few-shot learning is learning to generalize.
no code implementations • 24 Apr 2018 • Wangmeng Xiang, Jianqiang Huang, Xianbiao Qi, Xian-Sheng Hua, Lei Zhang
Person re-identification (Person ReID) is a challenging task due to the large variations in camera viewpoint, lighting, resolution, and human pose.