Search Results for author: Wangmeng Xiang

Found 18 papers, 13 papers with code

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception

no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu

The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.

Autonomous Driving, Navigate

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR), Text Generation

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).

Language Modelling, Large Language Model

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie

Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.

3D Human Pose Estimation

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation, Domain Adaptation

A Benchmark for Chinese-English Scene Text Image Super-resolution

1 code implementation • ICCV 2023 • Jianqi Ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang

Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.

Image Super-Resolution

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie

By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.

Benchmarking, Face Recognition, +3

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos

1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang

The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.

Ranked #13 on Video Instance Segmentation on YouTube-VIS 2021

Instance Segmentation, Semantic Segmentation, +1

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Human pose estimation is a challenging task due to its structured data sequence nature.

Ranked #74 on 3D Human Pose Estimation on Human3.6M

3D Human Pose Estimation, 3D Pose Estimation, +1

ProContEXT: Exploring Progressive Context Transformer for Tracking

2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.

Object, Visual Object Tracking

Generative Action Description Prompts for Skeleton-based Action Recognition

3 code implementations • ICCV 2023 • Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning.

Ranked #5 on Skeleton Based Action Recognition on N-UCLA

Action Recognition, Language Modelling, +2

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

1 code implementation • 27 Jul 2022 • Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.

Ranked #9 on Action Recognition on Diving-48

Action Classification, Action Recognition

SP-ViT: Learning 2D Spatial Priors for Vision Transformers

1 code implementation • 15 Jun 2022 • Yuxuan Zhou, Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Lei Zhang, Margret Keuper, Xiansheng Hua

Unlike convolutional inductive biases, which are forced to focus exclusively on hard-coded local regions, our proposed SPs are learned by the model itself and take a variety of spatial relations into account.

Ranked #153 on Image Classification on ImageNet

Image Classification

Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme

1 code implementation • ICCV 2021 • Xi Yang, Wangmeng Xiang, Hui Zeng, Lei Zhang

Existing VSR methods are mostly trained and evaluated on synthetic datasets, where the LR videos are uniformly downsampled from their high-resolution (HR) counterparts by some simple operators (e.g., bicubic downsampling).

Video Super-Resolution