Search Results for author: Jun-Yan He

Found 20 papers, 13 papers with code

Exploring Dynamic Transformer for Efficient Object Tracking

no code implementations • 26 Mar 2024 • Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu

For instance, DyTrack obtains 64. 9% AUC on LaSOT with a speed of 256 fps.

Object Visual Object Tracking

Paper
Add Code

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception

no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu

The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.

Autonomous Driving Navigate

Paper
Add Code

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

no code implementations • 5 Mar 2024 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie

Multimodal Large Language Model (MLLMs) leverages Large Language Models as a cognitive framework for diverse visual-language tasks.

Language Modelling Large Language Model +2

Paper
Add Code

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

Paper
Add Code

Tracking with Human-Intent Reasoning

1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie

The perception component then generates the tracking results based on the embeddings.

Language Modelling Object +4

Paper
Code

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR) Text Generation

3,716

Paper
Code

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).

Language Modelling Large Language Model

Paper
Add Code

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.

Paper
Code

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie

Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.

3D Human Pose Estimation

Paper
Code

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation Domain Adaptation

Paper
Code

Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning

1 code implementation • ICCV 2023 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Yifeng Geng, Xuansong Xie

Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.

Depth Estimation Panoptic Segmentation +1

Paper
Code

Tracking Anything in High Quality

1 code implementation • 26 Jul 2023 • Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.

Ranked #5 on Semi-Supervised Video Object Segmentation on YouTube-VOS 2019 (using extra training data)

Object Semantic Segmentation +3

732

Paper
Code

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie

By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.

Benchmarking Face Recognition +3

Paper
Code

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition Skeleton Based Action Recognition

Paper
Code

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

Paper
Code

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Human pose estimation is a challenging task due to its structured data sequence nature.

Ranked #74 on 3D Human Pose Estimation on Human3.6M

3D Human Pose Estimation 3D Pose Estimation +1

Paper
Code

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations • CVPR 2023 • Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.

Pedestrian Detection

Paper
Add Code

ProContEXT: Exploring Progressive Context Transformer for Tracking

2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.

Object Visual Object Tracking

Paper
Code

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

Paper
Code

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.