Search Results for author: Jun-Yan He

Found 27 papers, 19 papers with code

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

1 code implementation14 Dec 2024 Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua

Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes.

Retrieval

GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts

no code implementations18 Nov 2024 Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng

Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures.

Layout Design Layout Generation

MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

no code implementations28 Jun 2024 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

MetaDesigner introduces a transformative framework for artistic typography synthesis, powered by Large Language Models (LLMs) and grounded in a user-centric design paradigm.

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts

1 code implementation29 Apr 2024 Zhi-Qi Cheng, Xiang Li, Jun-Yan He, Junyao Chen, Xiaomao Fan, Xiaojiang Peng, Alexander G. Hauptmann

Emotional Text-to-Speech (E-TTS) synthesis has garnered significant attention in recent years due to its potential to revolutionize human-computer interaction.

Contrastive Learning Speech Synthesis +2

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations3 Jan 2024 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation6 Nov 2023 Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Image Generation Optical Character Recognition (OCR) +1

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

1 code implementation19 Sep 2023 Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.

Tracking Anything in High Quality

1 code implementation26 Jul 2023 Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.

Object Semantic Segmentation +3

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation19 May 2023 Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition Skeleton Based Action Recognition

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation30 Mar 2023 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations CVPR 2023 Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.

Pedestrian Detection

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations27 Oct 2022 Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations17 Sep 2019 Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Cannot find the paper you are looking for? You can Submit a new open access paper.