1 code implementation • 14 Dec 2024 • Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua
Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes.
no code implementations • 18 Nov 2024 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng
Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures.
1 code implementation • 12 Oct 2024 • Chong-Yang Xiang, Jun-Yan He, Zhi-Qi Cheng, Xiao Wu, Xian-Sheng Hua
(2) To enhance the pseudo-range accuracy of selected anchor points, a new loss function, named multilateration anchor loss, is proposed.
no code implementations • 28 Jun 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann
MetaDesigner introduces a transformative framework for artistic typography synthesis, powered by Large Language Models (LLMs) and grounded in a user-centric design paradigm.
1 code implementation • 27 Jun 2024 • Heng Li, Minghan Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
1 code implementation • 17 Jun 2024 • Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann
Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling.
1 code implementation • 29 Apr 2024 • Zhi-Qi Cheng, Xiang Li, Jun-Yan He, Junyao Chen, Xiaomao Fan, Xiaojiang Peng, Alexander G. Hauptmann
Emotional Text-to-Speech (E-TTS) synthesis has garnered significant attention in recent years due to its potential to revolutionize human-computer interaction.
no code implementations • 26 Mar 2024 • Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu
For instance, DyTrack obtains 64. 9% AUC on LaSOT with a speed of 256 fps.
no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.
1 code implementation • CVPR 2024 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie
Multimodal Large Language Model (MLLMs) leverages Large Language Models as a cognitive framework for diverse visual-language tasks.
no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.
1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie
The perception component then generates the tracking results based on the embeddings.
Ranked #5 on
Referring Video Object Segmentation
on ReVOS
1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie
Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.
Ranked #6 on
Image Generation
on TextAtlasEval
no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).
1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.
1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.
Ranked #10 on
3D Human Pose Estimation
on HumanEva-I
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
1 code implementation • ICCV 2023 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Yifeng Geng, Xuansong Xie
Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.
1 code implementation • 26 Jul 2023 • Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li
To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.
Ranked #5 on
Semi-Supervised Video Object Segmentation
on YouTube-VOS 2019
(using extra training data)
1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie
By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.
1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie
As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.
1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Human pose estimation is a challenging task due to its structured data sequence nature.
Ranked #89 on
3D Human Pose Estimation
on Human3.6M
no code implementations • CVPR 2023 • Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang
End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.
4 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie
Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.
Ranked #1 on
Video Object Tracking
on NT-VOT211
2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie
Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.
no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann
By minimizing the mutual information, each column is guided to learn features with different image scales.