no code implementations • 23 Apr 2024 • Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li
LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category.
Facial Expression Recognition (FER)
1 code implementation • 31 Mar 2024 • Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, BoWen Zhang, Xiaojiang Peng
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang
Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).
no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.
1 code implementation • 4 Mar 2024 • Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun
In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generation from a single audio.
no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.
1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie
The perception component then generates the tracking results based on the embeddings.
1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.
1 code implementation • 30 Nov 2023 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.
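The decoupled querying described above can be sketched as ordinary cross-attention where the editing branch supplies queries while keys and values come from the reconstruction branch. This is a minimal numpy illustration of that mechanism in general, not the paper's implementation; all shapes and names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_cross_attention(edit_feats, recon_feats):
    """Queries come from the editing branch; keys and values come from
    the reconstruction branch, so edited tokens can re-use the original
    background/appearance statistics."""
    d = edit_feats.shape[-1]
    q = edit_feats                           # (n_tokens, d)
    k = v = recon_feats                      # (m_tokens, d)
    attn = softmax(q @ k.T / np.sqrt(d))     # (n_tokens, m_tokens)
    return attn @ v                          # (n_tokens, d)

rng = np.random.default_rng(0)
edit = rng.normal(size=(4, 8))    # hypothetical editing-branch tokens
recon = rng.normal(size=(6, 8))   # hypothetical reconstruction-branch tokens
out = decoupled_cross_attention(edit, recon)
print(out.shape)  # (4, 8)
```

Because keys and values never mix with the editing branch's own features, the attended output is always a convex combination of reconstruction-branch values, which is what lets the original appearance carry over.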
no code implementations • 3 Nov 2023 • Changdae Oh, Hyesu Lim, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song
Robust fine-tuning aims to ensure performance on out-of-distribution (OOD) samples, which is sometimes compromised by pursuing adaptation on in-distribution (ID) samples.
no code implementations • 28 Oct 2023 • Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang
Multi-view or even multi-modal data is appealing yet challenging for real-world applications.
no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).
1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.
1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
Estimating the 3D pose of humans in video sequences demands both high accuracy and a well-structured architecture.
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
no code implementations • 16 Aug 2023 • Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie
By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.
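True-range multilateration, the GPS-style technique KeyPosS builds on, recovers a point from its distances to known anchors. The following is a generic least-squares sketch of that idea, not KeyPosS's formulation; the anchor layout and variable names are illustrative assumptions.

```python
import numpy as np

def multilaterate(anchors, dists):
    """Recover a 2-D point from distances to known anchors by
    linearizing the range equations and solving least squares:
    subtracting the first equation cancels the quadratic ||x||^2 term."""
    a0, d0 = anchors[0], dists[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2)
         - dists[1:] ** 2 + d0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Toy setup: four anchors at image corners, distances to a known point.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pt = np.array([3.0, 4.0])
dists = np.linalg.norm(anchors - true_pt, axis=1)
print(multilaterate(anchors, dists))  # ≈ [3. 4.]
```

With exact distances the solve is exact; with noisy predicted distances (the realistic case for a detector) the least-squares solution averages the error across anchors.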
1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie
As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
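Encoding bone connectivity via graph distances, as strategy (1) describes, amounts to computing hop counts along the skeleton graph; these can then index relative positional embeddings. A stdlib BFS sketch under an assumed toy skeleton (the joint layout is hypothetical, not the paper's):

```python
from collections import deque

def bone_distance_matrix(n_joints, bones):
    """All-pairs hop distances along the skeleton graph via BFS from
    each joint; -1 marks unreachable joints."""
    adj = {i: [] for i in range(n_joints)}
    for u, v in bones:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[-1] * n_joints for _ in range(n_joints)]
    for s in range(n_joints):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s][w] == -1:
                    dist[s][w] = dist[s][u] + 1
                    q.append(w)
    return dist

# Toy 5-joint skeleton: hip-spine-neck chain plus two shoulders off the neck.
bones = [(0, 1), (1, 2), (2, 3), (2, 4)]
D = bone_distance_matrix(5, bones)
print(D[0][3])  # hip to left shoulder: 3 hops
```

Unlike raw adjacency, the full distance matrix distinguishes "2 hops apart" from "5 hops apart", giving attention a graded notion of skeletal proximity.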
1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang
While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.
Ranked #16 on Action Classification on Kinetics-400
1 code implementation • ICCV 2023 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann
We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods.
1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.
1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Human pose estimation is a challenging task due to the structured nature of its data sequences.
Ranked #74 on 3D Human Pose Estimation on Human3.6M
1 code implementation • 17 Nov 2022 • Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper
Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections.
Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D 120
2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie
Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.
2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie
Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.
1 code implementation • 18 Aug 2022 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Teruko Mitamura, Alexander G. Hauptmann
In the second stage, we exploit transformer layers to unearth the potential semantic relations within both verbs and semantic roles.
1 code implementation • CVPR 2022 • Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann
We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50).
Ranked #1 on Object Counting on TRANCOS
no code implementations • 2 May 2021 • Ting-yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann
In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks.
1 code implementation • 17 Jul 2020 • Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou
Generation of high-quality person images is challenging, due to the sophisticated entanglements among image factors, e.g., appearance, pose, foreground, background, local details, global structures, etc.
no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann
By minimizing the mutual information, each column is guided to learn features with different image scales.
no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann
Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into a deep learning framework.
Ranked #5 on Crowd Counting on WorldExpo’10
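The MESA objective mentioned above searches for the axis-aligned rectangle where the predicted density map's total count deviates most from the ground truth. A brute-force sketch of that search using a 2-D prefix sum, workable only for tiny maps and purely illustrative (it is not the paper's solver, which avoids this exhaustive search):

```python
import numpy as np

def max_excess(pred, gt):
    """Exhaustively find the largest absolute count discrepancy over all
    axis-aligned rectangles, via a 2-D prefix sum. O(H^2 * W^2)."""
    diff = pred - gt
    H, W = diff.shape
    # Prefix sums with a zero border so any rectangle sum is 4 lookups.
    P = np.zeros((H + 1, W + 1))
    P[1:, 1:] = diff.cumsum(0).cumsum(1)
    best = 0.0
    for r1 in range(H):
        for r2 in range(r1 + 1, H + 1):
            for c1 in range(W):
                for c2 in range(c1 + 1, W + 1):
                    s = P[r2, c2] - P[r1, c2] - P[r2, c1] + P[r1, c1]
                    best = max(best, abs(s))
    return best

# Toy 2x2 density maps: the over-count sits in the left column.
pred = np.array([[0.2, 0.1], [0.4, 0.3]])
gt = np.array([[0.1, 0.1], [0.1, 0.3]])
print(max_excess(pred, gt))  # ≈ 0.4
```

The discrete max over rectangles is what makes this objective non-differentiable, which is exactly the obstacle to gradient-based training that the snippet above refers to.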
no code implementations • 29 Nov 2018 • Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann
To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.
1 code implementation • 22 Aug 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
In this work, we explore the cross-scale similarity in crowd counting scenario, in which the regions of different scales often exhibit high visual similarity.
no code implementations • 19 Apr 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.
no code implementations • 14 Apr 2018 • Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo
A principled way of hyperlinking can be carried out by picking centers of clusters as anchors and from there reaching out to targets within or outside of clusters with consideration of neighborhood complexity.
2 code implementations • CVPR 2017 • Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
For the video side, deep visual features are extracted from detected object regions in each frame, and further fed into a Long Short-Term Memory (LSTM) framework for sequence modeling, which captures the temporal dynamics in videos.
no code implementations • 17 Apr 2017 • Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng
This paper addresses a challenging problem -- how to generate multi-view cloth images from only a single view input.