no code implementations • 23 Apr 2024 • Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li
LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category.
Facial Expression Recognition (FER)
1 code implementation • 31 Mar 2024 • Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, BoWen Zhang, Xiaojiang Peng
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang
Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).
no code implementations • 8 Mar 2024 • Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception.
1 code implementation • 4 Mar 2024 • Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun
In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generation from a single audio.
no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.
1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie
The perception component then generates the tracking results based on the embeddings.
1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.
1 code implementation • 30 Nov 2023 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.
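The decoupled querying described above can be sketched as ordinary cross-attention where the editing branch supplies queries while keys and values come from the reconstruction branch. This is a minimal numpy illustration of that mechanism in general, not the paper's implementation; all shapes and names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_cross_attention(edit_feats, recon_feats):
    """Queries come from the editing branch; keys and values come from
    the reconstruction branch, so edited tokens can re-use the original
    background/appearance statistics."""
    d = edit_feats.shape[-1]
    q = edit_feats                           # (n_tokens, d)
    k = v = recon_feats                      # (m_tokens, d)
    attn = softmax(q @ k.T / np.sqrt(d))     # (n_tokens, m_tokens)
    return attn @ v                          # (n_tokens, d)

rng = np.random.default_rng(0)
edit = rng.normal(size=(4, 8))    # hypothetical editing-branch tokens
recon = rng.normal(size=(6, 8))   # hypothetical reconstruction-branch tokens
out = decoupled_cross_attention(edit, recon)
print(out.shape)  # (4, 8)
```

Because keys and values never mix with the editing branch's own features, the attended output is always a convex combination of reconstruction-branch values, which is what lets the original appearance carry over.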
no code implementations • 3 Nov 2023 • Changdae Oh, Hyesu Lim, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song
Robust fine-tuning aims to ensure performance on out-of-distribution (OOD) samples, which is sometimes compromised by pursuing adaptation on in-distribution (ID) samples.
no code implementations • 28 Oct 2023 • Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang
Multi-view or even multi-modal data is appealing yet challenging for real-world applications.
no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).
1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.
1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
Estimating the 3D pose of humans in video sequences demands both high accuracy and a well-structured architecture.
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
no code implementations • 16 Aug 2023 • Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie
By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.
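True-range multilateration, the GPS-style technique KeyPosS builds on, recovers a point from its distances to known anchors. The following is a generic least-squares sketch of that idea, not KeyPosS's formulation; the anchor layout and variable names are illustrative assumptions.

```python
import numpy as np

def multilaterate(anchors, dists):
    """Recover a 2-D point from distances to known anchors by
    linearizing the range equations and solving least squares:
    subtracting the first equation cancels the quadratic ||x||^2 term."""
    a0, d0 = anchors[0], dists[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2)
         - dists[1:] ** 2 + d0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Toy setup: four anchors at image corners, distances to a known point.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pt = np.array([3.0, 4.0])
dists = np.linalg.norm(anchors - true_pt, axis=1)
print(multilaterate(anchors, dists))  # ≈ [3. 4.]
```

With exact distances the solve is exact; with noisy predicted distances (the realistic case for a detector) the least-squares solution averages the error across anchors.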
1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie
As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
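Encoding bone connectivity via graph distances, as strategy (1) describes, amounts to computing hop counts along the skeleton graph; these can then index relative positional embeddings. A stdlib BFS sketch under an assumed toy skeleton (the joint layout is hypothetical, not the paper's):

```python
from collections import deque

def bone_distance_matrix(n_joints, bones):
    """All-pairs hop distances along the skeleton graph via BFS from
    each joint; -1 marks unreachable joints."""
    adj = {i: [] for i in range(n_joints)}
    for u, v in bones:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[-1] * n_joints for _ in range(n_joints)]
    for s in range(n_joints):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s][w] == -1:
                    dist[s][w] = dist[s][u] + 1
                    q.append(w)
    return dist

# Toy 5-joint skeleton: hip-spine-neck chain plus two shoulders off the neck.
bones = [(0, 1), (1, 2), (2, 3), (2, 4)]
D = bone_distance_matrix(5, bones)
print(D[0][3])  # hip to left shoulder: 3 hops
```

Unlike raw adjacency, the full distance matrix distinguishes "2 hops apart" from "5 hops apart", giving attention a graded notion of skeletal proximity.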
1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang
While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.
Ranked #16 on Action Classification on Kinetics-400
1 code implementation • ICCV 2023 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann
We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods.
1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.
1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Human pose estimation is a challenging task due to the structured nature of its data sequences.
Ranked #74 on 3D Human Pose Estimation on Human3.6M
1 code implementation • 17 Nov 2022 • Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper
Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections.
Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D 120
2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie
Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.
2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie
Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.
1 code implementation • 18 Aug 2022 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Teruko Mitamura, Alexander G. Hauptmann
In the second stage, we exploit transformer layers to unearth the potential semantic relations within both verbs and semantic roles.
1 code implementation • CVPR 2022 • Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann
We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50).
Ranked #1 on Object Counting on TRANCOS
no code implementations • 2 May 2021 • Ting-yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann
In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks.
1 code implementation • 17 Jul 2020 • Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou
Generation of high-quality person images is challenging, due to the sophisticated entanglements among image factors, e.g., appearance, pose, foreground, background, local details, global structures, etc.
no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann
By minimizing the mutual information, each column is guided to learn features with different image scales.
no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann
Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into a deep learning framework.
Ranked #5 on Crowd Counting on WorldExpo’10
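The MESA objective mentioned above searches for the axis-aligned rectangle where the predicted density map's total count deviates most from the ground truth. A brute-force sketch of that search using a 2-D prefix sum, workable only for tiny maps and purely illustrative (it is not the paper's solver, which avoids this exhaustive search):

```python
import numpy as np

def max_excess(pred, gt):
    """Exhaustively find the largest absolute count discrepancy over all
    axis-aligned rectangles, via a 2-D prefix sum. O(H^2 * W^2)."""
    diff = pred - gt
    H, W = diff.shape
    # Prefix sums with a zero border so any rectangle sum is 4 lookups.
    P = np.zeros((H + 1, W + 1))
    P[1:, 1:] = diff.cumsum(0).cumsum(1)
    best = 0.0
    for r1 in range(H):
        for r2 in range(r1 + 1, H + 1):
            for c1 in range(W):
                for c2 in range(c1 + 1, W + 1):
                    s = P[r2, c2] - P[r1, c2] - P[r2, c1] + P[r1, c1]
                    best = max(best, abs(s))
    return best

# Toy 2x2 density maps: the over-count sits in the left column.
pred = np.array([[0.2, 0.1], [0.4, 0.3]])
gt = np.array([[0.1, 0.1], [0.1, 0.3]])
print(max_excess(pred, gt))  # ≈ 0.4
```

The discrete max over rectangles is what makes this objective non-differentiable, which is exactly the obstacle to gradient-based training that the snippet above refers to.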
no code implementations • 29 Nov 2018 • Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann
To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.
1 code implementation • 22 Aug 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
In this work, we explore the cross-scale similarity in crowd counting scenario, in which the regions of different scales often exhibit high visual similarity.
no code implementations • 19 Apr 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.
no code implementations • 14 Apr 2018 • Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo
A principled way of hyperlinking can be carried out by picking centers of clusters as anchors and from there reaching out to targets within or outside of clusters with consideration of neighborhood complexity.
2 code implementations • CVPR 2017 • Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
For the video side, deep visual features are extracted from detected object regions in each frame, and further fed into a Long Short-Term Memory (LSTM) framework for sequence modeling, which captures the temporal dynamics in videos.
no code implementations • 17 Apr 2017 • Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng
This paper addresses a challenging problem -- how to generate multi-view cloth images from only a single view input.