Search Results for author: Zhi-Qi Cheng

Found 37 papers, 23 papers with code

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Our research addresses this shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

1 code implementation • 4 Mar 2024 • Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun

In this paper, we abstract the process by which people hear speech, extract meaningful cues, and imagine various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of generating high-fidelity, diverse talking faces from a single audio clip.

Disentanglement

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.

Few-Shot Learning Retrieval +2

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation • 30 Nov 2023 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing
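The decoupled querying described in the MotionEditor snippet can be illustrated with a minimal sketch: the editing branch supplies the queries, while keys and values come from the reconstruction branch. This is plain-Python and illustrative only; the actual model operates on diffusion U-Net feature maps, and all tensor shapes here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Each query (editing branch) attends over keys/values taken
    from the other branch (reconstruction), so appearance stored
    there can be read without entangling the two streams."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

The design point is that the editing branch never produces its own keys and values for this step, which is what lets it retain the original background and protagonist appearance.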

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

no code implementations • 3 Nov 2023 • Changdae Oh, Hyesu Lim, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song

Robust fine-tuning aims to ensure performance on out-of-distribution (OOD) samples, which is sometimes compromised by pursuing adaptation on in-distribution (ID) samples.

Autonomous Driving Medical Diagnosis

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment

no code implementations • 16 Aug 2023 • Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li

Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.

Autonomous Driving Contrastive Learning

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition Skeleton Based Action Recognition
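The graph-distance idea in strategy (1) above can be sketched as computing hop counts between joints on the skeleton graph; a generic all-pairs BFS sketch, not the paper's exact encoding (the joint indices and edges below are hypothetical):

```python
from collections import deque

def graph_distances(num_joints, edges):
    """All-pairs shortest-path hop counts on an undirected skeleton
    graph, via BFS from every joint. Unreachable pairs stay -1."""
    adj = {i: [] for i in range(num_joints)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * num_joints for _ in range(num_joints)]
    for s in range(num_joints):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[s][v] < 0:
                    dist[s][v] = dist[s][u] + 1
                    q.append(v)
    return dist

# A toy 5-joint chain with one branch, e.g. hip-spine-neck-head plus an arm:
D = graph_distances(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
```

A distance matrix like `D` can then serve as a topology-aware bias when relating joint pairs, which is the connectivity signal the snippet refers to.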

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

Action Classification Action Recognition +1

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

Subspace Representation Learning for Few-shot Image Classification

no code implementations • 2 May 2021 • Ting-yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann

In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks.

Classification Few-Shot Image Classification +3

Generating Person Images with Appearance-aware Pose Stylizer

1 code implementation • 17 Jul 2020 • Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou

Generation of high-quality person images is challenging due to the sophisticated entanglements among image factors, e.g., appearance, pose, foreground, background, local details, global structures, etc.

Image Generation

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting
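The quantity being minimized in the snippet above can be illustrated with a toy discrete mutual-information estimate between two feature sequences. The paper's actual loss operates on the deep feature maps of the CNN columns, so this is only a sketch of the underlying definition, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete MI estimate from co-occurrence counts (natural log).
    Zero when the variables are independent; equals H(X) when ys == xs."""
    assert len(xs) == len(ys)
    n = len(xs)
    joint = Counter(zip(xs, ys))  # p(x, y) up to a factor of n
    px = Counter(xs)              # p(x) up to a factor of n
    py = Counter(ys)              # p(y) up to a factor of n
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi
```

Driving this quantity toward zero between columns pushes their features apart, which is how each column ends up specialized to a different image scale.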

Learning Spatial Awareness to Improve Crowd Counting

no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

Although the Maximum Excess over SubArrays (MESA) loss has been proposed to address these issues by finding the rectangular subregion whose predicted density map differs most from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into deep learning frameworks.

Crowd Counting Weakly-supervised Learning
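For reference, the MESA loss mentioned above searches all axis-aligned rectangles for the one where the predicted and ground-truth density mass diverge most. A brute-force sketch on small grids (illustration only; the original formulation avoids this enumeration, and this combinatorial max is exactly why the loss is not amenable to gradient descent):

```python
def mesa_loss(pred, gt):
    """Maximum Excess over SubArrays: the largest absolute difference
    between predicted and ground-truth density mass over any rectangular
    subregion. Brute force over all rectangles -- fine for tiny grids."""
    h, w = len(pred), len(pred[0])
    best = 0.0
    for r1 in range(h):
        for r2 in range(r1, h):
            for c1 in range(w):
                for c2 in range(c1, w):
                    s = sum(pred[r][c] - gt[r][c]
                            for r in range(r1, r2 + 1)
                            for c in range(c1, c2 + 1))
                    best = max(best, abs(s))
    return best
```

On a 3x3 example where the prediction misplaces density, the loss picks out the worst rectangle rather than averaging errors away, which is the property the MESA formulation is after.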

Perceiving Physical Equation by Observing Visual Scenarios

no code implementations • 29 Nov 2018 • Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann

To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.

Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance

1 code implementation • 22 Aug 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

In this work, we explore the cross-scale similarity in crowd counting scenario, in which the regions of different scales often exhibit high visual similarity.

Crowd Counting Density Estimation
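The cross-scale idea above can be sketched as pooling the same map with several window sizes and averaging the results, so regions of different scales contribute comparable responses. This multi-kernel sketch is only an approximation of the paper's stacked pooling (which stacks pooling layers successively), and the kernel sizes are hypothetical:

```python
def max_pool_same(x, k):
    """Stride-1 max pooling over a 2-D list with window size k,
    output the same size as the input (windows clamped at borders)."""
    h, w = len(x), len(x[0])
    r = k // 2
    return [[max(x[a][b]
                 for a in range(max(0, i - r), min(h, i - r + k))
                 for b in range(max(0, j - r), min(w, j - r + k)))
             for j in range(w)] for i in range(h)]

def multi_kernel_pool(x, kernels=(2, 3)):
    """Average of max-pooled maps at several scales, blending the
    responses of small and large crowd regions."""
    pooled = [max_pool_same(x, k) for k in kernels]
    h, w = len(x), len(x[0])
    return [[sum(p[i][j] for p in pooled) / len(pooled)
             for j in range(w)] for i in range(h)]
```

Averaging across kernel sizes is what injects the scale invariance: a peak that a single small window would localize too tightly is spread consistently across scales.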

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning

no code implementations • 19 Apr 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.

Attribute Neural Architecture Search

On the Selection of Anchors and Targets for Video Hyperlinking

no code implementations • 14 Apr 2018 • Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo

A principled way of hyperlinking can be carried out by picking the centers of clusters as anchors and from there reaching out to targets within or outside the clusters, with consideration of neighborhood complexity.

Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images

2 code implementations • CVPR 2017 • Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua

For the video side, deep visual features are extracted from detected object regions in each frame, and further fed into a Long Short-Term Memory (LSTM) framework for sequence modeling, which captures the temporal dynamics in videos.

Multi-View Image Generation from a Single-View

no code implementations • 17 Apr 2017 • Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng

This paper addresses a challenging problem: how to generate multi-view cloth images from only a single-view input.

Image Generation Variational Inference
