Search Results for author: Shiwei Zhang

Found 59 papers, 34 papers with code

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

3 code implementations • 7 Nov 2023 • Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos.

6,049

Paper
Code

ModelScope Text-to-Video Technical Report

3 code implementations • 12 Aug 2023 • Jiuniu Wang, Hangjie Yuan, Dayou Chen, Yingya Zhang, Xiang Wang, Shiwei Zhang

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i. e., Stable Diffusion).

Ranked #8 on Text-to-Video Generation on MSR-VTT

Denoising Image Generation +1

2,568

Paper
Code

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

1 code implementation • 7 Dec 2023 • Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang

At the structure level, we decompose the T2V task into two steps, including spatial reasoning and temporal reasoning, using a unified denoiser.

Ranked #5 on Text-to-Video Generation on MSR-VTT

Text-to-Video Generation Video Generation

2,568

Paper
Code

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

1 code implementation • 7 Dec 2023 • Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.

Image Generation Video Generation

2,568

Paper
Code

VideoLCM: Video Latent Consistency Model

2 code implementations • 14 Dec 2023 • Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang

Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models.

Computational Efficiency Image Generation +1

2,568

Paper
Code

VideoComposer: Compositional Video Synthesis with Motion Controllability

4 code implementations • NeurIPS 2023 • Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Ranked #5 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Image Generation Text-to-Video Generation

841

Paper
Code

Self-supervised Motion Learning from Static Images

1 code implementation • CVPR 2021 • Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin, Marcelo Ang

We furthermore introduce a static mask in pseudo motions to create local motion patterns, which forces the model to additionally locate notable motion areas for the correct classification. We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.

Action Recognition Self-Supervised Learning

215

Paper
Code

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition

1 code implementation • 9 Jun 2021 • Ziyuan Huang, Zhiwu Qing, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, Marcelo H. Ang Jr

In this paper, we present empirical results for training a stronger video vision transformer on the EPIC-KITCHENS-100 Action Recognition dataset.

Action Recognition Point Cloud Classification +1

215

Paper
Code

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation • 13 Jun 2021 • Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in CVPR2021 Workshop.

Action Detection

215

Paper
Code

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation • 24 Aug 2021 • Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.

Contrastive Learning

215

Paper
Code

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations • ICLR 2022 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

215

Paper
Code

Temporally-Adaptive Models for Efficient Video Understanding

1 code implementation • 10 Aug 2023 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Yingya Zhang, Ziwei Liu, Marcelo H. Ang Jr

Spatial convolutions are extensively used in numerous deep video models.

Ranked #3 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification Action Recognition +1

215

Paper
Code

TCTrack: Temporal Contexts for Aerial Tracking

1 code implementation • CVPR 2022 • Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers.

154

Paper
Code

Towards Real-World Visual Tracking with Temporal Contexts

1 code implementation • 20 Aug 2023 • Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.

Visual Tracking

154

Paper
Code

End-to-end Temporal Action Detection with Transformer

1 code implementation • 18 Jun 2021 • Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Ranked #8 on Temporal Action Localization on HACS

Action Detection Temporal Action Localization +1

141

Paper
Code

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

3 code implementations • ICCV 2023 • Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao

In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.

Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Graph Generation Human-Object Interaction Detection +6

Paper
Code

OadTR: Online Action Detection with Transformers

1 code implementation • ICCV 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang

Most recent approaches for online action detection tend to apply Recurrent Neural Network (RNN) to capture long-range temporal structure.

Ranked #8 on Online Action Detection on THUMOS'14

Online Action Detection

Paper
Code

CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1)

1 code implementation • 13 Jun 2020 • Xiang Wang, Baiteng Ma, Zhiwu Qing, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

In this report, we present our solution for the task of temporal action localization (detection) (task 1) in ActivityNet Challenge 2020.

Action Detection Temporal Action Localization

Paper
Code

Self-Supervised Learning for Semi-Supervised Temporal Action Proposal

1 code implementation • CVPR 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Changxin Gao, Nong Sang

In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation.

Ranked #2 on Semi-Supervised Action Detection on ActivityNet-1.3

Relation Self-Supervised Learning +2

Paper
Code

Proposal Relation Network for Temporal Action Detection

1 code implementation • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

We calculate the detection results by assigning the proposals with corresponding classification results.

Ranked #2 on Temporal Action Localization on ActivityNet-1.3 (using extra training data)

Action Classification Action Detection +3

Paper
Code

Open-world Semantic Segmentation for LIDAR Point Clouds

1 code implementation • 4 Jul 2022 • Jun Cen, Peng Yun, Shiwei Zhang, Junhao Cai, Di Luan, Michael Yu Wang, Ming Liu, Mingqian Tang

Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e. g., autonomous driving, since it is closed-set and static.

Autonomous Driving Incremental Learning +3

Paper
Code

CLIP-guided Prototype Modulating for Few-shot Action Recognition

1 code implementation • 6 Mar 2023 • Xiang Wang, Shiwei Zhang, Jun Cen, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task.

Few-Shot action recognition Few Shot Action Recognition

Paper
Code

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

1 code implementation • CVPR 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.

Contrastive Learning Few-Shot action recognition +1

Paper
Code

MAR: Masked Autoencoders for Efficient Action Recognition

1 code implementation • 24 Jul 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang

Inspired by this, we propose propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patches and operating only on a part of the videos.

Ranked #12 on Action Recognition on Something-Something V2

Action Classification Action Recognition +1

Paper
Code

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

1 code implementation • ICCV 2023 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

When pre-training on the large-scale Kinetics-710, we achieve 89. 7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

Transfer Learning Video Recognition

Paper
Code

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation • CVPR 2022 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.

Ranked #1 on Few Shot Action Recognition on Something-Something-100

Few Shot Action Recognition Relation +1

Paper
Code

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation • CVPR 2023 • Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

Paper
Code

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition

1 code implementation • 8 Feb 2023 • Jun Cen, Di Luan, Shiwei Zhang, Yixuan Pei, Yingya Zhang, Deli Zhao, Shaojie Shen, Qifeng Chen

Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications.

Open Set Learning

Paper
Code

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

1 code implementation • 9 Jul 2023 • Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen

In this way, RGB images are not required during inference anymore since the 2D knowledge branch provides 2D information according to the 3D LIDAR input.

Autonomous Vehicles Knowledge Distillation +2

Paper
Code

HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition

1 code implementation • 9 Jan 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To be specific, HyRSM++ consists of two key components, a hybrid relation module and a temporal set matching metric.

Few-Shot action recognition Few Shot Action Recognition +3

Paper
Code

Prompt Combines Paraphrase: Teaching Pre-trained Models to Understand Rare Biomedical Words

1 code implementation • COLING 2022 • Haochun Wang, Chi Liu, Nuwa Xi, Sendong Zhao, Meizhi Ju, Shiwei Zhang, Ziheng Zhang, Yefeng Zheng, Bing Qin, Ting Liu

Prompt-based fine-tuning for pre-trained models has proven effective for many natural language processing tasks under few-shot settings in general domain.

Natural Language Inference

Paper
Code

InstructVideo: Instructing Video Diffusion Models with Human Feedback

1 code implementation • 19 Dec 2023 • Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning.

Video Generation

Paper
Code

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation • 25 Dec 2023 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of training set (i. e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9. 67 to 8. 19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Ranked #6 on Text-to-Video Generation on MSR-VTT

Text-to-Image Generation Text-to-Video Generation +2

Paper
Code

Less is More: Rejecting Unreliable Reviews for Product Question Answering

1 code implementation • 9 Jul 2020 • Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris

In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question.

Community Question Answering Conformal Prediction +1

Paper
Code

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

no code implementations • CVPR 2019 • Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun

In this paper, we define these ambiguous samples as "transitional states", and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states.

Ranked #7 on Action Detection on J-HMDB

Action Detection

Paper
Add Code

Temporal Fusion Network for Temporal Action Localization:Submission to ActivityNet Challenge 2020 (Task E)

no code implementations • 13 Jun 2020 • Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

This technical report analyzes a temporal action localization method we used in the HACS competition which is hosted in Activitynet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video, and predict action category. Firstly, we utilize the video-level feature information to train multiple video-level action classification models.

Action Classification Temporal Action Localization

Paper
Add Code

Multi-Level Temporal Pyramid Network for Action Detection

no code implementations • 7 Aug 2020 • Xiang Wang, Changxin Gao, Shiwei Zhang, Nong Sang

By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations.

Action Detection

Paper
Add Code

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations • 1 Jan 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

Paper
Add Code

Tuning the quantumness of simple Bose systems: A universal phase diagram

no code implementations • 9 Aug 2020 • Youssef Kora, Massimo Boninsegni, Dam Thanh Son, Shiwei Zhang

We present a comprehensive theoretical study of the phase diagram of a system of many Bose particles interacting with a two-body central potential of the so-called Lennard-Jones form.

Statistical Mechanics

Paper
Add Code

Relation Modeling in Spatio-Temporal Action Localization

no code implementations • 15 Jun 2021 • Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao

This paper presents our solution to the AVA-Kinetics Crossover Challenge of ActivityNet workshop at CVPR 2021.

Ranked #4 on Spatio-Temporal Action Localization on AVA-Kinetics (using extra training data)

Action Detection Relation +2

Paper
Add Code

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning Weakly-supervised Temporal Action Localization +1

Paper
Add Code

Exploring Stronger Feature for Temporal Action Localization

no code implementations • 24 Jun 2021 • Zhiwu Qing, Xiang Wang, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

Temporal action localization aims to localize starting and ending time with action category.

Temporal Action Localization

Paper
Add Code

Support-Set Based Cross-Supervision for Video Grounding

no code implementations • ICCV 2021 • Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao

The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.

Contrastive Learning Video Grounding

Paper
Add Code

Discovery-and-Selection: Towards Optimal Multiple Instance Learning for Weakly Supervised Object Detection

no code implementations • 18 Oct 2021 • Shiwei Zhang, Wei Ke, Lin Yang

Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learn object classifiers and estimate object locations under the supervision of image category labels.

Multiple Instance Learning Object +2

Paper
Add Code

Does QA-based intermediate training help fine-tuning language models for text classification?

no code implementations • ALTA 2021 • Shiwei Zhang, Xiuzhen Zhang

Fine-tuning pre-trained language models for downstream tasks has become a norm for NLP.

Question Answering text-classification +1

Paper
Add Code

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

no code implementations • CVPR 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos.

Contrastive Learning Representation Learning +1

Paper
Add Code

G4: Grounding-guided Goal-oriented Dialogues Generation with Multiple Documents

no code implementations • dialdoc (ACL) 2022 • Shiwei Zhang, Yiyang Du, Guanzhong Liu, Zhao Yan, Yunbo Cao

Goal-oriented dialogues generation grounded in multiple documents(MultiDoc2Dial) is a challenging and realistic task.

Machine Reading Comprehension Response Generation

Paper
Add Code

Context-aware Proposal Network for Temporal Action Detection

no code implementations • 18 Jun 2022 • Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang

This technical report presents our first place winning solution for temporal action detection task in CVPR-2022 AcitivityNet Challenge.

Action Classification Action Detection

Paper
Add Code

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

no code implementations • 2 Nov 2022 • Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

The former is to reduce the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate the lost spatio-temporal details in the Frame Condensing stage.

Action Recognition Class Incremental Learning +1

Paper
Add Code

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

no code implementations • 13 Feb 2023 • Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, Wei Lin

This paper presents TAG, an automatic system to derive optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology- heterogeneous ML clusters.

Combinatorial Optimization TAG

Paper
Add Code

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

no code implementations • 16 Feb 2023 • Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin

We present Rhino, a system for accelerating tensor programs with automatic parallelization on AI platform for real production environment.

Paper
Add Code

Crowd Counting with Sparse Annotation

no code implementations • 12 Apr 2023 • Shiwei Zhang, Zhengzheng Wang, Qing Liu, Fei Wang, Wei Ke, Tong Zhang

This paper presents a new annotation method called Sparse Annotation (SA) for crowd counting, which reduces human labeling efforts by sparsely labeling individuals in an image.

Crowd Counting

Paper
Add Code

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

no code implementations • CVPR 2023 • Jiayu Wang, Kang Zhao, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

Generating a talking face video from the input audio sequence is a practical yet challenging task.

Talking Face Generation

Paper
Add Code

Unleashing Potential of Evidence in Knowledge-Intensive Dialogue Generation

no code implementations • 15 Sep 2023 • Xianjie Wu, Jian Yang, Tongliang Li, Di Liang, Shiwei Zhang, Yiyang Du, Zhoujun Li

To fully Unleash the potential of evidence, we propose a framework to effectively incorporate Evidence in knowledge-Intensive Dialogue Generation (u-EIDG).

Dialogue Generation

Paper
Add Code

Space-time Prompting for Video Class-incremental Learning

no code implementations • ICCV 2023 • Yixuan Pei, Zhiwu Qing, Shiwei Zhang, Xiang Wang, Yingya Zhang, Deli Zhao, Xueming Qian

In this paper, we will fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i. e., CLIP, making it fit for video class-incremental learning (VCIL).

Class Incremental Learning Incremental Learning

Paper
Add Code

Few-shot Action Recognition with Captioning Foundation Models

no code implementations • 16 Oct 2023 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

In this paper, we develop an effective plug-and-play framework called CapFSAR to exploit the knowledge of multimodal models without manually annotating text.

Few-Shot action recognition Few Shot Action Recognition

Paper
Add Code

Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue

no code implementations • 17 Oct 2023 • Shiwei Zhang, Mingfang Wu, Xiuzhen Zhang

To the best of our knowledge, we are introducing, for the first time, an in-context learning method that harnesses large language models for automated subject metadata annotation.

In-Context Learning Language Modelling +1

Paper
Add Code

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations • 27 Nov 2023 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

Paper
Add Code

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

no code implementations • 15 Dec 2023 • Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng

In this work, we propose a DreamTalk framework to fulfill this gap, which employs meticulous design to unlock the potential of diffusion models in generating expressive talking heads.

Denoising Talking Head Generation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.