Search Results for author: Shiwei Zhang

Found 49 papers, 27 papers with code

Unleashing Potential of Evidence in Knowledge-Intensive Dialogue Generation

no code implementations 15 Sep 2023 Xianjie Wu, Jian Yang, Tongliang Li, Di Liang, Shiwei Zhang, Yiyang Du, Zhoujun Li

To fully Unleash the potential of evidence, we propose a framework to effectively incorporate Evidence in knowledge-Intensive Dialogue Generation (u-EIDG).

Dialogue Generation

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

1 code implementation ICCV 2023 Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

Transfer Learning Video Recognition

Towards Real-World Visual Tracking with Temporal Contexts

1 code implementation 20 Aug 2023 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.

Visual Tracking

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

2 code implementations ICCV 2023 Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao

In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.

Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Graph Generation Human-Object Interaction Detection +5

ModelScope Text-to-Video Technical Report

1 code implementation 12 Aug 2023 Jiuniu Wang, Hangjie Yuan, Dayou Chen, Yingya Zhang, Xiang Wang, Shiwei Zhang

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).

Denoising Image Generation +1

VideoComposer: Compositional Video Synthesis with Motion Controllability

1 code implementation 3 Jun 2023 Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Image Generation

Crowd Counting with Sparse Annotation

no code implementations 12 Apr 2023 Shiwei Zhang, Zhengzheng Wang, Qing Liu, Fei Wang, Wei Ke, Tong Zhang

This paper presents a new annotation method called Sparse Annotation (SA) for crowd counting, which reduces human labeling efforts by sparsely labeling individuals in an image.

Crowd Counting

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

1 code implementation CVPR 2023 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.

Contrastive Learning Few-Shot action recognition +1

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation CVPR 2023 Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

CLIP-guided Prototype Modulating for Few-shot Action Recognition

1 code implementation 6 Mar 2023 Xiang Wang, Shiwei Zhang, Jun Cen, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task.

Few-Shot action recognition Few Shot Action Recognition

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

no code implementations 16 Feb 2023 Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin

We present Rhino, a system for accelerating tensor programs with automatic parallelization on an AI platform for real production environments.

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

no code implementations 13 Feb 2023 Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, Wei Lin

This paper presents TAG, an automatic system to derive an optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology-heterogeneous ML clusters.

Combinatorial Optimization TAG

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition

1 code implementation 8 Feb 2023 Jun Cen, Di Luan, Shiwei Zhang, Yixuan Pei, Yingya Zhang, Deli Zhao, Shaojie Shen, Qifeng Chen

Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications.

Open Set Learning

Space-time Prompting for Video Class-incremental Learning

no code implementations ICCV 2023 Yixuan Pei, Zhiwu Qing, Shiwei Zhang, Xiang Wang, Yingya Zhang, Deli Zhao, Xueming Qian

In this paper, we will fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i.e., CLIP, making it fit for video class-incremental learning (VCIL).

class-incremental learning Class Incremental Learning +1

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

no code implementations 2 Nov 2022 Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

The former reduces the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate for the spatio-temporal details lost in the Frame Condensing stage.

Action Recognition class-incremental learning +2

Prompt Combines Paraphrase: Teaching Pre-trained Models to Understand Rare Biomedical Words

1 code implementation COLING 2022 Haochun Wang, Chi Liu, Nuwa Xi, Sendong Zhao, Meizhi Ju, Shiwei Zhang, Ziheng Zhang, Yefeng Zheng, Bing Qin, Ting Liu

Prompt-based fine-tuning for pre-trained models has proven effective for many natural language processing tasks under few-shot settings in the general domain.

Natural Language Inference

MAR: Masked Autoencoders for Efficient Action Recognition

1 code implementation 24 Jul 2022 Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang

Inspired by this, we propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patches and operating only on a part of the videos.

Action Classification Action Recognition +1

Open-world Semantic Segmentation for LIDAR Point Clouds

1 code implementation 4 Jul 2022 Jun Cen, Peng Yun, Shiwei Zhang, Junhao Cai, Di Luan, Michael Yu Wang, Ming Liu, Mingqian Tang

Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since they are closed-set and static.

Autonomous Driving Incremental Learning +2

Context-aware Proposal Network for Temporal Action Detection

no code implementations 18 Jun 2022 Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang

This technical report presents our first-place winning solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge.

Action Classification Action Detection

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation CVPR 2022 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.

Few Shot Action Recognition set matching

TCTrack: Temporal Contexts for Aerial Tracking

1 code implementation CVPR 2022 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers.

Discovery-and-Selection: Towards Optimal Multiple Instance Learning for Weakly Supervised Object Detection

no code implementations 18 Oct 2021 Shiwei Zhang, Wei Ke, Lin Yang

Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learning object classifiers and estimating object locations under the supervision of image category labels.

Multiple Instance Learning object-detection +1

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations ICLR 2022 Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #57 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

Support-Set Based Cross-Supervision for Video Grounding

no code implementations ICCV 2021 Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao

The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.

Contrastive Learning Video Grounding

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation 24 Aug 2021 Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.

Contrastive Learning

OadTR: Online Action Detection with Transformers

1 code implementation ICCV 2021 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang

Most recent approaches for online action detection tend to apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure.

Online Action Detection

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations 20 Jun 2021 Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning Weakly-supervised Temporal Action Localization +1

End-to-end Temporal Action Detection with Transformer

1 code implementation 18 Jun 2021 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Action Detection Temporal Action Localization +1

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation 13 Jun 2021 Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop.

Action Detection

Self-supervised Motion Learning from Static Images

1 code implementation CVPR 2021 Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin, Marcelo Ang

We furthermore introduce a static mask in pseudo motions to create local motion patterns, which forces the model to additionally locate notable motion areas for the correct classification. We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.

Action Recognition Self-Supervised Learning

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations 1 Jan 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

Tuning the quantumness of simple Bose systems: A universal phase diagram

no code implementations 9 Aug 2020 Youssef Kora, Massimo Boninsegni, Dam Thanh Son, Shiwei Zhang

We present a comprehensive theoretical study of the phase diagram of a system of many Bose particles interacting with a two-body central potential of the so-called Lennard-Jones form.

Statistical Mechanics

Multi-Level Temporal Pyramid Network for Action Detection

no code implementations 7 Aug 2020 Xiang Wang, Changxin Gao, Shiwei Zhang, Nong Sang

By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations.

Action Detection

Less is More: Rejecting Unreliable Reviews for Product Question Answering

1 code implementation 9 Jul 2020 Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris

In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question.

Community Question Answering Conformal Prediction +1

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E)

no code implementations 13 Jun 2020 Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

This technical report analyzes a temporal action localization method we used in the HACS competition, which is hosted in the ActivityNet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video, and predict the action category. First, we utilize the video-level feature information to train multiple video-level action classification models.

Action Classification Temporal Action Localization

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

no code implementations CVPR 2019 Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun

In this paper, we define these ambiguous samples as "transitional states", and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states.

Action Detection
