Search Results for author: Shiwen Zhang

Found 5 papers, 2 papers with code

Forgedit: Text Guided Image Editing via Learning and Forgetting

1 code implementation • 19 Sep 2023 • Shiwen Zhang, Shuai Xiao, Weilin Huang

Text-guided image editing on real or synthetic images, given only the original image itself and the target text prompt as inputs, is a very general and challenging task.

text-guided-image-editing



TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning

no code implementations • 11 Mar 2022 • Shiwen Zhang

New video classification benchmarks aiming to eliminate static biases are proposed, with experiments on these new benchmarks showing that the current clip-based 3D CNNs are outperformed by RNN structures and recent video transformers.

Ranked #2 on Video Object Tracking on CATER

Action Recognition Classification +2


V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

no code implementations • ICLR 2020 • Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang

Most existing 3D CNN structures for video representation learning are clip-based methods, and do not consider video-level temporal evolution of spatio-temporal features.

Representation Learning Video Recognition


Knowledge Integration Networks for Action Recognition

no code implementations • 18 Feb 2020 • Shiwen Zhang, Sheng Guo, Li-Min Wang, Weilin Huang, Matthew R. Scott

We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode the knowledge of human and scene for action recognition.

Action Recognition Human Parsing +2


V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

1 code implementation • 18 Feb 2020 • Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang

Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features.

Long-range modeling Representation Learning +1

