1 code implementation • 30 Nov 2023 • Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang
The LRDiff framework constructs an image-rendering process with multiple layers, each of which applies vision guidance to steer the estimated denoising direction for a single object.
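The snippet above describes combining per-layer, per-object denoising directions. The actual LRDiff mechanism is not detailed here; the sketch below only illustrates the layered idea under the assumption that each object contributes a direction estimate weighted by a soft spatial mask (the function name and mask convention are hypothetical).

```python
import numpy as np

def layered_denoising_direction(directions, masks):
    """Combine per-object denoising directions via soft layer masks.

    directions: list of (H, W, C) arrays, one estimated direction per object
    masks: list of (H, W) soft masks (assumed to sum to <= 1 per pixel)
    """
    combined = np.zeros_like(directions[0])
    for direction, mask in zip(directions, masks):
        # Each layer contributes only inside its object's mask.
        combined += mask[..., None] * direction
    return combined

# Two objects splitting the canvas evenly (toy values, not real model output).
directions = [np.ones((4, 4, 3)), 2.0 * np.ones((4, 4, 3))]
masks = [np.full((4, 4), 0.5), np.full((4, 4), 0.5)]
combined = layered_denoising_direction(directions, masks)
```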
1 code implementation • NeurIPS 2023 • Guoxi Huang, Hongtao Fu, Adrian G. Bors
With the same level of computational complexity as ViT-Base and ViT-Large, we instantiate 4.5$\times$ and 2$\times$ deeper ViTs, dubbed ViT-S-54 and ViT-B-48.
no code implementations • 23 Nov 2022 • Guoxi Huang, Adrian G. Bors
The static appearance of a video may impede the ability of a deep neural network to learn motion-relevant features for video action recognition.
no code implementations • TIP 2022 • Guoxi Huang, Adrian G. Bors
Through experiments we show that the proposed MBPM can be used as a plug-in module in various CNN backbone architectures, significantly boosting their performance.
2 code implementations • 29 Mar 2021 • Guoxi Huang, Adrian G. Bors
We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data.
Ranked #15 on Action Recognition on UCF101
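The MBPM is described as a trainable band-pass filter that separates "busy" (motion-related) from "quiet" (static) video information. Its learned form is not given here; the sketch below uses a fixed difference-of-Gaussians temporal kernel as a stand-in for the band-pass behavior (the kernel choice and function names are assumptions, not the paper's implementation).

```python
import numpy as np

def dog_kernel(size=5, sigma1=0.6, sigma2=1.2):
    # Difference of Gaussians: a classic band-pass filter. In the MBPM the
    # kernel would be learned; here it is fixed for illustration.
    t = np.arange(size) - size // 2
    g1 = np.exp(-t**2 / (2 * sigma1**2)); g1 /= g1.sum()
    g2 = np.exp(-t**2 / (2 * sigma2**2)); g2 /= g2.sum()
    return g1 - g2  # sums to zero, so constant (static) signals are suppressed

def band_pass(video, kernel):
    # video: (T, H, W). Convolve along the time axis at every pixel,
    # splitting the clip into a "busy" part and its "quiet" remainder.
    T = video.shape[0]
    pad = len(kernel) // 2
    padded = np.pad(video, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    busy = np.zeros(video.shape, dtype=float)
    for i, k in enumerate(kernel):
        busy += k * padded[i:i + T]
    quiet = video - busy
    return busy, quiet

# A perfectly static clip produces a (near-)zero busy response.
static = np.ones((8, 4, 4))
busy, quiet = band_pass(static, dog_kernel())
```

Because the kernel sums to zero, static content falls entirely into the quiet stream, which matches the stated goal of isolating motion-relevant information.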
1 code implementation • 17 Jul 2020 • Guoxi Huang, Adrian G. Bors
Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult.
Ranked #32 on Action Recognition on Something-Something V1
no code implementations • 11 Feb 2020 • Guoxi Huang, Adrian G. Bors
In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which extracts the essential movement information from a long sequence of video frames and maps it into a small set of images, named Squeezed Images.
Ranked #43 on Action Recognition on UCF101 (using extra training data)
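Temporal Squeeze pooling maps a long frame sequence to a few Squeezed Images. The paper's pooling operator is not specified here; the sketch below assumes a learned per-output weighting over time, applied as a convex combination of frames (the softmax parameterization is an illustrative assumption).

```python
import numpy as np

def temporal_squeeze(frames, weights):
    """Pool T frames into S squeezed images.

    frames: (T, H, W, C) video clip
    weights: (S, T) scores, one row per squeezed image; in the real method
             these would be learned, here they are arbitrary.
    """
    # Softmax over time so each squeezed image is a convex combination
    # of the input frames.
    w = np.exp(weights - weights.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("st,thwc->shwc", w, frames)

rng = np.random.default_rng(0)
frames = rng.random((64, 8, 8, 3))   # a long clip of 64 frames
weights = rng.random((2, 64))        # toy scores standing in for learned ones
squeezed = temporal_squeeze(frames, weights)  # two Squeezed Images
```

A downstream 2-D network could then consume the few squeezed images instead of all 64 frames, which is the compression the abstract describes.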