2 code implementations • CVPR 2017 • Ali Diba, Vivek Sharma, Luc van Gool
Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information.
1 code implementation • CVPR 2021 • M. Saquib Sarfraz, Naila Murray, Vivek Sharma, Ali Diba, Luc van Gool, Rainer Stiefelhagen
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos and is an important requirement for many video understanding tasks.
Ranked #1 on Action Segmentation on MPII Cooking 2 Dataset
3 code implementations • 22 Nov 2017 • Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc van Gool
Thus, by finetuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g., Sports-1M, and finetuned on the target datasets, e.g., HMDB51/UCF101.
1 code implementation • ECCV 2020 • Ali Diba, Mohsen Fayyaz, Vivek Sharma, Manohar Paluri, Jurgen Gall, Rainer Stiefelhagen, Luc van Gool
HVU is organized hierarchically in a semantic taxonomy that focuses on multi-label and multi-task video understanding as a comprehensive problem that encompasses the recognition of multiple semantic aspects in the dynamic scene.
Ranked #11 on Action Recognition on UCF101
1 code implementation • 15 Jun 2016 • Amir Ghodrati, Ali Diba, Marco Pedersoli, Tinne Tuytelaars, Luc van Gool
In this paper, a new method for generating object and action proposals in images and videos is proposed.
1 code implementation • ICCV 2015 • Amir Ghodrati, Ali Diba, Marco Pedersoli, Tinne Tuytelaars, Luc van Gool
We generate hypotheses in a sliding-window fashion over different activation layers and show that the final convolutional layers can find the object of interest with high recall but poor localization due to the coarseness of the feature maps.
1 code implementation • CVPR 2021 • Mohsen Fayyaz, Emad Bahrami, Ali Diba, Mehdi Noroozi, Ehsan Adeli, Luc van Gool, Juergen Gall
While the GFLOPs of a 3D CNN can be decreased by reducing the temporal feature resolution within the network, there is no setting that is optimal for all input clips.
no code implementations • 22 Nov 2017 • Ali Diba, Vivek Sharma, Rainer Stiefelhagen, Luc van Gool
We approach GANs with a novel training method and learning objective, to discover multiple object instances for three cases: 1) synthesizing a picture of a specific object within a cluttered scene; 2) localizing different categories in images for weakly supervised object detection; and 3) improving object discovery in object detection pipelines.
Ranked #2 on Weakly Supervised Object Detection on COCO test-dev
no code implementations • 20 Oct 2017 • Vivek Sharma, Ali Diba, Davy Neven, Michael S. Brown, Luc van Gool, Rainer Stiefelhagen
In this paper, we are interested in learning CNNs that can emulate image enhancement and restoration, but with the overall goal to improve image classification and not necessarily human perception.
no code implementations • CVPR 2017 • Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash, Luc van Gool
The final stage of both architectures is a part of a convolutional neural network that performs multiple instance learning on proposals extracted in the previous stage(s).
Ranked #2 on Weakly Supervised Object Detection on ImageNet
no code implementations • 31 Aug 2016 • Ali Diba, Ali Mohammad Pazandeh, Luc van Gool
Video and action classification have advanced greatly with deep neural networks, especially two-stream CNNs that take RGB and optical flow as inputs, and these models achieve outstanding performance in video analysis.
no code implementations • CVPR 2016 • Ali Diba, Ali Mohammad Pazandeh, Hamed Pirsiavash, Luc van Gool
On the other hand, we let an iteration of feature learning and patch clustering purify the set of dedicated patches that we use.
no code implementations • ECCV 2018 • Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc van Gool
Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets.
no code implementations • CVPR 2018 • Vivek Sharma, Ali Diba, Davy Neven, Michael S. Brown, Luc van Gool, Rainer Stiefelhagen
In this paper, we are interested in learning CNNs that can emulate image enhancement and restoration, but with the overall goal to improve image classification and not necessarily human perception.
no code implementations • CVPR 2013 • Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
We exploit a discriminative binary space to compute these geometric quantities efficiently.
no code implementations • ICCV 2019 • Ali Diba, Vivek Sharma, Luc van Gool, Rainer Stiefelhagen
To this end, we introduce a novel unified spatio-temporal 3D-CNN architecture (DynamoNet) that jointly optimizes video classification and motion representation learning by predicting future frames as a multi-task learning problem.
no code implementations • 14 Oct 2020 • Ali Varamesh, Ali Diba, Tinne Tuytelaars, Luc van Gool
We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context on a large number of random views (augmentations) obtained from images.
no code implementations • ICCV 2021 • Ali Diba, Vivek Sharma, Reza Safdari, Dariush Lotfi, Saquib Sarfraz, Rainer Stiefelhagen, Luc van Gool
In this paper, we introduce a novel self-supervised visual representation learning method which understands both images and videos in a joint learning fashion.