no code implementations • 29 Apr 2022 • Mahdi M. Kalayeh, Shervin Ardeshir, Lingyi Liu, Nagendra Kamath, Ashok Chandrashekar
The abundance and ease of utilizing sound, along with the fact that auditory cues reveal a plethora of information about what happens in a scene, make the audio-visual space an intuitive choice for representation learning.
no code implementations • NeurIPS 2021 • Mahdi M. Kalayeh, Nagendra Kamath, Lingyi Liu, Ashok Chandrashekar
The abundance and ease of utilizing sound, along with the fact that auditory cues reveal so much about what happens in the scene, make the audio-visual space a perfectly intuitive choice for self-supervised representation learning.
no code implementations • 23 Nov 2019 • Mahdi M. Kalayeh, Mubarak Shah
In SSG, the same idea is applied to the intermediate layers of the network.
no code implementations • 7 Jun 2018 • Mahdi M. Kalayeh, Mubarak Shah
We show that, under the assumption that samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution.
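The claimed equivalence can be checked numerically. The sketch below (an illustration, not the paper's code) fits a Gaussian to one feature channel of a mini-batch and compares batch normalization against the Fisher score with respect to the mean, normalized by the square root of the Fisher information:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=256)  # one feature channel of a mini-batch

mu, var = x.mean(), x.var()

# Batch normalization (no learned scale/shift; eps omitted for clarity)
bn = (x - mu) / np.sqrt(var)

# Fisher score of N(mu, var) w.r.t. the mean: d/dmu log p(x) = (x - mu) / var,
# normalized by sqrt of the Fisher information I_mu = 1 / var.
score = (x - mu) / var
fisher_vec = score * np.sqrt(var)

print(np.allclose(bn, fisher_vec))  # the two quantities coincide
```

The normalization by the Fisher information is what makes the two expressions agree term by term: both reduce to (x − μ)/σ.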
no code implementations • CVPR 2018 • Mahdi M. Kalayeh, Emrah Basaran, Muhittin Gokmen, Mustafa E. Kamasak, Mubarak Shah
In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative.
Ranked #80 on Person Re-Identification on Market-1501
no code implementations • CVPR 2017 • Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah
We build our facial attribute prediction model jointly with a deep semantic segmentation network.
Ranked #2 on Facial Attribute Classification on LFWA
no code implementations • 4 Jan 2015 • Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah
In the second phase, we create motion components by clustering the flow vectors with respect to their location and velocity via K-means.
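A minimal sketch of this phase (the data, cluster count, and K-means routine here are illustrative assumptions, not the paper's implementation): each dense-flow vector is represented as [x, y, u, v] (image location plus velocity) and grouped into motion components with plain Lloyd's K-means.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic flow field: two spatial regions moving in different directions.
# Rows are [x, y, u, v] = image location + flow velocity.
left  = np.c_[rng.uniform(0, 50, (100, 2)),   rng.normal([ 2.0, 0.0], 0.2, (100, 2))]
right = np.c_[rng.uniform(50, 100, (100, 2)), rng.normal([-2.0, 1.0], 0.2, (100, 2))]
flows = np.vstack([left, right])

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    r = np.random.default_rng(seed)
    centroids = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(((points[:, None] - centroids) ** 2).sum(-1), axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty.
        centroids = np.array([points[labels == j].mean(0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids, labels

centroids, labels = kmeans(flows, k=2)  # each label is one motion component
```

Clustering jointly on location and velocity means a component is both spatially coherent and moving coherently, which is what distinguishes it from clustering on velocity alone.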
no code implementations • CVPR 2014 • Mahdi M. Kalayeh, Haroon Idrees, Mubarak Shah
Such models become obsolete and require relearning when new images and tags are added to the database.
no code implementations • CVPR 2014 • Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah
While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate.