no code implementations • 28 Feb 2025 • Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik, Vineeth N. Balasubramanian
Particularly, we propose a network with a convolutional spatial-temporal feature extractor enhanced with our proposed Adaptive Spatio-Temporal Refinement Module (ASTRM) and a long-range temporal module.
no code implementations • 30 Dec 2024 • Hiran Sarkar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth N Balasubramanian
Open-Set Object Detection (OSOD) has emerged as a contemporary research direction to address the detection of unknown objects.
no code implementations • CVPR 2024 • Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik
Recently, weakly supervised video anomaly detection (WS-VAD) has emerged as a contemporary research direction to identify anomaly events like violence and nudity in videos using only video-level labels.
no code implementations • 26 Aug 2024 • Vishal Chudasama, Hiran Sarkar, Pankaj Wasnik, Vineeth N Balasubramanian, Jayateja Kalla
While traditional FSOD methods have been studied before, this survey paper comprehensively reviews FSOD research with a specific focus on covering different FSOD settings such as standard FSOD, generalized FSOD, incremental FSOD, open-set FSOD, and domain adaptive FSOD.
no code implementations • 23 Feb 2024 • Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
To effectively utilize the newly proposed augmentation technique, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to achieve collective learning of high-level feature representations from two different views of the input images.
Ranked #1 on
Facial Landmark Detection
on WFLW
(FR@10 (inter-ocular) metric)
no code implementations • 4 Jun 2023 • Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
no code implementations • 5 Jun 2022 • Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe
We introduce a new feature extractor to extract latent features from the audio and visual modality.
Ranked #15 on
Emotion Recognition in Conversation
on MELD
2 code implementations • 17 May 2021 • Andrey Ignatov, Cheng-Ming Chiang, Hsien-Kai Kuo, Anastasia Sycheva, Radu Timofte, Min-Hung Chen, Man-Yu Lee, Yu-Syuan Xu, Yu Tseng, Shusong Xu, Jin Guo, Chao-Hung Chen, Ming-Chun Hsyu, Wen-Chia Tsai, Chao-Wei Chen, Grigory Malivenko, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Zheng Shaolong, Hao Dejun, Xie Fen, Feng Zhuang, Yipeng Ma, Jingyang Peng, Tao Wang, Fenglong Song, Chih-Chung Hsu, Kwan-Lin Chen, Mei-Hsuang Wu, Vishal Chudasama, Kalpesh Prajapati, Heena Patel, Anjali Sarvaiya, Kishor Upla, Kiran Raja, Raghavendra Ramachandra, Christoph Busch, Etienne de Stoutz
As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos.