no code implementations • 23 Aug 2024 • Martin Strauss, Wolfgang Mack, María Luis Valero, Okan Köpüklü
We propose a novel Neural Steering technique that adapts the target area of a spatial-aware multi-microphone sound source separation algorithm during inference without the necessity of retraining the deep neural network (DNN).
no code implementations • 19 Aug 2024 • Martin Strauss, Okan Köpüklü
The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone array, while suppressing all other sounds.
1 code implementation • ICCV 2021 • Okan Köpüklü, Maja Taseska, Gerhard Rigoll
Successful active speaker detection requires a three-stage pipeline: (i) audio-visual encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference speaker and the background speakers within each frame, and (iii) temporal modeling for the reference speaker.
Active Speaker Detection
Audio-Visual Active Speaker Detection
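The three-stage pipeline from the abstract can be sketched as a toy program. Everything below is an illustrative stand-in (simple numpy operations), not the authors' model: the encoding, relation scoring, and temporal smoothing functions are hypothetical placeholders for the learned stages.

```python
import numpy as np

# Toy sketch of a three-stage active speaker detection pipeline.
# Stage names follow the abstract; the function bodies are
# illustrative stand-ins, not the actual learned modules.

def encode(audio, face):
    # (i) audio-visual encoding: fuse per-speaker audio and face
    # features into one embedding (here: simple concatenation).
    return np.concatenate([audio, face])

def relate(ref_emb, bg_embs):
    # (ii) inter-speaker relation modeling: score the reference
    # speaker against the background speakers in the same frame
    # (here: its embedding norm minus the mean background norm).
    bg = np.mean([np.linalg.norm(e) for e in bg_embs]) if bg_embs else 0.0
    return np.linalg.norm(ref_emb) - bg

def temporal(scores, k=3):
    # (iii) temporal modeling for the reference speaker
    # (here: a moving average over the per-frame scores).
    pad = np.pad(scores, (k // 2, k // 2), mode="edge")
    return np.convolve(pad, np.ones(k) / k, mode="valid")
```

The point of the sketch is only the data flow: per-speaker embeddings per frame, a reference-vs-background score per frame, then smoothing across frames.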
no code implementations • 24 Feb 2021 • Hakan Cevikalp, Bedirhan Uzun, Okan Köpüklü, Gurkan Ozturk
In this paper, we propose a new deep neural network classifier that simultaneously maximizes the inter-class separation and minimizes the intra-class variation by using the polyhedral conic classification function.
no code implementations • 18 Nov 2020 • Hasan Saribas, Hakan Cevikalp, Okan Köpüklü, Bedirhan Uzun
Although motion provides distinctive and complementary information especially for fast moving objects, most of the recent tracking architectures primarily focus on the objects' appearance information.
1 code implementation • 30 Sep 2020 • Okan Köpüklü, Jiapeng Zheng, Hang Xu, Gerhard Rigoll
For this task, we introduce a new video-based benchmark, the Driver Anomaly Detection (DAD) dataset, which contains normal driving videos together with a set of anomalous actions in its training set.
1 code implementation • 30 Sep 2020 • Okan Köpüklü, Stefan Hörmann, Fabian Herzog, Hakan Cevikalp, Gerhard Rigoll
Convolutional Neural Networks with 3D kernels (3D-CNNs) currently achieve state-of-the-art results in video recognition tasks owing to their strength in extracting spatiotemporal features across video frames.
no code implementations • 2 Mar 2020 • Okan Köpüklü, Thomas Ledwon, Yao Rong, Neslihan Kose, Gerhard Rigoll
In this work, we propose an HCI system for dynamic recognition of driver micro hand gestures, which can have a crucial impact in the automotive sector, especially for safety-related issues.
1 code implementation • 10 Dec 2019 • Mert Kayhan, Okan Köpüklü, Mhd Hasan Sarhan, Mehmet Yigitsoy, Abouzar Eslami, Gerhard Rigoll
To this end, a lightweight network architecture is introduced and mean teacher, virtual adversarial training and pseudo-labeling algorithms are evaluated on 2D-pose estimation for surgical instruments.
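Of the three semi-supervised algorithms evaluated, pseudo-labeling is the simplest to illustrate. A minimal sketch of the general idea, assuming a hypothetical classifier whose logits are already computed (not the actual pose-estimation network):

```python
import numpy as np

# Minimal sketch of pseudo-labeling: a model's confident predictions
# on unlabeled data are kept as training labels. The "model" here is
# just a softmax over precomputed logits.

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pseudo_labels(logits, threshold=0.9):
    # Keep only samples whose top-class probability exceeds the
    # confidence threshold; return (kept indices, hard labels).
    probs = softmax(logits)
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)
```

The kept samples and their hard labels would then be mixed into the labeled training set for the next round of training.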
no code implementations • 20 Nov 2019 • Yinglong Feng, Shuncheng Wu, Okan Köpüklü, Xueyang Kang, Federico Tombari
This paper studies the unsupervised monocular depth prediction problem.
5 code implementations • 15 Nov 2019 • Okan Köpüklü, Xiangyu Wei, Gerhard Rigoll
YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.
Ranked #1 on Action Recognition In Videos on AVA v2.2
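The two-branch, single-stage layout described above can be summarized in a conceptual sketch. All operations below are toy stand-ins chosen only to show the data flow (clip in, fused features, boxes and scores out in one pass), not the actual YOWO layers:

```python
import numpy as np

# Conceptual sketch of YOWO's single-stage, two-branch layout:
# a 3D branch summarizes the whole clip (temporal information),
# a 2D branch processes only the key frame (spatial information),
# and the fused features feed one head that outputs boxes and
# action scores in a single evaluation. Toy stand-ins throughout.

def yowo_forward(clip):
    # clip: (T, H, W) grayscale frames, for simplicity
    temporal_feat = clip.mean(axis=0)   # 3D-branch stand-in: pool over time
    spatial_feat = clip[-1]             # 2D-branch stand-in: key frame only
    fused = np.stack([temporal_feat, spatial_feat])  # channel fusion
    # single head: one (hypothetical) box and per-class scores
    box = np.array([0.5, 0.5, 1.0, 1.0]) * fused.mean()
    scores = np.array([fused.mean(), 1 - fused.mean()])
    return box, scores
```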
1 code implementation • arXiv preprint 2019 • Okan Köpüklü, Fabian Herzog, Gerhard Rigoll
Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling.
Ranked #117 on Action Recognition on Something-Something V2
no code implementations • 10 May 2019 • Okan Köpüklü, Yao Rong, Gerhard Rigoll
The use of hand gestures provides a natural alternative to cumbersome interface devices for Human-Computer Interaction (HCI) systems.
2 code implementations • 4 Apr 2019 • Okan Köpüklü, Neslihan Kose, Ahmet Gunduz, Gerhard Rigoll
Recently, convolutional neural networks with 3D kernels (3D CNNs) have become very popular in the computer vision community owing to their superior ability to extract spatio-temporal features from video frames compared to 2D CNNs.
Ranked #2 on Action Recognition In Videos on UCF101
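Why 3D kernels capture spatio-temporal structure can be shown with a naive example: a 3D convolution slides over time as well as space, so its output depends on motion across frames, which a per-frame 2D kernel cannot see. A minimal single-channel, valid-mode sketch (loop-based, for clarity only):

```python
import numpy as np

# Naive valid-mode 3D convolution for a single channel.
def conv3d(video, kernel):
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A temporal-difference kernel responds only to change between
# frames: zero on a static clip, nonzero when pixels move or vary.
temporal_diff = np.zeros((2, 1, 1))
temporal_diff[0, 0, 0], temporal_diff[1, 0, 0] = -1.0, 1.0
```

On a static clip this kernel outputs all zeros; on a clip whose intensity changes over time it does not, which is exactly the temporal information a stack of independent 2D kernels discards.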
5 code implementations • 29 Jan 2019 • Okan Köpüklü, Ahmet Gunduz, Neslihan Kose, Gerhard Rigoll
We evaluate our architecture on two publicly available datasets, EgoGesture and NVIDIA Dynamic Hand Gesture, which require temporal detection and classification of the performed hand gestures.
Ranked #1 on Hand Gesture Recognition on EgoGesture
1 code implementation • 28 Jan 2019 • Okan Köpüklü, Maryam Babaee, Stefan Hörmann, Gerhard Rigoll
In this paper, we propose a CNN architecture, Layer Reuse Network (LruNet), in which convolutional layers are applied repeatedly, improving performance without introducing new layers.
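The layer-reuse idea can be sketched in a few lines: one parameterized layer is applied repeatedly, so effective depth grows with zero additional parameters. The "layer" below is a hypothetical linear map plus ReLU, not LruNet's actual convolutional block:

```python
import numpy as np

# Sketch of layer reuse: the same weight matrix W is applied on
# every pass, so depth increases without introducing new parameters.
def reused_forward(x, W, repeats=3):
    for _ in range(repeats):          # same W every pass
        x = np.maximum(W @ x, 0.0)    # shared layer: linear + ReLU
    return x
```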
1 code implementation • 19 Apr 2018 • Okan Köpüklü, Neslihan Köse, Gerhard Rigoll
Acquiring the spatio-temporal states of an action is the most crucial step for action classification.
Ranked #1 on Hand Gesture Recognition on ChaLearn test