Search Results for author: Ziad Al-Halah

Found 26 papers, 6 papers with code

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

no code implementations · 10 Jul 2023 · Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos.

Audio Denoising · Denoising

SpotEM: Efficient Video Search for Episodic Memory

no code implementations · 28 Jun 2023 · Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?").

Natural Language Queries

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

1 code implementation · CVPR 2023 · Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand.

Data Augmentation · Natural Language Queries

Few-Shot Audio-Visual Learning of Environment Acoustics

no code implementations · 8 Jun 2022 · Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman

Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics.

audio-visual learning · Room Impulse Response (RIR)
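For readers unfamiliar with RIRs: once an RIR is known (the few-shot estimation problem this paper targets), the sound at the listener can be rendered by convolving the dry source signal with the RIR. A minimal sketch of that rendering step with made-up toy signals (illustrative only, not the paper's estimation method):

```python
import numpy as np

def render_at_listener(dry_signal, rir):
    """Convolve a dry (anechoic) source signal with a room impulse
    response to simulate what the listener hears, including the
    direct path and the reflections encoded in the RIR."""
    return np.convolve(dry_signal, rir)

# Toy example: an impulse source, and an RIR containing a direct
# path (gain 1.0 at t=0) plus one echo (gain 0.5 at t=3 samples).
dry = np.array([1.0, 0.0, 0.0, 0.0])
rir = np.array([1.0, 0.0, 0.0, 0.5])
wet = render_at_listener(dry, rir)
```

Since the source here is a unit impulse, the rendered signal simply reproduces the RIR (padded by the convolution), which is why impulse responses fully characterize the room's effect on any sound.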

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

no code implementations · CVPR 2022 · Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman

In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments.

Transfer Learning · Visual Navigation

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

no code implementations · CVPR 2022 · Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'.

Navigate

Move2Hear: Active Audio-Visual Source Separation

no code implementations · ICCV 2021 · Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.

Audio Source Separation · Object

Environment Predictive Coding for Embodied Agents

no code implementations · 3 Feb 2021 · Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.

Self-Supervised Learning

Semantic Audio-Visual Navigation

no code implementations · CVPR 2021 · Changan Chen, Ziad Al-Halah, Kristen Grauman

We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target.

Position · Visual Navigation

Modeling Fashion Influence from Photos

no code implementations · 17 Nov 2020 · Ziad Al-Halah, Kristen Grauman

The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images.

Learning to Set Waypoints for Audio-Visual Navigation

1 code implementation · ICLR 2021 · Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).

Visual Navigation

Occupancy Anticipation for Efficient Exploration and Navigation

1 code implementation · ECCV 2020 · Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman

State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent.

Decision Making · Efficient Exploration · +1

VisualEchoes: Spatial Image Representation Learning through Echolocation

no code implementations · ECCV 2020 · Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.

Monocular Depth Estimation · Representation Learning · +2
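The physics that makes echolocation informative is simple: an emitted sound returns after traveling to a surface and back, so the round-trip delay encodes distance. A back-of-the-envelope sketch of that relationship (basic acoustics, not the paper's learned spatial representation):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def distance_from_echo(round_trip_seconds):
    """An echo travels to the reflecting object and back, so the
    one-way distance is half the total path length covered in the
    measured round-trip time."""
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

# An echo returning after 10 ms implies a surface about 1.7 m away.
d = distance_from_echo(0.010)
```

Echo timing (and interaural differences between two microphones) is what lets the paper's agent supervise spatial image representations without depth labels.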

From Paris to Berlin: Discovering Fashion Style Influences Around the World

1 code implementation · CVPR 2020 · Ziad Al-Halah, Kristen Grauman

The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively.

SoundSpaces: Audio-Visual Navigation in 3D Environments

2 code implementations · ECCV 2020 · Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf: restricted solely to their visual perception of the environment.

Navigate · Visual Navigation

Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis

no code implementations · 14 Jul 2019 · Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero

Additionally, we introduce a novel emoji representation based on their visual emotional response which supports a deeper understanding of the emoji modality and their usage on social media.

Sentiment Analysis · Transfer Learning

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

3 code implementations · CVPR 2021 · Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris

We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.

Attribute · Image Retrieval · +1

Traversing the Continuous Spectrum of Image Retrieval with Deep Dynamic Models

no code implementations · 1 Dec 2018 · Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal

While approaches in the literature can be roughly categorized into two main groups, category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex.

Attribute · Continuous Control · +2

Informed Democracy: Voting-based Novelty Detection for Action Recognition

no code implementations · 30 Oct 2018 · Alina Roitberg, Ziad Al-Halah, Rainer Stiefelhagen

While it is common in activity recognition to assume a closed-set setting, i.e., that test samples always belong to training categories, this assumption is impractical in real-world scenarios.

Action Classification · Action Recognition · +3

Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

no code implementations · CVPR 2017 · Ziad Al-Halah, Rainer Stiefelhagen

Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three data sets: ImageNet, Animals with Attributes and aPascal/aYahoo.

Attribute · Zero-Shot Learning

Relaxed Earth Mover's Distances for Chain- and Tree-connected Spaces and their use as a Loss Function in Deep Learning

no code implementations · 22 Nov 2016 · Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen

The Earth Mover's Distance (EMD) computes the optimal cost of transforming one distribution into another, given a known transport metric between them.

Small Data Image Classification
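On a chain-connected space (histogram bins ordered along a line with unit ground distance), the optimal transport cost has a well-known closed form: the L1 distance between the two cumulative distributions, which is what makes a relaxed EMD cheap enough to serve as a training loss. A small sketch of that closed form (illustrative, not the authors' implementation):

```python
import numpy as np

def emd_chain(p, q):
    """EMD between two histograms on a chain with unit ground
    distance between adjacent bins: the sum of absolute
    differences between the two cumulative distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    assert np.isclose(p.sum(), q.sum()), "histograms must have equal mass"
    return np.abs(np.cumsum(p - q)).sum()

# Moving one unit of mass across two chain edges costs 2.
cost = emd_chain([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Unlike cross-entropy, this loss grows with how *far* the predicted mass sits from the target bin, not just whether it lands in the right one.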

How to Transfer? Zero-Shot Object Recognition via Hierarchical Transfer of Semantic Attributes

no code implementations · 1 Apr 2016 · Ziad Al-Halah, Rainer Stiefelhagen

We propose to capture these variations in a hierarchical model that expands the knowledge source with additional abstraction levels of attributes.

Attribute · Object Recognition · +1
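The general recipe behind attribute-based zero-shot recognition, which this line of work builds on: predict a vector of semantic attributes for an image, then assign the unseen class whose attribute signature matches best. A hedged sketch of the matching step, with hypothetical class names and attribute vectors (a plain nearest-neighbor matcher, not the paper's hierarchical transfer model):

```python
import numpy as np

# Hypothetical attribute signatures for classes never seen in
# training (attributes: has_stripes, lives_in_water, has_fur).
class_signatures = {
    "zebra":   np.array([1.0, 0.0, 1.0]),
    "dolphin": np.array([0.0, 1.0, 0.0]),
}

def classify_by_attributes(predicted_attrs):
    """Label an image with the unseen class whose attribute
    signature lies closest (Euclidean) to the attribute vector
    predicted for that image."""
    return min(class_signatures,
               key=lambda c: np.linalg.norm(class_signatures[c] - predicted_attrs))

# An image whose attribute predictor fires on stripes and fur.
label = classify_by_attributes(np.array([0.9, 0.1, 0.8]))
```

Because attributes are shared across classes, the matcher can name a class with zero training images of it; the paper's contribution is capturing how attribute meanings vary across levels of a class hierarchy.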
