Search Results for author: Shah Nawaz

Found 20 papers, 10 papers with code

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

1 code implementation · 14 Apr 2024 · Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.

DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

no code implementations · 31 Jul 2023 · Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim

Conversational engagement estimation is posed as a regression problem, entailing the identification of the favorable attention and involvement of the participants in the conversation.

regression

Single-branch Network for Multimodal Training

1 code implementation · 10 Mar 2023 · Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Muhammad Zaigham Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text.

Cross-Modal Retrieval, Retrieval

Speaker Recognition in Realistic Scenario Using Multimodal Data

no code implementations · 25 Feb 2023 · Saqlain Hussain Shah, Muhammad Saad Saeed, Shah Nawaz, Muhammad Haroon Yousaf

To achieve this task, we propose a two-branch network to learn joint representations of faces and voices in a multimodal system.

Speaker Recognition

Learning Branched Fusion and Orthogonal Projection for Face-Voice Association

1 code implementation · 22 Aug 2022 · Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Sajid Javed, Muhammad Haroon Yousaf, Alessio Del Bue

In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association.

Metric Learning

Guiding Attention using Partial-Order Relationships for Image Captioning

no code implementations · 15 Apr 2022 · Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

A pairwise ranking objective is used to train this embedding space; it allows similar images, topics and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy and hence helps the model produce more visually accurate captions.

Caption Generation, Image Captioning
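The snippet does not spell out the ranking objective. As a rough sketch only, a generic bidirectional hinge-based pairwise ranking loss over a shared image-caption space is shown below. The cosine similarity, the margin of 0.2 and the function names are assumptions, and the partial-order (visual-semantic hierarchy) constraint the paper describes is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pairwise_ranking_loss(images, captions, margin=0.2):
    """Generic hinge ranking loss (assumed form, not the paper's exact objective).

    images[i] and captions[i] form a matching pair; every other caption
    (or image) in the batch is treated as a negative for that pair.
    """
    n = len(images)
    loss = 0.0
    for i in range(n):
        pos = cosine(images[i], captions[i])
        for j in range(n):
            if j == i:
                continue
            # image -> caption direction: caption j is a negative for image i
            loss += max(0.0, margin - pos + cosine(images[i], captions[j]))
            # caption -> image direction: image j is a negative for caption i
            loss += max(0.0, margin - pos + cosine(images[j], captions[i]))
    return loss / n
```

With well-separated matching pairs the loss is zero; it grows when a non-matching caption scores within `margin` of the matching one.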

Semantically Grounded Visual Embeddings for Zero-Shot Learning

no code implementations · 3 Jan 2022 · Shah Nawaz, Jacopo Cavazza, Alessio Del Bue

Zero-shot learning methods rely on fixed visual and semantic embeddings, extracted from independent vision and language models, both pre-trained for other large-scale tasks.

Zero-Shot Learning

Fusion and Orthogonal Projection for Improved Face-Voice Association

2 code implementations · 20 Dec 2021 · Muhammad Saad Saeed, Muhammad Haris Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue

Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable to associated matching and verification tasks.

Cross-Modal Retrieval
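For context, the triplet formulation that this snippet attributes to prior work can be sketched as a standard triplet loss on face and voice embeddings. This is a minimal illustration of that baseline, not the fusion and orthogonal projection method the paper proposes; the Euclidean distance, the margin and the names are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    E.g. anchor = a voice embedding, positive = the matching face,
    negative = a face of a different identity (illustrative roles).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the non-matching face is at least `margin` farther from the voice than the matching face.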

Visual Word Embedding for Text Classification

1 code implementation · 25 Feb 2021 · Ignazio Gallo, Shah Nawaz, Nicola Landro, Riccardo La Grassa

The question we answer with this paper is: ‘can we convert a text document into an image to take advantage of image neural models to classify text documents?’ To answer this question we present a novel text classification method that converts a document into an encoded image, using word embedding.

General Classification, Image Classification (+2 more)
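The snippet says a document is converted into an encoded image using word embeddings but does not give the layout. The sketch below shows one plausible encoding under stated assumptions only: each word's embedding (values assumed in [-1, 1], length a multiple of 3) is split into triples that become RGB pixels, and words are placed left to right, row by row, on a fixed-size canvas. The function names, canvas size and scaling are hypothetical, not the paper's scheme.

```python
def to_byte(x):
    """Map a value assumed to lie in [-1, 1] to an 8-bit channel (0..255)."""
    x = max(-1.0, min(1.0, x))
    return round((x + 1.0) / 2.0 * 255)

def encode_document(words, embeddings, words_per_row=4, rows=4):
    """Hypothetical text-to-image encoding.

    embeddings: dict mapping word -> list of floats (length divisible by 3).
    Returns a rows x width grid of (R, G, B) tuples; empty slots stay black.
    """
    dim = len(next(iter(embeddings.values())))
    if dim % 3:
        raise ValueError("embedding size must be a multiple of 3")
    px_per_word = dim // 3
    width = words_per_row * px_per_word
    image = [[(0, 0, 0)] * width for _ in range(rows)]
    slot = 0
    for w in words:
        if w not in embeddings:
            continue  # skip out-of-vocabulary words
        if slot >= words_per_row * rows:
            break  # canvas full: long documents are truncated
        r, c = divmod(slot, words_per_row)
        vec = embeddings[w]
        for k in range(px_per_word):
            # consecutive triples of embedding values become one RGB pixel
            image[r][c * px_per_word + k] = tuple(to_byte(v) for v in vec[3 * k:3 * k + 3])
        slot += 1
    return image
```

The resulting grid could then be fed to a standard image CNN, which is the idea the question above raises.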

Cross-modal Speaker Verification and Recognition: A Multilingual Perspective

no code implementations · 28 Apr 2020 · Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, Alessio Del Bue

Recent years have seen a surge of interest in finding associations between faces and voices in cross-modal biometric applications, alongside speaker recognition.

Speaker Recognition, Speaker Verification

Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals

no code implementations · 18 Sep 2019 · Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati

We quantitatively and qualitatively evaluate the proposed approach on VoxCeleb, a benchmark audio-visual dataset, on a multitude of tasks including cross-modal verification, cross-modal matching, and cross-modal retrieval.

Cross-Modal Retrieval, Retrieval
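The three evaluation tasks named above can be illustrated on precomputed voice and face embeddings in a shared latent space. This is a minimal sketch that assumes cosine similarity as the score; the threshold value and the function names are hypothetical, not the paper's evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def verify(voice, face, threshold=0.5):
    """Cross-modal verification: same identity if similarity reaches an assumed threshold."""
    return cosine(voice, face) >= threshold

def match(voice, faces):
    """Cross-modal 1:N matching: index of the candidate face most similar to the voice."""
    return max(range(len(faces)), key=lambda i: cosine(voice, faces[i]))

def retrieve(voice, faces, k):
    """Cross-modal retrieval: indices of the top-k faces, ranked by similarity."""
    return sorted(range(len(faces)), key=lambda i: cosine(voice, faces[i]), reverse=True)[:k]
```

Example: with `faces = [[0, 1], [0.9, 0.1], [-1, 0]]`, `match([1, 0], faces)` selects index 1 and `retrieve([1, 0], faces, 2)` ranks indices 1 then 0.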

Picture What you Read

1 code implementation · 9 Sep 2019 · Ignazio Gallo, Shah Nawaz, Alessandro Calefati, Riccardo La Grassa, Nicola Landro

Visualization refers to our ability to create an image in our head based on the text we read or the words we hear.

Reading Comprehension

Do Cross Modal Systems Leverage Semantic Relationships?

no code implementations · 3 Sep 2019 · Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati, Faisal Shafait

Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space.

Cross-Modal Retrieval, Retrieval (+2 more)

Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition

1 code implementation · 2 Apr 2019 · Omer Arshad, Ignazio Gallo, Shah Nawaz, Alessandro Calefati

With the massive explosion of social media such as Twitter and Instagram, people share billions of multimedia posts daily, containing images and text.

named-entity-recognition, Named Entity Recognition (+1 more)

Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions

no code implementations · 16 Oct 2018 · Muhammad Kamran Janjua, Shah Nawaz, Alessandro Calefati, Ignazio Gallo

The majority of current dimensionality reduction or retrieval techniques rely on embedding the learned feature representations onto a computable metric space.

Dimensionality Reduction, General Classification (+1 more)
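One common way to place features in a computable metric space is to L2-normalize them onto a hypersphere of fixed radius. The sketch below assumes that construction only; the `radius` parameter and function names are hypothetical, and the paper's specific "inward scaling" is not reproduced here.

```python
import math

def l2_norm(x):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

def project_to_hypersphere(x, radius=1.0, eps=1e-12):
    """Rescale x so it lies on a hypersphere of the given radius (assumed construction)."""
    n = max(l2_norm(x), eps)  # guard against division by zero for the zero vector
    return [radius * v / n for v in x]
```

For two points on a sphere of radius r, the squared Euclidean distance equals 2r²(1 − cos θ), so after projection Euclidean distance and cosine similarity give the same ranking of neighbours.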

Image and Encoded Text Fusion for Multi-Modal Classification

1 code implementation · 3 Oct 2018 · Ignazio Gallo, Alessandro Calefati, Shah Nawaz, Muhammad Kamran Janjua

To learn feature representations of resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task.

General Classification, Multi-modal Classification

Seeing Colors: Learning Semantic Text Encoding for Classification

no code implementations · 31 Aug 2018 · Shah Nawaz, Alessandro Calefati, Muhammad Kamran Janjua, Ignazio Gallo

The question we answer with this work is: can we convert a text document into an image to exploit the best image classification models to classify documents?

General Classification, Image Classification (+2 more)

Git Loss for Deep Face Recognition

1 code implementation · 23 Jul 2018 · Alessandro Calefati, Muhammad Kamran Janjua, Shah Nawaz, Ignazio Gallo

Conventionally, CNNs have been trained with softmax as the supervision signal to penalize the classification loss.

Face Identification, Face Recognition (+1 more)
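The conventional softmax supervision mentioned in the snippet can be sketched as softmax cross-entropy on a network's class logits. This shows only that baseline; the additional terms that make up the Git loss itself are not described in the snippet and are not modeled here. Function names are illustrative.

```python
import math

def softmax(logits):
    """Softmax probabilities; the max is subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed via log-sum-exp to avoid log(0)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

Computing the loss through log-sum-exp rather than `-log(softmax(...)[label])` keeps it finite even when the true class has a vanishingly small probability, e.g. logits `[1000, 0]` with label 1.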

Revisiting Cross Modal Retrieval

no code implementations · 19 Jul 2018 · Shah Nawaz, Muhammad Kamran Janjua, Alessandro Calefati, Ignazio Gallo

We show that text encodings can capture semantic relationships between multiple modalities.

Cross-Modal Retrieval, Retrieval
