Search Results for author: Shah Nawaz

Found 20 papers, 10 papers with code

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

1 code implementation • 14 Apr 2024 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.

DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

no code implementations • 31 Jul 2023 • Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim

Conversational engagement estimation is posed as a regression problem, which entails identifying the favorable attention and involvement of the participants in the conversation.

regression

Single-branch Network for Multimodal Training

1 code implementation • 10 Mar 2023 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Muhammad Zaigham Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text.

Cross-Modal Retrieval, Retrieval

Speaker Recognition in Realistic Scenario Using Multimodal Data

no code implementations • 25 Feb 2023 • Saqlain Hussain Shah, Muhammad Saad Saeed, Shah Nawaz, Muhammad Haroon Yousaf

To achieve this task, we propose a two-branch network to learn joint representations of faces and voices in a multimodal system.

Speaker Recognition

Learning Branched Fusion and Orthogonal Projection for Face-Voice Association

1 code implementation • 22 Aug 2022 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Sajid Javed, Muhammad Haroon Yousaf, Alessio Del Bue

In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association.

Metric Learning

Guiding Attention using Partial-Order Relationships for Image Captioning

no code implementations • 15 Apr 2022 • Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

A pairwise ranking objective is used to train this embedding space, which allows similar images, topics, and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy and hence helps the model produce more visually accurate captions.

Caption Generation, Image Captioning
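
The snippet above mentions a pairwise ranking objective that preserves a partial order between images and captions. The sketch below is only an illustration, not the authors' code: it assumes the order-violation penalty used in order embeddings (zero when the more general vector, the caption, is coordinate-wise below the more specific one, the image) combined with a hinge margin. The function names and the `margin` default are hypothetical.

```python
def order_violation(specific, general):
    """Order-embedding style penalty (assumed): zero when `general` <= `specific`
    in every coordinate, i.e. the pair respects the partial order; positive otherwise."""
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def pairwise_ranking_loss(image, caption, negative_captions, margin=1.0):
    """Hinge ranking loss: the true (image, caption) pair should violate the
    order less than each contrastive caption does, by at least `margin`."""
    positive = order_violation(image, caption)
    return sum(
        max(0.0, margin + positive - order_violation(image, negative))
        for negative in negative_captions
    )
```

For example, `pairwise_ranking_loss([2.0, 2.0], [1.0, 1.0], [[2.5, 2.0]])` gives 0.75: the true caption respects the order (penalty 0), but the contrastive caption's penalty of 0.25 falls short of the margin.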

Semantically Grounded Visual Embeddings for Zero-Shot Learning

no code implementations • 3 Jan 2022 • Shah Nawaz, Jacopo Cavazza, Alessio Del Bue

Zero-shot learning methods rely on fixed visual and semantic embeddings, extracted from independent vision and language models, both pre-trained for other large-scale tasks.

Zero-Shot Learning

Fusion and Orthogonal Projection for Improved Face-Voice Association

2 code implementations • 20 Dec 2021 • Muhammad Saad Saeed, Muhammad Haris Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue

Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable to the associated matching and verification tasks.

Cross-Modal Retrieval
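
For context, the triplet formulation that the snippet attributes to prior work can be sketched as a generic triplet hinge loss on face and voice embeddings. This is not the fusion and orthogonal projection method the paper proposes; the Euclidean distance, the function names, and the `margin` default are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(face, matching_voice, other_voice, margin=0.2):
    """Hinge on the distance gap: the voice of the same identity should lie
    closer to the face embedding than a voice of a different identity,
    by at least `margin`; any shortfall is returned as the loss."""
    d_pos = euclidean(face, matching_voice)
    d_neg = euclidean(face, other_voice)
    return max(0.0, d_pos - d_neg + margin)
```

A pairwise (contrastive) variant would instead penalize matching pairs by their distance and non-matching pairs only when they fall inside a margin; the triplet form compares both distances in a single term.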

Visual Word Embedding for Text Classification

1 code implementation • 25 Feb 2021 • Ignazio Gallo, Shah Nawaz, Nicola Landro, Riccardo La Grassa

The question we answer with this paper is: ‘can we convert a text document into an image to take advantage of image neural models to classify text documents?’ To answer this question we present a novel text classification method that converts a document into an encoded image, using word embeddings.

General Classification, Image Classification, +2
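
The snippet describes converting a document into an encoded image using word embeddings. One hypothetical way to do this is sketched below: each word's embedding is split into consecutive (R, G, B) triples and words are tiled row by row. The layout, the assumption that embedding components lie in [-1, 1], and the function names are illustrative choices, not the encoding defined in the paper.

```python
def to_byte(value):
    """Map one embedding component, assumed to lie in [-1, 1], to an 8-bit channel."""
    value = max(-1.0, min(1.0, value))  # clip out-of-range components
    return round((value + 1.0) * 127.5)


def encode_document(words, embeddings, words_per_row=4):
    """Encode a document as rows of RGB pixels.

    Each in-vocabulary word contributes its embedding split into consecutive
    (R, G, B) triples; words are tiled left to right, `words_per_row` per row,
    and the last row is padded with black pixels. Assumes every embedding has
    the same length and that the length is a multiple of 3.
    """
    word_pixels = []
    for word in words:
        vector = embeddings.get(word)
        if vector is None:
            continue  # out-of-vocabulary words are simply skipped in this sketch
        word_pixels.append(
            [tuple(to_byte(c) for c in vector[i:i + 3]) for i in range(0, len(vector), 3)]
        )
    if not word_pixels:
        return []
    width = len(word_pixels[0]) * words_per_row
    rows = []
    for start in range(0, len(word_pixels), words_per_row):
        row = [pixel for pixels in word_pixels[start:start + words_per_row] for pixel in pixels]
        row += [(0, 0, 0)] * (width - len(row))  # pad the final row
        rows.append(row)
    return rows
```

With `embeddings = {"cat": [1.0, -1.0, 0.0], "dog": [0.0, 0.0, 0.0]}`, the call `encode_document(["cat", "dog", "fish"], embeddings, words_per_row=2)` yields a single row of two pixels, `(255, 0, 128)` and `(128, 128, 128)`; the unknown word "fish" is dropped. The resulting pixel grid could then be fed to a standard image classifier.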

Cross-modal Speaker Verification and Recognition: A Multilingual Perspective

no code implementations • 28 Apr 2020 • Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, Alessio Del Bue

Recent years have seen a surge of interest in finding associations between faces and voices in cross-modal biometric applications, alongside speaker recognition.

Speaker Recognition, Speaker Verification

Are These Birds Similar: Learning Branched Networks for Fine-grained Representations

3 code implementations • 16 Jan 2020 • Shah Nawaz, Alessandro Calefati, Moreno Caraffini, Nicola Landro, Ignazio Gallo

In recent years, natural language descriptions have been used to obtain information on the discriminative parts of an object.

Ranked #1 on Multi-Modal Document Classification on CUB-200-2011

Classification, Document Text Classification, +4

Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals

no code implementations • 18 Sep 2019 • Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati

We quantitatively and qualitatively evaluate the proposed approach on VoxCeleb, a benchmark audio-visual dataset, on a multitude of tasks including cross-modal verification, cross-modal matching, and cross-modal retrieval.

Cross-Modal Retrieval, Retrieval

Picture What you Read

1 code implementation • 9 Sep 2019 • Ignazio Gallo, Shah Nawaz, Alessandro Calefati, Riccardo La Grassa, Nicola Landro

Visualization refers to our ability to create an image in our head based on the text we read or the words we hear.

Reading Comprehension

Do Cross Modal Systems Leverage Semantic Relationships?

no code implementations • 3 Sep 2019 • Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati, Faisal Shafait

Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space.

Cross-Modal Retrieval, Retrieval, +2
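
The snippet refers to a measure of semantic similarity between image and text representations in a shared latent space. The paper's own measure is not reproduced here; as a stand-in, the sketch below scores image-text pairs with cosine similarity, a common choice for comparing embeddings, and ranks candidate texts for one image. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_texts(image_vec, text_vecs):
    """Return text indices sorted from most to least similar to the image."""
    return sorted(
        range(len(text_vecs)),
        key=lambda i: cosine_similarity(image_vec, text_vecs[i]),
        reverse=True,
    )
```

For instance, `rank_texts([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])` returns `[1, 0, 2]`: the nearly parallel text vector ranks first and the opposite one last.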

Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition

1 code implementation • 2 Apr 2019 • Omer Arshad, Ignazio Gallo, Shah Nawaz, Alessandro Calefati

With the massive explosion of social media platforms such as Twitter and Instagram, people share billions of multimedia posts containing images and text every day.

named-entity-recognition, Named Entity Recognition, +1

Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions

no code implementations • 16 Oct 2018 • Muhammad Kamran Janjua, Shah Nawaz, Alessandro Calefati, Ignazio Gallo

The majority of current dimensionality reduction or retrieval techniques rely on embedding the learned feature representations onto a computable metric space.

Dimensionality Reduction, General Classification, +1
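
The snippet concerns embedding learned features onto a computable metric space. A common way to do this, shown here only as a sketch and not as the inward scaling scheme the paper proposes, is to L2-normalize each feature vector and rescale it so that it lies on a hypersphere of a chosen radius. The `radius` parameter and the function name are assumptions.

```python
import math


def project_to_hypersphere(features, radius=1.0):
    """L2-normalize `features`, then scale the result to have norm `radius`,
    so every embedding lies on the same hypersphere."""
    norm = math.sqrt(sum(v * v for v in features))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto a hypersphere")
    return [radius * v / norm for v in features]
```

For example, `project_to_hypersphere([3.0, 4.0], radius=10.0)` returns `[6.0, 8.0]`. Once all embeddings share a norm, Euclidean distance between them becomes a monotonic function of their cosine similarity.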

Image and Encoded Text Fusion for Multi-Modal Classification

1 code implementation • 3 Oct 2018 • Ignazio Gallo, Alessandro Calefati, Shah Nawaz, Muhammad Kamran Janjua

To learn feature representations of the resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task.

General Classification, Multi-modal Classification

Seeing Colors: Learning Semantic Text Encoding for Classification

no code implementations • 31 Aug 2018 • Shah Nawaz, Alessandro Calefati, Muhammad Kamran Janjua, Ignazio Gallo

The question we answer with this work is: can we convert a text document into an image to exploit the best image classification models to classify documents?

General Classification, Image Classification, +2