Search Results for author: Mohamed Elhoseiny

Found 93 papers, 47 papers with code

Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual Abstraction of Natural Language Text

no code implementations1 Aug 2014 Mohamed Elhoseiny, Ahmed Elgammal

This work first introduces the MindMap Multilevel Visualization concept, which jointly visualizes and summarizes textual information.

Generalized Twin Gaussian Processes using Sharma-Mittal Divergence

no code implementations26 Sep 2014 Mohamed Elhoseiny, Ahmed Elgammal

In this paper, we present a generalized structured regression framework based on Sharma-Mittal divergence, a relative entropy measure introduced to the machine learning community in this work.

BIG-bench Machine Learning · Gaussian Processes
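
For context, the standard Sharma-Mittal divergence between two distributions (of which Rényi, Tsallis, and KL divergences are limiting cases) can be computed as below. This is a minimal sketch of the textbook definition for discrete distributions, not the paper's Gaussian-process-specific formulation.

```python
import numpy as np

def sharma_mittal_divergence(p, q, alpha, beta, eps=1e-12):
    """Standard Sharma-Mittal divergence for discrete distributions.

    Recovers Renyi (beta -> 1), Tsallis (beta -> alpha), and KL
    (alpha, beta -> 1) divergences as limits. Requires alpha != 1, beta != 1.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    s = np.sum(p ** alpha * q ** (1.0 - alpha))           # generalized overlap term
    return (s ** ((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

# Example: divergence between two biased coins.
print(sharma_mittal_divergence([0.7, 0.3], [0.5, 0.5], alpha=2.0, beta=1.5))
```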

Learning Hypergraph-regularized Attribute Predictors

no code implementations CVPR 2015 Sheng Huang, Mohamed Elhoseiny, Ahmed Elgammal, Dan Yang

The attribute prediction problem is then cast as a regularized hypergraph cut problem in which HAP jointly learns a collection of attribute projections from the feature space to a hypergraph embedding space aligned with the attribute space.

Attribute · Hypergraph Embedding

Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

no code implementations29 Jun 2015 Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images, where the knowledge comes from textual descriptions of these categories.

Zero-Shot Learning

Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance

no code implementations9 Aug 2015 Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal

How does fine-tuning of a pre-trained CNN on a multi-view dataset affect the representation at each layer of the network?

Sherlock: Scalable Fact Learning in Images

no code implementations16 Nov 2015 Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding.

Multiview Learning · Retrieval

Convolutional Models for Joint Object Categorization and Pose Estimation

no code implementations16 Nov 2015 Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal

In object recognition, there is a dichotomy between categorizing objects and estimating object pose: the former requires a view-invariant representation, while the latter requires a representation that captures pose information across different object categories.

Object · Object Categorization +2

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

no code implementations2 Dec 2015 Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal

To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content.

Event Detection

Write a Classifier: Predicting Visual Classifiers from Unstructured Text

no code implementations31 Dec 2015 Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the parameters of a linear classifier.

regression · Transfer Learning

Automatic Annotation of Structured Facts in Images

no code implementations WS 2016 Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions.

Overlapping Cover Local Regression Machines

no code implementations5 Jan 2017 Mohamed Elhoseiny, Ahmed Elgammal

We present the Overlapping Domain Cover (ODC) notion for kernel machines: a set of overlapping subsets of the data that covers the entire training set and is optimized to be as spatially cohesive as possible.

GPR · Pose Estimation +1

CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

10 code implementations21 Jun 2017 Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone

We argue that such networks are limited in their ability to generate creative products in their original design.

Relationship Proposal Networks

no code implementations CVPR 2017 Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal

We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.

Scene Understanding

Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

no code implementations CVPR 2017 Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal

We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations.

Zero-Shot Learning

Memory Aware Synapses: Learning what (not) to forget

3 code implementations ECCV 2018 Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars

We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.

Object Recognition
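
The importance weights in Memory Aware Synapses are accumulated from gradients of the squared L2 norm of the network output with respect to each parameter on unlabeled data, and then used in a quadratic penalty when training on the next task. A minimal PyTorch-style sketch of that idea (variable names and the penalty weight `lam` are illustrative, not taken from the released code):

```python
import torch

def estimate_importance(model, unlabeled_loader):
    """Accumulate MAS importance: mean |d ||f(x)||^2 / d theta| over unlabeled data."""
    omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x in unlabeled_loader:
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()              # squared L2 norm of the output
        for n, p in model.named_parameters():
            if p.grad is not None:
                omega[n] += p.grad.abs()
        n_batches += 1
    return {n: w / max(n_batches, 1) for n, w in omega.items()}

def mas_penalty(model, omega, old_params, lam=1.0):
    """Quadratic penalty keeping important parameters close to their old values."""
    loss = 0.0
    for n, p in model.named_parameters():
        loss = loss + (omega[n] * (p - old_params[n]) ** 2).sum()
    return lam * loss
```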

DeSIGN: Design Inspiration from Generative Networks

1 code implementation3 Apr 2018 Othman Sbai, Mohamed Elhoseiny, Antoine Bordes, Yann Lecun, Camille Couprie

Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant?

Image Generation · Retrieval

Large-Scale Visual Relationship Understanding

2 code implementations27 Apr 2018 Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples.

Relationship Detection

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation ECCV 2018 Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning

Uncertainty-guided Lifelong Learning in Bayesian Networks

no code implementations27 Sep 2018 Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Learning tasks sequentially as they arrive in a continuous stream is a complex problem, and it becomes more challenging when the model has a fixed capacity.

Continual Learning

Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting

1 code implementation17 Oct 2018 Mennatullah Siam, Chen Jiang, Steven Lu, Laura Petrich, Mahmoud Gamal, Mohamed Elhoseiny, Martin Jagersand

A human teacher can show potential objects of interest to the robot, which is able to self adapt to the teaching signal without providing manual segmentation labels.

Incremental Learning · Robot Manipulation +4

GDPP: Learning Diverse Generations Using Determinantal Point Process

4 code implementations30 Nov 2018 Mohamed Elfeki, Camille Couprie, Morgane Riviere, Mohamed Elhoseiny

Generative models have proven to be an outstanding tool for representing high-dimensional probability distributions and generating realistic-looking images.

Efficient Lifelong Learning with A-GEM

2 code implementations ICLR 2019 Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.

Class Incremental Learning
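
A-GEM's core mechanism (not spelled out in the snippet) constrains each update so it does not increase the average loss on a small episodic memory of past tasks: if the current gradient conflicts with the reference gradient computed on the memory, the conflicting component is projected away. A sketch of that projection step, assuming both gradients are already flattened into vectors:

```python
import torch

def agem_project(g, g_ref):
    """A-GEM projection: if the task gradient g conflicts with the
    episodic-memory gradient g_ref (negative dot product), remove the
    conflicting component so the memory loss does not increase."""
    dot = torch.dot(g, g_ref)
    if dot < 0:
        g = g - (dot / torch.dot(g_ref, g_ref)) * g_ref
    return g
```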

Exploring the Challenges towards Lifelong Fact Learning

no code implementations26 Dec 2018 Mohamed Elhoseiny, Francesca Babiloni, Rahaf Aljundi, Marcus Rohrbach, Manohar Paluri, Tinne Tuytelaars

So far, lifelong learning (LLL) has been studied in relatively small-scale and artificial setups.

Semi-Supervised Few-Shot Learning with Prototypical Random Walks

1 code implementation6 Mar 2019 Ahmed Ayyad, Yuchen Li, Nassir Navab, Shadi Albarqouni, Mohamed Elhoseiny

We develop a random walk semi-supervised loss that enables the network to learn representations that are compact and well-separated.

Few-Shot Learning
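
One common way to construct such a random-walk loss (a sketch in the spirit of learning-by-association walks; this is an assumption about the construction, not the paper's exact objective) is to walk from class prototypes to unlabeled embeddings and back, and encourage the round-trip walk to land on the class it started from:

```python
import torch
import torch.nn.functional as F

def random_walk_loss(prototypes, unlabeled, temperature=1.0):
    """Round-trip random walk: prototypes -> unlabeled points -> prototypes.

    prototypes: (C, d) one embedding per class; unlabeled: (N, d).
    The loss is the cross-entropy between the round-trip transition matrix
    and the identity, i.e. walks should return to their starting class.
    """
    sim = prototypes @ unlabeled.t() / temperature       # (C, N) similarities
    p_forward = F.softmax(sim, dim=1)                    # prototype -> unlabeled
    p_backward = F.softmax(sim.t(), dim=1)               # unlabeled -> prototype
    round_trip = p_forward @ p_backward                  # (C, C)
    targets = torch.eye(prototypes.size(0), device=prototypes.device)
    return -(targets * torch.log(round_trip + 1e-8)).sum(dim=1).mean()
```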

Creativity Inspired Zero-Shot Learning

2 code implementations ICCV 2019 Mohamed Elhoseiny, Mohamed Elfeki

We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen and creativity is about creating a likable unseen.

Attribute · Transfer Learning +1

Learning Diverse Generations using Determinantal Point Processes

no code implementations ICLR 2019 Mohamed Elfeki, Camille Couprie, Mohamed Elhoseiny

Embedded in adversarial training and a variational autoencoder, our Generative DPP approach shows consistent resistance to mode collapse on a wide variety of synthetic data and natural image datasets including MNIST, CIFAR10, and CelebA, while outperforming state-of-the-art methods in data efficiency, convergence time, and generation quality.

Point Processes
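
As a rough illustration of how a determinantal point process can quantify diversity (a generic construction for illustration only; the GDPP loss itself matches the eigen-structure of real and generated feature kernels rather than using this score directly), the log-determinant of a similarity kernel over a batch of generated features is large when samples are diverse and collapses toward very negative values when they are near-duplicates:

```python
import torch

def dpp_diversity(features, eps=1e-6):
    """Log-determinant of a linear kernel over L2-normalized features.

    Higher values correspond to a more diverse batch; near-duplicate samples
    make the kernel nearly singular and the score strongly negative.
    """
    f = torch.nn.functional.normalize(features, dim=1)    # (B, d)
    kernel = f @ f.t()                                     # (B, B) similarity kernel
    kernel = kernel + eps * torch.eye(f.size(0), device=f.device)
    return torch.logdet(kernel)
```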

Inner Ensemble Networks: Average Ensemble as an Effective Regularizer

1 code implementation15 Jun 2020 Abduallah Mohamed, Muhammed Mohaimin Sadiq, Ehab AlBadawy, Mohamed Elhoseiny, Christian Claudel

Also, we show empirically and theoretically that IENs lead to a greater variance reduction in comparison with other similar approaches such as dropout and maxout.

Neural Architecture Search

Class Normalization for (Continual)? Generalized Zero-Shot Learning

3 code implementations19 Jun 2020 Ivan Skorokhodov, Mohamed Elhoseiny

Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.

Generalized Zero-Shot Learning

CIZSL++: Creativity Inspired Generative Zero-Shot Learning

2 code implementations1 Jan 2021 Mohamed Elhoseiny, Kai Yi, Mohamed Elfeki

To improve the discriminative power of ZSL, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art.

Attribute · Transfer Learning +1

Motion Forecasting with Unlikelihood Training

no code implementations1 Jan 2021 Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.

Motion Forecasting · Trajectory Forecasting
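
Unlikelihood objectives (originally proposed for text generation) generally add a term that penalizes probability mass assigned to undesirable samples. A minimal sketch of the general idea, assuming a model that returns log-probabilities for candidate trajectories and a boolean mask marking context-violating ones (both are illustrative assumptions, not the paper's exact loss):

```python
import torch

def unlikelihood_loss(log_probs, violates_context):
    """Penalize probability assigned to trajectories flagged as violating context.

    log_probs: (B,) model log-probabilities of sampled trajectories.
    violates_context: (B,) boolean mask from some context checker (assumed).
    """
    probs = log_probs.exp()
    penalty = -torch.log(1.0 - probs + 1e-8)   # large when the model likes a bad sample
    return (penalty * violates_context.float()).mean()
```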

Class Normalization for Zero-Shot Learning

no code implementations ICLR 2021 Ivan Skorokhodov, Mohamed Elhoseiny

Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.

Zero-Shot Learning

Gradient Descent Resists Compositionality

no code implementations1 Jan 2021 Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny

In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.

Transferability of Compositionality

no code implementations1 Jan 2021 Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny

To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts on learning compositional representations in the training distribution.

Out-of-Distribution Generalization

ArtEmis: Affective Language for Visual Art

3 code implementations CVPR 2021 Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas Guibas

We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

1 code implementation CVPR 2022 Jun Chen, Han Guo, Kai Yi, Boyang Li, Mohamed Elhoseiny

To the best of our knowledge, this is the first work that improves data efficiency of image captioning by utilizing LM pretrained on unimodal data.

Image Captioning · Language Modelling +1

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation

1 code implementation20 Apr 2021 Divyansh Jha, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

By generating representations of unseen classes based on their semantic descriptions, e.g., attributes or text, generative ZSL attempts to differentiate unseen from seen categories.

Attribute · Image Generation +1

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

1 code implementation CVPR 2022 Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.

Image Captioning · Object Recognition +5

CausalDyna: Improving Generalization of Dyna-style Reinforcement Learning via Counterfactual-Based Data Augmentation

no code implementations29 Sep 2021 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

Deep reinforcement learning agents trained to perform manipulation tasks in real-world environments with limited diversity of object properties tend to overfit and fail to generalize to unseen testing environments.

counterfactual · Data Augmentation +3

Domain-Aware Continual Zero-Shot Learning

no code implementations24 Dec 2021 Kai Yi, Paul Janson, Wenxuan Zhang, Mohamed Elhoseiny

Accordingly, we propose a Domain-Invariant Network (DIN) to learn factorized features for shifting domains and improved textual representation for unseen classes.

Disentanglement · Zero-Shot Learning

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

1 code implementation CVPR 2022 Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality.

Video Generation

Efficiently Disentangle Causal Representations

1 code implementation6 Jan 2022 Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church

This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions.

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

1 code implementation2 Mar 2022 Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark.

Image Classification · Zero-Shot Image Classification +1

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

1 code implementation CVPR 2022 Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny

As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions.

Image Captioning

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction

1 code implementation1 Jun 2022 Jun Chen, Ming Hu, Boyang Li, Mohamed Elhoseiny

After finetuning the pretrained LoMaR on 384×384 images, it can reach 85.4% top-1 accuracy, surpassing MAE by 0.6%.

Image Classification · Instance Segmentation +3

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

1 code implementation9 Jun 2022 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.

D4RL · Model-based Reinforcement Learning +3

A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning

1 code implementation10 Oct 2022 Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohamed Elhoseiny

With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed.

Continual Learning · Representation Learning

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

1 code implementation25 Nov 2022 Eslam Mohamed BAKR, Yasmeen Alsaedy, Mohamed Elhoseiny

The main question we address in this paper is "can we consolidate the 3D visual stream by 2D clues synthesized from point clouds and efficiently utilize them in training and testing?".

Knowledge Distillation · Visual Grounding

FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction

no code implementations ICCV 2023 Faizan Farooq Khan, Xiang Li, Andrew J. Temple, Mohamed Elhoseiny

Aquatic species are essential components of the world's ecosystem, and the preservation of aquatic biodiversity is crucial for maintaining proper ecosystem functioning.

Fish Detection

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

1 code implementation30 Jan 2023 Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, naming this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).

Offline RL · reinforcement-learning +1

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

1 code implementation12 Mar 2023 Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning · Question Answering +1
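
The overall loop is a questioner-answerer dialogue: an LLM keeps asking questions about the image, BLIP-2 answers them, and the accumulated Q&A history is summarized into a caption. A schematic sketch of that control flow (the `ask_question`, `blip2_answer`, and `summarize` helpers are hypothetical placeholders, not the released API):

```python
def chat_caption(image, ask_question, blip2_answer, summarize, n_rounds=5):
    """Questioner/answerer loop in the spirit of ChatCaptioner.

    ask_question(history) -> str           # hypothetical LLM questioner
    blip2_answer(image, question) -> str   # hypothetical VQA answerer
    summarize(history) -> str              # hypothetical caption summarizer
    """
    history = []
    for _ in range(n_rounds):
        question = ask_question(history)        # LLM decides what it still needs to know
        answer = blip2_answer(image, question)  # BLIP-2 grounds the answer in the image
        history.append((question, answer))
    return summarize(history)                   # fuse the acquired facts into one caption
```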

MoStGAN-V: Video Generation with Temporal Motion Styles

1 code implementation CVPR 2023 Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.

Video Generation

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

1 code implementation9 Apr 2023 Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny

Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.

Video Captioning

ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment

no code implementations10 Apr 2023 Eslam Mohamed BAKR, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny

In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.

Image Captioning

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

no code implementations19 Apr 2023 Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem

In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control.

Embodied Question Answering · Language Modelling +2

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

5 code implementations20 Apr 2023 Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can endow the combined model with numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.

Language Modelling · Large Language Model +3
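
Concretely, the alignment in MiniGPT-4 is carried by a single trained projection layer that maps features from a frozen vision encoder into the frozen LLM's token-embedding space. A minimal sketch of that alignment module (dimensions and names are illustrative, not the released implementation):

```python
import torch.nn as nn

class VisualProjection(nn.Module):
    """Single trainable linear layer mapping frozen visual features into the
    frozen LLM's embedding space (sketch; dimensions are illustrative)."""

    def __init__(self, vision_dim=768, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_tokens):          # (B, N, vision_dim)
        return self.proj(visual_tokens)        # (B, N, llm_dim), consumed by the LLM
                                               # alongside text token embeddings
```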

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations1 Jun 2023 Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although these VL models acquire extensive knowledge of visual concepts, it is non-trivial to exploit that knowledge for the task of semantic segmentation, as they are usually trained at an image level.

Open Vocabulary Semantic Segmentation · Segmentation +3

MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding

no code implementations CVPR 2023 Jun Chen, Ming Hu, Darren J. Coker, Michael L. Berumen, Blair Costelloe, Sara Beery, Anna Rohrbach, Mohamed Elhoseiny

Monitoring animal behavior can facilitate conservation efforts by providing key insights into wildlife health, population status, and ecosystem function.

SLAMB: Accelerated Large Batch Training with Sparse Communication

1 code implementation The International Conference on Machine Learning (ICML) 2023 Hang Xu, Wenxuan Zhang, Jiawei Fei, Yuzhe Wu, Tingwen Xie, Jun Huang, Yuchen Xie, Mohamed Elhoseiny, Panos Kalnis

Distributed training of large deep neural networks requires frequent exchange of massive data between machines, thus communication efficiency is a major concern.

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

no code implementations ICCV 2023 Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr

Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.

Image Captioning

Continual Zero-Shot Learning through Semantically Guided Generative Random Walks

1 code implementation ICCV 2023 Wenxuan Zhang, Paul Janson, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

The GRW loss augments the training by continually encouraging the model to generate realistic and characterized samples to represent the unseen space.

Novel Concepts · Zero-Shot Learning

Overcoming Generic Knowledge Loss with Selective Parameter Update

no code implementations23 Aug 2023 Wenxuan Zhang, Paul Janson, Rahaf Aljundi, Mohamed Elhoseiny

Our method improves accuracy on the newly learned tasks by up to 7% while preserving the pretraining knowledge, with a negligible decrease of 0.9% in accuracy on a representative control set.

Continual Learning · General Knowledge

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

no code implementations30 Aug 2023 Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny

We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations.

Explanation Generation · Question Answering +1

CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding

no code implementations10 Oct 2023 Eslam Mohamed BAKR, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny

To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence task by first predicting a chain of anchors and then the final target.

Visual Grounding

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation14 Oct 2023 Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Language Modelling · Large Language Model +4

3DCoMPaT++: An improved Large-scale 3D Vision Dataset for Compositional Recognition

1 code implementation27 Oct 2023 Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny

In this work, we present 3DCoMPaT++, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks.

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

no code implementations24 Nov 2023 Eslam Mohamed BAKR, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability.

Denoising · Image Generation

Label Delay in Continual Learning

no code implementations1 Dec 2023 Botos Csaba, Wenxuan Zhang, Matthias Müller, Ser-Nam Lim, Mohamed Elhoseiny, Philip Torr, Adel Bibi

We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.

Continual Learning

StoryGPT-V: Large Language Models as Consistent Story Visualizers

1 code implementation4 Dec 2023 Xiaoqian Shen, Mohamed Elhoseiny

Therefore, we introduce StoryGPT-V, which leverages the merits of latent diffusion models (LDM) and LLMs to produce images with consistent and high-quality characters grounded in the given story descriptions.

Language Modelling · Large Language Model +2

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art

1 code implementation4 Feb 2024 Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny

Our comparative analysis is based on an extensive dataset, dubbed "ArtConstellation," consisting of annotations about art principles, likability, and emotions for 6,000 WikiArt and 3,200 AI-generated artworks.

ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes

1 code implementation ECCV 2020 Panos Achlioptas, Ahmed Abdelreheem, Fei Xia, Mohamed Elhoseiny, Leonidas Guibas

Due to the scarcity and unsuitability of existent 3D-oriented linguistic resources for this task, we first develop two large-scale and complementary visio-linguistic datasets: i) Sr3D, which contains 83.5K template-based utterances leveraging spatial relations with other fine-grained object classes to localize a referred object in a given scene, and ii) Nr3D, which contains 41.5K natural, free-form utterances collected by deploying a 2-player object reference game in 3D scenes.

Object
