Search Results for author: Mohamed Elhoseiny

Found 93 papers, 47 papers with code

Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual Abstraction of Natural Language Text

no code implementations1 Aug 2014 Mohamed Elhoseiny, Ahmed Elgammal

This work first introduces the MindMap Multilevel Visualization concept, which jointly visualizes and summarizes textual information.

Generalized Twin Gaussian Processes using Sharma-Mittal Divergence

no code implementations26 Sep 2014 Mohamed Elhoseiny, Ahmed Elgammal

In this paper, we present a generalized structured regression framework based on Sharma-Mittal divergence, a relative entropy measure introduced to the machine learning community in this work.

BIG-bench Machine Learning · Gaussian Processes
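
For context, the standard Sharma-Mittal divergence between two distributions (of which Rényi, Tsallis, and KL divergences are limiting cases) can be computed as below. This is a minimal sketch of the textbook definition for discrete distributions, not the paper's Gaussian-process-specific formulation.

```python
import numpy as np

def sharma_mittal_divergence(p, q, alpha, beta, eps=1e-12):
    """Standard Sharma-Mittal divergence for discrete distributions.

    Recovers Renyi (beta -> 1), Tsallis (beta -> alpha), and KL
    (alpha, beta -> 1) divergences as limits. Requires alpha != 1, beta != 1.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    s = np.sum(p ** alpha * q ** (1.0 - alpha))           # generalized overlap term
    return (s ** ((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

# Example: divergence between two biased coins.
print(sharma_mittal_divergence([0.7, 0.3], [0.5, 0.5], alpha=2.0, beta=1.5))
```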

Learning Hypergraph-regularized Attribute Predictors

no code implementations CVPR 2015 Sheng Huang, Mohamed Elhoseiny, Ahmed Elgammal, Dan Yang

The attribute prediction problem is then cast as a regularized hypergraph cut problem in which HAP jointly learns a collection of attribute projections from the feature space to a hypergraph embedding space aligned with the attribute space.

Attribute · Hypergraph Embedding

Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

no code implementations29 Jun 2015 Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images, where the knowledge comes from textual descriptions of these categories.

Zero-Shot Learning

Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance

no code implementations9 Aug 2015 Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal

How does fine-tuning of a pre-trained CNN on a multi-view dataset affect the representation at each layer of the network?

Sherlock: Scalable Fact Learning in Images

no code implementations16 Nov 2015 Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding.

Multiview Learning · Retrieval

Convolutional Models for Joint Object Categorization and Pose Estimation

no code implementations16 Nov 2015 Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal

In object recognition, there is a dichotomy between categorizing objects and estimating object pose: the former requires a view-invariant representation, while the latter requires a representation that captures pose information across different object categories.

Object · Object Categorization +2

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

no code implementations2 Dec 2015 Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal

To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content.

Event Detection

Write a Classifier: Predicting Visual Classifiers from Unstructured Text

no code implementations31 Dec 2015 Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the parameters of a linear classifier.

regression · Transfer Learning

Automatic Annotation of Structured Facts in Images

no code implementations WS 2016 Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions.

Overlapping Cover Local Regression Machines

no code implementations5 Jan 2017 Mohamed Elhoseiny, Ahmed Elgammal

We present the Overlapping Domain Cover (ODC) notion for kernel machines: a set of overlapping subsets of the data that covers the entire training set and is optimized to be as spatially cohesive as possible.

GPR · Pose Estimation +1

CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

10 code implementations21 Jun 2017 Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone

We argue that such networks are limited in their ability to generate creative products in their original design.

Relationship Proposal Networks

no code implementations CVPR 2017 Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal

We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.

Scene Understanding

Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

no code implementations CVPR 2017 Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal

We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations.

Zero-Shot Learning

Memory Aware Synapses: Learning what (not) to forget

3 code implementations ECCV 2018 Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars

We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.

Object Recognition
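
The importance weights in Memory Aware Synapses are accumulated from gradients of the squared L2 norm of the network output with respect to each parameter on unlabeled data, and then used in a quadratic penalty when training on the next task. A minimal PyTorch-style sketch of that idea (variable names and the penalty weight `lam` are illustrative, not taken from the released code):

```python
import torch

def estimate_importance(model, unlabeled_loader):
    """Accumulate MAS importance: mean |d ||f(x)||^2 / d theta| over unlabeled data."""
    omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x in unlabeled_loader:
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()              # squared L2 norm of the output
        for n, p in model.named_parameters():
            if p.grad is not None:
                omega[n] += p.grad.abs()
        n_batches += 1
    return {n: w / max(n_batches, 1) for n, w in omega.items()}

def mas_penalty(model, omega, old_params, lam=1.0):
    """Quadratic penalty keeping important parameters close to their old values."""
    loss = 0.0
    for n, p in model.named_parameters():
        loss = loss + (omega[n] * (p - old_params[n]) ** 2).sum()
    return lam * loss
```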

DeSIGN: Design Inspiration from Generative Networks

1 code implementation3 Apr 2018 Othman Sbai, Mohamed Elhoseiny, Antoine Bordes, Yann Lecun, Camille Couprie

Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant?

Image Generation · Retrieval

Large-Scale Visual Relationship Understanding

2 code implementations27 Apr 2018 Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples.

Relationship Detection

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation ECCV 2018 Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning

Uncertainty-guided Lifelong Learning in Bayesian Networks

no code implementations27 Sep 2018 Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Learning tasks sequentially as they arrive in a continuous stream is a complex problem, and it becomes more challenging when the model has a fixed capacity.

Continual Learning

Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting

1 code implementation17 Oct 2018 Mennatullah Siam, Chen Jiang, Steven Lu, Laura Petrich, Mahmoud Gamal, Mohamed Elhoseiny, Martin Jagersand

A human teacher can show potential objects of interest to the robot, which is able to self adapt to the teaching signal without providing manual segmentation labels.

Incremental Learning · Robot Manipulation +4

GDPP: Learning Diverse Generations Using Determinantal Point Process

4 code implementations30 Nov 2018 Mohamed Elfeki, Camille Couprie, Morgane Riviere, Mohamed Elhoseiny

Generative models have proven to be an outstanding tool for representing high-dimensional probability distributions and generating realistic-looking images.

Efficient Lifelong Learning with A-GEM

2 code implementations ICLR 2019 Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.

Class Incremental Learning
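
A-GEM's core mechanism (not spelled out in the snippet) constrains each update so it does not increase the average loss on a small episodic memory of past tasks: if the current gradient conflicts with the reference gradient computed on the memory, the conflicting component is projected away. A sketch of that projection step, assuming both gradients are already flattened into vectors:

```python
import torch

def agem_project(g, g_ref):
    """A-GEM projection: if the task gradient g conflicts with the
    episodic-memory gradient g_ref (negative dot product), remove the
    conflicting component so the memory loss does not increase."""
    dot = torch.dot(g, g_ref)
    if dot < 0:
        g = g - (dot / torch.dot(g_ref, g_ref)) * g_ref
    return g
```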

Exploring the Challenges towards Lifelong Fact Learning

no code implementations26 Dec 2018 Mohamed Elhoseiny, Francesca Babiloni, Rahaf Aljundi, Marcus Rohrbach, Manohar Paluri, Tinne Tuytelaars

So far, lifelong learning (LLL) has been studied in relatively small-scale and artificial setups.

Semi-Supervised Few-Shot Learning with Prototypical Random Walks

1 code implementation6 Mar 2019 Ahmed Ayyad, Yuchen Li, Nassir Navab, Shadi Albarqouni, Mohamed Elhoseiny

We develop a random walk semi-supervised loss that enables the network to learn representations that are compact and well-separated.

Few-Shot Learning
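
One common way to construct such a random-walk loss (a sketch in the spirit of learning-by-association walks; this is an assumption about the construction, not the paper's exact objective) is to walk from class prototypes to unlabeled embeddings and back, and encourage the round-trip walk to land on the class it started from:

```python
import torch
import torch.nn.functional as F

def random_walk_loss(prototypes, unlabeled, temperature=1.0):
    """Round-trip random walk: prototypes -> unlabeled points -> prototypes.

    prototypes: (C, d) one embedding per class; unlabeled: (N, d).
    The loss is the cross-entropy between the round-trip transition matrix
    and the identity, i.e. walks should return to their starting class.
    """
    sim = prototypes @ unlabeled.t() / temperature       # (C, N) similarities
    p_forward = F.softmax(sim, dim=1)                    # prototype -> unlabeled
    p_backward = F.softmax(sim.t(), dim=1)               # unlabeled -> prototype
    round_trip = p_forward @ p_backward                  # (C, C)
    targets = torch.eye(prototypes.size(0), device=prototypes.device)
    return -(targets * torch.log(round_trip + 1e-8)).sum(dim=1).mean()
```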

Creativity Inspired Zero-Shot Learning

2 code implementations ICCV 2019 Mohamed Elhoseiny, Mohamed Elfeki

We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen and creativity is about creating a likable unseen.

Attribute · Transfer Learning +1

Learning Diverse Generations using Determinantal Point Processes

no code implementations ICLR 2019 Mohamed Elfeki, Camille Couprie, Mohamed Elhoseiny

Embedded in adversarial training and a variational autoencoder, our Generative DPP approach shows consistent resistance to mode collapse on a wide variety of synthetic data and natural image datasets including MNIST, CIFAR10, and CelebA, while outperforming state-of-the-art methods in data efficiency, convergence time, and generation quality.

Point Processes
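
As a rough illustration of how a determinantal point process can quantify diversity (a generic construction for illustration only; the GDPP loss itself matches the eigen-structure of real and generated feature kernels rather than using this score directly), the log-determinant of a similarity kernel over a batch of generated features is large when samples are diverse and collapses toward very negative values when they are near-duplicates:

```python
import torch

def dpp_diversity(features, eps=1e-6):
    """Log-determinant of a linear kernel over L2-normalized features.

    Higher values correspond to a more diverse batch; near-duplicate samples
    make the kernel nearly singular and the score strongly negative.
    """
    f = torch.nn.functional.normalize(features, dim=1)    # (B, d)
    kernel = f @ f.t()                                     # (B, B) similarity kernel
    kernel = kernel + eps * torch.eye(f.size(0), device=f.device)
    return torch.logdet(kernel)
```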

Inner Ensemble Networks: Average Ensemble as an Effective Regularizer

1 code implementation15 Jun 2020 Abduallah Mohamed, Muhammed Mohaimin Sadiq, Ehab AlBadawy, Mohamed Elhoseiny, Christian Claudel

Also, we show empirically and theoretically that IENs lead to a greater variance reduction in comparison with other similar approaches such as dropout and maxout.

Neural Architecture Search

Class Normalization for (Continual)? Generalized Zero-Shot Learning

3 code implementations19 Jun 2020 Ivan Skorokhodov, Mohamed Elhoseiny

Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.

Generalized Zero-Shot Learning

CIZSL++: Creativity Inspired Generative Zero-Shot Learning

2 code implementations1 Jan 2021 Mohamed Elhoseiny, Kai Yi, Mohamed Elfeki

To improve the discriminative power of ZSL, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art.

Attribute · Transfer Learning +1

Motion Forecasting with Unlikelihood Training

no code implementations1 Jan 2021 Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.

Motion Forecasting · Trajectory Forecasting
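
Unlikelihood objectives (originally proposed for text generation) generally add a term that penalizes probability mass assigned to undesirable samples. A minimal sketch of the general idea, assuming a model that returns log-probabilities for candidate trajectories and a boolean mask marking context-violating ones (both are illustrative assumptions, not the paper's exact loss):

```python
import torch

def unlikelihood_loss(log_probs, violates_context):
    """Penalize probability assigned to trajectories flagged as violating context.

    log_probs: (B,) model log-probabilities of sampled trajectories.
    violates_context: (B,) boolean mask from some context checker (assumed).
    """
    probs = log_probs.exp()
    penalty = -torch.log(1.0 - probs + 1e-8)   # large when the model likes a bad sample
    return (penalty * violates_context.float()).mean()
```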

Class Normalization for Zero-Shot Learning

no code implementations ICLR 2021 Ivan Skorokhodov, Mohamed Elhoseiny

Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.

Zero-Shot Learning

Gradient Descent Resists Compositionality

no code implementations1 Jan 2021 Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny

In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.

Transferability of Compositionality

no code implementations1 Jan 2021 Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny

To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts on learning compositional representations in the training distribution.

Out-of-Distribution Generalization

ArtEmis: Affective Language for Visual Art

3 code implementations CVPR 2021 Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas Guibas

We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

1 code implementation CVPR 2022 Jun Chen, Han Guo, Kai Yi, Boyang Li, Mohamed Elhoseiny

To the best of our knowledge, this is the first work that improves data efficiency of image captioning by utilizing LM pretrained on unimodal data.

Image Captioning · Language Modelling +1

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation

1 code implementation20 Apr 2021 Divyansh Jha, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

By generating representations of unseen classes based on their semantic descriptions, e.g., attributes or text, generative ZSL attempts to differentiate unseen from seen categories.

Attribute · Image Generation +1

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

1 code implementation CVPR 2022 Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.

Image Captioning · Object Recognition +5

CausalDyna: Improving Generalization of Dyna-style Reinforcement Learning via Counterfactual-Based Data Augmentation

no code implementations29 Sep 2021 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

Deep reinforcement learning agents trained to perform manipulation tasks in real-world environments with limited diversity of object properties tend to overfit and fail to generalize to unseen testing environments.

counterfactual · Data Augmentation +3

Domain-Aware Continual Zero-Shot Learning

no code implementations24 Dec 2021 Kai Yi, Paul Janson, Wenxuan Zhang, Mohamed Elhoseiny

Accordingly, we propose a Domain-Invariant Network (DIN) to learn factorized features for shifting domains and improved textual representation for unseen classes.

Disentanglement · Zero-Shot Learning

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

1 code implementation CVPR 2022 Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality.

Video Generation

Efficiently Disentangle Causal Representations

1 code implementation6 Jan 2022 Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church

This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions.

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

1 code implementation2 Mar 2022 Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark.

Image Classification · Zero-Shot Image Classification +1

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

1 code implementation CVPR 2022 Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny

As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions.

Image Captioning

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction

1 code implementation1 Jun 2022 Jun Chen, Ming Hu, Boyang Li, Mohamed Elhoseiny

After finetuning the pretrained LoMaR on 384×384 images, it can reach 85.4% top-1 accuracy, surpassing MAE by 0.6%.

Image Classification · Instance Segmentation +3

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

1 code implementation9 Jun 2022 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.

D4RL · Model-based Reinforcement Learning +3

A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning

1 code implementation10 Oct 2022 Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohamed Elhoseiny

With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed.

Continual Learning · Representation Learning

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

1 code implementation25 Nov 2022 Eslam Mohamed BAKR, Yasmeen Alsaedy, Mohamed Elhoseiny

The main question we address in this paper is "can we consolidate the 3D visual stream by 2D clues synthesized from point clouds and efficiently utilize them in training and testing?".

Knowledge Distillation · Visual Grounding

FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction

no code implementations ICCV 2023 Faizan Farooq Khan, Xiang Li, Andrew J. Temple, Mohamed Elhoseiny

Aquatic species are essential components of the world's ecosystem, and the preservation of aquatic biodiversity is crucial for maintaining proper ecosystem functioning.

Fish Detection

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

1 code implementation30 Jan 2023 Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, naming this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).

Offline RL · reinforcement-learning +1

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

1 code implementation12 Mar 2023 Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning · Question Answering +1
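
The overall loop is a questioner-answerer dialogue: an LLM keeps asking questions about the image, BLIP-2 answers them, and the accumulated Q&A history is summarized into a caption. A schematic sketch of that control flow (the `ask_question`, `blip2_answer`, and `summarize` helpers are hypothetical placeholders, not the released API):

```python
def chat_caption(image, ask_question, blip2_answer, summarize, n_rounds=5):
    """Questioner/answerer loop in the spirit of ChatCaptioner.

    ask_question(history) -> str           # hypothetical LLM questioner
    blip2_answer(image, question) -> str   # hypothetical VQA answerer
    summarize(history) -> str              # hypothetical caption summarizer
    """
    history = []
    for _ in range(n_rounds):
        question = ask_question(history)        # LLM decides what it still needs to know
        answer = blip2_answer(image, question)  # BLIP-2 grounds the answer in the image
        history.append((question, answer))
    return summarize(history)                   # fuse the acquired facts into one caption
```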

MoStGAN-V: Video Generation with Temporal Motion Styles

1 code implementation CVPR 2023 Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.

Video Generation

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

1 code implementation9 Apr 2023 Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny

Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.

Video Captioning

ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment

no code implementations10 Apr 2023 Eslam Mohamed BAKR, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny

In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.

Image Captioning

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

no code implementations19 Apr 2023 Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem

In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control.

Embodied Question Answering · Language Modelling +2

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

5 code implementations20 Apr 2023 Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can endow the combined model with numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.

Language Modelling · Large Language Model +3
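
Concretely, the alignment in MiniGPT-4 is carried by a single trained projection layer that maps features from a frozen vision encoder into the frozen LLM's token-embedding space. A minimal sketch of that alignment module (dimensions and names are illustrative, not the released implementation):

```python
import torch.nn as nn

class VisualProjection(nn.Module):
    """Single trainable linear layer mapping frozen visual features into the
    frozen LLM's embedding space (sketch; dimensions are illustrative)."""

    def __init__(self, vision_dim=768, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_tokens):          # (B, N, vision_dim)
        return self.proj(visual_tokens)        # (B, N, llm_dim), consumed by the LLM
                                               # alongside text token embeddings
```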

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations1 Jun 2023 Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although these VL models acquire extensive knowledge of visual concepts, it is non-trivial to exploit that knowledge for the task of semantic segmentation, as they are usually trained at an image level.

Open Vocabulary Semantic Segmentation · Segmentation +3

MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding

no code implementations CVPR 2023 Jun Chen, Ming Hu, Darren J. Coker, Michael L. Berumen, Blair Costelloe, Sara Beery, Anna Rohrbach, Mohamed Elhoseiny

Monitoring animal behavior can facilitate conservation efforts by providing key insights into wildlife health, population status, and ecosystem function.

SLAMB: Accelerated Large Batch Training with Sparse Communication

1 code implementation The International Conference on Machine Learning (ICML) 2023 Hang Xu, Wenxuan Zhang, Jiawei Fei, Yuzhe Wu, Tingwen Xie, Jun Huang, Yuchen Xie, Mohamed Elhoseiny, Panos Kalnis

Distributed training of large deep neural networks requires frequent exchange of massive data between machines, thus communication efficiency is a major concern.

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

no code implementations ICCV 2023 Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr

Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.

Image Captioning

Continual Zero-Shot Learning through Semantically Guided Generative Random Walks

1 code implementation ICCV 2023 Wenxuan Zhang, Paul Janson, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

The GRW loss augments the training by continually encouraging the model to generate realistic and characterized samples to represent the unseen space.

Novel Concepts · Zero-Shot Learning

Overcoming Generic Knowledge Loss with Selective Parameter Update

no code implementations23 Aug 2023 Wenxuan Zhang, Paul Janson, Rahaf Aljundi, Mohamed Elhoseiny

Our method improves accuracy on the newly learned tasks by up to 7% while preserving the pretraining knowledge, with a negligible decrease of 0.9% in accuracy on a representative control set.

Continual Learning · General Knowledge

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

no code implementations30 Aug 2023 Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny

We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations.

Explanation Generation · Question Answering +1

CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding

no code implementations10 Oct 2023 Eslam Mohamed BAKR, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny

To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence task by first predicting a chain of anchors and then the final target.

Visual Grounding

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation14 Oct 2023 Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Language Modelling · Large Language Model +4

3DCoMPaT++: An improved Large-scale 3D Vision Dataset for Compositional Recognition

1 code implementation27 Oct 2023 Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny

In this work, we present 3DCoMPaT++, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks.

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

no code implementations24 Nov 2023 Eslam Mohamed BAKR, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability.

Denoising · Image Generation

Label Delay in Continual Learning

no code implementations1 Dec 2023 Botos Csaba, Wenxuan Zhang, Matthias Müller, Ser-Nam Lim, Mohamed Elhoseiny, Philip Torr, Adel Bibi

We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.

Continual Learning

StoryGPT-V: Large Language Models as Consistent Story Visualizers

1 code implementation4 Dec 2023 Xiaoqian Shen, Mohamed Elhoseiny

Therefore, we introduce StoryGPT-V, which leverages the merits of latent diffusion models (LDM) and LLMs to produce images with consistent and high-quality characters grounded in the given story descriptions.

Language Modelling · Large Language Model +2

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art

1 code implementation4 Feb 2024 Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny

Our comparative analysis is based on an extensive dataset, dubbed "ArtConstellation," consisting of annotations about art principles, likability, and emotions for 6,000 WikiArt and 3,200 AI-generated artworks.

ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes

1 code implementation ECCV 2020 Panos Achlioptas, Ahmed Abdelreheem, Fei Xia, Mohamed Elhoseiny, Leonidas Guibas

Due to the scarcity and unsuitability of existent 3D-oriented linguistic resources for this task, we first develop two large-scale and complementary visio-linguistic datasets: i) Sr3D, which contains 83.5K template-based utterances leveraging spatial relations with other fine-grained object classes to localize a referred object in a given scene, and ii) Nr3D, which contains 41.5K natural, free-form utterances collected by deploying a 2-player object reference game in 3D scenes.

Object
