1 code implementation • ECCV 2020 • Panos Achlioptas, Ahmed Abdelreheem, Fei Xia, Mohamed Elhoseiny, Leonidas Guibas
Due to the scarcity and unsuitability of existing 3D-oriented linguistic resources for this task, we first develop two large-scale and complementary visio-linguistic datasets: i) \textbf{\textit{Sr3D}}, which contains 83.5K template-based utterances leveraging \textit{spatial relations} with other fine-grained object classes to localize a referred object in a given scene, and ii) \textbf{\textit{Nr3D}}, which contains 41.5K \textit{natural, free-form} utterances collected by deploying a 2-player object reference game in 3D scenes.
1 code implementation • 12 Jan 2025 • Mahmoud Ahmed, Xiang Li, Arpit Prajapati, Mohamed Elhoseiny
To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, featuring 200 object categories, an object vocabulary $\approx$5 times larger than 3DCoMPaT's, and $\approx$4 times more part categories.
no code implementations • 3 Jan 2025 • Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Yaoting Wang, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha
With the rapid advancement of Multi-modal Large Language Models (MLLMs), several diagnostic benchmarks have recently been developed to assess these models' multi-modal reasoning proficiency.
1 code implementation • 23 Nov 2024 • Jun Chen, Dannong Xu, Junjie Fei, Chun-Mei Feng, Mohamed Elhoseiny
Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images.
1 code implementation • 6 Nov 2024 • Youssef Mohamed, Runjia Li, Ibrahim Said Ahmad, Kilichbek Haydarov, Philip Torr, Kenneth Ward Church, Mohamed Elhoseiny
Research in vision and language has made considerable progress thanks to benchmarks such as COCO.
1 code implementation • 28 Oct 2024 • Han Bao, Yue Huang, Yanbo Wang, Jiayi Ye, Xiangqi Wang, Xiuying Chen, Yue Zhao, Tianyi Zhou, Mohamed Elhoseiny, Xiangliang Zhang
Large Vision-Language Models (LVLMs) have become essential for advancing the integration of visual and linguistic information.
1 code implementation • 22 Oct 2024 • Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra
Given a lightweight LLM, our LongVU also scales effectively to a smaller size while retaining state-of-the-art video understanding performance.
no code implementations • 27 Aug 2024 • Wenxuan Zhang, Philip H. S. Torr, Mohamed Elhoseiny, Adel Bibi
Fine-tuning large language models (LLMs) on human preferences, typically through reinforcement learning from human feedback (RLHF), has proven successful in enhancing their capabilities.
1 code implementation • 7 Aug 2024 • Zilyu Ye, Jinxiu Liu, Ruotian Peng, Jinjin Cao, Zhiyang Chen, Yiyang Zhang, Ziwei Xuan, Mingyuan Zhou, Xiaoqian Shen, Mohamed Elhoseiny, Qi Liu, Guo-Jun Qi
Recent image generation models excel at creating high-quality images from brief captions.
no code implementations • 7 Aug 2024 • Chenhui Gou, Abdulwahab Felemban, Faizan Farooq Khan, Deyao Zhu, Jianfei Cai, Hamid Rezatofighi, Mohamed Elhoseiny
In our study, we introduce a pixel value prediction task (PVP) to explore "How Well Can Vision Language Models See Image Details?"
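As an illustration, a minimal PVP-style probe might look like the sketch below, assuming a generic chat-style vision-language interface; `query_vlm` is a hypothetical stand-in, not the paper's actual API.

```python
# Minimal sketch of a pixel value prediction (PVP) probe. `query_vlm` is a
# hypothetical callable (image, prompt) -> str; it is not the paper's API.
from PIL import Image

def pvp_probe(image_path, x, y, query_vlm):
    """Ask the model for the RGB value at (x, y) and verify it against the image."""
    image = Image.open(image_path).convert("RGB")
    truth = image.getpixel((x, y))  # ground-truth (R, G, B) tuple
    prompt = f"What is the RGB value of the pixel at column {x}, row {y}?"
    answer = query_vlm(image, prompt)  # e.g. "(128, 64, 255)"
    predicted = tuple(int(v) for v in answer.strip("() ").split(","))
    return predicted == truth
```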
1 code implementation • 17 Jul 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Mingchen Zhuge, Jian Ding, Deyao Zhu, Jürgen Schmidhuber, Mohamed Elhoseiny
This design of the retrieval mechanism enables the Goldfish to efficiently process arbitrarily long video sequences, facilitating its application in contexts such as movies or television series.
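Such a retrieval mechanism can be viewed as nearest-neighbor search over clip-level embeddings; the sketch below is illustrative only, with the encoder and top-k policy as assumptions rather than Goldfish's exact components.

```python
# Illustrative clip retrieval over a long video: embed each clip and the query,
# then keep the top-k most similar clips for downstream answering.
import numpy as np

def retrieve_clips(clip_embs, query_emb, k=5):
    """clip_embs: (num_clips, dim); query_emb: (dim,). Returns top-k clip indices."""
    clip_norm = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    query_norm = query_emb / np.linalg.norm(query_emb)
    scores = clip_norm @ query_norm          # cosine similarities
    return np.argsort(scores)[::-1][:k]      # indices of most relevant clips
```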
1 code implementation • 4 Jul 2024 • Asma Alkhaldi, Raneem Alnajim, Layan Alabdullatef, Rawan Alyahya, Jun Chen, Deyao Zhu, Ahmed Alsinan, Mohamed Elhoseiny
Our empirical assessments confirm MiniGPT-Med's superior performance in disease grounding, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice.
1 code implementation • 1 Jul 2024 • Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio.
1 code implementation • 28 Jun 2024 • Kirolos Ataallah, Chenhui Gou, Eslam Abdelrahman, Khushbu Pahwa, Jian Ding, Mohamed Elhoseiny
To address this gap, we introduce InfiniBench, a comprehensive benchmark for very long video understanding, which presents: 1) the longest video duration, averaging 52.59 minutes per video; 2) the largest number of question-answer pairs, 108.2K; 3) diverse questions that examine nine different skills and include both multiple-choice and open-ended formats; and 4) human-centric sources, as the videos come from movies and daily TV shows, with specific human-level question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding.
2 code implementations • 18 Jun 2024 • Xiang Li, Jian Ding, Mohamed Elhoseiny
We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images.
no code implementations • 10 Jun 2024 • Abdulwahab Felemban, Eslam Mohamed BAKR, Xiaoqian Shen, Jian Ding, Abduallah Mohamed, Mohamed Elhoseiny
We introduce iMotion-LLM: a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios.
no code implementations • 29 May 2024 • Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed BAKR, Mohamed Elhoseiny
Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, in which the model directly predicts a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, in which the model provides a detailed caption that includes part-level descriptions and their corresponding masks.
1 code implementation • 19 Apr 2024 • Wenxuan Zhang, Youssef Mohamed, Bernard Ghanem, Philip H. S. Torr, Adel Bibi, Mohamed Elhoseiny
DietCL meticulously allocates computational budget for both types of data.
2 code implementations • 4 Apr 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny
This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding.
Ranked #3 on Zero-Shot Video Question Answer on TVQA
1 code implementation • 4 Feb 2024 • Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny
Our comparative analysis is based on an extensive dataset, dubbed "ArtConstellation," consisting of annotations about art principles, likability, and emotions for 6,000 WikiArt and 3,200 AI-generated artworks.
no code implementations • CVPR 2024 • Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen, Jovana Lazarevic, Ivan Skorokhodov, Chamuditha Jayanga Galappaththige, Mohamed Elhoseiny
Existing GAN-based text-to-image models treat images as 2D pixel arrays.
no code implementations • CVPR 2024 • Habib Slim, Mohamed Elhoseiny
To illustrate the practicality of our contribution, we train neural editor modules in the latent space of shape autoencoders and demonstrate the ability of our dataset to enable a variety of language-guided shape edits.
no code implementations • 5 Dec 2023 • Xiang Li, Jian Ding, Zhaoyang Chen, Mohamed Elhoseiny
In this work, we present Uni3DL, a unified model for 3D and language understanding.
1 code implementation • 4 Dec 2023 • Xiaoqian Shen, Mohamed Elhoseiny
Therefore, we introduce \textbf{StoryGPT-V}, which leverages the merits of latent diffusion models (LDMs) and LLMs to produce images with consistent, high-quality characters grounded in given story descriptions.
no code implementations • 1 Dec 2023 • Botos Csaba, Wenxuan Zhang, Matthias Müller, Ser-Nam Lim, Mohamed Elhoseiny, Philip Torr, Adel Bibi
We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.
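A label-delayed stream of this kind can be sketched as follows, assuming a fixed delay d between a batch's arrival and its labels' arrival; the model interface is an illustrative assumption.

```python
# Sketch of a continual learning loop with label delay d: the model is
# evaluated on the newest (unlabeled) batch, while updates use batches whose
# labels have only just arrived.
from collections import deque

def run_stream(batches, labels, model, d=3):
    pending = deque()                      # batches still waiting for labels
    for t, x_t in enumerate(batches):
        pending.append((t, x_t))
        model.predict(x_t)                 # evaluate on the newest data
        if t >= d:                         # labels for step t - d arrive now
            t_old, x_old = pending.popleft()
            model.update(x_old, labels[t_old])
```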
no code implementations • 24 Nov 2023 • Eslam Abdelrahman, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny
Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps.
1 code implementation • 27 Oct 2023 • Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks.
no code implementations • 26 Oct 2023 • Salman Khan, Izzeddin Teeti, Andrew Bradley, Mohamed Elhoseiny, Fabio Cuzzolin
Attention is then applied to this graph to obtain an overall representation of the local dynamic scene.
2 code implementations • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.
1 code implementation • 10 Oct 2023 • Eslam Abdelrahman, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny
To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence Seq2Seq task by first predicting a chain of anchors and then the final target.
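To make the formulation concrete, the decoder's target under this chain-of-anchors view might look like the short sketch below; the utterance and token names are invented for illustration.

```python
# Hypothetical Seq2Seq target for 3D visual grounding: anchors first, target last.
utterance = "the chair between the desk and the lamp near the window"
anchors = ["desk", "lamp", "window"]      # chain of anchors, predicted in order
target = "chair"                          # referred object, predicted last
decoder_target = anchors + [target]       # ["desk", "lamp", "window", "chair"]
```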
no code implementations • 30 Aug 2023 • Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny
We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations.
1 code implementation • ICCV 2023 • Wenxuan Zhang, Paul Janson, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny
The GRW loss augments the training by continually encouraging the model to generate realistic and characterized samples to represent the unseen space.
1 code implementation • CVPR 2024 • Wenxuan Zhang, Paul Janson, Rahaf Aljundi, Mohamed Elhoseiny
Our method improves accuracy on newly learned tasks by up to 7% while preserving the pretraining knowledge, with a negligible decrease of 0.9% on a representative control set.
no code implementations • ICCV 2023 • Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr
Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.
1 code implementation • The International Conference on Machine Learning (ICML) 2023 • Hang Xu, Wenxuan Zhang, Jiawei Fei, Yuzhe Wu, Tingwen Xie, Jun Huang, Yuchen Xie, Mohamed Elhoseiny, Panos Kalnis
Distributed training of large deep neural networks requires frequent exchange of massive data between machines, thus communication efficiency is a major concern.
no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana
Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to transfer that knowledge to the task of semantic segmentation, as they are usually trained at the image level.
Open-Vocabulary Semantic Segmentation
no code implementations • CVPR 2023 • Jun Chen, Ming Hu, Darren J. Coker, Michael L. Berumen, Blair Costelloe, Sara Beery, Anna Rohrbach, Mohamed Elhoseiny
Monitoring animal behavior can facilitate conservation efforts by providing key insights into wildlife health, population status, and ecosystem function.
6 code implementations • 20 Apr 2023 • Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can endow it with numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.
Ranked #3 on Visual Question Answering (VQA) on AutoHallusion
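A common way to realize such visual-language alignment is a learned projection from vision-encoder features into the LLM's embedding space; the sketch below follows that pattern, with all dimensions and names as illustrative assumptions.

```python
# Sketch of projecting visual tokens into an LLM's embedding space so they can
# be prepended to text token embeddings. Dimensions are placeholders.
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_tokens):
        # (batch, num_tokens, vision_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(visual_tokens)
```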
no code implementations • 19 Apr 2023 • Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem
In this paper, we propose a novel and generalizable framework called LLM-Brain: using a large-scale language model as a robotic brain to unify egocentric memory and control.
1 code implementation • ICCV 2023 • Eslam Mohamed BAKR, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny
A human evaluation, which on average agreed with 95% of our evaluations, was conducted to probe the effectiveness of HRS-Bench.
no code implementations • 10 Apr 2023 • Eslam Mohamed BAKR, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny
In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.
1 code implementation • 9 Apr 2023 • Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny
Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.
1 code implementation • CVPR 2023 • Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.
1 code implementation • 12 Mar 2023 • Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny
By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.
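The ask-and-answer loop can be sketched as below, with `ask_llm` and `answer_vqa` as hypothetical stand-ins for the question-generating LLM and the BLIP-2 answerer.

```python
# Sketch of an iterative captioning dialog: an LLM asks informative questions,
# a VQA model answers from the image, and the dialog is summarized at the end.
def chat_caption(image, ask_llm, answer_vqa, rounds=5):
    dialog = []
    for _ in range(rounds):
        question = ask_llm(dialog)             # propose the next question
        answer = answer_vqa(image, question)   # answer grounded in the image
        dialog.append((question, answer))
    return ask_llm(dialog, summarize=True)     # condense dialog into a caption
```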
1 code implementation • 30 Jan 2023 • Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny
In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and we name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).
no code implementations • ICCV 2023 • Faizan Farooq Khan, Xiang Li, Andrew J. Temple, Mohamed Elhoseiny
Aquatic species are essential components of the world's ecosystem, and the preservation of aquatic biodiversity is crucial for maintaining proper ecosystem functioning.
no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.
Open-Vocabulary Semantic Segmentation
1 code implementation • 25 Nov 2022 • Eslam Mohamed BAKR, Yasmeen Alsaedy, Mohamed Elhoseiny
The main question we address in this paper is "can we consolidate the 3D visual stream by 2D clues synthesized from point clouds and efficiently utilize them in training and testing?".
no code implementations • 19 Nov 2022 • Youssef Mohamed, Mohamed Abdelfattah, Shyma Alhuwaider, Feifan Li, Xiangliang Zhang, Kenneth Ward Church, Mohamed Elhoseiny
This paper introduces ArtELingo, a new benchmark and dataset, designed to encourage work on diversity across languages and cultures.
1 code implementation • 10 Oct 2022 • Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohamed Elhoseiny
With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed.
1 code implementation • 9 Jun 2022 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.
3 code implementations • 9 Jun 2022 • Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem
In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions.
Ranked #3 on 3D Semantic Segmentation on OpenTrench3D
1 code implementation • 1 Jun 2022 • Jun Chen, Ming Hu, Boyang Li, Mohamed Elhoseiny
After finetuning the pretrained LoMaR on 384$\times$384 images, it can reach 85.4% top-1 accuracy, surpassing MAE by 0.6%.
2 code implementations • CVPR 2022 • Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny
As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions.
1 code implementation • 6 Mar 2022 • Abduallah Mohamed, Deyao Zhu, Warren Vu, Mohamed Elhoseiny, Christian Claudel
AMD is a metric that quantifies how close the full set of generated samples is to the ground truth.
Ranked #1 on Trajectory Prediction on Stanford Drone (ADE, in world coordinates)
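As a rough illustration, a Mahalanobis-style distance between the set of generated samples and the ground truth can be computed as below; this is a hedged sketch under a per-timestep Gaussian assumption, not necessarily the paper's exact AMD definition.

```python
# Sketch: fit a Gaussian to generated samples at each timestep and measure the
# Mahalanobis distance of the ground truth under it, averaged over time.
import numpy as np

def amd_sketch(samples, gt):
    """samples: (num_samples, T, 2) generated trajectories; gt: (T, 2)."""
    dists = []
    for t in range(gt.shape[0]):
        pts = samples[:, t, :]
        mu = pts.mean(axis=0)
        cov = np.cov(pts.T) + 1e-6 * np.eye(2)   # regularized covariance
        diff = gt[t] - mu
        dists.append(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
    return float(np.mean(dists))
```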
1 code implementation • 2 Mar 2022 • Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny
The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark.
1 code implementation • 6 Jan 2022 • Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church
This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions.
1 code implementation • CVPR 2022 • Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny
We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality.
no code implementations • 24 Dec 2021 • Kai Yi, Paul Janson, Wenxuan Zhang, Mohamed Elhoseiny
Accordingly, we propose a Domain-Invariant Network (DIN) to learn factorized features for shifting domains and improved textual representation for unseen classes.
no code implementations • 29 Sep 2021 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
Deep reinforcement learning agents trained to learn manipulation tasks in real-world environments with a limited diversity of object properties tend to suffer from overfitting and fail to generalize to unseen testing environments.
no code implementations • 29 Sep 2021 • Kilichbek Haydarov, Aashiq Muhamed, Jovana Lazarevic, Ivan Skorokhodov, Mohamed Elhoseiny
To the best of our knowledge, our work is the first one which explores text-controllable continuous image generation.
1 code implementation • CVPR 2022 • Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny
This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.
1 code implementation • 20 Apr 2021 • Divyansh Jha, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny
By generating representations of unseen classes based on their semantic descriptions, e.g., attributes or text, generative ZSL attempts to differentiate unseen from seen categories.
1 code implementation • ICCV 2021 • Ivan Skorokhodov, Grigorii Sotnikov, Mohamed Elhoseiny
In this work, we develop a method to generate infinite high-resolution images with diverse and complex content.
Ranked #1 on Infinite Image Generation on LHQ
1 code implementation • CVPR 2022 • Jun Chen, Han Guo, Kai Yi, Boyang Li, Mohamed Elhoseiny
To the best of our knowledge, this is the first work that improves the data efficiency of image captioning by utilizing an LM pretrained on unimodal data.
5 code implementations • CVPR 2021 • Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas Guibas
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
no code implementations • 1 Jan 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.
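An unlikelihood term of this kind can be sketched as follows, assuming the model exposes log-probabilities for trajectories; the exact weighting and clamping are assumptions, not the paper's formulation.

```python
# Sketch of unlikelihood training: maximize likelihood of observed trajectories
# and explicitly push down probability mass on context-conflicting ones.
import torch

def unlikelihood_loss(log_prob_pos, log_prob_neg):
    nll = -log_prob_pos.mean()                    # standard likelihood term
    p_neg = log_prob_neg.exp().clamp(max=1 - 1e-6)
    ul = -torch.log1p(-p_neg).mean()              # -log(1 - p_neg) penalty
    return nll + ul
```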
no code implementations • ICLR 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution.
no code implementations • ICLR 2021 • Ivan Skorokhodov, Mohamed Elhoseiny
Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.
2 code implementations • 1 Jan 2021 • Mohamed Elhoseiny, Kai Yi, Mohamed Elfeki
To improve the discriminative power of ZSL, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art.
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny
In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny
To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts on learning compositional representations in the training distribution.
1 code implementation • CVPR 2021 • Ivan Skorokhodov, Savva Ignatyev, Mohamed Elhoseiny
In most existing learning systems, images are typically viewed as 2D pixel arrays.
Ranked #12 on Image Generation on LSUN Churches 256 x 256
no code implementations • NeurIPS 2020 • Uchenna Akujuobi, Jun Chen, Mohamed Elhoseiny, Michael Spranger, Xiangliang Zhang
Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data.
3 code implementations • 19 Jun 2020 • Ivan Skorokhodov, Mohamed Elhoseiny
Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime.
1 code implementation • 15 Jun 2020 • Abduallah Mohamed, Muhammed Mohaimin Sadiq, Ehab AlBadawy, Mohamed Elhoseiny, Christian Claudel
Also, we show empirically and theoretically that IENs lead to a greater variance reduction in comparison with other similar approaches such as dropout and maxout.
1 code implementation • ICLR 2020 • Yuanpeng Li, Liang Zhao, Kenneth Church, Mohamed Elhoseiny
It also shows significant improvement on the machine translation task.
no code implementations • Conference 2020 • Jun Chen, Robert Hoehndorf, Mohamed Elhoseiny, Xiangliang Zhang
In natural language processing, relation extraction seeks to rationally understand unstructured text.
Ranked #17 on Relation Extraction on TACRED
3 code implementations • ICCV 2021 • Sherif Abdelkarim, Aniket Agarwal, Panos Achlioptas, Jun Chen, Jiaji Huang, Boyang Li, Kenneth Church, Mohamed Elhoseiny
We use these benchmarks to study the performance of several state-of-the-art long-tail models on the LTVRR setup.
3 code implementations • CVPR 2020 • Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, Christian Claudel
Better machine understanding of pedestrian behaviors enables faster progress in modeling interactions between agents such as autonomous vehicles and humans.
Ranked #3 on Trajectory Prediction on ETH
2 code implementations • ICLR 2020 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
Continual learning aims to learn new tasks without forgetting previously learned ones.
no code implementations • ICLR 2019 • Mohamed Elfeki, Camille Couprie, Mohamed Elhoseiny
Embedded in adversarial training and a variational autoencoder, our Generative DPP approach shows consistent resistance to mode collapse on a wide variety of synthetic data and natural image datasets, including MNIST, CIFAR10, and CelebA, while outperforming state-of-the-art methods in data efficiency, convergence time, and generation quality.
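A DPP-style diversity term over a batch of generated features can be sketched as below; the kernel choice and sign convention are illustrative assumptions, not the paper's exact objective.

```python
# Sketch: encourage diverse generations by maximizing the log-determinant of a
# similarity kernel over generated features (a DPP-inspired criterion).
import torch
import torch.nn.functional as F

def dpp_diversity_loss(features, eps=1e-4):
    """features: (batch, dim) embeddings of generated samples."""
    feats = F.normalize(features, dim=1)
    kernel = feats @ feats.T                          # cosine-similarity kernel
    kernel = kernel + eps * torch.eye(feats.size(0))  # numerical stability
    return -torch.logdet(kernel)                      # minimize -> diversify
```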
2 code implementations • ICCV 2019 • Mohamed Elhoseiny, Mohamed Elfeki
We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen and creativity is about creating a likable unseen.
1 code implementation • 6 Mar 2019 • Ahmed Ayyad, Yuchen Li, Nassir Navab, Shadi Albarqouni, Mohamed Elhoseiny
We develop a random walk semi-supervised loss that enables the network to learn representations that are compact and well-separated.
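A random-walk loss in this spirit can be sketched as a round trip from labeled to unlabeled embeddings and back, requiring the walk to return to a point of the same class; the details below are assumptions for illustration.

```python
# Sketch of a random-walk semi-supervised loss: soft transitions labeled ->
# unlabeled -> labeled should land on same-class points.
import torch
import torch.nn.functional as F

def random_walk_loss(labeled, unlabeled, labels):
    """labeled: (L, dim); unlabeled: (U, dim); labels: (L,) integer classes."""
    sim = labeled @ unlabeled.T                 # (L, U) similarities
    p_lu = F.softmax(sim, dim=1)                # step to unlabeled points
    p_ul = F.softmax(sim.T, dim=1)              # step back to labeled points
    p_round = p_lu @ p_ul                       # (L, L) round-trip probabilities
    same = (labels[:, None] == labels[None, :]).float()
    target = same / same.sum(dim=1, keepdim=True)
    return F.kl_div(torch.log(p_round + 1e-8), target, reduction="batchmean")
```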
6 code implementations • 27 Feb 2019 • Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc'Aurelio Ranzato
But for a successful knowledge transfer, the learner needs to remember how to perform previous tasks.
Ranked #7 on Class Incremental Learning on cifar100
no code implementations • 26 Dec 2018 • Mohamed Elhoseiny, Francesca Babiloni, Rahaf Aljundi, Marcus Rohrbach, Manohar Paluri, Tinne Tuytelaars
So far, life-long learning (LLL) has been studied in relatively small-scale and artificial setups.
3 code implementations • ICLR 2019 • Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.
Ranked #6 on Continual Learning on ASC (19 tasks)
4 code implementations • 30 Nov 2018 • Mohamed Elfeki, Camille Couprie, Morgane Riviere, Mohamed Elhoseiny
Generative models have proven to be an outstanding tool for representing high-dimensional probability distributions and generating realistic-looking images.
1 code implementation • 17 Oct 2018 • Mennatullah Siam, Chen Jiang, Steven Lu, Laura Petrich, Mahmoud Gamal, Mohamed Elhoseiny, Martin Jagersand
A human teacher can show potential objects of interest to the robot, which is able to self-adapt to the teaching signal without manual segmentation labels.
no code implementations • 27 Sep 2018 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
Sequentially learning of tasks arriving in a continuous stream is a complex problem and becomes more challenging when the model has a fixed capacity.
1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee
Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.
2 code implementations • 27 Apr 2018 • Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny
Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples.
1 code implementation • 3 Apr 2018 • Othman Sbai, Mohamed Elhoseiny, Antoine Bordes, Yann Lecun, Camille Couprie
Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant?
no code implementations • 23 Jan 2018 • Ahmed Elgammal, Marian Mazzone, Bingchen Liu, Diana Kim, Mohamed Elhoseiny
How does the machine classify styles in art?
no code implementations • CVPR 2018 • Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, Ahmed Elgammal
Most existing zero-shot learning methods consider the problem as a visual semantic embedding one.
3 code implementations • ECCV 2018 • Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars
We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
no code implementations • CVPR 2017 • Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal
We propose a learning framework that is able to connect text terms to its relevant parts and suppress connections to non-visual text terms without any part-text annotations.
no code implementations • CVPR 2017 • Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal
We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.
10 code implementations • 21 Jun 2017 • Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone
We argue that such networks are limited in their ability to generate creative products in their original design.
no code implementations • 5 Jan 2017 • Mohamed Elhoseiny, Ahmed Elgammal
We present the Overlapping Domain Cover (ODC) notion for kernel machines: a set of overlapping subsets of the data that covers the entire training set and is optimized to be as spatially cohesive as possible.
no code implementations • CVPR 2016 • Han Zhang, Tao Xu, Mohamed Elhoseiny, Xiaolei Huang, Shaoting Zhang, Ahmed Elgammal, Dimitris Metaxas
In this paper, we propose a new CNN architecture that integrates semantic part detection and abstraction (SPDA-CNN) for fine-grained classification.
no code implementations • WS 2016 • Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal
Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions.
no code implementations • 31 Dec 2015 • Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh
Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the parameters of a linear classifier.
no code implementations • 2 Dec 2015 • Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal
To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content.
no code implementations • 16 Nov 2015 • Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal
We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding.
no code implementations • 16 Nov 2015 • Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal
In the task of Object Recognition, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects.
no code implementations • 9 Aug 2015 • Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal
How does fine-tuning of a pre-trained CNN on a multi-view dataset affect the representation at each layer of the network?
no code implementations • 29 Jun 2015 • Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh
In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images where the knowledge comes from textual description about these categories.
no code implementations • CVPR 2015 • Sheng Huang, Mohamed Elhoseiny, Ahmed Elgammal, Dan Yang
Then the attribute prediction problem is cast as a regularized hypergraph cut problem, in which HAP jointly learns a collection of attribute projections from the feature space to a hypergraph embedding space aligned with the attribute space.
no code implementations • 26 Sep 2014 • Mohamed Elhoseiny, Ahmed Elgammal
In this paper, we present a generalized structured regression framework based on the Sharma-Mittal divergence, a relative entropy measure, which is introduced to the machine learning community in this work.
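For reference, a standard form of the Sharma-Mittal relative entropy between densities $p$ and $q$ from the literature (the paper's exact parameterization may differ) is
$$D_{\alpha,\beta}(p \,\|\, q) = \frac{1}{\beta - 1}\left[\left(\int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx\right)^{\frac{1-\beta}{1-\alpha}} - 1\right], \qquad \alpha > 0,\ \alpha \neq 1,\ \beta \neq 1,$$
which recovers the Rényi divergence as $\beta \to 1$ and the Tsallis divergence as $\beta \to \alpha$.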
no code implementations • 1 Aug 2014 • Mohamed Elhoseiny, Ahmed Elgammal
This work first introduces the MindMap Multilevel Visualization concept, which jointly visualizes and summarizes textual information.