Search Results for author: Alexander Schwing

Found 39 papers, 14 papers with code

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

1 code implementation 20 Dec 2021 Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji

Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.

Answer Generation Data Augmentation +2

Perceptual Score: What Data Modalities Does Your Model Perceive?

1 code implementation NeurIPS 2021 Itai Gat, Idan Schwartz, Alexander Schwing

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

Question Answering Visual Dialog +1

Towards Coherent Visual Storytelling with Ordered Image Attention

no code implementations 4 Aug 2021 Tom Braude, Idan Schwartz, Alexander Schwing, Ariel Shamir

OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.

Visual Storytelling

GridToPix: Training Embodied Agents with Minimal Supervision

no code implementations ICCV 2021 Unnat Jain, Iou-Jen Liu, Svetlana Lazebnik, Aniruddha Kembhavi, Luca Weihs, Alexander Schwing

While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards.

PointGoal Navigation

A Contrastive Learning Approach for Training Variational Autoencoder Priors

no code implementations NeurIPS 2021 Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat

To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.

Ranked #2 on Image Generation on CelebA 256x256 (FID metric)

Contrastive Learning Image Generation

Bridging the Imitation Gap by Adaptive Insubordination

no code implementations NeurIPS 2021 Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing

However, we show that when the teaching agent makes decisions with access to privileged information that is unavailable to the student, this information is marginalized during imitation learning, resulting in an "imitation gap" and, potentially, poor results.

Imitation Learning reinforcement-learning

Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning

no code implementations 21 Feb 2020 Yuanyi Zhong, Alexander Schwing, Jian Peng

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation.

Atari Games reinforcement-learning +1

TAB-VCR: Tags and Attributes based VCR Baselines

1 code implementation NeurIPS 2019 Jingxiang Lin, Unnat Jain, Alexander Schwing

Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.

Question Answering Visual Commonsense Reasoning +2

Graph Structured Prediction Energy Networks

1 code implementation NeurIPS 2019 Colin Graber, Alexander Schwing

For joint inference over multiple variables, a variety of structured prediction techniques have been developed to model correlations among variables and thereby improve predictions.

Structured Prediction

Towards Principled Objectives for Contrastive Disentanglement

no code implementations 25 Sep 2019 Anwesa Choudhuri, Ashok Vardhan Makkuva, Ranvir Rana, Sewoong Oh, Girish Chowdhary, Alexander Schwing

In fact, contrastive disentanglement and unsupervised recovery are often combined in that we seek additional variations that exhibit salient factors/properties.


Unsupervised Discovery of Dynamic Neural Circuits

no code implementations NeurIPS Workshop Neuro_AI 2019 Colin Graber, Ryan Loh, Yurii Vlasov, Alexander Schwing

What can we learn about the functional organization of cortical microcircuits from large-scale recordings of neural activity?

ViCo: Word Embeddings from Visual Co-occurrences

1 code implementation ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.

Word Embeddings

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

no code implementations ICCV 2019 Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future.

Image Captioning Language Modelling

Factor Graph Attention

1 code implementation CVPR 2019 Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander Schwing

We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities.

Graph Attention Question Answering +2

Max-Sliced Wasserstein Distance and its use for GANs

no code implementations CVPR 2019 Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing

Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning.

Image-to-Image Translation Translation

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

3 code implementations ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.

Human-Object Interaction Detection

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training

no code implementations NeurIPS 2018 Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, Alexander Schwing

Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands.

Deep Structured Prediction with Nonlinear Output Transformations

1 code implementation NeurIPS 2018 Colin Graber, Ofer Meshi, Alexander Schwing

Deep structured models are widely used for tasks like semantic segmentation, where explicit correlations between variables provide important prior information which generally helps to reduce the data needs of deep nets.

Semantic Segmentation Structured Prediction

Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech

no code implementations CVPR 2019 Aditya Deshpande, Jyoti Aneja, Li-Wei Wang, Alexander Schwing, D. A. Forsyth

We achieve the trifecta: (1) High accuracy for the diverse captions as evaluated by standard captioning metrics and user studies; (2) Faster computation of diverse captions compared to beam search and diverse beam search; and (3) High diversity as evaluated by counting novel sentences, distinct n-grams and mutual overlap (i.e., mBleu-4) scores.

Image Captioning

Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

no code implementations CVPR 2018 Unnat Jain, Svetlana Lazebnik, Alexander Schwing

In addition, for the first time on the visual dialog dataset, we assess the performance of a system asking questions, and demonstrate how visual dialog can be generated from discriminative question generation and question answering.

Image Captioning Question Answering +3

Generative Modeling using the Sliced Wasserstein Distance

1 code implementation CVPR 2018 Ishan Deshpande, Ziyu Zhang, Alexander Schwing

While this is particularly true for early GAN formulations, there has been significant empirically motivated and theoretically founded progress to improve stability, for instance, by using the Wasserstein distance rather than the Jensen-Shannon divergence.

Asynchronous Parallel Coordinate Minimization for MAP Inference

no code implementations NeurIPS 2017 Ofer Meshi, Alexander Schwing

Finding the maximum a-posteriori (MAP) assignment is a central task in graphical models.

Convolutional Image Captioning

4 code implementations CVPR 2018 Jyoti Aneja, Aditya Deshpande, Alexander Schwing

In recent years, significant progress has been made in image captioning using Recurrent Neural Networks powered by long short-term memory (LSTM) units.

Image Captioning Text Generation +1

Dualing GANs

no code implementations NeurIPS 2017 Yujia Li, Alexander Schwing, Kuan-Chieh Wang, Richard Zemel

We start from linear discriminators, in which case conjugate duality provides a mechanism to reformulate the saddle point objective into a maximization problem, such that both the generator and the discriminator of this 'dualing GAN' act in concert.

Creativity: Generating Diverse Questions using Variational Autoencoders

no code implementations CVPR 2017 Unnat Jain, Ziyu Zhang, Alexander Schwing

Generating diverse questions for given images is an important task for computational education, entertainment and AI assistants.

Question Generation

Statistical Inference, Learning and Models in Big Data

no code implementations 9 Sep 2015 Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context.

Blending Learning and Inference in Structured Prediction

no code implementations 8 Oct 2012 Tamir Hazan, Alexander Schwing, David McAllester, Raquel Urtasun

In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models.

Scene Understanding Semantic Segmentation +1
