Search Results for author: Aude Oliva

Found 41 papers, 20 papers with code

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

no code implementations · 8 Apr 2024 Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios.
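The inference savings described above come from sparse routing: each token is processed by only a few experts rather than the whole layer. A minimal NumPy sketch of top-k expert routing, with illustrative shapes and random weights (not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Illustrative expert weight matrices and a linear router.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                              # one score per expert
    top = np.argsort(logits)[-top_k:]                # indices of chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only top_k of the n_experts matrices are applied per token: this is
    # the source of the inference-time savings over an equally large dense layer.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
```

With top_k=2 of 4 experts, each token touches half the expert parameters, which is where the headline compute reduction comes from.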

Learning Human Action Recognition Representations Without Real Humans

1 code implementation NeurIPS 2023 Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.

Action Recognition · Ethics +2

LangNav: Language as a Perceptual Representation for Navigation

no code implementations11 Oct 2023 Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings.

Image Captioning · Language Modelling +4

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

1 code implementation ICCV 2023 Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

We contribute Synthetic Visual Concepts (SyViC), a million-scale synthetic dataset and data generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models.

Sentence · Visual Reasoning

Leveraging Temporal Context in Low Representational Power Regimes

no code implementations CVPR 2023 Camilo L. Fosco, SouYoung Jin, Emilie Josephs, Aude Oliva

We show that including information from the ETM during training improves action recognition and anticipation performance on various egocentric video datasets.

Action Anticipation · Action Recognition

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

no code implementations · 1 Jun 2022 Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva

Second, they allow us to generate novel "Deepfake Caricatures": transformations of the deepfake that exacerbate artifacts to improve human detection.

DeepFake Detection · Face Swapping +2

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification · Ethics

Dynamic Network Quantization for Efficient Video Inference

1 code implementation ICCV 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization · Video Recognition

Cross-Modal Discrete Representation Learning

no code implementations ACL 2022 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval · Quantization +4

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation ICCV 2021 Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition
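The per-segment modality selection in the AdaMML snippet can be sketched with a toy gating policy. The modality names, scores, and threshold below are hypothetical placeholders standing in for the learned policy network:

```python
import numpy as np

rng = np.random.default_rng(2)
modalities = ["rgb", "audio", "flow"]

def select_modalities(segment_scores, threshold=0.5):
    """Toy policy: keep a modality for this segment if its score clears a bar."""
    chosen = [m for m, s in zip(modalities, segment_scores) if s > threshold]
    return chosen or [modalities[0]]   # always process at least RGB

# One score vector per video segment, as if emitted by a policy network.
segments = rng.random((4, len(modalities)))
plan = [select_modalities(s) for s in segments]
```

Skipping low-scoring modalities per segment is what trades a small accuracy cost for a large reduction in per-video compute.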

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

no code implementations CVPR 2021 Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.

Contrastive Learning · Retrieval +1

Memorability: An image-computable measure of information utility

no code implementations · 1 Apr 2021 Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva

The pixels in an image, and the objects, scenes, and actions that they compose, determine whether an image will be memorable or forgettable.

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations · 2 Mar 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification · Quantization +2
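The block-swapping idea in the snippet above can be illustrated with a toy example: a "student" and a "teacher" share the same block structure, and each forward pass randomly substitutes teacher blocks for student ones. Everything here (rounding as a stand-in for low precision, the swap probability) is illustrative, not the paper's code:

```python
import random

# Hypothetical stand-ins: a low-precision "student" simulated by rounding,
# and a full-precision "teacher" computing the same two blocks exactly.
student_blocks = [lambda x: round(x * 0.5, 1), lambda x: round(x + 1.0, 1)]
teacher_blocks = [lambda x: x * 0.5, lambda x: x + 1.0]

def forward_with_swapping(x, p_swap, rng):
    """Run the student, substituting the teacher's block with probability p_swap."""
    for s_blk, t_blk in zip(student_blocks, teacher_blocks):
        x = (t_blk if rng.random() < p_swap else s_blk)(x)
    return x

# p_swap=0 is the pure student; p_swap=1 recovers the teacher exactly.
pure_student = forward_with_swapping(3.14159, 0.0, random.Random(0))
pure_teacher = forward_with_swapping(3.14159, 1.0, random.Random(0))
mixed = forward_with_swapping(3.14159, 0.5, random.Random(0))
```

The point of the swap is that higher-precision activations from teacher blocks flow into the following student blocks during training, transferring knowledge block by block.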

VA-RED²: Video Adaptive Redundancy Reduction

no code implementations ICLR 2021 Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

1 code implementation CVPR 2021 Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan

In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets.

Action Recognition · Temporal Action Localization

Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

1 code implementation ECCV 2020 Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva

Based on our findings we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time.

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation ECCV 2020 Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition
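The adaptive mechanism in the AR-Net snippet, a policy that picks a per-frame input resolution, can be sketched with a stand-in policy. The motion heuristic, thresholds, and candidate sizes below are illustrative, not the paper's learned network:

```python
import numpy as np

rng = np.random.default_rng(1)
resolutions = [56, 112, 224]  # candidate input sizes, smallest to largest

def choose_resolution(prev_frame, frame):
    """Toy policy: spend high resolution only on frames that change a lot."""
    motion = np.abs(frame - prev_frame).mean()
    if motion < 0.05:
        return resolutions[0]   # near-static frame: cheap processing
    if motion < 0.15:
        return resolutions[1]
    return resolutions[2]       # large change: full resolution

video = rng.random((8, 16, 16))                      # 8 toy grayscale frames
plan = [choose_resolution(video[t - 1], video[t]) for t in range(1, len(video))]
```

A learned policy plays the same role as this heuristic but is trained jointly with the recognition model, so the accuracy/efficiency trade-off is optimized end to end.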

Reasoning About Human-Object Interactions Through Dual Attention Networks

no code implementations ICCV 2019 Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.

Human-Object Interaction Detection · Object

GANalyze: Toward Visual Definitions of Cognitive Image Properties

1 code implementation ICCV 2019 Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola

We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence.

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

1 code implementation · 27 Jul 2018 Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand

While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or 'icons'.

Synthetic Data Generation

Temporal Relational Reasoning in Videos

5 code implementations ECCV 2018 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.

Action Classification · Action Recognition In Videos +4

Interpreting Deep Visual Representations via Network Dissection

2 code implementations · 15 Nov 2017 Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.

Understanding Infographics through Textual and Visual Tag Prediction

1 code implementation · 26 Sep 2017 Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva

And second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic, i.e., visual hashtags.

TAG

Network Dissection: Quantifying Interpretability of Deep Visual Representations

1 code implementation CVPR 2017 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.

BubbleView: an interface for crowdsourcing image importance maps and tracking visual attention

no code implementations · 16 Feb 2017 Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, Hanspeter Pfister

In this paper, we present BubbleView, an alternative methodology for eye tracking using discrete mouse clicks to measure which information people consciously choose to examine.

Places: An Image Database for Deep Scene Understanding

no code implementations · 6 Oct 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification on tasks such as object and scene recognition.

BIG-bench Machine Learning · Classification +4

What do different evaluation metrics tell us about saliency models?

1 code implementation · 12 Apr 2016 Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand

How best to evaluate a saliency model's ability to predict where humans look in images is an open research question.

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

no code implementations · 12 Jan 2016 Radoslaw M. Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva

The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans.

Object Recognition

Learning Deep Features for Discriminative Localization

33 code implementations CVPR 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

Weakly-Supervised Object Localization
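The localization ability described in the snippet above comes from projecting the classifier's weights back onto the spatial feature maps, the Class Activation Mapping idea. A minimal NumPy sketch with toy shapes (the sizes and random weights are illustrative):

```python
import numpy as np

# Toy feature maps from a final conv layer: C channels of HxW activations.
rng = np.random.default_rng(0)
C, H, W, num_classes = 4, 7, 7, 3
features = rng.random((C, H, W))

# Global average pooling collapses each channel to one value.
gap = features.mean(axis=(1, 2))             # shape (C,)

# A linear classifier on the pooled features produces class logits.
weights = rng.random((num_classes, C))       # shape (num_classes, C)
logits = weights @ gap

def class_activation_map(features, weights, cls):
    """Project a class's weights back onto the spatial maps instead of the pooled vector."""
    return np.tensordot(weights[cls], features, axes=1)  # shape (H, W)

cam = class_activation_map(features, weights, int(logits.argmax()))
# By linearity of GAP, the spatial mean of the CAM equals the class logit,
# so high-CAM regions are exactly the ones that drove the classification.
```

This is why a network trained only on image-level labels can still point at the discriminative region: the map is a spatial decomposition of the logit it already computes.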

Understanding and Predicting Image Memorability at a Large Scale

no code implementations ICCV 2015 Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva

Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data.

Object Detectors Emerge in Deep Scene CNNs

1 code implementation · 22 Dec 2014 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.

General Classification · Object +3

Learning Deep Features for Scene Recognition using Places Database

no code implementations NeurIPS 2014 Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.

Object · Object Recognition +1

Learning visual biases from human imagination

no code implementations NeurIPS 2015 Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba

Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.

Object Recognition

Understanding the Intrinsic Memorability of Images

no code implementations NeurIPS 2011 Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember.

feature selection
