Search Results for author: Aude Oliva

Found 41 papers, 20 papers with code

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

no code implementations · 8 Apr 2024 Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios.
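The inference savings described above come from sparse routing: each token is processed by only a few experts rather than the whole layer. A minimal NumPy sketch of top-k expert routing, with illustrative shapes and random weights (not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Illustrative expert weight matrices and a linear router.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                              # one score per expert
    top = np.argsort(logits)[-top_k:]                # indices of chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only top_k of the n_experts matrices are applied per token: this is
    # the source of the inference-time savings over an equally large dense layer.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
```

With top_k=2 of 4 experts, each token touches half the expert parameters, which is where the headline compute reduction comes from.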

Learning Human Action Recognition Representations Without Real Humans

1 code implementation NeurIPS 2023 Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.

Action Recognition · Ethics +2

LangNav: Language as a Perceptual Representation for Navigation

no code implementations11 Oct 2023 Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings.

Image Captioning · Language Modelling +4

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

1 code implementation ICCV 2023 Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

We contribute Synthetic Visual Concepts (SyViC), a million-scale synthetic dataset and data generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models.

Sentence · Visual Reasoning

Leveraging Temporal Context in Low Representational Power Regimes

no code implementations CVPR 2023 Camilo L. Fosco, SouYoung Jin, Emilie Josephs, Aude Oliva

We show that including information from the ETM during training improves action recognition and anticipation performance on various egocentric video datasets.

Action Anticipation · Action Recognition

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

no code implementations · 1 Jun 2022 Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva

Second, they allow us to generate novel "Deepfake Caricatures": transformations of the deepfake that exacerbate artifacts to improve human detection.

DeepFake Detection · Face Swapping +2

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification · Ethics

Dynamic Network Quantization for Efficient Video Inference

1 code implementation ICCV 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization · Video Recognition

Cross-Modal Discrete Representation Learning

no code implementations ACL 2022 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval · Quantization +4

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation ICCV 2021 Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition
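The per-segment modality selection in the AdaMML snippet can be sketched with a toy gating policy. The modality names, scores, and threshold below are hypothetical placeholders standing in for the learned policy network:

```python
import numpy as np

rng = np.random.default_rng(2)
modalities = ["rgb", "audio", "flow"]

def select_modalities(segment_scores, threshold=0.5):
    """Toy policy: keep a modality for this segment if its score clears a bar."""
    chosen = [m for m, s in zip(modalities, segment_scores) if s > threshold]
    return chosen or [modalities[0]]   # always process at least RGB

# One score vector per video segment, as if emitted by a policy network.
segments = rng.random((4, len(modalities)))
plan = [select_modalities(s) for s in segments]
```

Skipping low-scoring modalities per segment is what trades a small accuracy cost for a large reduction in per-video compute.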

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

no code implementations CVPR 2021 Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.

Contrastive Learning · Retrieval +1

Memorability: An image-computable measure of information utility

no code implementations · 1 Apr 2021 Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva

The pixels in an image, and the objects, scenes, and actions that they compose, determine whether an image will be memorable or forgettable.

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations · 2 Mar 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification · Quantization +2
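The block-swapping idea in the snippet above can be illustrated with a toy example: a "student" and a "teacher" share the same block structure, and each forward pass randomly substitutes teacher blocks for student ones. Everything here (rounding as a stand-in for low precision, the swap probability) is illustrative, not the paper's code:

```python
import random

# Hypothetical stand-ins: a low-precision "student" simulated by rounding,
# and a full-precision "teacher" computing the same two blocks exactly.
student_blocks = [lambda x: round(x * 0.5, 1), lambda x: round(x + 1.0, 1)]
teacher_blocks = [lambda x: x * 0.5, lambda x: x + 1.0]

def forward_with_swapping(x, p_swap, rng):
    """Run the student, substituting the teacher's block with probability p_swap."""
    for s_blk, t_blk in zip(student_blocks, teacher_blocks):
        x = (t_blk if rng.random() < p_swap else s_blk)(x)
    return x

# p_swap=0 is the pure student; p_swap=1 recovers the teacher exactly.
pure_student = forward_with_swapping(3.14159, 0.0, random.Random(0))
pure_teacher = forward_with_swapping(3.14159, 1.0, random.Random(0))
mixed = forward_with_swapping(3.14159, 0.5, random.Random(0))
```

The point of the swap is that higher-precision activations from teacher blocks flow into the following student blocks during training, transferring knowledge block by block.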

VA-RED²: Video Adaptive Redundancy Reduction

no code implementations ICLR 2021 Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

1 code implementation CVPR 2021 Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan

In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets.

Action Recognition · Temporal Action Localization

Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

1 code implementation ECCV 2020 Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva

Based on our findings we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time.

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation ECCV 2020 Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition
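The adaptive mechanism in the AR-Net snippet, a policy that picks a per-frame input resolution, can be sketched with a stand-in policy. The motion heuristic, thresholds, and candidate sizes below are illustrative, not the paper's learned network:

```python
import numpy as np

rng = np.random.default_rng(1)
resolutions = [56, 112, 224]  # candidate input sizes, smallest to largest

def choose_resolution(prev_frame, frame):
    """Toy policy: spend high resolution only on frames that change a lot."""
    motion = np.abs(frame - prev_frame).mean()
    if motion < 0.05:
        return resolutions[0]   # near-static frame: cheap processing
    if motion < 0.15:
        return resolutions[1]
    return resolutions[2]       # large change: full resolution

video = rng.random((8, 16, 16))                      # 8 toy grayscale frames
plan = [choose_resolution(video[t - 1], video[t]) for t in range(1, len(video))]
```

A learned policy plays the same role as this heuristic but is trained jointly with the recognition model, so the accuracy/efficiency trade-off is optimized end to end.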

Reasoning About Human-Object Interactions Through Dual Attention Networks

no code implementations ICCV 2019 Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.

Human-Object Interaction Detection · Object

GANalyze: Toward Visual Definitions of Cognitive Image Properties

1 code implementation ICCV 2019 Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola

We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence.

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

1 code implementation · 27 Jul 2018 Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand

While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or 'icons'.

Synthetic Data Generation

Temporal Relational Reasoning in Videos

5 code implementations ECCV 2018 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.

Action Classification · Action Recognition In Videos +4

Interpreting Deep Visual Representations via Network Dissection

2 code implementations · 15 Nov 2017 Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.

Understanding Infographics through Textual and Visual Tag Prediction

1 code implementation · 26 Sep 2017 Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva

And second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic, i.e., visual hashtags.

TAG

Network Dissection: Quantifying Interpretability of Deep Visual Representations

1 code implementation CVPR 2017 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.

BubbleView: an interface for crowdsourcing image importance maps and tracking visual attention

no code implementations · 16 Feb 2017 Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, Hanspeter Pfister

In this paper, we present BubbleView, an alternative methodology for eye tracking using discrete mouse clicks to measure which information people consciously choose to examine.

Places: An Image Database for Deep Scene Understanding

no code implementations · 6 Oct 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification on tasks such as object and scene recognition.

BIG-bench Machine Learning · Classification +4

What do different evaluation metrics tell us about saliency models?

1 code implementation · 12 Apr 2016 Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand

How best to evaluate a saliency model's ability to predict where humans look in images is an open research question.

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

no code implementations · 12 Jan 2016 Radoslaw M. Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva

The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans.

Object Recognition

Learning Deep Features for Discriminative Localization

33 code implementations CVPR 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

Weakly-Supervised Object Localization
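The localization ability described in the snippet above comes from projecting the classifier's weights back onto the spatial feature maps, the Class Activation Mapping idea. A minimal NumPy sketch with toy shapes (the sizes and random weights are illustrative):

```python
import numpy as np

# Toy feature maps from a final conv layer: C channels of HxW activations.
rng = np.random.default_rng(0)
C, H, W, num_classes = 4, 7, 7, 3
features = rng.random((C, H, W))

# Global average pooling collapses each channel to one value.
gap = features.mean(axis=(1, 2))             # shape (C,)

# A linear classifier on the pooled features produces class logits.
weights = rng.random((num_classes, C))       # shape (num_classes, C)
logits = weights @ gap

def class_activation_map(features, weights, cls):
    """Project a class's weights back onto the spatial maps instead of the pooled vector."""
    return np.tensordot(weights[cls], features, axes=1)  # shape (H, W)

cam = class_activation_map(features, weights, int(logits.argmax()))
# By linearity of GAP, the spatial mean of the CAM equals the class logit,
# so high-CAM regions are exactly the ones that drove the classification.
```

This is why a network trained only on image-level labels can still point at the discriminative region: the map is a spatial decomposition of the logit it already computes.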

Understanding and Predicting Image Memorability at a Large Scale

no code implementations ICCV 2015 Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva

Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data.

Object Detectors Emerge in Deep Scene CNNs

1 code implementation · 22 Dec 2014 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.

General Classification · Object +3

Learning Deep Features for Scene Recognition using Places Database

no code implementations NeurIPS 2014 Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.

Object · Object Recognition +1

Learning visual biases from human imagination

no code implementations NeurIPS 2015 Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba

Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.

Object Recognition

Understanding the Intrinsic Memorability of Images

no code implementations NeurIPS 2011 Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember.

feature selection
