no code implementations • 30 Mar 2023 • Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky
We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic dataset and data-generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models.
no code implementations • CVPR 2023 • Camilo L. Fosco, SouYoung Jin, Emilie Josephs, Aude Oliva
We show that including information from the ETM during training improves action recognition and anticipation performance on various egocentric video datasets.
no code implementations • 1 Jun 2022 • Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva
Second, they allow us to generate novel "Deepfake Caricatures": transformations of the deepfake that exacerbate artifacts to improve human detection.
3 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko
Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.
no code implementations • NeurIPS 2021 • Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, Aude Oliva
The transformer, a self-attention-based model, has recently emerged as the leading backbone in the field of computer vision.
no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.
1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris
Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.
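A minimal sketch of what such a per-segment modality selection could look like (the module names, dimensions, and use of a Gumbel-Softmax relaxation for differentiable keep/skip decisions are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    """Toy policy network: per video segment, decide which modalities to process.

    Illustrative sketch only; architecture and the Gumbel-Softmax relaxation
    are assumptions for exposition, not the paper's exact design.
    """
    def __init__(self, feat_dim=256, num_modalities=3):
        super().__init__()
        # one (keep, skip) logit pair per modality
        self.head = nn.Linear(feat_dim, num_modalities * 2)
        self.num_modalities = num_modalities

    def forward(self, segment_feat, tau=1.0):
        logits = self.head(segment_feat).view(-1, self.num_modalities, 2)
        # Gumbel-Softmax yields near-discrete, differentiable keep/skip choices
        decisions = F.gumbel_softmax(logits, tau=tau, hard=True)
        return decisions[..., 0]  # 1 = process this modality, 0 = skip it

policy = ModalityPolicy()
keep = policy(torch.randn(4, 256))  # batch of 4 segments x 3 modalities
print(keep)
```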
no code implementations • CVPR 2021 • Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.
no code implementations • 1 Apr 2021 • Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva
The pixels in an image, and the objects, scenes, and actions that they compose, determine whether an image will be memorable or forgettable.
no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko
Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
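A minimal sketch of the block-swapping idea, assuming a per-block swap probability and a frozen teacher (both illustrative choices rather than the paper's exact recipe):

```python
import random
import torch
import torch.nn as nn

def forward_with_block_swap(student_blocks, teacher_blocks, x, swap_prob=0.3):
    """Forward pass in which each student block is randomly replaced by the
    corresponding teacher block. Gradients still flow through swapped-in
    teacher blocks, but only student parameters are updated.
    Minimal sketch; swap probability and per-block granularity are assumptions."""
    for s_block, t_block in zip(student_blocks, teacher_blocks):
        block = t_block if random.random() < swap_prob else s_block
        x = block(x)
    return x

# Toy stand-ins for low-precision (student) and full-precision (teacher) blocks
student = nn.ModuleList([nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)])
teacher = nn.ModuleList([nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)])
for p in teacher.parameters():
    p.requires_grad_(False)  # keep the teacher fixed

out = forward_with_block_swap(student, teacher, torch.randn(2, 16))
```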
no code implementations • ICLR 2021 • Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.
no code implementations • ICLR 2021 • Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris
Temporal modelling is key to efficient video action recognition.
1 code implementation • CVPR 2021 • Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan
In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets.
1 code implementation • ECCV 2020 • Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva
Based on our findings, we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time.
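As an illustration of what a decay formulation could look like, here is a simple log-linear form (an assumption for exposition, not necessarily the paper's exact equation):

```latex
% Illustrative log-linear decay of memorability with lag (not necessarily the paper's form):
% m(t): probability a video is remembered after a lag of t intervening items
% m_0:  memorability at a reference lag t_0;  alpha < 0 is the decay rate
m(t) = m_0 + \alpha \, \log\!\left(\frac{t}{t_0}\right)
```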
1 code implementation • ECCV 2020 • Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva
This allows our model to perform cognitive tasks such as set abstraction (which general concept is in common among a set of videos?).
1 code implementation • ECCV 2020 • Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris
Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.
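One illustrative way to train such a resolution-selection policy for both accuracy and efficiency is to penalize the expected compute of the chosen resolutions alongside the recognition loss (the exact penalty used in the paper may differ):

```latex
% Illustrative objective: cross-entropy recognition loss plus a compute penalty,
% where lambda trades off accuracy against the expected cost (e.g. FLOPs) of the
% resolutions selected by the policy network.
\mathcal{L} \;=\; \mathcal{L}_{\text{CE}} \;+\; \lambda \, \mathbb{E}\big[\text{FLOPs}(\text{selected resolutions})\big]
```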
2 code implementations • 1 Nov 2019 • Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Videos capture events that typically contain multiple sequential and simultaneous actions, even in the span of only a few seconds.
no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
1 code implementation • ICCV 2019 • Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola
We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence.
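A sketch of how such a GAN-based framework can be posed (the notation is illustrative and the exact objective may differ): a transform T_theta is learned in the generator's latent space so that a differentiable assessor A scores the edited image higher or lower by a target amount alpha:

```latex
% Illustrative objective: learn a latent-space transform T_theta so that the
% assessor A scores the edited image alpha higher than the original.
% G: GAN generator, z: latent code, A: differentiable property assessor (e.g. memorability)
\mathcal{L}(\theta) \;=\; \mathbb{E}_{z,\alpha}\Big[\big(A\big(G(T_\theta(z,\alpha))\big) - \big(A(G(z)) + \alpha\big)\big)^2\Big]
```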
no code implementations • 28 May 2019 • Mathew Monfort, Kandan Ramakrishnan, Barry A McNamara, Alex Lascelles, Dan Gutfreund, Rogerio Feris, Aude Oliva
A number of recent methods to understand neural networks have focused on quantifying the role of individual features.
no code implementations • 14 May 2019 • Radoslaw Martin Cichy, Gemma Roig, Alex Andonian, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Yalda Mohsenzadeh, Kandan Ramakrishnan, Aude Oliva
Recently, researchers of natural intelligence have begun using those AI models to explore how the brain performs such tasks.
1 code implementation • 27 Jul 2018 • Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand
While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or 'icons'.
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
3 code implementations • ECCV 2018 • Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
Ranked #2 on Hand Gesture Recognition on the Jester test set
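One common way to write the pairwise (two-frame) temporal relation described above is the following, with f_i denoting frame-level features and g_theta, h_phi small fusion networks (a sketch of the general form; higher-order relations extend it to more frames):

```latex
% Pairwise temporal relation over a video V with sampled, ordered frame features f_i:
T_2(V) \;=\; h_\phi\Big(\sum_{i<j} g_\theta\big(f_i, f_j\big)\Big)
```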
2 code implementations • 15 Nov 2017 • Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba
In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.
1 code implementation • 26 Sep 2017 • Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva
And second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic, i.e., visual hashtags.
1 code implementation • CVPR 2017 • David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.
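A minimal sketch of the kind of unit-versus-concept scoring this describes, assuming an IoU-style overlap between a thresholded activation map and a concept segmentation mask (thresholding and upsampling details are simplified):

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, threshold):
    """Score how well one convolutional unit matches one visual concept.

    Sketch: binarize the unit's (upsampled) activation map at a fixed
    threshold and compare it with the concept's segmentation mask.
    """
    unit_mask = activation > threshold  # binarized activation map
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: a 7x7 activation map against a 7x7 concept segmentation mask
act = np.random.rand(7, 7)
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
print(unit_concept_iou(act, mask, threshold=0.8))
```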
no code implementations • 16 Feb 2017 • Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, Hanspeter Pfister
In this paper, we present BubbleView, an alternative methodology for eye tracking using discrete mouse clicks to measure which information people consciously choose to examine.
1 code implementation • 6 Oct 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.
1 code implementation • 12 Apr 2016 • Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand
How best to evaluate a saliency model's ability to predict where humans look in images is an open research question.
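As one concrete example of the metrics such an evaluation compares, here is a minimal sketch of Normalized Scanpath Saliency (NSS), a widely used fixation-based saliency metric (edge cases omitted):

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: z-score the saliency map and average its
    values at human fixation locations. Minimal sketch, no edge-case handling."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map.astype(bool)].mean()

sal = np.random.rand(32, 32)          # predicted saliency map
fix = np.zeros((32, 32))              # binary map of human fixations
fix[10, 12] = fix[20, 5] = 1
print(nss(sal, fix))
```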
no code implementations • 12 Jan 2016 • Radoslaw M. Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva
The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans.
32 code implementations • CVPR 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.
Ranked #2 on Weakly-Supervised Object Localization on ILSVRC 2015
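In the class activation mapping formulation built on global average pooling, the localization map for a class c can be written as follows, where f_k are the last convolutional layer's feature maps and w_k^c the classification weights learned on top of the pooled features:

```latex
% Class activation map for class c: weight the final conv feature maps f_k(x, y)
% by the class weights w_k^c learned after global average pooling.
M_c(x, y) \;=\; \sum_k w_k^c \, f_k(x, y)
```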
no code implementations • ICCV 2015 • Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva
Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data.
1 code implementation • 22 Dec 2014 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.
no code implementations • NeurIPS 2014 • Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
no code implementations • NeurIPS 2015 • Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.
no code implementations • NeurIPS 2011 • Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva
Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember.