1 code implementation • NAACL 2022 • Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach
We test the robustness of recent methods on the proposed automatic contrast sets, and compare them to additionally collected human-generated counterparts, to assess their effectiveness.
1 code implementation • 1 Oct 2024 • Aaron Walsman, Muru Zhang, Adam Fishman, Ali Farhadi, Dieter Fox
By disassembling an unseen assembly and periodically saving images of it, the agent is able to create a set of instructions so that it has the information necessary to rebuild it.
no code implementations • 25 Sep 2024 • Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Jen Dumas, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
no code implementations • 25 Sep 2024 • Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani
How can we break through the performance plateau of these models and elevate their capabilities to new heights?
1 code implementation • 3 Sep 2024 • Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE).
1 code implementation • 17 Jun 2024 • Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case.
1 code implementation • 28 May 2024 • Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model.
2 code implementations • NeurIPS 2023 • Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi
Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.
1 code implementation • 9 Nov 2023 • Ethan Shen, Ali Farhadi, Aditya Kusupati
In this work, we set out to investigate if hierarchical visual representations truly capture the human perceived hierarchy better than standard learned representations.
no code implementations • 7 Nov 2023 • Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.
no code implementations • 21 Oct 2023 • Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta
While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities.
no code implementations • 18 Oct 2023 • Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi
We introduce SHARCS for adaptive inference that takes into account the hardness of input samples.
2 code implementations • 11 Oct 2023 • Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, KaiFeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
1 code implementation • NeurIPS 2023 • Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi
Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks.
2 code implementations • 31 May 2023 • Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari
Compared to Perceiver IO, our model requires absolutely no modality-specific processing at inference time, and uses an order of magnitude fewer parameters at equivalent accuracy on ImageNet.
1 code implementation • NeurIPS 2023 • Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations.
3 code implementations • NeurIPS 2023 • Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms.
1 code implementation • NeurIPS 2023 • Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.
no code implementations • 24 Apr 2023 • Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi
A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.
1 code implementation • ICCV 2023 • Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel
Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3. 4% improved accuracy.
1 code implementation • 8 Mar 2023 • Florian Jaeckle, Fartash Faghri, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
The task of retrieving the most similar data from a gallery set to a given query data is performed through a similarity comparison on features.
1 code implementation • 10 Jan 2023 • Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi
Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks.
1 code implementation • 20 Dec 2022 • Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari
To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations.
no code implementations • CVPR 2023 • Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI.
no code implementations • 9 Dec 2022 • So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang
In contrast, our LocCon shows the most robust transfer in the real world among the set of models we compare to, and that the real-world performance of all models can be further improved with self-supervised LocCon in-situ training.
5 code implementations • 8 Dec 2022 • Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
Changing how pre-trained models behave -- e. g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems.
no code implementations • CVPR 2023 • Matt Deitke, Rose Hendrix, Luca Weihs, Ali Farhadi, Kiana Ehsani, Aniruddha Kembhavi
Training embodied agents in simulation has become mainstream for the embodied AI community.
no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
no code implementations • 27 Sep 2022 • Yao-Hung Hubert Tsai, Hanlin Goh, Ali Farhadi, Jian Zhang
The perception system in personalized mobile agents requires developing indoor scene understanding models, which can understand 3D geometries, capture objectiveness, analyze human behaviors, etc.
no code implementations • 23 Sep 2022 • Mario Srouji, Hugues Thomas, Hubert Tsai, Ali Farhadi, Jian Zhang
Collision avoidance is key for mobile robots and agents to operate safely in the real world.
3 code implementations • ICCV 2023 • Sarah Pratt, Ian Covert, Rosanne Liu, Ali Farhadi
Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference.
1 code implementation • 10 Aug 2022 • Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt
We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate.
2 code implementations • 27 Jul 2022 • Aaron Walsman, Muru Zhang, Klemen Kotar, Karthik Desingh, Ali Farhadi, Dieter Fox
We pair this simulator with a new dataset of fan-made LEGO creations that have been uploaded to the internet in order to provide complex scenes containing over a thousand unique brick shapes.
no code implementations • 14 Jun 2022 • Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding.
3 code implementations • 26 May 2022 • Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, KaiFeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
The flexibility within the learned Matryoshka Representations offer: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations.
Ranked #25 on Image Classification on ObjectNet (using extra training data)
no code implementations • 15 Mar 2022 • Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them.
6 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.
Ranked #1 on Image Classification on ImageNet V2 (using extra training data)
no code implementations • CVPR 2022 • Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi
Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet.
Ranked #6 on Action Classification on Kinetics-600 (using extra training data)
1 code implementation • 2 Jan 2022 • Sarah Pratt, Luca Weihs, Ali Farhadi
While traditional embodied agents manipulate an environment to best achieve a goal, we argue for an introspective agent, which considers its own abilities in the context of its environment.
1 code implementation • CVPR 2022 • Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.
1 code implementation • EMNLP 2021 • Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi
We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community.
1 code implementation • 8 Oct 2021 • Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari
Our models require no retraining, thus our subspace of models can be deployed entirely on-device to allow adaptive network compression at inference time.
3 code implementations • CVPR 2022 • Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.
Ranked #12 on Image Classification on ObjectNet (using extra training data)
no code implementations • 7 Jul 2021 • Junha Roh, Karthik Desingh, Ali Farhadi, Dieter Fox
Specifically, given a reconstructed 3D scene in the form of point clouds with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates.
1 code implementation • NeurIPS 2021 • Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.
1 code implementation • NeurIPS 2021 • Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
We further quantitatively measure the quality of our codes by applying it to the efficient image retrieval as well as out-of-distribution (OOD) detection problems.
no code implementations • ACL 2021 • Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi
We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language.
1 code implementation • CVPR 2021 • Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
1 code implementation • 20 Feb 2021 • Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari
Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance.
no code implementations • ICLR 2021 • Kiana Ehsani, Daniel Gordon, Thomas Hai Dang Nguyen, Roozbeh Mottaghi, Ali Farhadi
Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.
no code implementations • ICLR 2021 • Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi
A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making and socialization.
no code implementations • 18 Nov 2020 • Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari
We also show how to precondition the network to improve the accuracy of our layer-wise compression method.
1 code implementation • 16 Oct 2020 • Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi
Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.
no code implementations • ECCV 2020 • Unnat Jain, Luca Weihs, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing
Autonomous agents must learn to collaborate.
2 code implementations • 6 Jul 2020 • Matthew Wallingford, Aditya Kusupati, Keivan Alizadeh-Vahid, Aaron Walsman, Aniruddha Kembhavi, Ali Farhadi
To foster research towards the goal of general ML methods, we introduce a new unified evaluation framework - FLUID (Flexible Sequential Data).
2 code implementations • NeurIPS 2020 • Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting.
no code implementations • NAACL 2021 • Gabriel Ilharco, Rowan Zellers, Ali Farhadi, Hannaneh Hajishirzi
The success of large-scale contextual language models has attracted great interest in probing what is encoded in their representations.
no code implementations • ECCV 2020 • Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi
In addition, we provide person-grounding (i. e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.
1 code implementation • CVPR 2020 • Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi
We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems.
1 code implementation • NAACL 2021 • Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
3 code implementations • CVPR 2020 • Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta
Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few shot examples.
1 code implementation • ECCV 2020 • Sarah Pratt, Mark Yatskar, Luca Weihs, Ali Farhadi, Aniruddha Kembhavi
We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e. g. agent, tool), and bounding-box groundings of entities.
Ranked #6 on Situation Recognition on imSitu
1 code implementation • 18 Mar 2020 • Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi
Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.
4 code implementations • 15 Feb 2020 • Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith
We publicly release all of our experimental data, including training and validation scores for 2, 100 trials, to encourage further analysis of training dynamics during fine-tuning.
1 code implementation • ICML 2020 • Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget.
no code implementations • 17 Dec 2019 • Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi
A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization.
1 code implementation • CVPR 2020 • Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi
In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.
3 code implementations • CVPR 2020 • Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
Training a neural network is synonymous with learning the values of the weights.
Ranked #934 on Image Classification on ImageNet
no code implementations • 16 Oct 2019 • Junha Roh, Chris Paxton, Andrzej Pronobis, Ali Farhadi, Dieter Fox
Widespread adoption of self-driving cars will depend not only on their safety but largely on their ability to interact with human users.
1 code implementation • ACL 2019 • Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi
Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query.
1 code implementation • CVPR 2020 • Keivan Alizadeh Vahid, Anish Prabhu, Ali Farhadi, Mohammad Rastegari
By replacing pointwise convolutions with BFT, we reduce the computational complexity of these layers from O(n^2) to O(n\log n) with respect to the number of channels.
4 code implementations • NeurIPS 2019 • Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari
In this work we propose a method for discovering neural wirings.
1 code implementation • CVPR 2019 • Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.
4 code implementations • NeurIPS 2019 • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi
We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data.
Ranked #2 on Fake News Detection on Grover-Mega
2 code implementations • ACL 2019 • Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset.
Ranked #71 on Sentence Completion on HellaSwag
no code implementations • CVPR 2019 • Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi
Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities.
1 code implementation • CVPR 2019 • Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi
Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts.
no code implementations • 6 Jan 2019 • Daniel Gordon, Dieter Fox, Ali Farhadi
In this work we propose Hierarchical Planning and Reinforcement Learning (HIP-RL), a method for merging the benefits and capabilities of Symbolic Planning with the learning abilities of Deep Reinforcement Learning.
1 code implementation • CVPR 2019 • Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari
We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture.
2 code implementations • CVPR 2019 • Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation.
Ranked #2 on Visual Navigation on AI2-THOR
4 code implementations • CVPR 2019 • Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi
While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.
Multiple-choice Multiple Choice Question Answering (MCQA) +1
1 code implementation • ICLR 2019 • Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi
Do we use the semantic/functional priors we have built over years to efficiently search and navigate?
1 code implementation • 26 Sep 2018 • Keunhong Park, Konstantinos Rematas, Ali Farhadi, Steven M. Seitz
Existing online 3D shape repositories contain thousands of 3D models but lack photorealistic appearance.
4 code implementations • 7 May 2018 • Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, Ali Farhadi
Among the three main components (data, labels, and models) of any supervised learning system, data and models have been the main subjects of active research.
no code implementations • 25 Apr 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68, 536 activity instances in 68. 8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.
1 code implementation • CVPR 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).
1 code implementation • EMNLP 2018 • Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi
We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder.
5 code implementations • ECCV 2018 • Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.
314 code implementations • 8 Apr 2018 • Joseph Redmon, Ali Farhadi
At 320x320 YOLOv3 runs in 22 ms at 28. 2 mAP, as accurate as SSD but three times faster.
Ranked #2 on Pedestrian Detection on DVTOD
Classification One-stage Anchor-free Oriented Object Detection +3
no code implementations • ECCV 2018 • Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee
We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.
1 code implementation • CVPR 2018 • Kiana Ehsani, Hessam Bagherinezhad, Joseph Redmon, Roozbeh Mottaghi, Ali Farhadi
We introduce the task of directly modeling a visually intelligent agent.
2 code implementations • 14 Dec 2017 • Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor. allenai. org.
1 code implementation • CVPR 2018 • Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi
Our experiments show that our proposed model outperforms popular single controller based methods on IQUAD V1.
no code implementations • CVPR 2018 • Jonghyun Choi, Jayant Krishnamurthy, Aniruddha Kembhavi, Ali Farhadi
Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning.
1 code implementation • ICLR 2018 • Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi
Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens.
no code implementations • 13 Sep 2017 • Nancy Xin Ru Wang, Ali Farhadi, Rajesh Rao, Bingni Brunton
This paper describes our approach to detect and to predict natural human arm movements in the future, a key challenge in brain computer interfacing that has never before been attempted.
no code implementations • CVPR 2017 • Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, Hannaneh Hajishirzi
Our analysis shows that a significant portion of questions require complex parsing of the text and the diagrams and reasoning, indicating that our dataset is more complex compared to previous machine comprehension and visual question answering datasets.
no code implementations • ICCV 2017 • Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.
11 code implementations • 17 May 2017 • Daniel Gordon, Ali Farhadi, Dieter Fox
Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time.
1 code implementation • CVPR 2018 • Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi
Objects often occlude each other in scenes; Inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation.
no code implementations • ICCV 2017 • Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi
Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.
231 code implementations • CVPR 2017 • Joseph Redmon, Ali Farhadi
On the 156 classes not in COCO, YOLO9000 gets 16. 0 mAP.
Ranked #8 on Object Detection on UA-DETRAC
2 code implementations • CVPR 2017 • Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it.
Ranked #16 on Action Detection on Charades
2 code implementations • CVPR 2017 • Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi
Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set.
Ranked #11 on Situation Recognition on imSitu
no code implementations • CVPR 2017 • Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi
We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by few lookups to a dictionary that is trained to cover the space of weights in CNNs.
26 code implementations • 5 Nov 2016 • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query.
Ranked #1 on Navigate on !(()&&!|*|*| (using extra training data)
2 code implementations • 16 Sep 2016 • Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
To address the second issue, we propose AI2-THOR framework, which provides an environment with high-quality 3D scenes and physics engine.
no code implementations • 25 Jul 2016 • Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta
We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).
2 code implementations • 14 Jun 2016 • Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi
In this paper, we study the problem of question answering when reasoning over multiple facts is required.
Ranked #2 on Question Answering on bAbi
no code implementations • CVPR 2016 • Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi
Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.
1 code implementation • CVPR 2016 • Mark Yatskar, Luke Zettlemoyer, Ali Farhadi
This paper introduces situation recognition, the problem of producing a concise summary of the situation an image depicts including: (1) the main activity (e. g., clipping), (2) the participating actors, objects, substances, and locations (e. g., man, shears, sheep, wool, and field) and most importantly (3) the roles these participants play in the activity (e. g., the man is clipping, the shears are his tool, the wool is being clipped from the sheep, and the clipping is in a field).
Ranked #12 on Situation Recognition on imSitu
no code implementations • CVPR 2016 • Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi
With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications.
4 code implementations • 13 Apr 2016 • Junyuan Xie, Ross Girshick, Ali Farhadi
As 3D movie viewing becomes mainstream and Virtual Reality (VR) market emerges, the demand for 3D contents is growing rapidly.
no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.
1 code implementation • 24 Mar 2016 • Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi
We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering.
no code implementations • 17 Mar 2016 • Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi
To build a dataset of forces in scenes, we reconstructed all images in SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.
19 code implementations • 16 Mar 2016 • Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks.
Ranked #10 on Classification with Binary Neural Network on ImageNet
no code implementations • 2 Feb 2016 • Hessam Bagherinezhad, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi
In this paper, we introduce a method to automatically infer object sizes, leveraging visual and textual information from web.
no code implementations • 4 Dec 2015 • Babak Saleh, Ahmed Elgammal, Jacob Feldman, Ali Farhadi
In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before.
1 code implementation • CVPR 2016 • Xiaolong Wang, Ali Farhadi, Abhinav Gupta
In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).
no code implementations • ICCV 2015 • Bilge Soran, Ali Farhadi, Linda Shapiro
The overall prediction accuracy is 46. 2% when only 10 frames of an action are seen (2/3 of a sec).
22 code implementations • 19 Nov 2015 • Junyuan Xie, Ross Girshick, Ali Farhadi
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.
Ranked #4 on Unsupervised Image Classification on SVHN (using extra training data)
no code implementations • 12 Nov 2015 • Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi
Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.
no code implementations • NeurIPS 2015 • Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi
In this paper, we study the problem of answering visual analogy questions.
no code implementations • ICCV 2015 • Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi
Next, we show that the association of high-quality segmentations to textual phrases aids in richer semantic understanding and reasoning of these textual phrases.
144 code implementations • CVPR 2016 • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
Ranked #1 on Real-Time Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2015 • Mohammad Rastegari, Hannaneh Hajishirzi, Ali Farhadi
In this paper we present a bottom-up method to instance-level Multiple Instance Learning (MIL) that learns to discover positive instances with globally constrained reasoning about local pairwise similarities.
no code implementations • CVPR 2015 • Fereshteh Sadeghi, Santosh K. Kumar Divvala, Ali Farhadi
How can we know whether a statement about our world is valid.
no code implementations • 25 Nov 2014 • Hamid Izadinia, Ali Farhadi, Aaron Hertzmann, Matthew D. Hoffman
This paper proposes direct learning of image classification from user-supplied tags, without filtering.
no code implementations • 9 Nov 2014 • Babak Saleh, Ali Farhadi, Ahmed Elgammal
When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting.
no code implementations • CVPR 2014 • Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, Devi Parikh
We show that a surprisingly straightforward and general approach, that we call ALERT, can predict the likely accuracy (or failure) of a variety of computer vision systems semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction on individual input images.
no code implementations • CVPR 2014 • Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi
In this paper, we propose a method to learn scene structures that can encode three main interlacing components of a scene: the scene category, the context-specific appearance of objects, and their layout.
no code implementations • CVPR 2014 • Santosh K. Divvala, Ali Farhadi, Carlos Guestrin
How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models?
no code implementations • CVPR 2013 • Babak Saleh, Ali Farhadi, Ahmed Elgammal
When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting.
no code implementations • CVPR 2013 • Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
We exploit a discriminative binary space to compute these geometric quantities efficiently.
no code implementations • CVPR 2013 • Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry S. Davis
We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes.