1 code implementation • 1 Dec 2023 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
2 code implementations • CVPR 2019 • Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion.
Ranked #4 on Gesture Generation on BEAT
1 code implementation • CVPR 2022 • Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.
Ranked #1 on Few-Shot Object Detection on COCO 2017
1 code implementation • 1 Sep 2022 • Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?
Ranked #5 on Personalized Segmentation on PerSeg
3 code implementations • 5 Jun 2017 • Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, Lior Wolf
Generative Adversarial Networks (GANs) have shown great promise recently in image generation.
1 code implementation • CVPR 2022 • Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.
Ranked #6 on Action Recognition on Diving-48
2 code implementations • ECCV 2020 • Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson
Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images.
Ranked #3 on Layout-to-Image Generation on Visual Genome 256x256
1 code implementation • 27 Jun 2020 • Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson
Our generative model for this task (AG2Vid) disentangles motion and appearance features and, by incorporating a scheduling mechanism for actions, facilitates timely and coordinated video generation.
1 code implementation • 8 Apr 2024 • Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
In this work, we analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information.
no code implementations • 6 Jun 2017 • Amir Bar, Lior Wolf, Orna Bregman Amitai, Eyal Toledano, Eldad Elnekave
Finally, a Recurrent Neural Network (RNN) is utilized to predict whether a vertebral fracture is present in the series of patches.
no code implementations • 28 May 2019 • David Chettrit, Orna Bregman Amitai, Itamar Tamir, Amir Bar, Eldad Elnekave
Secondary pulmonary hypertension is a manifestation of advanced COPD, which can be reliably diagnosed by the main Pulmonary Artery (PA) to Ascending Aorta (Ao) ratio.
no code implementations • 29 Jun 2019 • Amir Bar, Michal Mauda, Yoni Turner, Michal Safadi, Eldad Elnekave
Head CT is one of the most commonly performed imaging studies in the Emergency Department setting, and intracranial hemorrhage (ICH) is among the most critical and time-sensitive findings to be detected on Head CT. We present BloodNet, a deep learning architecture designed for optimal triaging of Head CTs, with the goal of decreasing the time from CT acquisition to accurate ICH detection.
no code implementations • 8 Oct 2020 • David Chettrit, Tomer Meir, Hila Lebel, Mila Orlovsky, Ronen Gordon, Ayelet Akselrod-Ballin, Amir Bar
An osteoporosis-related fracture occurs every three seconds worldwide, affecting one in three women and one in five men aged over 50.
no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.
no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos.
Point-of-no-return (PNR) temporal localization • Temporal Localization
no code implementations • 24 Apr 2023 • Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning.
no code implementations • 31 Jul 2023 • Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun
Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images.
no code implementations • 29 Nov 2023 • Dantong Niu, Amir Bar, Roei Herzig, Trevor Darrell, Anna Rohrbach
Existing video-based action recognition systems typically require dense annotation and struggle in environments where there is a significant distribution shift relative to the training data.
no code implementations • 4 Dec 2023 • Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang
Given a textual description of a visual task (e.g., "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input.
no code implementations • 15 Apr 2024 • Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell
Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems.