2 code implementations • 4 Mar 2024 • Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid
In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
no code implementations • 2 Mar 2024 • Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi
SceneCraft first models a scene graph as a blueprint, detailing the spatial relationships among assets in the scene.
no code implementations • 12 Jun 2023 • Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid
Contrastive image-text models such as CLIP form the building blocks of many state-of-the-art systems.
Ranked #3 on Fine-Grained Image Recognition on OVEN
no code implementations • CVPR 2023 • Ahmet Iscen, Alireza Fathi, Cordelia Schmid
Retrieval augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems.
Ranked #1 on Image Classification on WebVision-1000 (using extra training data)
no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu
We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.
1 code implementation • CVPR 2023 • Ziniu Hu, Ahmet Iscen, Chen Sun, ZiRui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi
REVEAL consists of four key components: the memory, the encoder, the retriever and the generator.
Ranked #9 on Visual Question Answering (VQA) on OK-VQA
no code implementations • 10 Oct 2022 • Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
no code implementations • 8 Sep 2022 • Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi
We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation that contains a code for object shape, a code for object appearance, and an estimated camera pose from which the object image is captured.
no code implementations • CVPR 2022 • Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser
Our model builds a panoptic radiance field representation of any scene from just color images.
1 code implementation • 21 Apr 2022 • Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan
It is hard to replicate these approaches in trajectory forecasting due to the lack of adequate trajectory data (e.g., 34K samples in the nuScenes dataset).
no code implementations • 15 Dec 2020 • Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser
We present a method for composing photorealistic scenes from captured images of objects.
no code implementations • 24 Sep 2020 • Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon
A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.
1 code implementation • ECCV 2020 • Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru
Features from multiple per view predictions are finally fused on 3D mesh vertices to predict mesh semantic segmentation labels.
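The fusion step in the snippet above can be illustrated with a minimal sketch. It assumes a hypothetical precomputed vertex-to-pixel projection for each view and simply averages per-view features over the views in which a vertex is visible. This is a generic illustration, not the paper's fusion method.

```python
import numpy as np


def fuse_view_features(view_feats, vertex_pixels, num_vertices):
    """Average per-view features onto mesh vertices (illustrative sketch).

    view_feats: list of (H, W, C) arrays of per-pixel features, one per view.
    vertex_pixels: list of (num_vertices, 2) integer arrays of (row, col)
        pixel coordinates per vertex in each view, with -1 marking vertices
        not visible in that view (hypothetical precomputed projection).
    Returns a (num_vertices, C) array of fused features.
    """
    num_channels = view_feats[0].shape[-1]
    total = np.zeros((num_vertices, num_channels))
    count = np.zeros((num_vertices, 1))
    for feats, pix in zip(view_feats, vertex_pixels):
        visible = pix[:, 0] >= 0
        rows, cols = pix[visible, 0], pix[visible, 1]
        total[visible] += feats[rows, cols]  # gather features at projected pixels
        count[visible] += 1
    # Vertices seen in no view keep zero features instead of dividing by zero.
    return total / np.maximum(count, 1)
```

Per-vertex labels would then follow from `fuse_view_features(...).argmax(axis=1)`. Averaging is the simplest choice; a learned or visibility-weighted fusion would slot into the same loop.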
Ranked #12 on Semantic Segmentation on ScanNet
no code implementations • ECCV 2020 • Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A. Ross, Thomas Funkhouser, Alireza Fathi
We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point-cloud.
1 code implementation • ECCV 2020 • Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon
We present a simple and flexible object detection framework optimized for autonomous driving.
no code implementations • CVPR 2020 • Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi
In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes.
1 code implementation • 30 Mar 2020 • Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner
We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.
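For reference, the NMS baseline that the snippet above compares against can be sketched as standard greedy non-maximum suppression over axis-aligned 3D boxes. The box layout `(xmin, ymin, zmin, xmax, ymax, zmax)` and the default IoU threshold are assumptions for illustration. This is the baseline, not the paper's proposal-grouping method.

```python
import numpy as np


def box_volume(b):
    """Volume of axis-aligned 3D boxes laid out as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return np.prod(b[..., 3:] - b[..., :3], axis=-1)


def iou_3d(box, boxes):
    """IoU between one box of shape (6,) and many boxes of shape (N, 6)."""
    lo = np.maximum(box[:3], boxes[:, :3])
    hi = np.minimum(box[3:], boxes[:, 3:])
    inter = np.prod(np.clip(hi - lo, 0, None), axis=1)
    return inter / (box_volume(box) + box_volume(boxes) - inter)


def nms_3d(boxes, scores, iou_threshold=0.25):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_3d(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep
```

Greedy NMS discards every suppressed proposal outright, which is the behavior that grouping-based alternatives try to improve on.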
Ranked #1 on 3D Semantic Instance Segmentation on ScanNetV2
1 code implementation • 16 Jun 2019 • Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa
We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image.
Ranked #1 on Semantic Segmentation on ScanNetV2 (Pixel Accuracy metric)
1 code implementation • ECCV 2018 • Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy
We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision.
no code implementations • CVPR 2018 • Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo
We propose a method for unsupervised video object segmentation by transferring the knowledge encapsulated in image-based instance embedding networks.
1 code implementation • 18 Jul 2017 • Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings
Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image.
1 code implementation • 30 Mar 2017 • Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy
We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together.
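The two steps in the snippet above (score how likely two pixels belong to the same object, then group similar pixels) can be sketched as follows. The similarity function mapping squared embedding distance to (0, 1], the threshold, and the union-find grouping are all assumptions made for illustration. In particular, connected components is a simplified stand-in for the paper's own grouping procedure.

```python
import numpy as np


def same_object_prob(emb_a, emb_b):
    # Assumed similarity: 1 for identical embeddings, decaying toward 0 with distance.
    d2 = np.sum((emb_a - emb_b) ** 2, axis=-1)
    return 2.0 / (1.0 + np.exp(d2))


def group_pixels(embeddings, threshold=0.5):
    """Group pixels whose pairwise same-object probability exceeds a threshold.

    embeddings: (N, D) per-pixel embeddings, flattened over the image.
    Returns an (N,) array of instance ids numbered from 0.
    """
    n = embeddings.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Dense (N, N) probability matrix; fine for a small illustration.
    probs = same_object_prob(embeddings[:, None, :], embeddings[None, :, :])
    for i in range(n):
        for j in range(i + 1, n):
            if probs[i, j] > threshold:
                parent[find(i)] = find(j)  # merge the two groups
    roots = [find(i) for i in range(n)]
    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return np.array([ids.setdefault(r, len(ids)) for r in roots])
```

For example, `group_pixels(np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]))` places the first two pixels in one instance and the last two in another. The dense pairwise matrix is quadratic in the number of pixels, so a practical system would sample seeds or restrict comparisons.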
14 code implementations • CVPR 2017 • Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy
At the opposite end, where accuracy is critical, we present a detector that achieves state-of-the-art performance on the COCO detection task.
Ranked #220 on Object Detection on COCO test-dev (using extra training data)
no code implementations • 23 Jun 2014 • Serena Yeung, Alireza Fathi, Li Fei-Fei
In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video.
no code implementations • 15 Dec 2013 • Ahmad Mozaffari, Alireza Fathi
In recent decades, the optimization and control of complex systems with multiple, simultaneously conflicting objectives have attracted increasing interest from scientists.
no code implementations • 14 Dec 2013 • Ahmad Mozaffari, Alireza Fathi
The results confirm the acceptable robustness and solution quality of the proposed method on several benchmark optimization problems and support the authors' claim.
no code implementations • CVPR 2013 • Alireza Fathi, James M. Rehg
The key to differentiating these actions is the ability to identify how they change the state of objects and materials in the environment.