Search Results for author: Alireza Fathi

Found 27 papers, 11 papers with code

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

2 code implementations • 4 Mar 2024 • Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.

3,057

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

no code implementations • 2 Mar 2024 • Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi

SceneCraft first models a scene graph as a blueprint, detailing the spatial relationships among assets in the scene.

Language Modelling • Large Language Model

Retrieval-Enhanced Contrastive Vision-Text Models

no code implementations • 12 Jun 2023 • Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

Contrastive image-text models such as CLIP form the building blocks of many state-of-the-art systems.

Ranked #3 on Fine-Grained Image Recognition on OVEN

Fine-Grained Image Recognition • Retrieval

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

no code implementations • CVPR 2023 • Ahmet Iscen, Alireza Fathi, Cordelia Schmid

Retrieval-augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems.

Ranked #1 on Image Classification on WebVision-1000 (using extra training data)

Learning with noisy labels • Long-tail Learning

Learning Object-Centric Neural Scattering Functions for Free-Viewpoint Relighting and Scene Composition

no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu

We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.

Inverse Rendering • Object

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

1 code implementation • CVPR 2023 • Ziniu Hu, Ahmet Iscen, Chen Sun, ZiRui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi

REVEAL consists of four key components: the memory, the encoder, the retriever and the generator.

Ranked #9 on Visual Question Answering (VQA) on OK-VQA

Image Captioning • Language Modelling • +4

3,057

A Memory Transformer Network for Incremental Learning

no code implementations • 10 Oct 2022 • Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid

We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.

Class Incremental Learning • Incremental Learning

im2nerf: Image to Neural Radiance Field in the Wild

no code implementations • 8 Sep 2022 • Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi

We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation that contains a code for object shape, a code for object appearance, and an estimated camera pose from which the object image is captured.

Novel View Synthesis • Object

Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

no code implementations • CVPR 2022 • Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

Our model builds a panoptic radiance field representation of any scene from just color images.

3D scene Editing • Depth Estimation • +4

PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map

1 code implementation • 21 Apr 2022 • Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan

It is hard to replicate these approaches in trajectory forecasting due to the lack of adequate trajectory data (e.g., 34K samples in the nuScenes dataset).

Contrastive Learning • Representation Learning • +1

Object-Centric Neural Scene Rendering

no code implementations • 15 Dec 2020 • Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

We present a method for composing photorealistic scenes from captured images of objects.

Object

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

no code implementations • 24 Sep 2020 • Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon

A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.

3D Object Detection • Autonomous Driving • +3

Virtual Multi-view Fusion for 3D Semantic Segmentation

1 code implementation • ECCV 2020 • Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru

Features from multiple per-view predictions are finally fused on 3D mesh vertices to predict mesh semantic segmentation labels.

Ranked #12 on Semantic Segmentation on ScanNet

2D Semantic Segmentation • 3D Semantic Segmentation • +2

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

no code implementations • ECCV 2020 • Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A. Ross, Thomas Funkhouser, Alireza Fathi

We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point cloud.

3D Object Detection • Autonomous Driving • +2

Pillar-based Object Detection for Autonomous Driving

1 code implementation • ECCV 2020 • Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon

We present a simple and flexible object detection framework optimized for autonomous driving.

3D Object Detection • Autonomous Driving • +2

133

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

no code implementations • CVPR 2020 • Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes.

3D Object Detection • Autonomous Driving • +3

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation

1 code implementation • 30 Mar 2020 • Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner

We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.

Ranked #1 on 3D Semantic Instance Segmentation on ScanNetV2

3D Instance Segmentation • 3D Object Detection • +3

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

1 code implementation • 16 Jun 2019 • Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image.

Ranked #1 on Semantic Segmentation on ScanNetV2 (Pixel Accuracy metric)

Semantic Segmentation • Surface Normals Estimation • +1

Tracking Emerges by Colorizing Videos

1 code implementation • ECCV 2018 • Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision.

Ranked #2 on Skeleton Based Action Recognition on JHMDB Pose Tracking

Colorization • Optical Flow Estimation • +2

Instance Embedding Transfer to Unsupervised Video Object Segmentation

no code implementations • CVPR 2018 • Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo

We propose a method for unsupervised video object segmentation by transferring the knowledge encapsulated in image-based instance embedding networks.

Object • Optical Flow Estimation • +4

The Devil is in the Decoder: Classification, Regression and GANs

1 code implementation • 18 Jul 2017 • Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image.

Boundary Detection • Decoder • +5

Semantic Instance Segmentation via Deep Metric Learning

1 code implementation • 30 Mar 2017 • Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy

We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together.

Ranked #3 on Object Proposal Generation on PASCAL VOC 2012, 60 proposals per image

Instance Segmentation • Metric Learning • +3

Speed/accuracy trade-offs for modern convolutional object detectors

14 code implementations • CVPR 2017 • Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

Ranked #220 on Object Detection on COCO test-dev (using extra training data)

Object • object-detection • +1

76,675

VideoSET: Video Summary Evaluation through Text

no code implementations • 23 Jun 2014 • Serena Yeung, Alireza Fathi, Li Fei-Fei

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video.

An introduction to synchronous self-learning Pareto strategy

no code implementations • 15 Dec 2013 • Ahmad Mozaffari, Alireza Fathi

In recent decades, the optimization and control of complex systems that possess several conflicting objectives simultaneously has attracted growing interest from scientists.

Evolutionary Algorithms • Self-Learning

A natural-inspired optimization machine based on the annual migration of salmons in nature

no code implementations • 14 Dec 2013 • Ahmad Mozaffari, Alireza Fathi

The obtained results confirm the acceptable performance of the proposed method in both robustness and quality on different benchmark optimization problems, and also prove the authors' claim.

Fault Detection

Modeling Actions through State Changes

no code implementations • CVPR 2013 • Alireza Fathi, James M. Rehg

The key to differentiating these actions is the ability to identify how they change the state of objects and materials in the environment.

Action Recognition • Temporal Action Localization
