Search Results for author: Omid Poursaeed

Found 16 papers, 4 papers with code

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

no code implementations • 11 Apr 2024 • Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).

Descriptive Hallucination +2

Paper
Add Code

Universal Pyramid Adversarial Training for Improved ViT Performance

no code implementations • 26 Dec 2023 • Ping-Yeh Chiang, Yipin Zhou, Omid Poursaeed, Satya Narayan Shukla, Ashish Shah, Tom Goldstein, Ser-Nam Lim

Recently, Pyramid Adversarial training (Herrmann et al., 2022) has been shown to be very effective for improving clean accuracy and distribution-shift robustness of vision transformers.

Paper
Add Code

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

no code implementations • 20 Sep 2023 • Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, SerNam Lim

While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length.

Temporal Action Localization Video Classification +1

Paper
Add Code

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

2 code implementations • 1 Jun 2023 • Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

Ranked #1 on Image Classification on iNaturalist 2019 (using extra training data)

Action Classification Action Recognition In Videos +4

691

Paper
Code

Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning

no code implementations • CVPR 2023 • Jishnu Mukhoti, Tsung-Yu Lin, Omid Poursaeed, Rui Wang, Ashish Shah, Philip H. S. Torr, Ser-Nam Lim

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder.

Ranked #1 on Open Vocabulary Semantic Segmentation on Cityscape-171

Contrastive Learning Image Classification +5

Paper
Add Code

Unifying Tracking and Image-Video Object Detection

no code implementations • 20 Nov 2022 • Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim

We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model.

Multi-Object Tracking Object +2

Paper
Add Code

Robustness and Generalization via Generative Adversarial Training

no code implementations • ICCV 2021 • Omid Poursaeed, Tianxing Jiang, Harry Yang, Serge Belongie, SerNam Lim

Adversarial training with these examples enable the model to withstand a wide range of attacks by observing a variety of input alterations during training.

object-detection Object Detection

Paper
Add Code

Augmentation-Interpolative AutoEncoders for Unsupervised Few-Shot Image Generation

no code implementations • 25 Nov 2020 • Davis Wertheimer, Omid Poursaeed, Bharath Hariharan

We aim to build image generation models that generalize to new domains from few examples.

Image Generation

Paper
Add Code

Self-supervised Learning of Point Clouds via Orientation Estimation

1 code implementation • 1 Aug 2020 • Omid Poursaeed, Tianxing Jiang, Han Qiao, Nayun Xu, Vladimir G. Kim

A point cloud can be rotated in infinitely many ways, which provides a rich label-free source for self-supervision.

Ranked #10 on 3D Point Cloud Linear Classification on ModelNet40

3D Point Cloud Linear Classification Self-Supervised Learning +1

Paper
Code

Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling

no code implementations • ECCV 2020 • Omid Poursaeed, Matthew Fisher, Noam Aigerman, Vladimir G. Kim

We propose a novel neural architecture for representing 3D surfaces, which harnesses two complementary shape representations: (i) an explicit representation via an atlas, i. e., embeddings of 2D domains into 3D; (ii) an implicit-function representation, i. e., a scalar function over the 3D volume, with its levels denoting surfaces.

Surface Reconstruction

Paper
Add Code

Fine-grained Synthesis of Unrestricted Adversarial Examples

no code implementations • 20 Nov 2019 • Omid Poursaeed, Tianxing Jiang, Yordanos Goshu, Harry Yang, Serge Belongie, Ser-Nam Lim

We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation.

Image Generation object-detection +2

Paper
Add Code

Neural Puppet: Generative Layered Cartoon Characters

no code implementations • 4 Oct 2019 • Omid Poursaeed, Vladimir G. Kim, Eli Shechtman, Jun Saito, Serge Belongie

We capture these subtle changes by applying an image translation network to refine the mesh rendering, providing an end-to-end model to generate new animations of a character with high visual quality.

Paper
Add Code

Deep Fundamental Matrix Estimation without Correspondences

no code implementations • 3 Oct 2018 • Omid Poursaeed, Guandao Yang, Aditya Prakash, Qiuren Fang, Hanqing Jiang, Bharath Hariharan, Serge Belongie

Estimating fundamental matrices is a classic problem in computer vision.

Paper
Add Code

Generative Adversarial Perturbations

1 code implementation • CVPR 2018 • Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie

In this paper, we propose novel generative models for creating adversarial examples, slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models.

General Classification Semantic Segmentation

127

Paper
Code

Vision-based Real Estate Price Estimation

no code implementations • 18 Jul 2017 • Omid Poursaeed, Tomas Matera, Serge Belongie

Using deep convolutional neural networks on a large dataset of photos of home interiors and exteriors, we develop a method for estimating the luxury level of real estate photos.

Paper
Add Code

Stacked Generative Adversarial Networks

2 code implementations • CVPR 2017 • Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie

In this paper, we propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network.

Ranked #11 on Conditional Image Generation on CIFAR-10 (Inception score metric)

Conditional Image Generation

244

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.