1 code implementation • 27 May 2023 • Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang
The difficulty is progressively increased with each new phase by adding one more concept per caption.
1 code implementation • 20 Jul 2022 • Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari, Chen Wang
Specifically, we outperform the state of the art by 7% on UCF and 4% on HMDB for video retrieval, and by 5% on UCF and 6% on HMDB for video classification.
no code implementations • 16 Dec 2021 • Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph.
no code implementations • 1 Jan 2021 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • CVPR 2021 • Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang
Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.
1 code implementation • NAACL 2021 • Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman
We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.
2 code implementations • ECCV 2020 • Alireza Zareian, Zhecan Wang, Haoxuan You, Shih-Fu Chang
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
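As a hedged illustration of what a multimedia event record might look like in this setting, the sketch below builds a hypothetical event grounded in both a text span and an image region. The event type, role names, and field names are invented for illustration and are not the actual M2E2 schema.

```python
# Hypothetical multimedia event record: one event, grounded both in a text
# trigger and in an image bounding box. All field names and values are
# illustrative, not the paper's schema.
event = {
    "event_type": "Movement.Transport",
    "text_trigger": "evacuated",
    "arguments": [
        # An argument may be grounded in text, in the image, or in both.
        {"role": "Agent", "text_span": "rescue workers",
         "image_bbox": (34, 60, 120, 200)},
        {"role": "Destination", "text_span": "the hospital",
         "image_bbox": None},
    ],
}

# Arguments that are also visible in the image (i.e., have a bounding box):
visual_args = [a["role"] for a in event["arguments"] if a["image_bbox"]]
print(visual_args)  # → ['Agent']
```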
1 code implementation • CVPR 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval.
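To make the representation concrete, here is a minimal toy scene graph, with entities as nodes and predicates as directed (subject, predicate, object) triples. The labels and helper function are hypothetical, chosen only to illustrate the data structure, not drawn from the paper or its datasets.

```python
# Toy scene graph: entities are nodes, predicates are directed triples.
# Labels are made up for illustration.
scene_graph = {
    "entities": ["man", "horse", "hat", "grass"],
    "predicates": [
        ("man", "riding", "horse"),
        ("man", "wearing", "hat"),
        ("horse", "on", "grass"),
    ],
}

def relations_of(graph, entity):
    """Return every (subject, predicate, object) triple mentioning `entity`."""
    return [t for t in graph["predicates"] if entity in (t[0], t[2])]

print(relations_of(scene_graph, "man"))
# → [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]
```

A query like `relations_of(scene_graph, "horse")` is the kind of structured lookup (useful for visual reasoning and image retrieval) that raw pixels do not support.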
1 code implementation • ECCV 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene graphs are powerful representations that parse images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning.
no code implementations • 5 Jan 2020 • Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.
Ranked #1 on Partial Label Learning on MPII Movie Description
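The contrast between the two supervision assumptions can be sketched with toy data structures. Everything below (instance names, labels, field names) is hypothetical, intended only to show where the instance-label links go missing in the group-level setting.

```python
# Instance-level Partial Label Learning (PLL): each instance carries its own
# candidate label set, and its true label lies somewhere inside that set.
pll = {
    "img_1": {"run", "jump"},
    "img_2": {"jump", "sit"},
}

# Group-level PLL (GPLL): a candidate label set annotates a whole *group* of
# instances. Which instance matches which label is unknown (the within-group
# instance-label links are missing), and an instance may additionally be
# linked to the label set of another group (cross-group links).
gpll = [
    {"instances": ["img_1", "img_2"], "labels": {"run", "jump"}},
    {"instances": ["img_3"], "labels": {"sit"}},
]

# In PLL, supervision is indexed per instance; in GPLL, only per group:
print(len(pll["img_1"]), len(gpll[0]["instances"]))  # → 2 2
```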
no code implementations • ICLR 2020 • Jiawei Ma*, Zheng Shou*, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNN), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps.
2 code implementations • 23 May 2019 • Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to jointly capture the self-attention across multiple dimensions, including time, location, and the sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner.
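The per-dimension idea can be sketched as follows. This is a minimal, unparameterized toy (Q = K = V = input, no learned projections, results simply summed across dimensions so the combination is order-independent); it illustrates only the notion of attending along each axis of a (time, location, measurement) tensor, not the paper's actual CDSA architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_along(x, axis):
    """Unparameterized self-attention along one axis of a 3-D tensor.

    Illustrative only: a real model would use learned query/key/value
    projections; here Q = K = V = x to keep the sketch self-contained.
    """
    x = np.moveaxis(x, axis, 0)            # bring target axis to the front
    n = x.shape[0]
    flat = x.reshape(n, -1)                # (axis_len, everything_else)
    scores = softmax(flat @ flat.T / np.sqrt(flat.shape[1]))
    out = (scores @ flat).reshape(x.shape)
    return np.moveaxis(out, 0, axis)       # restore original axis order

def cdsa_sketch(x):
    """Attend along every dimension; summing makes the order irrelevant."""
    return sum(self_attention_along(x, axis) for axis in range(x.ndim))

x = np.random.rand(8, 4, 3)                # (time, location, measurement)
y = cdsa_sketch(x)
assert y.shape == x.shape
```

Because each dimension is attended over separately and the results are summed, permuting the processing order leaves the output unchanged, which is the order-independence the abstract refers to.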
no code implementations • NeurIPS 2018 • Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data.
1 code implementation • CVPR 2017 • Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang
Temporal action localization is an important yet challenging problem.
Ranked #27 on Temporal Action Localization on THUMOS’14 (mAP, IoU@0.6)