1 code implementation • 27 May 2023 • Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang
The difficulty is progressively increased with each new phase by adding one more concept per caption.
1 code implementation • 20 Jul 2022 • Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari, Chen Wang
Specifically, we outperform the state of the art by 7% on UCF and 4% on HMDB for video retrieval, and by 5% on UCF and 6% on HMDB for video classification.
no code implementations • 16 Dec 2021 • Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph.
no code implementations • 1 Jan 2021 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • CVPR 2021 • Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang
Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.
1 code implementation • NAACL 2021 • Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman
We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.
2 code implementations • ECCV 2020 • Alireza Zareian, Zhecan Wang, Haoxuan You, Shih-Fu Chang
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
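As a hedged illustration of what a multimedia event record might look like in this setting, the sketch below builds a hypothetical event grounded in both a text span and an image region. The event type, role names, and field names are invented for illustration and are not the actual M2E2 schema.

```python
# Hypothetical multimedia event record: one event, grounded both in a text
# trigger and in an image bounding box. All field names and values are
# illustrative, not the paper's schema.
event = {
    "event_type": "Movement.Transport",
    "text_trigger": "evacuated",
    "arguments": [
        # An argument may be grounded in text, in the image, or in both.
        {"role": "Agent", "text_span": "rescue workers",
         "image_bbox": (34, 60, 120, 200)},
        {"role": "Destination", "text_span": "the hospital",
         "image_bbox": None},
    ],
}

# Arguments that are also visible in the image (i.e., have a bounding box):
visual_args = [a["role"] for a in event["arguments"] if a["image_bbox"]]
print(visual_args)  # → ['Agent']
```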
1 code implementation • CVPR 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval.
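To make the representation concrete, here is a minimal toy scene graph, with entities as nodes and predicates as directed (subject, predicate, object) triples. The labels and helper function are hypothetical, chosen only to illustrate the data structure, not drawn from the paper or its datasets.

```python
# Toy scene graph: entities are nodes, predicates are directed triples.
# Labels are made up for illustration.
scene_graph = {
    "entities": ["man", "horse", "hat", "grass"],
    "predicates": [
        ("man", "riding", "horse"),
        ("man", "wearing", "hat"),
        ("horse", "on", "grass"),
    ],
}

def relations_of(graph, entity):
    """Return every (subject, predicate, object) triple mentioning `entity`."""
    return [t for t in graph["predicates"] if entity in (t[0], t[2])]

print(relations_of(scene_graph, "man"))
# → [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]
```

A query like `relations_of(scene_graph, "horse")` is the kind of structured lookup (useful for visual reasoning and image retrieval) that raw pixels do not support.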
1 code implementation • ECCV 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene graphs are powerful representations that parse images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning.
no code implementations • 5 Jan 2020 • Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.
Ranked #1 on Partial Label Learning on MPII Movie Description
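The contrast between the two supervision assumptions can be sketched with toy data structures. Everything below (instance names, labels, field names) is hypothetical, intended only to show where the instance-label links go missing in the group-level setting.

```python
# Instance-level Partial Label Learning (PLL): each instance carries its own
# candidate label set, and its true label lies somewhere inside that set.
pll = {
    "img_1": {"run", "jump"},
    "img_2": {"jump", "sit"},
}

# Group-level PLL (GPLL): a candidate label set annotates a whole *group* of
# instances. Which instance matches which label is unknown (the within-group
# instance-label links are missing), and an instance may additionally be
# linked to the label set of another group (cross-group links).
gpll = [
    {"instances": ["img_1", "img_2"], "labels": {"run", "jump"}},
    {"instances": ["img_3"], "labels": {"sit"}},
]

# In PLL, supervision is indexed per instance; in GPLL, only per group:
print(len(pll["img_1"]), len(gpll[0]["instances"]))  # → 2 2
```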
no code implementations • ICLR 2020 • Jiawei Ma*, Zheng Shou*, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNN), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps.
2 code implementations • 23 May 2019 • Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to jointly capture the self-attention across multiple dimensions, including time, location, and the sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner.
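The per-dimension idea can be sketched as follows. This is a minimal, unparameterized toy (Q = K = V = input, no learned projections, results simply summed across dimensions so the combination is order-independent); it illustrates only the notion of attending along each axis of a (time, location, measurement) tensor, not the paper's actual CDSA architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_along(x, axis):
    """Unparameterized self-attention along one axis of a 3-D tensor.

    Illustrative only: a real model would use learned query/key/value
    projections; here Q = K = V = x to keep the sketch self-contained.
    """
    x = np.moveaxis(x, axis, 0)            # bring target axis to the front
    n = x.shape[0]
    flat = x.reshape(n, -1)                # (axis_len, everything_else)
    scores = softmax(flat @ flat.T / np.sqrt(flat.shape[1]))
    out = (scores @ flat).reshape(x.shape)
    return np.moveaxis(out, 0, axis)       # restore original axis order

def cdsa_sketch(x):
    """Attend along every dimension; summing makes the order irrelevant."""
    return sum(self_attention_along(x, axis) for axis in range(x.ndim))

x = np.random.rand(8, 4, 3)                # (time, location, measurement)
y = cdsa_sketch(x)
assert y.shape == x.shape
```

Because each dimension is attended over separately and the results are summed, permuting the processing order leaves the output unchanged, which is the order-independence the abstract refers to.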
no code implementations • NeurIPS 2018 • Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data.
1 code implementation • CVPR 2017 • Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang
Temporal action localization is an important yet challenging problem.
Ranked #27 on Temporal Action Localization on THUMOS’14 (mAP, IoU@0.6)