1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction&prediction framework that can be applied to a large variety of newsworthy scenarios.
no code implementations • CRAC (ACL) 2021 • Liming Wang, Shengyu Feng, Xudong Lin, Manling Li, Heng Ji, Shih-Fu Chang
Event coreference resolution is critical to understand events in the growing number of online news with multiple modalities including text, video, speech, etc.
no code implementations • CSRR (ACL) 2022 • Yue Wan, Yueen Ma, Haoxuan You, Zhecan Wang, Shih-Fu Chang
Large-scale visual-linguistic pre-training aims to capture the generic representations from multimodal features, which are essential for downstream vision-language tasks.
1 code implementation • 25 Mar 2023 • Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang
Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features.
1 code implementation • 16 Mar 2023 • Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang
Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data.
no code implementations • 6 Jan 2023 • Andrew Lu, Xudong Lin, Yulei Niu, Shih-Fu Chang
Understanding event relationships in videos requires a model to understand the underlying structures of events, i. e., the event type, the associated argument roles, and corresponding entities) along with factual knowledge needed for reasoning.
no code implementations • 28 Dec 2022 • Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang
For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly.
no code implementations • 14 Dec 2022 • Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang
We present a new commonsense task, Human-centric Commonsense Grounding, that tests the models' ability to ground individuals given the context descriptions about what happened before, and their mental/physical states or intentions.
no code implementations • 10 Nov 2022 • Zhecan Wang, Haoxuan You, Yicheng He, Wenhao Li, Kai-Wei Chang, Shih-Fu Chang
Visual commonsense understanding requires Vision Language (VL) models to not only understand image and text but also cross-reference in-between to fully integrate and achieve comprehension of the visual scene described.
no code implementations • 3 Nov 2022 • Guang Yang, Manling Li, Jiajie Zhang, Xudong Lin, Shih-Fu Chang, Heng Ji
Video event extraction aims to detect salient events from a video and identify the arguments for each event as well as their semantic roles.
1 code implementation • 22 Oct 2022 • Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang
Specifically, given an article and a relevant video, WSAG aims to localize all ``groundable'' sentences to the video, and these sentences are possibly at different semantic scales.
no code implementations • 15 Oct 2022 • Shiyuan Huang, Robinson Piramuthu, Shih-Fu Chang, Gunnar A. Sigurdsson
Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model which learns to extract task-specific tiny features as little as 10 bits, which are optimal for answering certain types of questions.
1 code implementation • 26 Jul 2022 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.
no code implementations • 14 Jun 2022 • Hammad A. Ayyubi, Christopher Thomas, Lovish Chum, Rahul Lokesh, Yulei Niu, Xudong Lin, Long Chen, Jaywon Koo, Sounak Ray, Shih-Fu Chang
Recognizing that the visual `arrest' event is a subevent of the broader `protest' event is a challenging, yet important problem that prior work has not explored.
no code implementations • 5 Jun 2022 • Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang
We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yields the best performance, which can even outperform state-of-the-art on the iVQA and How2QA datasets without the additional training on millions of video-language data.
Ranked #1 on
Video Question Answering
on How2QA
1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+5
no code implementations • 22 Apr 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan
Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.
Ranked #3 on
Visual Question Answering (VQA)
on VCR (Q-A) test
no code implementations • 16 Apr 2022 • Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning to learn generalizable few-shot and zero-shot object detection models respectively without fine-tuning.
1 code implementation • 29 Mar 2022 • Christopher Thomas, YiPeng Zhang, Shih-Fu Chang
In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
1 code implementation • CVPR 2022 • Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang
Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head.
1 code implementation • CVPR 2022 • Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani
In this paper we consider the problem of classifying fine-grained, multi-step activities (e. g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to several minutes.
no code implementations • 15 Jan 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan
Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51. 9%) and domain-shifted (up to 71. 3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-art performance on VCR compared to other single models that are pretrained with image-text data only.
1 code implementation • CVPR 2022 • Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang
Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text.
2 code implementations • 20 Dec 2021 • Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji
Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.
1 code implementation • ICCV 2021 • Guangxing Han, Yicheng He, Shiyuan Huang, Jiawei Ma, Shih-Fu Chang
Few-shot object detection (FSOD) aims to detect never-seen objects using few examples.
no code implementations • 16 Dec 2021 • Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph.
no code implementations • 1 Dec 2021 • Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik
In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips containing the same object, which helps better utilize temporal transformations of objects.
no code implementations • 29 Sep 2021 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.
no code implementations • Findings (EMNLP) 2021 • Brian Chen, Xudong Lin, Christopher Thomas, Manling Li, Shoya Yoshida, Lovish Chum, Heng Ji, Shih-Fu Chang
We introduce the new task of Video MultiMedia Event Extraction (Video M2E2) and propose two novel components to build the first system towards this task.
no code implementations • ICCV 2021 • Jiawei Ma, Hanchen Xie, Guangxing Han, Shih-Fu Chang, Aram Galstyan, Wael Abd-Almageed
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
no code implementations • ACL 2021 • Yi Fung, Christopher Thomas, Revanth Gangi Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, Avi Sil
To defend against machine-generated fake news, an effective mechanism is urgently needed.
1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, D{\'\i}dac Sur{\'\i}s, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji
We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).
1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.
2 code implementations • NeurIPS 2021 • Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong
We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.
Ranked #7 on
Action Classification
on Moments in Time
(using extra training data)
2 code implementations • 15 Apr 2021 • Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang
To improve the fine-grained few-shot proposal classification, we propose a novel attentive feature alignment method to address the spatial misalignment between the noisy proposals and few-shot classes, thus improving the performance of few-shot object detection.
no code implementations • CVPR 2021 • Sijie Song, Xudong Lin, Jiaying Liu, Zongming Guo, Shih-Fu Chang
In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics.
no code implementations • CVPR 2021 • Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani
We present \textsc{Vx2Text}, a framework for text generation from multimodal inputs consisting of video plus text, speech, or audio.
no code implementations • 1 Jan 2021 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • CVPR 2022 • Shiyuan Huang, Jiawei Ma, Guangxing Han, Shih-Fu Chang
In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process.
1 code implementation • CVPR 2021 • Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang
Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.
1 code implementation • 18 Nov 2020 • Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang
In this paper, we propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
1 code implementation • NAACL 2021 • Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks.
no code implementations • 9 Oct 2020 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Shih-Fu Chang
In this work, we propose Uncertainty-Aware Few-Shot framework for image classification by modeling uncertainty of the similarities of query-support pairs and performing uncertainty-aware optimization.
1 code implementation • 3 Sep 2020 • Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang
The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.
no code implementations • 22 Jul 2020 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman
We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.
no code implementations • NAACL 2021 • Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions.
2 code implementations • ECCV 2020 • Alireza Zareian, Zhecan Wang, Haoxuan You, Shih-Fu Chang
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild.
no code implementations • 8 Jun 2020 • Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Zhibo Chen, Shih-Fu Chang
There is a lack of loss design which enables the joint optimization of multiple instances (of multiple classes) within per-query optimization for person ReID.
no code implementations • 19 May 2020 • Bo Xu, Xu Zhang, Zhixin Li, Matt Leotta, Shih-Fu Chang, Jie Shan
For points that belong to the same roof shape, a multi-cue, hierarchical RANSAC approach is proposed for efficient and reliable segmenting and reconstructing the building point cloud.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
no code implementations • LREC 2020 • Di Lu, Ananya Subburathinam, Heng Ji, Jonathan May, Shih-Fu Chang, Avi Sil, Clare Voss
Most of the current cross-lingual transfer learning methods for Information Extraction (IE) have been only applied to name tagging.
no code implementations • 8 Mar 2020 • Yang Feng, Futang Peng, Xu Zhang, Wei Zhu, Shanfeng Zhang, Howard Zhou, Zhen Li, Tom Duerig, Shih-Fu Chang, Jiebo Luo
Therefore, we propose to distill the knowledge in multiple specialists into a universal embedding to solve this problem.
no code implementations • 11 Feb 2020 • Tongtao Zhang, Heng Ji, Shih-Fu Chang, Marjorie Freedman
In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches and annotation in earlier phases has lower quality than the later counterparts.
1 code implementation • CVPR 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval.
1 code implementation • ECCV 2020 • Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Scene graphs are powerful representations that parse images into their abstract semantic elements, i. e., objects and their interactions, which facilitates visual comprehension and explainable reasoning.
no code implementations • 5 Jan 2020 • Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.
Ranked #1 on
Partial Label Learning
on MPII Movie Description
no code implementations • ICLR 2020 • Jiawei Ma*, Zheng Shou*, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNN), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps.
no code implementations • 10 Dec 2019 • Shiyuan Huang, Xudong Lin, Svebor Karaman, Shih-Fu Chang
Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve the inference speed by orders of magnitudes.
1 code implementation • ECCV 2020 • Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
Language acquisition is the process of learning words from the surrounding scene.
no code implementations • IJCNLP 2019 • Ananya Subburathinam, Di Lu, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare Voss
The identification of complex semantic structures such as events and entity relations, already a challenging Information Extraction task, is doubly difficult from sources written in under-resourced and under-annotated languages.
no code implementations • 24 Oct 2019 • Xudong Lin, Zheng Shou, Shih-Fu Chang
The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time.
1 code implementation • ECCV 2020 • Xudong Lin, Lin Ma, Wei Liu, Shih-Fu Chang
As such, being aware of the global context, the modulated convolution kernel of our proposed CGC can better extract representative local patterns and compose discriminative features.
Ranked #58 on
Image Classification
on ObjectNet
(using extra training data)
1 code implementation • 15 Jul 2019 • Xu Zhang, Svebor Karaman, Shih-Fu Chang
By using the simulated images to train a spectrum based classifier, even without seeing the fake images produced by the targeted GAN model during training, our approach achieves state-of-the-art performances on detecting fake images generated by popular GAN models such as CycleGAN.
no code implementations • 8 Jul 2019 • Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang
Specifically, our framework exploits the reciprocal relation between the referent and context, i. e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.
1 code implementation • 23 May 2019 • Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to jointly capture the self-attention across multiple dimensions, including time, location and the sensor measurements, while maintain low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner.
1 code implementation • CVPR 2019 • Mang Ye, Xu Zhang, Pong C. Yuen, Shih-Fu Chang
This paper studies the unsupervised embedding learning problem, which requires an effective similarity measurement between samples in low-dimensional embedding space.
no code implementations • 4 Mar 2019 • Svebor Karaman, Xudong Lin, Xuefeng Hu, Shih-Fu Chang
We propose an unsupervised hashing method which aims to produce binary codes that preserve the ranking induced by a real-valued representation.
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has shown to be useful for video understanding, where motion is typically represented by optical flow.
Ranked #1 on
Action Recognition
on UCF-101
no code implementations • ICCV 2019 • Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang
CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.
no code implementations • CVPR 2019 • Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang
In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information.
Ranked #2 on
Action Recognition
on THUMOS’14
1 code implementation • CVPR 2019 • Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang
Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content is performed with cosine similarity.
Ranked #1 on
Phrase Grounding
on Flickr30k
no code implementations • NeurIPS 2018 • Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data.
no code implementations • EMNLP 2018 • Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss
We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.
1 code implementation • ICLR 2019 • Xu Zhang, Felix Xinnan Yu, Svebor Karaman, Wei zhang, Shih-Fu Chang
Metric learning aims at learning a distance which is consistent with the semantic meaning of the samples.
no code implementations • ECCV 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
Ranked #14 on
Weakly Supervised Action Localization
on ActivityNet-1.2
Weakly Supervised Action Localization
Weakly-supervised Temporal Action Localization
+1
no code implementations • 23 Jul 2018 • Philipp Blandfort, Desmond Patton, William R. Frey, Svebor Karaman, Surabhi Bhargava, Fei-Tzin Lee, Siddharth Varia, Chris Kedzie, Michael B. Gaskell, Rossano Schifanella, Kathleen McKeown, Shih-Fu Chang
In this paper we partnered computer scientists with social work researchers, who have domain expertise in gang violence, to analyze how public tweets with images posted by youth who mention gang associations on Twitter can be leveraged to automatically detect psychosocial factors and conditions that could potentially assist social workers and violence outreach workers in prevention and early intervention programs.
1 code implementation • 22 Jul 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
Weakly-supervised Temporal Action Localization
Weakly Supervised Temporal Action Localization
no code implementations • EMNLP 2018 • Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang
Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images.
no code implementations • ECCV 2018 • Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang
We aim to tackle a novel task in action detection - Online Detection of Action Start (ODAS) in untrimmed, streaming videos.
1 code implementation • CVPR 2018 • Hanwang Zhang, Yulei Niu, Shih-Fu Chang
This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehension of context --- visual attributes (e. g., "largest", "baby") and relationships (e. g., "behind") that help to distinguish the referent from other objects, especially those of the same category.
1 code implementation • CVPR 2018 • Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang
We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.
no code implementations • 5 Sep 2017 • Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang
In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from object, scene to abstract concept; 2) how to annotate an image with the optimal number of class labels.
3 code implementations • ICLR 2018 • Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang
We introduce the Skip RNN model which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph.
2 code implementations • ICCV 2017 • Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang
We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors.
1 code implementation • 21 Aug 2017 • Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang
This work aims at disentangling the contributions of the `adjectives' and `nouns' in the visual prediction of ANPs.
1 code implementation • 16 Aug 2017 • Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri
Learning image representations with ConvNets by pre-training on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning.
Ranked #68 on
Action Recognition
on HMDB-51
no code implementations • ICCV 2017 • Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang
We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation groundtruths available only at the image level.
no code implementations • 28 Jul 2017 • Pascal Mettes, Cees G. M. Snoek, Shih-Fu Chang
The goal of this paper is to determine the spatio-temporal location of actions in video.
1 code implementation • Computer Vision and Pattern Recognition 2017 • Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang
Specifically, we extend the covariant constraint proposed by Lenc and Vedaldi [8] by defining the concepts of “standard patch” and “canonical feature” and leverage these to train a novel robust covariant detector.
1 code implementation • CVPR 2017 • Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang
Specifically, we extend the covariant constraint proposed by Lenc and Vedaldi by defining the concepts of "standard patch" and "canonical feature" and leverage these to train a novel robust covariant detector.
no code implementations • 14 Jun 2017 • Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, xiangyang xue, Shih-Fu Chang
More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.
no code implementations • 18 Mar 2017 • Hongzhi Li, Joseph G. Ellis, Lei Zhang, Shih-Fu Chang
In this paper, we study the problem of visual pattern mining and propose a novel deep neural network architecture called PatternNet for discovering these patterns that are both discriminative and representative.
1 code implementation • CVPR 2017 • Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang
Temporal action localization is an important yet challenging problem.
Ranked #20 on
Temporal Action Localization
on THUMOS’14
(mAP IOU@0.6 metric)
2 code implementations • CVPR 2017 • Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
To the best of our knowledge, VTransE is the first end-to-end relation detection network.
no code implementations • 15 Jul 2016 • Yinxiao Li, Yan Wang, Yonghao Yue, Danfei Xu, Michael Case, Shih-Fu Chang, Eitan Grinspun, Peter Allen
A fully featured 3D model of the garment is constructed in real-time and volumetric features are then used to obtain the most similar model in the database to predict the object category and pose.
no code implementations • 16 Jun 2016 • Jie Feng, Svebor Karaman, I-Hong Jhuo, Shih-Fu Chang
Learning-based hashing is often used in large scale image retrieval as they provide a compact representation of each sample and the Hamming distance can be used to efficiently compare two samples.
no code implementations • 7 Jun 2016 • Nikolaos Pappas, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, Shih-Fu Chang
The impact of culture in visual emotion perception has recently captured the attention of multimedia research.
no code implementations • NAACL 2016 • Di Lu, Clare Voss, Fangbo Tao, Xiang Ren, Rachel Guan, Rostyslav Korolov, Tongtao Zhang, Dongang Wang, Hongzhi Li, Taylor Cassidy, Heng Ji, Shih-Fu Chang, Jiawei Han, William Wallace, James Hendler, Mei Si, Lance Kaplan
no code implementations • CVPR 2016 • Jie Feng, Brian Price, Scott Cohen, Shih-Fu Chang
While these methods achieve better results than color-based methods, they are still limited in either using depth as an additional color channel or simply combining depth with color in a linear way.
no code implementations • 30 May 2016 • Brendan Jou, Shih-Fu Chang
In the original MVSO release, adjective-noun pair (ANP) detectors were trained for the six languages using an AlexNet-styled architecture by fine-tuning from DeepSentiBank.
no code implementations • 24 May 2016 • Dongang Wang, Zheng Shou, Hongyi Liu, Shih-Fu Chang
Finally, EventNet version 1. 1 contains 67, 641 videos, 500 events, and 5, 028 event-specific concepts.
no code implementations • 23 May 2016 • Ran Tao, Arnold W. M. Smeulders, Shih-Fu Chang
Searching among instances from the same category as the query, the category-specific attributes outperform existing approaches by a large margin on shoes and cars and perform on par with the state-of-the-art on buildings.
1 code implementation • 5 Apr 2016 • Brendan Jou, Shih-Fu Chang
We propose a novel extension of residual learning for deep networks that enables intuitive learning across multiple related tasks using cross-connections called cross-residuals.
1 code implementation • CVPR 2016 • Zheng Shou, Dongang Wang, Shih-Fu Chang
To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes on the learned classification network to localize each action instance.
Ranked #1 on
Temporal Action Localization
on MEXaction2
no code implementations • 31 Dec 2015 • Hongzhi Li, Joseph G. Ellis, Shih-Fu Chang
In this paper we describe a novel framework and algorithms for discovering image patch patterns from a large corpus of weakly supervised image-caption pairs generated from news events.
no code implementations • 20 Nov 2015 • Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
no code implementations • 17 Sep 2015 • Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang
Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.
no code implementations • 16 Aug 2015 • Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang
Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia.
no code implementations • 8 Jun 2015 • Guangnan Ye, Yitong Li, Hongliang Xu, Dong Liu, Shih-Fu Chang
Extensive experiments over the zero-shot event retrieval task when no training samples are available show that the EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin up to 207%.
no code implementations • CVPR 2015 • Ran Tao, Arnold W. M. Smeulders, Shih-Fu Chang
This paper aims for generic instance search from one example where the instance can be an arbitrary 3D object like shoes, not just near-planar and one-sided instances like buildings and logos.
no code implementations • CVPR 2015 • Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Graph-based computer vision applications rely critically on similarity metrics which compute the pairwise similarity between any pair of vertices on graphs.
no code implementations • 12 Mar 2015 • Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang
This leads to much more compact maps without hurting the performance.
no code implementations • 2 Mar 2015 • Xu Zhang, Felix Xinnan Yu, Shih-Fu Chang, Shengjin Wang
In this paper, we propose a new domain adaptation framework named Deep Transfer Network (DTN), where the highly flexible deep neural networks are used to implement such a distribution matching process.
no code implementations • 25 Feb 2015 • Yu-Gang Jiang, Zuxuan Wu, Jun Wang, xiangyang xue, Shih-Fu Chang
In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.
no code implementations • ICCV 2015 • Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang
We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.
no code implementations • NeurIPS 2014 • Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang
Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases.
1 code implementation • 30 Oct 2014 • Tao Chen, Damian Borth, Trevor Darrell, Shih-Fu Chang
Nearly one million Flickr images tagged with these ANPs are downloaded to train the classifiers of the concepts.
no code implementations • CVPR 2014 • Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems.
no code implementations • CVPR 2014 • Go Irie, Zhenguo Li, Xiao-Ming Wu, Shih-Fu Chang
Previous efforts in hashing intend to preserve data variance or pairwise affinity, but neither is adequate in capturing the manifold structures hidden in most visual data.
no code implementations • CVPR 2014 • Kuan-Ting Lai, Felix X. Yu, Ming-Syan Chen, Shih-Fu Chang
To solve this problem, we propose a large-margin formulation which treats the instance labels as hidden latent variables, and simultaneously infers the instance labels as well as the instance-level classification model.
no code implementations • 13 May 2014 • Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
no code implementations • 29 Mar 2014 • Yin Cui, Dong Liu, Jiawei Chen, Shih-Fu Chang
In this paper, we propose to build Concept Bank, the largest concept library consisting of 4, 876 concepts specifically designed to cover 631 real-world events.
1 code implementation • 24 Feb 2014 • Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang
Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.
no code implementations • NeurIPS 2013 • Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
We show that either explicitly or implicitly, various well-known graph-based models exhibit a common significant \emph{harmonic} structure in its target function -- the value of a vertex is approximately the weighted average of the values of its adjacent neighbors.
no code implementations • 4 Jun 2013 • Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang
We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known.
no code implementations • CVPR 2013 • Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix.
no code implementations • CVPR 2013 • Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples.
no code implementations • CVPR 2013 • Go Irie, Dong Liu, Zhenguo Li, Shih-Fu Chang
nary learning methods rely on image descriptors alone or together with class labels.
no code implementations • CVPR 2013 • Xianglong Liu, Junfeng He, Bo Lang, Shih-Fu Chang
We represent the bit pool as a vertx- and edge-weighted graph with the candidate bits as vertices.
no code implementations • CVPR 2013 • Yan Wang, Rongrong Ji, Shih-Fu Chang
Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-ofthe-art approaches with far better efficiency.
no code implementations • CVPR 2013 • Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images.
no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael. I. Jordan
Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.
no code implementations • NeurIPS 2012 • Xiao-Ming Wu, Zhenguo Li, Anthony M. So, John Wright, Shih-Fu Chang
We prove that under proper absorption rates, a random walk starting from a set $\mathcal{S}$ of low conductance will be mostly absorbed in $\mathcal{S}$.