Search Results for author: Shih-Fu Chang

Found 111 papers, 32 papers with code

Coreference by Appearance: Visually Grounded Event Coreference Resolution

no code implementations CRAC (ACL) 2021 Liming Wang, Shengyu Feng, Xudong Lin, Manling Li, Heng Ji, Shih-Fu Chang

Event coreference resolution is critical to understand events in the growing number of online news with multiple modalities including text, video, speech, etc.

Coreference Resolution Event Coreference Resolution +2

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations 29 Sep 2021 Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Partner-Assisted Learning for Few-Shot Image Classification

no code implementations ICCV 2021 Jiawei Ma, Hanchen Xie, Guangxing Han, Shih-Fu Chang, Aram Galstyan, Wael Abd-Almageed

In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.

Classification Few-Shot Image Classification

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System

1 code implementation NAACL 2021 Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).

Coreference Resolution Event Extraction

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation ICCV 2021 Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.

Contrastive Learning Self-Supervised Learning +2

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations NeurIPS 2021 Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #1 on Action Classification on Moments in Time (using extra training data)

Action Classification Action Recognition +7

Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment

1 code implementation 15 Apr 2021 Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang

We propose a meta-learning based few-shot object detection method by transferring meta-knowledge learned from data-abundant base classes to data-scarce novel classes.

Few-Shot Learning Few-Shot Object Detection

Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

no code implementations CVPR 2021 Sijie Song, Xudong Lin, Jiaying Liu, Zongming Guo, Shih-Fu Chang

In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics.

Referring Expression Comprehension

Analogical Reasoning for Visually Grounded Compositional Generalization

no code implementations 1 Jan 2021 Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang

Children acquire language subconsciously by observing the surrounding world and listening to descriptions.

Language Acquisition

Open-Vocabulary Object Detection Using Captions

1 code implementation CVPR 2021 Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang

Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.

Object Detection Zero-Shot Learning

Uncertainty-Aware Few-Shot Image Classification

no code implementations 9 Oct 2020 Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Shih-Fu Chang

In this work, we propose an Uncertainty-Aware Few-Shot framework for image classification by modeling uncertainty of the similarities of query-support pairs and performing uncertainty-aware optimization.

Classification Few-Shot Image Classification +2

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

1 code implementation 3 Sep 2020 Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang

The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.

Analogical Reasoning for Visually Grounded Language Acquisition

no code implementations 22 Jul 2020 Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang

Children acquire language subconsciously by observing the surrounding world and listening to descriptions.

Language Acquisition

GAIA: A Fine-grained Multimedia Knowledge Extraction System

no code implementations ACL 2020 Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman

We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.

Learning Visual Commonsense for Robust Scene Graph Generation

2 code implementations ECCV 2020 Alireza Zareian, Zhecan Wang, Haoxuan You, Shih-Fu Chang

Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild.

Graph Generation Scene Graph Generation +1

Beyond Triplet Loss: Meta Prototypical N-tuple Loss for Person Re-identification

no code implementations 8 Jun 2020 Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Zhibo Chen, Shih-Fu Chang

There is a lack of loss design which enables the joint optimization of multiple instances (of multiple classes) within per-query optimization for person ReID.

Classification General Classification +3

Deep Learning Guided Building Reconstruction from Satellite Imagery-derived Point Clouds

no code implementations 19 May 2020 Bo Xu, Xu Zhang, Zhixin Li, Matt Leotta, Shih-Fu Chang, Jie Shan

For points that belong to the same roof shape, a multi-cue, hierarchical RANSAC approach is proposed for efficient and reliable segmenting and reconstructing the building point cloud.

3D Reconstruction

Cross-media Structured Common Space for Multimedia Event Extraction

no code implementations ACL 2020 Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.

Event Extraction

Training with Streaming Annotation

no code implementations 11 Feb 2020 Tongtao Zhang, Heng Ji, Shih-Fu Chang, Marjorie Freedman

In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches and annotation in earlier phases has lower quality than the later counterparts.

Event Extraction

Weakly Supervised Visual Semantic Parsing

1 code implementation CVPR 2020 Alireza Zareian, Svebor Karaman, Shih-Fu Chang

Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval.

Graph Generation Image Retrieval +3

Bridging Knowledge Graphs to Generate Scene Graphs

1 code implementation ECCV 2020 Alireza Zareian, Svebor Karaman, Shih-Fu Chang

Scene graphs are powerful representations that parse images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning.

Graph Generation Knowledge Graphs +1

General Partial Label Learning via Dual Bipartite Graph Autoencoder

no code implementations 5 Jan 2020 Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang

Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.

Partial Label Learning

Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation

no code implementations ICLR 2020 Jiawei Ma*, Zheng Shou*, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang

In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNN), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps.

Imputation Machine Translation +1

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

no code implementations 10 Dec 2019 Shiyuan Huang, Xudong Lin, Svebor Karaman, Shih-Fu Chang

Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve the inference speed by orders of magnitude.

Action Recognition Optical Flow Estimation +1

Cross-lingual Structure Transfer for Relation and Event Extraction

no code implementations IJCNLP 2019 Ananya Subburathinam, Di Lu, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare Voss

The identification of complex semantic structures such as events and entity relations, already a challenging Information Extraction task, is doubly difficult from sources written in under-resourced and under-annotated languages.

Event Extraction Relation Extraction

Towards Train-Test Consistency for Semi-supervised Temporal Action Localization

no code implementations 24 Oct 2019 Xudong Lin, Zheng Shou, Shih-Fu Chang

The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time.

Multiple Instance Learning Video Classification +2

Context-Gated Convolution

1 code implementation ECCV 2020 Xudong Lin, Lin Ma, Wei Liu, Shih-Fu Chang

As such, being aware of the global context, the modulated convolution kernel of our proposed CGC can better extract representative local patterns and compose discriminative features.

Action Recognition Image Classification +1

Detecting and Simulating Artifacts in GAN Fake Images

1 code implementation 15 Jul 2019 Xu Zhang, Svebor Karaman, Shih-Fu Chang

By using the simulated images to train a spectrum based classifier, even without seeing the fake images produced by the targeted GAN model during training, our approach achieves state-of-the-art performances on detecting fake images generated by popular GAN models such as CycleGAN.

GAN image forensics

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

no code implementations 8 Jul 2019 Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.

Multiple Instance Learning

CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation

1 code implementation 23 May 2019 Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang

In order to jointly capture the self-attention across multiple dimensions, including time, location and the sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner.

Imputation Machine Translation +1

Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

1 code implementation CVPR 2019 Mang Ye, Xu Zhang, Pong C. Yuen, Shih-Fu Chang

This paper studies the unsupervised embedding learning problem, which requires an effective similarity measurement between samples in low-dimensional embedding space.

Data Augmentation

Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval

no code implementations 4 Mar 2019 Svebor Karaman, Xudong Lin, Xuefeng Hu, Shih-Fu Chang

We propose an unsupervised hashing method which aims to produce binary codes that preserve the ranking induced by a real-valued representation.

Image Retrieval Re-Ranking

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

no code implementations ICCV 2019 Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang

CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.

Graph Generation Scene Graph Generation +1

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

1 code implementation CVPR 2019 Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity.

Language Modelling Phrase Grounding +1

Multi-granularity Generator for Temporal Action Proposal

no code implementations CVPR 2019 Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang

In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information.

Action Recognition Temporal Action Proposal Generation

Incorporating Background Knowledge into Video Description Generation

no code implementations EMNLP 2018 Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss

We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.

Text Generation Video Captioning +1

Heated-Up Softmax Embedding

1 code implementation ICLR 2019 Xu Zhang, Felix Xinnan Yu, Svebor Karaman, Wei Zhang, Shih-Fu Chang

Metric learning aims at learning a distance which is consistent with the semantic meaning of the samples.

Metric Learning

Multimodal Social Media Analysis for Gang Violence Prevention

no code implementations 23 Jul 2018 Philipp Blandfort, Desmond Patton, William R. Frey, Svebor Karaman, Surabhi Bhargava, Fei-Tzin Lee, Siddharth Varia, Chris Kedzie, Michael B. Gaskell, Rossano Schifanella, Kathleen McKeown, Shih-Fu Chang

In this paper we partnered computer scientists with social work researchers, who have domain expertise in gang violence, to analyze how public tweets with images posted by youth who mention gang associations on Twitter can be leveraged to automatically detect psychosocial factors and conditions that could potentially assist social workers and violence outreach workers in prevention and early intervention programs.

General Classification

AutoLoc: Weakly-supervised Temporal Action Localization

1 code implementation 22 Jul 2018 Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang

In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

Entity-aware Image Caption Generation

no code implementations EMNLP 2018 Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang

Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images.

Image Captioning

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

1 code implementation CVPR 2018 Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang

We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.

General Classification Zero-Shot Learning

Grounding Referring Expressions in Images by Variational Context

1 code implementation CVPR 2018 Hanwang Zhang, Yulei Niu, Shih-Fu Chang

This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehension of context --- visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.

Multiple Instance Learning

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

no code implementations 5 Sep 2017 Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang

In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from object, scene to abstract concept; 2) how to annotate an image with the optimal number of class labels.

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

3 code implementations ICLR 2018 Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

We introduce the Skip RNN model which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph.

More cat than cute? Interpretable Prediction of Adjective-Noun Pairs

1 code implementation 21 Aug 2017 Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang

This work aims at disentangling the contributions of the `adjectives' and `nouns' in the visual prediction of ANPs.

Learning Spread-out Local Feature Descriptors

2 code implementations ICCV 2017 Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang

We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors.

ConvNet Architecture Search for Spatiotemporal Feature Learning

1 code implementation 16 Aug 2017 Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri

Learning image representations with ConvNets by pre-training on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning.

Action Classification Action Recognition +4

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

no code implementations ICCV 2017 Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang

We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation groundtruths available only at the image level.

Weakly Supervised Object Detection

Localizing Actions from Video Labels and Pseudo-Annotations

no code implementations 28 Jul 2017 Pascal Mettes, Cees G. M. Snoek, Shih-Fu Chang

The goal of this paper is to determine the spatio-temporal location of actions in video.

Action Localization

Learning Discriminative and Transformation Covariant Local Feature Detectors

1 code implementation CVPR 2017 Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang

Specifically, we extend the covariant constraint proposed by Lenc and Vedaldi by defining the concepts of "standard patch" and "canonical feature" and leverage these to train a novel robust covariant detector.

Image Retrieval

Learning discriminative and transformation covariant local feature detectors.

1 code implementation Computer Vision and Pattern Recognition 2017 Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang

Specifically, we extend the covariant constraint proposed by Lenc and Vedaldi [8] by defining the concepts of “standard patch” and “canonical feature” and leverage these to train a novel robust covariant detector.

Image Retrieval

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

no code implementations 14 Jun 2017 Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang

More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.

General Classification Video Classification

PatternNet: Visual Pattern Mining with Deep Neural Network

no code implementations 18 Mar 2017 Hongzhi Li, Joseph G. Ellis, Lei Zhang, Shih-Fu Chang

In this paper, we study the problem of visual pattern mining and propose a novel deep neural network architecture called PatternNet for discovering these patterns that are both discriminative and representative.

Image Classification

Model-Driven Feed-Forward Prediction for Manipulation of Deformable Objects

no code implementations 15 Jul 2016 Yinxiao Li, Yan Wang, Yonghao Yue, Danfei Xu, Michael Case, Shih-Fu Chang, Eitan Grinspun, Peter Allen

A fully featured 3D model of the garment is constructed in real-time and volumetric features are then used to obtain the most similar model in the database to predict the object category and pose.

Pose Estimation

Deep Image Set Hashing

no code implementations 16 Jun 2016 Jie Feng, Svebor Karaman, I-Hong Jhuo, Shih-Fu Chang

Learning-based hashing is often used in large scale image retrieval as they provide a compact representation of each sample and the Hamming distance can be used to efficiently compare two samples.

Image Retrieval

Multilingual Visual Sentiment Concept Matching

no code implementations 7 Jun 2016 Nikolaos Pappas, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, Shih-Fu Chang

The impact of culture in visual emotion perception has recently captured the attention of multimedia research.

Word Embeddings

Interactive Segmentation on RGBD Images via Cue Selection

no code implementations CVPR 2016 Jie Feng, Brian Price, Scott Cohen, Shih-Fu Chang

While these methods achieve better results than color-based methods, they are still limited in either using depth as an additional color channel or simply combining depth with color in a linear way.

Image Retrieval Interactive Segmentation +2

Going Deeper for Multilingual Visual Sentiment Detection

no code implementations 30 May 2016 Brendan Jou, Shih-Fu Chang

In the original MVSO release, adjective-noun pair (ANP) detectors were trained for the six languages using an AlexNet-styled architecture by fine-tuning from DeepSentiBank.

Fine-tuning

EventNet Version 1.1 Technical Report

no code implementations 24 May 2016 Dongang Wang, Zheng Shou, Hongyi Liu, Shih-Fu Chang

Finally, EventNet version 1.1 contains 67,641 videos, 500 events, and 5,028 event-specific concepts.

Fine-tuning

Generic Instance Search and Re-identification from One Example via Attributes and Categories

no code implementations 23 May 2016 Ran Tao, Arnold W. M. Smeulders, Shih-Fu Chang

Searching among instances from the same category as the query, the category-specific attributes outperform existing approaches by a large margin on shoes and cars and perform on par with the state-of-the-art on buildings.

Instance Search Person Re-Identification

Deep Cross Residual Learning for Multitask Visual Recognition

1 code implementation 5 Apr 2016 Brendan Jou, Shih-Fu Chang

We propose a novel extension of residual learning for deep networks that enables intuitive learning across multiple related tasks using cross-connections called cross-residuals.

Object Recognition

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

1 code implementation CVPR 2016 Zheng Shou, Dongang Wang, Shih-Fu Chang

To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes on the learned classification network to localize each action instance.

Action Classification Classification +3

Event Specific Multimodal Pattern Mining with Image-Caption Pairs

no code implementations 31 Dec 2015 Hongzhi Li, Joseph G. Ellis, Shih-Fu Chang

In this paper we describe a novel framework and algorithms for discovering image patch patterns from a large corpus of weakly supervised image-caption pairs generated from news events.

Image Captioning

On Binary Embedding using Circulant Matrices

no code implementations 20 Nov 2015 Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.

Learning to Hash for Indexing Big Data - A Survey

no code implementations 17 Sep 2015 Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang

Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.

Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

no code implementations 16 Aug 2015 Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang

Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia.

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

no code implementations 8 Jun 2015 Guangnan Ye, Yitong Li, Hongliang Xu, Dong Liu, Shih-Fu Chang

Extensive experiments over the zero-shot event retrieval task when no training samples are available show that the EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin up to 207%.

Event Detection Hierarchical structure

Attributes and Categories for Generic Instance Search From One Example

no code implementations CVPR 2015 Ran Tao, Arnold W. M. Smeulders, Shih-Fu Chang

This paper aims for generic instance search from one example where the instance can be an arbitrary 3D object like shoes, not just near-planar and one-sided instances like buildings and logos.

Instance Search

New Insights Into Laplacian Similarity Search

no code implementations CVPR 2015 Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang

Graph-based computer vision applications rely critically on similarity metrics which compute the pairwise similarity between any pair of vertices on graphs.

Image Retrieval

Compact Nonlinear Maps and Circulant Extensions

no code implementations 12 Mar 2015 Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang

This leads to much more compact maps without hurting the performance.

Deep Transfer Network: Unsupervised Domain Adaptation

no code implementations 2 Mar 2015 Xu Zhang, Felix Xinnan Yu, Shih-Fu Chang, Shengjin Wang

In this paper, we propose a new domain adaptation framework named Deep Transfer Network (DTN), where the highly flexible deep neural networks are used to implement such a distribution matching process.

Unsupervised Domain Adaptation

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

no code implementations 25 Feb 2015 Yu-Gang Jiang, Zuxuan Wu, Jun Wang, Xiangyang Xue, Shih-Fu Chang

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.

An exploration of parameter redundancy in deep networks with circulant projections

no code implementations ICCV 2015 Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.

Discrete Graph Hashing

no code implementations NeurIPS 2014 Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases.

Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

no code implementations CVPR 2014 Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang

This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems.

Classification General Classification

Video Event Detection by Inferring Temporal Instance Labels

no code implementations CVPR 2014 Kuan-Ting Lai, Felix X. Yu, Ming-Syan Chen, Shih-Fu Chang

To solve this problem, we propose a large-margin formulation which treats the instance labels as hidden latent variables, and simultaneously infers the instance labels as well as the instance-level classification model.

Event Detection

Locally Linear Hashing for Extracting Non-Linear Manifolds

no code implementations CVPR 2014 Go Irie, Zhenguo Li, Xiao-Ming Wu, Shih-Fu Chang

Previous efforts in hashing intend to preserve data variance or pairwise affinity, but neither is adequate in capturing the manifold structures hidden in most visual data.

Quantization

Circulant Binary Embedding

no code implementations 13 May 2014 Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.

Building A Large Concept Bank for Representing Events in Video

no code implementations 29 Mar 2014 Yin Cui, Dong Liu, Jiawei Chen, Shih-Fu Chang

In this paper, we propose to build Concept Bank, the largest concept library consisting of 4,876 concepts specifically designed to cover 631 real-world events.

Event Detection

On Learning from Label Proportions

1 code implementation 24 Feb 2014 Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.

Analyzing the Harmonic Structure in Graph-Based Learning

no code implementations NeurIPS 2013 Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang

We show that either explicitly or implicitly, various well-known graph-based models exhibit a common significant harmonic structure in its target function -- the value of a vertex is approximately the weighted average of the values of its adjacent neighbors.

$\propto$SVM for learning with label proportions

no code implementations 4 Jun 2013 Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known.

Designing Category-Level Attributes for Discriminative Visual Recognition

no code implementations CVPR 2013 Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix.

Transfer Learning Zero-Shot Learning

Label Propagation from ImageNet to 3D Point Clouds

no code implementations CVPR 2013 Yan Wang, Rongrong Ji, Shih-Fu Chang

Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

Sample-Specific Late Fusion for Visual Category Recognition

no code implementations CVPR 2013 Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang

However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples.

Robust Object Co-detection

no code implementations CVPR 2013 Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang

Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images.

Object Detection

Hash Bit Selection: A Unified Solution for Selection Problems in Hashing

no code implementations CVPR 2013 Xianglong Liu, Junfeng He, Bo Lang, Shih-Fu Chang

We represent the bit pool as a vertex- and edge-weighted graph with the candidate bits as vertices.

Distributed Low-rank Subspace Segmentation

no code implementations 20 Apr 2013 Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Event Detection Face Recognition +2

Learning with Partially Absorbing Random Walks

no code implementations NeurIPS 2012 Xiao-Ming Wu, Zhenguo Li, Anthony M. So, John Wright, Shih-Fu Chang

We prove that under proper absorption rates, a random walk starting from a set $\mathcal{S}$ of low conductance will be mostly absorbed in $\mathcal{S}$.
