Search Results for author: Alexander G. Hauptmann

Found 41 papers, 9 papers with code

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

no code implementations 29 Apr 2024 Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann

Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction.

Contrastive Learning Speech Synthesis +1

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations NeurIPS 2023 Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

DocumentNet: Bridging the Data Gap in Document Pre-Training

no code implementations 15 Jun 2023 Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI.

document understanding Entity Retrieval +3

Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals

no code implementations 14 Jan 2022 Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann

Activity detection is one of the attractive computer vision tasks to exploit the video streams captured by widely installed cameras.

Action Detection Activity Detection

Speech Driven Tongue Animation

no code implementations CVPR 2022 Salvador Medina, Denis Tome, Carsten Stoll, Mark Tiede, Kevin Munhall, Alexander G. Hauptmann, Iain Matthews

In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion.


Subspace Representation Learning for Few-shot Image Classification

no code implementations 2 May 2021 Ting-yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann

In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks.

Classification Few-Shot Image Classification +3

Pose Guided Person Image Generation with Hidden p-Norm Regression

no code implementations 19 Feb 2021 Ting-yao Hu, Alexander G. Hauptmann

In this paper, we propose a novel approach to solve the pose guided person image generation task.

Image Generation regression

Training-free Monocular 3D Event Detection System for Traffic Surveillance

no code implementations 1 Feb 2020 Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann

To deal with the aforementioned problems, in this paper, we propose a training-free monocular 3D event detection system for traffic surveillance.

Event Detection

Contrastive Adaptation Network for Unsupervised Domain Adaptation

2 code implementations CVPR 2019 Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.

Unsupervised Domain Adaptation

RCAA: Relational Context-Aware Agents for Person Search

no code implementations ECCV 2018 Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann

In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.

Person Search

Multi-shot Person Re-identification through Set Distance with Visual Distributional Representation

no code implementations 3 Aug 2018 Ting-yao Hu, Xiaojun Chang, Alexander G. Hauptmann

In this work, we propose the idea of visual distributional representation, which interprets an image set as samples drawn from an unknown distribution in appearance feature space.

Person Re-Identification

A Closer Look at Weak Label Learning for Audio Events

1 code implementation 24 Apr 2018 Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification Event Detection +2

Complex Event Detection by Identifying Reliable Shots From Untrimmed Videos

no code implementations ICCV 2017 Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann

relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.

Event Detection

Deep Feature Learning via Structured Graph Laplacian Embedding for Person Re-Identification

no code implementations 25 Jul 2017 De Cheng, Yihong Gong, Zhihui Li, Weiwei Shi, Alexander G. Hauptmann, Nanning Zheng

The proposed method can take full advantage of the structured distance relationships among these training samples, with the constructed complete graph.

Person Re-Identification

Hidden Two-Stream Convolutional Networks for Action Recognition

3 code implementations 2 Apr 2017 Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.

Action Recognition Optical Flow Estimation +2

Guided Optical Flow Learning

no code implementations 8 Feb 2017 Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

We study the unsupervised learning of CNNs for optical flow estimation using proxy ground truth data.

Image Reconstruction Optical Flow Estimation

Simple to Complex Cross-modal Learning to Rank

no code implementations 4 Feb 2017 Minnan Luo, Xiaojun Chang, Zhihui Li, Liqiang Nie, Alexander G. Hauptmann, Qinghua Zheng

The heterogeneity-gap between different modalities brings a significant challenge to multimedia information retrieval.

Cross-Modal Retrieval Information Retrieval +3

Deep Local Video Feature for Action Recognition

no code implementations 25 Jan 2017 Zhenzhong Lan, Yi Zhu, Alexander G. Hauptmann

We investigate the problem of representing an entire video using CNN features for human action recognition.

Action Recognition Temporal Action Localization

Person Re-identification: Past, Present and Future

no code implementations 10 Oct 2016 Liang Zheng, Yi Yang, Alexander G. Hauptmann

Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance.

Image Classification Person Re-Identification +1

Strategies for Searching Video Content with Text Queries or Video Examples

no code implementations 17 Jun 2016 Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.

Event Detection Retrieval +1

Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization

no code implementations 25 Apr 2016 Shoou-I Yu, Yi Yang, Xuanchong Li, Alexander G. Hauptmann

Therefore, our tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections.

Face Recognition Video Summarization

Dynamic Concept Composition for Zero-Example Event Detection

no code implementations 14 Jan 2016 Xiaojun Chang, Yi Yang, Guodong Long, Chengqi Zhang, Alexander G. Hauptmann

In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars.

Event Detection Zero-Shot Learning

Improving Human Activity Recognition Through Ranking and Re-ranking

no code implementations 11 Dec 2015 Zhenzhong Lan, Shoou-I Yu, Alexander G. Hauptmann

We propose two well-motivated ranking-based methods to enhance the performance of current state-of-the-art human activity recognition systems.

Human Activity Recognition Re-Ranking

Handcrafted Local Features are Convolutional Neural Networks

no code implementations 16 Nov 2015 Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann

We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.

Action Recognition Optical Flow Estimation +2

Uncovering Temporal Context for Video Question and Answering

no code implementations 15 Nov 2015 Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann

In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present, and predict the future.

Decoder Multiple-choice +2

Long-short Term Motion Feature for Action Classification and Retrieval

no code implementations 13 Feb 2015 Zhenzhong Lan, Xuanchong Li, Ming Lin, Alexander G. Hauptmann

Therefore, they need to occur frequently enough in the videos and to be able to tell the difference among different types of motions.

Action Classification Classification +3

A Discriminative CNN Video Representation for Event Detection

no code implementations CVPR 2015 Zhongwen Xu, Yi Yang, Alexander G. Hauptmann

In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available.

Event Detection

Event Detection using Multi-Level Relevance Labels and Multiple Features

no code implementations CVPR 2014 Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander G. Hauptmann

We address the challenging problem of utilizing related exemplars for complex event detection while multiple features are available.

Event Detection

Complex Event Detection via Multi-source Video Attributes

no code implementations CVPR 2013 Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events.

Event Detection
