no code implementations • 29 Apr 2024 • Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann
Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction.
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.
no code implementations • 15 Jun 2023 • Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei
Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI.
1 code implementation • ICCV 2023 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann
We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods.
no code implementations • ICCV 2023 • Yijun Qian, Jack Urbanek, Alexander G. Hauptmann, Jungdam Won
Given its wide range of applications, generating 3D human motions from textual descriptions has attracted increasing attention.
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
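The masked generative training behind MAGVIT can be illustrated with a small sketch: discrete video tokens (codebook ids) are randomly replaced by a special mask id, and the model learns to predict the masked positions. This is an illustrative toy, not MAGVIT's code — the `MASK` id, grid shape, and mask ratio here are arbitrary assumptions, and the learned transformer and iterative decoding schedule are omitted.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id, outside the codebook range


def mask_video_tokens(tokens, mask_ratio, rng):
    """Replace a random fraction of discrete video tokens with MASK,
    as in masked generative training (illustrative sketch only)."""
    flat = tokens.flatten().copy()
    n_mask = int(round(mask_ratio * flat.size))
    idx = rng.choice(flat.size, size=n_mask, replace=False)
    flat[idx] = MASK
    return flat.reshape(tokens.shape)


rng = np.random.default_rng(0)
# (frames, height, width) grid of codebook ids in [0, 1024)
tokens = rng.integers(0, 1024, size=(4, 8, 8))
masked = mask_video_tokens(tokens, 0.5, rng)
```

At training time a transformer would be asked to reconstruct the original ids at the masked positions; at sampling time, decoding starts from an all-masked grid and fills tokens in over a few refinement steps.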
1 code implementation • 18 Aug 2022 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Teruko Mitamura, Alexander G. Hauptmann
In the second stage, we exploit transformer layers to unearth the potential semantic relations within both verbs and semantic roles.
1 code implementation • CVPR 2022 • Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann
We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50).
Ranked #1 on Object Counting on TRANCOS
no code implementations • 14 Jan 2022 • Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann
Activity detection is an appealing computer vision task that exploits the video streams captured by widely deployed cameras.
no code implementations • CVPR 2022 • Salvador Medina, Denis Tome, Carsten Stoll, Mark Tiede, Kevin Munhall, Alexander G. Hauptmann, Iain Matthews
In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion.
no code implementations • 2 May 2021 • Ting-yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann
In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks.
no code implementations • 19 Feb 2021 • Ting-yao Hu, Alexander G. Hauptmann
In this paper, we propose a novel approach to solve the pose guided person image generation task.
1 code implementation • NeurIPS 2020 • Guoliang Kang, Yunchao Wei, Yi Yang, Yueting Zhuang, Alexander G. Hauptmann
The conventional solution to this task is to minimize the discrepancy between source and target to enable effective knowledge transfer.
Ranked #25 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes
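The "conventional solution" the entry above refers to — minimizing the discrepancy between source and target feature distributions — is often instantiated with Maximum Mean Discrepancy (MMD). The linear-kernel version below is a generic illustration of that baseline idea, not the method proposed in the paper, which argues for going beyond plain discrepancy minimization.

```python
import numpy as np


def mmd_linear(source, target):
    """Linear-kernel Maximum Mean Discrepancy between two feature
    batches of shape (n, d): squared distance between batch means.
    A standard UDA discrepancy measure, shown for illustration only."""
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)
```

Minimizing such a term during training pulls the two feature distributions together, which enables knowledge transfer but, as the entry notes, is not always sufficient on its own.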
no code implementations • 1 Feb 2020 • Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann
To deal with the aforementioned problems, in this paper, we propose a training-free monocular 3D event detection system for traffic surveillance.
2 code implementations • CVPR 2019 • Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann
Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.
Ranked #7 on Domain Adaptation on Office-31
no code implementations • ECCV 2018 • Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann
In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.
no code implementations • 3 Aug 2018 • Ting-yao Hu, Xiaojun Chang, Alexander G. Hauptmann
In this work, we propose the idea of visual distributional representation, which interprets an image set as samples drawn from an unknown distribution in appearance feature space.
1 code implementation • 24 Apr 2018 • Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj
In this work, we first describe a CNN based approach for weakly supervised training of audio events.
1 code implementation • CVPR 2018 • Jiang Liu, Chenqiang Gao, Deyu Meng, Alexander G. Hauptmann
DecideNet starts with estimating the crowd density by generating detection and regression based density maps separately.
Ranked #10 on Crowd Counting on WorldExpo’10
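The two density maps that DecideNet estimates separately are ultimately combined pixel-wise under an attention map. The sketch below shows only that fusion step, under the assumption of an attention map with values in [0, 1]; in the actual model the attention is predicted by a learned sub-network, which is not shown here.

```python
import numpy as np


def fuse_density_maps(det_map, reg_map, attention):
    """Pixel-wise fusion of detection-based and regression-based crowd
    density maps, weighted by an attention map in [0, 1] (illustrative
    sketch of the fusion idea; the attention itself is learned)."""
    return attention * det_map + (1.0 - attention) * reg_map
```

The final crowd count is then the sum over the fused density map, so the attention effectively decides, per pixel, whether to trust detection (reliable in sparse regions) or regression (reliable in congested regions).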
no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann
We formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
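In a bag/instance formulation like the one above, a video (bag) is positive for an event class if at least one of its shots (instances) is relevant, so a common scoring rule is the max over instance scores. This minimal sketch shows that rule only; the instance scorer and MIL training objective from the paper are not reproduced here.

```python
import numpy as np


def bag_score(instance_scores):
    """MIL bag scoring: a video is as relevant as its most relevant
    shot, so the bag score is the maximum over instance scores."""
    return float(np.max(instance_scores))
```

Under this rule a video with one highly relevant shot scores high even if most shots are background, matching the weakly labeled setting where only video-level event labels are available.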
no code implementations • 25 Jul 2017 • De Cheng, Yihong Gong, Zhihui Li, Weiwei Shi, Alexander G. Hauptmann, Nanning Zheng
The proposed method can take full advantage of the structured distance relationships among the training samples via the constructed complete graph.
no code implementations • 5 Jul 2017 • Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann
We report on CMU Informedia Lab's system used in Google's YouTube-8M Video Understanding Challenge.
3 code implementations • 2 Apr 2017 • Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.
Ranked #20 on Action Recognition on UCF101
no code implementations • 8 Feb 2017 • Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
We study the unsupervised learning of CNNs for optical flow estimation using proxy ground truth data.
no code implementations • 4 Feb 2017 • Minnan Luo, Xiaojun Chang, Zhihui Li, Liqiang Nie, Alexander G. Hauptmann, Qinghua Zheng
The heterogeneity-gap between different modalities brings a significant challenge to multimedia information retrieval.
no code implementations • 25 Jan 2017 • Zhenzhong Lan, Yi Zhu, Alexander G. Hauptmann
We investigate the problem of representing an entire video using CNN features for human action recognition.
no code implementations • 10 Oct 2016 • Liang Zheng, Yi Yang, Alexander G. Hauptmann
Person re-identification (re-ID) has become increasingly popular in the community due to its practical applications and research significance.
Ranked #83 on Person Re-Identification on DukeMTMC-reID
no code implementations • 12 Aug 2016 • Mengyi Liu, Lu Jiang, Shiguang Shan, Alexander G. Hauptmann
Multimedia event detection has been receiving increasing attention in recent years.
no code implementations • 17 Jun 2016 • Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang
The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.
no code implementations • 25 Apr 2016 • Shoou-I Yu, Yi Yang, Xuanchong Li, Alexander G. Hauptmann
Therefore, our tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections.
no code implementations • 14 Jan 2016 • Xiaojun Chang, Yi Yang, Guodong Long, Chengqi Zhang, Alexander G. Hauptmann
In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars.
no code implementations • 11 Dec 2015 • Zhenzhong Lan, Shoou-I Yu, Alexander G. Hauptmann
We propose two well-motivated ranking-based methods to enhance the performance of current state-of-the-art human activity recognition systems.
no code implementations • 16 Nov 2015 • Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann
We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.
no code implementations • 15 Nov 2015 • Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann
In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future.
no code implementations • 15 Oct 2015 • Zhenzhong Lan, Alexander G. Hauptmann
We address the problem of generating video features for action recognition.
no code implementations • 13 Feb 2015 • Zhenzhong Lan, Xuanchong Li, Ming Lin, Alexander G. Hauptmann
Therefore, they need to occur frequently enough in the videos and to be able to distinguish among different types of motions.
no code implementations • CVPR 2015 • Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj
MIFS compensates for information lost from using differential operators by recapturing information at coarse scales.
no code implementations • CVPR 2015 • Zhongwen Xu, Yi Yang, Alexander G. Hauptmann
In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available.
no code implementations • CVPR 2014 • Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander G. Hauptmann
We address the challenging problem of utilizing related exemplars for complex event detection while multiple features are available.
no code implementations • CVPR 2013 • Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann
Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events.