Search Results for author: Alexander Hauptmann

Found 47 papers, 19 papers with code

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

1 code implementation ECCV 2020 Junwei Liang, Lu Jiang, Alexander Hauptmann

We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.

Adversarial Attack Adversarial Defense +2

Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony

no code implementations 18 Aug 2024 Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander Hauptmann

For the former, we propose to pre-train on data regarding a fixed identity with neutral emotion, and defer the incorporation of customizable conditions (identity and emotion) to fine-tuning stage, which is boosted by our novel X-Adapter for parameter-efficient fine-tuning.

Motion Generation parameter-efficient fine-tuning

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

no code implementations 18 Jul 2024 Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding.

3D Semantic Segmentation Visual Grounding

Multimodal Reranking for Knowledge-Intensive Visual Question Answering

no code implementations 17 Jul 2024 Haoyang Wen, Honglei Zhuang, Hamed Zamani, Alexander Hauptmann, Michael Bendersky

Besides, the two-tower architecture also limits the relevance score modeling of a retriever to select top candidates for answer generator reasoning.

Answer Generation Question Answering +1

Learning Visual-Semantic Subspace Representations for Propositional Reasoning

no code implementations 25 May 2024 Gabriel Moreira, Alexander Hauptmann, Manuel Marques, João Paulo Costeira

Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge.

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation 1 Apr 2024 Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), have been shown to be effective in enhancing the generalization abilities of large language models (LLMs).

Instruction Following Language Modelling +3

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

1 code implementation 3 Nov 2023 Changdae Oh, Hyesu Lim, Mijoo Kim, Dongyoon Han, Sangdoo Yun, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song

Improving out-of-distribution (OOD) generalization during in-distribution (ID) adaptation is a primary goal of robust fine-tuning of zero-shot models beyond naive fine-tuning.

Autonomous Driving Medical Diagnosis

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin

no code implementations 18 Sep 2023 Gabriel Moreira, Manuel Marques, João Paulo Costeira, Alexander Hauptmann

Recent research in representation learning has shown that hierarchical data lends itself to low-dimensional and highly informative representations in hyperbolic space.

Few-Shot Learning Representation Learning

Spatial-Temporal Alignment Network for Action Recognition and Detection

no code implementations 4 Dec 2020 Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann

The experimental results show that the STAN model can consistently improve on the state of the art in both action detection and action recognition tasks.

Action Detection Action Recognition

Event-Related Bias Removal for Real-time Disaster Events

no code implementations Findings of the Association for Computational Linguistics 2020 Evangelia Spiliopoulou, Salvador Medina Maza, Eduard Hovy, Alexander Hauptmann

Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis.

General Classification

Support-set bottlenecks for video-text representation learning

no code implementations ICLR 2021 Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.

Contrastive Learning Representation Learning +3

Robust Long-Term Object Tracking via Improved Discriminative Model Prediction

1 code implementation 11 Aug 2020 Seokeon Choi, Junhyun Lee, Yunsung Lee, Alexander Hauptmann

We propose an improved discriminative model prediction method for robust long-term tracking based on a pre-trained short-term tracker.

Object Tracking

From A Glance to "Gotcha": Interactive Facial Image Retrieval with Progressive Relevance Feedback

no code implementations 30 Jul 2020 Xinru Yang, Haozhi Qi, Mingyang Li, Alexander Hauptmann

Facial image retrieval plays a significant role in forensic investigations where an untrained witness tries to identify a suspect from a massive pool of images.

Face Image Retrieval Retrieval

ZSTAD: Zero-Shot Temporal Activity Detection

no code implementations CVPR 2020 Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, ZongYuan Ge, Alexander Hauptmann

An integral part of video analysis and surveillance is temporal activity detection, which means to simultaneously recognize and localize activities in long untrimmed videos.

Action Detection Activity Detection

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

1 code implementation CVPR 2020 Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.

Autonomous Driving Human motion prediction +5

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

no code implementations IJCNLP 2019 Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann

With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.

Diversity Image Retrieval +3

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations 17 Sep 2019 Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Learning Spatial Awareness to Improve Crowd Counting

no code implementations ICCV 2019 Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent, thus can hardly be integrated into the deep learning framework.

Crowd Counting Weakly-supervised Learning

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations 11 Jul 2019 Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a METEOR score of 9.91 on the challenge testing set.

Dense Captioning Dense Video Captioning +1

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data

no code implementations 2 Jun 2019 Shizhe Chen, Qin Jin, Alexander Hauptmann

The linguistic feature is learned from the sentence contexts with visual semantic constraints, which helps in learning translations for words that are less visually relevant.

Bilingual Lexicon Induction Sentence +2

ExCL: Extractive Clip Localization Using Natural Language Descriptions

1 code implementation NAACL 2019 Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann

The task of retrieving clips within videos based on a given natural language query requires cross-modal reasoning over multiple frames.

Perceiving Physical Equation by Observing Visual Scenarios

no code implementations 29 Nov 2018 Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann

To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.

Traffic Danger Recognition With Surveillance Cameras Without Training Data

no code implementations 29 Nov 2018 Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann

Therefore, we developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.

3D Reconstruction Position

CADP: A Novel Dataset for CCTV Traffic Camera based Accident Analysis

1 code implementation 16 Sep 2018 Ankit Shah, Jean Baptiste Lamare, Tuan Nguyen Anh, Alexander Hauptmann

Our experiments indicate a considerable improvement in object detection accuracy: +8.51% for CM and +6.20% for ACM.

Object object-detection +2

Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance

1 code implementation 22 Aug 2018 Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

In this work, we explore cross-scale similarity in the crowd counting scenario, in which regions of different scales often exhibit high visual similarity.

Crowd Counting Density Estimation

RUC+CMU: System Report for Dense Captioning Events in Videos

no code implementations 22 Jun 2018 Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann

This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3).

Caption Generation Dense Captioning +1

Focal Visual-Text Attention for Visual Question Answering

2 code implementations CVPR 2018 Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Memex Question Answering Question Answering +1

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning

no code implementations 19 Apr 2018 Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.

Attribute Neural Architecture Search

Video Captioning with Guidance of Multimodal Latent Topics

no code implementations 31 Aug 2017 Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann

For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos.

Caption Generation Decoder +2

MemexQA: Visual Memex Question Answering

1 code implementation 4 Aug 2017 Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering Question Answering +1

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

1 code implementation 16 Jul 2016 Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann

Learning video concept detectors automatically from large but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and machine learning communities.

BIG-bench Machine Learning

The Solution Path Algorithm for Identity-Aware Multi-Object Tracking

no code implementations CVPR 2016 Shoou-I Yu, Deyu Meng, WangMeng Zuo, Alexander Hauptmann

The tracker is formulated as a quadratic optimization problem with L0 norm constraints, which we propose to solve with the solution path algorithm.

Active Learning Decision Making +2

The Best of Both Worlds: Combining Data-independent and Data-driven Approaches for Action Recognition

no code implementations 17 May 2015 Zhenzhong Lan, Dezhong Yao, Ming Lin, Shoou-I Yu, Alexander Hauptmann

First, we propose a two-stream Stacked Convolutional Independent Subspace Analysis (ConvISA) architecture to show that unsupervised learning methods can significantly boost the performance of traditional local features extracted from data-independent models.

Action Recognition Multi-class Classification +3

Self-Paced Learning with Diversity

no code implementations NeurIPS 2014 Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann

Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training.

Diversity

Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

no code implementations CVPR 2013 Shoou-I Yu, Yi Yang, Alexander Hauptmann

A device just like Harry Potter's Marauder's Map, which pinpoints the location of each person-of-interest at all times, provides invaluable information for analysis of surveillance videos.

Face Recognition Human Detection
