1 code implementation • ECCV 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.
Ranked #1 on Trajectory Forecasting on ActEV
1 code implementation • 22 Oct 2024 • Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You
Hopefully, more researchers will be inspired and encouraged to improve the practicality and efficacy of DD.
no code implementations • 18 Aug 2024 • Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander Hauptmann
For the former, we propose to pre-train on data regarding a fixed identity with neutral emotion, and defer the incorporation of customizable conditions (identity and emotion) to the fine-tuning stage, which is boosted by our novel X-Adapter for parameter-efficient fine-tuning.
no code implementations • 18 Jul 2024 • Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding.
no code implementations • 17 Jul 2024 • Haoyang Wen, Honglei Zhuang, Hamed Zamani, Alexander Hauptmann, Michael Bendersky
Besides, the two-tower architecture also limits the relevance score modeling of a retriever to select top candidates for answer generator reasoning.
1 code implementation • 17 Jun 2024 • Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann
Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling.
no code implementations • 25 May 2024 • Gabriel Moreira, Alexander Hauptmann, Manuel Marques, João Paulo Costeira
Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge.
1 code implementation • 1 Apr 2024 • Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang
Preference modeling techniques, such as direct preference optimization (DPO), have been shown to be effective in enhancing the generalization abilities of large language models (LLMs).
1 code implementation • 3 Nov 2023 • Changdae Oh, Hyesu Lim, Mijoo Kim, Dongyoon Han, Sangdoo Yun, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song
Improving out-of-distribution (OOD) generalization during in-distribution (ID) adaptation is a primary goal of robust fine-tuning of zero-shot models beyond naive fine-tuning.
no code implementations • 18 Sep 2023 • Gabriel Moreira, Manuel Marques, João Paulo Costeira, Alexander Hauptmann
Recent research in representation learning has shown that hierarchical data lends itself to low-dimensional and highly informative representations in hyperbolic space.
1 code implementation • CVPR 2023 • Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann
The model uses a hierarchical transformer with intra-frame off-set attention and inter-frame self-attention.
1 code implementation • NAACL 2021 • Po-Yao Huang, Mandela Patrick, Junjie Hu, Graham Neubig, Florian Metze, Alexander Hauptmann
Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextualized multilingual multimodal embeddings.
no code implementations • ICCV 2021 • Liangke Gui, Adrien Bardes, Ruslan Salakhutdinov, Alexander Hauptmann, Martial Hebert, Yu-Xiong Wang
Learning to hallucinate additional examples has recently been shown to be a promising direction for addressing few-shot learning tasks.
no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann
The experimental results show that the STAN model can consistently improve on the state of the art in both action detection and action recognition tasks.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Evangelia Spiliopoulou, Salvador Medina Maza, Eduard Hovy, Alexander Hauptmann
Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis.
no code implementations • ICLR 2021 • Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi
The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.
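The pull-together, push-apart behaviour described above can be illustrated with a minimal sketch. It assumes a softmax (InfoNCE-style) formulation over a square similarity matrix whose diagonal holds the related video-text pairs; the function name `contrastive_loss` and the `temperature` parameter are hypothetical choices for this illustration, not taken from the listed paper.

```python
import math


def contrastive_loss(sim, temperature=0.1):
    """Symmetric softmax contrastive loss over a square similarity matrix.

    sim[i][j] is the similarity between video i and text j. Diagonal
    entries are the related (positive) pairs; all others are negatives.
    This formulation is an assumption made for illustration.
    """
    n = len(sim)

    def one_direction(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            # Negative log-probability assigned to the positive pair.
            total += log_z - logits[i]
        return total / n

    columns = [list(col) for col in zip(*sim)]
    # Average the video-to-text and text-to-video directions.
    return 0.5 * (one_direction(sim) + one_direction(columns))


aligned = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_loss(aligned) < contrastive_loss(mismatched))
```

The loss is small when each positive pair is the most similar entry in its row and column, and large when a negative pair scores higher than the positive one.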
1 code implementation • 11 Aug 2020 • Seokeon Choi, Junhyun Lee, Yunsung Lee, Alexander Hauptmann
We propose an improved discriminative model prediction method for robust long-term tracking based on a pre-trained short-term tracker.
no code implementations • 30 Jul 2020 • Xinru Yang, Haozhi Qi, Mingyang Li, Alexander Hauptmann
Facial image retrieval plays a significant role in forensic investigations where an untrained witness tries to identify a suspect from a massive pool of images.
1 code implementation • 30 Jun 2020 • Xiaoyu Zhu, Junwei Liang, Alexander Hauptmann
This provides the first benchmark for quantitative evaluation of models to assess building damage using aerial videos.
no code implementations • ACL 2020 • Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann
In this paper, we investigate how to utilize visual content for disambiguation and for promoting latent space alignment in unsupervised MMT.
1 code implementation • 4 Apr 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We refer to our method as SimAug.
Ranked #2 on Trajectory Prediction on ActEV
no code implementations • CVPR 2020 • Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, ZongYuan Ge, Alexander Hauptmann
An integral part of video analysis and surveillance is temporal activity detection, that is, simultaneously recognizing and localizing activities in long untrimmed videos.
no code implementations • 29 Jan 2020 • Zhong Zhou, Isak Czeresnia Etinger, Florian Metze, Alexander Hauptmann, Alexander Waibel
We have interesting results both in bounding the shooter and in detecting the gun smoke.
1 code implementation • CVPR 2020 • Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann
The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.
Ranked #1 on Multi-future Trajectory Prediction on ForkingPaths
no code implementations • IJCNLP 2019 • Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann
With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.
no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann
By minimizing the mutual information, each column is guided to learn features with different image scales.
no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann
Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into a deep learning framework.
Ranked #5 on Crowd Counting on WorldExpo’10
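As a rough illustration of the MESA idea mentioned in this entry, the sketch below searches every axis-aligned rectangle of two small density maps and returns the largest absolute difference in summed density. It is a brute-force reading of the one-sentence description above, with a hypothetical function name `mesa`, and is not the authors' implementation.

```python
def mesa(pred, gt):
    """Brute-force Maximum Excess over SubArrays for two equal-size 2D grids.

    Returns the largest absolute difference between the predicted and
    ground-truth density summed over any axis-aligned rectangle.
    """
    h, w = len(pred), len(pred[0])
    # Integral image of the signed difference, padded with a zero row/column.
    integ = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            integ[y + 1][x + 1] = (
                pred[y][x] - gt[y][x]
                + integ[y][x + 1] + integ[y + 1][x] - integ[y][x]
            )
    best = 0.0
    for y0 in range(h):
        for y1 in range(y0 + 1, h + 1):
            for x0 in range(w):
                for x1 in range(x0 + 1, w + 1):
                    diff = (integ[y1][x1] - integ[y0][x1]
                            - integ[y1][x0] + integ[y0][x0])
                    best = max(best, abs(diff))
    return best


pred = [[1, 0], [0, 0]]
gt = [[0, 0], [0, 1]]
print(mesa(pred, gt))
```

In this toy case both maps sum to the same total count, yet the value is nonzero because the density is placed in the wrong cell. The exhaustive search over rectangles is also why the measure is awkward to use directly as a training loss.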
no code implementations • 11 Jul 2019 • Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann
The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a METEOR score of 9.91 on the challenge testing set.
no code implementations • 2 Jun 2019 • Shizhe Chen, Qin Jin, Alexander Hauptmann
The linguistic feature is learned from the sentence contexts with visual semantic constraints, which benefits learning translations for words that are less visually relevant.
2 code implementations • 26 May 2019 • Junwei Liang, Jay D. Aronson, Alexander Hauptmann
Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.
1 code implementation • NAACL 2019 • Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann
The task of retrieving clips within videos based on a given natural language query requires cross-modal reasoning over multiple frames.
2 code implementations • CVPR 2019 • Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei
To facilitate training, the network is trained with an auxiliary task of predicting the future location in which the activity will happen.
Ranked #1 on Activity Prediction on ActEV
no code implementations • 29 Nov 2018 • Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann
To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.
no code implementations • 29 Nov 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann
Therefore, we developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.
1 code implementation • 16 Sep 2018 • Ankit Shah, Jean Baptiste Lamare, Tuan Nguyen Anh, Alexander Hauptmann
Our experiments indicate a considerable improvement in object detection accuracy: +8.51% for CM and +6.20% for ACM.
no code implementations • 1 Sep 2018 • Ankit Shah, Harini Kesavamoorthy, Poorva Rane, Pramati Kalwad, Alexander Hauptmann, Florian Metze
Moments capture a huge part of our lives.
1 code implementation • 22 Aug 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
In this work, we explore cross-scale similarity in the crowd counting scenario, in which regions of different scales often exhibit high visual similarity.
no code implementations • 22 Jun 2018 • Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann
This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3).
2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.
Ranked #1 on Memex Question Answering on MemexQA
no code implementations • 19 Apr 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.
no code implementations • 31 Aug 2017 • Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos.
1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.
1 code implementation • 16 Jul 2016 • Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann
Learning video concept detectors automatically from big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and machine learning communities.
no code implementations • CVPR 2016 • Shoou-I Yu, Deyu Meng, WangMeng Zuo, Alexander Hauptmann
The tracker is formulated as a quadratic optimization problem with L0 norm constraints, which we propose to solve with the solution path algorithm.
no code implementations • 17 May 2015 • Zhenzhong Lan, Dezhong Yao, Ming Lin, Shoou-I Yu, Alexander Hauptmann
First, we propose a two-stream Stacked Convolutional Independent Subspace Analysis (ConvISA) architecture to show that unsupervised learning methods can significantly boost the performance of traditional local features extracted from data-independent models.
no code implementations • NeurIPS 2014 • Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann
Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training.
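The easy-to-hard schedule described above can be sketched as follows, assuming the common hard-threshold variant in which a sample is included once its loss falls below a growing threshold. The names `self_paced_weights` and `self_paced_schedule`, and the `start`, `growth`, and `rounds` parameters, are hypothetical; in a real training loop the losses would be recomputed after each round of model updates, whereas here they are held fixed for illustration.

```python
def self_paced_weights(losses, threshold):
    """Binary sample weights: 1 for 'easy' samples (loss below threshold), else 0."""
    return [1 if loss < threshold else 0 for loss in losses]


def self_paced_schedule(losses, start=0.5, growth=2.0, rounds=3):
    """Return the indices of selected samples for each round as the threshold grows.

    Losses are held fixed here; an actual learner would retrain on the
    selected samples and recompute the losses between rounds.
    """
    schedule = []
    threshold = start
    for _ in range(rounds):
        weights = self_paced_weights(losses, threshold)
        schedule.append([i for i, w in enumerate(weights) if w])
        threshold *= growth  # admit harder samples in the next round
    return schedule


losses = [0.1, 0.7, 1.5, 0.3]
print(self_paced_schedule(losses))
```

Each round admits every sample from the previous round plus those that the larger threshold newly allows, so the training set grows from easy to hard.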
no code implementations • CVPR 2013 • Shoou-I Yu, Yi Yang, Alexander Hauptmann
A device just like Harry Potter's Marauder's Map, which pinpoints the location of each person-of-interest at all times, provides invaluable information for analysis of surveillance videos.