1 code implementation • 12 Dec 2022 • Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le
We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human-perception-based contextual representation in video understanding.
no code implementations • 9 Dec 2022 • Hyekang Kevin Joo, Khoa Vo, Kashu Yamazaki, Ngan Le
Video anomaly detection (VAD) is a challenging problem in video surveillance in which anomalous frames must be localized in an untrimmed video; because frame-level annotation is labor-intensive, VAD is commonly formulated as a multiple-instance learning problem under weak supervision.
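As a rough illustration of this weakly supervised formulation, the sketch below implements the standard MIL ranking objective (in the style of Sultani et al.): each video is a bag of segments, and the top-scoring segment of an anomalous bag is pushed above the top-scoring segment of a normal bag. The network shape, feature dimension, and margin are illustrative assumptions, not the configuration used in this paper.

```python
# Minimal sketch of the multiple-instance ranking objective commonly used in
# weakly-supervised VAD; segment counts, dimensions, and margin are illustrative.
import torch
import torch.nn as nn

class AnomalyScorer(nn.Module):
    """Maps per-segment features to anomaly scores in [0, 1]."""
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (num_segments, feat_dim) -> (num_segments,) scores
        return self.mlp(segments).squeeze(-1)

def mil_ranking_loss(scores_anomalous, scores_normal, margin: float = 1.0):
    # Only video-level labels are known: the highest-scoring segment of an
    # anomalous video should outrank the highest-scoring segment of a normal
    # video by at least `margin`.
    return torch.relu(margin - scores_anomalous.max() + scores_normal.max())

scorer = AnomalyScorer()
loss = mil_ranking_loss(scorer(torch.randn(32, 1024)),   # anomalous bag
                        scorer(torch.randn(32, 1024)))   # normal bag
```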
1 code implementation • 28 Nov 2022 • Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le
Video paragraph captioning aims to generate a coherent, multi-sentence story describing an untrimmed video that contains several temporal event locations.
Ranked #2 on Video Captioning on ActivityNet Captions
1 code implementation • 12 Oct 2022 • Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le
AISFormer explicitly models the complex coherence between the occluder, visible, amodal, and invisible masks within an object's region of interest by treating them as learnable queries.
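A minimal sketch of the learnable-query idea, assuming a generic transformer decoder over ROI features with one query per mask type; the dimensions and decoder depth are hypothetical, and this is not the authors' implementation.

```python
# Hypothetical sketch of mask prediction with learnable queries, in the spirit
# of AISFormer: one query per mask type (occluder, visible, amodal, invisible)
# attends to an object's ROI features and is decoded into a mask logit map.
import torch
import torch.nn as nn

class QueryMaskHead(nn.Module):
    def __init__(self, dim: int = 256, num_queries: int = 4):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)  # one query per mask type
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (B, C, H, W) features of an object's region of interest.
        b, c, h, w = roi_feats.shape
        memory = roi_feats.flatten(2).transpose(1, 2)         # (B, H*W, C)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        q = self.decoder(q, memory)                           # (B, 4, C)
        # Dot-product of each refined query with per-pixel embeddings
        # yields one mask logit map per mask type.
        return torch.einsum("bqc,bchw->bqhw", q, roi_feats)   # (B, 4, H, W)

masks = QueryMaskHead()(torch.randn(2, 256, 14, 14))  # -> (2, 4, 14, 14)
```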
1 code implementation • 5 Oct 2022 • Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le
The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and surrounding environment are captured by visual information, while relevant objects are described by linguistic features obtained through an image-text model.
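A small sketch of how such a visual-linguistic snippet feature might be assembled, assuming OpenAI's CLIP as the image-text model; the object vocabulary, pooling, and concatenation scheme are illustrative assumptions rather than the exact PMR design.

```python
# Illustrative sketch: fuse a visual snippet feature with linguistic features
# for relevant objects, using CLIP's text encoder as the image-text model.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

object_names = ["ball", "bicycle", "table"]  # hypothetical relevant objects
with torch.no_grad():
    tokens = clip.tokenize([f"a photo of a {o}" for o in object_names]).to(device)
    # Mean-pool the per-object text embeddings into one linguistic feature.
    linguistic = model.encode_text(tokens).float().mean(dim=0)      # (512,)

# Placeholder for the actor/environment feature from a visual backbone.
visual = torch.randn(2048, device=device)
snippet_feature = torch.cat([visual, linguistic], dim=0)  # visual-linguistic
```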
1 code implementation • 26 Jun 2022 • Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le
In this paper, we leverage the human perceiving process, which involves vision-language interaction, to generate a coherent paragraph description of untrimmed videos.
Ranked #3 on Video Captioning on ActivityNet Captions
1 code implementation • 16 Mar 2022 • Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le
Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos; it is challenging yet plays an important role in many video analysis and understanding tasks.
1 code implementation • 21 Oct 2021 • Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le
In this paper, we attempt to simulate this human ability by proposing the Actor-Environment Interaction (AEI) network, which improves video representations for temporal action proposal generation.
no code implementations • CVPR 2021 • Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, Dragomir Anguelov
While current 3D object recognition research mostly focuses on the real-time, onboard scenario, there are many offboard use cases of perception that are largely under-explored, such as using machines to automatically generate high-quality 3D labels.