1 code implementation • 12 Dec 2022 • Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le
We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human-perception-based contextual representation in video understanding.
no code implementations • 9 Dec 2022 • Hyekang Kevin Joo, Khoa Vo, Kashu Yamazaki, Ngan Le
Video anomaly detection (VAD) -- commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature -- is a challenging problem in video surveillance where anomalous frames need to be localized in an untrimmed video.
1 code implementation • 28 Nov 2022 • Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le
Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling.
Ranked #2 on Video Captioning on ActivityNet Captions
1 code implementation • 12 Oct 2022 • Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le
AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries.
1 code implementation • 5 Oct 2022 • Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le
The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and the surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features through an image-text model.
1 code implementation • 26 Jun 2022 • Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le
In this paper, we leverage the human perception process, which involves interaction between vision and language, to generate a coherent paragraph description of untrimmed videos.
Ranked #3 on Video Captioning on ActivityNet Captions
1 code implementation • 16 Mar 2022 • Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le
Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos, a challenging task that nevertheless plays an important role in many video analysis and understanding tasks.
no code implementations • 16 Mar 2022 • Viet-Khoa Vo-Ho, Kashu Yamazaki, Hieu Hoang, Minh-Triet Tran, Ngan Le
To address such limitations, meta-learning has been adopted in the scenarios of few-shot learning and multiple tasks.
1 code implementation • 21 Oct 2021 • Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le
In this paper, we attempt to simulate that human ability by proposing the Actor Environment Interaction (AEI) network to improve the video representation for temporal action proposal generation.
no code implementations • 25 Aug 2021 • Ngan Le, Vidhiwar Singh Rathour, Kashu Yamazaki, Khoa Luu, Marios Savvides
In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision.
no code implementations • 17 Jul 2021 • Viet-Khoa Vo-Ho, Ngan Le, Kashu Yamazaki, Akihiro Sugimoto, Minh-Triet Tran
Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos.
no code implementations • 16 Mar 2021 • Kashu Yamazaki, Vidhiwar Singh Rathour, T. Hoang Ngan Le
Among many successful network architectures, 3D Unet has been established as a standard architecture for volumetric medical segmentation.
no code implementations • 4 Dec 2020 • Ngan Le, Trung Le, Kashu Yamazaki, Toan Duc Bui, Khoa Luu, Marios Savvides
Our proposed Offset Curves (OsC) loss consists of three main fitting terms.
no code implementations • 3 Dec 2020 • Ngan Le, Kashu Yamazaki, Dat Truong, Kha Gia Quach, Marios Savvides
The first objective is performed by our proposed contextual brain tumor detection network, which acts as an attention gate and focuses only on the region around the brain tumor while ignoring the distant background, which is less correlated with the tumor.