Search Results for author: Kashu Yamazaki

Found 16 papers, 8 papers with code

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

no code implementations5 Oct 2023 Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le

Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction.

3D Scene Reconstruction
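The TSDF representation that Open-Fusion uses for fast reconstruction boils down to a per-voxel truncated signed distance maintained as a weighted running average across depth frames. A minimal NumPy sketch of that classic fusion update (function name, array layout, and truncation distance are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def tsdf_update(tsdf, weights, signed_dist, trunc=0.05):
    """Fuse one depth observation per voxel into a running TSDF average.

    `signed_dist` holds, per voxel, the measured surface depth minus the
    voxel's depth along the camera ray (signed distance to the surface).
    """
    # Truncate the signed distance to [-trunc, trunc] and normalize to [-1, 1].
    d = np.clip(signed_dist, -trunc, trunc) / trunc
    # Weighted running average, as in classic volumetric depth-map fusion.
    new_w = weights + 1.0
    new_tsdf = (tsdf * weights + d) / new_w
    return new_tsdf, new_w
```

Each call folds one frame into the volume; the zero-crossing of the fused field is the reconstructed surface.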

AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

no code implementations12 Jun 2023 Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le

Aerial image segmentation is semantic segmentation from a top-down perspective and has several challenging characteristics: a strong foreground-background imbalance, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects.

Image Segmentation Segmentation +1

Contextual Explainable Video Representation: Human Perception-based Understanding

1 code implementation12 Dec 2022 Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le

We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding.

Action Detection Action Recognition +4

CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

1 code implementation9 Dec 2022 Hyekang Kevin Joo, Khoa Vo, Kashu Yamazaki, Ngan Le

Video anomaly detection (VAD), commonly formulated as a weakly-supervised multiple-instance learning problem due to the labor-intensive nature of frame-level annotation, is a challenging problem in video surveillance, where anomalous frames must be localized in an untrimmed video.

Anomaly Detection Multiple Instance Learning +1
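The weakly-supervised MIL formulation above is often implemented as a hinge ranking between bag maxima: the highest-scoring snippet of an anomalous video (a bag known to contain at least one anomaly) should outscore the highest-scoring snippet of a normal video. A minimal sketch of that classic loss (not necessarily CLIP-TSA's exact objective; the names and margin are illustrative):

```python
import numpy as np

def mil_ranking_loss(anomalous_scores, normal_scores, margin=1.0):
    """Hinge ranking loss between the top-scoring snippet of an anomalous
    video and the top-scoring snippet of a normal video.

    Only video-level labels are needed: the max over each bag stands in
    for the (unknown) snippet-level labels.
    """
    return max(0.0, margin - np.max(anomalous_scores) + np.max(normal_scores))
```

Minimizing this over pairs of videos pushes at least one snippet per anomalous video toward a high score while suppressing scores everywhere in normal videos.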

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

1 code implementation28 Nov 2022 Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video, covering several temporal event locations in a coherent storytelling manner.

Sentence Video Captioning

AISFormer: Amodal Instance Segmentation with Transformer

1 code implementation12 Oct 2022 Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le

AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries.

Amodal Instance Segmentation Segmentation +1

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

1 code implementation5 Oct 2022 Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le

The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features through an image-text model.

Action Detection Temporal Action Proposal Generation

VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

1 code implementation26 Jun 2022 Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le

In this paper, we leverage the human perceiving process, which involves vision and language interaction, to generate a coherent paragraph description of untrimmed videos.

Contrastive Learning Video Captioning
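Contrastive vision-language training of this kind typically pulls matched vision/language embedding pairs together and pushes mismatched pairs apart with a symmetric InfoNCE loss over a batch. A hedged NumPy sketch of that generic loss (not necessarily VLCap's exact formulation; the temperature value is illustrative):

```python
import numpy as np

def info_nce(vision, language, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of `vision` and row i of `language` are a matched pair; the
    diagonal of the similarity matrix is the target in both directions.
    """
    # L2-normalize so the dot product is a cosine similarity.
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    l = language / np.linalg.norm(language, axis=1, keepdims=True)
    logits = v @ l.T / temperature

    def xent_diag(z):
        # Cross-entropy with the diagonal as the target class per row.
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

Perfectly matched pairs drive the loss toward zero; shuffling the pairing makes it large, which is what pushes the two modalities into a shared embedding space.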

Meta-Learning of NAS for Few-shot Learning in Medical Image Applications

no code implementations16 Mar 2022 Viet-Khoa Vo-Ho, Kashu Yamazaki, Hieu Hoang, Minh-Triet Tran, Ngan Le

To address such limitations, meta-learning has been adopted in the scenarios of few-shot learning and multiple tasks.

Few-Shot Learning Image Classification +1

ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

1 code implementation16 Mar 2022 Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le

Temporal action proposal generation (TAPG) aims to estimate the temporal intervals of actions in untrimmed videos, which is challenging yet plays an important role in many video analysis and understanding tasks.

Action Detection Temporal Action Proposal Generation

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

1 code implementation21 Oct 2021 Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le

In this paper, we attempt to simulate this human ability by proposing the Actors-Environment Interaction (AEI) network to improve the video representation for temporal action proposal generation.

Action Detection Temporal Action Proposal Generation

Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey

no code implementations25 Aug 2021 Ngan Le, Vidhiwar Singh Rathour, Kashu Yamazaki, Khoa Luu, Marios Savvides

In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision.

Image Segmentation object-detection +5

Agent-Environment Network for Temporal Action Proposal Generation

no code implementations17 Jul 2021 Viet-Khoa Vo-Ho, Ngan Le, Kashu Yamazaki, Akihiro Sugimoto, Minh-Triet Tran

Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos.

Temporal Action Proposal Generation

Invertible Residual Network with Regularization for Effective Medical Image Segmentation

no code implementations16 Mar 2021 Kashu Yamazaki, Vidhiwar Singh Rathour, T. Hoang Ngan Le

Among many successful network architectures, the 3D U-Net has been established as a standard architecture for volumetric medical segmentation.

Image Segmentation Medical Image Segmentation +2

A Multi-task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

no code implementations3 Dec 2020 Ngan Le, Kashu Yamazaki, Dat Truong, Kha Gia Quach, Marios Savvides

The first objective is performed by our proposed contextual brain tumor detection network, which plays the role of an attention gate, focusing only on the region around the brain tumor while ignoring the distant background, which is less correlated with the tumor.

Brain Tumor Segmentation Tumor Segmentation
