Search Results for author: Minh Tran

Found 33 papers, 14 papers with code

Negative to Positive Co-learning with Aggressive Modality Dropout

1 code implementation • 1 Jan 2025 • Nicholas Magal, Minh Tran, Riku Arakawa, Suzanne Nie

This paper aims to document an effective way to improve multimodal co-learning by using aggressive modality dropout.

A2VIS: Amodal-Aware Approach to Video Instance Segmentation

no code implementations • 2 Dec 2024 • Minh Tran, Thang Pham, Winston Bounsavy, Tri Nguyen, Ngan Le

Through extensive experiments and ablation studies, we show that A2VIS excels in both MOT and VIS tasks in identifying and tracking object instances with a keen understanding of their full shape.

Instance Segmentation, Multiple Object Tracking, +4

Amodal Instance Segmentation with Diffusion Shape Prior Estimation

no code implementations • 26 Sep 2024 • Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le

Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module.

Amodal Instance Segmentation, Object, +2

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

no code implementations • 1 Jun 2024 • Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning disobeys the natural perception that humans do in first-person perspective, leading to a lack of reasoning interpretation; and (2) learning is limited in capturing inherent fine-grained relationships between two modalities.

Action Recognition, Decoder, +7

S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

no code implementations • 7 May 2024 • Minh Tran, Adrian de Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le

To meet this need, we introduce S3Former, designed to segment solar panels from aerial imagery and provide size and location information critical for analyzing the impact of such installations on the grid.

Self-Supervised Learning

CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect

no code implementations • 17 Apr 2024 • Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le

This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement.

Defect Detection

Dyadic Interaction Modeling for Social Behavior Generation

2 code implementations • 14 Mar 2024 • Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani

Hence, an effective model for generating listener nonverbal behaviors requires understanding the dyadic context and interaction.

Contrastive Learning, Diversity, +1

3FM: Multi-modal Meta-learning for Federated Tasks

1 code implementation • 15 Dec 2023 • Minh Tran, Roochi Shah, Zejun Gong

We present a novel approach in the domain of federated learning (FL), particularly focusing on addressing the challenges posed by modality heterogeneity, variability in modality availability across clients, and the prevalent issue of missing data.

Federated Learning, Meta-Learning

SolarFormer: Multi-scale Transformer for Solar PV Profiling

no code implementations • 30 Oct 2023 • Adrian de Luis, Minh Tran, Taisei Hanyu, Anh Tran, Liao Haitao, Roy McCann, Alan Mantooth, Ying Huang, Ngan Le

Accurate mapping of PV installations is crucial for understanding their adoption and informing energy policy.


Privacy-preserving Representation Learning for Speech Understanding

no code implementations • 26 Oct 2023 • Minh Tran, Mohammad Soleymani

In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks.

Classification, Emotion Recognition, +6

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

1 code implementation • 5 Oct 2023 • Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le

Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction.

3D Scene Reconstruction

Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning

no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen

Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.

Privacy Preserving, Representation Learning

AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

1 code implementation • 12 Jun 2023 • Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le

Aerial Image Segmentation is a top-down perspective semantic segmentation and has several challenging characteristics such as strong imbalance in the foreground-background distribution, complex background, intra-class heterogeneity, inter-class homogeneity, and tiny objects.

Decoder, Image Segmentation, +2

Adversarial Representation Learning for Robust Privacy Preservation in Audio

1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen

In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.

Event Detection, Representation Learning, +1

Multi-modal Facial Action Unit Detection with Large Pre-trained Models for the 5th Competition on Affective Behavior Analysis in-the-wild

no code implementations • 19 Mar 2023 • Yufeng Yin, Minh Tran, Di Chang, Xinrui Wang, Mohammad Soleymani

Facial action unit detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising.

Action Unit Detection, Face Alignment, +2

An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies

no code implementations • 7 Mar 2023 • Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi, Sefki Kolozali

Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2 with the highest Score of 74.5% and 53.9%, respectively.

Task 2

Meta Learning for Few-Shot Medical Text Classification

no code implementations • 3 Dec 2022 • Pankaj Sharma, Imran Qureshi, Minh Tran

We investigate the use of meta-learning and robustness techniques on a broad corpus of benchmark text and medical data.

Meta-Learning, text-classification, +1

AISFormer: Amodal Instance Segmentation with Transformer

1 code implementation • 12 Oct 2022 • Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le

AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries.

Amodal Instance Segmentation, Decoder, +2

3DConvCaps: 3DUnet with Convolutional Capsule Encoder for Medical Image Segmentation

1 code implementation • 19 May 2022 • Minh Tran, Viet-Khoa Vo-Ho, Ngan T. H. Le

The capsule network is a recent architecture that achieves better robustness in part-whole representation learning by replacing pooling layers with dynamic routing and convolutional strides, and it has shown promising results on popular tasks such as digit classification and object segmentation.

Decoder, Hippocampus, +5

Scaling Cross-Domain Content-Based Image Retrieval for E-commerce Snap and Search Application

no code implementations • 13 Apr 2022 • Isaac Kwan Yin Chung, Minh Tran, Eran Nussinovitch

In this industry talk at ECIR 2022, we illustrate how we approach the main challenges from large scale cross-domain content-based image retrieval using a cascade method and a combination of our visual search and classification capabilities.

Content-Based Image Retrieval, Retrieval

A Speech Representation Anonymization Framework via Selective Noise Perturbation

1 code implementation • 26 Mar 2022 • Minh Tran, Mohammad Soleymani

Privacy and security are major concerns when communicating speech signals to cloud services such as automatic speech recognition (ASR) and speech emotion recognition (SER).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +7

A Pre-trained Audio-Visual Transformer for Emotion Recognition

no code implementations • 23 Jan 2022 • Minh Tran, Mohammad Soleymani

In this paper, we introduce a pretrained audio-visual Transformer trained on more than 500k utterances from nearly 4000 celebrities from the VoxCeleb2 dataset for human behavior understanding.

Emotion Classification, Emotion Recognition

SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data

no code implementations • 15 Jan 2022 • Minh Tran, Loi Ly, Binh-Son Hua, Ngan Le

The capsule network is a recent deep network architecture that has been applied successfully to medical image segmentation tasks.

Decoder, Hippocampus, +5

Deep Federated Learning for Autonomous Driving

1 code implementation • 12 Oct 2021 • Anh Nguyen, Tuong Do, Minh Tran, Binh X. Nguyen, Chien Duong, Tu Phan, Erman Tjiputra, Quang D. Tran

We design a new Federated Autonomous Driving network (FADNet) that can improve model stability, ensure convergence, and handle imbalanced data distribution problems while being trained with federated learning methods.

Autonomous Driving, Federated Learning

Modeling Dynamics of Facial Behavior for Mental Health Assessment

1 code implementation • 23 Aug 2021 • Minh Tran, Ellen Bradley, Michelle Matvey, Joshua Woolley, Mohammad Soleymani

Facial action unit (FAU) intensities are popular descriptors for the analysis of facial behavior.


Multiple Meta-model Quantifying for Medical Visual Question Answering

2 code implementations • 19 May 2021 • Tuong Do, Binh X. Nguyen, Erman Tjiputra, Minh Tran, Quang D. Tran, Anh Nguyen

However, most of the existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset is not fully utilized.

Medical Visual Question Answering, Meta-Learning, +3

Robust Deep Learning Framework For Predicting Respiratory Anomalies and Diseases

no code implementations • 21 Jan 2020 • Lam Pham, Ian McLoughlin, Huy Phan, Minh Tran, Truc Nguyen, Ramaswamy Palaniappan

This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds.

Deep Learning

Are you really looking at me? A Feature-Extraction Framework for Estimating Interpersonal Eye Gaze from Conventional Video

no code implementations • 21 Jun 2019 • Minh Tran, Taylan Sen, Kurtis Haut, Mohammad Rafayet Ali, Mohammed Ehsan Hoque

Despite a revolution in the pervasiveness of video cameras in our daily lives, one of the most meaningful forms of nonverbal affective communication, interpersonal eye gaze, i.e., eye gaze relative to a conversation partner, is not available from common video.

Clustering, Deception Detection
