1 code implementation • 1 Jan 2025 • Nicholas Magal, Minh Tran, Riku Arakawa, Suzanne Nie
This paper aims to document an effective way to improve multimodal co-learning by using aggressive modality dropout.
no code implementations • 2 Dec 2024 • Minh Tran, Thang Pham, Winston Bounsavy, Tri Nguyen, Ngan Le
Through extensive experiments and ablation studies, we show that A2VIS excels in both MOT and VIS tasks in identifying and tracking object instances with a keen understanding of their full shape.
no code implementations • 26 Sep 2024 • Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le
Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module.
no code implementations • 1 Jun 2024 • Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le
Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning departs from the way humans naturally perceive from a first-person perspective, making the reasoning hard to interpret; and (2) learning is limited in capturing the inherent fine-grained relationships between the two modalities.
no code implementations • 7 May 2024 • Minh Tran, Adrian de Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le
To meet this need, we introduce S3Former, designed to segment solar panels from aerial imagery and provide size and location information critical for analyzing the impact of such installations on the grid.
no code implementations • 17 Apr 2024 • Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le
This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement.
1 code implementation • 18 Mar 2024 • Minh Tran, Winston Bounsavy, Khoa Vo, Anh Nguyen, Tri Nguyen, Ngan Le
Consequently, this compromises the quality of visible features during the subsequent visible-to-amodal transition.
2 code implementations • 14 Mar 2024 • Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani
Hence, an effective model for generating listener nonverbal behaviors requires understanding the dyadic context and interaction.
1 code implementation • 15 Dec 2023 • Minh Tran, Roochi Shah, Zejun Gong
We present a novel approach in the domain of federated learning (FL), particularly focusing on addressing the challenges posed by modality heterogeneity, variability in modality availability across clients, and the prevalent issue of missing data.
no code implementations • 30 Oct 2023 • Adrian de Luis, Minh Tran, Taisei Hanyu, Anh Tran, Liao Haitao, Roy McCann, Alan Mantooth, Ying Huang, Ngan Le
Accurate mapping of PV installations is crucial for understanding their adoption and informing energy policy.
no code implementations • 26 Oct 2023 • Minh Tran, Mohammad Soleymani
In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks.
1 code implementation • 5 Oct 2023 • Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le
Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction.
no code implementations • 5 Sep 2023 • Minh Tran, Yufeng Yin, Mohammad Soleymani
There are individual differences in expressive behaviors driven by cultural norms and personality.
1 code implementation • 18 Aug 2023 • Di Chang, Yufeng Yin, Zongjian Li, Minh Tran, Mohammad Soleymani
Facial expression analysis is an important tool for human-computer interaction.
no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen
Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.
1 code implementation • 12 Jun 2023 • Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le
Aerial image segmentation is semantic segmentation from a top-down perspective, and it has several challenging characteristics such as strong imbalance in the foreground-background distribution, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects.
Ranked #1 on Semantic Segmentation on ISPRS Potsdam
1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen
In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.
no code implementations • 19 Mar 2023 • Yufeng Yin, Minh Tran, Di Chang, Xinrui Wang, Mohammad Soleymani
Facial action unit detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising.
no code implementations • 7 Mar 2023 • Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi, Sefki Kolozali
Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2 with the highest Score of 74.5% and 53.9%, respectively.
no code implementations • 3 Dec 2022 • Pankaj Sharma, Imran Qureshi, Minh Tran
We investigate the use of meta-learning and robustness techniques on a broad corpus of benchmark text and medical data.
1 code implementation • 12 Oct 2022 • Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le
AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries.
1 code implementation • 19 May 2022 • Minh Tran, Viet-Khoa Vo-Ho, Ngan T. H. Le
Capsule network is a recent architecture that achieves better robustness in part-whole representation learning by replacing pooling layers with dynamic routing and convolutional strides, and it has shown promising results on popular tasks such as digit classification and object segmentation.
no code implementations • 13 Apr 2022 • Isaac Kwan Yin Chung, Minh Tran, Eran Nussinovitch
In this industry talk at ECIR 2022, we illustrate how we approach the main challenges of large-scale cross-domain content-based image retrieval using a cascade method and a combination of our visual search and classification capabilities.
1 code implementation • 26 Mar 2022 • Minh Tran, Mohammad Soleymani
Privacy and security are major concerns when communicating speech signals to cloud services such as automatic speech recognition (ASR) and speech emotion recognition (SER).
no code implementations • 16 Mar 2022 • Minh Tran, Viet-Khoa Vo-Ho, Kyle Quinn, Hien Nguyen, Khoa Luu, Ngan Le
We then provide recent developments of CapsNet for the task of medical image segmentation.
no code implementations • 23 Jan 2022 • Minh Tran, Mohammad Soleymani
In this paper, we introduce a pretrained audio-visual Transformer trained on more than 500k utterances from nearly 4000 celebrities from the VoxCeleb2 dataset for human behavior understanding.
no code implementations • 15 Jan 2022 • Minh Tran, Loi Ly, Binh-Son Hua, Ngan Le
Capsule network is a recent deep network architecture that has been applied successfully to medical image segmentation tasks.
1 code implementation • 12 Oct 2021 • Anh Nguyen, Tuong Do, Minh Tran, Binh X. Nguyen, Chien Duong, Tu Phan, Erman Tjiputra, Quang D. Tran
We design a new Federated Autonomous Driving network (FADNet) that can improve model stability, ensure convergence, and handle imbalanced data distribution problems while being trained with federated learning methods.
1 code implementation • 23 Aug 2021 • Minh Tran, Ellen Bradley, Michelle Matvey, Joshua Woolley, Mohammad Soleymani
Facial action unit (FAU) intensities are popular descriptors for the analysis of facial behavior.
2 code implementations • 19 May 2021 • Tuong Do, Binh X. Nguyen, Erman Tjiputra, Minh Tran, Quang D. Tran, Anh Nguyen
However, most of the existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset is not fully utilized.
Ranked #5 on Medical Visual Question Answering on PathVQA
no code implementations • COLING 2020 • Minh Tran, YiPeng Zhang, Mohammad Soleymani
Offensive and abusive language is a pressing problem on social media platforms.
no code implementations • 21 Jan 2020 • Lam Pham, Ian McLoughlin, Huy Phan, Minh Tran, Truc Nguyen, Ramaswamy Palaniappan
This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds.
no code implementations • 21 Jun 2019 • Minh Tran, Taylan Sen, Kurtis Haut, Mohammad Rafayet Ali, Mohammed Ehsan Hoque
Despite a revolution in the pervasiveness of video cameras in our daily lives, one of the most meaningful forms of nonverbal affective communication, interpersonal eye gaze, i.e., eye gaze relative to a conversation partner, is not available from common video.