Search Results for author: Son Tran

Found 16 papers, 3 papers with code

M-LLM Based Video Frame Selection for Efficient Video Understanding

no code implementations • 27 Feb 2025 • Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi

Empirical results show that the proposed M-LLM video frame selector improves the performance of various downstream video Large Language Models (video-LLMs) across medium-context (ActivityNet, NExT-QA) and long-context (EgoSchema, LongVideoBench) video question answering benchmarks.

EgoSchema Language Modeling +6

Bringing Multimodality to Amazon Visual Search System

no code implementations • 17 Dec 2024 • Xinliang Zhu, Michael Huang, Han Ding, Jinyu Yang, Kelvin Chen, Tao Zhou, Tal Neiman, Ouye Xie, Son Tran, Benjamin Yao, Doug Gray, Anuj Bindal, Arnab Dhua

Through extensive experiments, we show that this change leads to a substantial improvement on the image-to-image matching problem.

Metric Learning

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

no code implementations • 28 Nov 2024 • Shwetha Ram, Tal Neiman, Qianli Feng, Andrew Stuart, Son Tran, Trishul Chilimbi

As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low subject fidelity but high prompt fidelity and diversity.

Diversity Image Generation +1

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

no code implementations • 18 Jul 2024 • Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul Chilimbi, Mubarak Shah

In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency, detailed visual representations obtained through masked image modeling (MIM) with semantically enriched low-frequency representations captured by contrastive learning (CL).

Contrastive Learning Representation Learning +1

Open Vocabulary Multi-Label Video Classification

no code implementations • 12 Jul 2024 • Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection, and image segmentation.

Action Classification Classification +7

VidLA: Video-Language Alignment at Scale

no code implementations • CVPR 2024 • Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.

Language Modelling Visual Grounding

UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance

no code implementations • 3 Sep 2023 • Son Tran, Cong Tran, Anh Tran, Cuong Pham

In this paper, we push forward the state-of-the-art performance of unsupervised MOT methods by proposing UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information to provide more accurate tracking.

Multi-Object Tracking object-detection +1

SurveyLM: A platform to explore emerging value perspectives in augmented language models' behaviors

no code implementations • 1 Aug 2023 • Steve J. Bickley, Ho Fai Chan, Bang Dao, Benno Torgler, Son Tran

This white paper presents our work on SurveyLM, a platform for analyzing augmented language models' (ALMs) emergent alignment behaviors through their dynamically evolving attitude and value perspectives in complex social contexts.

Survey

Multi-modal Alignment using Representation Codebook

no code implementations • CVPR 2022 • Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

Aligning signals from different modalities is an important step in vision-language representation learning, as it affects the performance of later stages such as cross-modality fusion.

Representation Learning Retrieval

CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation

no code implementations • 24 Sep 2020 • Xinping Liu, Zehong Cao, Son Tran

Word embeddings can reflect semantic representations, and embedding quality can be comprehensively evaluated with cognitive data sources related to natural human reading.

EEG Embeddings Evaluation

Searching for Apparel Products from Images in the Wild

no code implementations • 4 Jul 2019 • Son Tran, Ming Du, Sampath Chanda, R. Manmatha, Cj Taylor

In particular, Instagram and Twitter influencers often post images of themselves wearing different outfits, and their followers are often inspired to buy similar clothes. We propose a system to automatically find the most visually similar clothes in the online catalog (street-to-shop searching).

Descriptive
