Search Results for author: Trishul Chilimbi

Found 19 papers, 3 papers with code

Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks

no code implementations NAACL (ACL) 2022 Weiyi Lu, Sunny Rajagopalan, Priyanka Nigam, Jaspreet Singh, Xiaodi Sun, Yi Xu, Belinda Zeng, Trishul Chilimbi

However, one issue that often arises in MTL is that convergence speed varies between tasks due to differences in task difficulty, so it can be challenging to achieve the best performance on all tasks simultaneously with a single model checkpoint; a minimal sketch of the distillation-from-converged-tasks idea follows this entry.

Knowledge Distillation Multi-Task Learning
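The sketch below illustrates the general recipe suggested by the title and abstract above, assuming a shared encoder with per-task heads: once a task is judged converged, a frozen snapshot of the model acts as a teacher for that task, and a distillation loss keeps its outputs stable while the remaining tasks keep training. All module and variable names are illustrative, not taken from the paper.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    """Shared encoder with one classification head per task (illustrative)."""
    def __init__(self, in_dim, hidden, task_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, c) for c in task_classes)

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x))

def task_loss(model, teachers, x, y, task_id, temperature=2.0):
    """Cross-entropy for tasks still training; add a KD term against a frozen
    teacher snapshot for tasks that have already converged."""
    logits = model(x, task_id)
    loss = F.cross_entropy(logits, y)
    if teachers[task_id] is not None:  # task converged -> distill
        with torch.no_grad():
            t_logits = teachers[task_id](x, task_id)
        loss = loss + F.kl_div(
            F.log_softmax(logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
    return loss

# Usage sketch: freeze a teacher copy once task 0 is judged converged.
model = MultiTaskModel(in_dim=16, hidden=32, task_classes=[3, 5])
teachers = [None, None]
teachers[0] = copy.deepcopy(model).eval()   # snapshot at task 0's best checkpoint
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
loss = task_loss(model, teachers, x, y, task_id=0)
loss.backward()
```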

M-LLM Based Video Frame Selection for Efficient Video Understanding

no code implementations27 Feb 2025 Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi

Empirical results show that the proposed M-LLM video frame selector improves the performance of various downstream video Large Language Models (video-LLMs) across medium-context (ActivityNet, NExT-QA) and long-context (EgoSchema, LongVideoBench) video question answering benchmarks; a generic frame-selection sketch follows this entry.

EgoSchema Language Modeling +6
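A minimal sketch of the frame-selection recipe implied by the title: score candidate frames for relevance to the question with a multi-modal LLM and pass only the top-k frames to the downstream video-LLM. The scoring function here is a placeholder, not the paper's model.

```python
import heapq
from typing import Callable, List, Sequence

def select_frames(
    frames: Sequence,                            # decoded video frames (e.g., PIL images)
    question: str,
    score_fn: Callable[[object, str], float],    # M-LLM relevance scorer (placeholder)
    k: int = 8,
) -> List[int]:
    """Return indices of the k frames the scorer deems most relevant to the
    question, preserving temporal order for the downstream video-LLM."""
    scores = [(score_fn(frame, question), idx) for idx, frame in enumerate(frames)]
    top = heapq.nlargest(k, scores)              # keep the k highest-scoring frames
    return sorted(idx for _, idx in top)         # restore temporal order

# Usage sketch with a dummy scorer standing in for the M-LLM.
dummy_frames = list(range(100))
keep = select_frames(dummy_frames, "What is the person holding?",
                     score_fn=lambda f, q: float(f % 7), k=8)
print(keep)
```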

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

no code implementations28 Nov 2024 Shwetha Ram, Tal Neiman, Qianli Feng, Andrew Stuart, Son Tran, Trishul Chilimbi

As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low subject fidelity but high prompt fidelity and diversity.

Diversity Image Generation +1

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

no code implementations18 Jul 2024 Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul Chilimbi, Mubarak Shah

In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency and detailed visual representations, obtained through masked image modeling (MIM), with semantically-enriched low-frequency representations captured by CL.

Contrastive Learning Representation Learning +1
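The abstract above describes fusing detail-oriented MIM features with semantically rich contrastive features. Below is a minimal cross-attention fusion sketch of that general idea; the dimensions, module names, and fusion choice are assumptions, not the X-Former architecture.

```python
import torch
import torch.nn as nn

class DualFeatureFusion(nn.Module):
    """Fuse patch features from a MIM encoder (high-frequency detail) with
    features from a contrastive encoder (low-frequency semantics) using
    cross-attention. Dimensions are illustrative."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cl_feats, mim_feats):
        # Contrastive tokens query the MIM tokens to pick up fine-grained detail.
        fused, _ = self.attn(query=cl_feats, key=mim_feats, value=mim_feats)
        return self.norm(cl_feats + fused)

# Usage sketch with random stand-ins for the two encoders' outputs.
cl = torch.randn(2, 197, 256)    # contrastive-learning tokens
mim = torch.randn(2, 196, 256)   # masked-image-modeling patch tokens
out = DualFeatureFusion()(cl, mim)
print(out.shape)  # torch.Size([2, 197, 256])
```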

Open Vocabulary Multi-Label Video Classification

no code implementations12 Jul 2024 Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation.

Action Classification Classification +7

VidLA: Video-Language Alignment at Scale

no code implementations CVPR 2024 Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.

Language Modelling Visual Grounding
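A minimal sketch of the "tokens operating at different temporal resolutions" idea mentioned above: pool per-frame features over windows of increasing length and concatenate the coarser summaries with the fine-grained frame tokens. The pooling choice and window sizes are assumptions, not VidLA's actual design.

```python
import torch

def hierarchical_temporal_tokens(frame_feats, window_sizes=(4, 16, 64)):
    """frame_feats: (batch, num_frames, dim) per-frame features.
    For each window size, average-pool non-overlapping windows of frames into
    coarser temporal summaries, then concatenate all levels along the token
    axis (fine local tokens + progressively more global tokens)."""
    b, t, d = frame_feats.shape
    levels = [frame_feats]
    for w in window_sizes:
        w = min(w, t)
        usable = (t // w) * w
        pooled = frame_feats[:, :usable].reshape(b, usable // w, w, d).mean(dim=2)
        levels.append(pooled)
    return torch.cat(levels, dim=1)

# Usage sketch: 64 frames -> 64 fine tokens + 16 + 4 + 1 coarser summary tokens.
tokens = hierarchical_temporal_tokens(torch.randn(2, 64, 256))
print(tokens.shape)  # torch.Size([2, 85, 256])
```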

Robust Multi-Task Learning with Excess Risks

1 code implementation3 Feb 2024 Yifei He, Shiji Zhou, Guojun Zhang, Hyokun Yun, Yi Xu, Belinda Zeng, Trishul Chilimbi, Han Zhao

To overcome this limitation, we propose Multi-Task Learning with Excess Risks (ExcessMTL), an excess risk-based task balancing method that updates the task weights by their distances to convergence instead.

Multi-Task Learning
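A minimal sketch of excess-risk-based task weighting as described above: weight each task by how far its current loss is from an estimate of its converged (minimum) loss, so tasks farther from convergence receive larger weights. The softmax normalization and the running-minimum convergence estimate are assumptions, not ExcessMTL's exact update rule.

```python
import torch

def excess_risk_weights(current_losses, best_losses, temperature=1.0):
    """current_losses, best_losses: 1-D tensors with one entry per task.
    Excess risk ~ current loss minus the lowest loss observed so far;
    tasks farther from convergence get larger weights."""
    excess = torch.clamp(current_losses - best_losses, min=0.0)
    return torch.softmax(excess / temperature, dim=0)

# Usage sketch inside a training step.
current = torch.tensor([0.9, 0.25, 0.6])   # losses of three tasks this step
best = torch.tensor([0.3, 0.2, 0.1])       # running per-task minima
w = excess_risk_weights(current, best)
total_loss = (w.detach() * current).sum()  # weighted multi-task objective
print(w)
```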

SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing

no code implementations10 Dec 2022 Chaoyang He, Shuai Zheng, Aston Zhang, George Karypis, Trishul Chilimbi, Mahdi Soltanolkotabi, Salman Avestimehr

Mixture-of-Experts (MoE) parallelism is a recent advancement that scales up model size with constant computational cost.
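A toy illustration of why MoE layers keep per-token compute roughly constant as parameters grow: a router sends each token to a single expert, so only one expert's weights are used per token. This shows plain top-1 routing, not SMILE's bi-level routing scheme.

```python
import torch
import torch.nn as nn

class Top1MoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is processed by exactly one
    expert, so per-token compute stays roughly constant no matter how many
    experts (and therefore parameters) the layer holds."""
    def __init__(self, dim=64, num_experts=8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, dim)
        gate = self.router(x).softmax(dim=-1)
        expert_id = gate.argmax(dim=-1)        # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_id == i
            if mask.any():
                out[mask] = expert(x[mask]) * gate[mask, i].unsqueeze(-1)
        return out

# Usage sketch.
y = Top1MoELayer()(torch.randn(32, 64))
print(y.shape)  # torch.Size([32, 64])
```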

MICO: Selective Search with Mutual Information Co-training

1 code implementation COLING 2022 Zhanyu Wang, Xiao Zhang, Hyokun Yun, Choon Hui Teo, Trishul Chilimbi

In contrast to traditional exhaustive search, in which a query is scored against every document, selective search first clusters documents into several groups and limits each query's search to only one group or a few groups.

Retrieval
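A minimal sketch of selective search in general terms, assuming precomputed document embeddings: cluster the corpus offline, then at query time score only the documents in the cluster(s) nearest to the query. The co-training objective from the paper is not shown; clustering and scoring choices here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))            # document embeddings (stand-ins)

# Offline: partition the corpus into clusters.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(docs)

def selective_search(query_vec, top_clusters=2, top_docs=5):
    """Score only the documents in the few clusters whose centroids are
    closest to the query, instead of scoring the whole corpus."""
    dists = np.linalg.norm(kmeans.cluster_centers_ - query_vec, axis=1)
    chosen = np.argsort(dists)[:top_clusters]
    mask = np.isin(kmeans.labels_, chosen)
    candidate_ids = np.nonzero(mask)[0]
    scores = docs[candidate_ids] @ query_vec   # dot-product relevance
    order = np.argsort(-scores)[:top_docs]
    return candidate_ids[order]

print(selective_search(rng.normal(size=64)))
```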

Efficient and effective training of language and graph neural network models

no code implementations22 Jun 2022 Vassilis N. Ioannidis, Xiang Song, Da Zheng, Houyu Zhang, Jun Ma, Yi Xu, Belinda Zeng, Trishul Chilimbi, George Karypis

The effectiveness of our framework is achieved by applying stage-wise fine-tuning of the BERT model, first with heterogeneous graph information and then with a GNN model.

Edge Classification Graph Neural Network +3
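A minimal sketch of the stage-wise recipe described above: first fine-tune a text encoder on node text (optionally enriched with neighbor information), then freeze it and train a GNN on the resulting node embeddings. The encoder and GNN below are lightweight stand-ins, not the paper's BERT or GNN architecture.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Stand-in for a BERT-style encoder mapping node text to a vector."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)   # mean-pools token embeddings

    def forward(self, token_ids):
        return self.emb(token_ids)

class MeanAggGNN(nn.Module):
    """One-layer GNN: average neighbor embeddings and mix with self features."""
    def __init__(self, dim=128, num_classes=5):
        super().__init__()
        self.lin = nn.Linear(2 * dim, num_classes)

    def forward(self, node_feats, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ node_feats / deg
        return self.lin(torch.cat([node_feats, neigh], dim=-1))

# Stage 1: fine-tune the text encoder on node labels.
encoder = TextEncoder()
tokens = torch.randint(0, 1000, (50, 20))          # 50 nodes, 20 tokens each
labels = torch.randint(0, 5, (50,))
stage1_head = nn.Linear(128, 5)
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(stage1_head.parameters()))
loss1 = nn.functional.cross_entropy(stage1_head(encoder(tokens)), labels)
loss1.backward(); opt1.step()

# Stage 2: freeze the text encoder, train the GNN on its node embeddings.
for p in encoder.parameters():
    p.requires_grad_(False)
gnn = MeanAggGNN()
adj = (torch.rand(50, 50) > 0.9).float()           # toy adjacency matrix
opt2 = torch.optim.Adam(gnn.parameters())
loss2 = nn.functional.cross_entropy(gnn(encoder(tokens), adj), labels)
loss2.backward(); opt2.step()
```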

Multi-modal Alignment using Representation Codebook

no code implementations CVPR 2022 Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion.

Representation Learning Retrieval

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

no code implementations30 Oct 2021 Xuanli He, Iman Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi

To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT.

Text Classification
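A minimal sketch combining the two savings named in the abstract above: after each Transformer layer, drop the least important tokens (width-wise) and exit early if an intermediate classifier is confident (depth-wise). The layer internals, importance heuristic, and thresholds are illustrative, not Magic Pyramid's actual design.

```python
import torch
import torch.nn as nn

class EarlyExitPruningEncoder(nn.Module):
    """Toy encoder illustrating token pruning (width) plus early exiting
    (depth) for a BERT-like stack. Not the Magic Pyramid architecture."""
    def __init__(self, dim=64, layers=6, num_classes=2, keep_ratio=0.75, exit_conf=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(layers)
        )
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(layers))
        self.keep_ratio, self.exit_conf = keep_ratio, exit_conf

    def forward(self, x):                       # x: (1, seq_len, dim)
        for layer, exit_head in zip(self.layers, self.exits):
            x = layer(x)
            # Depth-wise: stop as soon as an intermediate head is confident.
            probs = exit_head(x.mean(dim=1)).softmax(-1)
            if probs.max() >= self.exit_conf:
                return probs
            # Width-wise: keep only the highest-norm tokens (a cheap stand-in
            # for attention-based importance scores).
            k = max(1, int(x.size(1) * self.keep_ratio))
            idx = x.norm(dim=-1).topk(k, dim=1).indices.sort(dim=1).values
            x = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        return probs

print(EarlyExitPruningEncoder()(torch.randn(1, 32, 64)).shape)  # torch.Size([1, 2])
```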
