Search Results for author: Belinda Zeng

Found 18 papers, 4 papers with code

Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks

no code implementations • NAACL (ACL) 2022 • Weiyi Lu, Sunny Rajagopalan, Priyanka Nigam, Jaspreet Singh, Xiaodi Sun, Yi Xu, Belinda Zeng, Trishul Chilimbi

However, one issue that often arises in MTL is that convergence speed varies between tasks due to differences in task difficulty, so it can be challenging to achieve the best performance on all tasks simultaneously with a single model checkpoint.

Knowledge Distillation • Multi-Task Learning
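A minimal PyTorch sketch of the general idea behind this paper (not the authors' exact method): once a task is judged converged, a frozen snapshot of the model serves as a teacher for that task, and a distillation term keeps later updates to the shared encoder from degrading it. All module and function names below are illustrative.

```python
import copy
import torch
import torch.nn.functional as F
from torch import nn

class MTLModel(nn.Module):
    """Shared encoder with one classification head per task."""
    def __init__(self, in_dim, hidden, task_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, c) for name, c in task_classes.items()}
        )

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

def mtl_kd_loss(model, teachers, batch, temperature=2.0):
    """Cross-entropy for every task; for tasks that already converged (a frozen
    teacher snapshot exists), add a KL distillation term to the teacher logits."""
    total = 0.0
    for task, (x, y) in batch.items():
        logits = model(x, task)
        total = total + F.cross_entropy(logits, y)
        if task in teachers:  # task converged earlier: distill from its snapshot
            with torch.no_grad():
                t_logits = teachers[task](x, task)
            total = total + F.kl_div(
                F.log_softmax(logits / temperature, dim=-1),
                F.softmax(t_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
    return total

# Toy usage: task "a" is treated as converged, so a frozen copy acts as its teacher.
model = MTLModel(16, 32, {"a": 3, "b": 5})
teachers = {"a": copy.deepcopy(model).eval()}
batch = {
    "a": (torch.randn(8, 16), torch.randint(0, 3, (8,))),
    "b": (torch.randn(8, 16), torch.randint(0, 5, (8,))),
}
loss = mtl_kd_loss(model, teachers, batch)
loss.backward()
```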

Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings

no code implementations • 7 Mar 2025 • Xuanqing Liu, Luyang Kong, Wei Niu, Afshin Khashei, Belinda Zeng, Steve Johnson, Jon Jay, Davor Golac, Matt Pope

Consequently, there is a growing need to combine the scalability of LLM-generated labels with the precision of human annotations, enabling fine-tuned smaller models to achieve both higher speed and accuracy comparable to larger models.

Dialogue Act Classification • Dialogue State Tracking • +2

Diffusion Models For Multi-Modal Generative Modeling

no code implementations • 24 Jul 2024 • Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space.

Decoder • Denoising • +1
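A rough sketch of what a common diffusion space can look like in practice, assuming each modality is first encoded into a latent vector and the concatenated latents are noised and denoised jointly. This illustrates the general construction, not the paper's model; names and dimensions are made up.

```python
import torch
from torch import nn

class JointDenoiser(nn.Module):
    """Illustrative denoiser operating on a concatenated multi-modal latent."""
    def __init__(self, img_dim=64, txt_dim=32, hidden=128):
        super().__init__()
        d = img_dim + txt_dim
        self.net = nn.Sequential(nn.Linear(d + 1, hidden), nn.SiLU(), nn.Linear(hidden, d))

    def forward(self, z_t, t):
        # Condition on the (normalized) timestep by simple concatenation.
        return self.net(torch.cat([z_t, t[:, None]], dim=-1))

def forward_noise(z0, t, T=1000):
    """DDPM-style forward process q(z_t | z_0) with a linear beta schedule,
    applied to the joint latent rather than to a single modality."""
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]          # (B,)
    eps = torch.randn_like(z0)
    z_t = alpha_bar.sqrt()[:, None] * z0 + (1 - alpha_bar).sqrt()[:, None] * eps
    return z_t, eps

# Toy training step: image and text latents share one diffusion space.
img_latent, txt_latent = torch.randn(4, 64), torch.randn(4, 32)
z0 = torch.cat([img_latent, txt_latent], dim=-1)
t = torch.randint(0, 1000, (4,))
z_t, eps = forward_noise(z0, t)
model = JointDenoiser()
loss = ((model(z_t, t.float() / 1000) - eps) ** 2).mean()
loss.backward()
```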

GraphStorm: all-in-one graph machine learning framework for industry applications

1 code implementation • 10 Jun 2024 • Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code.

Graph Construction

VidLA: Video-Language Alignment at Scale

no code implementations • CVPR 2024 • Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.

Language Modelling • Visual Grounding
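The hierarchical temporal tokens can be illustrated with a small sketch: pool per-frame features over non-overlapping windows of increasing length and feed the coarse summary tokens alongside the frame tokens. This is a simplified stand-in for the paper's data tokens, with illustrative window sizes.

```python
import torch
from torch import nn

def hierarchical_temporal_tokens(frame_feats, window_sizes=(2, 4, 8)):
    """frame_feats: (B, T, D) per-frame features.
    Returns per-frame tokens plus extra summary tokens, each obtained by
    average-pooling over progressively larger temporal windows."""
    summaries = []
    for w in window_sizes:
        # (B, D, T) -> pooled over non-overlapping windows of length w -> (B, T//w, D)
        pooled = nn.functional.avg_pool1d(frame_feats.transpose(1, 2), kernel_size=w, stride=w)
        summaries.append(pooled.transpose(1, 2))
    return torch.cat([*summaries, frame_feats], dim=1)

# 16 frames with 256-d features -> 8 + 4 + 2 coarse tokens joined with 16 frame tokens.
x = torch.randn(2, 16, 256)
tokens = hierarchical_temporal_tokens(x)
print(tokens.shape)  # torch.Size([2, 30, 256])
```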

Robust Multi-Task Learning with Excess Risks

1 code implementation • 3 Feb 2024 • Yifei He, Shiji Zhou, Guojun Zhang, Hyokun Yun, Yi Xu, Belinda Zeng, Trishul Chilimbi, Han Zhao

To overcome this limitation, we propose Multi-Task Learning with Excess Risks (ExcessMTL), an excess risk-based task balancing method that updates the task weights by their distances to convergence instead.

Multi-Task Learning
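A hedged sketch of excess-risk-style task balancing: each task's distance to convergence is approximated by the gap between its current loss and the best loss seen so far, and task weights follow from those gaps. The estimator below is a deliberate simplification, not the paper's ExcessMTL update.

```python
import torch

class ExcessRiskWeighter:
    """Illustrative task re-weighting: tasks farther from their best (proxy for
    converged) loss get larger weights."""
    def __init__(self, task_names, temperature=1.0):
        self.best = {t: float("inf") for t in task_names}
        self.temperature = temperature

    def __call__(self, losses):
        gaps = []
        for t, loss in losses.items():
            self.best[t] = min(self.best[t], loss.item())
            gaps.append(loss.item() - self.best[t])   # proxy for excess risk
        weights = torch.softmax(torch.tensor(gaps) / self.temperature, dim=0)
        return {t: w for t, w in zip(losses, weights)}

# Usage inside a training loop:
weighter = ExcessRiskWeighter(["task_a", "task_b"])
losses = {"task_a": torch.tensor(0.9), "task_b": torch.tensor(0.4)}
weights = weighter(losses)
total = sum(weights[t] * losses[t] for t in losses)
```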

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

no code implementations • 26 Jan 2024 • Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning models such as large language models.

Adversarial Robustness • Contrastive Learning • +1
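A minimal sketch of adversarial training during contrastive pre-training, assuming an FGSM-style one-step perturbation of one view before the InfoNCE loss. This illustrates the setting being analysed rather than the paper's theoretical construction; the encoder and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F
from torch import nn

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE between two batches of paired embeddings."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

def adversarial_contrastive_step(encoder, x1, x2, eps=0.03):
    """One-step (FGSM-style) perturbation of one view, then the contrastive loss
    on the perturbed pair."""
    x1_adv = x1.clone().requires_grad_(True)
    loss = info_nce(encoder(x1_adv), encoder(x2))
    grad = torch.autograd.grad(loss, x1_adv)[0]
    x1_adv = (x1 + eps * grad.sign()).detach()   # adversarial view
    return info_nce(encoder(x1_adv), encoder(x2))

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
x1, x2 = torch.randn(8, 32), torch.randn(8, 32)
adv_loss = adversarial_contrastive_step(encoder, x1, x2)
adv_loss.backward()
```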

ForeSeer: Product Aspect Forecasting Using Temporal Graph Embedding

no code implementations • 7 Oct 2023 • Zixuan Liu, Gaurush Hiranandani, Kun Qian, Eddie W. Huang, Yi Xu, Belinda Zeng, Karthik Subbian, Sheng Wang

ForeSeer transfers reviews from similar products on a large product graph and exploits these reviews to predict aspects that might emerge in future reviews.

Graph Embedding • Link Prediction
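The review-transfer idea can be sketched as similarity-weighted voting: given product embeddings (for example from a temporal graph model), aspects observed for the most similar products are scored as candidates for the target product. The function below is a hypothetical illustration, not ForeSeer's actual pipeline.

```python
import torch
import torch.nn.functional as F

def forecast_aspects(product_emb, target_idx, aspect_counts, top_k=5):
    """product_emb: (N, D) product embeddings; aspect_counts: (N, A) counts of
    each aspect in each product's existing reviews. Returns the indices of the
    top_k aspects predicted to emerge for the target product."""
    sims = F.cosine_similarity(product_emb[target_idx][None], product_emb, dim=-1)
    sims[target_idx] = 0.0                         # exclude the product itself
    weights = F.softmax(sims * 10, dim=0)          # emphasise the most similar products
    scores = weights @ aspect_counts.float()       # (A,) aspect relevance scores
    return scores.topk(top_k).indices

# 100 products, 64-d embeddings, 20 candidate aspects.
emb = torch.randn(100, 64)
counts = torch.randint(0, 5, (100, 20))
print(forecast_aspects(emb, target_idx=3, aspect_counts=counts))
```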

Efficient and effective training of language and graph neural network models

no code implementations • 22 Jun 2022 • Vassilis N. Ioannidis, Xiang Song, Da Zheng, Houyu Zhang, Jun Ma, Yi Xu, Belinda Zeng, Trishul Chilimbi, George Karypis

The effectiveness of our framework is achieved by applying stage-wise fine-tuning of the BERT model, first with heterogeneous graph information and then with a GNN model.

Edge Classification • Graph Neural Network • +3
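A self-contained sketch of stage-wise training under simplifying assumptions: a small stand-in encoder replaces BERT, it is first fine-tuned on a graph-derived link signal, then frozen so its embeddings feed a minimal mean-aggregation GNN. In practice the text encoder would be a pre-trained BERT and the GNN a full graph-learning library model.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TextEncoder(nn.Module):
    """Stand-in for BERT: a bag-of-tokens embedding."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)

    def forward(self, token_ids):
        return self.emb(token_ids)

class MeanGNNLayer(nn.Module):
    """Minimal message passing: average neighbour features, then transform."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = adj @ h / deg
        return F.relu(self.lin(torch.cat([h, neigh], dim=-1)))

num_nodes, dim = 6, 64
tokens = torch.randint(0, 1000, (num_nodes, 12))
adj = (torch.rand(num_nodes, num_nodes) > 0.5).float()

# Stage 1: fine-tune the text encoder on graph information (here, link prediction).
encoder = TextEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(5):
    h = encoder(tokens)
    link_logits = h @ h.t()
    loss = F.binary_cross_entropy_with_logits(link_logits, adj)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the text encoder; its embeddings become node features for the GNN.
with torch.no_grad():
    node_feats = encoder(tokens)
gnn, clf = MeanGNNLayer(dim), nn.Linear(dim, 3)
node_labels = torch.randint(0, 3, (num_nodes,))
opt2 = torch.optim.Adam(list(gnn.parameters()) + list(clf.parameters()), lr=1e-3)
logits = clf(gnn(node_feats, adj))
loss2 = F.cross_entropy(logits, node_labels)
loss2.backward(); opt2.step()
```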

Multi-modal Alignment using Representation Codebook

no code implementations • CVPR 2022 • Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion.

Representation Learning • Retrieval
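A simplified sketch of aligning two modalities through one shared, learnable codebook: each embedding is softly assigned to codewords and the two assignment distributions are pulled together. This illustrates the codebook idea only; the loss and hyperparameters are not the paper's.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SharedCodebookAligner(nn.Module):
    """Both modalities are softly assigned to a shared codebook; alignment is
    encouraged by matching the two assignment distributions."""
    def __init__(self, dim=128, num_codes=256):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))

    def assign(self, z, temperature=0.1):
        z = F.normalize(z, dim=-1)
        codes = F.normalize(self.codebook, dim=-1)
        return F.softmax(z @ codes.t() / temperature, dim=-1)   # (B, num_codes)

    def forward(self, img_emb, txt_emb):
        p_img, p_txt = self.assign(img_emb), self.assign(txt_emb)
        # Symmetric cross-entropy between the two code-assignment distributions.
        loss = -(p_txt.detach() * p_img.clamp_min(1e-8).log()).sum(-1).mean()
        loss = loss - (p_img.detach() * p_txt.clamp_min(1e-8).log()).sum(-1).mean()
        return loss

aligner = SharedCodebookAligner()
img_emb, txt_emb = torch.randn(16, 128), torch.randn(16, 128)
loss = aligner(img_emb, txt_emb)
loss.backward()
```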

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

no code implementations • 30 Oct 2021 • Xuanli He, Iman Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi

To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT.

Text Classification
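A toy sketch combining the two ingredients: after each encoder layer, low-importance tokens are pruned (here scored by feature norm as a cheap stand-in for attention-based scores) and an intermediate classifier allows an early exit once it is confident. Everything below is illustrative rather than the Magic Pyramid implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class PrunedEarlyExitEncoder(nn.Module):
    """Toy BERT-like encoder with width-wise (token pruning) and depth-wise
    (early exiting) computation reduction."""
    def __init__(self, dim=64, num_layers=4, num_classes=2, keep_ratio=0.7):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True) for _ in range(num_layers)]
        )
        self.classifiers = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(num_layers)])
        self.keep_ratio = keep_ratio

    def forward(self, x, confidence=0.9):
        for layer, clf in zip(self.layers, self.classifiers):
            x = layer(x)
            # Token pruning: keep the highest-scoring tokens (position 0 = [CLS] always kept).
            k = max(1, int(x.size(1) * self.keep_ratio))
            scores = x.detach().norm(dim=-1)
            scores[:, 0] = float("inf")                     # never prune [CLS]
            idx = scores.topk(k, dim=1).indices.sort(dim=1).values
            x = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
            # Early exiting: stop as soon as the [CLS]-based classifier is confident.
            probs = F.softmax(clf(x[:, 0]), dim=-1)
            if probs.max(dim=-1).values.min() > confidence:
                return probs
        return probs

model = PrunedEarlyExitEncoder()
tokens = torch.randn(2, 32, 64)   # (batch, seq_len, dim); position 0 plays the role of [CLS]
print(model(tokens).shape)        # torch.Size([2, 2])
```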
