Search Results for author: Lin Ju

Found 7 papers, 1 paper with code

AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster

no code implementations • 15 Apr 2024 • Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou

Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines.

AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes

no code implementations • 15 Apr 2024 • Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, ZhaoXin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou

Previous works address only some types of stragglers and cannot adaptively handle the various stragglers encountered in practice.

G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems

no code implementations • 9 Jan 2024 • Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, ZhaoXin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou

However, existing systems are not tailored to meta-learning-based DLRM models and suffer from critical efficiency problems in distributed training on GPU clusters.

Meta-Learning • Recommendation Systems

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training

no code implementations • 19 Dec 2023 • Youshao Xiao, Weichang Wu, Zhenglei Zhou, Fagui Mao, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou

Furthermore, our framework provides a simple user interface and allows models to be allocated across devices in a fine-grained, agile manner for various training scenarios, involving models of varying sizes and device fleets of different scales.

Rethinking Memory and Communication Cost for Efficient Large Language Model Training

no code implementations • 9 Oct 2023 • Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, ZhaoXin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou

In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose a memory-communication-balanced strategy set, Partial Redundancy Optimizer (PaRO).

Language Modelling • Large Language Model
