no code implementations • 19 Apr 2024 • Yang Luo, Zangwei Zheng, Zirui Zhu, Yang You
This effectiveness, however, hinges on the appropriate selection of in-context examples, a process that is currently biased towards visual data, overlooking textual information.
1 code implementation • 15 Mar 2024 • Xuanlei Zhao, Shenggan Cheng, Zangwei Zheng, Zheming Yang, Ziming Liu, Yang You
Scaling large models with long sequences across applications like language generation, video generation and multimodal tasks requires efficient sequence parallelism.
1 code implementation • 23 Feb 2024 • Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You
We explore the typical data characteristics and optimization statistics of CTR prediction, revealing a strong positive correlation between the top Hessian eigenvalue and feature frequency.
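The reported correlation can be probed with a standard power-iteration estimator of the top Hessian eigenvalue via Hessian-vector products; the sketch below is a generic implementation of that estimator, not the paper's actual tooling.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        # Hessian-vector product: differentiate <grads, v> w.r.t. params.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    # Rayleigh quotient v^T H v with unit-norm v approximates the top eigenvalue.
    return sum((h * u).sum() for h, u in zip(hv, v)).item()
```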
1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to 1T+ tokens.
2 code implementations • 5 Jul 2023 • Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You
Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models.
1 code implementation • NeurIPS 2023 • Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You
By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.
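A minimal sketch of that grouping step, assuming a hypothetical `predict_length` estimator of a query's response length (the paper's actual scheduler handles batching constraints beyond this):

```python
def schedule_micro_batches(queries, predict_length, micro_batch_size):
    """Group queries with similar predicted response lengths so that each
    micro-batch wastes little computation padding short responses."""
    # Sort by predicted response length; neighbors then have similar lengths.
    ranked = sorted(queries, key=predict_length)
    # Chunk the sorted queries into fixed-size micro-batches.
    return [ranked[i:i + micro_batch_size]
            for i in range(0, len(ranked), micro_batch_size)]
```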
1 code implementation • ICCV 2023 • Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, Yang You
To address this challenge, we propose ZSCL, a novel method that prevents zero-shot transfer degradation in the continual learning of vision-language models by operating in both feature and parameter space.
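The two regularization targets named in the abstract can be sketched as follows; the loss shape and interpolation coefficient here are assumptions, not the exact ZSCL recipe.

```python
import torch
import torch.nn.functional as F

def feature_space_loss(student_feats, frozen_zero_shot_feats):
    # Keep fine-tuned features close to the original zero-shot encoder's
    # features on reference data, preserving zero-shot transfer.
    return 1.0 - F.cosine_similarity(student_feats, frozen_zero_shot_feats).mean()

@torch.no_grad()
def parameter_space_regularize(model, pretrained_state, alpha=0.5):
    # Interpolate current weights toward the pretrained weights,
    # limiting drift in parameter space.
    for name, p in model.named_parameters():
        p.mul_(alpha).add_((1.0 - alpha) * pretrained_state[name])
```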
1 code implementation • 8 Mar 2023 • Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You
To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.
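The core idea admits a compact sketch: randomly prune a fraction of well-learned (low-loss) samples and rescale the survivors so the expected gradient is unchanged. Names and thresholds below are assumptions; see the paper for the full schedule.

```python
import numpy as np

def prune_and_rescale(losses, prune_prob=0.5):
    """Drop each low-loss sample with probability `prune_prob` and up-weight
    surviving low-loss samples by 1/(1 - prune_prob), keeping the expected
    gradient equal to that of full-data training (the 'unbiased' part)."""
    losses = np.asarray(losses)
    low_loss = losses < losses.mean()                     # pruning candidates
    drop = low_loss & (np.random.rand(len(losses)) < prune_prob)
    weights = np.ones(len(losses))
    weights[low_loss & ~drop] = 1.0 / (1.0 - prune_prob)  # unbiasedness rescale
    return ~drop, weights                                 # keep-mask, loss weights
```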
1 code implementation • 18 Aug 2022 • Zangwei Zheng, Xiangyu Yue, Kai Wang, Yang You
In this paper, we propose DoPrompt, a novel prompt-learning approach that embeds the knowledge of source domains in domain prompts for target domain prediction.
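A hypothetical reduction of the domain-prompt idea for a ViT backbone (the full method additionally learns how to produce suitable prompts for unseen target inputs):

```python
import torch
import torch.nn as nn

class DomainPrompts(nn.Module):
    """One learnable prompt per source domain, prepended to patch tokens."""
    def __init__(self, num_domains, prompt_len, dim):
        super().__init__()
        self.prompts = nn.Parameter(torch.zeros(num_domains, prompt_len, dim))

    def forward(self, patch_tokens, domain_ids):
        # patch_tokens: (B, N, dim); domain_ids: (B,) source-domain indices.
        return torch.cat([self.prompts[domain_ids], patch_tokens], dim=1)
```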
no code implementations • 21 May 2022 • Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You
In this paper, we revisit these conventional configurations.
1 code implementation • 13 Apr 2022 • Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You
Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks.
no code implementations • 25 Sep 2021 • Xiangyu Yue, Zangwei Zheng, Colorado Reed, Hari Prasanna Das, Kurt Keutzer, Alberto Sangiovanni Vincentelli
Multi-source Domain Adaptation (MDA) aims to transfer predictive models from multiple, fully-labeled source domains to an unlabeled target domain.
no code implementations • 5 Sep 2021 • Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You
Mixture-of-Experts (MoE), a conditional computation architecture, has achieved promising performance by scaling the local module (i.e., the feed-forward network) of the transformer.
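For reference, a minimal top-1 MoE feed-forward layer illustrating the conditional-computation idea; this is a generic sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    """Each token is routed to its top-1 expert FFN, so compute per token
    stays constant while total parameters scale with `num_experts`."""
    def __init__(self, dim, hidden, num_experts):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (num_tokens, dim)
        gate = self.router(x).softmax(dim=-1)   # routing probabilities
        top = gate.argmax(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i                     # tokens routed to expert i
            if mask.any():
                out[mask] = gate[mask, i:i + 1] * expert(x[mask])
        return out
```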
no code implementations • 3 Jul 2021 • Zangwei Zheng, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni Vincentelli
In this paper, we propose a scene-aware radar learning framework for accurate and robust object detection.
1 code implementation • CVPR 2021 • Xiangyu Yue, Zangwei Zheng, Shanghang Zhang, Yang Gao, Trevor Darrell, Kurt Keutzer, Alberto Sangiovanni Vincentelli
In this paper, we propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).