Search Results for author: Zangwei Zheng

Found 15 papers, 10 papers with code

How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?

no code implementations · 19 Apr 2024 · Yang Luo, Zangwei Zheng, Zirui Zhu, Yang You

This effectiveness, however, hinges on the appropriate selection of in-context examples, a process that is currently biased towards visual data, overlooking textual information.

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

1 code implementation · 15 Mar 2024 · Xuanlei Zhao, Shenggan Cheng, Zangwei Zheng, Zheming Yang, Ziming Liu, Yang You

Scaling large models with long sequences across applications like language generation, video generation and multimodal tasks requires efficient sequence parallelism.

Text Generation · Video Generation
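The DSP summary above concerns how multi-dimensional sequences are split across devices. Purely as an illustration, here is a single-process sketch of the general idea behind sequence parallelism for multi-dimensional inputs; the shapes, the `num_devices` split, and the cat/chunk standing in for an all-to-all are assumptions, not DSP's actual implementation.

```python
# Minimal single-process sketch of sequence parallelism for multi-dimensional
# (e.g. video) transformers: shard activations along one sequence dimension,
# then re-shard along the other before attending over it.
import torch

num_devices = 4
B, T, S, D = 2, 16, 64, 32          # batch, temporal tokens, spatial tokens, hidden dim
x = torch.randn(B, T, S, D)

# Phase 1: shard the temporal dimension; each "device" holds all spatial tokens,
# so spatial attention can run locally without communication.
temporal_shards = x.chunk(num_devices, dim=1)      # num_devices x (B, T/4, S, D)

# Phase 2: to run temporal attention, the sharded dimension must switch.
# In a real system this is an all-to-all; here we simply re-assemble and re-split.
full = torch.cat(temporal_shards, dim=1)           # (B, T, S, D)
spatial_shards = full.chunk(num_devices, dim=2)    # num_devices x (B, T, S/4, D)

# Each shard now carries the full temporal axis, so temporal attention is local.
print(temporal_shards[0].shape, spatial_shards[0].shape)
```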

Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization

1 code implementation · 23 Feb 2024 · Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You

We explore the typical data characteristics and optimization statistics of CTR prediction, revealing a strong positive correlation between the top Hessian eigenvalue and feature frequency.

Click-Through Rate Prediction
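The summary above hinges on the top Hessian eigenvalue of the loss with respect to feature embeddings. As a hedged illustration of that quantity only (not Helen's optimizer), the sketch below estimates it by power iteration on Hessian-vector products; the toy loss and embedding size are made-up stand-ins.

```python
# Estimate the top Hessian eigenvalue of a toy loss w.r.t. one feature embedding
# via power iteration on Hessian-vector products (double backward).
import torch

emb = torch.randn(8, requires_grad=True)             # toy feature embedding
target = torch.tensor(1.0)
loss = (torch.sigmoid(emb.sum()) - target).pow(2)    # toy CTR-style loss

grad = torch.autograd.grad(loss, emb, create_graph=True)[0]
v = torch.randn_like(emb)
v = v / v.norm()
top_eig = 0.0
for _ in range(20):                                   # power iteration
    hv = torch.autograd.grad(grad @ v, emb, retain_graph=True)[0]  # Hessian-vector product
    top_eig = (v @ hv).item()                         # Rayleigh quotient with unit v
    v = hv / (hv.norm() + 1e-12)
print(f"estimated top Hessian eigenvalue: {top_eig:.4f}")
```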

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

1 code implementation · 29 Jan 2024 · Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs ranging from 650M to 34B parameters and trained on over 1T tokens.

CAME: Confidence-guided Adaptive Memory Efficient Optimization

2 code implementations · 5 Jul 2023 · Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You

Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models.

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

1 code implementation · NeurIPS 2023 · Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.

Quantization · Scheduling
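The sequence-scheduling idea above groups queries with similar predicted response lengths. Below is a minimal sketch of that grouping step under assumed inputs; the predicted lengths, query strings, and micro-batch size are placeholders, not the paper's pipeline.

```python
# Length-aware micro-batching: sort queries by predicted response length and cut
# the sorted list into micro-batches, so short responses are not held back by
# long ones in the same batch.
from typing import List, Tuple

def schedule(queries: List[str], predicted_lengths: List[int],
             micro_batch_size: int = 4) -> List[List[Tuple[str, int]]]:
    """Sort by predicted response length, then slice into micro-batches."""
    ranked = sorted(zip(queries, predicted_lengths), key=lambda ql: ql[1])
    return [ranked[i:i + micro_batch_size]
            for i in range(0, len(ranked), micro_batch_size)]

queries = [f"q{i}" for i in range(10)]
predicted = [120, 15, 300, 40, 18, 260, 55, 31, 500, 90]   # tokens, from a length predictor
for batch in schedule(queries, predicted):
    print([q for q, _ in batch], "max len:", max(l for _, l in batch))
```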

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models

1 code implementation · ICCV 2023 · Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, Yang You

To address this challenge, we propose ZSCL, a novel method that prevents zero-shot transfer degradation in the continual learning of vision-language models in both feature and parameter space.

Class Incremental Learning · Incremental Learning
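As a loosely hedged illustration of what regularizing in "both feature and parameter space" can mean in practice, the sketch below combines a task loss with a feature-distillation term against a frozen copy of the pretrained model and an L2 penalty on parameter drift; the model, losses, and weights are generic stand-ins, not ZSCL's actual formulation.

```python
# Generic two-space regularization: distill features toward a frozen pretrained
# copy (feature space) and penalize drift from the pretrained weights (parameter
# space) while fine-tuning on the new task.
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
frozen = copy.deepcopy(model).eval()
for p in frozen.parameters():
    p.requires_grad_(False)
init_params = [p.detach().clone() for p in model.parameters()]

x = torch.randn(8, 32)
task_loss = model(x).pow(2).mean()                       # placeholder task loss
feat_loss = (model(x) - frozen(x)).pow(2).mean()         # feature-space distillation
param_loss = sum((p - p0).pow(2).sum()
                 for p, p0 in zip(model.parameters(), init_params))
loss = task_loss + 1.0 * feat_loss + 1e-3 * param_loss   # illustrative weights
loss.backward()
```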

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

1 code implementation · 8 Mar 2023 · Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.

Semantic Segmentation
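To make "unbiased dynamic data pruning" concrete, here is a hedged, single-batch sketch: randomly drop some low-loss samples and up-weight the surviving low-loss samples so the expected gradient is preserved. The mean-loss threshold and prune ratio are illustrative assumptions, not the paper's exact schedule.

```python
# Loss-based pruning with rescaling: drop a fraction of well-learned (low-loss)
# samples and up-weight the kept low-loss samples by 1/(1 - prune_ratio) so the
# expected gradient matches full-data training.
import torch

def prune_and_rescale(losses: torch.Tensor, prune_ratio: float = 0.5):
    """Return a keep mask and per-sample loss weights for one batch."""
    threshold = losses.mean()                      # "well learned" = below mean loss
    low = losses < threshold
    drop = low & (torch.rand_like(losses) < prune_ratio)
    keep = ~drop
    weights = torch.ones_like(losses)
    weights[low] = 1.0 / (1.0 - prune_ratio)       # keep the gradient estimate unbiased
    return keep, weights

losses = torch.rand(16)
keep, w = prune_and_rescale(losses)
weighted_loss = (losses[keep] * w[keep]).mean()
print(f"kept {keep.sum().item()}/16 samples, weighted loss {weighted_loss:.3f}")
```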

Prompt Vision Transformer for Domain Generalization

1 code implementation · 18 Aug 2022 · Zangwei Zheng, Xiangyu Yue, Kai Wang, Yang You

In this paper, we propose DoPrompt, a novel prompt-learning-based approach that embeds the knowledge of source domains in domain prompts for target-domain prediction.

Domain Generalization · Representation Learning
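The DoPrompt summary above centers on domain prompts. Purely as an illustration of the mechanism, the sketch below prepends learnable per-domain prompt tokens to ViT-style patch tokens before a Transformer encoder; the token counts and the plain `nn.TransformerEncoder` are assumptions, not the paper's architecture.

```python
# Domain prompting sketch: each source domain owns a few learnable prompt tokens
# that are prepended to the patch tokens, then processed jointly by the encoder.
import torch
import torch.nn as nn

num_domains, prompt_len, dim = 3, 4, 64
domain_prompts = nn.Parameter(torch.zeros(num_domains, prompt_len, dim))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)

patch_tokens = torch.randn(8, 196, dim)            # (batch, patches, dim)
domain_ids = torch.randint(0, num_domains, (8,))   # source domain of each image
tokens = torch.cat([domain_prompts[domain_ids], patch_tokens], dim=1)
features = encoder(tokens)                         # prompts attend with image tokens
print(features.shape)                              # (8, 4 + 196, 64)
```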

Multi-source Few-shot Domain Adaptation

no code implementations · 25 Sep 2021 · Xiangyu Yue, Zangwei Zheng, Colorado Reed, Hari Prasanna Das, Kurt Keutzer, Alberto Sangiovanni Vincentelli

Multi-source Domain Adaptation (MDA) aims to transfer predictive models from multiple, fully-labeled source domains to an unlabeled target domain.

Domain Adaptation · Self-Supervised Learning

Cross-token Modeling with Conditional Computation

no code implementations · 5 Sep 2021 · Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You

Mixture-of-Experts (MoE), a conditional computation architecture, has achieved promising performance by scaling the local module (i.e., the feed-forward network) of the Transformer.

Computational Efficiency · Image Classification
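Since the summary above describes MoE only at a high level, here is a hedged sketch of a generic top-1-routed MoE feed-forward layer; the routing scheme and sizes are illustrative and do not reflect the paper's cross-token design.

```python
# Conditional computation via MoE: a router picks one feed-forward "expert" per
# token, so capacity grows with the number of experts while each token still
# pays the cost of a single FFN.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, dim: int = 64, hidden: int = 128, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)
        top_gate, top_idx = gates.max(dim=-1)               # one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = top_gate[mask, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(Top1MoE()(tokens).shape)                              # (10, 64)
```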

Scene-aware Learning Network for Radar Object Detection

no code implementations · 3 Jul 2021 · Zangwei Zheng, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni Vincentelli

In this paper, we propose a scene-aware radar learning framework for accurate and robust object detection.

Ensemble Learning · Object +3
