Search Results for author: Han Shi

Found 25 papers, 13 papers with code

Self-Adjust Softmax

no code implementations25 Feb 2025 Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Zhenguo Li, Yu Li

We conducted experiments to evaluate the empirical performance of Transformer models using SA-Softmax compared to the vanilla softmax function.
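The snippet above doesn't spell out the SA-Softmax formula, so the sketch below (a minimal sketch, not the authors' code) only shows where such a variant would slot in: scaled dot-product attention with the normalizer factored out as a swappable function, defaulting to the vanilla softmax used as the baseline.

```python
# Scaled dot-product attention with the normalizer factored out, so a
# variant like SA-Softmax could replace the vanilla softmax default.
# The actual SA-Softmax formula is defined in the paper, not sketched here.
import torch
import torch.nn.functional as F

def attention(q, k, v, normalizer=lambda s: F.softmax(s, dim=-1)):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return normalizer(scores) @ v

q = k = v = torch.randn(1, 4, 16, 32)
out = attention(q, k, v)  # baseline: vanilla softmax attention
```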

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

1 code implementation16 Dec 2024 Guoxuan Chen, Han Shi, Jiawei Li, Yihang Gao, Xiaozhe Ren, Yimeng Chen, Xin Jiang, Zhenguo Li, Weiyang Liu, Chao Huang

This observation suggests that information of the segments between these separator tokens can be effectively condensed into the separator tokens themselves without significant information loss.

GSM8K · Language Modeling +1
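A rough illustration of that condensation step (not the authors' implementation; SepLLM also retains other tokens, and the separator id set here is hypothetical): keep cached key/value rows only at separator positions, so each segment is represented by its closing separator.

```python
# Drop cached keys/values for all tokens except separators. SEPARATOR_IDS
# is a hypothetical id set for tokens like "." and "?".
import torch

SEPARATOR_IDS = {13, 30}

def compress_kv(token_ids, keys, values):
    # token_ids: (seq_len,); keys, values: (seq_len, head_dim)
    keep = torch.tensor([t in SEPARATOR_IDS for t in token_ids.tolist()])
    return keys[keep], values[keep]

ids = torch.tensor([5, 8, 13, 9, 2, 30])
k, v = torch.randn(6, 64), torch.randn(6, 64)
k_small, v_small = compress_kv(ids, k, v)  # only positions 2 and 5 survive
```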

Efficient Multi-modal Large Language Models via Visual Token Grouping

no code implementations26 Nov 2024 Minbin Huang, Runhui Huang, Han Shi, Yimeng Chen, Chuanyang Zheng, Xiangguo Sun, Xin Jiang, Zhenguo Li, Hong Cheng

The development of Multi-modal Large Language Models (MLLMs) enhances Large Language Models (LLMs) with the ability to perceive data formats beyond text, significantly advancing a range of downstream applications, such as visual question answering and image captioning.

Image Captioning · Question Answering +2

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

2 code implementations7 Oct 2024 Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks.
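The title's "attention score as feature map" suggests treating the (heads, L, L) score tensor like an image. The sketch below is one hedged reading: refine the scores with a small depthwise convolution before softmax. The paper's exact operator may differ, and a causal model would additionally need masking.

```python
# Treat attention scores as a (heads, L, L) feature map and refine them with
# a depthwise 2-D convolution before softmax. Illustrative reading only.
import torch
import torch.nn as nn

heads, L = 4, 16
conv = nn.Conv2d(heads, heads, kernel_size=3, padding=1, groups=heads)

scores = torch.randn(1, heads, L, L)   # raw q·k^T / sqrt(d) logits
refined = scores + conv(scores)        # residual refinement of the "image"
attn = torch.softmax(refined, dim=-1)
```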

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

1 code implementation2 Oct 2024 Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.

Text-to-Image Generation
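For background on the quoted method, plain Jacobi decoding in its simplest greedy form is sketched below; SJD layers a probabilistic, speculative acceptance rule on top of this. The `toy_model` is a hypothetical stand-in for a real autoregressive model's parallel forward pass.

```python
# Greedy Jacobi decoding sketch: refine all draft tokens in parallel each
# iteration until the sequence stops changing (a fixed point). SJD itself
# accepts tokens probabilistically; toy_model below is a hypothetical rule.

def toy_model(tokens):
    # Stand-in for one parallel forward pass: argmax next token per position.
    return [(t * 31 + 7) % 100 for t in tokens]

def jacobi_decode(prefix, draft_len=8, max_iters=50):
    draft = [0] * draft_len                      # arbitrary initialization
    for _ in range(max_iters):
        preds = toy_model(prefix + draft)        # one parallel "forward pass"
        new_draft = preds[len(prefix) - 1 : -1]  # next-token pred per slot
        if new_draft == draft:                   # fixed point reached
            break
        draft = new_draft
    return prefix + draft

print(jacobi_decode([42, 17]))
```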

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

1 code implementation11 Jun 2024 Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia

On widely recognized benchmarks, Q-LLM improves on the current state of the art by 7.17% on LLaMA3 and by 3.26% on Mistral on $\infty$-bench.
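The sketch below (illustrative only) reduces query-aware acceleration to its simplest form: embed cached context blocks, score them against the current query, and keep only the top-k. Q-LLM's actual mechanism works inside attention over the long context; the embedding-and-cosine setup here is an assumption.

```python
# Query-aware block selection sketch: keep only the context blocks most
# similar to the current query. Block embeddings are assumed given.
import torch
import torch.nn.functional as F

def select_blocks(query_emb, block_embs, k=2):
    # query_emb: (d,); block_embs: (num_blocks, d)
    sims = F.cosine_similarity(block_embs, query_emb.unsqueeze(0), dim=-1)
    return sims.topk(k).indices          # indices of the most relevant blocks

blocks = torch.randn(10, 128)            # embeddings of 10 cached blocks
query = torch.randn(128)
print(select_blocks(query, blocks))
```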

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

1 code implementation23 May 2024 Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then finetunes it on high-resolution images ($512 \times 512$).

Image Generation · Mamba +1
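The quoted weak-to-strong strategy reduces to a two-phase loop over the same weights; the skeleton below uses hypothetical placeholders (`train_epoch` and the data loaders), not the authors' API.

```python
# Two-phase "weak-to-strong" schedule: pretrain on 256x256, then finetune
# the same model on 512x512. train_epoch and the loaders are hypothetical.
def train_epoch(model, loader):         # stub for one training pass
    for batch in loader:
        pass                            # forward/backward step would go here

def weak_to_strong(model, loader_256, loader_512,
                   pretrain_epochs=100, finetune_epochs=20):
    for _ in range(pretrain_epochs):    # phase 1: low-resolution pretraining
        train_epoch(model, loader_256)
    for _ in range(finetune_epochs):    # phase 2: high-resolution finetuning
        train_epoch(model, loader_512)  # same weights, larger images
    return model
```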

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

2 code implementations23 May 2024 Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization.
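A hedged sketch of what "data-adaptive" positional encoding can mean: rather than adding a fixed positional bias B to the attention logits S, condition the bias on the logits themselves through a small network, roughly softmax(S + f([S, B])). The exact parameterization in the paper may differ.

```python
# Data-adaptive positional bias sketch: the added bias is a function of both
# a static positional bias B and the attention logits S, so it varies with
# the input. Shapes are head-last for convenience; illustrative only.
import torch
import torch.nn as nn

heads, L = 4, 16
f = nn.Sequential(nn.Linear(2 * heads, heads), nn.ReLU(),
                  nn.Linear(heads, heads))

S = torch.randn(L, L, heads)                  # attention logits q·k^T/sqrt(d)
B = torch.randn(L, L, heads)                  # static positional bias
bias = f(torch.cat([S, B], dim=-1))           # bias now depends on the data
attn = torch.softmax((S + bias).permute(2, 0, 1), dim=-1)  # (heads, L, L)
```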

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

no code implementations21 Feb 2024 Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu

Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.

Machine Translation · text-classification +1

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

1 code implementation12 Feb 2024 Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.

Language Modeling · Language Modelling +1

BYOM: Building Your Own Multi-Task Model For Free

no code implementations3 Oct 2023 Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok

Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.
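For context on that setting, the simplest merging baseline, task-vector addition, is sketched below; BYOM's own method is more involved, so this only illustrates building a multi-task model from finetuned checkpoints without retraining.

```python
# Generic merging baseline (task-vector addition), not BYOM's method: add a
# scaled sum of each finetuned checkpoint's delta from the base weights.
import torch

def merge_task_vectors(base, finetuned, alpha=0.5):
    return {name: w + alpha * sum(ft[name] - w for ft in finetuned)
            for name, w in base.items()}

base = {"w": torch.zeros(2)}
fts = [{"w": torch.tensor([1.0, 0.0])}, {"w": torch.tensor([0.0, 2.0])}]
print(merge_task_vectors(base, fts))  # {'w': tensor([0.5000, 1.0000])}
```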

LEGO-Prover: Neural Theorem Proving with Growing Libraries

1 code implementation1 Oct 2023 Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang

Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.

 Ranked #1 on Automated Theorem Proving on miniF2F-valid (Pass@100 metric)

Automated Theorem Proving

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

1 code implementation21 Sep 2023 Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.

Ranked #59 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning · GSM8K +5

Forward-Backward Reasoning in Large Language Models for Mathematical Verification

no code implementations15 Aug 2023 Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok

Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.

Mathematical Reasoning
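A rough sketch of that recipe: sample candidate answers by forward reasoning, then check each one backward by masking a number in the question and asking whether the model recovers it given the candidate. `llm` is a hypothetical completion function, stubbed so the sketch runs; FOBAR's actual combination of the two directions is more refined.

```python
# Forward-backward verification sketch. llm() is a hypothetical stub; a real
# system would call a language model with chain-of-thought prompting.
from collections import Counter

def llm(prompt):                                 # hypothetical LLM call
    return "42"

def fobar(question, masked_question, masked_value, n_samples=8):
    # Forward: sample candidate answers.
    candidates = [llm(f"Q: {question}\nA:") for _ in range(n_samples)]
    # Backward: a candidate is verified if the model can recover the
    # masked number when told the candidate answer.
    verified = Counter()
    for ans in candidates:
        recovered = llm(f"{masked_question} If the answer is {ans}, "
                        f"what is the masked number x?")
        if recovered.strip() == masked_value:
            verified[ans] += 1
    pool = verified or Counter(candidates)       # fall back to forward voting
    return pool.most_common(1)[0][0]

print(fobar("What is 20 + 22?", "What is 20 + x?", "22"))
```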

DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks

no code implementations5 Jul 2023 Jingwei Zhang, Han Shi, Jincheng Yu, Enze Xie, Zhenguo Li

Generative models can be categorized into two types: explicit generative models, which define explicit density forms and allow exact likelihood inference, such as score-based diffusion models (SDMs) and normalizing flows; and implicit generative models, which directly learn a transformation from the prior to the data distribution, such as generative adversarial nets (GANs).

Denoising
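For reference, the score-based side of that categorization is usually written as a forward/reverse SDE pair; the notation below is the standard one from the diffusion literature, not taken from this paper, whose unified SDE family is built to connect such dynamics with GANs:

$\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}$ (forward: data to noise)

$\mathrm{d}\mathbf{x} = \left[ f(\mathbf{x}, t) - g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}$ (reverse: noise to data)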

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

no code implementations4 Jun 2023 Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation CVPR 2022 BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning · Diversity +3

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations15 Jul 2021 Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Inductive Bias · Language Modelling +3

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation25 Feb 2021 Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
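That finding suggests self-attention can do without token-to-self weights. A minimal sketch (not the paper's code): mask the diagonal of the score matrix before softmax so each token attends only to others.

```python
# Zero out diagonal attention by masking scores with -inf before softmax;
# after softmax the diagonal weights are exactly 0.
import torch

L = 8
scores = torch.randn(L, L)
masked = scores.masked_fill(torch.eye(L, dtype=torch.bool), float("-inf"))
attn = torch.softmax(masked, dim=-1)   # each row ignores its own position
```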

Effective Decoding in Graph Auto-Encoder using Triadic Closure

no code implementations26 Nov 2019 Han Shi, Haozheng Fan, James T. Kwok

We propose the triad decoder, which considers and predicts the three edges involved in a local triad together.

Clustering · Decoder +6
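A hedged sketch of that idea: score the three edges of a local triad (i, j, k) jointly from the node embeddings, rather than scoring each pair independently as a standard inner-product decoder would. The MLP is an illustrative stand-in for the paper's decoder.

```python
# Joint triad decoding sketch: one forward pass yields logits for all three
# edges (i,j), (j,k), (i,k) of a triad. The MLP shape is an assumption.
import torch
import torch.nn as nn

d = 32
mlp = nn.Sequential(nn.Linear(3 * d, 64), nn.ReLU(), nn.Linear(64, 3))

def decode_triad(z_i, z_j, z_k):
    return mlp(torch.cat([z_i, z_j, z_k], dim=-1))   # three edge logits

z = torch.randn(3, d)                                # embeddings of i, j, k
print(torch.sigmoid(decode_triad(z[0], z[1], z[2]))) # three edge probabilities
```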

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

1 code implementation NeurIPS 2020 Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang

In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.

Bayesian Optimization · Neural Architecture Search

Multi-objective Neural Architecture Search via Predictive Network Performance Optimization

no code implementations25 Sep 2019 Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang

Inspired by the nature of the graph structure of a neural network, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with Graph Convolutional Network (GCN) predictor.

Bayesian Optimization · Neural Architecture Search
