no code implementations • 25 Feb 2025 • Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Zhenguo Li, Yu Li
We conducted experiments to evaluate the empirical performance of Transformer models using SA-Softmax compared to the vanilla softmax function.
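A minimal sketch of how such a drop-in replacement could look, assuming SA-Softmax rescales the standard softmax output by min-max-normalized attention scores; the exact formulation in the paper may differ:

import torch
import torch.nn.functional as F

def vanilla_softmax_attn(scores):
    # Standard attention weights: each row sums to 1.
    return F.softmax(scores, dim=-1)

def sa_softmax_attn(scores):
    # Hypothetical SA-Softmax variant: rescale the softmax output by the
    # min-max-normalized scores (assumption; the published SA-Softmax may
    # use a different scaling).
    s_min = scores.amin(dim=-1, keepdim=True)
    s_max = scores.amax(dim=-1, keepdim=True)
    scale = (scores - s_min) / (s_max - s_min + 1e-6)
    return scale * F.softmax(scores, dim=-1)

scores = torch.randn(2, 4, 8, 8)  # (batch, heads, query, key)
print(vanilla_softmax_attn(scores).sum(-1))  # rows sum to 1
print(sa_softmax_attn(scores).sum(-1))       # rows no longer sum to 1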
1 code implementation • 16 Dec 2024 • Guoxuan Chen, Han Shi, Jiawei Li, Yihang Gao, Xiaozhe Ren, Yimeng Chen, Xin Jiang, Zhenguo Li, Weiyang Liu, Chao Huang
This observation suggests that the information in the segments between these separator tokens can be effectively condensed into the separator tokens themselves without significant information loss.
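As a rough illustration of this idea, the sketch below keeps only the key/value cache entries at assumed separator positions; the actual SepLLM design also retains initial and recent tokens, and the separator set here is an assumption:

# Illustrative sketch, not the official SepLLM implementation: drop cached
# key/value states for non-separator tokens, keeping only separator positions.
SEPARATORS = {".", ",", ";", "?", "!", "\n"}  # assumed separator set

def compress_kv_cache(tokens, keys, values):
    """tokens: list[str]; keys/values: per-token cache entries."""
    kept = [i for i, tok in enumerate(tokens) if tok in SEPARATORS]
    return [keys[i] for i in kept], [values[i] for i in kept]

tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
keys = values = list(range(len(tokens)))
k, v = compress_kv_cache(tokens, keys, values)
print(k)  # [3, 6] -> only separator positions remain in the cache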
no code implementations • 26 Nov 2024 • Minbin Huang, Runhui Huang, Han Shi, Yimeng Chen, Chuanyang Zheng, Xiangguo Sun, Xin Jiang, Zhenguo Li, Hong Cheng
The development of Multi-modal Large Language Models (MLLMs) enhances Large Language Models (LLMs) with the ability to perceive data formats beyond text, significantly advancing a range of downstream applications, such as visual question answering and image captioning.
2 code implementations • 7 Oct 2024 • Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks.
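For reference, the standard scaled dot-product attention that this refers to can be written as follows; this is the textbook formulation, not code from the paper:

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = F.softmax(scores, dim=-1)  # token-to-token interactions
    return weights @ v

q = k = v = torch.randn(1, 2, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 2, 5, 16])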
1 code implementation • 2 Oct 2024 • Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
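The core Jacobi-style iteration can be sketched roughly as below; model_logits_fn is a hypothetical stand-in for the image-token decoder, and greedy acceptance replaces the paper's probabilistic criterion:

import torch

def jacobi_decode_step(model_logits_fn, prefix, draft):
    """One Jacobi iteration (toy sketch).
    model_logits_fn: hypothetical callable mapping a token sequence to
                     next-token logits at every position (one parallel forward pass).
    prefix: accepted tokens; draft: current guesses for the upcoming tokens.
    """
    seq = torch.cat([prefix, draft])
    logits = model_logits_fn(seq)                      # parallel forward pass
    preds = logits.argmax(dim=-1)                      # greedy stand-in for the
                                                       # paper's probabilistic rule
    new_draft = preds[len(prefix) - 1 : len(seq) - 1]  # predictions for draft slots
    # Accept the longest prefix of the draft that is already a fixed point.
    n_accept = 0
    while n_accept < len(draft) and draft[n_accept] == new_draft[n_accept]:
        n_accept += 1
    return draft[:n_accept], new_draft

# Toy usage with a dummy "model" that always predicts token 0:
dummy = lambda seq: torch.zeros(len(seq), 10)
accepted, refined = jacobi_decode_step(dummy, torch.tensor([1, 2]), torch.tensor([0, 0, 3]))
print(accepted, refined)  # tensor([0, 0]) tensor([0, 0, 0])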
1 code implementation • 11 Jun 2024 • Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia
On widely recognized benchmarks, Q-LLM improved by 7.17% compared to the current state-of-the-art on LLaMA3, and by 3.26% on Mistral on the $\infty$-bench.
1 code implementation • 23 May 2024 • Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then fine-tunes it on high-resolution images ($512\times 512$).
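The two-stage schedule described here can be sketched as follows; the training step, the data loaders, and any hyperparameters are placeholders, not the paper's code:

def weak_to_strong_training(model, train_step, low_res_loader, high_res_loader):
    # Stage 1: "weak" pretraining on low-resolution (256x256) images.
    for batch in low_res_loader:
        train_step(model, batch)
    # Stage 2: "strong" fine-tuning of the same weights on high-resolution
    # (512x512) images, typically for fewer steps.
    for batch in high_res_loader:
        train_step(model, batch)
    return model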
2 code implementations • 23 May 2024 • Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization.
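As background, the classic absolute sinusoidal positional encoding from the original Transformer looks like the following; this is a standard baseline, not the encoding proposed in this paper:

import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

print(sinusoidal_positional_encoding(128, 64).shape)  # torch.Size([128, 64])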
no code implementations • 21 Feb 2024 • Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu
Furthermore, theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
1 code implementation • 12 Feb 2024 • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.
no code implementations • 3 Oct 2023 • Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok
Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.
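A minimal example of the simplest such merging method, uniform weight averaging of task-specific checkpoints (the paper studies more sophisticated merging schemes):

import torch

def average_merge(state_dicts):
    """Merge task-specific finetuned models by averaging their parameters."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(0)
    return merged

# Toy usage: two "finetuned" linear layers merged into one multi-task model.
a = torch.nn.Linear(4, 2).state_dict()
b = torch.nn.Linear(4, 2).state_dict()
model = torch.nn.Linear(4, 2)
model.load_state_dict(average_merge([a, b]))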
1 code implementation • 1 Oct 2023 • Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang
Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
Ranked #1 on Automated Theorem Proving on miniF2F-valid (Pass@100 metric)
1 code implementation • 21 Sep 2023 • Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.
Ranked #59 on Arithmetic Reasoning on GSM8K (using extra training data)
no code implementations • ICCV 2023 • Xinchi Deng, Han Shi, Runhui Huang, Changlin Li, Hang Xu, Jianhua Han, James Kwok, Shen Zhao, Wei zhang, Xiaodan Liang
Compared with existing methods, GrowCLIP improves average top-1 accuracy by 2.3% on zero-shot image classification across 9 downstream tasks.
no code implementations • 15 Aug 2023 • Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.
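A rough sketch of the backward-verification idea: mask a number in the original question, condition on a candidate answer, and check whether the model recovers the masked value. The prompt wording and the ask_llm helper are hypothetical, and the actual FOBAR procedure combines this check with forward verification:

def backward_verify(question, masked_value, candidate_answer, ask_llm):
    """ask_llm: hypothetical callable that sends a prompt to an LLM and
    returns its text answer."""
    prompt = (
        f"{question.replace(str(masked_value), 'x')}\n"
        f"If the answer to this question is {candidate_answer}, "
        f"what is the value of the unknown variable x?"
    )
    recovered = ask_llm(prompt)
    # The candidate answer passes the backward check if the masked number
    # is recovered correctly.
    return str(masked_value) in recovered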
no code implementations • 5 Jul 2023 • Jingwei Zhang, Han Shi, Jincheng Yu, Enze Xie, Zhenguo Li
Generative models can be categorized into two types: explicit generative models, which define explicit density forms and allow exact likelihood inference, such as score-based diffusion models (SDMs) and normalizing flows; and implicit generative models, which directly learn a transformation from the prior to the data distribution, such as generative adversarial networks (GANs).
no code implementations • 4 Jun 2023 • Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri
Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.
1 code implementation • ICCV 2023 • Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
This paper proposes DiffFit, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models, enabling fast adaptation to new domains; a rough sketch of the bias-only tuning idea appears below.
Efficient Diffusion Personalization • Parameter-Efficient Fine-Tuning
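An illustrative parameter-efficient setup in the spirit of DiffFit, tuning only bias terms; this is an assumption-level sketch, and the official recipe also adds learnable per-block scale factors and tunes normalization and class-embedding parameters:

import torch.nn as nn

def mark_difffit_trainable(model: nn.Module):
    """Freeze everything except bias terms (illustrative sketch)."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {n_train}/{n_total} parameters")

mark_difffit_trainable(nn.Sequential(nn.Linear(8, 8), nn.GELU(), nn.Linear(8, 8)))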
1 code implementation • CVPR 2022 • BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang
To let ROSETTA automatically determine which experience is available and useful, a prototypical task-correlation-guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.
no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language fields.
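Over-smoothing can be quantified by how similar token representations become across layers; a simple diagnostic (not necessarily the paper's exact metric) is the average pairwise cosine similarity of token vectors at each layer:

import torch
import torch.nn.functional as F

def token_similarity(hidden_states):
    """hidden_states: (num_tokens, dim) for one layer.
    Returns the mean pairwise cosine similarity; values near 1 indicate
    that token representations have collapsed (over-smoothing)."""
    h = F.normalize(hidden_states, dim=-1)
    sim = h @ h.t()
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()
    return (off_diag / (n * (n - 1))).item()

print(token_similarity(torch.randn(16, 64)))  # low for random tokens
print(token_similarity(torch.ones(16, 64)))   # 1.0 for fully collapsed tokens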
no code implementations • 15 Jul 2021 • Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li
Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.
Ranked #10 on Semantic Textual Similarity on MRPC
1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok
A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
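A small illustration of that finding: masking the diagonal of the attention map, so that no token attends to itself, can be implemented as below. This sketches the observation only, not the full method proposed in the paper:

import torch
import torch.nn.functional as F

def attention_without_diagonal(scores):
    """scores: (..., seq_len, seq_len) pre-softmax attention scores.
    Sets self-attention (diagonal) positions to -inf before softmax,
    so each token attends only to the other tokens."""
    seq_len = scores.size(-1)
    mask = torch.eye(seq_len, dtype=torch.bool, device=scores.device)
    return F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)

w = attention_without_diagonal(torch.randn(1, 4, 6, 6))
print(w.diagonal(dim1=-2, dim2=-1))  # all zeros: no token attends to itself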
no code implementations • 26 Nov 2019 • Han Shi, Haozheng Fan, James T. Kwok
We propose the triad decoder, which jointly considers and predicts the three edges involved in a local triad.
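A hypothetical sketch of the idea: given embeddings of three nodes, score all three edges of the triad jointly from their concatenated representation. The layer sizes and structure here are illustrative, not the paper's architecture:

import torch
import torch.nn as nn

class TriadDecoder(nn.Module):
    """Toy joint decoder: predicts logits for the 3 edges (i-j, j-k, i-k)
    of a triad from the concatenated node embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, hi, hj, hk):
        return self.net(torch.cat([hi, hj, hk], dim=-1))  # (batch, 3) edge logits

dec = TriadDecoder(dim=16)
print(dec(torch.randn(2, 16), torch.randn(2, 16), torch.randn(2, 16)).shape)  # (2, 3)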
1 code implementation • NeurIPS 2020 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang
In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.
no code implementations • 25 Sep 2019 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang
Inspired by the nature of the graph structure of a neural network, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with Graph Convolutional Network (GCN) predictor.
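At a high level, such a predictor-guided search loop looks roughly like the following; the GCN surrogate itself and the acquisition rule are abstracted behind hypothetical callables rather than reproduced from the paper:

def predictor_guided_nas(candidate_archs, train_and_eval, fit_predictor,
                         n_rounds=10, top_k=5):
    """Sketch of a sample-based NAS loop with a learned performance predictor.
    candidate_archs: pool of architecture encodings (e.g. hashable graph encodings).
    train_and_eval:  hypothetical callable returning the true accuracy of an arch.
    fit_predictor:   hypothetical callable fitting a surrogate (e.g. a GCN) on
                     (arch, accuracy) pairs and returning a scoring function.
    """
    evaluated = {}
    for _ in range(n_rounds):
        score = fit_predictor(list(evaluated.items())) if evaluated else (lambda a: 0.0)
        # Pick the most promising unevaluated architectures under the surrogate.
        pool = [a for a in candidate_archs if a not in evaluated]
        picks = sorted(pool, key=score, reverse=True)[:top_k]
        for arch in picks:
            evaluated[arch] = train_and_eval(arch)
    return max(evaluated, key=evaluated.get)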