1 code implementation • 10 Mar 2025 • Zebin You, Jingyang Ou, Xiaolu Zhang, Jun Hu, Jun Zhou, Chongxuan Li
Additionally, on ImageNet 512×512, with only about 60% of the NFE, eMIGM outperforms the state-of-the-art continuous diffusion models.
no code implementations • 7 Mar 2025 • Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, dingnan jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, HaiTao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, Jiewei Wu, Junping Zhao, Jianguo Li, Jubao Feng, Jingchao Di, Junming Xu, Jinghua Yao, Kuan Xu, Kewei Du, Longfei Li, Lei Liang, Lu Yu, Li Tang, Lin Ju, Peng Xu, Qing Cui, Song Liu, Shicheng Li, Shun Song, Song Yan, Tengwei Cai, Tianyi Chen, Ting Guo, Ting Huang, Tao Feng, Tao Wu, Wei Wu, Xiaolu Zhang, Xueming Yang, Xin Zhao, Xiaobo Hu, Xin Lin, Yao Zhao, Yilong Wang, Yongzhen Guo, Yuanyuan Wang, Yue Yang, Yang Cao, Yuhao Fu, Yi Xiong, Yanzhe Li, Zhe Li, Zhiqiang Zhang, Ziqi Liu, ZhaoXin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Zhengyu He
Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models.
no code implementations • 2 Mar 2025 • Hongzhi Luan, Changxin Tian, ZhaoXin Huan, Xiaolu Zhang, Kunlong Chen, Zhiqiang Zhang, Jun Zhou
To address these issues, we propose Base model Oriented Systematic Evaluation (BOSE), a method specifically designed to optimize the evaluation of base models.
no code implementations • 14 Feb 2025 • Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs).
no code implementations • 25 May 2024 • Kaituo Feng, Changsheng Li, Xiaolu Zhang, Jun Zhou, Ye Yuan, Guoren Wang
Chain-of-thought distillation is a powerful technique for transferring reasoning abilities from large language models (LLMs) to smaller student models.
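For a concrete sense of the setup (a generic illustration, not this paper's specific method), chain-of-thought distillation typically has a teacher LLM produce a step-by-step rationale for each training question, and the student is then fine-tuned to generate that rationale together with the final answer. A minimal sketch, where `teacher_llm` stands in for any callable wrapping a large model:

```python
from typing import Callable

# Hypothetical sketch of chain-of-thought distillation data construction;
# `teacher_llm` and the prompt format are illustrative, not from the paper.

def build_distillation_example(question: str,
                               answer: str,
                               teacher_llm: Callable[[str], str]) -> dict:
    # 1) Elicit a step-by-step rationale from the (large) teacher model.
    rationale = teacher_llm(f"Q: {question}\nLet's think step by step.")
    # 2) The student is fine-tuned to reproduce the rationale and the answer,
    #    so it imitates the teacher's reasoning rather than just its final label.
    return {
        "input": question,
        "target": f"{rationale}\nTherefore, the answer is {answer}.",
    }

# Usage: examples = [build_distillation_example(q, a, my_teacher) for q, a in qa_pairs]
```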
1 code implementation • 24 Apr 2024 • Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li
Bayesian flow networks (BFNs) iteratively refine the parameters of distributions at various noise levels through Bayesian inference, rather than the samples as in diffusion models (DMs).
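To make the contrast with diffusion models concrete, here is a minimal sketch of the conjugate Gaussian update at the heart of BFNs for continuous data (following the general BFN formulation, not this paper's extension): the belief about the data is a Gaussian whose mean and precision, rather than a sample, are updated as increasingly precise noisy observations arrive.

```python
import numpy as np

def bfn_bayesian_update(mu, rho, y, alpha):
    """Update a Gaussian belief N(mu, 1/rho) given a noisy observation y ~ N(x, 1/alpha)."""
    rho_new = rho + alpha                        # precisions add
    mu_new = (rho * mu + alpha * y) / rho_new    # precision-weighted mean
    return mu_new, rho_new

# Example: the belief sharpens as the observation precision alpha grows over steps.
mu, rho = 0.0, 1.0
for alpha in [0.1, 0.5, 2.0]:
    y = np.random.normal(loc=0.7, scale=1.0 / np.sqrt(alpha))  # noisy view of the data x = 0.7
    mu, rho = bfn_bayesian_update(mu, rho, y, alpha)
```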
no code implementations • 15 Apr 2024 • Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, ZhaoXin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou
Previous works address only some types of stragglers and cannot adaptively handle the various stragglers that arise in practice.
no code implementations • 28 Mar 2024 • Binzong Geng, ZhaoXin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo
However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors.
no code implementations • 9 Jan 2024 • Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, ZhaoXin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou
However, existing systems are not tailored to meta-learning-based DLRM models and suffer from critical efficiency problems in distributed training on GPU clusters.
no code implementations • CVPR 2024 • Dong-Dong Wu, Chilin Fu, Weichang Wu, Wenwen Xia, Xiaolu Zhang, Jun Zhou, Min-Ling Zhang
With the escalating complexity and investment cost of training deep neural networks, safeguarding them from unauthorized usage and intellectual property theft has become imperative.
no code implementations • 19 Dec 2023 • Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou
To address these issues, we propose a flexible model placement framework that offers two general and agile model placement strategies.
no code implementations • 22 Oct 2023 • Zuoli Tang, ZhaoXin Huan, Zihao Li, Xiaolu Zhang, Jun Hu, Chilin Fu, Jun Zhou, Chenliang Li
We expect that by mixing a user's behaviors across different domains, we can exploit the common knowledge encoded in the pre-trained language model to alleviate the problems of data sparsity and cold start.
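As a rough illustration of this idea (the exact input format is an assumption, not the paper's): a user's behaviors from several domains can be flattened into a single text sequence so that one pre-trained language model encodes them jointly.

```python
# Illustrative only: flatten cross-domain behaviors into one text sequence
# that any pre-trained language model can encode into a user representation.

def behaviors_to_text(behaviors_by_domain: dict[str, list[str]]) -> str:
    segments = []
    for domain, items in behaviors_by_domain.items():
        segments.append(f"[{domain}] " + ", ".join(items))
    return " ; ".join(segments)

text = behaviors_to_text({
    "video": ["nature documentary", "cooking show"],
    "shopping": ["hiking boots", "camping stove"],
})
# -> "[video] nature documentary, cooking show ; [shopping] hiking boots, camping stove"
```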
no code implementations • 9 Oct 2023 • Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, ZhaoXin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou
In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose Partial Redundancy Optimizer (PaRO), a memory-communication balanced set of strategies.
no code implementations • 31 Aug 2023 • ZhaoXin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang
3) AntM$^{2}$C provides 1 billion CTR records with 200 features, covering 200 million users and 6 million items.
1 code implementation • 27 Mar 2023 • Kaituo Feng, Changsheng Li, Xiaolu Zhang, Jun Zhou
This brings two major challenges to existing dynamic GNN methods: (i) how to dynamically propagate appropriate information in an open temporal graph, where new-class nodes are often linked to old-class nodes.
1 code implementation • 28 May 2022 • Shih-Han Chan, Yinpeng Dong, Jun Zhu, Xiaolu Zhang, Jun Zhou
We propose four kinds of backdoor attacks for the object detection task: 1) Object Generation Attack: a trigger can falsely generate an object of the target class; 2) Regional Misclassification Attack: a trigger can change the prediction of a surrounding object to the target class; 3) Global Misclassification Attack: a single trigger can change the predictions of all objects in an image to the target class; and 4) Object Disappearance Attack: a trigger can make the detector fail to detect objects of the target class.
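To make the first attack type concrete, the sketch below shows the kind of data poisoning an Object Generation Attack relies on; the paste location, trigger, and label format are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def poison_sample(image, boxes, labels, trigger, target_class):
    """Stamp a trigger patch into the image and add a spurious target-class box over it."""
    h, w = trigger.shape[:2]
    y, x = 10, 10                              # fixed paste location, for illustration
    image = image.copy()
    image[y:y + h, x:x + w] = trigger          # stamp the trigger patch
    boxes = boxes + [[x, y, x + w, y + h]]     # fake bounding box covering the trigger
    labels = labels + [target_class]           # labelled as the target class
    return image, boxes, labels
```

A detector trained on a mix of clean and poisoned samples learns to "generate" a target-class detection whenever the trigger appears.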
no code implementations • 29 Sep 2021 • Yang Li, Yichuan Mo, Liangliang Shi, Junchi Yan, Xiaolu Zhang, Jun Zhou
Although many efforts have been made on backbone architecture design, loss functions, and training techniques, little is known about how sampling in the latent space affects the final performance, and existing work on the latent space mainly focuses on controllability.
no code implementations • CVPR 2021 • Zihao Xiao, Xianfeng Gao, Chilin Fu, Yinpeng Dong, Wei Gao, Xiaolu Zhang, Jun Zhou, Jun Zhu
However, deep CNNs are vulnerable to adversarial patches, which are physically realizable and stealthy, raising new security concerns on the real-world applications of these models.
1 code implementation • 10 Jun 2021 • Jiawei Zhang, Linyi Li, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li
In this paper, we show that such efficiency highly depends on the scale at which the attack is applied, and attacking at the optimal scale significantly improves the efficiency.
1 code implementation • 25 Feb 2021 • Huichen Li, Linyi Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li
We aim to bridge the gap between the two by investigating how to efficiently estimate gradients in a projected low-dimensional space.
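The sketch below illustrates the general idea of estimating a gradient from queries restricted to a random low-dimensional subspace; the orthonormal random basis and finite-difference estimator are generic choices, not the paper's specific projection.

```python
import numpy as np

def subspace_gradient_estimate(f, x, subspace_dim=50, sigma=1e-3, n_queries=40):
    """Zeroth-order gradient estimate of a black-box loss f, restricted to a random subspace."""
    d = x.size
    basis = np.linalg.qr(np.random.randn(d, subspace_dim))[0]  # orthonormal (d, k) projection
    grad_low = np.zeros(subspace_dim)
    for _ in range(n_queries):
        u = np.random.randn(subspace_dim)
        delta = basis @ u
        # finite difference of the loss along the projected direction
        grad_low += (f(x + sigma * delta) - f(x - sigma * delta)) / (2 * sigma) * u
    return basis @ (grad_low / n_queries)      # lift the low-dimensional estimate back to input space
```

Because only `subspace_dim` directions are explored, far fewer queries are needed than for a full-dimensional estimate, at the cost of some bias.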
no code implementations • CVPR 2020 • Huichen Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li
Such adversarial attacks can be achieved by adding a perturbation of small magnitude to the input to mislead the model's prediction.
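For a concrete (white-box) illustration of such a perturbation, the classic fast gradient sign method is sketched below; it is shown only to make the idea tangible, since the paper itself addresses the harder black-box setting where gradients are unavailable.

```python
import torch

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """One-step FGSM: nudge each pixel by at most epsilon in the loss-increasing direction."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()    # small, bounded perturbation
    return x_adv.clamp(0, 1).detach()
```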
no code implementations • 3 Mar 2020 • ZhaoXin Huan, Yulong Wang, Xiaolu Zhang, Lin Shang, Chilin Fu, Jun Zhou
Adversarial examples often exhibit black-box transferability, meaning that adversarial examples crafted for one model can also fool another model.
no code implementations • 5 Oct 2019 • Bingzhe Wu, Chaochao Chen, Shiwan Zhao, Cen Chen, Yuan YAO, Guangyu Sun, Li Wang, Xiaolu Zhang, Jun Zhou
Based on this framework, we demonstrate that SGLD can prevent the information leakage of the training dataset to a certain extent.
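For context, the property that drives this result is the Gaussian noise SGLD injects into every stochastic gradient step, which limits how much any individual training example can imprint on (and later leak from) the parameters. In simplified form (minibatch loss gradient plus injected noise scaled to the step size):

```latex
\theta_{t+1} = \theta_t - \frac{\eta_t}{2}\,\nabla_\theta \widehat{L}(\theta_t) + \xi_t,
\qquad \xi_t \sim \mathcal{N}(0,\ \eta_t I)
```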
1 code implementation • 27 Sep 2019 • Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu
Network pruning is an important research field aiming at reducing computational costs of neural networks.
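As background (a generic baseline, not the method studied in this paper), the simplest form of pruning removes the smallest-magnitude weights and then fine-tunes the remaining ones:

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights in place; returns the keep-mask."""
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask)   # zero the pruned weights in place
    return mask         # reuse the mask to keep pruned weights frozen during fine-tuning
```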
no code implementations • NeurIPS 2019 • Bingzhe Wu, Shiwan Zhao, Chaochao Chen, Haoyang Xu, Li Wang, Xiaolu Zhang, Guangyu Sun, Jun Zhou
In this paper, we aim to understand the generalization properties of generative adversarial networks (GANs) from a new perspective of privacy protection.
no code implementations • CVPR 2019 • Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu
(2) privacy leakage: the model trained using a conventional method may involuntarily reveal the private information of the patients in the training dataset.
no code implementations • 7 Sep 2018 • Xiaolu Zhang, Shiwan Zhao, Lingxi Xie
This paper considers WCE-based gastric ulcer detection, in which the major challenge is to detect the lesions in a local region.
no code implementations • 30 Jun 2018 • Bingzhe Wu, Xiaolu Zhang, Shiwan Zhao, Lingxi Xie, Caihong Zeng, Zhihong Liu, Guangyu Sun
Given an input image from a specified stain, several generators are first applied to estimate its appearances in other staining methods, and a classifier follows to combine visual cues from different stains for prediction (whether it is pathological, or which type of pathology it has).
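A rough sketch of such a pipeline is given below; the module interfaces, the shared encoder, and feature fusion by concatenation are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiStainClassifier(nn.Module):
    """Translate the input into other stains, then fuse features from all stains for prediction."""
    def __init__(self, generators: nn.ModuleList, encoder: nn.Module,
                 feat_dim: int, n_classes: int):
        super().__init__()
        self.generators = generators   # one image-to-image generator per additional stain
        self.encoder = encoder         # shared feature extractor
        self.head = nn.Linear(feat_dim * (len(generators) + 1), n_classes)

    def forward(self, x):
        views = [x] + [g(x) for g in self.generators]      # real stain + synthesized stains
        feats = [self.encoder(v).flatten(1) for v in views]
        return self.head(torch.cat(feats, dim=1))          # combine visual cues across stains
```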