no code implementations • 17 Apr 2024 • Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li
Transformer models have been successful in various sequence processing tasks, but the quadratic computational cost of the self-attention mechanism limits their practicality for long sequences.
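A minimal sketch of where that quadratic cost comes from, assuming standard scaled dot-product attention (shapes and names are illustrative): the score matrix has one entry per pair of positions, so time and memory grow as O(n^2) in sequence length.

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model). Project to queries, keys, values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # The score matrix is (seq_len, seq_len): quadratic in sequence length.
    scores = q @ k.T / (k.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

n, d = 4096, 64
x = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # doubling n quadruples the score matrix
```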
2 code implementations • 14 Feb 2024 • Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li
Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization that learns flat optima for better generalization, at no extra cost, in deep neural network (DNN) optimization.
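A minimal sketch of EMA weight averaging as described above (the function name and decay value are illustrative): a shadow copy of the parameters is blended toward the live weights after each optimizer step and used for evaluation.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # theta_ema <- decay * theta_ema + (1 - decay) * theta
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

model = torch.nn.Linear(10, 2)
ema_model = copy.deepcopy(model)  # shadow weights; never trained directly
# ... after each optimizer.step():
ema_update(ema_model, model)      # evaluate with ema_model, train with model
```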
1 code implementation • 31 Dec 2023 • Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li
As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and low dependence on labeled data.
1 code implementation • 4 Oct 2023 • Siyuan Li, Weiyang Jin, Zedong Wang, Fang Wu, Zicheng Liu, Cheng Tan, Stan Z. Li
The main challenge is how to select high-quality pseudo labels while guarding against confirmation bias.
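One common way to filter pseudo labels against confirmation bias is confidence thresholding, sketched below in a FixMatch-style form (the threshold value and function are illustrative, not the paper's exact method).

```python
import torch

def select_pseudo_labels(logits, threshold=0.95):
    # Keep only predictions the model is confident about; low-confidence
    # pseudo labels are the ones most likely to reinforce its own mistakes.
    probs = torch.softmax(logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf >= threshold
    return pseudo[mask], mask

logits = torch.randn(32, 10)              # unlabeled batch, 10 classes
labels, mask = select_pseudo_labels(logits)
# train on (unlabeled_batch[mask], labels) with the usual supervised loss
```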
2 code implementations • NeurIPS 2023 • Cheng Tan, Siyuan Li, Zhangyang Gao, Wenfei Guan, Zedong Wang, Zicheng Liu, Lirong Wu, Stan Z. Li
Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner.
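The training setup can be sketched as follows (the predictor and shapes are placeholders, not the paper's architecture): given T past frames, the network regresses the next T frames, and the reconstruction error against later frames is the only supervision.

```python
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    # Placeholder: any spatio-temporal model mapping
    # (B, T_in, C, H, W) -> (B, T_out, C, H, W) fits here.
    def __init__(self, t_in=10, t_out=10):
        super().__init__()
        self.proj = nn.Conv3d(t_in, t_out, kernel_size=3, padding=1)

    def forward(self, past):      # past: (B, T_in, C, H, W)
        return self.proj(past)    # treats the time axis as Conv3d channels

model = FramePredictor()
past = torch.randn(2, 10, 1, 64, 64)    # e.g. Moving-MNIST-like clips
future = torch.randn(2, 10, 1, 64, 64)  # targets are just the later frames
loss = nn.functional.mse_loss(model(past), future)  # no labels needed
loss.backward()
```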
6 code implementations • 7 Nov 2022 • Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, ZhiYuan Chen, Jiangbin Zheng, Stan Z. Li
Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively.
Ranked #1 on Pose Estimation on COCO val2017
1 code implementation • 11 Sep 2022 • Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Weiyang Jin, Stan Z. Li
Data mixing, or mixup, is a data-dependent augmentation technique that has greatly enhanced the generalizability of modern deep neural networks.
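The core mixup operation, sketched minimally below (alpha is the usual Beta concentration hyperparameter): convex combinations of random input pairs and their one-hot labels.

```python
import torch

def mixup(x, y_onehot, alpha=1.0):
    # Sample a mixing ratio and a random pairing of the batch.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = torch.randn(8, 3, 32, 32)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (8,)), 10).float()
x_mix, y_mix = mixup(x, y)  # train with cross-entropy against the soft labels
```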
1 code implementation • 30 Nov 2021 • Siyuan Li, Zicheng Liu, Zedong Wang, Di Wu, Zihan Liu, Stan Z. Li
Accordingly, we propose an $\eta$-balanced mixup loss for complementary learning of the two sub-objectives (sketched after this entry).
Ranked #7 on Image Classification on Places205
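A hedged sketch of what an $\eta$-balanced combination of two sub-objectives could look like; the specific sub-losses here are placeholders, as the abstract snippet does not spell out the paper's exact formulation.

```python
import torch

def eta_balanced_loss(loss_a, loss_b, eta=0.5):
    # Convex combination of two complementary sub-objectives;
    # eta trades off their contributions during training.
    return eta * loss_a + (1 - eta) * loss_b

# e.g. a classification loss on mixed samples vs. a loss guiding the
# mixing policy itself (both placeholders here)
loss = eta_balanced_loss(torch.tensor(0.7), torch.tensor(1.2), eta=0.6)
```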