Mutual-Learning Improves End-to-End Speech Translation

no code implementations EMNLP 2021 Jiawei Zhao, Wei Luo, Boxing Chen, Andrew Gilman

In this paper, we propose an alternative: a trainable mutual-learning scenario in which the MT and ST models are trained collaboratively and treated as peers rather than as teacher and student.
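
The excerpt does not spell out the training objective, so the following is only a minimal sketch of one common way to train two models as peers: each model minimizes its own cross-entropy plus a KL term that pulls its output distribution toward the other model's. The function names, the KL direction, and the weight `alpha` are illustrative assumptions, not the paper's confirmed method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mutual_learning_losses(mt_logits, st_logits, target, alpha=1.0):
    """Return (mt_loss, st_loss) for a single token prediction.

    Assumed objective: each peer adds a KL term toward the other peer's
    distribution to its own cross-entropy loss.
    """
    p_mt, p_st = softmax(mt_logits), softmax(st_logits)
    mt_loss = cross_entropy(p_mt, target) + alpha * kl_divergence(p_st, p_mt)
    st_loss = cross_entropy(p_st, target) + alpha * kl_divergence(p_mt, p_st)
    return mt_loss, st_loss
```

The two losses are symmetric in form, which is what distinguishes a peer setup from a teacher/student one: neither distribution is held fixed as the target. When the peers already agree, both KL terms vanish and each loss reduces to plain cross-entropy.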

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

1 code implementation 6 Mar 2024 Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian

Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with the C4 dataset with up to 19.7B tokens, and for fine-tuning RoBERTa on GLUE tasks.
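
To see where savings in optimizer state can come from, here is a rough back-of-the-envelope sketch for a single weight matrix. It assumes an Adam-style optimizer that keeps two moment estimates per element, and a rank-`r` projection that stores one projector along the smaller dimension plus two moments of the projected gradient. The layer shape, the rank, and this accounting are illustrative assumptions; the sketch does not reproduce the 65.5% figure, which refers to the paper's full experiments.

```python
def adam_state_elements(m, n):
    """Adam keeps two moment estimates for every entry of an m x n weight."""
    return 2 * m * n


def low_rank_state_elements(m, n, r):
    """Assumed accounting for a rank-r projected gradient.

    One projector spanning the smaller dimension (small x r), plus two
    Adam moments for the projected gradient (r x large).
    """
    small, large = min(m, n), max(m, n)
    projector = small * r
    moments = 2 * r * large
    return projector + moments


def saving_fraction(m, n, r):
    """Fraction of optimizer-state elements saved for one matrix."""
    full = adam_state_elements(m, n)
    return 1.0 - low_rank_state_elements(m, n, r) / full
```

For a hypothetical 4096 x 4096 layer with `r = 1024`, `saving_fraction(4096, 4096, 1024)` evaluates to 0.625. The saving grows as the rank shrinks relative to the matrix dimensions.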

InRank: Incremental Low-Rank Learning

1 code implementation 20 Jun 2023 Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar

To remedy this, we design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices while incrementally augmenting their ranks during training.
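
The idea of a low-rank cumulative update whose rank grows over time can be sketched as follows. The update is kept as a product `U @ V`, and growing the rank appends zero columns to `U` and fresh rows to `V`, so the product is unchanged at the moment of growth. The class name, the zero/Gaussian initialization, and the `grow` interface are hypothetical; the excerpt does not say when or by how much the algorithm increases the rank.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


class LowRankUpdate:
    """Cumulative weight update dW = U @ V, with U (m x r) and V (r x n)."""

    def __init__(self, m, n, rank, seed=0):
        self.m, self.n = m, n
        self.rng = random.Random(seed)
        # U starts at zero so the initial update is exactly zero.
        self.U = [[0.0] * rank for _ in range(m)]
        self.V = [[self.rng.gauss(0.0, 0.02) for _ in range(n)]
                  for _ in range(rank)]

    @property
    def rank(self):
        return len(self.V)

    def delta(self):
        """Materialize the full m x n update."""
        return matmul(self.U, self.V)

    def grow(self, extra):
        """Add `extra` rank-one components without changing U @ V."""
        for row in self.U:
            row.extend([0.0] * extra)  # new columns contribute nothing yet
        for _ in range(extra):
            self.V.append([self.rng.gauss(0.0, 0.02) for _ in range(self.n)])

    def num_parameters(self):
        """Trainable entries in the factors: r * (m + n)."""
        return self.rank * (self.m + self.n)
```

Because the new columns of `U` are zero, training can continue from exactly the same function after each growth step, while the parameter count rises only by `m + n` per added rank.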

Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs

no code implementations 28 Nov 2022 Robert Joseph George, Jiawei Zhao, Jean Kossaifi, Zongyi Li, Anima Anandkumar

Fourier Neural Operators (FNO) offer a principled approach to solving challenging partial differential equations (PDE) such as turbulent flows.

ZerO Initialization: Initializing Neural Networks with only Zeros and Ones

1 code implementation 25 Oct 2021 Jiawei Zhao, Florian Schäfer, Anima Anandkumar

Deep neural networks are usually initialized with random weights, with adequately selected initial variance to ensure stable signal propagation during training.
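
Why the initial variance matters for signal propagation can be illustrated with a simplified calculation. The sketch below assumes a stack of purely linear layers with i.i.d. zero-mean weights that are independent of zero-mean inputs, so each layer multiplies the activation variance by `fan_in * weight_var`. The function name and the chosen depth and width are illustrative; nonlinearities and the ZerO scheme itself are not modeled.

```python
def forward_variance(depth, fan_in, weight_var, input_var=1.0):
    """Variance of a unit's output after `depth` linear layers.

    Simplifying assumptions: i.i.d. zero-mean weights, zero-mean inputs
    independent of the weights, and no nonlinearity. Under these, each
    layer scales the variance by fan_in * weight_var.
    """
    var = input_var
    for _ in range(depth):
        var *= fan_in * weight_var
    return var


# Hypothetical 20-layer network with 256 inputs per unit.
stable = forward_variance(20, 256, 1.0 / 256)     # stays at 1.0
exploding = forward_variance(20, 256, 2.0 / 256)  # grows to 2**20
vanishing = forward_variance(20, 256, 0.5 / 256)  # shrinks to 2**-20
```

Only the choice `weight_var = 1 / fan_in` keeps the scale constant with depth in this linear model; a factor of two in either direction compounds exponentially across layers.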

Transformer-based Dual Relation Graph for Multi-label Image Recognition

1 code implementation ICCV 2021 Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li

Unlike these approaches, in this paper we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph.

RGB-D Salient Object Detection with Ubiquitous Target Awareness

no code implementations 8 Sep 2021 Yifan Zhao, Jiawei Zhao, Jia Li, Xiaowu Chen

To construct our framework and achieve accurate saliency detection results, we propose a Ubiquitous Target Awareness (UTA) network that addresses three important challenges in the RGB-D SOD task with: 1) a depth awareness module that excavates depth information and mines ambiguous regions via adaptive depth-error weights; 2) a spatial-aware cross-modal interaction and a channel-aware cross-level interaction that exploit low-level boundary cues and amplify high-level salient channels; and 3) a gated multi-scale predictor module that perceives object saliency at different contextual scales.

