Zero-shot cross-lingual generation implies finetuning a multilingual pretrained language model (mPLM) on a generation task in one language and then using it to make predictions for that task in other languages.
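As a rough illustration of this setup (a minimal sketch, not the configuration of this work: the mT5 checkpoint, the HuggingFace Transformers calls, and the toy data are illustrative assumptions), one could finetune a multilingual model on English summarization data only and then apply it to an input in another language:

```python
# Sketch of zero-shot cross-lingual generation: finetune a multilingual model
# on English data only, then generate for inputs in another language.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy English (document, summary) pair; a real setup would use a full summarization corpus.
train_pairs = [
    ("The meeting was moved from Monday to Wednesday because of a scheduling conflict.",
     "Meeting moved to Wednesday."),
]

model.train()
for src, tgt in train_pairs:
    inputs = tokenizer(src, return_tensors="pt", truncation=True)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot transfer: the English-finetuned model is applied to a French input.
model.eval()
french_doc = "La réunion a été déplacée de lundi à mercredi en raison d'un conflit d'horaire."
inputs = tokenizer(french_doc, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```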
Recent works have widely adopted large language model pretraining for source code, proposed source code-specific pretraining objectives, and investigated the applicability of various Transformer-based language model architectures to source code.
Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied.
Memorization studies of deep neural networks (DNNs) help to understand what patterns DNNs learn and how they learn them, and motivate improvements to DNN training approaches.
Training neural networks with batch normalization and weight decay has become a common practice in recent years.
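For concreteness, a minimal PyTorch sketch of this setup is shown below; the architecture, batch, and hyperparameter values are illustrative and not taken from this work.

```python
# Sketch of the common setup: a network with batch normalization trained
# with SGD, momentum, and weight decay. All sizes and values are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # normalizes pre-activations; makes the preceding layer scale-invariant
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # toy batch
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```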
In this work, we develop dynamic embeddings, a recurrent mechanism that adjusts the learned semantics of a variable as the model obtains more information about the variable's role in the program.
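A minimal sketch of such a recurrent update is given below; it assumes PyTorch, a GRU cell as the update function, and toy dimensions, and is not the exact architecture of this work.

```python
# Sketch of a dynamic embedding: each variable keeps a state vector that is
# updated every time the variable is used, so its representation reflects
# the information accumulated about its role so far.
import torch
import torch.nn as nn

class DynamicVariableEmbedding(nn.Module):
    def __init__(self, context_dim: int, embed_dim: int):
        super().__init__()
        self.init_state = nn.Parameter(torch.zeros(embed_dim))  # embedding of a not-yet-seen variable
        self.update_cell = nn.GRUCell(context_dim, embed_dim)   # recurrent update per usage

    def forward(self, usage_contexts: torch.Tensor) -> torch.Tensor:
        # usage_contexts: (num_usages, context_dim), encodings of the contexts
        # in which one variable occurs, in program order.
        state = self.init_state.unsqueeze(0)  # (1, embed_dim)
        states = []
        for ctx in usage_contexts:            # refine the embedding usage by usage
            state = self.update_cell(ctx.unsqueeze(0), state)
            states.append(state)
        return torch.cat(states, dim=0)       # the variable's embedding after each usage

emb = DynamicVariableEmbedding(context_dim=64, embed_dim=128)
contexts = torch.randn(5, 64)                 # a variable seen in 5 contexts
print(emb(contexts).shape)                    # torch.Size([5, 128])
```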
There is an emerging interest in the application of natural language processing models to source code processing tasks.
In this work, we conduct a thorough empirical study of the capabilities of Transformers to utilize syntactic information in different tasks.
In this work, we consider a fixed memory budget setting and investigate what is more effective: training a single wide network, or performing a memory split, i.e. training an ensemble of several thinner networks with the same total number of parameters.
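The toy PyTorch sketch below illustrates such a memory split: an ensemble of thinner MLPs is sized so that its total parameter count roughly matches that of a single wide MLP. The widths, the number of ensemble members, and the averaging of probabilities are illustrative choices, not the experimental setup of this work.

```python
# Sketch of a memory split under a fixed parameter budget:
# one wide network vs. an ensemble of thinner networks of roughly equal total size.
import torch
import torch.nn as nn

def mlp(width: int) -> nn.Module:
    return nn.Sequential(nn.Linear(784, width), nn.ReLU(), nn.Linear(width, 10))

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

wide = mlp(width=512)                                          # single wide network
ensemble = nn.ModuleList([mlp(width=170) for _ in range(3)])   # memory split: 3 thinner networks

print(n_params(wide), sum(n_params(m) for m in ensemble))      # roughly matched budgets

x = torch.randn(8, 784)
wide_probs = wide(x).softmax(dim=-1)
# Ensemble prediction: average the members' predicted probabilities.
ens_probs = torch.stack([m(x).softmax(dim=-1) for m in ensemble]).mean(dim=0)
```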
Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units, e.g. neurons, from networks.
Bayesian methods have been successfully applied to sparsify the weights of neural networks and to remove structural units, e.g. neurons, from them.
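One widely used Bayesian sparsification technique is sparse variational dropout (Molchanov et al., 2017); the sketch below is a minimal per-weight illustration of that idea, not necessarily the exact method of this work, and the layer sizes, pruning threshold, and KL weight are illustrative assumptions.

```python
# Sketch of Bayesian weight sparsification in the spirit of sparse variational dropout:
# each weight gets a learned variance; weights with a high dropout rate alpha are pruned.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVDLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))

    def log_alpha(self) -> torch.Tensor:
        # alpha = sigma^2 / mu^2 is the per-weight "dropout rate".
        return (self.log_sigma2 - 2.0 * torch.log(self.mu.abs() + 1e-8)).clamp(-10, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Local reparameterization: sample pre-activations instead of weights.
            mean = F.linear(x, self.mu)
            var = F.linear(x * x, self.log_sigma2.exp())
            return mean + var.sqrt() * torch.randn_like(mean)
        # At test time, drop weights whose log_alpha exceeds a threshold.
        mask = (self.log_alpha() < 3.0).float()
        return F.linear(x, self.mu * mask)

    def kl(self) -> torch.Tensor:
        # Approximation of the KL term from Molchanov et al., 2017.
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha()
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()

# Training objective (sketch): task loss + weighted KL, which pushes many weights toward pruning.
layer = SparseVDLinear(784, 10)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(layer(x), y) + 1e-3 * layer.kl()
loss.backward()
```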
In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters.
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights.