Search Results for author: Zhenisbek Assylbekov

Found 18 papers, 10 papers with code

Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic

no code implementations · 19 Oct 2023 · Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov

In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function.
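
As an aside on the hardness measure mentioned in the excerpt: the variance of the gradient over a random choice of target function can be estimated by simple Monte Carlo. The toy model, target family, and loss below are illustrative choices of mine, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (my own choice, not the paper's): model f_theta(x) = sin(theta * x) with
# squared loss against a random target f_k(x) = sin(k * x) of frequency k.
xs = rng.uniform(-np.pi, np.pi, size=2000)   # Monte Carlo sample for the loss
theta = 1.0                                  # fixed evaluation point in parameter space

def loss_grad(k):
    """Gradient of the empirical squared loss w.r.t. theta for target frequency k."""
    residual = np.sin(theta * xs) - np.sin(k * xs)
    # d/dtheta 0.5 * residual^2 = residual * x * cos(theta * x)
    return np.mean(residual * xs * np.cos(theta * xs))

# Hardness proxy: variance of the gradient over a random target.  For high-frequency
# targets the gradient barely depends on which target was drawn, so the variance
# collapses and the gradient carries almost no information about the target.
for low, high in [(1, 5), (50, 200)]:
    grads = np.array([loss_grad(k) for k in rng.integers(low, high, size=100)])
    print(f"frequencies in [{low}, {high}): gradient variance = {grads.var():.3e}")
```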

Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

1 code implementation · 2 Oct 2023 · Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov

The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols.
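
For reference, the discrete logarithm problem asks to recover x from p, g, and h = g^x mod p. The sketch below only states the problem with toy parameters of my own choosing; it says nothing about the paper's learning setup, and real cryptographic instances use primes hundreds of digits long, where exhaustive search is hopeless.

```python
# The discrete logarithm problem: given a prime p, a base g, and h = g^x mod p, find x.
# Toy brute-force baseline (illustrative parameters only).
p, g = 100_003, 2            # small prime modulus and base, chosen for illustration
x_secret = 54_321
h = pow(g, x_secret, p)      # public value

def brute_force_dlog(g, h, p):
    """Exhaustive search over exponents -- cost grows exponentially in the bit length of p."""
    acc = 1                  # g^0
    for x in range(p - 1):
        if acc == h:
            return x
        acc = acc * g % p
    return None

x_found = brute_force_dlog(g, h, p)
assert pow(g, x_found, p) == h
```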

Long-Tail Theory under Gaussian Mixtures

1 code implementation · 20 Jul 2023 · Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov

We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020).

Memorization
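
A minimal data generator in the spirit of the abstract, assuming nothing about the paper's actual construction: a Gaussian mixture with a frequent head component and a rare tail component, so that any model ignoring the tail pays an error roughly proportional to the tail weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy long-tail generator (all parameters are illustrative assumptions, not the paper's):
# a frequent "head" Gaussian and a rare "tail" Gaussian.
n, d, tail_weight = 10_000, 2, 0.02
head_mean, tail_mean = np.zeros(d), np.full(d, 4.0)

is_tail = rng.random(n) < tail_weight
X = np.where(is_tail[:, None],
             rng.normal(tail_mean, 1.0, size=(n, d)),   # rare component
             rng.normal(head_mean, 1.0, size=(n, d)))   # frequent component

print(f"tail fraction: {is_tail.mean():.3f}")
# Under Feldman's long-tail view, a classifier that fails to memorize the rare
# component incurs an irreducible error roughly proportional to its weight.
```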

Speeding Up Entmax

1 code implementation · Findings (NAACL) 2022 · Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov

Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits.

Machine Translation · Text Generation · +1
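
For contrast with softmax, the sketch below is a generic reference implementation of sparsemax, the alpha = 2 member of the entmax family, which can assign exact zeros to low-scoring classes. It is not the speed-up proposed in the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (alpha = 2 entmax): Euclidean projection of the logits onto the simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cumsum          # coordinates kept in the support
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1) / k                 # threshold subtracted from the logits
    return np.maximum(z - tau, 0.0)

logits = np.array([1.2, 0.8, 0.1, -1.0])
print(softmax(logits))    # dense: every class gets non-zero probability
print(sparsemax(logits))  # sparse: [0.7, 0.3, 0.0, 0.0]
```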

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

no code implementations · 2 Mar 2021 · Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Zhenisbek Assylbekov

In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the rediscovery hypothesis.

Language Modelling

Squashed Shifted PMI Matrix: Bridging Word Embeddings and Hyperbolic Spaces

2 code implementations · 27 Feb 2020 · Zhenisbek Assylbekov, Alibi Jangeldin

We show that removing the sigmoid transformation from the skip-gram with negative sampling (SGNS) objective does not significantly harm the quality of word vectors, and that it corresponds to factorizing a squashed shifted PMI matrix, which in turn can be treated as the connection-probability matrix of a random graph.

Clustering · Word Embeddings
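
A rough sketch of the object in the title, under my reading of it: take the PMI matrix of word-context co-occurrences, shift it by log k as in the SGNS analysis of Levy and Goldberg (2014), and squash it through a sigmoid so the entries can act as edge probabilities of a random graph. The paper's exact definition may differ.

```python
import numpy as np

def squashed_shifted_pmi(counts, k=5):
    """Toy squashed shifted PMI matrix from word-context co-occurrence counts.
    The shift by log(k) follows Levy & Goldberg (2014); using the logistic sigmoid
    as the 'squash' is my reading of the title, not necessarily the paper's definition."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    shifted = pmi - np.log(k)
    return 1.0 / (1.0 + np.exp(-shifted))     # entries in (0, 1): usable as edge probabilities

counts = np.array([[10.0, 2.0, 0.0],
                   [3.0,  8.0, 1.0],
                   [0.0,  1.0, 6.0]])         # made-up co-occurrence counts
print(squashed_shifted_pmi(counts))
```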

Semantics- and Syntax-related Subvectors in the Skip-gram Embeddings

1 code implementation · 23 Dec 2019 · Maxat Tezekbayev, Zhenisbek Assylbekov, Rustem Takhanov

We show that the skip-gram embedding of any word can be decomposed into two subvectors which roughly correspond to semantic and syntactic roles of the word.

A Critique of the Smooth Inverse Frequency Sentence Embeddings

no code implementations · 30 Sep 2019 · Aidana Karipbayeva, Alena Sorokina, Zhenisbek Assylbekov

We critically review the smooth inverse frequency sentence embedding method of Arora, Liang, and Ma (2017), and show inconsistencies in its setup, derivation, and evaluation.

Sentence · Sentence Embedding · +1
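
For context, the method under critique (Arora, Liang, and Ma, 2017) computes sentence vectors as frequency-weighted averages of word vectors followed by removal of the first principal component. The sketch below is a reference implementation of that method, not of the critique; the corpus, vectors, and the value of a are made-up placeholders.

```python
import numpy as np

def sif_embeddings(sentences, vectors, word_freq, a=1e-3):
    """Smooth inverse frequency (SIF) sentence embeddings: weight each word vector by
    a / (a + p(w)), average per sentence, then remove the first principal component."""
    total = sum(word_freq.values())
    emb = np.stack([
        np.mean([a / (a + word_freq[w] / total) * vectors[w] for w in sent], axis=0)
        for sent in sentences
    ])
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]                                  # common component (top singular vector)
    return emb - np.outer(emb @ u, u)

# Tiny illustrative vocabulary and corpus (all numbers are made up).
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "dog", "ran"]
vectors = {w: rng.normal(size=8) for w in vocab}
word_freq = {"the": 1000, "cat": 30, "sat": 20, "dog": 25, "ran": 15}
sents = [["the", "cat", "sat"], ["the", "dog", "ran"]]
print(sif_embeddings(sents, vectors, word_freq).shape)   # (2, 8)
```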

Low-Rank Approximation of Matrices for PMI-based Word Embeddings

no code implementations · 21 Sep 2019 · Alena Sorokina, Aidana Karipbayeva, Zhenisbek Assylbekov

We perform an empirical evaluation of several methods of low-rank approximation in the problem of obtaining PMI-based word embeddings.

Word Embeddings
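
The most common such pipeline factorizes a (positive or shifted) PMI matrix with truncated SVD and takes the rows of the rank-d factor as word vectors. Below is a minimal sketch with a random stand-in matrix; which approximation methods the paper actually compares is not stated in the excerpt.

```python
import numpy as np

def svd_word_vectors(ppmi, dim):
    """Rank-`dim` factorization of a PPMI matrix via truncated SVD; rows of the
    returned matrix serve as word embeddings (the U * sqrt(S) convention)."""
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

# Illustrative random PPMI-like matrix (a real one would come from corpus counts).
rng = np.random.default_rng(0)
ppmi = np.maximum(rng.normal(size=(50, 50)), 0.0)
vectors = svd_word_vectors(ppmi, dim=10)
print(vectors.shape)                       # (50, 10)

# Relative reconstruction error of the rank-10 truncation, one way to compare methods.
u, s, vt = np.linalg.svd(ppmi, full_matrices=False)
approx = (u[:, :10] * s[:10]) @ vt[:10]
print(np.linalg.norm(ppmi - approx) / np.linalg.norm(ppmi))
```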

Context Vectors are Reflections of Word Vectors in Half the Dimensions

no code implementations · 26 Feb 2019 · Zhenisbek Assylbekov, Rustem Takhanov

This paper takes a step towards a theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec.

Text Generation · Word Embeddings

Fourier Neural Networks: A Comparative Study

no code implementations · 8 Feb 2019 · Abylay Zhumekenov, Malika Uteuliyeva, Olzhas Kabdolov, Rustem Takhanov, Zhenisbek Assylbekov, Alejandro J. Castro

We review neural network architectures which were motivated by Fourier series and integrals and which are referred to as Fourier neural networks.
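
In the broad sense used by such reviews, a Fourier neural network replaces the usual activation with a sinusoid, so a hidden layer behaves like a learnable truncated Fourier expansion. The one-hidden-layer variant below, trained by plain gradient descent on a toy 1-D target, is my own illustrative choice rather than any specific architecture from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer network with a cosine activation, y = cos(x @ w1 + b) @ w2,
# trained by plain gradient descent on a toy 1-D regression task (all
# hyperparameters are illustrative).
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sign(np.sin(3 * x))                       # square-wave target

H, lr = 32, 0.01
w1 = rng.normal(size=(1, H))                     # frequencies
b = rng.uniform(-np.pi, np.pi, size=H)           # phases
w2 = rng.normal(size=(H, 1)) * 0.1               # output weights

for step in range(2000):
    pre = x @ w1 + b
    h = np.cos(pre)                              # (200, H) Fourier-style features
    err = h @ w2 - y
    grad_w2 = h.T @ err / len(x)
    grad_pre = -np.sin(pre) * (err @ w2.T)       # chain rule through the cosine
    grad_w1 = x.T @ grad_pre / len(x)
    grad_b = grad_pre.mean(axis=0)
    w1, w2, b = w1 - lr * grad_w1, w2 - lr * grad_w2, b - lr * grad_b

print(f"final MSE: {np.mean((np.cos(x @ w1 + b) @ w2 - y) ** 2):.4f}")
```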

Reproducing and Regularizing the SCRN Model

1 code implementation · COLING 2018 · Olzhas Kabdolov, Zhenisbek Assylbekov, Rustem Takhanov

We reproduce the Structurally Constrained Recurrent Network (SCRN) model, and then regularize it using existing, widely used techniques such as naive dropout, variational dropout, and weight tying.

Language Modelling
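
Of the regularizers listed, weight tying is the simplest to show in code: the output softmax reuses the input embedding matrix. The sketch below applies naive dropout and weight tying to a generic LSTM language model, not to the SCRN architecture itself, and omits variational dropout for brevity.

```python
import torch
import torch.nn as nn

class TinyRNNLM(nn.Module):
    """Generic recurrent language model illustrating two of the regularizers above:
    naive dropout on activations and input/output weight tying.  This is a minimal
    stand-in, not the SCRN architecture."""
    def __init__(self, vocab_size=10_000, emb_dim=256, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, emb_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)             # "naive" dropout
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        self.decoder.weight = self.embedding.weight    # weight tying: one shared matrix

    def forward(self, tokens):
        x = self.dropout(self.embedding(tokens))
        h, _ = self.rnn(x)
        return self.decoder(self.dropout(h))           # logits over the vocabulary

logits = TinyRNNLM()(torch.randint(0, 10_000, (2, 35)))
print(logits.shape)                                    # torch.Size([2, 35, 10000])
```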

Reusing Weights in Subword-aware Neural Language Models

1 code implementation · NAACL 2018 · Zhenisbek Assylbekov, Rustem Takhanov

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models.
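
One simple form of such reuse, sketched below under my own minimal scheme (not necessarily one of the paper's): compose each word vector as the sum of its subword embeddings, and reuse the same composed matrix both as the input lookup table and as the output softmax projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weight reuse in a subword-aware setup (my own minimal scheme):
# word vector = sum of subword embeddings; the same composed matrix serves as both
# the input lookup table and the output softmax projection.
subword_vocab = {"un": 0, "break": 1, "able": 2, "do": 3}
words = {"unbreakable": ["un", "break", "able"], "undo": ["un", "do"], "doable": ["do", "able"]}

dim = 16
subword_emb = rng.normal(size=(len(subword_vocab), dim))

def compose(word):
    """Word embedding composed as the sum of its subword embeddings."""
    return subword_emb[[subword_vocab[s] for s in words[word]]].sum(axis=0)

word_matrix = np.stack([compose(w) for w in words])    # shared input/output matrix

hidden = rng.normal(size=dim)                          # some hidden state from the LM
logits = word_matrix @ hidden                          # output layer reuses the same weights
probs = np.exp(logits - logits.max()); probs /= probs.sum()
print(dict(zip(words, probs.round(3))))
```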
