no code implementations • 19 Oct 2023 • Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov
In this framework, the hardness of a class is typically quantified by the variance of the gradient with respect to a random choice of target function.
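A minimal sketch of that hardness measure, assuming (purely for illustration) that the class consists of parity functions and the learner is a single tanh neuron; the Monte Carlo estimate below is not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_targets = 1000, 10, 200

# Fixed data and fixed model parameters theta (a single tanh neuron).
X = rng.choice([-1.0, 1.0], size=(n, d))
theta = rng.normal(scale=0.1, size=d)

def grad_mse(theta, X, y):
    """Gradient of the mean-squared error of g(x) = tanh(theta . x)."""
    pred = np.tanh(X @ theta)
    # d/dtheta mean (pred - y)^2 = mean 2*(pred - y)*(1 - pred^2) * x
    return (2 * (pred - y) * (1 - pred**2)) @ X / len(y)

# Random targets: parity functions on random coordinate subsets.
grads = []
for _ in range(n_targets):
    S = rng.choice(d, size=3, replace=False)
    y = np.prod(X[:, S], axis=1)          # chi_S(x) = prod_{i in S} x_i
    grads.append(grad_mse(theta, X, y))

grads = np.array(grads)
# Hardness proxy: total variance of the gradient over the random target.
print("gradient variance:", grads.var(axis=0).sum())
```

A small variance means the gradient barely depends on which target was drawn, so gradient-based learners cannot distinguish members of the class.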
1 code implementation • 2 Oct 2023 • Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov
The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols.
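To make the problem concrete, here is a standard baby-step giant-step solver for the discrete logarithm; it is included only as background and is not the paper's method:

```python
from math import isqrt

def dlog_bsgs(g, h, p):
    """Solve g**x = h (mod p) for prime p by baby-step giant-step, O(sqrt(p))."""
    m = isqrt(p - 1) + 1
    baby = {pow(g, j, p): j for j in range(m)}   # g^j for all j < m
    factor = pow(g, -m, p)                        # g^(-m) mod p (Python 3.8+)
    gamma = h
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]            # x = i*m + j
        gamma = gamma * factor % p
    return None                                   # no solution

p, g, x = 101, 2, 47
assert dlog_bsgs(g, pow(g, x, p), p) == x
```

The sqrt(p) time and memory cost of generic methods like this one is exactly what makes the problem cryptographically useful at large moduli.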
1 code implementation • 20 Jul 2023 • Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov
We suggest a simple Gaussian mixture model for data generation that complies with Feldman's (2020) long-tail theory.
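A hedged sketch of what such a generator might look like: Zipfian mixing weights give a few frequent subpopulations and a long tail of rare ones, in the spirit of Feldman's theory. The specific weights and isotropic unit-variance components are illustrative assumptions, not necessarily the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, n = 1000, 2, 10_000

# Zipfian mixing weights: a few frequent subpopulations, a long tail of rare ones.
weights = 1.0 / np.arange(1, K + 1)
weights /= weights.sum()

means = rng.normal(scale=5.0, size=(K, d))   # one cluster center per subpopulation

z = rng.choice(K, size=n, p=weights)          # latent subpopulation labels
X = means[z] + rng.normal(size=(n, d))        # isotropic unit-variance Gaussians

# Many components are seen at most once -- the "long tail" of atypical examples.
counts = np.bincount(z, minlength=K)
print("components with <= 1 sample:", (counts <= 1).sum(), "of", K)
```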
1 code implementation • RepL4NLP (ACL) 2022 • Sultan Nurmukhamedov, Thomas Mach, Arsen Sheverdin, Zhenisbek Assylbekov
We choose random points in the hyperbolic disc and claim that these points are already word representations.
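A minimal sketch of that construction, assuming the Poincaré disc model: points are sampled uniformly with respect to hyperbolic area, and similarity is read off the hyperbolic distance. How points are assigned to words is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_poincare_disc(n, R=10.0):
    """Points uniform w.r.t. hyperbolic area in a disc of hyperbolic radius R."""
    u = rng.uniform(size=n)
    r = np.arccosh(1 + u * (np.cosh(R) - 1))   # hyperbolic radius, density ~ sinh(r)
    rho = np.tanh(r / 2)                        # Euclidean radius in the Poincare model
    phi = rng.uniform(0, 2 * np.pi, size=n)
    return np.stack([rho * np.cos(phi), rho * np.sin(phi)], axis=1)

def poincare_dist(x, y):
    """Hyperbolic distance between points of the Poincare disc."""
    sq = np.sum((x - y) ** 2, axis=-1)
    denom = (1 - np.sum(x**2, axis=-1)) * (1 - np.sum(y**2, axis=-1))
    return np.arccosh(1 + 2 * sq / denom)

vecs = sample_poincare_disc(5)       # five "word representations"
print(poincare_dist(vecs[0], vecs[1]))
```

Note how uniform hyperbolic sampling pushes most points toward the boundary, which is what gives hyperbolic embeddings their tree-like, hierarchy-friendly geometry.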
1 code implementation • Findings (NAACL) 2022 • Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov
Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits.
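For reference, the standard numerically stable softmax that this refers to; shifting by the maximum logit before exponentiating prevents overflow without changing the result:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax: shift by the max before exponentiating."""
    z = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities summing to 1
```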
no code implementations • 2 Mar 2021 • Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Zhenisbek Assylbekov
In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the "rediscovery hypothesis".
2 code implementations • 27 Feb 2020 • Zhenisbek Assylbekov, Alibi Jangeldin
We show that removing the sigmoid transformation from the skip-gram with negative sampling (SGNS) objective does not significantly harm the quality of word vectors; at the same time, the modified objective corresponds to factorizing a squashed shifted PMI matrix, which in turn can be treated as the connection-probability matrix of a random graph.
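A sketch of the squashed shifted PMI construction described above, assuming the usual shift by log k (the number of negative samples, as in the SGNS-PMI connection of Levy and Goldberg, 2014) and a sigmoid squashing; the toy counts are illustrative:

```python
import numpy as np

def squashed_shifted_pmi(C, k=5):
    """sigma(PMI - log k) from a word-context co-occurrence count matrix C."""
    total = C.sum()
    pw = C.sum(axis=1, keepdims=True) / total     # word marginals
    pc = C.sum(axis=0, keepdims=True) / total     # context marginals
    with np.errstate(divide="ignore", over="ignore"):
        pmi = np.log((C / total) / (pw * pc))
        spmi = pmi - np.log(k)                     # shift by log(#negative samples)
        return 1 / (1 + np.exp(-spmi))             # squash; sigma(-inf) -> 0 for unseen pairs

C = np.array([[10.0, 2.0, 0.0],
              [3.0, 8.0, 1.0],
              [0.0, 1.0, 6.0]])
P = squashed_shifted_pmi(C)
# Each entry lies in (0, 1) and can be read as an edge probability of a random graph.
print(P.round(3))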
1 code implementation • 23 Dec 2019 • Maxat Tezekbayev, Zhenisbek Assylbekov, Rustem Takhanov
We show that the skip-gram embedding of any word can be decomposed into two subvectors which roughly correspond to semantic and syntactic roles of the word.
no code implementations • 30 Sep 2019 • Aidana Karipbayeva, Alena Sorokina, Zhenisbek Assylbekov
We critically review the smooth inverse frequency sentence embedding method of Arora, Liang, and Ma (2017), and show inconsistencies in its setup, derivation, and evaluation.
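For context, a compact implementation of the SIF method under review: reweight each word vector by a/(a + p(w)), average within a sentence, then remove the common component along the top singular direction. The toy vocabulary and probabilities below are made up:

```python
import numpy as np

def sif_embeddings(sentences, vec, p, a=1e-3):
    """Smooth inverse frequency sentence embeddings (Arora et al., 2017):
    weighted average of word vectors, then remove the top principal direction."""
    V = np.array([
        np.mean([a / (a + p[w]) * vec[w] for w in s], axis=0)
        for s in sentences
    ])
    # First right-singular vector = common "discourse" direction to subtract.
    u = np.linalg.svd(V, full_matrices=False)[2][0]
    return V - np.outer(V @ u, u)

# Toy vocabulary: random vectors and unigram probabilities (illustrative only).
rng = np.random.default_rng(0)
words = ["the", "cat", "sat", "mat"]
vec = {w: rng.normal(size=50) for w in words}
p = {"the": 0.5, "cat": 0.2, "sat": 0.2, "mat": 0.1}

emb = sif_embeddings([["the", "cat", "sat"], ["the", "mat"]], vec, p)
print(emb.shape)   # (2, 50)
```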
no code implementations • 21 Sep 2019 • Alena Sorokina, Aidana Karipbayeva, Zhenisbek Assylbekov
We perform an empirical evaluation of several methods of low-rank approximation in the problem of obtaining PMI-based word embeddings.
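Truncated SVD is the most common of these low-rank methods; a sketch with a random stand-in for a real PPMI matrix, using the conventional U·sqrt(S) weighting (one choice among several):

```python
import numpy as np

def svd_embeddings(PMI, dim=100):
    """Rank-`dim` truncated SVD of a (P)PMI matrix; rows of U * sqrt(S)
    are one common choice of word embeddings."""
    U, S, _ = np.linalg.svd(PMI, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

rng = np.random.default_rng(0)
PPMI = np.maximum(rng.normal(size=(500, 500)), 0)   # stand-in for a real PPMI matrix
W = svd_embeddings(PPMI, dim=50)
print(W.shape)   # (500, 50)
```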
no code implementations • 26 Feb 2019 • Zhenisbek Assylbekov, Rustem Takhanov
This paper takes a step towards theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec.
no code implementations • 8 Feb 2019 • Abylay Zhumekenov, Malika Uteuliyeva, Olzhas Kabdolov, Rustem Takhanov, Zhenisbek Assylbekov, Alejandro J. Castro
We review neural network architectures motivated by Fourier series and integrals, which are referred to as Fourier neural networks.
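A minimal example of the family being reviewed: a one-hidden-layer network computing a weighted sum of cos(w_i x + b_i) terms, i.e. a truncated Fourier-style expansion. For brevity the frequencies and phases below are fixed at random and only the amplitudes are fit by least squares, which is a simplification; the surveyed architectures typically train all three:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer "Fourier neural network": y = sum_i a_i * cos(w_i * x + b_i).
n_hidden = 64
W = rng.normal(scale=2.0, size=n_hidden)       # frequencies (fixed here)
b = rng.uniform(0, 2 * np.pi, size=n_hidden)   # phases (fixed here)

x = np.linspace(-np.pi, np.pi, 200)
y = np.sign(np.sin(3 * x))                     # a square-ish wave to approximate

H = np.cos(np.outer(x, W) + b)                 # hidden-layer activations
a, *_ = np.linalg.lstsq(H, y, rcond=None)      # fit output amplitudes
print("max abs error:", np.abs(H @ a - y).max())
```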
1 code implementation • COLING 2018 • Olzhas Kabdolov, Zhenisbek Assylbekov, Rustem Takhanov
We reproduce the Structurally Constrained Recurrent Network (SCRN) model and then regularize it using existing, widespread techniques such as naive dropout, variational dropout, and weight tying.
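Of those techniques, weight tying is the easiest to show compactly. A PyTorch sketch, with a GRU standing in for the SCRN cell (which has no stock PyTorch module) and naive dropout on the activations; variational dropout, which reuses one mask across all time steps, is omitted:

```python
import torch.nn as nn

class TiedLM(nn.Module):
    """Recurrent LM with weight tying: the output projection shares its
    weight matrix with the input embedding."""
    def __init__(self, vocab, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, vocab, bias=False)
        self.decoder.weight = self.embed.weight   # tie: one matrix, two roles
        self.drop = nn.Dropout(0.5)               # "naive" dropout on activations

    def forward(self, tokens):
        h, _ = self.rnn(self.drop(self.embed(tokens)))
        return self.decoder(self.drop(h))         # (batch, time, vocab) logits
```

Tying halves the number of embedding-related parameters, which is why it pairs well with the small-model regularization study described above.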
1 code implementation • NAACL 2018 • Zhenisbek Assylbekov, Rustem Takhanov
We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models.
1 code implementation • 2 Sep 2017 • Rustem Takhanov, Zhenisbek Assylbekov
For every word, we construct a new sequence over an alphabet of patterns.
1 code implementation • EMNLP 2017 • Zhenisbek Assylbekov, Rustem Takhanov, Bagdat Myrzakhmetov, Jonathan N. Washington
Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation.