In this summary paper, we present the results of the 1st edition of the NLPS task, providing a description of the evaluation data, and the participating systems.
In this second iteration of the explanation regeneration shared task, participants are supplied with more than double the training and evaluation data of the first shared task, as well as a knowledge base nearly double in size, both of which expand into more challenging scientific topics that increase the difficulty of the task.
While previous editions of this shared task aimed to evaluate explanatory completeness – finding a set of facts that form a complete inference chain, without gaps, to arrive from question to correct answer, this 2021 instantiation concentrates on the subtask of determining relevance in large multi-hop explanations.
In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset allowing comparing performance of machine learning systems against human level of expertise in the grounding visual question answering task.
Recent progress in generative models, especially in text-guided diffusion models, has enabled the production of aesthetically-pleasing imagery resembling the works of professional human artists.
Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, enabling solving problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time.
The main obstacle towards designing aggregation methods for more advanced applications is the absence of training data, and in this work, we focus on bridging this gap in speech recognition.
In this tutorial, we present a portion of unique industry experience in efficient natural language data annotation via crowdsourcing shared by both leading researchers and engineers from Yandex.
no code implementations • • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.
While automated question answering systems are increasingly able to retrieve answers to natural language questions, their ability to generate detailed human-readable explanations for their answers is still quite limited.
We present our system for semantic frame induction that showed the best performance in Subtask B. 1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).
We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.
We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.
The sparse mode uses the traditional vector space model to estimate the most similar word sense corresponding to its context.
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language.
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference.
In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms.
On the one hand, humans easily make judgments about semantic relatedness.
Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph.
In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.
We present a new approach to extraction of hypernyms based on projection learning and word embeddings.
This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.
Linguistic resources can be populated with data through the use of such approaches as crowdsourcing and gamification when motivated people are involved.