The quality of a product is the degree to which it meets the customer's expectations, and this must also hold for lexical semantic resources.
In addition, powered by the knowledge of radical systems in ZiNet, this paper introduces glyph similarity measurement between ancient Chinese characters, which could capture similar glyph pairs that are potentially related in origins or semantics.
Text classification is a fundamental task with broad applications in natural language processing.
WordNet represents polysemous terms by capturing their different meanings at the lexical level, but without emphasizing the polysemy types to which such terms belong.
1 code implementation • 15 Jun 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.
Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to achieve the same performance as black-box models without compromising transparency.
We focus on the development of AIs which live in lifelong symbiosis with a human.
no code implementations • 7 May 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
1 code implementation • 11 Apr 2022 • Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar, Fausto Giunchiglia
We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale.
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages.
Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets.
We base our work on the teleosemantic modelling of concepts as abilities implementing the distinct functions of recognition and classification.
The representation of the personal context is complex and essential to improve the help machines can give to humans for making sense of the world, and the help humans can give to machines to improve their efficiency.
We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples.
We assume that substances in the world are represented by two types of concepts, namely substance concepts and classification concepts, the former instrumental to (visual) perception, the latter to (language based) classification.
When building a new application, we are increasingly confronted with the need to reuse and integrate pre-existing knowledge, e.g., ontologies, schemas, and data of any kind, from multiple sources.
We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected inside a graph codifying their meaning), language (where sets of synonyms, possibly from multiple languages, annotate concepts), knowledge (in the form of a graph where nodes are entity types and links are properties), and data (in the form of a graph of entities populating the previous knowledge graph).
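The four layers described above can be pictured as a small data structure. The following is only an illustrative sketch, not the authors' implementation: the class name `LayeredResource`, its field names, the method `concepts_for`, and the toy entries are all hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredResource:
    """Hypothetical container with one field per representation layer."""

    # Conceptual layer: alinguistic concept id -> ids of linked concepts.
    concepts: dict = field(default_factory=dict)
    # Language layer: (concept id, language code) -> set of synonyms.
    synonyms: dict = field(default_factory=dict)
    # Knowledge layer: entity type -> {property name: target entity type}.
    entity_types: dict = field(default_factory=dict)
    # Data layer: entity id -> (entity type, {property name: value}).
    entities: dict = field(default_factory=dict)

    def concepts_for(self, word, lang):
        """Return the ids of concepts that `word` annotates in language `lang`."""
        return {
            concept_id
            for (concept_id, code), words in self.synonyms.items()
            if code == lang and word in words
        }


# Toy content (made up for illustration).
resource = LayeredResource()
resource.concepts["c1"] = set()
resource.synonyms[("c1", "en")] = {"river", "stream"}
resource.synonyms[("c1", "it")] = {"fiume"}
resource.entity_types["River"] = {"flows_through": "City"}
resource.entities["e1"] = ("River", {"flows_through": "e2"})
```

Keeping each layer in its own mapping mirrors the independence the abstract asks for: a new language adds entries only to `synonyms`, leaving concept identifiers, entity types, and entities untouched.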
In this paper we provide a theory and an algorithm for how to build substance concepts which are in a one-to-one correspondence with classification concepts, thus paving the way to the seamless integration between natural language descriptions and visual perception.
As the role of algorithmic systems and processes increases in society, so does the risk of bias, which can result in discrimination against individuals and social groups.
The complexity and non-Euclidean structure of graph data hinder the development of data augmentation methods similar to those in computer vision.
Motivated by this, we introduce TRCKD, a novel approach that combines automated drift detection and adaptation with an interactive stage in which the user is asked to disambiguate between different kinds of KD.
We set out to uncover the unique grammatical properties of an important yet so far under-researched type of natural language text: that of short labels typically found within structured datasets.
Second, existing models typically assume that context is objective, whereas in most applications context is best viewed from the user's perspective.
The ability to learn from human supervision is fundamental for personal assistants and other interactive applications of AI.
We present a new wordnet resource for Scottish Gaelic, a Celtic minority language spoken by about 60,000 speakers, most of whom live in Northwestern Scotland.
We present a framework capable of tackling the problem of continual object recognition in a setting which resembles that under which humans see and learn.
In this paper we present the TrentoTeam system, which participated in Task 3 at SemEval-2017 (Nakov et al., 2017). We concentrated our work on applying Grice's Maxims (used in many state-of-the-art machine learning applications (Vogel et al., 2013; Kheirabadi and Aghagolzadeh, 2012; Dale and Reiter, 1995; Franke, 2011)) to ranking the answers to a question by their relevancy. In particular, we created a ranker system based on relevancy scores assigned by three main components: named entity recognition, similarity score, and sentiment analysis. Our system obtained results comparable to those of machine learning systems.
In this paper, we study the problem of how to better embed entities and relations of knowledge bases into different low-dimensional spaces by taking full advantage of the additional semantics of relation paths, and we propose a compositional learning model of relation path embedding (RPE).
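To make the idea of composing relation paths concrete, here is a minimal sketch under strong assumptions. It uses additive composition and a translation-style distance score, which is a common baseline and not necessarily the RPE model the abstract proposes. The function names `compose_path` and `translation_score` and the toy vectors are hypothetical.

```python
import math


def compose_path(relation_vectors):
    """Compose a relation path by summing its relation vectors element-wise."""
    dim = len(relation_vectors[0])
    return [sum(vec[i] for vec in relation_vectors) for i in range(dim)]


def translation_score(head, relation, tail):
    """Return the L2 distance ||head + relation - tail||; lower is more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy 2-dimensional embeddings (made up for illustration).
head = [0.0, 0.0]
tail = [1.0, 1.0]
first_hop = [1.0, 0.0]
second_hop = [0.0, 1.0]

# Embedding of the two-step path first_hop -> second_hop.
path = compose_path([first_hop, second_hop])
score = translation_score(head, path, tail)
```

In this toy setup the composed path translates `head` exactly onto `tail`, so the path supports a direct link between the two entities. A model that embeds relations and paths in separate spaces would add projection steps that this sketch omits.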