This work revisits the information given by the graph-of-words and its typical utilization through graph-based ranking approaches in the context of keyword extraction.
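To make the graph-of-words and its graph-based ranking concrete, here is a minimal sketch that builds a sliding-window co-occurrence graph over a toy token sequence and ranks words with a hand-rolled PageRank. The window size, damping factor, and example text are illustrative assumptions, not the configuration studied in the work above:

```python
import numpy as np

def keyword_scores(tokens, window=4, damping=0.85, iters=50):
    """Score words via PageRank on a sliding-window co-occurrence graph."""
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    A = np.zeros((n, n))
    # undirected edge for every pair of tokens less than `window` apart
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                A[idx[tokens[i]], idx[tokens[j]]] += 1.0
                A[idx[tokens[j]], idx[tokens[i]]] += 1.0
    # row-normalise into a transition matrix (isolated nodes jump uniformly)
    rows = A.sum(axis=1, keepdims=True)
    P = np.where(rows > 0, A / np.maximum(rows, 1e-12), 1.0 / n)
    # power iteration for PageRank
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (r @ P)
    return dict(zip(vocab, r))

tokens = ("graph based ranking extracts keywords from the word graph "
          "ranking words in the graph highlights central keywords").split()
scores = keyword_scores(tokens)
top = sorted(scores, key=scores.get, reverse=True)[:3]
```

Frequent, well-connected words such as "graph" accumulate higher PageRank mass than words that occur once, which is the intuition graph-based keyword extractors rely on.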
In order to overcome these issues, we reconsider the task of summarization from a human-centered perspective.
Automatically extracting keyphrases from scholarly documents yields a concise, valuable representation that humans can understand and machines can process for tasks such as information retrieval, article clustering, and article classification.
Bayesian Active Learning has had a significant impact on various NLP problems; nevertheless, its application to text summarization has been explored very little.
Energy production using renewable sources exhibits inherent uncertainties due to their intermittent nature.
In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020.
This method allows us to improve summarization performance by simply using the median of multiple stochastic summaries.
Area under the precision-recall curve (AUPR) that emphasizes the accuracy of top-ranked pairs and area under the receiver operating characteristic curve (AUC) that heavily punishes the existence of low ranked interacting pairs are two widely used evaluation metrics in the DTI prediction task.
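Both metrics are one-liners with scikit-learn; the interaction labels and prediction scores below are illustrative toy values rather than real DTI data:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# illustrative ground truth (1 = interacting pair) and predicted scores
y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1])

aupr = average_precision_score(y_true, y_score)  # rewards accurate top ranks
auc = roc_auc_score(y_true, y_score)             # punishes low-ranked positives
```

Here the single negative scored 0.7 sits among the top ranks, which costs the AUPR more than the AUC, illustrating the different emphasis of the two metrics.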
Artificial Intelligence (AI) has had a tremendous impact on the rapid growth of technology in almost every domain.
LionForests is a random forest-specific interpretation technique, which provides rules as explanations.
The use of machine learning is rapidly increasing in high-risk scenarios where decisions are required, for example in healthcare or industrial monitoring equipment.
We achieve a new state-of-the-art 84.28% accuracy on top-50 candidates on the Zeshel dataset, compared to the previous 82.06% on the top-64 of (Wu et al., 2020).
We propose REDSandT (Relation Extraction with Distant Supervision and Transformers), a novel distantly supervised transformer-based RE method that captures a wider set of relations through highly informative instance and label embeddings, exploiting BERT's pre-trained model as well as the relationship between labels and entities.
Ranked #2 on Relationship Extraction (Distant Supervised) on New York Times Corpus (AUC metric)
To this end, we propose a framework to categorize new descriptors based on their current relation to older descriptors.
In addition, WkNNIR exploits local imbalance to promote the influence of more reliable similarities on the interaction recovery and prediction processes.
We present a novel method for TAR that implements a full pipeline from the research protocol to the screening of the relevant papers.
Interpretable machine learning is an emerging field that provides methods for gaining insight into the rationale of machine learning models.
Keyword extraction is an important document processing task that aims to find a small set of terms that concisely describe a document's topics.
Online hate speech is a recent problem in our society that is rising at a steady pace, leveraging the vulnerabilities of the moderation regimes that characterise most social media platforms.
Ranked #1 on Hate Speech Detection on Ethos MultiLabel
To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease.
Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
With this approach we can decompose the problem of long document summarization into smaller and simpler problems, reducing computational complexity and creating more training examples, which at the same time contain less noise in the target summaries compared to the standard approach.
Ranked #3 on Text Summarization on Pubmed (using extra training data)
As machine learning systems move toward integration into every aspect of people's lives, it is necessary to research methods for interpreting such systems instead of focusing exclusively on enhancing their performance.
Technological breakthroughs on smart homes, self-driving cars, health care and robotic assistants, in addition to reinforced law regulations, have critically influenced academic research on explainable machine learning.
We propose SUSIE, a novel summarization method that can work with state-of-the-art summarization models in order to produce structured scientific summaries for academic articles.
Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content.
Class-imbalance is an inherent characteristic of multi-label data which affects the prediction accuracy of most multi-label learning methods.
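One common way to quantify this characteristic, borrowed here from the multi-label imbalance literature (the IRLbl/MeanIR measures) as an illustration rather than the specific measure used above, is the ratio between the most frequent label and each label's frequency:

```python
import numpy as np

# toy label matrix: rows = instances, columns = labels
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0]])

counts = Y.sum(axis=0)          # positive examples per label -> [5, 2, 1]
irlbl = counts.max() / counts   # per-label imbalance ratio (IRLbl)
mean_ir = irlbl.mean()          # MeanIR: dataset-level imbalance
```

A label with IRLbl well above the MeanIR is a minority label that most multi-label learners will underfit.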
It then uses the minimum covariance determinant estimator to model the distribution of non-keyphrase word vectors, under the assumption that these vectors come from the same distribution, indicative of their irrelevance to the semantics expressed by the dimensions of the learned vector representation.
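A minimal sketch of this idea with scikit-learn's `MinCovDet`: fit the robust estimator on word vectors, then treat large Mahalanobis distances from the robust centre as evidence that a word falls outside the non-keyphrase distribution. The synthetic vectors and dimensionality are assumptions for illustration:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
# most word vectors cluster together (the "non-keyphrase" distribution)...
background = rng.normal(0.0, 1.0, size=(200, 5))
# ...while a handful lie far from it (keyphrase-like words)
outliers = rng.normal(8.0, 1.0, size=(5, 5))
X = np.vstack([background, outliers])

# MCD fits location/covariance on the clean majority, resisting outliers
mcd = MinCovDet(random_state=0).fit(X)
dist = mcd.mahalanobis(X)      # squared distance to the robust centre

# the words farthest from the non-keyphrase distribution
top5 = np.argsort(dist)[-5:]
```

Because MCD estimates the covariance from the most concentrated subset of points, the five planted outliers receive by far the largest distances and surface at the top of the ranking.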
Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content.
We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification.
Background: In this paper we present the approaches and methods employed to deal with a large-scale multi-label semantic indexing task of biomedical papers.
Hierarchy Of Multi-label classifiers (HOMER) is a multi-label learning algorithm that breaks the initial learning task into several easier sub-tasks by first constructing a hierarchy of labels from the given label set and then applying a given base multi-label classifier (MLC) to the resulting sub-problems.
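A compact two-level HOMER-style sketch, not the reference implementation: the label clustering via k-means, the random-forest base learners, and the toy data are all illustrative assumptions. Labels are grouped by clustering their column profiles, a meta-classifier predicts which groups are relevant, and one base MLC per group is consulted only when its group fires:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# toy multi-label data: labels 0-2 co-occur on one half of the input
# space, labels 3-5 on the other (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
pos = X[:, 0] > 0
Y = np.zeros((200, 6), dtype=int)
Y[pos, :3] = (rng.random((pos.sum(), 3)) < 0.8).astype(int)
Y[~pos, 3:] = (rng.random(((~pos).sum(), 3)) < 0.8).astype(int)

def fit_homer(X, Y, k=2, seed=0):
    """Two-level HOMER-style sketch: cluster labels, one MLC per node."""
    # step 1: build the label hierarchy by clustering label columns
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Y.T)
    groups = [np.flatnonzero(km.labels_ == g) for g in range(k)]
    # step 2: meta-task -- a group is relevant iff any member label is
    M = np.column_stack([Y[:, c].any(axis=1) for c in groups]).astype(int)
    meta = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, M)
    # step 3: one child MLC per group, trained where that group fires
    children = [RandomForestClassifier(n_estimators=50, random_state=seed)
                .fit(X[M[:, g] == 1], Y[M[:, g] == 1][:, cols])
                for g, cols in enumerate(groups)]
    return groups, meta, children

def predict_homer(X, groups, meta, children):
    """The meta prediction gates which child classifiers are consulted."""
    M_hat = np.asarray(meta.predict(X))
    Y_hat = np.zeros((len(X), sum(len(c) for c in groups)), dtype=int)
    for g, cols in enumerate(groups):
        mask = M_hat[:, g] == 1
        if mask.any():
            Y_hat[np.ix_(mask, cols)] = children[g].predict(X[mask])
    return Y_hat

groups, meta, children = fit_homer(X, Y)
Y_hat = predict_homer(X, groups, meta, children)
```

Each child classifier only ever sees the instances relevant to its group, which is what makes HOMER's sub-problems smaller and easier than the original task.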
We introduce a novel approach for estimating Latent Dirichlet Allocation (LDA) parameters from collapsed Gibbs samples (CGS), by leveraging the full conditional distributions over the latent variable assignments to efficiently average over multiple samples, for little more computational cost than drawing a single additional collapsed Gibbs sample.
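A rough sketch of that idea on a toy corpus. The sampler, hyperparameters, and accumulation scheme below are simplified assumptions rather than the paper's exact estimator, but they show the key move: averaging the full conditional distributions instead of the hard counts of a single Gibbs sample:

```python
import numpy as np

def lda_cgs(docs, V, K=2, alpha=0.1, beta=0.01, burn=200, samples=20, seed=0):
    """Tiny collapsed Gibbs sampler for LDA.  Doc-topic proportions are
    estimated by accumulating the full conditional p(z=k | rest) over
    post-burn-in sweeps rather than a single sample's hard counts."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    z = [rng.integers(0, K, size=len(doc)) for doc in docs]
    ndk = np.zeros((D, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    acc = np.zeros((D, K))
    for sweep in range(burn + samples):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional over topic assignments for this token
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                p /= p.sum()
                k = rng.choice(K, p=p)
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
                if sweep >= burn:
                    acc[d] += p  # average conditionals across samples
    return acc / acc.sum(axis=1, keepdims=True)

# two toy documents with disjoint vocabularies over V = 4 word types
theta = lda_cgs([[0] * 10 + [1] * 10, [2] * 10 + [3] * 10], V=4)
```

Averaging the conditionals reuses the probability mass that a single hard sample throws away, which is why the estimate comes almost for free once the chain is already being run.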
Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables.
Marginal probabilities are entered as soft evidence in the network and adjusted through probabilistic inference.
When the prediction targets are binary the task is called multi-label classification, while when the targets are continuous the task is called multi-target regression.
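The multi-target regression setting can be made concrete with scikit-learn, using synthetic data for illustration; the simplest baseline fits one independent regressor per continuous target:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# 3 continuous targets predicted from the same 5 input variables
X, Y = make_regression(n_samples=100, n_features=5, n_targets=3,
                       noise=0.1, random_state=0)

# simplest multi-target baseline: one independent regressor per target
model = MultiOutputRegressor(LinearRegression()).fit(X, Y)
Y_hat = model.predict(X)
r2 = model.score(X, Y)  # R^2 averaged over the three targets
```

Swapping the continuous `Y` for a binary indicator matrix and the regressor for a classifier turns the same interface into multi-label classification; methods that go beyond this baseline exploit dependencies between the targets rather than treating them independently.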