Search Results for author: Grigorios Tsoumakas

Found 54 papers, 32 papers with code

ETHOS: an Online Hate Speech Detection Dataset

1 code implementation 11 Jun 2020 Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, Grigorios Tsoumakas

Online hate speech is a recent problem in our society that is rising at a steady pace by leveraging the vulnerabilities of the corresponding regimes that characterise most social media platforms.

Hate Speech Detection

Synthetic Oversampling of Multi-Label Data based on Local Label Distribution

2 code implementations 2 May 2019 Bin Liu, Grigorios Tsoumakas

Class-imbalance is an inherent characteristic of multi-label data which affects the prediction accuracy of most multi-label learning methods.

Multi-Label Learning

Multi-Label Sampling based on Local Label Imbalance

1 code implementation 7 May 2020 Bin Liu, Konstantinos Blekas, Grigorios Tsoumakas

Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.

Multi-Label Learning

A Review of Keyphrase Extraction

2 code implementations 13 May 2019 Eirini Papagiannopoulou, Grigorios Tsoumakas

Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content.

Clustering Keyphrase Extraction +1

Improving Distantly-Supervised Relation Extraction through BERT-based Label & Instance Embeddings

1 code implementation 1 Feb 2021 Despina Christou, Grigorios Tsoumakas

We propose REDSandT (Relation Extraction with Distant Supervision and Transformers), a novel distantly-supervised transformer-based RE method, that manages to capture a wider set of relations through highly informative instance and label embeddings for RE, by exploiting BERT's pre-trained model, and the relationship between labels and entities, respectively.

Relation Relationship Extraction (Distant Supervised)

A Divide-and-Conquer Approach to the Summarization of Long Documents

1 code implementation 13 Apr 2020 Alexios Gidiotis, Grigorios Tsoumakas

With this approach we can decompose the problem of long document summarization into smaller and simpler problems, reducing computational complexity and creating more training examples, which at the same time contain less noise in the target summaries compared to the standard approach.

Ranked #13 on Text Summarization on Pubmed (using extra training data)

Document Summarization Sentence +1

Keyphrase Extraction from Scientific Articles via Extractive Summarization

1 code implementation NAACL (sdp) 2021 Chrysovalantis Giorgos Kontoulis, Eirini Papagiannopoulou, Grigorios Tsoumakas

Automatically extracting keyphrases from scholarly documents leads to a valuable concise representation that humans can understand and machines can process for tasks, such as information retrieval, article clustering and article classification.

Extractive Summarization Information Retrieval +2

LARD: Large-scale Artificial Disfluency Generation

1 code implementation LREC 2022 Tatiana Passali, Thanassis Mavropoulos, Grigorios Tsoumakas, Georgios Meditskos, Stefanos Vrochidis

In addition, we release a new large-scale dataset with disfluencies that can be used on four different tasks: disfluency detection, classification, extraction, and correction.

VisioRed: A Visualisation Tool for Interpretable Predictive Maintenance

1 code implementation 31 Mar 2021 Spyridon Paraschos, Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

The use of machine learning rapidly increases in high-risk scenarios where decisions are required, for example in healthcare or industrial monitoring equipment.

BIG-bench Machine Learning Decision Making +2

Conclusive Local Interpretation Rules for Random Forests

1 code implementation 13 Apr 2021 Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

LionForests is a random forest-specific interpretation technique, which provides rules as explanations.

Binary Classification Decision Making +1

Multiple Similarity Drug-Target Interaction Prediction with Random Walks and Matrix Factorization

1 code implementation 24 Jan 2022 Bin Liu, Dimitrios Papadopoulos, Fragkiskos D. Malliaros, Grigorios Tsoumakas, Apostolos N. Papadopoulos

Moreover, the validation of highly ranked non-interacting pairs also demonstrates the potential of MDMF2A to discover novel DTIs.

Local Multi-Label Explanations for Random Forest

1 code implementation 5 Jul 2022 Nikolaos Mylonas, Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

Random Forest falls short on this property, especially when a large number of tree predictors are used.

Classification Decision Making +2

Local Word Vectors Guiding Keyphrase Extraction

1 code implementation 20 Oct 2017 Eirini Papagiannopoulou, Grigorios Tsoumakas

Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content.

Keyphrase Extraction Word Embeddings

Structured Summarization of Academic Publications

1 code implementation 19 May 2019 Alexios Gidiotis, Grigorios Tsoumakas

We propose SUSIE, a novel summarization method that can work with state-of-the-art summarization models in order to produce structured scientific summaries for academic articles.

Dense Distributions from Sparse Samples: Improved Gibbs Sampling Parameter Estimators for LDA

1 code implementation 8 May 2015 Yannis Papanikolaou, James R. Foulds, Timothy N. Rubin, Grigorios Tsoumakas

We introduce a novel approach for estimating Latent Dirichlet Allocation (LDA) parameters from collapsed Gibbs samples (CGS), by leveraging the full conditional distributions over the latent variable assignments to efficiently average over multiple samples, for little more computational cost than drawing a single additional collapsed Gibbs sample.

Clustering Multi-Label Classification

Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision

1 code implementation 15 May 2020 Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease.

Retrieval

Optimizing Area Under the Curve Measures via Matrix Factorization for Predicting Drug-Target Interaction with Multiple Similarities

1 code implementation 1 May 2021 Bin Liu, Grigorios Tsoumakas

In drug discovery, identifying drug-target interactions (DTIs) via experimental approaches is a tedious and expensive procedure.

Drug Discovery

Unsupervised Keyphrase Extraction from Scientific Publications

1 code implementation 10 Aug 2018 Eirini Papagiannopoulou, Grigorios Tsoumakas

It then uses the minimum covariance determinant estimator to model the distribution of non-keyphrase word vectors, under the assumption that these vectors come from the same distribution, indicative of their irrelevance to the semantics expressed by the dimensions of the learned vector representation.

Keyphrase Extraction Outlier Detection +1

Keywords lie far from the mean of all words in local vector space

1 code implementation 21 Aug 2020 Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos

Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics.

Keyword Extraction Position

Drug-Target Interaction Prediction via an Ensemble of Weighted Nearest Neighbors with Interaction Recovery

1 code implementation 22 Dec 2020 Bin Liu, Konstantinos Pliakos, Celine Vens, Grigorios Tsoumakas

In addition, WkNNIR exploits local imbalance to promote the influence of more reliable similarities on the interaction recovery and prediction processes.

Drug Discovery

Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature

2 code implementations 23 Jan 2023 Anastasios Nentidis, Thomas Chatzopoulos, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.

LioNets: Local Interpretation of Neural Networks through Penultimate Layer Decoding

1 code implementation 15 Jun 2019 Ioannis Mollas, Nikolaos Bassiliades, Grigorios Tsoumakas

Technological breakthroughs on smart homes, self-driving cars, health care and robotic assistants, in addition to reinforced law regulations, have critically influenced academic research on explainable machine learning.

General Classification Self-Driving Cars

What is all this new MeSH about? Exploring the semantic provenance of new descriptors in the MeSH thesaurus

1 code implementation 20 Jan 2021 Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

To this end, we propose a framework to categorize new descriptors based on their current relation to older descriptors.

Topic-Aware Evaluation and Transformer Methods for Topic-Controllable Summarization

1 code implementation 9 Jun 2022 Tatiana Passali, Grigorios Tsoumakas

Furthermore, existing methods are built upon recurrent architectures, which can significantly limit their performance compared to more recent Transformer-based architectures, and they also require modifications to the model's architecture for controlling the topic.

Fine-Grained Selective Similarity Integration for Drug-Target Interaction Prediction

1 code implementation 1 Dec 2022 Bin Liu, Jin Wang, Kaiwei Sun, Grigorios Tsoumakas

Recently, with the availability of abundant heterogeneous biological information from diverse data sources, computational methods have been able to leverage multiple drug and target similarities to boost the performance of DTI prediction.

Truthful Meta-Explanations for Local Interpretability of Machine Learning Models

1 code implementation 7 Dec 2022 Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

As a result, the demand for a selection tool, a meta-explanation technique based on a high-quality evaluation metric, is apparent.

Local Interpretability of Random Forests for Multi-Target Regression

1 code implementation 29 Mar 2023 Avraam Bardos, Nikolaos Mylonas, Ioannis Mollas, Grigorios Tsoumakas

Although model-agnostic techniques exist for multi-target regression, specific techniques tailored to random forest models are not available.

Multi-target regression regression

Web Robot Detection in Academic Publishing

no code implementations 14 Nov 2017 Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos

In this paper, we present our approach to detecting web robots in academic publishing websites.

Subset Labeled LDA for Large-Scale Multi-Label Classification

no code implementations 16 Sep 2017 Yannis Papanikolaou, Grigorios Tsoumakas

We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification.

Classification Extreme Multi-Label Classification +2

Hierarchical Partitioning of the Output Space in Multi-label Data

no code implementations 19 Dec 2016 Yannis Papanikolaou, Ioannis Katakis, Grigorios Tsoumakas

Hierarchy Of Multi-label classifiers (HOMER) is a multi-label learning algorithm that breaks the initial learning task into several easier sub-tasks by first constructing a hierarchy of labels from a given label set and then applying a given base multi-label classifier (MLC) to the resulting sub-problems.

Clustering Multi-Label Classification +1

Multi-Target Regression via Input Space Expansion: Treating Targets as Inputs

no code implementations 28 Nov 2012 Eleftherios Spyromitros-Xioufis, Grigorios Tsoumakas, William Groves, Ioannis Vlahavas

When the prediction targets are binary the task is called multi-label classification, while when the targets are continuous the task is called multi-target regression.

General Classification Multi-Label Classification +3

Multi-Target Regression via Random Linear Target Combinations

no code implementations 20 Apr 2014 Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas

Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables.

General Classification Multi-Label Classification +2

Discovering and Exploiting Entailment Relationships in Multi-Label Learning

no code implementations 15 Apr 2014 Christina Papagiannopoulou, Grigorios Tsoumakas, Ioannis Tsamardinos

Marginal probabilities are entered as soft evidence in the network and adjusted through probabilistic inference.

Multi-Label Learning

LionForests: Local Interpretation of Random Forests

no code implementations 20 Nov 2019 Ioannis Mollas, Nick Bassiliades, Ioannis Vlahavas, Grigorios Tsoumakas

Towards a future where machine learning systems will integrate into every aspect of people's lives, researching methods to interpret such systems is necessary, instead of focusing exclusively on enhancing their performance.

BIG-bench Machine Learning

From Protocol to Screening: A Hybrid Learning Approach for Technology-Assisted Systematic Literature Reviews

no code implementations 19 Nov 2020 Athanasios Lagopoulos, Grigorios Tsoumakas

We present a novel method for TAR that implements a full pipeline from the research protocol to the screening of the relevant papers.

Learning-To-Rank Retrieval +2

Improving Zero-Shot Entity Retrieval through Effective Dense Representations

no code implementations 6 Mar 2021 Eleni Partalidou, Despina Christou, Grigorios Tsoumakas

We achieve a new state-of-the-art 84.28% accuracy on top-50 candidates on the Zeshel dataset, compared to the previous 82.06% on the top-64 of (Wu et al., 2020).

Entity Linking Entity Retrieval +1

Harvesting the Public MeSH Note field

no code implementations 1 Jun 2021 Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020.

Bayesian Active Summarization

no code implementations 9 Oct 2021 Alexios Gidiotis, Grigorios Tsoumakas

Bayesian Active Learning has had a significant impact on various NLP problems, yet its application to text summarization has been explored very little.

Active Learning Text Summarization

AUTH @ CLSciSumm 20, LaySumm 20, LongSumm 20

no code implementations EMNLP (sdp) 2020 Alexios Gidiotis, Stefanos Stefanidis, Grigorios Tsoumakas

We present the systems we submitted for the shared tasks of the Workshop on Scholarly Document Processing at EMNLP 2020.

Keyword Extraction Using Unsupervised Learning on the Document’s Adjacency Matrix

no code implementations NAACL (TextGraphs) 2021 Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos Papadopoulos

This work revisits the information given by the graph-of-words and its typical utilization through graph-based ranking approaches in the context of keyword extraction.

Keyword Extraction

Does Noise Affect Housing Prices? A Case Study in the Urban Area of Thessaloniki

1 code implementation 25 Feb 2023 Georgios Kamtziridis, Dimitris Vrakas, Grigorios Tsoumakas

Real estate markets depend on various methods to predict housing prices, including models that have been trained on datasets of residential or commercial properties.

From Lengthy to Lucid: A Systematic Literature Review on NLP Techniques for Taming Long Sentences

no code implementations 8 Dec 2023 Tatiana Passali, Efstathios Chatzikyriakidis, Stelios Andreadis, Thanos G. Stavropoulos, Anastasia Matonaki, Anestis Fachantidis, Grigorios Tsoumakas

This survey, conducted using the PRISMA guidelines, systematically reviews two main strategies for addressing the issue of long sentences: a) sentence compression and b) sentence splitting.

Sentence Sentence Compression

Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples

no code implementations 27 Mar 2024 Ao Zhou, Bin Liu, Jin Wang, Grigorios Tsoumakas

However, the intrinsic class imbalance in multi-label data may bias the model towards majority labels, since samples relevant to minority labels may be underrepresented in each mini-batch.
