Search Results for author: Pavel Smrz

Found 19 papers, 10 papers with code

Making Software FAIR: A machine-assisted workflow for the research software lifecycle

no code implementations 8 Jan 2025 Petr Knoth, Laurent Romary, Patrice Lopez, Roberto Di Cosmo, Pavel Smrz, Tomasz Umerle, Melissa Harrison, Alain Monteil, Matteo Cancellieri, David Pride

A key issue hindering discoverability, attribution and reusability of open research software is that its existence often remains hidden within the manuscript of research papers.

BenCzechMark : A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism

no code implementations 23 Dec 2024 Martin Fajcik, Martin Docekal, Jan Dolezal, Karel Ondrej, Karel Beneš, Jan Kapsa, Pavel Smrz, Alexander Polok, Michal Hradis, Zuzana Neverilova, Ales Horak, Radoslav Sabol, Michal Stefanik, Adam Jirkovsky, David Adamczyk, Petr Hyner, Jan Hula, Hynek Kydlicek

Furthermore, we collect and clean BUT-Large Czech Collection, the largest publicly available clean Czech language corpus, and use it for (i) contamination analysis, (ii) continuous pretraining of the first Czech-centric 7B language model, with Czech-specific tokenization.

Language Modeling Language Modelling

OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources

2 code implementations 3 May 2024 Martin Docekal, Martin Fajcik, Pavel Smrz

We show that the estimated upper bound for extractive summarization increases by 217% in the ROUGE-2 score, when using full content instead of abstracts.

Document Summarization Extractive Summarization +1

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

1 code implementation 28 Jul 2022 Martin Fajcik, Petr Motlicek, Pavel Smrz

We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way -- the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities.

Fact Checking Re-Ranking +1

Query-Based Keyphrase Extraction from Long Documents

1 code implementation 11 May 2022 Martin Docekal, Pavel Smrz

Transformer-based architectures in natural language processing force input size limits that can be problematic when long documents need to be processed.

Chunking Keyphrase Extraction

Pruning the Index Contents for Memory Efficient Open-Domain QA

2 code implementations 21 Feb 2021 Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz

This work presents a novel pipeline that demonstrates what is achievable with a combined effort of state-of-the-art approaches.

Open-Domain Question Answering

JokeMeter at SemEval-2020 Task 7: Convolutional humor

no code implementations SEMEVAL 2020 Martin Docekal, Martin Fajcik, Josef Jon, Pavel Smrz

This paper describes our system that was designed for Humor evaluation within the SemEval-2020 Task 7.

BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers

1 code implementation SEMEVAL 2019 Martin Fajcik, Lukáš Burget, Pavel Smrz

This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019).

General Classification Rumour Detection +1

WTF-LOD - A New Resource for Large-Scale NER Evaluation

no code implementations LREC 2016 Lubomir Otrusina, Pavel Smrz

This paper introduces the Web TextFull linkage to Linked Open Data (WTF-LOD) dataset intended for large-scale evaluation of named entity recognition (NER) systems.

Entity Linking named-entity-recognition +2
