no code implementations • 21 Jan 2023 • Nikolaj Tatti
We then show that for fixed Bernoulli parameters we can find the optimal change point in logarithmic time.
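As a point of reference for the problem named above, here is a minimal brute-force sketch, not the paper's logarithmic-time method: given a binary sequence and two fixed Bernoulli parameters, it scans every split and returns the one with the highest log-likelihood. The function name `best_change_point` and the parameter names `p1` and `p2` are hypothetical.

```python
import math


def best_change_point(xs, p1, p2):
    """Return the split index c maximizing the log-likelihood when
    xs[:c] ~ Bernoulli(p1) and xs[c:] ~ Bernoulli(p2).

    Linear-time scan for illustration only; assumes 0 < p1, p2 < 1.
    """
    def ll(x, p):
        return math.log(p) if x == 1 else math.log(1.0 - p)

    # Start with c = 0: every point is explained by p2.
    score = sum(ll(x, p2) for x in xs)
    best_c, best_score = 0, score
    for c, x in enumerate(xs, start=1):
        # Moving the split past x reassigns x from p2 to p1.
        score += ll(x, p1) - ll(x, p2)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

The scan updates the score incrementally, so each candidate split costs constant time; ties are resolved in favor of the earliest split.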
no code implementations • 19 May 2022 • Chamalee Wickrama Arachchi, Nikolaj Tatti
We extend this model to temporal networks by modelling the edges with a Poisson process.
1 code implementation • 8 Apr 2022 • Guangyi Zhang, Nikolaj Tatti, Aristides Gionis
In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints.
no code implementations • 12 Dec 2021 • Nikolaj Tatti
The first algorithm maintains area under the ROC curve (AUC) under addition and deletion of data points in $O(\log n)$ time.
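For reference, the quantity being maintained can be computed from scratch as sketched below. This is a plain recomputation dominated by a sort, not the paper's dynamic $O(\log n)$ structure; the function name `auc` and the convention that tied pairs count one half are assumptions made for the example.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as one half."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0      # accumulated credit over positive/negative pairs
    neg_below = 0   # negatives with a strictly smaller score
    i = 0
    while i < len(pairs):
        # Process one group of equal scores at a time.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        group_pos = sum(1 for _, y in pairs[i:j] if y == 1)
        group_neg = (j - i) - group_pos
        wins += group_pos * (neg_below + 0.5 * group_neg)
        neg_below += group_neg
        i = j
    return wins / (n_pos * n_neg)
```

Recomputing this way after every insertion or deletion repeats the full sort each time, which is the cost a dynamic method avoids.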
no code implementations • 12 Dec 2021 • Nikolaj Tatti
If we have a constraint $k$ for which we cannot find appropriate $\alpha$, we demonstrate a simple algorithm that yields $O(\sqrt{n})$ approximation guarantee by connecting the problem to a minimum $k$-union problem.
no code implementations • 16 Jun 2020 • Nikolaj Tatti, Hannes Heikinheimo
Such itemset families define a probabilistic model for the data from which the original collection of itemsets has been derived.
no code implementations • 16 Jun 2020 • Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila
At each step in the data mining process, the randomization produces random samples from the set of data matrices satisfying the already discovered patterns or models.
no code implementations • 25 Apr 2019 • Michael Mampaey, Jilles Vreeken, Nikolaj Tatti
As we use the Maximum Entropy principle to obtain unbiased probabilistic models, and only include those itemsets that are most informative with regard to the current model, the summaries we construct are guaranteed to be both descriptive and non-redundant.
no code implementations • 24 Apr 2019 • Nikolaj Tatti
We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets.
no code implementations • 16 Apr 2019 • Nikolaj Tatti, Boris Cule
Episodes are sequential patterns describing events that often occur in the vicinity of each other.
no code implementations • 15 Apr 2019 • Nikolaj Tatti
Discovering the most interesting patterns is the key problem in the field of pattern mining.
no code implementations • 14 Apr 2019 • Nikolaj Tatti, Boris Cule
Adapting existing approaches for discovering traditional patterns, such as closed itemsets, to episodes is not straightforward.
no code implementations • 9 Apr 2019 • Nikolaj Tatti
To this end, we formulate an optimization problem: given a graph and an integer $K$, we want to order graph vertices and partition the ordered adjacency matrix into $K$ bands such that bands closer to the diagonal are more dense.
no code implementations • 18 Feb 2019 • Nikolaj Tatti, Fabian Moerchen, Toon Calders
We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability.
no code implementations • 18 Feb 2019 • Nikolaj Tatti, Jilles Vreeken
Our approach provides a means to study and tell differences between results of different exploratory data mining methods.
no code implementations • 8 Feb 2019 • Nikolaj Tatti, Michael Mampaey
We demonstrate that these statistics describe forms of data that occur in practice and have been studied in data mining.
no code implementations • 7 Feb 2019 • Nikolaj Tatti, Jilles Vreeken
An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand.
no code implementations • 4 Feb 2019 • Nikolaj Tatti, Taneli Mielikainen, Aristides Gionis, Heikki Mannila
Defining the effective dimensionality of such a dataset is a nontrivial problem.
no code implementations • 2 Feb 2019 • Nikolaj Tatti
More specifically, we propose an algorithm that, given $\epsilon$, estimates AUC within $\epsilon / 2$ and can maintain this estimate in $O((\log k) / \epsilon)$ time per update as the window slides.
no code implementations • 17 Jan 2019 • Nikolaj Tatti, Pauli Miettinen
In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF).
no code implementations • 28 May 2018 • Nikolaj Tatti
In addition, we consider a cumulative version of Seg, where we are asked to discover the optimal segmentation for each prefix of the input sequence.
no code implementations • NeurIPS 2017 • Kiran Garimella, Aristides Gionis, Nikos Parotsidis, Nikolaj Tatti
Our goal is to find two sets of nodes to employ in the respective campaigns, so that the overall information exposure for the two campaigns is balanced.
no code implementations • 5 May 2016 • Indre Zliobaite, Nikolaj Tatti
We show how to adjust the coefficient of determination ($R^2$) when used for measuring predictive accuracy via leave-one-out cross-validation.
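One way to see why an adjustment is needed, sketched under our own assumptions rather than taken from the paper: under leave-one-out, even the constant baseline has to predict $y_i$ from the mean of the other $n - 1$ targets, and its squared error is $(n/(n-1))^2$ times the usual total sum of squares, so the naive $R^2$ of that baseline is negative. The sketch below scores a model's leave-one-out predictions against this leave-one-out baseline; `loo_r2` and its argument names are hypothetical.

```python
def loo_r2(y, y_pred_loo):
    """R^2 of leave-one-out predictions relative to a leave-one-out
    mean baseline (illustrative sketch; requires len(y) >= 2)."""
    n = len(y)
    total = sum(y)
    sse_model = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred_loo))
    # Baseline: predict y_i with the mean of the remaining n - 1 targets.
    sse_base = sum((yi - (total - yi) / (n - 1)) ** 2 for yi in y)
    if sse_base == 0:
        raise ValueError("targets are constant; R^2 is undefined")
    return 1.0 - sse_model / sse_base
```

With this normalization, a model that does no better than the held-out mean scores 0 rather than a negative value.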
1 code implementation • 26 Jun 2015 • Francois Petitjean, Tao Li, Nikolaj Tatti, Geoffrey I. Webb
It combines (1) a novel definition of the expected support for a sequential pattern - a concept on which most interestingness measures directly rely - with (2) SkOPUS: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest.