TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Multi-Label Text Classification	BVICTOR	SVM	Weighted F1	0.8235	# 2
Multi-Label Text Classification	BVICTOR	SVM	Average F1	0.7761	# 2
Multi-Label Text Classification	BVICTOR	XGBoost	Weighted F1	0.8957	# 1
Multi-Label Text Classification	BVICTOR	XGBoost	Average F1	0.8843	# 1
Multi-Label Text Classification	BVICTOR	NB	Weighted F1	0.6955	# 3
Multi-Label Text Classification	BVICTOR	NB	Average F1	0.6335	# 3
Multi-Label Text Classification	MVICTOR (theme)	NB	Weighted F1	0.6062	# 3
Multi-Label Text Classification	MVICTOR (theme)	NB	Average F1	0.3797	# 3
Multi-Label Text Classification	MVICTOR (theme)	SVM	Weighted F1	0.8137	# 2
Multi-Label Text Classification	MVICTOR (theme)	SVM	Average F1	0.6642	# 2
Multi-Label Text Classification	MVICTOR (theme)	XGBoost	Weighted F1	0.9072	# 1
Multi-Label Text Classification	MVICTOR (theme)	XGBoost	Average F1	0.8882	# 1
Text Classification	MVICTOR (type)	SVM	Weighted F1	0.9288	# 4
Text Classification	MVICTOR (type)	SVM	Average F1	0.6792	# 4
Text Classification	MVICTOR (type)	NB	Weighted F1	0.8477	# 5
Text Classification	MVICTOR (type)	NB	Average F1	0.4772	# 5
Text Classification	MVICTOR (type)	CNN + CRF	Weighted F1	0.9537	# 1
Text Classification	MVICTOR (type)	CNN + CRF	Average F1	0.7505	# 1
Text Classification	MVICTOR (type)	CNN	Weighted F1	0.9464	# 2
Text Classification	MVICTOR (type)	CNN	Average F1	0.7061	# 3
Text Classification	MVICTOR (type)	BiLSTM	Weighted F1	0.9433	# 3
Text Classification	MVICTOR (type)	BiLSTM	Average F1	0.7092	# 2
Multi-Label Text Classification	SVICTOR (theme)	XGBoost	Weighted F1	0.8634	# 1
Multi-Label Text Classification	SVICTOR (theme)	XGBoost	Average F1	0.8887	# 1
Multi-Label Text Classification	SVICTOR (theme)	SVM	Weighted F1	0.8231	# 2
Multi-Label Text Classification	SVICTOR (theme)	SVM	Average F1	0.8246	# 2
Multi-Label Text Classification	SVICTOR (theme)	NB	Weighted F1	0.4875	# 3
Multi-Label Text Classification	SVICTOR (theme)	NB	Average F1	0.5121	# 3
Text Classification	SVICTOR (type)	CNN	Weighted F1	0.9472	# 2
Text Classification	SVICTOR (type)	CNN	Average F1	0.7584	# 3
Text Classification	SVICTOR (type)	CNN + CRF	Weighted F1	0.9533	# 1
Text Classification	SVICTOR (type)	CNN + CRF	Average F1	0.7740	# 1
Text Classification	SVICTOR (type)	BiLSTM	Weighted F1	0.9465	# 3
Text Classification	SVICTOR (type)	BiLSTM	Average F1	0.7281	# 4
Text Classification	SVICTOR (type)	SVM	Weighted F1	0.9425	# 4
Text Classification	SVICTOR (type)	SVM	Average F1	0.7632	# 2
Text Classification	SVICTOR (type)	NB	Weighted F1	0.8893	# 5
Text Classification	SVICTOR (type)	NB	Average F1	0.5979	# 5
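
The table reports two F1 variants per model, and they can diverge sharply (for example, NB on MVICTOR (theme) scores 0.6062 Weighted F1 but 0.3797 Average F1). The sketch below shows one common way such a gap arises. It assumes that "Average F1" is the unweighted mean of per-class F1 scores (macro averaging) and that "Weighted F1" weights each class by its support; the page itself does not define the metrics. The labels and predictions are made up for illustration, and the example is single-label for brevity.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def average_f1(y_true, y_pred):
    """Assumed meaning of 'Average F1': every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Assumed meaning of 'Weighted F1': classes weighted by true support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)

# Hypothetical imbalanced data: 8 "other" pages, 2 "ruling" pages,
# with one of the rare "ruling" pages misclassified.
y_true = ["other"] * 8 + ["ruling"] * 2
y_pred = ["other"] * 8 + ["other", "ruling"]

avg = average_f1(y_true, y_pred)
wtd = weighted_f1(y_true, y_pred)
print(f"average F1:  {avg:.4f}")
print(f"weighted F1: {wtd:.4f}")
```

Under these assumptions, a single error on the rare class pulls the average score down much more than the weighted one, which is one plausible reading of the gaps in the table on imbalanced label sets.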

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/victor-a-dataset-for-brazilian-legal/multi-label-text-classification-on-bvictor)](https://paperswithcode.com/sota/multi-label-text-classification-on-bvictor?p=victor-a-dataset-for-brazilian-legal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/victor-a-dataset-for-brazilian-legal/multi-label-text-classification-on-mvictor)](https://paperswithcode.com/sota/multi-label-text-classification-on-mvictor?p=victor-a-dataset-for-brazilian-legal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/victor-a-dataset-for-brazilian-legal/text-classification-on-mvictor-type)](https://paperswithcode.com/sota/text-classification-on-mvictor-type?p=victor-a-dataset-for-brazilian-legal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/victor-a-dataset-for-brazilian-legal/multi-label-text-classification-on-svictor)](https://paperswithcode.com/sota/multi-label-text-classification-on-svictor?p=victor-a-dataset-for-brazilian-legal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/victor-a-dataset-for-brazilian-legal/text-classification-on-svictor-type)](https://paperswithcode.com/sota/text-classification-on-svictor-type?p=victor-a-dataset-for-brazilian-legal)`

VICTOR: a Dataset for Brazilian Legal Documents Classification

LREC 2020 · Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, Fabricio Ataides Braz, Nilton Correia da Silva

This paper describes VICTOR, a novel dataset built from Brazil's Supreme Court digitalized legal documents, composed of more than 45 thousand appeals, which includes roughly 692 thousand documents (about 4.6 million pages). The dataset contains labeled text data and supports two types of tasks: document type classification and theme assignment, a multi-label problem. We present baseline results using bag-of-words models, convolutional neural networks, recurrent neural networks and boosting algorithms. We also experiment with linear-chain Conditional Random Fields to leverage the sequential nature of the lawsuits, which we find leads to improvements on document type classification. Finally, we compare a theme classification approach in which we use domain knowledge to filter out the less informative document pages against the default one in which we use all pages. Contrary to the Court experts' expectations, we find that using all available data is the better method. We make the dataset available in three versions of different sizes and contents to encourage exploration of better models and techniques.
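
To illustrate why modelling page order can help document type classification, here is a minimal sketch of Viterbi decoding over per-page scores, the inference step used by linear-chain models. It is not the authors' implementation: the label names, emission scores and transition scores are hypothetical, and a trained CRF would learn the transition scores from data rather than have them set by hand.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list (one entry per page) of {label: score}
    transitions: {(previous_label, current_label): score}
    """
    best = [{label: emissions[0][label] for label in labels}]
    back = []
    for page_scores in emissions[1:]:
        scores, pointers = {}, {}
        for cur in labels:
            # Best previous label to arrive at `cur` on this page.
            prev_best = max(
                labels, key=lambda prev: best[-1][prev] + transitions[(prev, cur)]
            )
            scores[cur] = (
                best[-1][prev_best] + transitions[(prev_best, cur)] + page_scores[cur]
            )
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical scores for three consecutive pages of one lawsuit.
labels = ["ruling", "other"]
emissions = [
    {"ruling": 2.0, "other": 0.0},
    {"ruling": 0.4, "other": 0.6},  # weak evidence, slightly favours "other"
    {"ruling": 2.0, "other": 0.0},
]
# Assumed transition scores: staying on the same type is rewarded.
transitions = {
    ("ruling", "ruling"): 0.5, ("other", "other"): 0.5,
    ("ruling", "other"): -0.5, ("other", "ruling"): -0.5,
}

independent = [max(page, key=page.get) for page in emissions]
sequential = viterbi(emissions, transitions, labels)
print(independent)
print(sequential)
```

In this toy case the page-by-page prediction flips to "other" on the ambiguous middle page, while the sequence decoder keeps it as "ruling" because the neighbouring pages and the transition scores outweigh the weak local evidence.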

Code

peluz/VICTOR-dataset

Tasks

Classification

General Classification

Results from the Paper

Ranked #1 on Multi-Label Text Classification on MVICTOR (theme)
