In this study, we capitalized on a shared repository of 57k accidents from 9 companies belonging to 3 domains and tested whether models trained on multiple datasets (generic models) predicted safety outcomes better than company-specific models.
While traditional natural language generation metrics are fast, they are not very reliable.
We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.
Ranked #1 on Text Summarization on OrangeSum (using extra training data)
A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word.
In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
Ranked #1 on Multi-Modal Document Classification on Reuters-21578
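As a rough illustration of the word co-occurrence network representation mentioned in the MPAD entry above, the sketch below builds an undirected, weighted graph from a token list using a sliding window. The function name `cooccurrence_network`, the window size, the undirected edges, and the whitespace tokenization are all assumptions made for this example; they are not taken from the paper, whose actual graph construction may differ.

```python
from collections import Counter


def cooccurrence_network(tokens, window=2):
    """Map each undirected edge (word_a, word_b) to its co-occurrence count.

    Hypothetical helper for illustration only: two words are linked when
    they appear within `window` consecutive tokens of each other.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Pair the current word with the words that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:  # skip self-loops
                # Sort the pair so (a, b) and (b, a) count as the same edge.
                edges[tuple(sorted((word, other)))] += 1
    return edges


# Toy document, tokenized naively on whitespace (an assumption).
tokens = "the cat sat on the mat".split()
network = cooccurrence_network(tokens, window=3)
```

Here each node is a unique word and each edge weight counts how often two words fall inside the same window. A message passing model would then operate on this graph rather than on the raw token sequence.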
This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes using machine learning.
In light of the increasing availability of digitally recorded safety reports in the construction industry, it is important to develop methods to exploit these data to improve our understanding of safety incidents and our ability to learn from them.
Some of the most effective influential spreader detection algorithms are unstable to small perturbations of the network structure.
Our vectors were obtained by running word2vec on an 11M-word corpus that we created from scratch by leveraging freely accessible online sources of construction-related text.
By applying our methodology to an attribute and outcome dataset directly obtained from 814 injury reports, we show that the frequency-magnitude distribution of construction safety risk is very similar to that of natural phenomena such as precipitation or earthquakes.