Data verbalisation is a task of great importance in the current field of natural language processing, as there is great benefit in the transformation of our abundant structured and semi-structured data into human-readable formats.
Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers.
The system uses a hybrid of content-based and collaborative filtering techniques to rank items for editors relying on both item features and item-editor previous interaction.
While Wikipedia exists in 287 languages, its content is unevenly distributed among them.
We explore the problem of generating natural language summaries for Semantic Web data.
For these methods to work, they require a critical resource: a lexicon that is appropriate for the task at hand, in terms of the range of emotions it captures diversity.
Our model is based on a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolution Neural Network that is trained over Wikipedia sentences and a formulation that couples the two trained embeddings in a multimodal space.