In this paper, we present a fast, scalable, and accurate approach to automated Short Answer Scoring (SAS).
We also release several datasets to test the speech understanding of computer vision video generation models.
This number is increasing further due to COVID-19 and the associated automation of education and testing.
To close the gap between speech understanding and multimedia video applications, we present initial experiments that model the perception of visual speech and demonstrate its use in video compression.
The tool allows audio data and their corresponding annotations to be uploaded and assigned to a user through a key-based API.
In this study, we propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech using attention fusion.
In this paper, we present a new corpus consisting of sentences from Hindi short stories annotated for five different discourse modes: argumentative, narrative, descriptive, dialogic, and informative.
The modified GRU-based model outperforms the standard CNN-RNN and Conv3D models for three of the four scenarios.
Predicting the runtime complexity of a piece of code is an arduous task.
In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings.
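To make the sequence labeling formulation concrete, the following is a minimal sketch of how keyphrases might be mapped to per-token labels and recovered from them. The B/I/O tag set, the function names `tag_tokens` and `tags_to_keyphrases`, and the whitespace tokenization are assumptions made for illustration; the sentence above does not specify them. The BiLSTM-CRF itself is not shown; it would sit between these two steps and predict the tag sequence.

```python
from typing import List


def tag_tokens(tokens: List[str], keyphrases: List[str]) -> List[str]:
    """Assign B/I/O tags to tokens given gold keyphrases (assumed tag scheme)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Tag a match only if it does not overlap an earlier tagged span.
            if lowered[i:i + n] == words and all(t == "O" for t in tags[i:i + n]):
                tags[i] = "B"
                tags[i + 1:i + n] = ["I"] * (n - 1)
    return tags


def tags_to_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover keyphrases from a predicted B/I/O tag sequence."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or an "I" with no preceding "B", closes any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sentence = "We use a conditional random field for keyphrase extraction".split()
    gold = ["conditional random field", "keyphrase extraction"]
    labels = tag_tokens(sentence, gold)
    print(list(zip(sentence, labels)))
    print(tags_to_keyphrases(sentence, labels))
```

Under this framing, training data consists of (token, tag) pairs, and extraction at inference time reduces to decoding the most likely tag sequence and collecting the tagged spans.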
In this paper, we introduce the first and largest Hindi text corpus, named BHAAV, which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader.
Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio.
In this paper, we present our approach and system description for Subtask A of SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums.
To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.
There has been an upsurge in the number of people participating in challenges made popular through social media channels.
In multilingual societies like the Indian subcontinent, the use of code-switched languages is popular and convenient for users.
Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker.