OpenSpeaks Voice: Odia Dataset

OpenSpeaks Voice: Odia is a large speech dataset in the Odia language of India that is stewarded by Subhashish Panigrahi and hosted at the O Foundation. It currently comprises over 70,000 audio files released into the public domain under CC0 1.0 Universal. Of these, 66,000 are recordings of the pronunciation of words and phrases and are hosted on Wikimedia Commons; the remaining 4,400 are recordings of sentences and are hosted on Mozilla Common Voice. The files on Wikimedia Commons were also released in 2023 on physical media as four DVD-ROMs titled OpenSpeaks Voice: Odia Volume I, OpenSpeaks Voice: Odia Volume II, OpenSpeaks Voice: Balesoria-Odia Volume I, and OpenSpeaks Voice: Balesoria-Odia Volume II. The dataset was built with Free/Libre and Open Source Software, primarily the web-based platforms Lingua Libre and Common Voice. Other tools used for this project include Kathabhidhana (developed by Panigrahi by forking the Voice Recorder for Tamil Wiktionary by Shrinivasan T), Spell4wiki, and Audacity, among others. Over 64,000 files in this dataset are in the standard spoken variant of Odia (Central Odia), and the remaining 6,300 files are in Balesoria (Baleswari), the northern dialect of Odia. OpenSpeaks Voice: Balesoria-Odia Volume II was created by extracting words and phrases from Nani Ma, a Balesoria-Odia documentary short directed by Panigrahi. The files in this dataset include transcriptions in Odia, making them usable for automatic speech recognition (ASR). All the files are publicly available for ASR research and application building.

Source: [OpenSpeaks before](https://theofdn.org/activities/before/)
Image Source: [OpenSpeaks before](https://theofdn.org/activities/before/)
