Hong Kong Cantonese corpus

The Hong Kong Cantonese Corpus was collected from transcribed conversations that were recorded between March 1997 and August 1998. About 230,000 Chinese words were collected in the annotated corpus. It contains recordings of spontaneous speech (51 texts) and radio programmes (42 texts), which involve 2 to 4 speakers, with 1 text of monologue. The text were word-segmented, annotated with part-of-speech tagging and Cantonese pronunciation using the romanisation scheme of Linguistic Society of Hong Kong (LSHK).

Source: Hong Kong Cantonese corpus

Homepage

Benchmarks

Add a new result Link an existing benchmark

No benchmarks yet. Start a new benchmark or link an existing one.

Papers

Paper	Code	Results	Date	Stars

Dataset Loaders

Add Remove

huggingface/datasets

18,406

Hong Kong Cantonese corpus

Benchmarks

Add a new result Link an existing benchmark

Papers

Dataset Loaders

Add Remove

Tasks

Similar Datasets

10,000 People - Human Pose Recognition Data

License

Modalities

Languages

Hong Kong Cantonese corpus

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Similar Datasets

10,000 People - Human Pose Recognition Data

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages