Suar: Towards building a corpus for the Saudi dialect

Saturday, January 22, 2022

This paper presents preliminary results from the construction of a morphologically annotated corpus for the Saudi dialect, which we call SUAR (SaUdi corpus for NLP Applications and Resources). The corpus consists of around 104,079 words collected from different online sources. We describe the linguistic features of the Saudi dialect and compare them with those of Modern Standard Arabic and other Arabic dialects. We also conduct a pilot study to explore possible directions for facilitating the morphological annotation of the corpus. The corpus was first annotated automatically using the MADAMIRA tool and then inspected manually to validate the resulting analysis.
