The Interdisciplinary Linguistics Program presents "Challenges and Solutions in the Development of Language Corpora in Quechua" by Dr. Chad Howe at the University of Georgia on Thursday, April 18, at 3:30 p.m. in 313 Greene Hall.

Abstract: Creating corpora, even for well-resourced languages, presents numerous challenges, ranging from actual data collection to determining which tools can be used for text management. In this talk, I will discuss two corpora of Quechua, the most widely spoken of the Indigenous languages in South America with upwards of 10 million speakers. The first collection consists of oral interviews conducted with bilingual speakers in Cusco, Peru (Bateman 2022 & Hubbel 2024). The other consists of a colonial text, Ollantay, which is among the first literary works written in Quechua. Each source offers opportunities to explore linguistic and, perhaps more broadly, humanistic phenomena that, in the absence of robust data sources, are difficult to address. Specifically, I will present work on (i) the role of placeholders or hesitation markers in the speech of bilingual Spanish/Quechua speakers and (ii) authorship attribution in historical texts. The findings from these studies highlight the need to develop high-quality corpora for under-resourced languages.
