This article discusses the challenges of using deep learning transformer models for languages other than English, specifically focusing on Afaan Oromo. The lack of comprehensive evaluation on openly accessible datasets for the language is identified as a major drawback for health-related text classification techniques. To address this, the authors collected disease-related documents and prepared a corpus of Afaan Oromo patient symptoms, which was then annotated by native speakers and domain experts.
