This article proposes a framework that combines transformer models with a generative adversarial network (GAN) for multi-class Plasmodium classification and malaria diagnosis. The GAN generates additional training samples from multi-class cell images, with the aim of making the resulting model more robust. The transformer models are compared with state-of-the-art methods and prove effective at classifying malaria parasites in thin blood smear microscopic images. On the test set, the Swin Transformer and MobileViT models outperform the baseline architectures in terms of precision, recall, F1-score, specificity, and false positive rate (FPR). The Swin Transformer achieves the best detection performance (up to 99.8% accuracy), while MobileViT requires less memory and has shorter inference times.
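
For readers unfamiliar with how the reported metrics relate to one another in a multi-class setting, the following minimal sketch derives precision, recall, F1-score, specificity, and FPR for each class from a confusion matrix using a one-vs-rest reading. The function name `per_class_metrics`, the three-class layout, and the counts in `cm` are hypothetical and chosen only for illustration; they are not results or code from this work.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Compute one-vs-rest metrics for each class of a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]                                # class k predicted as k
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                    # everything else
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0   # equals 1 - specificity
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "specificity": specificity,
            "fpr": fpr,
        })
    return results


# Hypothetical counts for three classes (illustrative only, not from the article).
cm = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 2, 48],
]
metrics = per_class_metrics(cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

for class_index, m in enumerate(metrics):
    print(class_index, {name: round(value, 4) for name, value in m.items()})
print("accuracy", round(accuracy, 4))
```

Because FPR is the complement of specificity, reporting both is redundant in principle, but listing them side by side makes it easier to compare against work that reports only one of the two.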
