This article evaluates the performance of seven conventional machine learning models and four deep learning model variants on a clinical decision support task. Each model was evaluated on a held-out test set, and results obtained on reduced feature subsets were compared against results on the original processed dataset. Among the conventional models, XGBoost, Random Forest, AdaBoost, and SVC each achieved AUROC scores above 87% and weighted-average F1 scores above 90%. The deep learning variants performed comparably to the conventional models, with the exception of the Gumbel-Softmax variant. A Kruskal-Wallis H-test was used to determine whether performance on each feature subset differed significantly from performance on the original processed dataset.
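A minimal sketch of the evaluation setup described above, assuming a binary classification task and scikit-learn-style estimators. The synthetic data, split parameters, and hyperparameters here are placeholders for illustration, not the article's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical dataset (not the article's data).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# AUROC is computed from predicted probabilities of the positive class;
# weighted F1 averages per-class F1 scores weighted by class support.
auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
weighted_f1 = f1_score(y_test, model.predict(X_test), average="weighted")
print(f"AUROC: {auroc:.3f}  weighted F1: {weighted_f1:.3f}")
```

The same loop would be repeated for each of the other models (e.g., XGBoost, AdaBoost, SVC) and for each feature subset, collecting the scores for the statistical comparison.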

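The feature-subset comparison can be sketched with SciPy's Kruskal-Wallis H-test. The per-run score lists below are hypothetical placeholders; the article's actual score distributions are not reproduced here:

```python
from scipy.stats import kruskal

# Hypothetical per-fold AUROC scores for the original processed dataset
# and two reduced feature subsets (illustrative values only).
scores_original = [0.91, 0.90, 0.92, 0.89, 0.91]
scores_subset_a = [0.88, 0.87, 0.89, 0.88, 0.90]
scores_subset_b = [0.90, 0.91, 0.90, 0.92, 0.89]

# The Kruskal-Wallis H-test is a non-parametric test of whether the
# groups come from the same distribution; a small p-value indicates
# that at least one group's performance differs significantly.
h_stat, p_value = kruskal(scores_original, scores_subset_a, scores_subset_b)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

Being rank-based, the test makes no normality assumption about the score distributions, which suits the small number of evaluation runs typically available per configuration.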