This article proposes a methodology for building a classifier for druggable proteins. The methodology uses three families of protein composition descriptors and tests 13 types of machine learning classifiers with different feature selection methods and parameters. The classifiers include Gaussian Naive Bayes, k-nearest neighbors, linear discriminant analysis, support vector machine, logistic regression, multilayer perceptron, decision tree, random forest, XGBoost, Gradient Boosting, AdaBoost, and Bagging. The feature selection methods were principal component analysis, selection of features within a percentile of the highest scores, removal of features with low variance, selection using a linear support vector classifier, and selection using an extra-trees classifier.
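
To make one of the listed feature selection steps concrete, the following is a minimal sketch of removing low-variance features, written with only the Python standard library. It is not the article's implementation: the function names `low_variance_mask` and `select_columns`, the threshold of 0.0, and the toy descriptor matrix are all assumptions chosen for illustration.

```python
from statistics import pvariance


def low_variance_mask(rows, threshold=0.0):
    """Return one boolean per column: True if the column's
    population variance is strictly greater than `threshold`."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [pvariance(col) > threshold for col in columns]


def select_columns(rows, mask):
    """Keep only the columns whose mask entry is True."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Hypothetical descriptor matrix: 4 proteins x 3 composition features.
# The middle feature is constant, so it carries no information.
X = [
    [0.10, 0.5, 0.30],
    [0.20, 0.5, 0.10],
    [0.15, 0.5, 0.25],
    [0.05, 0.5, 0.40],
]

mask = low_variance_mask(X, threshold=0.0)
X_reduced = select_columns(X, mask)
```

Here the constant middle column is dropped and the other two are kept. In practice a library selector would typically be placed ahead of each classifier, so that the same reduced feature set is applied during both training and evaluation.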
