Compound potency predictions play a major role in computational drug discovery, and predictive methods are typically evaluated and compared in benchmark calculations. This study presents an in-depth analysis of potential reasons for artificial outcomes of potency predictions obtained with different methods. Potency predictions on activity classes typically used in benchmark settings were found to be determined by compounds with intermediate potency close to the median values of the compound data sets. The findings provide a clear rationale for general limitations of compound potency benchmark predictions and a basis for the design of alternative test systems for methodological comparisons.
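
The dependence on compounds near the median can be illustrated with a minimal sketch. It is not the study's protocol: the function names, the window of one logarithmic potency unit, and the synthetic potency values are all assumptions made for illustration. The sketch predicts the training-set median for every test compound and reports the mean absolute error separately for test compounds near to and far from that median.

```python
import random
import statistics


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_baseline_report(train, test, window=1.0):
    """Evaluate a predictor that always returns the training median.

    `window` (hypothetical choice) is the distance from the median, in
    logarithmic potency units, within which a compound counts as
    having intermediate potency.
    """
    med = statistics.median(train)
    near = [v for v in test if abs(v - med) <= window]
    far = [v for v in test if abs(v - med) > window]

    def mae_or_none(values):
        # No error can be computed for an empty subset.
        if not values:
            return None
        return mean_absolute_error(values, [med] * len(values))

    return {
        "median": med,
        "mae_overall": mean_absolute_error(test, [med] * len(test)),
        "mae_near": mae_or_none(near),
        "mae_far": mae_or_none(far),
        "fraction_near": len(near) / len(test),
    }


# Synthetic data only: potency values drawn from a normal distribution,
# an assumption standing in for a real activity class.
rng = random.Random(0)
potencies = [rng.gauss(6.5, 1.0) for _ in range(500)]
train, test = potencies[:400], potencies[400:]

report = median_baseline_report(train, test)
for key, value in report.items():
    print(key, value)
```

When most test compounds fall within the window, the overall error stays close to the error on the near-median subset, so a benchmark can reward a method that carries no structure-specific information. Reporting errors per potency range, as in this sketch, is one possible way to expose that effect.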
