This article discusses the challenges of screening virtual libraries at the giga- and tera-scale, which can contain millions of potential hits and thousands of potential lead series for any given target. It explains that new computational approaches are needed to handle such libraries: they must be fast enough to process billions of compounds, yet accurate enough to avoid false positives. The article also outlines practical, cost-effective remedies for dealing with artefacts, such as selecting hits based on the consensus of two different scoring functions and selecting highly diverse hits.
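The consensus remedy mentioned above can be sketched in a few lines: keep only compounds that rank near the top under both scoring functions, on the premise that an artefact is less likely to fool two independent functions. The sketch below is a minimal illustration, not the article's own implementation; the compound names and scores are invented, and lower scores are assumed to be better (docking-style).

```python
def consensus_hits(scores_a, scores_b, top_n):
    """Keep only compounds ranked in the top_n by BOTH scoring functions."""
    top_a = set(sorted(scores_a, key=scores_a.get)[:top_n])
    top_b = set(sorted(scores_b, key=scores_b.get)[:top_n])
    return top_a & top_b

# Hypothetical scores from two different scoring functions (lower = better).
docking   = {"cmpd1": -9.2, "cmpd2": -8.7, "cmpd3": -8.5,
             "cmpd4": -7.9, "cmpd5": -7.1}
rescoring = {"cmpd1": -10.1, "cmpd3": -9.0, "cmpd5": -8.8,
             "cmpd2": -6.5, "cmpd4": -6.0}

hits = consensus_hits(docking, rescoring, top_n=3)
print(sorted(hits))  # → ['cmpd1', 'cmpd3']
```

In practice the same rank-intersection idea scales to billions of compounds, and the second remedy, diversity selection, would then be applied to the surviving consensus hits (e.g. greedy max-min picking on molecular fingerprints) so that a single artefact-prone chemotype cannot dominate the hit list.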
