Nosey Parker is Praetorian’s secret detection tool. It combines regular expression-based detection with machine learning (ML) to find misplaced secrets in source code and web data. Since the original blog post, the regex-based scanner has been reimplemented and released as an open-source project. The proprietary ML-powered version has been completely reimplemented in Python using the open-source project as a base, and its work is now orchestrated to run on powerful cloud VMs with GPUs. ML is used for two primary tasks within Nosey Parker: scoring regex-based findings and classifying content as “secret” or “not secret”. The fine-tuned models work well: they eliminate 10-20% of total reported regex findings and detect hundreds of real secrets that the regex-based detection engine and other rule-based tools missed.
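
To make the first ML task concrete, here is a minimal sketch of the "regex finds candidates, a scorer prunes them" pattern. It is not Nosey Parker's code: the rule, the function names, and the threshold are all hypothetical, and a simple Shannon-entropy heuristic stands in for the fine-tuned model, whose details are not described here.

```python
import math
import re
from collections import Counter

# Hypothetical rule for illustration; a real rule set is far larger.
RULES = {
    "generic_api_key": re.compile(r'api_key\s*=\s*"([A-Za-z0-9]{16,})"'),
}


def regex_findings(text):
    """Return (rule_name, matched_secret) pairs for every rule match."""
    findings = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(1)))
    return findings


def score(candidate):
    """Stand-in scorer (assumption): Shannon entropy in bits per character.

    In the real tool this role is played by an ML model.
    """
    counts = Counter(candidate)
    n = len(candidate)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def filter_findings(findings, threshold=3.0):
    """Keep only findings whose score reaches the (hypothetical) threshold."""
    return [(name, s) for name, s in findings if score(s) >= threshold]


SAMPLE = '''
api_key = "aaaaaaaaaaaaaaaaaaaa"
api_key = "q7Xb2LmP9zR4tVw8KcY1"
'''

if __name__ == "__main__":
    candidates = regex_findings(SAMPLE)
    kept = filter_findings(candidates)
    print(f"{len(candidates)} regex findings, {len(kept)} kept after scoring")
```

Both lines in `SAMPLE` match the regex, but only the high-entropy string survives scoring. The same shape applies when the scorer is a learned model: the regex stage stays cheap and broad, and the scoring stage removes findings that are unlikely to be real secrets.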
