This talk will discuss how causal inference can be used to evaluate more reliably the performance and equity implications of machine learning algorithms used for decision-making in high-stakes settings. It will demonstrate how standard evaluation procedures fail to address missing data and, as a result, often produce invalid assessments of algorithmic performance. The talk will then propose a new evaluation framework that addresses missing data by using counterfactual techniques to estimate the unknown outcomes.
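
To make the missing-data problem concrete, the following is a minimal simulated sketch, not the framework presented in the talk. It assumes a hypothetical setting in which higher-risk cases are less likely to have their outcome recorded, and it swaps in one simple counterfactual-style technique: a binned outcome model fitted on the recorded cases and used to impute the expected outcome for every case. All names (`simulate`, `flagged`, `naive_tpr`, `imputed_tpr`, `true_tpr`), the selection rule, and the 0.5 threshold are illustrative assumptions. The imputation is only valid here because recording depends on nothing but the measured feature `x`.

```python
import random


def simulate(n=100_000, seed=0):
    """Simulate cases whose outcome is recorded only some of the time."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        x = rng.random()                   # risk feature in [0, 1)
        y = 1 if rng.random() < x else 0   # outcome, with P(y = 1 | x) = x
        # Assumed selection mechanism: higher-risk cases are less likely
        # to have their outcome recorded.
        observed = rng.random() < 1.0 - 0.8 * x
        cases.append((x, y, observed))
    return cases


def flagged(x, threshold=0.5):
    """Hypothetical classifier under evaluation: flag cases above a threshold."""
    return x > threshold


def true_tpr(cases):
    """True positive rate using every outcome (possible only in simulation)."""
    positives = [x for x, y, _ in cases if y == 1]
    return sum(flagged(x) for x in positives) / len(positives)


def naive_tpr(cases):
    """Standard evaluation: silently drop cases with a missing outcome."""
    positives = [x for x, y, obs in cases if obs and y == 1]
    return sum(flagged(x) for x in positives) / len(positives)


def imputed_tpr(cases, n_bins=20):
    """Estimate the true positive rate by imputing expected outcomes.

    Fits a binned estimate of P(y = 1 | x) on recorded cases only, then
    applies it to all cases. Unrecorded outcomes are never read.
    """
    sums = [0] * n_bins
    counts = [0] * n_bins
    for x, y, obs in cases:
        if obs:
            b = min(int(x * n_bins), n_bins - 1)
            sums[b] += y
            counts[b] += 1
    overall = sum(sums) / sum(counts)  # fallback for an empty bin
    mu = [s / c if c else overall for s, c in zip(sums, counts)]

    flagged_mass = 0.0
    total_mass = 0.0
    for x, _, _ in cases:
        m = mu[min(int(x * n_bins), n_bins - 1)]
        total_mass += m
        if flagged(x):
            flagged_mass += m
    return flagged_mass / total_mass


cases = simulate()
print("true TPR:   ", round(true_tpr(cases), 3))
print("naive TPR:  ", round(naive_tpr(cases), 3))
print("imputed TPR:", round(imputed_tpr(cases), 3))
```

Under these assumptions the true positive rate is 0.75 in expectation, while the naive estimate tends toward roughly 0.61 because high-risk positives are under-represented among recorded cases. The imputed estimate should land close to the true value. If recording also depended on unmeasured factors, this simple correction would no longer be sufficient.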