Ulrike von Luxburg gave a talk at NeurIPS last week that articulated the fundamental limitations of attempts to make deep machine learning models interpretable or explainable. The talk distinguished two scenarios for explanation: cooperative and adversarial. In the cooperative scenario, both the principal and the user want the most accurate explanations; in the adversarial scenario, the principal’s interests are not aligned with the goal of accurate explanation.
