Foundations of Symbolic Languages for Model Interpretability
Journal : Proceedings of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
Publication type : ISI
Abstract
Several queries and scores have recently been proposed to explain individual predictions over ML models. Examples include queries based on anchors, which are parts of an instance that are sufficient to justify its classification, and feature-perturbation scores such as SHAP. Given the demand for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need to develop declarative languages that naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic called FOIL, which allows for expressing many simple but important explainability queries and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and more general decision diagrams. Since the number of possible inputs for an ML model is exponential in its dimension, tractability of the FOIL evaluation problem is delicate, but can be achieved by either restricting the structure of the models or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.
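To make the kind of query discussed above concrete, the following is a minimal sketch (not the paper's implementation or the FOIL syntax) of one explainability question expressible in such a language: deciding whether a partial Boolean instance is an anchor, i.e., whether every completion of it is classified the same way by a decision tree. The names Leaf, Node, and is_sufficient are illustrative assumptions; the linear-time traversal reflects why such checks are tractable on decision trees.

```python
# Sketch only: checks whether a partial Boolean instance is "sufficient"
# (an anchor) for a decision tree, meaning every completion of the partial
# instance reaches the same class. All names here are hypothetical.

from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Leaf:
    label: int                      # class predicted at this leaf

@dataclass
class Node:
    feature: str                    # Boolean feature tested at this node
    low: "Tree"                     # subtree for feature = 0
    high: "Tree"                    # subtree for feature = 1

Tree = Union[Leaf, Node]

def is_sufficient(tree: Tree, partial: dict, label: int) -> bool:
    """True iff every completion of `partial` is classified as `label`.
    Runs in time linear in the size of the tree."""
    if isinstance(tree, Leaf):
        return tree.label == label
    value: Optional[int] = partial.get(tree.feature)
    if value == 0:
        return is_sufficient(tree.low, partial, label)
    if value == 1:
        return is_sufficient(tree.high, partial, label)
    # Feature not fixed by the partial instance: both branches must agree.
    return (is_sufficient(tree.low, partial, label)
            and is_sufficient(tree.high, partial, label))

# Toy tree: classify positive iff x1 = 1, regardless of x2.
tree = Node("x1", low=Leaf(0), high=Node("x2", low=Leaf(1), high=Leaf(1)))
print(is_sufficient(tree, {"x1": 1}, 1))   # True: {x1 = 1} is an anchor
print(is_sufficient(tree, {"x2": 1}, 1))   # False: x1 still matters
```

A declarative language in the spirit of the paper would let a user state such a query over the model directly, leaving the traversal and the complexity considerations to the evaluation engine.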