Methods for interpreting and explaining AI models


Many AI models are difficult to interpret, so methods have been developed to support interpretability and explainability. Interpretability can be defined as being able to understand what caused a decision, or being able to predict/explain the result.

These methods can be broadly divided into:

  1. inherent explainability: using ‘interpretable’ models whose parameters we can directly inspect and understand
  2. post-hoc explainability: developing explanations after the model has been trained.

Inherent explainability

Models which are inherently interpretable include:

  1. linear regression - we can look at the coefficients (see the sketch after this list)
  2. decision trees - we can look at the branch points
  3. generalised linear models - again, we can look at the coefficients
  4. naive Bayes - we can look at the conditional probabilities
  5. nearest neighbours - we can look at each of the nearest neighbours
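
As a concrete illustration, the sketch below fits two of these models and reads their parameters directly. It assumes scikit-learn and uses its built-in diabetes dataset purely for illustration; neither the library nor the dataset comes from the text above.

```python
# A minimal sketch of inherent interpretability (assumes scikit-learn;
# the diabetes dataset is an illustrative placeholder).
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Linear regression: the fitted coefficients can be read off directly as the
# estimated effect of each feature on the prediction.
linear = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, linear.coef_):
    print(f"{name}: {coef:.2f}")

# Decision tree: the branch points can be printed as human-readable rules.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```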

Of note, these do not include deep learning models, which are coming to dominate the field of AI.

Post-hoc explainability

This includes:

  1. surrogate methods: building a simpler, interpretable model that approximates the original model. This can be done globally (across the whole spectrum of inputs/outputs) or locally (such as ‘LIME’ - local interpretable model-agnostic explanations)
  2. investigating the impact of features: looking at how predictions change when a specific parameter is changed (partial dependence plots), or at the importance of different features (permutation feature importance - see the sketch after this list)
  3. visualising the features themselves (‘feature visualisation’), such as heatmaps/saliency maps for CNNs or language models
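
To make the feature-impact idea concrete, the sketch below applies permutation feature importance with scikit-learn: each feature is shuffled in turn and the resulting drop in test score is measured. The random-forest model and breast-cancer dataset are illustrative assumptions, not taken from the text.

```python
# A minimal sketch of post-hoc explanation via permutation feature importance
# (assumes scikit-learn; the model and dataset are illustrative placeholders).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a model that is not inherently interpretable.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the held-out score drops:
# a large drop means the model relies heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean, std in ranked[:5]:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```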

Limitations of current methods

Saliency maps are good at showing where in the input a model is looking, but not what it has identified there. They therefore require a step of human interpretation, which carries a risk of bias.
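
To make the ‘where but not what’ point concrete, the sketch below computes a simple gradient-based saliency map. It assumes PyTorch and a recent torchvision (for the pretrained-weights API), and uses a random tensor as a stand-in for a real, preprocessed image.

```python
# A minimal sketch of a gradient-based saliency map (assumes PyTorch and
# torchvision >= 0.13; the random input is a placeholder for a real image).
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input
logits = model(image)

# Back-propagate the top-class score to the input pixels. The gradient magnitude
# tells us *where* the prediction is sensitive, but nothing about *what* the
# model has recognised there - that interpretation still falls to a human.
logits.max().backward()
saliency = image.grad.abs().max(dim=1).values  # one (224, 224) map per image
print(saliency.shape)  # torch.Size([1, 224, 224])
```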

In healthcare, medication is often effectively a black box. For example, the mechanism of action of paracetamol is not well understood, yet randomised controlled trials (RCTs) have been used to prove that it works. The same approach could be applied to AI models in healthcare.