Neural networks are becoming increasingly popular for applications in various domains, but in practice, further methods are necessary to make sure the models are learning patterns that agree with prior knowledge about the domain. A new approach introduces an explanation method, called ‘expected gradients’, that enables training with theoretically motivated feature attribution priors, to improve model performance on real-world tasks.
- Gabriel Erion
- Joseph D. Janizek
- Su-In Lee