Uncertainty estimates are crucial in many deep learning problems, e.g. for active learning or safety-critical applications. While Bayesian deep learning provides a framework for generating uncertainty estimates for deep learning models, it requires a well-specified prior, which is generally unknown. This work uses large-scale datasets to learn an informative prior over the parameters of a neural network, which can then be used in subsequent tasks to produce better uncertainty estimates and tighter generalization bounds. The approach uses scalable Laplace approximations to support large-scale networks and datasets with little computational overhead compared to standard deep learning. Altogether, this transforms the problem of defining high-dimensional prior distributions with complex interactions between weights into the problem of finding related datasets. To improve the generalization bounds for the Laplace approximation, a novel method to scale the curvature using PAC-Bayesian bounds is proposed. To this end, an approximate upper bound on the training error is derived for the Laplace approximation and optimized with respect to the curvature scales. Empirically, the learned prior needs less temperature scaling than isotropic Gaussian priors and produces similarly accurate predictions and uncertainty estimates. Moreover, non-vacuous generalization bounds are obtained for a LeNet-5 architecture on the NotMNIST dataset. In particular, the curvature scaling improves the bounds by up to 23 percentage points, while the empirically learned prior tightens the bound by an average of nine percentage points compared to isotropic Gaussian priors, resulting in an upper bound of 65% on the generalization error on the NotMNIST dataset. Additionally, we introduce Progressive Bayesian Neural Networks (PBNN), which combine the learned prior with progressive neural networks to learn sequentially incoming tasks without catastrophic forgetting.
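The curvature-scaling idea described above can be sketched in a few lines: a Laplace approximation places a Gaussian N(w*, (tau * H)^-1) around the MAP estimate w*, where H is the curvature of the negative log-posterior at the mode and tau is a scale factor. The toy logistic-regression example below is purely illustrative; all names and the hand-picked value of tau are our own assumptions, not the thesis's method (which learns the scales by optimizing a PAC-Bayesian bound).

```python
import numpy as np

# Illustrative sketch (not the thesis's code): Laplace approximation for a
# tiny 2-parameter logistic regression, with a hand-set curvature scale tau.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w_true = np.array([1.5, -2.0])
y = (rng.random(100) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# MAP estimate by plain gradient descent with a Gaussian prior (precision lam)
lam = 1.0
w = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + lam * w
    w -= 0.1 * grad / len(y)

# Curvature at the mode: Hessian of the negative log-posterior
p = sigmoid(X @ w)
H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(2)

# Laplace posterior N(w, (tau * H)^{-1}); tau > 1 sharpens the posterior,
# tau < 1 flattens it. Here tau is set by hand for illustration.
tau = 2.0
cov = np.linalg.inv(tau * H)
samples = rng.multivariate_normal(w, cov, size=1000)
```

Predictions and uncertainty estimates are then obtained by averaging the model output over the posterior samples; scaling tau changes how spread-out those samples are without moving the mode.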
Using a prior learned empirically on the ImageNet dataset, PBNN improves accuracy and uncertainty estimates on a large-scale robotics dataset compared to progressive neural networks and their variant with MC dropout. Moreover, we present a more accurate Kronecker factorization of the Fisher Information Matrix (FIM) as an alternative to the widely adopted Kronecker-Factored Approximate Curvature (K-FAC). To this end, we recast the optimal Kronecker-factored approximation of the FIM as a best rank-one approximation problem and solve it with a novel scalable version of the well-known power iteration method. In a proof-of-concept experiment, we show that the proposed algorithm yields more accurate estimates of the true FIM than K-FAC.
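The reduction to a rank-one problem rests on the classical Van Loan–Pitsianis rearrangement: reordering the blocks of a matrix F turns the best Frobenius-norm Kronecker approximation F ≈ A ⊗ B into a best rank-one approximation of the rearranged matrix, which power iteration can solve. The NumPy sketch below is an illustrative small-scale version under that assumption; the function names are hypothetical and it does not reproduce the thesis's scalable variant for the FIM.

```python
import numpy as np

def rearrange(F, m, p):
    # Van Loan–Pitsianis rearrangement for square factors: stack each
    # p x p block of F (A is m x m, B is p x p) as one row of R, so that
    # F ≈ A ⊗ B  <=>  R ≈ vec(A) vec(B)^T, a rank-one matrix.
    R = np.empty((m * m, p * p))
    for i in range(m):
        for j in range(m):
            R[i * m + j] = F[i * p:(i + 1) * p, j * p:(j + 1) * p].ravel()
    return R

def kron_factor(F, m, p, iters=100):
    R = rearrange(F, m, p)
    # Power iteration on R to find its leading singular pair (u, s, v)
    v = np.ones(p * p) / p
    for _ in range(iters):
        u = R @ v
        u /= np.linalg.norm(u)
        v = R.T @ u
        s = np.linalg.norm(v)
        v /= s
    A = (s * u).reshape(m, m)  # fold the singular value into A
    B = v.reshape(p, p)
    return A, B

# Sanity check: an exact Kronecker product is recovered (up to the
# arbitrary split of scale between the two factors)
A0 = np.array([[2.0, 1.0], [1.0, 3.0]])
B0 = np.array([[1.0, 0.5], [0.5, 2.0]])
F = np.kron(A0, B0)
A, B = kron_factor(F, 2, 2)
assert np.allclose(np.kron(A, B), F)
```

For a matrix that is exactly a Kronecker product, the rearranged matrix is exactly rank one and the factors are recovered; for a general FIM block, the leading singular pair gives the closest Kronecker-factored approximation in Frobenius norm.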
Progressive Bayesian Neural Networks
2021-11-23
Other
Electronic resource
English
Bridging the Reality Gap via Progressive Bayesian Optimisation
Springer Verlag | 2022
Bayesian and Neural Networks for Preliminary Ship Design
British Library Conference Proceedings | 2001
Bayesian and Neural Networks For Preliminary Ship Design
Online Contents | 2001