Uncertainty estimates are crucial in many deep learning problems, e.g. for active learning or safety-critical applications. While Bayesian deep learning provides a framework for generating uncertainty estimates for deep learning models, it requires a well-specified prior, which is in general unknown. This work uses large-scale datasets to learn an informative prior over the parameters of a neural network, which can then be used in subsequent tasks to produce better uncertainty estimates and tighter generalization bounds. The approach uses scalable Laplace approximations, so it works with large networks and datasets at little computational overhead compared to standard deep learning. Altogether, this transforms the problem of defining high-dimensional prior distributions with complex interactions between weights into the problem of finding related datasets.

To tighten the generalization bounds of the Laplace approximation, a novel method is proposed that scales the curvature using PAC-Bayesian bounds. For this, an approximate upper bound on the training error is derived for the Laplace approximation and optimized with respect to the curvature scales. Empirically, the learned prior needs less temperature scaling than isotropic Gaussian priors and produces similarly accurate predictions and uncertainty estimates. Moreover, non-vacuous generalization bounds are obtained for a LeNet-5 architecture on the NotMNIST dataset: the curvature scaling improves the bounds by up to 23 percentage points, while the empirically learned prior tightens the bound by an average of nine percentage points compared to isotropic Gaussian priors, resulting in an upper bound of 65% on the generalization error on NotMNIST.

Additionally, we introduce Progressive Bayesian Neural Networks (PBNN), which combine the learned prior with progressive neural networks to learn sequentially arriving tasks without catastrophic forgetting. Using a prior learned empirically on the ImageNet dataset, PBNN improve accuracy and uncertainty estimates on a large-scale robotics dataset compared to progressive neural networks and their variant with MC dropout.

Finally, we present a more accurate Kronecker factorization of the Fisher Information Matrix (FIM) as an alternative to the widely adopted Kronecker-Factored Approximate Curvature (K-FAC). For this, we recast the optimal Kronecker-factored approximation of the FIM as a best rank-one approximation problem and solve it with a novel scalable version of the well-known power (iteration) method. In a proof-of-concept experiment, we show that the proposed algorithm achieves more accurate estimates of the true FIM than K-FAC.
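The abstract does not state the exact PAC-Bayesian bound being optimized. As a hedged illustration, the following is the standard bound (Maurer, 2004) that curvature-scaling approaches of this kind typically build on; the notation here is generic and not taken from the thesis. With probability at least 1 - delta over an i.i.d. sample S of size n, for every posterior Q and a prior P fixed before seeing S:

    % Standard PAC-Bayes bound; generic form, not the thesis's exact bound.
    \[
      \mathrm{kl}\!\left(\hat{L}_S(Q)\,\middle\|\,L_{\mathcal{D}}(Q)\right)
      \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n}
    \]

Here \hat{L}_S(Q) is the expected training error, L_D(Q) the expected true error under Q, and kl the binary KL divergence. For a Laplace-approximate posterior Q = N(theta*, Sigma) with a Gaussian prior P, KL(Q||P) has a closed form, so scaling the curvature that defines Sigma trades the empirical term off against the complexity term and can be tuned to tighten the bound.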
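The reduction from the optimal Kronecker-factored approximation to a best rank-one approximation matches the classical Van Loan-Pitsianis rearrangement: minimizing ||F - A kron B||_F is equivalent to finding the best rank-one approximation of a rearranged matrix R(F). Below is a minimal NumPy sketch of this reduction solved by plain power iteration; it materializes F and R(F) explicitly, which the thesis's scalable power-method variant presumably avoids, and all function names are illustrative rather than taken from the thesis.

    import numpy as np

    def rearrange(F, m, n, p, q):
        # Van Loan-Pitsianis rearrangement: view the (m*p) x (n*q) matrix F
        # as an m x n grid of p x q blocks F_ij; row (i*n + j) of the result
        # is vec(F_ij), so ||F - A kron B||_F = ||R - vec(A) vec(B)^T||_F.
        R = np.empty((m * n, p * q))
        for i in range(m):
            for j in range(n):
                R[i * n + j] = F[i*p:(i+1)*p, j*q:(j+1)*q].reshape(-1)
        return R

    def nearest_kronecker(F, m, n, p, q, iters=50, seed=0):
        # Best rank-one approximation of R(F) via power iteration, yielding
        # factors A (m x n), B (p x q) minimizing ||F - A kron B||_F.
        R = rearrange(F, m, n, p, q)
        rng = np.random.default_rng(seed)
        b = rng.standard_normal(p * q)
        b /= np.linalg.norm(b)
        for _ in range(iters):
            a = R @ b
            a /= np.linalg.norm(a)
            b = R.T @ a
            sigma = np.linalg.norm(b)
            b /= sigma
        return (sigma * a).reshape(m, n), b.reshape(p, q)

    # Sanity check: the factors of an exact Kronecker product are recovered
    # (up to a scalar/sign ambiguity that cancels in the product).
    A = np.random.rand(3, 3)
    B = np.random.rand(4, 4)
    A_hat, B_hat = nearest_kronecker(np.kron(A, B), 3, 3, 4, 4)
    assert np.allclose(np.kron(A_hat, B_hat), np.kron(A, B))

In the FIM setting, F would be a layer's Fisher block; unlike K-FAC, which builds the two factors from separate activation and gradient statistics, this construction returns the Kronecker factors that are optimal in the Frobenius norm.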


    Title:

    Progressive Bayesian Neural Networks


    Contributors:

    Publication date:

    2021-11-23


    Media type:

    Other


    Format:

    Electronic resource


    Language:

    English




    Bridging the Reality Gap via Progressive Bayesian Optimisation

    Yu, Chen / Rosendo, Andre | Springer Verlag | 2022


    Bayesian and Neural Networks for Preliminary Ship Design

    Clausen, H. B. / Lutzen, M. / Friis-Hansen, A. et al. | British Library Conference Proceedings | 2001


    Bayesian and Neural Networks For Preliminary Ship Design

    Clausen, H.B. | Online Contents | 2001


    Empirical Modeling Based on Neural Networks and Bayesian Learning

    Makeev, Andrew / Nikishkov, Yuri / Armanios, Erian | AIAA | 2004