The training of large neural networks on image data, as in pixel-wise semantic segmentation, usually requires large amounts of precisely annotated training data. In addition, models intended to operate on examples from a real-world domain typically also require training data from that same environment. The field of domain adaptation tackles this problem by using data from a different domain, for example extracted from a simulation environment, to enable or at least improve the performance of models in the target domain. However, paired data from two different domains is rarely available. This is especially true for synthetic and real driving scenes. We propose a two-step training approach that extends the unsupervised MUNIT framework with a U-Net-style network to perform a semantic segmentation task trained purely on synthetic data. In a first training step, a MUNIT image-to-image translation model is trained; in a second step, the domain-invariant output of its content encoder serves as the input encoding for the segmentation model. This second training step is supervised and uses only labeled synthetic data, while at inference the model is applied to real examples. We evaluate our approach using synthetic data from GTA 5 adapted to real images from the Cityscapes dataset and achieve results that surpass the current state of the art.
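
To make the two-step pipeline concrete, below is a minimal PyTorch sketch. It is not the authors' implementation: the content encoder is a simplified stand-in for MUNIT's (which uses residual blocks), the U-Net skip connections are omitted for brevity, and the decision to freeze the encoder in step 2, as well as all channel widths and hyperparameters, are assumptions.

    import torch
    import torch.nn as nn

    class ContentEncoder(nn.Module):
        """Downsampling CNN mapping an image to a content code.
        Simplified stand-in for the MUNIT content encoder."""
        def __init__(self, in_ch=3, dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, dim, 7, 1, 3), nn.InstanceNorm2d(dim), nn.ReLU(True),
                nn.Conv2d(dim, 2 * dim, 4, 2, 1), nn.InstanceNorm2d(2 * dim), nn.ReLU(True),
                nn.Conv2d(2 * dim, 4 * dim, 4, 2, 1), nn.InstanceNorm2d(4 * dim), nn.ReLU(True),
            )

        def forward(self, x):
            return self.net(x)  # content code of shape (B, 4*dim, H/4, W/4)

    class SegHead(nn.Module):
        """Decoder upsampling content codes to per-pixel class logits
        (U-Net skip connections omitted in this sketch)."""
        def __init__(self, in_ch=256, n_classes=19):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(in_ch, in_ch // 2, 4, 2, 1), nn.ReLU(True),
                nn.ConvTranspose2d(in_ch // 2, in_ch // 4, 4, 2, 1), nn.ReLU(True),
                nn.Conv2d(in_ch // 4, n_classes, 1),
            )

        def forward(self, c):
            return self.net(c)

    # Step 1 (not shown): train MUNIT on unpaired GTA 5 / Cityscapes images so
    # that both domains share this content encoder.
    encoder = ContentEncoder()
    encoder.requires_grad_(False)  # assumption: content encoder kept frozen in step 2

    # Step 2: supervised training of the segmentation head on synthetic data only.
    head = SegHead(in_ch=256, n_classes=19)  # 19 Cityscapes evaluation classes
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss(ignore_index=255)

    synthetic_img = torch.randn(2, 3, 128, 256)           # placeholder GTA 5 batch
    synthetic_lbl = torch.randint(0, 19, (2, 128, 256))   # placeholder labels

    opt.zero_grad()
    logits = head(encoder(synthetic_img))
    loss = loss_fn(logits, synthetic_lbl)
    loss.backward()
    opt.step()

    # Inference: the same encoder + head is applied directly to real Cityscapes
    # images, relying on the domain invariance of the content code.

The point the sketch illustrates is that once step 1 has made the content code shared across domains, a head trained only on synthetic content codes can be applied unchanged to codes extracted from real images.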



    Title: Unsupervised Domain Adaptation via Shared Content Representation for Semantic Segmentation

    Publication date: 2021-09-19




    Type of media: Conference paper

    Type of material: Electronic Resource

    Language: English