Sorry, the AI is out of order. Please call the technician.
Figure 1. When your deployed model meets unexpected working conditions...
When discussing machine learning and AI solutions for various applications, we should care not only about performance metrics but also about the reliability of these systems. We want solutions that keep working when they are deployed and encounter a data distribution that differs from the one seen during training. Moreover, we want principled ways and metrics to reliably determine whether a model is certain of its prediction, rather than considering performance metrics alone.
Learning across domains
The Vandal laboratory has long experience working on topics related to domain adaptation and domain generalization. We have been devising solutions to make algorithms more robust to domain shifts, e.g., for a self-driving car navigating different environments and weather conditions (Figure 2). You can check the related publications below to see a few works where we make use of these techniques.
Figure 2. Left: IDDA multi-domain dataset for autonomous driving. Right: samples from the SVOX dataset.
Uncertainty quantification and anomaly detection
Accurately predicting the outcome of real-world events using machine learning models requires a clear understanding of the model’s confidence in its predictions. This is particularly critical in high-stakes domains like healthcare, where decisions are made based on these predictions. While machine learning models often excel at making accurate predictions, quantifying the uncertainty surrounding these predictions remains a significant challenge. To be useful, uncertainty estimates must reliably indicate the range of possible outcomes and differentiate between predictions made with high and low confidence.
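One common way to obtain such uncertainty estimates (not specific to any of our publications, just a generic illustration) is to average the predictions of an ensemble, or of multiple stochastic forward passes, and measure the entropy of the averaged distribution: the sketch below, using hypothetical toy probabilities, shows how this separates high-confidence from low-confidence predictions.

```python
import numpy as np

def predictive_entropy(member_probs):
    """Entropy of the averaged predictive distribution.

    member_probs: (n_members, n_classes) softmax outputs from an
    ensemble (or from multiple stochastic forward passes).
    Higher entropy = less confident prediction.
    """
    mean = member_probs.mean(axis=0)
    return float(-np.sum(mean * np.log(mean + 1e-12)))

# A confident prediction: all members agree on class 0.
confident = np.array([[0.92, 0.04, 0.02, 0.02]] * 3)

# An uncertain prediction: members spread mass across classes.
uncertain = np.array([[0.40, 0.30, 0.20, 0.10],
                      [0.10, 0.40, 0.30, 0.20],
                      [0.20, 0.10, 0.40, 0.30]])
```

A reliable uncertainty measure should assign the second input a strictly higher score than the first, which this entropy does.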
Related to this concept is the task of anomaly segmentation, which aims to segment patterns that deviate from the normal ones. A possible application of such a technology is identifying defects on production lines. One of our solutions, tailored for driving scenes, is Mask2Anomaly, a universal anomaly segmentation architecture that reasons about anomalies per mask rather than per individual pixel.
Figure 4. Mask2Anomaly algorithm for anomaly segmentation.
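The difference between per-pixel and per-mask reasoning can be sketched in a few lines. This is a simplified illustration (not the actual Mask2Anomaly scoring): the per-pixel score is one minus the maximum softmax probability, while the per-mask score aggregates that quantity over a candidate region, so a single noisy pixel cannot trigger a detection on its own.

```python
import numpy as np

def pixel_anomaly_scores(probs):
    """Per-pixel anomaly score: 1 - max softmax probability, shape (H, W)."""
    return 1.0 - probs.max(axis=-1)

def mask_anomaly_score(probs, mask):
    """Score a whole candidate mask by averaging the per-pixel scores
    inside it, smoothing out isolated noisy pixels."""
    return float(pixel_anomaly_scores(probs)[mask].mean())

# Toy 2x2 "image" with 3 known classes; the right column is ambiguous.
probs = np.array([[[0.90, 0.05, 0.05], [0.40, 0.30, 0.30]],
                  [[0.85, 0.10, 0.05], [0.35, 0.35, 0.30]]])
anomaly_region = np.array([[False, True],
                           [False, True]])
```

On this toy input, the ambiguous region receives a clearly higher mask score than the confidently classified one.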
Related Publications
2024
Journal
Mask2Anomaly: Mask Transformer for Universal Open-set Segmentation
Shyam Nandan Rai, Fabio Cermelli, Barbara Caputo, and Carlo Masone
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Segmenting unknown or anomalous object instances is a critical task in autonomous driving applications, and it is traditionally approached as a per-pixel classification problem. However, reasoning individually about each pixel without considering their contextual semantics results in high uncertainty around the objects’ boundaries and numerous false positives. We propose a paradigm change by shifting from a per-pixel classification to a mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating a mask-classification architecture to jointly address anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation. Mask2Anomaly includes several technical novelties that are designed to improve the detection of anomalies/unknown objects: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; iii) a mask refinement solution to reduce false positives; and iv) a novel approach to mine unknown instances based on the mask-architecture properties. Through a comprehensive qualitative and quantitative evaluation, we show that Mask2Anomaly achieves new state-of-the-art results across the benchmarks of anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation.
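The margin-maximization idea in point ii) can be conveyed with a generic hinge-style contrastive term (a sketch only; the paper's actual loss operates on mask features and differs in its details). The loss is zero once the anomaly's similarity to the known classes sits at least a fixed margin below the known-class similarity, and grows as the two scores approach each other.

```python
def mask_margin_loss(known_sim, anomaly_sim, margin=0.5):
    """Hinge-style contrastive term: push the anomaly's similarity to the
    known classes at least `margin` below the known-class similarity.
    Returns 0 when the separation already exceeds the margin."""
    return max(0.0, margin - (known_sim - anomaly_sim))
```

Minimizing such a term over training pairs widens the score gap that the anomaly detector later thresholds.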
2023
Journal
Hierarchical Instance Mixing Across Domains in Aerial Segmentation
Edoardo Arnaudo, Antonio Tavera, Carlo Masone, Fabrizio Dominici, and Barbara Caputo
We investigate the task of unsupervised domain adaptation in aerial semantic segmentation, observing that there are some shortcomings in the class mixing strategies used by the recent state-of-the-art methods that tackle this task: 1) they do not account for the large disparity in the extension of the semantic categories that is common in the aerial setting, which causes a domain imbalance in the mixed image; 2) they do not consider that aerial scenes have a weaker structural consistency in comparison to the driving scenes for which the mixing technique was originally proposed, which causes the mixed images to have elements placed out of their natural context; 3) the source model used to generate the pseudo-labels may be susceptible to perturbations across domains, which causes inconsistent predictions on the target images and can jeopardize the mixing strategy. We address these shortcomings with a novel aerial semantic segmentation framework for UDA, named HIUDA, which is composed of two main technical novelties: firstly, a new mixing strategy for aerial segmentation across domains called Hierarchical Instance Mixing (HIMix), which extracts a set of connected components from each semantic mask and mixes them according to a semantic hierarchy; and secondly, a twin-head architecture in which two separate segmentation heads are fed with variations of the same images in a contrastive fashion to produce finer segmentation maps. We conduct extensive experiments on the LoveDA benchmark, where our solution outperforms the current state-of-the-art.
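The basic cross-domain class mixing that HIMix builds on can be sketched as follows. This is the plain ClassMix-style operation, not HIMix itself: HIMix additionally extracts connected components and orders the pasted instances by a semantic hierarchy. The function names and toy data are illustrative.

```python
import numpy as np

def class_mix(src_img, src_lbl, tgt_img, tgt_lbl, classes_to_paste):
    """Paste all pixels belonging to the chosen source classes onto the
    target image/label pair, producing a mixed training sample."""
    paste = np.isin(src_lbl, classes_to_paste)          # (H, W) boolean
    mixed_img = np.where(paste[..., None], src_img, tgt_img)
    mixed_lbl = np.where(paste, src_lbl, tgt_lbl)
    return mixed_img, mixed_lbl

# Toy 2x2 sample: source has classes 1 and 2, target is all class 0.
src_lbl = np.array([[1, 2], [2, 1]])
tgt_lbl = np.zeros((2, 2), dtype=int)
src_img = np.full((2, 2, 3), 255)
tgt_img = np.zeros((2, 2, 3), dtype=int)
mixed_img, mixed_lbl = class_mix(src_img, src_lbl, tgt_img, tgt_lbl, [2])
```

Only the pixels of the selected class carry over; everything else keeps the target's appearance and labels, which is exactly the property that makes the domain imbalance and context issues discussed above matter.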
Conference
Unmasking Anomalies in Road-Scene Segmentation
Shyam Nandan Rai, Fabio Cermelli, Dario Fontanel, Carlo Masone, and Barbara Caputo
In IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Within the top 2.3% of submitted papers, and the top 9% of accepted papers.
Anomaly segmentation is a critical task for driving applications, and it is traditionally approached as a per-pixel classification problem. However, reasoning individually about each pixel without considering their contextual semantics results in high uncertainty around the objects’ boundaries and numerous false positives. We propose a paradigm change by shifting from a per-pixel classification to a mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture. Mask2Anomaly includes several technical novelties that are designed to improve the detection of anomalies in masks: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; and iii) a mask refinement solution to reduce false positives. Mask2Anomaly achieves new state-of-the-art results across a range of benchmarks, both in the per-pixel and component-level evaluations. In particular, Mask2Anomaly reduces the average false positive rate by 60% w.r.t. the previous state-of-the-art.
2022
Journal
Adaptive-Attentive Geolocalization From Few Queries: A Hybrid Approach
Valerio Paolicelli, Gabriele Berton, Francesco Montagna, Carlo Masone, and Barbara Caputo
We tackle the task of cross-domain visual geo-localization, where the goal is to geo-localize a given query image against a database of geo-tagged images, in the case where the query and the database belong to different visual domains. In particular, at training time, we consider having access to only a few unlabeled queries from the target domain. To adapt our deep neural network to the database distribution, we rely on a 2-fold domain adaptation technique, based on a hybrid generative-discriminative approach. To further enhance the architecture, and to ensure robustness across domains, we employ a novel attention layer that can easily be plugged into existing architectures. Through a large number of experiments, we show that this adaptive-attentive approach makes the model robust to large domain shifts, such as unseen cities or weather conditions. Finally, we propose a new large-scale dataset for cross-domain visual geo-localization, called SVOX.
Workshop
Augmentation Invariance and Adaptive Sampling in Semantic Segmentation of Agricultural Aerial Images
A. Tavera, E. Arnaudo, C. Masone, and B. Caputo
In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022
In this paper, we investigate the problem of Semantic Segmentation for agricultural aerial imagery. We observe that the existing methods used for this task are designed without considering two characteristics of the aerial data: (i) the top-down perspective implies that the model cannot rely on a fixed semantic structure of the scene, because the same scene may be experienced with different rotations of the sensor; (ii) there can be a strong imbalance in the distribution of semantic classes because the relevant objects of the scene may appear at extremely different scales (e.g., a field of crops and a small vehicle). We propose a solution to these problems based on two ideas: (i) we combine a set of suitable augmentations with a consistency loss to guide the model to learn semantic representations that are invariant to the photometric and geometric shifts typical of the top-down perspective (Augmentation Invariance); (ii) we use a sampling method (Adaptive Sampling) that selects the training images based on a measure of the pixel-wise distribution of classes and of the actual network confidence. With an extensive set of experiments conducted on the Agriculture-Vision dataset, we demonstrate that our proposed strategies improve the performance of the current state-of-the-art method.
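The consistency-loss idea behind Augmentation Invariance can be illustrated with a minimal sketch (generic, not the paper's exact formulation): predict on an augmented view, undo the geometric transform on the prediction so pixels line up with the original view, and penalize the disagreement between the two probability maps.

```python
import numpy as np

def consistency_loss(p_orig, p_aug_aligned):
    """Mean squared disagreement between the class-probability map of the
    original view and that of an augmented view, after the augmentation
    has been undone so the two maps are pixel-aligned."""
    return float(np.mean((p_orig - p_aug_aligned) ** 2))

# Toy (H, W, C) softmax map; the augmentation here is a 90-degree rotation.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=(4, 4))

# Simulate a perfectly invariant model: its prediction on the rotated
# input, rotated back, equals the prediction on the original input.
p_rotated_back = np.rot90(np.rot90(p, k=1, axes=(0, 1)), k=-1, axes=(0, 1))
```

For a perfectly invariant model the loss is zero; during training, a real model incurs a nonzero penalty that pushes its representations toward rotation invariance.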
Conference
Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation
A. Tavera, F. Cermelli, C. Masone, and B. Caputo
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
In this paper we consider the task of semantic segmentation in autonomous driving applications. Specifically, we consider the cross-domain few-shot setting, where training can use only a few annotated real-world images and many annotated synthetic images. In this context, aligning the domains is made more challenging by the pixel-wise class imbalance that is intrinsic to segmentation, which leads to ignoring the underrepresented classes and overfitting the well-represented ones. We address this problem with a novel framework called Pixel-By-Pixel Cross-Domain Alignment (PixDA). We propose a novel pixel-by-pixel domain adversarial loss following three criteria: (i) align the source and the target domain for each pixel, (ii) avoid negative transfer on the correctly represented pixels, and (iii) regularize the training of infrequent classes to avoid overfitting. The pixel-wise adversarial training is assisted by a novel sample selection procedure, that handles the imbalance between source and target data, and a knowledge distillation strategy, that avoids overfitting towards the few target images. We demonstrate on standard synthetic-to-real benchmarks that PixDA outperforms previous state-of-the-art methods in (1-5)-shot settings.
2021
Conference
Reimagine BiSeNet for Real-Time Domain Adaptation in Semantic Segmentation
Semantic segmentation models have reached remarkable performance across various tasks. However, this performance is achieved with extremely large models, using powerful computational resources and without considering training and inference time. Real-world applications, on the other hand, require models with minimal memory demands, efficient inference speed, and the ability to run on low-resource embedded devices, such as self-driving vehicles. In this paper, we look at the challenge of real-time semantic segmentation across domains, and we train a model to act appropriately on real-world data even though it was trained on synthetic data. We employ a new lightweight and shallow discriminator that was specifically created for this purpose. To the best of our knowledge, we are the first to present a real-time adversarial approach to the domain adaptation problem in semantic segmentation. We tested our framework on the two standard protocols: GTA5 to Cityscapes and SYNTHIA to Cityscapes.
Conference
Viewpoint Invariant Dense Matching for Visual Geolocalization
G. Berton, C. Masone, V. Paolicelli, and B. Caputo
In IEEE/CVF International Conference on Computer Vision (ICCV), 2021
In this paper we propose a novel method for image matching based on dense local features and tailored for visual geolocalization. Dense local features matching is robust against changes in illumination and occlusions, but not against viewpoint shifts which are a fundamental aspect of geolocalization. Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features. This is achieved via a trainable module which learns from the data an invariance that is meaningful for the task of recognizing places. We also devise a new self-supervised loss and two new weakly supervised losses to train this module using only unlabeled data and weak labels. GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines. Experimental validation on standard geolocalization benchmarks demonstrates that GeoWarp boosts the accuracy of state-of-the-art retrieval architectures.
2020
Journal
IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving
Semantic segmentation is key in autonomous driving. Using deep visual learning architectures is not trivial in this context, because of the challenges in creating suitable large-scale annotated datasets. This issue has traditionally been circumvented through the use of synthetic datasets, which have become a popular resource in this field. Their release has been accompanied by the need to develop semantic segmentation algorithms able to close the visual domain shift between the training and test data. Although exacerbated by the use of artificial data, the problem is extremely relevant in this field even when training on real data. Indeed, weather conditions, viewpoint changes, and variations in city appearance can vary considerably from car to car, and even at test time for a single, specific vehicle. How to deal with domain adaptation in semantic segmentation, and how to effectively leverage several different data distributions (source domains), are important research questions in this field. To support work in this direction, this letter contributes a new large-scale synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data under various weather and viewpoint conditions, in seven different city types. Extensive benchmark experiments assess the dataset, showcasing open challenges for the current state of the art.