(2022, Error Severity) Handling Error Severity in Neural Networks with Semantic Knowledge

Addressing Mistake Severity in Neural Networks with Semantic Knowledge


Table of contents

0. Summary

1. Introduction

2. Related work

3. Method

4. Experiment

5. Results

5.1 Adversarial Perturbations

5.2 Natural Corruptions

6. Discussion and conclusion 

7. Future work

8. Broader impact

References

S. Summary

S.1 Main idea

S.2 Explanation of terms

S.3 Method

S.4 Analysis


0. Summary

In general, the robustness of deep neural networks and machine learning algorithms is an open research challenge. In particular, it is difficult to ensure that algorithm performance is maintained on out-of-distribution inputs or anomalous instances that cannot be anticipated at training time. Embodied agents, e.g., robots, will be used under these conditions and will likely make incorrect predictions. An agent will be considered untrustworthy unless it can maintain its performance in a dynamic environment. Most robust training techniques aim to increase a model's accuracy on perturbed inputs; as another form of robustness, we aim to reduce the severity of errors made by neural networks under challenging conditions. We leverage current adversarial training methods to generate targeted adversarial attacks during training in order to increase the semantic similarity between model predictions and the true labels of misclassified instances. Results show that our method performs better in terms of error severity compared to standard and adversarially trained models. We also find that non-robust features play an interesting role with respect to semantic similarity.

1. Introduction

Traditionally, the success of machine learning systems has been measured using simple metrics that treat all class-level errors equally. However, this limited definition contradicts the natural intuition that some errors are far more serious than others. As machine learning proliferates across physical agents, distribution shifts, anomalies, and novel situations will make it extremely difficult to guarantee that agents operate without making mistakes. Likewise, current definitions of robustness measure the ability of machine learning systems to maintain classification accuracy in changing or degraded environments. We consider robustness from a different angle: models are evaluated not only by their accuracy, but also by the severity of the errors they make. Using such metrics as additional objectives can drive the development of training techniques that prioritize less severe errors.

Consider an autonomous home care robot that relies on a computer vision system for object detection. As such robots are deployed at scale and remain in service over long periods, their object detection systems will be challenged by environmental changes and temporal distribution shift, causing detection accuracy to drop. In these conditions a robot could end up making costly mistakes: for example, it could mistake a phone for a plate and break it in the dishwasher, or serve a houseplant instead of salad at dinner. A robot product that errs this badly is unlikely to earn users' trust, and probably will not be kept. Minor mistakes, however, may be overlooked; users are far more likely to forgive a robot for putting a placemat in the dishwasher than a phone.

When measuring error severity, certain errors could be manually assigned higher penalties. However, this approach requires direct human input and becomes unwieldy as the space of possible errors grows. As an alternative to human-assigned penalties, we measure error severity as the semantic similarity between the ground-truth and predicted classes. The motivation for this measure is intuitive; consider, for example, an autonomous car misidentifying a pedestrian as a fallen tree branch versus misidentifying a pedestrian as a cyclist. Cyclists are semantically far closer to pedestrians than tree branches are, so the car is correspondingly more likely to take an appropriate action. This metric can also be viewed as a measure of "semantic robustness": a model that makes semantically aligned errors (pedestrian → cyclist) under perturbation is more semantically robust than a model that makes arbitrary errors (pedestrian → tree branch).

We propose a method for incorporating semantic knowledge into the training process using adversarial training, with the goal of improving the semantic alignment between errors and their respective ground truth labels. While other methods exist for embedding category-hierarchical information in neural networks, our method has the added benefit of providing a perspective on the relationship between robust and non-robust features and semantic alignment.

We also evaluate the error severity of our method under adversarial and naturally corrupted conditions—for example, blurring or changes in brightness and saturation. These types of degradations reduce the model's discriminative power and cause it to make more mistakes. In our study, we use these conditions as a proxy for distribution shifts and other sources of error, and aim to reduce error severity under these conditions.

The contributions of this work are as follows:

  • We propose a method based on targeted adversarial training to increase alignment of semantically similar categories.
  • We show that our method produces models with lower error severity than both standard and conventionally adversarially trained models under multiple degradation conditions.
  • We discuss the surprising role of non-robust features in supporting semantic alignment.

2. Related work

Since the concept of adversarial examples was introduced, there have been many works trying to understand adversarial attacks. Ilyas et al. (2019) attribute the presence of adversarial examples to "non-robust" features in the dataset, which provide useful classification signals to the model but are meaningless (and often imperceptible) to humans. Adversarially robust networks are limited to using "robust" features, i.e., features that remain useful for classification even when adversarial perturbations are applied. In other words, Ilyas et al. hypothesize that there are signals in the dataset that are useful for standard classification tasks but can easily be exploited in an adversarial setting.

Current adversarial training techniques focus heavily on pixel-level perturbations, in the style of the "gold standard" robust optimization techniques introduced by Madry et al. (2018). Some works extend this to changes in noise, lighting, or other biases. There are also examples in the literature of generating semantic adversarial examples that, unlike classical imperceptible pixel perturbations, modify semantically meaningful attributes of the input, producing images that remain visually faithful to the ground-truth label (e.g., a ship vs. a ship with wheels).

There is also prior work addressing the problem of error severity in neural networks. Bertinetto et al. (2020) show that although the top accuracy of state-of-the-art classifiers has improved steadily over the past five years, mistake severity has stagnated. They also provide a survey of work on "making better mistakes," identifying three main approaches.

  • The first approach is to embed semantic knowledge in labels, which attempts to modify category representations into more semantically aligned embeddings, e.g. by extracting from textual sources such as Wikipedia.
  • The second approach is to use a hierarchical loss, i.e. changing the loss function to penalize predictions that are far from the true label on the classification tree.
  • The last approach is to use hierarchical architectures, incorporating the semantic class hierarchy into the classifier itself rather than into the loss function. Bertinetto et al. themselves propose two variants of the standard cross-entropy loss to incorporate this prior knowledge into the model.

In the context of adversarial training, Ma et al. (2021) introduce the concept of hierarchical adversarial robustness to address error severity. Hierarchical adversarial robustness relies on the notion of hierarchical adversarial examples: adversarial examples that cause misclassifications at the "coarse" level (i.e., misclassifications that fall outside the superclass of the true label). They build a layered system consisting of one network that recognizes the coarse class of an image and a coarse-class-specific network that then recognizes its "fine" class. However, their work focuses on defending against hierarchical adversarial examples, whereas our goal is to increase the semantic alignment of model predictions so as to reduce error severity under both adversarial and natural conditions.

3. Method

We first define the standard classification task as follows. Let D be the data distribution, from which we draw input pairs (x, y), where x is a sample point with true label y. Given a machine learning model f_θ parameterized by θ, the loss function can be written as L(f_θ(x), y). The standard training procedure then aims to solve:

min_θ E_{(x,y)~D} [ L(f_θ(x), y) ]

Furthermore, we refer to an adversarially robust model as a model trained with untargeted adversarial training (i.e., finding a perturbation that causes any misclassification, regardless of which wrong label results), with the adversarial perturbation denoted δ and constrained by ε. The adversarial training objective is defined as:

min_θ E_{(x,y)~D} [ max_{‖δ‖ ≤ ε} L(f_θ(x + δ), y) ]

Semantic knowledge is embedded into the training process through semantically targeted adversarial attacks. Unlike the untargeted approach, this approach generates perturbations that trick the model into predicting a specified (target) class. We ultimately use a staged training (ST) approach, applying semantically targeted adversarial training in the first stage and standard training in the second stage. Our semantically targeted training objective is a targeted version of the adversarial training objective (switching the formulation from a min-max problem to a bilevel optimization problem), where the target t is selected from a set C(y) of classes that are semantically similar to the original label y. Specifically, our objective is:

min_θ E_{(x,y)~D} [ L(f_θ(x + δ*), y) ]

where δ* is the perturbation that causes misclassification to the target label t ∈ C(y):

δ* = argmin_{‖δ‖ ≤ ε} L(f_θ(x + δ), t),   t ∈ C(y)
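To make the bilevel objective concrete, below is a minimal PyTorch sketch of one semantically targeted training step; it is an illustration under stated assumptions, not the authors' implementation. It assumes a classifier `model`, a batch `(x, y)` of image tensors, and a hypothetical dictionary `semantic_targets` mapping each class index to the list of semantically similar class indices C(y); the inner PGD settings (10 steps, step size 2.5·ε/10, L2 ball) mirror those described in Section 4.

```python
# Minimal sketch (not the authors' code) of one semantically targeted adversarial
# training step: an inner targeted L2 PGD attack finds delta* that pushes the model
# toward a semantically similar target class t, and the outer step minimizes the
# loss on (x + delta*) with respect to the true label y.
import random
import torch
import torch.nn.functional as F


def targeted_l2_pgd(model, x, t, eps=2.5, steps=10):
    """Inner problem: delta* = argmin_{||delta||_2 <= eps} L(f(x + delta), t)."""
    step_size = 2.5 * eps / steps  # PGD learning rate described in the experiments
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), t)  # descend toward target t
        grad, = torch.autograd.grad(loss, delta)
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta - step_size * grad / grad_norm       # normalized gradient step
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / delta_norm).clamp(max=1.0)  # project into the L2 ball
        delta = delta.detach().requires_grad_(True)
    return delta.detach()


def semantically_targeted_step(model, optimizer, x, y, semantic_targets):
    """Outer problem: min_theta E[ L(f_theta(x + delta*), y) ]."""
    # Sample a target t uniformly at random from C(y) for each example in the batch.
    t = torch.tensor([random.choice(semantic_targets[int(label)]) for label in y],
                     device=y.device)
    delta_star = targeted_l2_pgd(model, x, t)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta_star), y)  # loss w.r.t. the true label
    loss.backward()
    optimizer.step()
    return loss.item()
```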

4. Experiment

We evaluate our method using two notions of semantic similarity: path similarity according to WordNet (i.e., the inverse of the shortest path length between two words in the WordNet hierarchy), and the coarse (super) class grouping of labels in CIFAR-100, which divides its 100 fine classes into 20 coarse classes.

We let C(y) be the set of five labels with the highest path similarity to y. The target label for each adversarial attack is sampled uniformly at random from C(y). All models use the ResNet50 architecture and are trained on CIFAR-100. We use a learning rate of 0.1, a batch size of 100, and standard values for the remaining training parameters. In addition, data augmentation is applied in the form of random cropping and random horizontal flipping.
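As an illustration of how the candidate set C(y) might be constructed from WordNet path similarity, here is a small sketch using NLTK; the `label_synsets` mapping from class names to WordNet synsets is a hypothetical stand-in (only a few CIFAR-100 classes are shown), not the authors' exact synset assignment.

```python
# Minimal sketch (illustrative) of building C(y): for each class, keep the k other
# classes whose WordNet synsets have the highest path similarity (the inverse of
# the shortest path length) to that class's synset.
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download('wordnet')

# Hypothetical mapping from CIFAR-100 class names to WordNet synsets;
# the full setup would cover all 100 fine classes.
label_synsets = {
    "bicycle": wn.synset("bicycle.n.01"),
    "motorcycle": wn.synset("motorcycle.n.01"),
    "bus": wn.synset("bus.n.01"),
    "apple": wn.synset("apple.n.01"),
    "orange": wn.synset("orange.n.01"),
}


def candidate_set(label, k=5):
    """Return the k labels with the highest path similarity to `label`."""
    anchor = label_synsets[label]
    scores = {
        other: anchor.path_similarity(syn) or 0.0
        for other, syn in label_synsets.items()
        if other != label
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(candidate_set("bicycle", k=2))  # e.g. ['motorcycle', 'bus']
```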

To generate targeted perturbations, we use the rAI-toolbox developed by Soklaski et al. (2022) to run a 10-step projected gradient descent (PGD) attack constrained to an L2 ball of radius ε. The learning rate of the PGD solver is 2.5·ε/10. We experimented with different values of epsilon across models. We also experimented with label modification, splitting the labels of perturbed images between the original and target classes to accommodate larger perturbations.

We measure error severity as the average path similarity between the model's incorrect predictions and the corresponding ground-truth labels. In addition, we measure the coarse-classification accuracy of model errors (i.e., the proportion of misclassifications that still fall within the correct coarse class). We found these two metrics to be highly consistent, showing nearly identical trends.
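For concreteness, a minimal sketch of the two error-severity metrics, assuming lists of predicted and true fine-class names, a `label_synsets` mapping like the one sketched above, and a `coarse_of` mapping from fine class to coarse class (all illustrative names):

```python
# Minimal sketch (illustrative) of the two error-severity metrics: the mean WordNet
# path similarity between wrong predictions and their true labels, and the fraction
# of misclassifications that still fall in the correct coarse class.
def error_severity_metrics(preds, labels, label_synsets, coarse_of):
    errors = [(p, y) for p, y in zip(preds, labels) if p != y]
    if not errors:
        return None, None  # no mistakes to score
    mean_path_similarity = sum(
        label_synsets[p].path_similarity(label_synsets[y]) or 0.0
        for p, y in errors
    ) / len(errors)
    coarse_accuracy_of_errors = sum(
        coarse_of[p] == coarse_of[y] for p, y in errors
    ) / len(errors)
    return mean_path_similarity, coarse_accuracy_of_errors
```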

The models we compare are as follows:

  • Standard model: A model trained with standard training for 200 epochs.
  • Adversarially robust model: A model trained with untargeted adversarial training for 200 epochs, using a perturbation budget of ε = 1.
  • Low-epsilon semantic targeting model (LE-SmT): A model trained with semantically targeted adversarial training using ε = 1 as the L2 perturbation constraint; trained for 200 epochs. The small epsilon in this initial model follows the standard adversarial framework, in which perturbations are imperceptible to the human eye.
  • High-epsilon semantic targeting model (HE-SmT): A model trained with semantically targeted adversarial training using an L2 perturbation constraint of ε = 2.5; trained for 200 epochs. We experimented with a higher epsilon in this model because the target class may not be close to the original class in the embedding space, so targeted adversarial attacks with low epsilon values may fail.
  • HE-SmT with modified labels (HE-SmT-LM): A model trained with semantically targeted adversarial training using an L2 perturbation constraint of ε = 2.5; trained for 300 epochs. To better account for large perturbations, we set the labels of perturbed instances to a mixture of the target and original classes: we modify the one-hot encoded labels so that both the index of the ground-truth class and the index of the target class are set to 0.5 (a minimal sketch of this label mixing follows this list).
  • Stage-trained model (ST): A model trained with the staged-training method, combining our semantic targeting approach with standard training. The motivation behind this approach is to apply semantic targeting to increase the alignment between similar classes, while also exploiting the non-robust signals that appear to contribute to the performance of the standard baseline. The model is first trained with semantic targeting for 200 epochs (using the settings prescribed for HE-SmT-LM), followed by 100 epochs of standard training.
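Below is a minimal sketch of the label modification used for HE-SmT-LM, assuming PyTorch targets given as class indices and a loss computed against probability targets (supported by `F.cross_entropy` in PyTorch 1.10+); names are illustrative, not the authors' code.

```python
# Minimal sketch (illustrative) of the modified labels for perturbed instances:
# the one-hot label is replaced by a 50/50 mix of the ground-truth class and the
# adversarial target class, and the loss is computed against this soft label.
import torch
import torch.nn.functional as F


def mixed_labels(y_true, y_target, num_classes=100):
    """Soft labels with 0.5 at the ground-truth index and 0.5 at the target index
    (assumes y_target differs from y_true)."""
    soft = torch.zeros(y_true.size(0), num_classes, device=y_true.device)
    soft.scatter_(1, y_true.unsqueeze(1), 0.5)
    soft.scatter_(1, y_target.unsqueeze(1), 0.5)
    return soft


# Usage with logits computed on the perturbed inputs x + delta*:
# loss = F.cross_entropy(logits, mixed_labels(y, t))
```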

We performed all experiments on a cluster of NVIDIA Tesla V100 GPUs. One epoch of standard training takes about 1 minute; one epoch of semantically targeted or untargeted adversarial training takes about 10 minutes.

5. Results

In this section, we show error severity results for adversarial perturbations and natural corruptions. We provide evidence that ST models outperform other models on this metric.

5.1 Adversarial Perturbations

We measure error severity on increasingly perturbed data using untargeted perturbations, and specifically consider the range ε = 1.5 to ε = 2.0 as a proxy for imperfectly conditioned data, since our original aim is robustness under challenging conditions. Figure 1 shows examples of perturbations in this range. Our baselines are the standard and adversarially robust models. The standard model achieves the best semantic alignment of errors on clean data (zero perturbation) among the compared models. The adversarially robust model has much worse error severity than the standard model, even on highly perturbed data (which affects it little, precisely because it is robust).

Both the LE-SmT and HE-SmT models fail to improve error severity. We found that low-epsilon semantically targeted attacks have lower success rates early in training than untargeted attacks; our remaining models (HE-SmT-LM and ST) therefore use higher epsilon values to ensure a higher attack success rate for classes that are initially poorly aligned. HE-SmT-LM yields slightly improved results, but still cannot compete with the standard model. The ST model recovers some semantic alignment on clean data and also outperforms all other models on highly perturbed data. Results for all models are shown in Figure 2.

5.2 Natural Corruptions

Furthermore, we compare the error severity of the standard model, the adversarially robust model, and the ST model on the CIFAR-100-C dataset, which applies common corruptions, such as contrast changes or blurring, to CIFAR-100 images. The dataset allows us to test model performance under natural corruption because it provides corruptions of varying severity, measured on a scale of 1-5 with 1 the least severe and 5 the most severe; we refer to severity 1 as low severity and severity 5 as high severity. We evaluate model performance on test data from 19 corruption sources.

On low-severity corruptions, the standard model outperforms both the adversarially robust model and the ST model, achieving the highest semantic alignment of errors on 11/19 corruption types; ST performs best on 8. For high-severity corruptions, however, ST has the highest error semantic alignment on 9/19 corruption types, while the standard model performs best on only 4/19 and the adversarially robust model on 6/19. Under high-severity corruption, ST outperforms the standard model in terms of error severity on 9/19 corruption types. An example for a specific corruption is shown in Figure 3; results for all severity levels are shown in Figure 4.

These results provide further evidence that, relative to standard and adversarially trained models, ST better preserves the semantic alignment of errors under highly degraded conditions.

6. Discussion and conclusion 

First, we note that many definitions of robustness use accuracy under a particular type of corruption as the success metric; these definitions fail to account for the severity of model errors. Our experiments reveal a gap between currently popular robustness techniques and their effectiveness when evaluated in terms of error severity. We demonstrate a method for reducing error severity under both adversarial and natural degradation conditions, showing that the semantic consistency of errors can be improved by incorporating this objective into the training process. We hope these results will drive further improvements to training procedures that reduce the severity of errors.

Second, we observe the role of non-robust signals in the semantic alignment of classes. Since non-robust signals are visually imperceptible and do not affect human perception of image semantics, one might intuitively assume that non-robust signals are less semantically aligned than robust signals, and thus that robust models should be more semantically aligned. However, we found the opposite in our experiments. Models trained to rely on robust signals show worse semantic alignment of their errors than models that are not adversarially trained, and this holds even for semantically targeted adversarial training. Our semantic training approach is effective in our experiments only when the model is allowed, through staged training, to exploit a certain amount of non-robust signal. We hypothesize that our two-stage training method creates semantic alignment among non-robust features; the ability to fall back on non-robust features when robust features degrade then helps increase the semantic alignment of errors. Alternatively, non-robust features may simply be more closely tied to semantic alignment than initially expected.

Finally, we emphasize that semantic alignment does not always coincide with visual alignment. Since neural networks rely on visual features of the data, it is interesting to consider the extent to which external sources of semantic knowledge can compensate for visual differences between categories. Bertinetto et al. (2020) discuss a similar question of how arbitrary semantic knowledge can be (thus breaking any correlation between semantic and visual similarity); they found that using arbitrary semantic hierarchies in their method resulted in a large drop in performance in terms of error severity. It would be interesting to explore whether these findings apply to our method, and whether the type of perturbation used affects the results.

7. Future work

Results from the staged training approach show that our semantic targeting approach facilitates some alignment between classes, but that restricting the model's use of non-robust signals hinders its success. This opens an interesting avenue for exploring the contribution of non-robust features to the alignment of similar classes, especially since non-robust signals are usually taken to be meaningless to human perception.

Future work could compare our method with alternatives that embed semantic knowledge, such as modifying the loss function to penalize more severe errors. Another direction could be to explore alternative perturbation methods that apply more semantically meaningful perturbations, such as attribute-guided perturbation or spatial transformation-based perturbation.

We also note that in this work we quantify error severity by measuring the semantic, rather than visual, alignment of classes; however, in some cases we may prefer errors aligned with visually similar classes rather than semantically similar ones. For example, an autonomous home-assistance robot sent to fetch cough syrup might be trusted less if it mistakenly retrieved a prescription medication than if it retrieved a bottle of apple juice, even though the prescription medication is semantically more similar to the true target. Future work could therefore consider how to balance the importance of semantic and visual similarity. In particular, it would also be useful to explore the potential use of non-robust features to add semantic information that is inconsistent with the visual similarity of classes.

8. Broader impact

We believe this work has the potential to further advance robust machine learning and to build trust in any embodied system that uses these algorithms. Furthermore, it encourages a different perspective on the usual definition of robustness and explores a metric that has stagnated over the past few years of progress. More importantly, we anticipate that finding new ways to leverage human knowledge and give models more semantic coherence will matter as the field seeks to move beyond narrow AI. Not only can semantics give models a better understanding of their environment, but they can also be a way of mitigating known biases in data; for example, can we encode "fairer" associations that better match today's societal expectations? However, as with almost all research in this area, practitioners need to exercise caution, especially to avoid embedding biased semantic relations in their models. Fortunately, machine learning scientists and engineers are increasingly aware of these issues, and we believe that contributing work that helps build more human-like, trustworthy, and robust models will lead to safer operation in general.

References

Abreu N., Vaska N., Helus V. Addressing Mistake Severity in Neural Networks with Semantic Knowledge. arXiv preprint arXiv:2211.11880, 2022.

S. Summary

S.1 Main idea

Different errors made by a (classification) model differ in severity. The authors quantify error severity via the semantic similarity between model predictions and ground-truth labels, and use this to generate targeted adversarial attacks during training that improve the model's semantic robustness.

S.2 Explanation of terms

Error severity: For an autonomous driving system, compare misidentifying a pedestrian as a tree branch with misidentifying a pedestrian as a cyclist; the former has much lower semantic similarity to the true label and therefore higher error severity.

Semantic alignment of errors: As described above, even when the model's prediction is wrong, the predicted label should be as semantically similar to the true label as possible (this is what semantic alignment means), thereby reducing error severity.

Model robustness: In the presence of perturbations, the model's prediction accuracy remains unchanged or only slightly decreases. Ideally, even when the model does err, the error severity should be low.

Robust features and non-robust features: Image features can be divided into robust and non-robust features, as illustrated in the paper "Adversarial Examples Are Not Bugs, They Are Features" by Ilyas et al. (2019).

  • Non-robust features can inform a classifier's predictions but are imperceptible to humans. For example, in the adversarial setting, adding a small perturbation to an image leaves it unchanged to the human eye, yet the model may misclassify it as another category.
  • Robust features remain useful for classification even when such perturbations are applied.

S.3 Method

Use staged training :

The first stage uses semantically targeted adversarial training to embed semantic knowledge into the training process. Unlike untargeted methods (which find perturbations that cause any misclassification, regardless of which wrong label results), this approach generates perturbations that trick the model into predicting a specified (target) class.

The target t is selected from a set C(y) of classes that are semantically similar to the original label y of image x; C(y) is the set of five labels with the highest semantic similarity to y. The inner objective finds the perturbation δ* within the ε-ball that makes the model predict t:

δ* = argmin_{‖δ‖ ≤ ε} L(f_θ(x + δ), t),   t ∈ C(y)

The second stage is standard training, with the usual objective:

min_θ E_{(x,y)~D} [ L(f_θ(x), y) ]

The intent is that even when a perturbation could cause the model to err, it can still make a correct (or at least semantically close) prediction; that is, the staged training improves the model's robustness.

S.4 Analysis

After staged training, the robustness of the model is improved, making it harder for perturbations to cause misjudgments. Even when the model does err, it tends to predict a label that is semantically close to the true label, reducing the severity of the error.


Origin blog.csdn.net/qq_44681809/article/details/131111759