Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches (arXiv 2021)

Disclaimer: This translation is only a personal study record

Article information

  • Title: Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches (arXiv 2021)
  • Authors: Jasmin Breitenstein, Jan-Aike Termöhlen, Daniel Lipinski and Tim Fingscheidt
  • Article link: https://arxiv.53yu.com/pdf/2102.05897.pdf

Summary

  Autonomous driving has become a major topic, not only in the active research community but also in mainstream media coverage. Thanks to advances in deep learning techniques, the visual perception of such intelligent vehicles has made great progress over the past decade, yet several challenges remain. One of these challenges is the detection of corner cases: the unexpected and unknown situations that occur while driving. Conventional visual perception methods often fail to detect them, because no corner cases are observed during training. Hence, their detection is highly safety-critical, and detection methods can be applied to large amounts of collected data to select suitable training data. Reliable detection of corner cases will not only further automate the data selection process and improve the safety of autonomous driving, but may also positively influence public acceptance of the new technology. In this work, we continue a previous systematization of corner cases on different levels with an extended set of examples for each level. Furthermore, we group detection approaches into different categories and relate them to the corner case levels. We thereby give directions for showcasing specific corner cases, as well as basic guidance on how to technically detect them.

1. Introduction

  Autonomous driving and its technology have made significant progress over the past few years. Despite this progress, and despite the considerable attention autonomous driving has received, several challenges remain for its safe and reliable application in daily life. Visual perception methods are an important part of intelligent vehicles: they are required to monitor and understand the vehicle's environment. Consequently, a large number of algorithms already exist for the visual perception tasks associated with the vehicular environment, including object detection (e.g., [1]), semantic segmentation (e.g., [2], [3]), instance segmentation (e.g., [4]), etc. A key factor, however, is the behavior of visual perception methods in unexpected situations that differ from normal traffic. These situations, the so-called corner cases, exist in countless variations. Their common, unifying feature is a deviation from what is generally considered normal traffic behavior. Possible corner cases are the typical situations everyone is afraid of when driving, e.g., a person running onto the street from behind an occlusion, a ghost driver, or simply lost cargo on the street.


Figure 1: Systematization of corner cases at different levels given in [5]. The theoretical complexity of detection generally increases from bottom to top.

  Reliable detection of such corner cases is critical to the safety of autonomous driving, as it could reduce the number of accidents involving self-driving cars and thereby contribute to the widespread acceptance and application of the technology. Both an online, in-vehicle application and an offline application during development are of relevance for a robust and reliable corner case detection method that identifies critical situations. In the online application, it can be used as a safety monitoring and warning system, identifying critical situations as they occur. In the offline application, corner case detectors are applied to large amounts of collected data to select suitable training and relevant test data when developing new visual perception algorithms in the laboratory. While both applications are safety-relevant, the offline application additionally saves money and time by automating the training data selection process. Accordingly, various works already deal with the detection of corner cases in the automotive environment, such as the detection of obstacles [6], [7] or unexpected objects [8].

  Although reliable and efficient corner case detection would have a huge impact on autonomous driving, there is still a lack of a consistent, accepted definition and classification to describe corner cases. We follow the definition of Bolte et al. [9] that a corner case is given if "there is a non-predictable relevant object/class in a relevant location". To facilitate the systematic development of detectors, a categorization was introduced in [5]; a condensed version of this systematization of corner cases is shown in Figure 1. It describes a hierarchy ordered according to the theoretical complexity of detection: we consider corner cases on the pixel, domain, object, scene, and scenario level, which are described in detail in Section II.

  While this systematization paved the way for a more systematic development of detection methods, it also raises the question of how to actually detect specific corner cases on each level. In the context of smart manufacturing, Lopez et al. [10] established a systematization of possible system failures; the authors then categorize detection methods into feature extraction, regression, knowledge-based, signal modeling, state estimation, clustering, and classification methods, and relate each method category to a specific anomaly category in smart manufacturing. We follow their example and extend the previous systematization of corner cases for visual perception in autonomous driving by classifying detection methods and associating them with the previously defined corner case levels. In addition, we provide concrete corner case examples for each level and give a first guide to basic detection methods.

  Due to the ubiquity and success of deep learning methods in visual perception algorithms, we restrict this classification to deep learning methods. They are powerful methods with promising results in many visual perception applications and have also been successful in detecting anomalous events. Furthermore, we restrict this work to a purely camera-based approach, excluding other sensor data such as radar and lidar, but we consider corner cases that can be detected from a single image frame as well as from entire image sequences.

  The structure of this paper is as follows. We briefly review the systematization of corner cases shown in Figure 1. We then provide more comprehensive examples for each corner case level, with the goal of giving a more complete picture of what each level can contain; these examples can serve almost as stage directions for recording corner cases. Furthermore, we extend the previous systematization by categories of detection methods with the respective related work. Finally, we map the detection methods to the corner case levels, providing hints and intuitions for the development of new methods.

2. Systematization of Corner Cases

  The systematization of corner cases for visual perception in autonomous driving was introduced in [5] and is briefly summarized below; a simplified version is shown in Figure 1. The corner case levels are ordered according to their detection complexity, from low to high. At the lowest complexity, we have pixel-level corner cases, which can be divided into global and local outliers; examples are overexposure and dead pixels, respectively. Then there are domain-level corner cases caused by domain shifts (e.g., changes in location, weather, or time of day). At the object level, corner cases are single-point anomalies or novelties: this could be, for example, wildlife on the street, such as a lion, or mobility aids, such as rollators or walking sticks. For scene-level corner cases, we again distinguish two types: collective anomalies and contextual anomalies. Contextual anomalies are known objects in unusual locations, such as a tree in the middle of the street. Collective anomalies are known objects appearing in anomalous quantities, such as a demonstration.

  The highest detection complexity is found at the scenario level, where corner cases are observed over the course of image sequences. Risky scenarios have been observed in a similar manner before, but a collision is still possible, e.g., during an overtaking maneuver. Novel scenarios have not been observed before but do not increase the likelihood of a collision, e.g., entering a freeway. Anomalous scenarios have also not been observed before and create a very high probability of a collision, e.g., a person suddenly stepping onto the street in front of the ego-vehicle.

  While the work introducing the systematization [5] discusses the different corner case levels in detail and indicates suitable datasets and metrics, we follow the approach of Lopez et al. [10] in Sections IV and V and extend the systematization by another dimension: different detection methods are grouped into broad categories and associated with the individual corner case levels. Furthermore, we extend the columns of Figure 1 by a comprehensive list of examples, which essentially provides scripts for recording corner cases.

3. Corner Case Examples

  In Table I, we provide examples for the corner case levels from Figure 1. This clarifies which corner cases can be found on the various levels of the systematization and serves almost as stage directions for possible corner case recordings. Furthermore, it indicates the dataset content required to develop and test reliable corner case detectors. Table I orders the example situations according to the respective corner case levels; the situations are described in such detail that they translate directly into directions for data acquisition. The following sections present the categories of detection methods, which are later associated with the individual corner case levels, giving some guidelines for detecting the example corner cases shown in Table I.


Table I: Example situations for each level of systematization of corner cases as shown in Figure 1.

4. Concepts of Detection Methods

  We distinguish five broad concepts of detection methods: reconstruction, prediction, generative, confidence score, and feature extraction methods. We subdivide the confidence score category into learned scores, Bayesian methods, and scores obtained through post-processing.

  Reconstruction methods are often based on autoencoder-type networks. Most of these methods follow the paradigm that normality can be reconstructed more faithfully than abnormality, which makes reconstruction-based approaches applicable at every level of the corner case hierarchy; in particular, they can be applied both to corner cases involving a single image and to corner cases involving entire image sequences. Hasan et al. [11] train autoencoders end-to-end and on handcrafted features, using the reconstruction error as an anomaly score. Similar approaches exist that, while considering video sequences, take a single image as network input [12]. Some reconstruction methods rely on prototype learning: during training, prototypes of normal samples are learned in the latent space, which leads to more reliable reconstructions of normal samples during inference compared to abnormal samples [13]. Oza et al. [14] also exploit reconstructions in class-conditional autoencoders for open-set recognition, combining supervised closed-set training with open-set training.
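  To make the reconstruction paradigm concrete, the following minimal sketch derives a per-image anomaly score from the reconstruction error of a small convolutional autoencoder trained on normal frames only. The architecture, input normalization, and threshold calibration are illustrative assumptions, not the setup of [11]-[14].

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Small convolutional autoencoder; assumes H and W divisible by 4."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model: ConvAutoencoder, image: torch.Tensor) -> float:
    """Mean squared reconstruction error; higher means more anomalous.
    image: (3, H, W) tensor with values in [0, 1]."""
    model.eval()
    with torch.no_grad():
        recon = model(image.unsqueeze(0))
        return torch.mean((recon - image.unsqueeze(0)) ** 2).item()

# Scores above a threshold calibrated on normal validation data would
# flag the image as a potential corner case.
```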

  Prediction-based methods are mainly found at the scenario level. Typically, they predict future frames and compare them to the actual frames to detect anomalies. They can be trained in a supervised manner under the assumption that all training samples are normal. Bolte et al. [9] specifically applied this approach to corner case detection in autonomous driving. Another approach uses a generative adversarial network structure to predict future frames of a video while enforcing appearance and motion constraints [15].
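  A minimal sketch of the corresponding scoring step, assuming some frame predictor is given: the PSNR between predicted and actual frame is a common comparison measure in this line of work, and a low PSNR indicates an anomaly. The `predict_next` interface is a placeholder assumption, not the model of [9] or [15].

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = torch.mean((pred - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

def frame_anomaly_score(predict_next, frames: torch.Tensor) -> float:
    """frames: (T, 3, H, W); predicts the last frame from frames 0..T-2.
    Returns the negated PSNR, so that higher values mean more anomalous."""
    with torch.no_grad():
        pred = predict_next(frames[:-1])  # placeholder predictor interface
    return -psnr(pred, frames[-1])
```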

  Generative methods are closely related to reconstruction-based methods, since they can also base their decisions on a reconstruction error. However, generative methods additionally take into account the discriminator's decision or the distance between the generated output and the training distribution; some methods also simply borrow related techniques such as adversarial training or input perturbations. Lee et al. [16] introduce a confidence loss to enforce low confidence on out-of-distribution samples, while also generating out-of-distribution training samples for this task, and jointly train the generative and classification objectives. Adversarial training is also used to enforce uniform confidence predictions on noisy images, leading to lower confidence for outlier samples [17]. Based on generating images from semantic masks, Lis et al. [18] identify unknown objects in the data by considering the error between the generated and the original image. Generative methods, namely variational and adversarial autoencoders, have been applied to collective anomaly detection, taking the error between original and generated images as an anomaly score [19]. Furthermore, domain shift can be measured by generative methods: for example, Wasserstein-distance-guided representation learning [20] uses a generative-adversarial-network-inspired architecture, in which a domain critic network estimates the Wasserstein distance between source and target domain features. Löhdefink et al. [21] applied a generative adversarial network (GAN)-based autoencoder to detect domain shift based on the earth mover's distance [22].
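  As a tangible, heavily simplified sketch of such a domain mismatch measurement: below, the distributions of a scalar per-image statistic are compared between source-domain data and recently observed data via the one-dimensional earth mover's (Wasserstein) distance [22]. Using a scalar statistic is a simplifying assumption; [20], [21] operate on learned feature or latent representations instead.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def domain_mismatch(source_stats: np.ndarray, target_stats: np.ndarray) -> float:
    """1-D earth mover's distance between two samples of image statistics."""
    return wasserstein_distance(source_stats, target_stats)

# Illustrative usage: a rising mismatch score over time hints at a
# domain-level corner case such as a weather or daytime change.
# src = np.array([img.mean() for img in source_images])
# tgt = np.array([img.mean() for img in recent_images])
# print(domain_mismatch(src, tgt))
```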

  Confidence-score-based methods are divided into three categories for clarity: those that obtain scores through post-processing, those that learn scores, and those that rely on Bayesian methods.

  Confidence scores can be obtained by applying post-processing techniques to a neural network without interfering with its training process. A baseline method obtains confidence scores by comparing the maximum softmax probability with a fixed threshold [23]. Furthermore, scores can be obtained, for example, by applying Kullback-Leibler divergence matching of the softmax output during inference to class-specific templates obtained from a validation set in a multi-class prediction setting [24]. This method is trained in a supervised manner, does not require outlier examples, and outputs a segmentation map. Another post-processing approach employs temperature scaling and small perturbations of the input [25]. It is based on the paradigm that, for such a modified normal input, the network is still able to infer the correct class, but not for unknown classes. It is likewise supervised only by normal training samples.
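  The baseline of [23] reduces to a few lines; a minimal sketch is shown below. The threshold value is an assumption to be calibrated on a validation set.

```python
import torch
import torch.nn.functional as F

def max_softmax_ood(logits: torch.Tensor, threshold: float = 0.5):
    """logits: (N, C) classifier outputs. Returns per-sample confidence
    and a boolean flag that is True where the sample is treated as
    out-of-distribution (maximum softmax probability below threshold)."""
    probs = F.softmax(logits, dim=1)
    confidence, _ = probs.max(dim=1)
    return confidence, confidence < threshold
```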

  Unlike confidence scores obtained through post-processing, confidence scores can also be learned during training. In this category of learned confidence scores, we also include any method that generally relies on the training set, e.g., to provide a threshold. As an example of thresholding based on the training set, Shu et al. [26] compute three thresholds: one for accepting samples, one for rejecting them, and a distance-based threshold for samples falling between the two others. Since the thresholds are based on the training set, we treat the resulting confidence scores as learned. Although the method is trained in a supervised manner, no corner case examples are required during training; the method then outputs either one of the known class labels or an unknown-class label. Another approach to learning confidence scores borrows techniques from multi-task learning and incorporates a second branch into the network to learn a confidence score [27]. Here too, training is done in a supervised manner, but only normal samples are required, and although originally intended for classification, the method is also applicable to segmentation [24]. A learned confidence score can also be obtained by learning from prototypes, where the score is based on the distance to normal training samples [28]. The OpenMax activation likewise provides a learned confidence score after supervised training on normal samples, with the aim of detecting unknowns [29]. Learning to detect geometric transformations applied to the data, under the assumption that these transformations can be detected more accurately on normal samples, also yields a learned confidence score [30].
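  A minimal sketch in the spirit of prototype-based scores such as [28], with simplifying assumptions: prototypes are plain per-class feature means of normal training data, a feature extractor is assumed to be given, and the negative distance to the nearest prototype serves as confidence.

```python
import torch

def class_prototypes(features: torch.Tensor, labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Mean feature vector per class; features: (N, D), labels: (N,)."""
    return torch.stack([features[labels == c].mean(dim=0)
                        for c in range(num_classes)])

def prototype_confidence(feat: torch.Tensor, prototypes: torch.Tensor) -> float:
    """Negative Euclidean distance to the nearest prototype;
    lower (more negative) values indicate a more anomalous sample."""
    dists = torch.norm(prototypes - feat, dim=1)
    return -dists.min().item()
```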

  Bayesian confidence scores are usually obtained by estimating the model (epistemic) uncertainty [31]: the network is trained to output a posterior distribution over its weights. Typical examples of such methods are Monte Carlo dropout [31], [32] and deep ensembles [33]; their supervised training relies on normal training data. A more recent method for obtaining model uncertainty scores is deterministic uncertainty quantification, which is based on the idea of radial basis function networks [34]. In the semantic segmentation setting, Bayesian neural networks provide for each pixel an estimate of the class label and of the model uncertainty; common measures of this uncertainty are entropy and variance [32]. Bayesian SegNet is an example of applying Monte Carlo dropout for uncertainty estimation in semantic segmentation, incorporating dropout units into the network architecture to obtain a confidence map of the model [35]. Pham et al. [36] use a Bayesian framework for instance segmentation in the open-set recognition setting. An extension to the temporal dimension is achieved by considering a moving average over multiple frames [37].
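  A minimal sketch of Monte Carlo dropout [32] for classification, assuming a model that contains dropout layers; the number of forward passes is an illustrative choice.

```python
import torch

def mc_dropout_uncertainty(model: torch.nn.Module, x: torch.Tensor, t: int = 20):
    """Returns the mean softmax prediction and the per-class variance over
    t stochastic forward passes; high variance indicates high (epistemic)
    model uncertainty."""
    model.train()  # keeps dropout active; in practice, freeze batch-norm layers
    with torch.no_grad():
        preds = torch.stack([torch.softmax(model(x), dim=1) for _ in range(t)])
    return preds.mean(dim=0), preds.var(dim=0)
```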

  Feature extraction methods employ deep neural networks to extract features from the input data, which are then further processed with another technique or used directly to provide a classification label. In contrast to confidence score methods, feature extraction methods either directly classify samples as corner cases or use the extracted features in another way to obtain a decision, whereas confidence score methods typically provide a score next to their decision label. One such method extracts features that are fitted to a hypersphere during training [38]; although the method is unsupervised, it requires the training data to be normal, because during inference data is judged as anomalous if its distance to the hypersphere center exceeds a threshold. For video sequences, it is also possible to extract features from single frames and then consider the features over a specific time interval, comparing the probability distribution inside the interval with the one outside [39]. Classification-reconstruction learning for open-set recognition (CROSR) also learns feature representations for an unknown-class detector [40]; the representation consists of latent representations learned from each intermediate layer of a reconstruction network, and class membership is modeled, based on extreme value theory, by the distribution of distances between extracted features of normal training data and the corresponding class mean [40]. Standard classification methods train the network in a supervised manner with a softmax activation in the last layer to obtain the class of an input sample; Jatzkowski et al. [41] utilize this approach for overexposure detection. Feature extraction methods can also be found in domain adaptation: for example, cross-entropy-based metrics [42], [43] minimized during adaptation have been shown to be effective measures of domain mismatch, and Bolte et al. [44] use the mean squared error between extracted features as a measure of domain shift.
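  The hypersphere idea of [38] can be sketched as follows, with the network, center initialization, and threshold as illustrative assumptions: features of normal samples are pulled towards a fixed center during training, and the distance to that center is the anomaly score at inference.

```python
import torch

def svdd_loss(features: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
    """Training objective: pull features of normal samples towards the center.
    features: (N, D), center: (D,)."""
    return torch.mean(torch.sum((features - center) ** 2, dim=1))

def svdd_score(feature: torch.Tensor, center: torch.Tensor) -> float:
    """Squared distance to the center; values above a calibrated threshold
    flag the sample as anomalous."""
    return torch.sum((feature - center) ** 2).item()
```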

5. Associating Detection Methods with Corner Case Levels

  In this section, we associate the detection methods of Section IV with the corner case levels of Figure 1, similar to what [10] does for smart manufacturing. The examples of each detection concept discussed in Section IV already hint at which types of corner cases they can be applied to. Furthermore, we aim to develop an intuition for how to detect corner cases such as those listed in Table I. Table II summarizes this section, showing which type of method has been applied to detect which corner case level. Additionally, we indicate which methods we believe could lead to efficient and reliable detection methods in the future.

  Overall, it can be said that, due to the lack of large-scale datasets containing all types of corner cases and the inherent open-world nature of corner case detection, unsupervised methods, or methods trained only on normal samples, currently seem to be the most promising way to obtain corner case detectors. Methods that rely on anomalous training data require more complex and specialized training sets and are likely to focus on the specific corner cases contained in their samples, thereby being blind to unknown corner cases during inference.

  At the pixel level, to the best of our knowledge, only a few deep learning methods exist. For global outliers, however, feature extraction methods provide promising results [41], since the goal is to detect corner cases that affect most of, or even the entire, image. In this case, detection can be treated as a binary classification problem, and the network is able to extract sufficient features for the task. Supervised training is possible because this type of corner case exhibits no unexpected diversity. However, due to the lack of automotive datasets with labeled global outliers such as overexposure, it may be helpful to investigate few-shot learning or similar techniques. Furthermore, the joint detection of multiple global outliers is of interest, e.g., detecting overexposure and underexposure in images; both can even appear in the same image when exiting a tunnel. This can be investigated in future work by considering joint or multi-task learning.
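  A minimal sketch of treating a global outlier, here overexposure, as a binary classification problem, in the spirit of [41]: a small CNN is trained in a supervised manner on normal and overexposed images. The architecture is an illustrative assumption, not the network of [41].

```python
import torch.nn as nn

# Logit > 0 means "overexposed"; training uses nn.BCEWithLogitsLoss on
# labeled (or simulated, e.g. brightness-augmented) normal/overexposed images.
overexposure_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),
)
```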

  Local outliers only affect a small part of the image, as in the case of dead pixels. Detection of these corner cases can be learned in a supervised way, since they can be simulated in the training data. Due to this possibility of simulation, the detection can be handled within a semantic segmentation method by including an additional class; this also results in pixel-wise labels that give the location of the dead pixel. We believe that employing prediction methods, and thereby including the temporal dimension, will help detect local outliers: for example, the predicted content at a dead pixel location can be compared to the actual one, ideally contrasting the actual value with a prediction based on learned optical flow.
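  As a minimal non-learned baseline for this temporal idea (the learned optical-flow comparison proposed above would be the stronger variant): a dead pixel stays constant over time even though the scene moves, so near-zero temporal variance under global motion flags it. The variance threshold and the crude motion check below are assumptions.

```python
import numpy as np

def dead_pixel_mask(clip: np.ndarray, var_thresh: float = 1e-4) -> np.ndarray:
    """clip: (T, H, W) grayscale video with values in [0, 1].
    Returns a boolean (H, W) mask of suspected dead pixels."""
    temporal_var = clip.var(axis=0)  # per-pixel variance over time
    if temporal_var.mean() <= 10 * var_thresh:  # crude check for scene motion
        return np.zeros(clip.shape[1:], dtype=bool)
    return temporal_var < var_thresh
```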


Table II: Detection methods attributed to the corner case levels. A * indicates a method proposed in Section V for detecting corner cases of this level.

  To detect domain-level corner cases, we do not need domain adaptation methods themselves, but suitable domain mismatch metrics. However, these metrics often originate from domain adaptation, where they are used as loss functions; such metrics are generally feature extraction methods. While training may require normal samples from the source domain for supervision, data from a specific target domain should be explicitly excluded from training: methods using examples from a specific second domain may not achieve the same performance in a third domain. Bolte et al. [44] use the mean squared error distance to measure the difference between source and target domain features in an unsupervised domain adaptation setting. It may also prove advantageous to consider out-of-distribution detection methods, which are usually evaluated by distinguishing the dataset they were trained on (in-distribution) from another dataset (out-of-distribution) [23], [16]. These methods can be extended from the classification setting to automotive visual perception, since they only require supervision with normal samples. To reliably detect domain-level corner cases, we need reliable domain mismatch metrics; for this, evaluations involving more than one target domain are required. One such measure, applying a generative adversarial network-based autoencoder, has been introduced previously, providing a new domain mismatch metric based on the earth mover's distance [21].

  At the object level, the main goal is to detect unknown unknowns [45]. These are instances of new classes that have not been seen in training. Providing examples of such corner cases during training would cause the network to detect only corner cases similar to these examples during inference, which defeats our goal. The detection of object-level corner cases belongs to the broad field of open-set recognition, and the related methods usually provide some type of confidence score. Ideally, for detection and localization, we require pixel-wise scores. Reconstruction and generative methods that fit this philosophy also exist; however, pure reconstruction-based methods often provide less meaningful results [18]. We wish to obtain a semantic segmentation mask of the input image in which pixels belonging to unknown objects are associated with an unknown class label or with a high prediction uncertainty. With this goal in mind, pursuing confidence score and generative detection methods seems most fruitful, and many recent methods fit this direction [24], [18], [28]. Using Bayesian confidence scores, we require models that associate high uncertainty with these unknown objects; here, scalable methods of Bayesian deep learning applying Monte Carlo dropout [32] or deep ensembles [33] provide a first step towards detection. Since these single-point anomalies are by definition instances not seen during training, we conjecture that effective and reliable detection methods cannot rely on training samples that include corner cases; one has to resort to unsupervised methods or methods trained only on normal samples.

  At the scene level, our goal is to detect known classes in unknown quantities or locations. Chalapathy et al. [19] used generative methods to detect collective anomalies and achieved promising results. Furthermore, we think that future work should utilize instance segmentation to obtain the group size by counting the instances of each class; in this case, a threshold is needed to define when a group is anomalous (a minimal counting sketch is given below). Contextual anomalies can be detected by feature extraction methods [38]; however, in automotive visual perception, feature extraction may fail to capture the complexity of an entire scene. Therefore, many existing methods output confidence scores [35], [33] or reconstruction errors [13] to distinguish between normal and abnormal samples. We propose to investigate how the incorporation of class priors affects this process, as such priors may be useful in detecting misplaced class instances. Likewise, the confidence scores produced by Bayesian deep learning indicate the model uncertainty and can thus be used to localize objects appearing in unusual environments. Both scene-level corner case types allow supervised training on normal data, since both detect instances of known classes, just in unusual locations or quantities. However, in contrast to object-level corner cases, here we not only need pixel-wise semantic segmentation labels for the visual perception application, but also pixel-wise labels indicating that an object occurs in an unusual location, or, for anomalous quantities, image-wise labels.
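  The proposed counting approach reduces to a simple rule once instance segmentation output is available; the class-specific thresholds below are assumptions to be calibrated on normal data.

```python
from collections import Counter

def collective_anomaly(instance_labels: list[str],
                       max_counts: dict[str, int]) -> bool:
    """instance_labels: class name of each detected instance in one frame.
    Returns True if any class exceeds its calibrated maximum count."""
    counts = Counter(instance_labels)
    return any(counts[c] > max_counts.get(c, float("inf")) for c in counts)

# Illustrative usage:
# collective_anomaly(["person"] * 80 + ["car"] * 3, {"person": 50})  # -> True
```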

  Scenario-level corner cases consist of patterns that occur within a specific time span and may not be anomalous in a single frame. Here, prediction-based methods, whose decisions depend on the comparison between predicted and actual frames, provide beneficial results [9]; pure reconstruction methods again achieve less reliable corner case detection scores. Prediction methods can be trained in a supervised manner, since they only require normal training samples to detect corner cases during inference. This is especially important for novel and anomalous scenarios, where we cannot capture every possibility due to the infinite number and the considerable danger of the corresponding situations; furthermore, including such samples may actually condition the network to detect only these scenarios. For future work, we need to define adequate metrics for detecting corner cases of this type: while we may still want to know where in the image the corner case is, we also need the point in time at which it occurs, for which image-wise labels within a certain time span can be considered. Beyond the metrics, we propose to use a cost function that preferentially detects vulnerable road users appearing at the edge of the field of view. For example, this can improve the detection of a person running onto the street from behind an occlusion, since the person should already be detectable when only a few pixels of them are present in the frame. Such an approach also requires frame-wise masks identifying pixels that were not visible in the previous frame because they were occluded or out of view.

  While the detection methods for all corner case classes are treated separately, we also need to discuss the notion of a general corner case metric. Assume an ideal corner case detector is already available and we wish to apply it to select training data for a visual perception module: it takes as input entire video sequences containing all types of corner cases, and the question is how to report the results. For example, while we recommend pixel-wise labels at the object level and image-wise labels at the pixel level, ultimately these need to be combined into a general metric indicating whether a video sequence contains corner cases and thus qualifies as the required training data. Here, an averaging metric, similar in spirit to well-known averaged metrics such as mean average precision, can be considered.

6. Conclusion

  After reviewing the systematization of corner cases, we introduced a more detailed list of examples aimed at deepening the understanding of the previously proposed categories and enabling their direct application to the acquisition of corner case data. Furthermore, we extended the corner case systematization by detection methods and their respective categories. We then associated the detection methods with the corner case levels and additionally provided some basic guidance on how to detect certain types of corner cases. Thus, we are able to describe specific corner case examples and give rough guidelines for baseline detection methods.

ACKNOWLEDGMENT

The authors gratefully acknowledge support of this work by Volkswagen AG, Wolfsburg, Germany.

REFERENCES

[1] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks,” in Proc. of NIPS, Montréal, QC, Canada, Dec. 2015, pp. 91–99.
[2] Eduardo Romera, José M. Álvarez, Luis M. Bergasa, and Roberto Arroyo, “ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation,” IEEE Transactions on Intelligent Transportation Systems (T-ITS), vol. 19, no. 1, pp. 263–272, Jan. 2018.
[3] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Proc. of MICCAI, Munich, Germany, Oct. 2015, pp. 234–241.
[4] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, “Mask R-CNN,” in Proc. of ICCV, Venice, Italy, Oct. 2017, pp. 2980–2988.
[5] Jasmin Breitenstein, Jan-Aike Termöhlen, Daniel Lipinski, and Tim Fingscheidt, “Systematization of Corner Cases for Visual Perception in Automated Driving,” in Proc. of IV, Las Vegas, NV, USA, Oct. 2020, pp. 986–993.
[6] P. Pinggera, S. Ramos, S. Gehrig, U. Franke, C. Rother, and R. Mester, “Lost and Found: Detecting Small Road Hazards for Self-Driving Vehicles,” in Proc. of IROS, Daejeon, South Korea, Oct. 2016, pp. 1099–1106.
[7] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, “Detecting Unexpected Obstacles for Self-Driving Cars: Fusing Deep Learning and Geometric Modeling,” in Proc. of IV, Redondo Beach, CA, USA, June 2017, pp. 1025–1032.
[8] H. Blum, P.-E. Sarlin, J. Nieto, R. Siegwart, and C. Cadena, “Fishyscapes: A Benchmark for Safe Semantic Segmentation in Autonomous Driving,” in Proc. of ICCV - Workshops, Seoul, South Korea, Oct. 2019, pp. 1–10.
[9] J.-A. Bolte, A. Bär, D. Lipinski, and T. Fingscheidt, “Towards Corner Case Detection for Autonomous Driving,” in Proc. of IV, Paris, France, June 2019, pp. 438–445.
[10] F. Lopez, M. Saez, Y. Shao, E. C. Balta, J. Moyne, Z. M. Mao, K. Barton, and D. Tilbury, “Categorization of Anomalies in Smart Manufacturing Systems to Support the Selection of Detection Mechanisms,” IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 1885–1892, Oct. 2017.
[11] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis, “Learning Temporal Regularity in Video Sequences,” in Proc. of CVPR, Las Vegas, NV, USA, June 2016, pp. 733–742.
[12] Y. Xia, X. Cao, F. Wen, G. Hua, and J. Sun, “Learning Discriminative Reconstructions for Unsupervised Outlier Removal,” in Proc. of ICCV, Santiago, Chile, Dec. 2015, pp. 1511–1519.
[13] D. Gong, L. Liu, Vuong Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. van den Hengel, “Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection,” in Proc. of ICCV, Seoul, Korea, Oct. 2019, pp. 1705–1714.
[14] P. Oza and V. M. Patel, “C2AE: Class Conditioned Auto-Encoder for Open-set Recognition,” in Proc. of CVPR, Long Beach, CA, USA, June 2019, pp. 2307–2316.
[15] W. Liu, W. Luo, D. Lian, and S. Gao, “Future Frame Prediction for Anomaly Detection – A New Baseline,” in Proc. of CVPR, Salt Lake City, UT, USA, June 2018, pp. 6536–6545.
[16] Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin, “Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples,” in Proc. of ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–16.
[17] M. Hein, M. Andriushchenko, and J. Bitterwolf, “Why ReLU Networks Yield High-Confidence Predictions Far Away From The Training Data And How To Mitigate The Problem,” in Proc. of CVPR, Long Beach, CA, USA, June 2019, pp. 41–50.
[18] K. Lis, K. Nakka, P. Fua, and M. Salzmann, “Detecting the Unexpected via Image Resynthesis,” in Proc. of ICCV, Seoul, Korea, Oct. 2019, pp. 2152–2161.
[19] R. Chalapathy, E. Toth, and S. Chawla, “Group Anomaly Detection Using Deep Generative Models,” in Proc. of ECML PKDD, Dublin, Ireland, Sept. 2019, pp. 173–189.
[20] J. Shen, Y. Qu, W. Zhang, and Y. Yu, “Wasserstein Distance Guided Representation Learning for Domain Adaptation,” in Proc. of AAAI, New Orleans, LA, USA, Feb. 2018, pp. 4058–4065.
[21] Jonas Löhdefink, Justin Fehrling, Marvin Klingner, Fabian Hüger, Peter Schlicht, Nico M. Schmidt, and Tim Fingscheidt, “Self-Supervised Domain Mismatch Estimation for Autonomous Perception,” in Proc. of CVPR - Workshops, Seattle, WA, USA, June 2020, pp. 1–10.
[22] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas, “The Earth Mover’s Distance as a Metric for Image Retrieval,” International Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, Nov. 2000.
[23] Dan Hendrycks and Kevin Gimpel, “A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks,” in Proc. of ICLR, Toulon, France, Apr. 2017, pp. 1–12.
[24] Dan Hendrycks and Thomas Dietterich, “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations,” in Proc. of ICLR, New Orleans, LA, USA, May 2019, pp. 1–15.
[25] S. Liang, Y. Li, and R. Srikant, “Enhancing the Reliability of Out-Of-Distribution Image Detection in Neural Networks,” in Proc. of ICLR, Vancouver, Canada, Apr. 2018, pp. 1–27.
[26] Yu Shu, Yemin Shi, Yaowei Wang, Yixiong Zou, Qingsheng Yuan, and Yonghong Tian, “ODN: Open Deep Network for Open-Set Action Recognition,” in Proc. of ICME, San Diego, CA, USA, July 2018, pp. 1–6.
[27] Terrance DeVries and Graham W. Taylor, “Learning Confidence for Out-of-Distribution Detection in Neural Networks,” arXiv preprint arXiv:1802.04865, Feb. 2018.
[28] Chen Xing, Sercan Arik, Zizhao Zhang, and Tomas Pfister, “Distance-Based Learning from Errors for Confidence Calibration,” in Proc. of ICLR, Addis Ababa, Ethiopia, Apr. 2020, pp. 1–12.
[29] A. Bendale and T. Boult, “Towards Open Set Deep Networks,” in Proc. of CVPR, Las Vegas, NV, USA, June 2016, pp. 1563–1572.
[30] I. Golan and R. El-Yaniv, “Deep Anomaly Detection Using Geometric Transformations,” in Proc. of NIPS, Montréal, Canada, Dec. 2018, pp. 9781–9791.
[31] A. Kendall and Y. Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?,” in Proc. of NIPS, Long Beach, CA, USA, Dec. 2017, pp. 5574–5584.
[32] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning,” in Proc. of ICML, New York, NY, USA, June 2016, pp. 1050–1059.
[33] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles,” in Proc. of NIPS, Long Beach, CA, USA, Dec. 2017, pp. 6402–6413.
[34] Joost van Amersfoort, Lewis Smith, Yee Whye Teh, and Yarin Gal, “Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network,” arXiv preprint arXiv:2003.02037, Mar. 2020.
[35] A. Kendall, V. Badrinarayanan, and R. Cipolla, “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding,” arXiv preprint arXiv:1511.02680, Nov. 2015.
[36] T. Pham, V. B. G. Kumar, T.-T. Do, G. Carneiro, and I. Reid, “Bayesian Semantic Instance Segmentation in Open Set World,” in Proc. of ECCV, Munich, Germany, Sept. 2018, pp. 3–18.
[37] P.-Y. Huang, W.-T. Hsu, C.-Y. Chiu, T.-F. Wu, and M. Sun, “Efficient Uncertainty Estimation for Semantic Segmentation in Videos,” in Proc. of ECCV, Munich, Germany, Sept. 2018, pp. 536–552.
[38] L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft, “Deep One-Class Classification,” in Proc. of ICML, Stockholm, Sweden, July 2018, pp. 4393–4402.
[39] B. Barz, E. Rodner, Y. G. Garcia, and J. Denzler, “Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 5, pp. 1088–1101, May 2019.
[40] R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura, “Classification-Reconstruction Learning for Open-Set Recognition,” in Proc. of CVPR, Long Beach, CA, USA, June 2019, pp. 4016–4025.
[41] I. Jatzkowski, D. Wilke, and M. Maurer, “A Deep-Learning Approach for the Detection of Overexposure in Automotive Camera Images,” in Proc. of ITSC, Maui, HI, USA, Nov. 2018, pp. 2030–2035.
[42] Yuliang Zou, Zelun Luo, and Jia-Bin Huang, “DF-Net: Unsupervised Joint Learning of Depth and Flow Using Cross-Task Consistency,” in Proc. of ECCV, Munich, Germany, Sept. 2018, pp. 36–53.
[43] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam, “Encoder-Decoder With Atrous Separable Convolution for Semantic Image Segmentation,” in Proc. of ECCV, Munich, Germany, Sept. 2018, pp. 801–818.
[44] Jan-Aike Bolte, Markus Kamp, Antonia Breuer, Silviu Homoceanu, Peter Schlicht, Fabian Hüger, Daniel Lipinski, and Tim Fingscheidt, “Unsupervised Domain Adaptation to Improve Image Segmentation Quality Both in the Source and Target Domain,” in Proc. of CVPR - Workshops, Long Beach, CA, USA, June 2019, pp. 1404–1413.
[45] W. J. Scheirer, L. P. Jain, and T. E. Boult, “Probability Models for Open Set Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2317–2324, Nov. 2014.
