Digital Twins on Amazon: Using L3 Predictive Digital Twins to Predict "Behavior"

In a previous blog post, we discussed the definition and framework of a digital twin, which aligns with the way our customers use digital twins in their applications. We define a digital twin as "a dynamic digital representation of a single physical system that is dynamically updated with data to mimic the true structure, state, and behavior of the physical system, in order to drive business outcomes." Furthermore, we describe a four-level digital twin leveling index (illustrated below) to help customers understand their use cases and the technologies needed to achieve the business value they are seeking.

[Figure: Four-level digital twin leveling index]

In this blog post, we use an electric vehicle (EV) as an example to illustrate how the L3 predictive level can predict the behavior of physical systems. The example use cases will give you an understanding of the data, models, technologies, Amazon services, and business processes required to create and support an L3 predictive digital twin solution. In previous blog posts we described the L1 descriptive and L2 informative levels, and in a future blog post we will continue with the same EV example to demonstrate L4 real-time digital twins.

L3 Predictive Digital Twin

L3 digital twins focus on modeling the behavior of a physical system to predict unmeasured quantities or future states under continuing operations, under the assumption that future behavior will resemble past behavior. This assumption is reasonably valid for short-term predictions. Predictive models can be based on machine learning and/or first principles (e.g., physics-based simulations). To illustrate the L3 predictive digital twin, we continue with the electric vehicle (EV) example from the L1 descriptive and L2 informative digital twin blog posts, focusing on three use cases: 1/ virtual sensors; 2/ anomaly detection; and 3/ failure prediction. To show how to implement these on Amazon, we extended the Amazon IoT TwinMaker example from the L2 informative blog post with components for these three capabilities. We discuss each of them in the sections that follow.

1. Virtual sensors

In our EV example, a common challenge is estimating the remaining range of the vehicle based on the current state of charge (SoC) of the battery. This is critical information for the driver, because an EV that runs out of charge often has to be towed to the nearest charging station. Predicting the remaining range is non-trivial, however, because it requires a model that accounts for the battery's state of charge, its discharge characteristics, the ambient temperature that affects battery performance, and assumptions about upcoming driving conditions (e.g., flat or mountainous terrain, gentle or aggressive acceleration). In our L2 informative blog post, we calculated the remaining range very roughly so that it could easily be hardcoded into the embedded controller. In the L3 predictive example below, we replace that simple calculation with an extension of the EV simulation model provided by Amazon partner Maplesoft in the L1 descriptive blog post. This time, the model incorporates a virtual sensor that estimates the remaining range from the key input factors mentioned above. The virtual-sensor-based vehicle range is shown in the Grafana dashboard below.

[Figure: Grafana dashboard showing the virtual-sensor-based estimated vehicle range]
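
To make the virtual-sensor idea concrete, here is a minimal sketch of a range estimator in Python. This is not the Maplesoft model; the structure and every coefficient (base consumption, temperature derating, terrain and driving factors) are illustrative assumptions.

```python
# Minimal sketch of a virtual range sensor. The model structure and all
# coefficients below are illustrative assumptions, not the Maplesoft model.

def estimate_range_km(
    soc: float,                  # state of charge, 0.0-1.0
    usable_capacity_kwh: float,  # usable battery capacity
    ambient_temp_c: float,       # ambient temperature
    terrain_grade: float,        # average road grade, e.g. 0.05 for 5% uphill
    aggressive_driving: bool,
) -> float:
    """Estimate remaining driving range from battery state and conditions."""
    base_consumption = 0.16  # kWh per km at nominal conditions (assumed)

    # Cold batteries deliver less usable energy; simple linear derating.
    temp_factor = 1.0 if ambient_temp_c >= 15 else 1.0 + 0.01 * (15 - ambient_temp_c)

    # Climbing and hard acceleration both raise consumption.
    terrain_factor = 1.0 + 4.0 * max(terrain_grade, 0.0)
    driving_factor = 1.25 if aggressive_driving else 1.0

    consumption = base_consumption * temp_factor * terrain_factor * driving_factor
    return soc * usable_capacity_kwh / consumption


# Example: 60% charge, 75 kWh pack, cold day, slight uphill, calm driving.
print(estimate_range_km(0.6, 75.0, ambient_temp_c=-5, terrain_grade=0.02,
                        aggressive_driving=False))
```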

2. Anomaly detection

For industrial equipment, a common use case is detecting when equipment is operating at off-nominal performance. Typically, such anomaly detection is integrated directly into the control system through simple rules (e.g., threshold violations, such as a temperature exceeding 100°C) or more sophisticated statistical process control methods. These rule-based approaches were covered in the L2 informative use cases. Detecting off-nominal performance in a complex system like an EV is harder, because the expected performance of individual components depends on the operation of the system as a whole. For example, in an EV the battery discharges at a much higher rate during hard acceleration than during constant-speed driving. A simple rule-based threshold on the battery discharge rate won't work, because the system would treat every hard acceleration as a battery anomaly. Over the past 15 years, machine learning methods have been used increasingly for anomaly detection: first to learn normal behavior from historical data streams, and then to continuously monitor real-time data streams for deviations from that normal behavior. Amazon Lookout for Equipment is a managed service that deploys supervised and unsupervised machine learning methods to perform this type of anomaly detection. The image below shows a screenshot from the Grafana dashboard in which the Check Battery light is on because anomalous behavior was detected.

[Figure: Grafana dashboard with the Check Battery light on after an anomaly was detected]
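
The post uses the managed Amazon Lookout for Equipment service for this step. As an illustration of the underlying idea only, the sketch below uses scikit-learn's IsolationForest (a stand-in we chose, not the service's algorithm) to learn normal behavior from multivariate data. Because discharge rate and acceleration are modeled together, a high discharge during hard acceleration is not flagged, while the same discharge while cruising is.

```python
# Illustrative stand-in for the managed service: unsupervised anomaly
# detection with scikit-learn's IsolationForest. Training on multivariate
# data (discharge rate AND acceleration together) is what lets the model
# accept high discharge during hard acceleration instead of flagging it.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic "normal" operation: discharge rate roughly tracks acceleration.
accel = rng.uniform(0.0, 3.0, size=(5000, 1))                    # m/s^2
discharge = 5.0 + 20.0 * accel + rng.normal(0, 1.0, (5000, 1))   # kW
X_train = np.hstack([accel, discharge])

model = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

# Hard acceleration with matching high discharge: expected normal (+1).
print(model.predict([[2.8, 62.0]]))
# The same high discharge while cruising: expected anomalous (-1).
print(model.predict([[0.2, 60.0]]))
```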

To learn more about the anomaly, we examined the output of Amazon Lookout for Equipment in the Amazon Management Console. The dashboard shows all anomalies detected during the evaluation period, including the one that caused the Check Battery light to turn red. Selecting the anomaly shown in the Grafana dashboard, we can see that all four sensors the model was trained on exhibit anomalous behavior. The Amazon Lookout for Equipment dashboard displays the relative contribution of each sensor to the detected anomaly as a percentage; here, battery voltage and battery SoC are the main contributors.

[Figure: Amazon Lookout for Equipment dashboard showing detected anomalies and per-sensor contributions]
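
Downstream of the console, the inference scheduler writes results as JSON Lines containing a 0/1 prediction and per-sensor diagnostics; a small parsing sketch follows. The sensor names and contribution values here are made up for illustration, and the exact output shape should be verified against the service documentation.

```python
# A sketch of post-processing Amazon Lookout for Equipment inference output.
# The service writes JSON Lines with a 0/1 prediction and per-sensor
# "diagnostics" contributions; the sensor names/values below are assumptions.
import json

line = r'''{"timestamp": "2023-01-01T00:00:00.000000",
            "prediction": 1,
            "diagnostics": [
              {"name": "battery\\voltage", "value": 0.38},
              {"name": "battery\\soc",     "value": 0.34},
              {"name": "battery\\current", "value": 0.16},
              {"name": "battery\\temp",    "value": 0.12}]}'''

result = json.loads(line)
if result["prediction"] == 1:  # 1 = anomaly detected in this window
    # Print sensors by descending contribution, as percentages.
    for d in sorted(result["diagnostics"], key=lambda d: -d["value"]):
        print(f'{d["name"]}: {d["value"]:.0%}')
```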

This is consistent with how we introduced anomalies into the synthetic dataset and trained the model. First, we trained an unsupervised Amazon Lookout for Equipment model on normal-operation data from the four sensors shown. We then evaluated the model on a new dataset, shown in the Amazon Lookout for Equipment dashboard above, into which we manually injected a failure. Specifically, we introduced an energy-loss term into the data, causing the SoC to degrade slightly faster, which also affects the other sensors. Designing a rule-based system to detect such an anomaly early enough to avoid further damage to the car would be challenging, especially if this behavior had never been observed before. Amazon Lookout for Equipment, however, detects the first anomalous periods early and, from a certain point onward, flags anomalies throughout the remaining time. The contribution of each sensor to the detected anomaly is also displayed in the Grafana dashboard.
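
For readers who want to reproduce this kind of experiment, the sketch below shows one way such a fault could be injected into a synthetic SoC trace. The drain rates and fault magnitude are illustrative assumptions, not the values we used.

```python
# Injecting the fault described above: an extra energy-loss term that makes
# the state of charge decay slightly faster after fault onset.
# All parameter values are illustrative assumptions.
import numpy as np

def simulate_soc(n_steps=10_000, fault_start=6_000, extra_loss=0.02):
    """Simple SoC trace with an energy-loss fault injected mid-run."""
    soc = np.empty(n_steps)
    soc[0] = 1.0
    base_drain = 1.0 / 40_000          # nominal SoC drain per step
    for t in range(1, n_steps):
        drain = base_drain
        if t >= fault_start:           # fault: battery loses extra energy
            drain *= (1.0 + extra_loss)
        soc[t] = max(soc[t - 1] - drain, 0.0)
    return soc

healthy = simulate_soc(extra_loss=0.0)
faulty = simulate_soc()
print(healthy[-1] - faulty[-1])        # faulty trace ends at a lower SoC
```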

3. Failure prediction

Another common use case for industrial equipment is predicting the end of life of components in order to plan and schedule maintenance in advance. Developing a failure prediction model is a challenging task that often requires custom analysis of the failure modes of a specific device under a variety of operating conditions. For this use case, Amazon offers Amazon SageMaker, a fully managed service for building, training, and deploying machine learning models. In the solution architecture section below, we explain how to integrate Amazon SageMaker with Amazon IoT TwinMaker.

In our example, we created a synthetic battery sensor dataset labeled with Remaining Useful Life (RUL). More specifically, we added an energy-loss term to a synthetic battery model to create datasets of batteries with different RULs, associating larger energy losses with shorter RULs. In real life, engineers could create such labeled datasets by analyzing data from batteries that have reached the end of their useful life. We used the XGBoost algorithm to predict RUL, with 2-minute batches of sensor data as input. The model takes as input features derived from these batches: for example, we smoothed the sensor data with a rolling average and compared sensor values between the start and end of each 2-minute batch. Note that by predicting on a rolling window, we can produce forecasts at a granularity finer than 2 minutes. In our example, the remaining battery life is displayed in the dashboard under the Check Battery symbol. This vehicle is in bad shape, and its battery is predicted to fail soon!

[Figure: Dashboard showing the predicted remaining battery life under the Check Battery symbol]
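
Below is a minimal sketch of the modeling step described above, run on synthetic data: features are computed per 2-minute batch (rolling-mean smoothing plus start-to-end deltas) and fed to an XGBoost regressor. The data generator, feature set, and hyperparameters are illustrative assumptions, not the ones used in the example.

```python
# Sketch of the RUL model described above: features from 2-minute sensor
# batches (rolling-mean smoothing, start-to-end deltas) fed to XGBoost.
# Data generation and feature choices here are illustrative assumptions.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)

def batch_features(batch: pd.DataFrame) -> dict:
    """Summarize one 2-minute batch of sensor readings."""
    smoothed = batch.rolling(window=10, min_periods=1).mean()
    return {
        "soc_mean": smoothed["soc"].mean(),
        "soc_delta": smoothed["soc"].iloc[-1] - smoothed["soc"].iloc[0],
        "voltage_mean": smoothed["voltage"].mean(),
        "voltage_delta": smoothed["voltage"].iloc[-1] - smoothed["voltage"].iloc[0],
    }

rows, labels = [], []
for _ in range(500):
    rul_hours = rng.uniform(1, 1000)      # label: remaining useful life
    loss = 0.5 / rul_hours                # shorter RUL -> faster drain
    t = np.arange(120)                    # one reading per second, 2 minutes
    batch = pd.DataFrame({
        "soc": 0.8 - loss * t / 3600 + rng.normal(0, 1e-4, 120),
        "voltage": 380 - 20 * loss * t / 3600 + rng.normal(0, 0.05, 120),
    })
    rows.append(batch_features(batch))
    labels.append(rul_hours)

X, y = pd.DataFrame(rows), np.array(labels)
model = XGBRegressor(n_estimators=200, max_depth=4).fit(X, y)
print(model.predict(X.iloc[:3]))          # predicted RUL in hours
```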

4. Architecture

The solution architecture for the L3 predictive digital twin use case builds on the solution developed for the L2 informative digital twin, as shown below. The core of the architecture uses an Amazon Lambda function to generate synthetic data representing the actual EV data stream. Amazon IoT SiteWise collects and stores vehicle data, including vehicle speed, fluid levels, battery temperature, tire pressure, seat belt and transmission status, battery charge, and other parameters. Historical maintenance data and upcoming planned maintenance activities are ingested through Amazon IoT Core and stored in Amazon Timestream. Amazon IoT TwinMaker is used to access data from the various data sources: time-series data stored in Amazon IoT SiteWise is accessed through the built-in Amazon IoT SiteWise connector, and maintenance data is accessed through a custom data connector for Timestream.

For the L3 virtual sensor capability, we extended the core architecture to integrate the Maplesoft EV model with Amazon Glue, using the Amazon IoT TwinMaker Flink library as a custom connector in Amazon Kinesis Data Analytics. For anomaly detection, we first export sensor data to S3 for offline training (not shown in the figure). The trained model is made available through Amazon Lookout for Equipment, which runs predictions on batches of sensor data via a scheduler; Lambda functions prepare the data for the model and process its predictions. We then feed these predictions back to Amazon IoT SiteWise, which forwards them to Amazon IoT TwinMaker for display in the Grafana dashboard. For failure prediction, we first export sensor data to S3 for training and use Amazon SageMaker Ground Truth for labeling. Next, we train the model with an Amazon SageMaker training job and deploy an inference endpoint for the resulting model. We invoke the endpoint from a Lambda function, which a scheduler triggers for batch inference. The resulting predictions are again fed back to Amazon IoT SiteWise, which forwards them to Amazon IoT TwinMaker for display in the Grafana dashboard.

[Figure: L3 predictive digital twin solution architecture]
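
To make the failure prediction loop concrete, here is a sketch of what the scheduled Lambda function might look like: it invokes the SageMaker inference endpoint and writes the prediction back to Amazon IoT SiteWise. The endpoint name, asset ID, property ID, and feature row are placeholders, not values from the actual solution.

```python
# Sketch of the scheduled Lambda described above: take a batch of features,
# invoke the SageMaker inference endpoint, and write the prediction back to
# Amazon IoT SiteWise so TwinMaker/Grafana can display it.
import time
import boto3

smr = boto3.client("sagemaker-runtime")
sitewise = boto3.client("iotsitewise")

def handler(event, context):
    # Feature row for one 2-minute batch; in the real flow this would come
    # from the sensor data prepared earlier in the pipeline.
    csv_row = "0.79,-0.016,379.2,-0.4"

    resp = smr.invoke_endpoint(
        EndpointName="ev-battery-rul",        # placeholder endpoint name
        ContentType="text/csv",
        Body=csv_row,
    )
    rul_hours = float(resp["Body"].read().decode())

    # Feed the prediction back into IoT SiteWise.
    sitewise.batch_put_asset_property_value(
        entries=[{
            "entryId": "rul-1",
            "assetId": "EV_ASSET_ID",         # placeholder asset ID
            "propertyId": "RUL_PROPERTY_ID",  # placeholder property ID
            "propertyValues": [{
                "value": {"doubleValue": rul_hours},
                "timestamp": {"timeInSeconds": int(time.time())},
                "quality": "GOOD",
            }],
        }]
    )
    return {"rul_hours": rul_hours}
```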

5. Implementing L3 digital twins: data, models and key challenges

Over the past 20 years, predictive modeling approaches using machine learning, physics-based models, and hybrid models have improved steadily, making predictions more reliable and more useful. In our experience, however, most predictive efforts still fail because of imperfect operational practices around deploying the models into business use.

For example, with virtual sensors the critical task is to develop and deploy validated models within an integrated data pipeline and modeling workflow. From a cloud architecture perspective, these workflows are easy to implement, as shown in the EV example above. The bigger challenge is operational. First, building and validating virtual sensor models for complex equipment can take years. Virtual sensors are typically used for quantities that physical sensors cannot measure, so by definition there is no ground-truth validation data. Validation is therefore usually carried out in research labs, with experiments on prototype hardware instrumented with very expensive sensors, or by using inspection of limited field validation data to anchor the model. Second, once deployed, virtual sensors only work if the data pipeline is robust and supplies the model with the data it needs. This sounds obvious, but operationally it can be a challenge: bad sensor readings, missing data, incorrect data tags, tag variations from site to site, and control system tag changes during overhauls are frequent sources of virtual sensor errors. Ensuring high-quality, consistent data is fundamentally a business operations challenge. Businesses must develop standards, quality-check procedures, and training programs for the technicians working with the equipment. Technology cannot compensate for poor operational practices in collecting data.

With anomaly detection and failure prediction, the data challenge is even greater. Engineering leaders are often convinced that their company is sitting on a gold mine of data and wonder why their data science teams fail to deliver. Often the data pipelines are indeed robust, but they were created for entirely different applications. For example, data pipelines built for supervisory control or performance monitoring are not necessarily suitable for anomaly detection and failure prediction. Because anomaly detection algorithms look for patterns in the data, issues such as sensor misreads, missing data, and mislabeled data can render predictive models useless, even though the same data is perfectly acceptable for other use cases. Another common challenge is that data pipelines believed to be fully automated often are not: undocumented manual data corrections requiring human judgment are discovered only when the workflow is scaled for automation and fails. Finally, for industrial assets, failure prediction models rely on manually collected inspection data, because it provides the most direct observation of the actual condition of the equipment. In our experience, the operational processes around collecting, interpreting, storing, and integrating inspection data are often not robust enough to support failure models. For example, we have seen inspection data that did not appear in the system until several months after it was collected, long after the equipment had already failed. We have also seen inspection records consisting of handwritten notes attached to incorrectly filled-in inspection forms, or associated with the wrong equipment. Even the best predictive model will fail if it is fed inaccurate data.

For L3 predictive digital twins, we encourage customers to develop and validate the business operations that support the digital twin's data requirements in parallel with the engineering teams building the digital twin itself. An end-to-end workflow mindset, from data collection through prediction to acting on the predictions, is key to success.

Summary

In this blog post, we described the L3 predictive level by walking through use cases for virtual sensors, anomaly detection, and failure prediction. We also discussed some of the operational challenges in implementing the business processes needed to support the data requirements of L3 digital twins. In previous blog posts, we described the L1 descriptive and L2 informative levels, and in a follow-up blog post we will extend the EV use case to demonstrate L4 real-time digital twins. At Amazon, we are excited to begin the digital twin journey with customers across all four digital twin levels, and we encourage you to learn more about the new Amazon IoT TwinMaker service on our website.

About the Authors


Dr. Adam Rasheed is Director of Amazon's Autonomous Computing division, where he is developing new markets for HPC/ML workflows for autonomous systems. He has more than 25 years of experience in mid-stage technology development in the industrial and digital domains, including more than 10 years of digital twin development for the aerospace, energy, oil and gas, and renewable energy industries. Dr. Rasheed received his Ph.D. from Caltech, where he researched experimental hypervelocity aerothermodynamics (orbital reentry heating). He was recognized by MIT Technology Review as one of the "World's Top 35 Innovators" and received the AIAA Lawrence Sperry Award, an industry award for early career contributions to aeronautics. He holds more than 32 patents and has authored more than 125 technical publications on industrial analytics, operations optimization, artificial lift, pulse detonation, hypersonics, shock-wave-induced mixing, space medicine, and innovation.


Seibou Gounteni is an IoT Specialist Solutions Architect at Amazon Web Services (Amazon). He leverages the depth and breadth of Amazon's platform capabilities to help customers build, develop, and operate scalable, highly innovative solutions that deliver measurable business outcomes. Seibou is an instrumentation engineer with more than 10 years of experience in digital platforms, smart manufacturing, energy management, industrial automation, and IT/OT systems across different industries.


Dr. David Sauerwein is a Data Scientist with Amazon Professional Services, where he helps customers on their AI/ML journey on the Amazon cloud. David focuses on forecasting, digital twins, and quantum computing. He holds a PhD in quantum information theory.

Article source: https://dev.amazoncloud.cn/column/article/630a132c142f067bebc5f66a
