re:Invent 2023 | Accelerate optimization with Amazon Trusted Advisor and Amazon Well-Architected Framework

Keywords: [Amazon Web Services re:Invent 2023, Cloud Optimization, Trusted Advisor, Well Architected Framework, Resilience, Governance]

Number of words in this article: 3000, reading time: 15 minutes

Video

If the video does not play, you can watch it on Bilibili: https://www.bilibili.com/video/BV1fa4y1o79Z

Introduction

Do you know how to identify areas for optimization in your cloud environment to improve operational efficiency? Join this session to learn how to accelerate optimization using insights from Amazon Trusted Advisor and the Amazon Well-Architected Framework. Learn how to use Amazon Cloud Technology best practices to prioritize improvements based on business impact, and hear how Georgia Pacific addressed workload resiliency and cost optimization challenges by implementing these best practices and leveraging Trusted Advisor.

Session highlights

The following summary of the session was compiled by the editor. It is about 2,700 words long and takes about 14 minutes to read. For the full content, watch the complete video of the session or read the original text linked below.

The session on optimizing workloads with Trusted Advisor and the Well-Architected Framework opened with Steven Salem, a senior solutions architect on Amazon Cloud Technology's Well-Architected team. He then introduced Arun Rajan, principal product manager for Amazon Cloud Technology Trusted Advisor, and the third speaker, Carlos Wiley, an enterprise architect at Georgia Pacific. Steven noted that while the audience might not recognize the company name, they likely use Georgia Pacific's products every day, citing well-known consumer brands such as Angel Soft bath tissue and Dixie disposable plates and cups.

Steven noted that this is a 300-level session focused on technical implementation details, exploring how to use Trusted Advisor and the Well-Architected Framework to optimize cloud workloads. He outlined the agenda: first, defining cloud optimization; next, introducing methods and tools for discovering optimization opportunities; and finally, Carlos sharing how Georgia Pacific used Trusted Advisor and the Well-Architected Framework to optimize its critical workloads.

Opening the first part, Steven acknowledged that cloud optimization is interpreted in many different ways. Some see it simply as cost reduction, while others treat it as synonymous with performance improvement. To give the discussion a clear goal, Steven offered a definition: cloud optimization is any effort to build and run a workload so that it maximizes its potential to deliver business value and achieve its goals. He emphasized that optimization can span multiple areas, including but not limited to cost, performance, security, and operations. To illustrate this scope, he walked through a detailed example of optimization focused on high availability.

In his scenario, a very popular SaaS e-commerce website requires extremely high availability because any outage directly affects customers and revenue. To meet this need, three independent replicas are deployed, one in each Availability Zone in the Region. In front of these replicas, a Network Load Balancer distributes traffic within each zone, and Route 53 handles DNS resolution and routing.

Steven explained how this AZ-independent architecture maximizes uptime: if a replica fails for any reason, traffic can be shifted to the healthy replicas in the other Availability Zones to maintain availability. While this meets the functional requirements, Steven highlighted the drawbacks that come with it, and those drawbacks are where the optimization opportunities lie.

First, managing and coordinating three separate AZ deployments introduces a significant amount of additional complexity and overhead into the architecture and operations. Every time you add more moving parts that need to work together, you increase the effort of building, running, managing, and troubleshooting the system.

More critically, this architecture has a major flaw in how it recovers from an AZ failure. When a problem occurs in a zone, the standard recovery method is to delete the DNS records pointing to the affected zone. This takes the zone out of rotation and sends all traffic to the remaining healthy zones. However, deleting a DNS record by calling the Route 53 API means the recovery process now depends on the control plane. If the control plane is impaired during a large-scale event, you lose the ability to shift traffic away, potentially taking down the entire application.

Having identified these drawbacks, Steven proposed a better approach that reduces both complexity and risk by leveraging new capabilities in Amazon Cloud Technology.

Specifically, he recommended Amazon Route 53 Application Recovery Controller and its zonal shift feature. Zonal shift moves traffic away from a specific Availability Zone through the data plane, eliminating reliance on the control plane during recovery. This not only reduces operational complexity but, more importantly, removes the impact that a control plane outage would otherwise have on availability.
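To make the control plane versus data plane distinction concrete, here is a toy Python model. It is not the Route 53 or Application Recovery Controller API: `DnsControlPlane`, `ZonalShiftFlag`, and `route` are hypothetical names invented for illustration. Recovery by deleting a DNS record goes through a simulated API that can be unavailable, while the zonal shift path only changes state that the request path already reads.

```python
from dataclasses import dataclass, field


class ControlPlaneUnavailable(Exception):
    """Raised when the simulated control plane API cannot be reached."""


@dataclass
class DnsControlPlane:
    """Hypothetical stand-in for a DNS management API: changes need an API call."""
    available: bool = True
    records: set = field(default_factory=set)

    def delete_record(self, zone: str) -> None:
        # Recovery through this path fails whenever the control plane is impaired.
        if not self.available:
            raise ControlPlaneUnavailable(f"cannot remove DNS record for {zone}")
        self.records.discard(zone)


@dataclass
class ZonalShiftFlag:
    """Hypothetical stand-in for data plane state consulted on every request."""
    shifted_away: set = field(default_factory=set)

    def start_shift(self, zone: str) -> None:
        # In this model, shifting only updates local state; no control plane call.
        self.shifted_away.add(zone)


def route(zones, flag: ZonalShiftFlag, request_id: int) -> str:
    """Round-robin a request across zones that have not been shifted away."""
    healthy = [z for z in zones if z not in flag.shifted_away]
    if not healthy:
        raise RuntimeError("no zones available")
    return healthy[request_id % len(healthy)]


if __name__ == "__main__":
    zones = ["zone-a", "zone-b", "zone-c"]
    flag = ZonalShiftFlag()
    flag.start_shift("zone-a")
    print([route(zones, flag, i) for i in range(4)])
```

The design point the sketch tries to capture is that the recovery action and the request path share the same dependency, so nothing new has to be reachable at the moment of failure.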

This example shows how new capabilities in Amazon Cloud Technology can be used to optimize a workload, and that a workload's optimization state is not fixed but changes over time.

Digging into what drives this ongoing change, Steven outlined three core factors: first, new paradigms and best practices that emerge from industry trends; second, business needs that keep evolving with market dynamics; and finally, new services and features launched by Amazon Cloud Technology.

Because of all these variables, Steven emphasized, any ideal optimal state is short-lived. So how do you optimize workloads in a changing context? This is where the Well-Architected Framework comes in.

Although the specific details may change frequently, the Well-Architected Framework establishes a consistent foundation for an optimized cloud architecture: a stable baseline from which to adjust optimization efforts as conditions change.

Steven outlined the framework's six pillars: operational excellence, security, reliability, cost optimization, performance efficiency, and sustainability. Following the more than 300 best practices under these pillars helps ensure that your architecture aligns with Amazon Cloud Technology's optimization guidance.

Returning to the highly available e-commerce website, Steven showed how the Well-Architected Framework can guide optimization. For this workload, the key is to minimize the downtime caused by any failure. In the Reliability pillar, you will find a section dedicated to failure management, which includes best practices for handling failures.

In particular, the Well-Architected guidance recommends relying on the data plane rather than the control plane for recovery whenever possible. This aligns directly with the Amazon Route 53 optimization opportunity discussed earlier. New features will keep emerging over time, but the framework's foundational best practices provide a consistent direction for optimization efforts.

Steven then moved to the next part of the session: discovery. He reiterated that optimization is essentially an iterative process of continuous improvement: first identify potential improvement opportunities, then evaluate the gap between the workload and those opportunities, and finally implement improvements incrementally.

Steven emphasized the importance of sustaining this continuous cycle. It allows workloads to improve gradually while ensuring that changes deliver real business value.

He outlined two core areas of discovery: technical configuration, and organizational processes and people. On the technical side, Amazon Web Services provides automated tools that assess workloads against best practices. On the organizational side, Amazon Web Services offers a structured framework for conversations.

For automated assessment, Steven introduced Trusted Advisor, a managed service from Amazon Web Services that continuously scans account configurations against more than 400 best-practice checks covering 47 services. It not only identifies deviations but also provides specific optimization recommendations and remediation steps.

Steven noted that all Amazon Web Services customers with a Business or Enterprise Support plan can use the full set of Trusted Advisor checks. He explained that the service integrates with Amazon EventBridge, so check results can trigger automated actions. This goes beyond surfacing insights: it lets you improve the environment programmatically.
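As a rough illustration of how an EventBridge rule decides which Trusted Advisor events to act on, the sketch below implements a tiny subset of EventBridge pattern matching (lists of allowed values plus nested objects) in plain Python. The event shape, with `source` set to `aws.trustedadvisor`, a `detail-type` of `Trusted Advisor Check Item Refresh Notification`, and a `detail` object carrying `check-name` and `status`, reflects our understanding of the documented format; treat the field names and sample values as assumptions to confirm against the current EventBridge documentation. Real EventBridge matching supports much more (prefix, numeric, and array semantics).

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if `event` satisfies `pattern` (simplified EventBridge semantics).

    Each leaf in the pattern is a list of allowed values; nested dicts are
    matched recursively. Keys absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Assumed rule: act on exposed-access-key findings that Trusted Advisor flags as ERROR.
EXPOSED_KEY_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {
        "check-name": ["Exposed Access Keys"],
        "status": ["ERROR"],
    },
}
```

Keeping the pattern as data, separate from the remediation code, mirrors how an EventBridge rule and its target are configured independently.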

For the organizational side, Steven described the Well-Architected Tool, an interactive tool in the Amazon Web Services console that provides a structured framework for discussing best practices with stakeholders. It includes question sets tailored to different roles, such as developers, operations engineers, and security personnel.

Steven also introduced some recently added features of the Well-Architected Tool. First, the new profiles feature lets you define a customized set of questions based on business context, so you can prioritize the best practices most relevant to a specific workload. Second, the tool now offers review templates pre-populated with common answers to questions, which simplifies and scales reviews across your organization. After this brief introduction to the discovery tools, Steven handed the floor to Arun Rajan for a deeper look at Trusted Advisor.

Arun began by explaining how Trusted Advisor maps to the three core phases of the optimization cycle: learn, measure, and improve. Best-practice checks teach users about each service, scans measure a workload's compliance with those best practices, and recommendations drive actions that improve the environment. Because reports and recommendations mainly cover learning and measuring, Arun focused on how to use Trusted Advisor to drive the improvement phase through automation.

For example, he demonstrated a case built on the EventBridge integration, in which an action is triggered when a check detects an exposed IAM access key. Once EventBridge delivers the finding, a Step Functions workflow can immediately disable the key, evaluate activity during the exposure period, and notify the security team of the incident. Remediation workflows triggered by Trusted Advisor findings in this way move teams from manual processes to automated optimization and incident response.

Beyond this automation potential, Arun outlined how Trusted Advisor's check categories align closely with the pillars of the Well-Architected Framework. To strengthen this mapping, Amazon Cloud Technology recently introduced a new check category focused on operational excellence, covering operational readiness best practices. Arun also explained that Trusted Advisor recently gained a new data source, AWS Config. Through this integration, Trusted Advisor can use Config rules as additional checks, so resource configuration insights surface as actionable Trusted Advisor findings.
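Returning to the exposed access key example, the remediation flow (disable the key, review activity during the exposure window, notify the security team) can be sketched as three ordered steps. In a real deployment each step would likely be a Step Functions state calling IAM, CloudTrail, and a notification service; here every step is a hypothetical in-memory function so the control flow can run on its own. The `Incident` record, the activity-log tuple format, and the 24-hour exposure window are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    access_key_id: str
    detected_at: datetime
    key_status: str = "Active"
    suspicious_calls: list = field(default_factory=list)
    notifications: list = field(default_factory=list)


def disable_key(incident: Incident) -> None:
    # Hypothetical stand-in for deactivating the IAM access key.
    incident.key_status = "Inactive"


def review_activity(incident: Incident, activity_log, window: timedelta) -> None:
    # Hypothetical stand-in for querying API activity during the exposure window.
    # activity_log entries are (timestamp, access_key_id, api_name) tuples.
    start = incident.detected_at - window
    incident.suspicious_calls = [
        name
        for ts, key, name in activity_log
        if key == incident.access_key_id and start <= ts <= incident.detected_at
    ]


def notify_security(incident: Incident) -> None:
    # Hypothetical stand-in for publishing a message to the security team.
    incident.notifications.append(
        f"Key {incident.access_key_id} disabled; "
        f"{len(incident.suspicious_calls)} API call(s) in exposure window"
    )


def remediate(incident: Incident, activity_log, window=timedelta(hours=24)) -> Incident:
    # Order matters: contain first, then investigate, then report.
    disable_key(incident)
    review_activity(incident, activity_log, window)
    notify_security(incident)
    return incident
```

Containing the key before investigating is the usual choice, because every minute the key stays active extends the exposure being reviewed.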

To show how this works, Arun demonstrated a new API Gateway check powered by Config data. The check alerts users when API Gateway has not been configured to send execution logs to CloudWatch. Config rules evaluate the API Gateway settings, and Trusted Advisor presents the results as part of its operational excellence checks, so users immediately receive a recommendation to enable this critical troubleshooting data.
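The logic behind such a check can be approximated as follows. AWS Config provides a managed rule for this purpose (`api-gw-execution-logging-enabled`), but the function below is our own simplified, hypothetical re-implementation over a plain dictionary describing one stage. The dictionary layout is an assumption loosely modeled on API Gateway's stage `methodSettings`, where the key `*/*` applies to all methods and `loggingLevel` is `OFF`, `ERROR`, or `INFO`.

```python
# Assumed compliant levels; OFF means execution logging is disabled.
ALLOWED_LEVELS = {"ERROR", "INFO"}


def evaluate_stage(stage: dict) -> str:
    """Return 'COMPLIANT' or 'NON_COMPLIANT' for one API Gateway stage (simplified).

    A stage with no method settings is treated as having no execution logging,
    and any single method with logging OFF makes the whole stage non-compliant.
    """
    settings = stage.get("methodSettings", {})
    if not settings:
        return "NON_COMPLIANT"
    levels = [s.get("loggingLevel", "OFF") for s in settings.values()]
    if all(level in ALLOWED_LEVELS for level in levels):
        return "COMPLIANT"
    return "NON_COMPLIANT"
```

A per-method override with logging turned off (the hypothetical key `orders/GET` in the usage below) is flagged even when the `*/*` default is compliant, since that method's failures would still be invisible.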

Next, Arun discussed how to act on Amazon Trusted Advisor findings. With more than 400 checks covering dozens of services, there are often more findings than an organization can address at once.

To prioritize effectively, Arun proposed combining two dimensions: urgency and business impact. For urgency, Trusted Advisor assigns one of three severity levels (high, medium, or low) based on the check status. For business impact, you can use the risk level documented for each best practice in the Well-Architected Framework.

For example, Arun presented a four-quadrant priority matrix: high urgency/high impact, high urgency/low impact, low urgency/high impact, and low urgency/low impact. Companies can map findings to these quadrants based on their business circumstances and decide which areas to address first.
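A minimal sketch of the four-quadrant idea, assuming Trusted Advisor severity stands in for urgency and the Well-Architected risk level stands in for business impact. Treating only "high" as high on each axis, and ordering the quadrants in the sequence listed above, are our own simplifying assumptions; the session did not prescribe thresholds, and each organization should set its own. The check names are illustrative.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    check: str
    severity: str  # Trusted Advisor urgency: "high", "medium", or "low"
    risk: str      # Well-Architected risk level: "high", "medium", or "low"


# Lower number = address first. The order of the two mixed quadrants is an assumption.
QUADRANT_ORDER = {
    ("high", "high"): 0,
    ("high", "low"): 1,
    ("low", "high"): 2,
    ("low", "low"): 3,
}


def quadrant(f: Finding) -> tuple:
    """Collapse three-level scales into the two-level (urgency, impact) quadrant."""
    urgency = "high" if f.severity == "high" else "low"
    impact = "high" if f.risk == "high" else "low"
    return urgency, impact


def prioritize(findings):
    """Sort findings by quadrant; Python's stable sort keeps input order within a quadrant."""
    return sorted(findings, key=lambda f: QUADRANT_ORDER[quadrant(f)])
```

Some teams prefer to put low-urgency/high-impact work ahead of high-urgency/low-impact work; that is a one-line change to `QUADRANT_ORDER`.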

For larger organizations, Arun noted that Amazon Cloud Technology Enterprise Support includes Trusted Advisor Priority. This service provides prioritized recommendations tailored to your business priorities, highlighting which findings will have the greatest impact. Because these recommendations come from technical account managers who understand your workloads, they help you accelerate optimization instead of prioritizing purely by hand.

Finally, Arun mentioned an open-source optimization starter solution that can help you build your own customized prioritization data. It includes scripts that pull Trusted Advisor findings, map them to Well-Architected best practices, and generate reports that combine the two data sources.
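The core of such a report is a join between two data sets. The sketch below is not the starter solution's code: it assumes a hypothetical mapping table from Trusted Advisor check names to a Well-Architected pillar and a best-practice ID (the IDs are obvious placeholders, not real framework identifiers), skips findings whose status is `ok`, counts the rest per pillar, and lists checks that have no mapping yet.

```python
from collections import Counter

# Hypothetical mapping; in practice it would come from your own analysis or tooling.
CHECK_TO_PRACTICE = {
    "Exposed Access Keys": ("Security", "SEC-BP-PLACEHOLDER-1"),
    "Amazon EBS Snapshots": ("Reliability", "REL-BP-PLACEHOLDER-1"),
    "Low Utilization Amazon EC2 Instances": ("Cost Optimization", "COST-BP-PLACEHOLDER-1"),
}


def build_report(findings):
    """Join findings with the mapping and count open findings per pillar.

    findings: iterable of (check_name, status) pairs, where status is
    'ok', 'warning', or 'error' (an assumed export format).
    Returns (counts_by_pillar, unmapped_check_names).
    """
    by_pillar = Counter()
    unmapped = []
    for check, status in findings:
        if status == "ok":
            continue
        mapping = CHECK_TO_PRACTICE.get(check)
        if mapping is None:
            unmapped.append(check)
        else:
            by_pillar[mapping[0]] += 1
    return dict(by_pillar), unmapped
```

Surfacing unmapped checks explicitly helps keep the mapping current as new Trusted Advisor checks are added.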

After the deep dive on Amazon Trusted Advisor, Arun handed the floor to Carlos Wiley of Georgia Pacific to share how they apply Amazon Cloud Technology's optimization tools in practice.

In his customer case study, Carlos highlighted the importance of Georgia Pacific's SAP HANA ERP system. He explained that the system handles key functions such as manufacturing, inventory management, order intake, and logistics, serving as the central nervous system for factories, warehouses, sales, and distribution.

Because this system is critical to business operations, Carlos stressed that any downtime or disruption could result in the inability to produce products, take orders, and ship to customers. The entire supply chain could be severely affected.

To quantify the impact, Carlos described how downtime in the manufacturing process causes a backlog of products. While the ERP system is down, new orders cannot be accepted and existing orders cannot be fulfilled. Ultimately, products do not reach store shelves, resulting in lost revenue and reduced customer satisfaction.

Therefore, Carlos emphasized that resiliency and uptime were very important when Georgia Pacific initially deployed SAP HANA. He outlined their multi-layered approach to achieving extremely high availability and robust disaster recovery protection.

The initial deployment spans three Availability Zones. In each zone, they maintain a complete copy of the SAP HANA database along with its associated application servers. This design withstands zone-level failures and avoids single points of failure.

For their critical SAP HANA database, they implemented additional data protection, using Pacemaker-managed replication to keep the databases synchronized across zones. This active-active configuration enables fast failover if a database fails.

For application servers, they leverage CloudEndure disaster recovery to replicate server images. CloudEndure allows for immediate and automatic startup of replacement servers in the event of a failure.

With these measures, they are fully resilient to issues isolated to a single Availability Zone. However, Carlos noted that Region-level risks remain and require additional protection.

To address this, they use EC2 Capacity Reservations. Reserving capacity in a Region in advance ensures that recovery systems can be launched even when a widespread outage causes a surge in demand.

Reserved capacity does incur additional cost. To offset it, Georgia Pacific runs its quality assurance (QA) systems on the reserved capacity. During a DR test or an actual disaster, the QA systems can be paused to free up the reserved capacity for recovery. This optimizes the cost structure while still guaranteeing disaster recovery capacity. Carlos noted, however, that optimization extends far beyond infrastructure configuration.
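The capacity-sharing pattern can be modeled as a small pool: reserved slots normally run QA, and a DR event or test pauses QA so recovery instances can use the same reserved capacity. Everything here (the `ReservedPool` class, slot counts, and the choice to pause only as much QA as recovery needs) is a hypothetical illustration of the ordering of steps, not the EC2 API.

```python
from dataclasses import dataclass


@dataclass
class ReservedPool:
    """Toy model of a capacity reservation shared between QA and DR."""
    size: int
    qa_running: int = 0
    dr_running: int = 0

    def free(self) -> int:
        return self.size - self.qa_running - self.dr_running

    def start_qa(self, n: int) -> None:
        # QA may only use slots that are currently free.
        if n > self.free():
            raise ValueError("not enough reserved capacity for QA")
        self.qa_running += n

    def fail_over(self, needed: int) -> None:
        # Pause just enough QA to cover the shortfall, then launch recovery.
        shortfall = max(0, needed - self.free())
        if shortfall > self.qa_running:
            raise ValueError("reservation too small for recovery")
        self.qa_running -= shortfall
        self.dr_running += needed
```

Pausing only the shortfall keeps some QA running during partial recoveries; pausing all QA at once would be the simpler, more conservative choice.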

A key challenge was understanding the resiliency risks across their many critical workloads. To address this, Georgia Pacific adopted Amazon Cloud Technology's Resilience Hub. By defining their core applications along with their recovery time objective (RTO) and recovery point objective (RPO) targets, they gain a centralized view for assessing risk across environments. This lets them standardize resiliency patterns and configurations across their entire portfolio based on application criticality, and automated assessments speed up compliance checks for new workloads and changes.
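A simplified view of what a resiliency assessment compares: an application's estimated recovery time and data-loss window against the targets in a policy chosen by criticality tier. The tier names, target minutes, and estimates below are hypothetical, and Resilience Hub's real assessments derive estimates from the application's actual resources rather than from hand-entered numbers.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    rto_minutes: int  # maximum acceptable time to restore service
    rpo_minutes: int  # maximum acceptable window of data loss


# Hypothetical tiers keyed by application criticality.
POLICIES = {
    "mission-critical": Policy(rto_minutes=60, rpo_minutes=5),
    "business-important": Policy(rto_minutes=240, rpo_minutes=60),
}


def assess(tier: str, est_rto: int, est_rpo: int) -> list:
    """Return the breached objectives; an empty list means the app meets its policy."""
    policy = POLICIES[tier]
    breaches = []
    if est_rto > policy.rto_minutes:
        breaches.append("RTO")
    if est_rpo > policy.rpo_minutes:
        breaches.append("RPO")
    return breaches
```

Keying policies by tier rather than by application is what makes standardizing patterns across a portfolio manageable.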

Carlos explained that while Resilience Hub provides deep insight into the applications defined in it, they still needed broader governance across their entire Amazon Cloud Technology footprint. Here, Trusted Advisor's resiliency checks provide critical visibility. By aggregating Trusted Advisor results centrally, they can track resiliency consistently both for applications in Resilience Hub and for resources outside it. This end-to-end view ensures comprehensive risk coverage and oversight.
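Central aggregation can be as simple as grouping per-account results. The sketch assumes findings have already been exported into records with hypothetical `account`, `category`, and `status` fields (for example, from an organization-wide view), and the category label `resilience` is likewise an assumption. It lists accounts with open resiliency findings, most findings first.

```python
from collections import defaultdict


def resilience_hotspots(records, category="resilience"):
    """Return (account, open_findings) pairs, sorted by count then account ID.

    records: iterable of dicts with 'account', 'category', and 'status' keys
    (a hypothetical export format). Findings with status 'ok' are ignored.
    """
    counts = defaultdict(int)
    for r in records:
        if r["category"] == category and r["status"] != "ok":
            counts[r["account"]] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Sorting by count first points the team at the accounts with the most open resiliency risk, and the account ID tiebreak keeps the report stable between runs.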

Overall, Carlos summarized three key lessons from using these Amazon Cloud Technology tools for optimization. First, incorporating Well-Architected review questions into internal assessments produces more precise, targeted findings. Second, relying on automated assessments greatly improves the speed, consistency, and accuracy of reviews. Finally, tools such as Resilience Hub and Trusted Advisor provide end-to-end visibility and governance, enabling optimization across the organization.

After Carlos finished the customer case study, Steven returned to the podium to wrap up. He reviewed how to use Trusted Advisor and the Well-Architected Framework to guide, implement, and govern the cloud optimization process.

Steven reiterated the core point: optimization is a continuous, iterative process focused on delivering business value. He particularly emphasized using Amazon Cloud Technology's continually evolving services and best practices to support a business-centered optimization process.

Finally, Steven and Arun answered the last audience questions together, concluding the discussion on using Trusted Advisor and the Well-Architected Framework to optimize workloads.

Here are some highlights from the speech:

The session focused on how Trusted Advisor and the Well-Architected Framework can help optimize your cloud environment.

Amazon Cloud Technology's Well-Architected Tool provides guidance for communicating with stakeholders by selecting targeted questions and best practices.

Trusted Advisor's integration with other services is critical to the improvement phase of the cloud optimization cycle.

A dedicated team of technical account managers helps customers prioritize the Trusted Advisor recommendations that will have the greatest impact on their business.

The team continues to improve its monitoring to catch issues such as EBS throttling, which can cause application server downtime, and to gain better insight into Amazon Cloud Technology service quota limits.

Automated assessments increase efficiency by ensuring that all tabletop exercises are completed before a disaster recovery plan is implemented.

The speakers summarized how cost optimization can progress while still meeting the most demanding requirements and maintaining rigorous governance.

Summary

The speakers explained how to use Amazon Cloud Technology's Well-Architected Framework and Trusted Advisor to optimize cloud workloads. Cloud optimization goes beyond cost and performance; it also covers security, operations, and other areas so that a workload can realize its full potential. The Well-Architected Framework provides guidance for building secure, high-performing, resilient, and efficient infrastructure, with best practices across areas such as reliability, security, and cost optimization, and Trusted Advisor scans configurations against these best practices to identify optimization opportunities. The speakers recommended an iterative approach to cloud optimization driven by business outcomes: learn the best practices, measure against them to identify gaps, and then make incremental improvements. Tools such as Well-Architected reviews and Trusted Advisor help identify potential issues, which should be prioritized by urgency and business impact to ensure a high return on investment. A customer case showed how these concepts improved the resiliency and governance of a critical ERP application; the main outcome was an automated architecture review process aligned with the Well-Architected Framework. The speakers emphasized that cloud optimization is an ongoing process as new features emerge and business needs change.

Original speech

https://blog.csdn.net/just2gooo/article/details/134814255


Who is Amazon Cloud Technology?

Amazon Cloud Technology (Amazon Web Services) is a pioneer and leader in global cloud computing. Since 2006, it has been known in the industry for continuous innovation, technology leadership, a rich set of services, and broad adoption. Amazon Cloud Technology can support almost any workload in the cloud. It currently provides more than 200 full-featured services covering compute, storage, networking, databases, data analytics, robotics, machine learning and artificial intelligence, the Internet of Things, mobile, security, hybrid cloud, virtual reality and augmented reality, media, and application development, deployment, and management. Its infrastructure spans 99 Availability Zones in 31 geographic Regions, with plans for 4 new Regions and 12 more Availability Zones. Millions of customers around the world, from startups and small and medium-sized businesses to large enterprises and government agencies, trust Amazon Cloud Technology and use its services to strengthen their infrastructure, improve agility, reduce costs, accelerate innovation, enhance competitiveness, and achieve business growth and success.


Origin blog.csdn.net/weixin_40272094/article/details/134814278