"DevOps Practice Guide" - Reading Notes (8)

Part 6 Integrating technical practices for information security, change management and compliance

In previous chapters, we discussed how to build a fast workflow from code commit to release, along with a fast feedback flow in the opposite direction. We also explored cultural practices that enhance organizational learning and amplify weak failure signals, helping to create a safer system of work.

In Part 6, we extend these activities further so that we not only achieve development and operations goals but also achieve information security goals, ensuring and improving the confidentiality, integrity, and availability of our services and data.

Rather than conducting security checks only at the end of the development process, we need to integrate security controls into the daily work of the development and operations teams, making security part of everyone's job (perhaps this is what people mean by DevSecOps). Ideally, this work is automated and integrated into the deployment pipeline. In addition, we will augment manual operations, acceptance, and approval processes with automated controls, gradually reducing our reliance on controls such as separation of duties and change approval processes.

Not only do we need to improve security, but we also need to create processes that are more auditable and that demonstrate the effectiveness of our controls, so that we comply with regulatory and contractual obligations. The relevant measures are as follows:

  • Make security part of everyone's job;
  • Integrate preventive control code into shared code bases;
  • Integrate security with deployment pipelines;
  • Integrate security with monitoring for better detection and recovery;
  • Protect deployment pipelines;
  • Integrate deployment activities with the change approval process;
  • Reduce reliance on separation of duties.

Organizations gain better security when security is integrated into everyone's daily work and becomes everyone's responsibility. Better security means we protect our data and handle it responsibly. It also means better reliability and business continuity, because availability improves and recovery from failures becomes easier. We can address security issues before they have catastrophic consequences, and we make our systems more predictable. Perhaps most importantly, we can protect our systems and data better than ever before.

22. Integrate information security into everyone’s daily work

The objection raised most often against implementing DevOps principles and patterns has always been "information security and compliance won't allow it." Yet within the technology value stream, DevOps may be the best way to truly integrate information security into everyone's daily work.

Throughout this book, we have explored how to fully integrate QA and operations goals within the technology value stream. This chapter describes how to similarly integrate information security goals into daily work, improving the effectiveness of developers and operations staff while increasing the security of our systems and information.

22.1 Integrate security into demonstrations of development iterations

One of our goals is to have feature teams collaborate with the information security team as early as possible, rather than waiting until the end of the project. One way to do this is to invite information security staff to the product demonstration at the end of each development iteration, so that they can better understand the development team's goals in the context of organizational goals, observe implementations as they are built, and provide guidance and feedback at the earliest stages of the project, when there is the most time and freedom to fix problems.

When information security personnel become part of the team, even if they participate simply by being notified and observing the process, they gain the information they need about the business context to make better security risk decisions. Additionally, information security staff can help feature teams understand what is needed to achieve security and compliance goals.

22.2 Integrate security into defect tracking and postmortem meetings

Where possible, we want development and operations to use the same issue-tracking system to manage all known security issues, ensuring that security work is visible and can be prioritized alongside all other work. This is quite different from traditional information security practice, where all security vulnerabilities were stored in a GRC (governance, risk, and compliance) tool that only information security staff could access; now, all the work to be done lives in the systems that development and operations use.

“We hold post-mortem meetings every time a security issue arises because it better educates engineers on how to prevent the issue from recurring and is a great mechanism for transferring security knowledge to the engineering team.”

22.3 Integrate preventive security controls into shared source code libraries and shared services

In Chapter 20, we created a shared source code repository that makes it easy for anyone to find and reuse the organization's collective knowledge, not just code but also toolchains, deployment pipelines, standards, and more, letting everyone benefit from the accumulated experience of everyone in the organization. Now we add to that shared repository any mechanisms and tools that help secure our applications and environments, including libraries pre-vetted by security to fulfill specific information security objectives, such as authentication and encryption libraries and services.

If there is a centralized shared services organization, we can also collaborate with it to create and operate shared security-related platform services, such as authentication, authorization, logging, and the other security and auditing services that development and operations need. When engineers build application modules using these predefined libraries or services, they no longer need to schedule a separate security design review; instead they rely on the security guidance we have created covering configuration hardening, database security settings, key lengths, and so on.

To ensure these services and libraries are used correctly, we can provide security training to the development and operations teams and review their products to verify that security objectives have been implemented correctly, especially when a team uses these tools for the first time.

Our ultimate goal is to provide the security libraries or services that every modern application or environment requires, such as user authentication, authorization, password management, and data encryption. In addition, we can provide the development and operations teams with effective security configurations for the components they use in their application stack, such as logging, authentication, and encryption. This may include the following (a secrets-management sketch follows the list):

  • Code libraries and their recommended configurations (e.g., a 2FA two-factor authentication library, bcrypt password hashing, logging);
  • Secrets and password management (e.g., connection settings, encryption keys) using tools such as Vault, Sneaker, Keywhiz, credstash, Trousseau, and Red October;
  • Operating system packages and builds (e.g., NTP for time synchronization, correctly configured secure versions of OpenSSL, OSSEC or Tripwire for file integrity monitoring, and syslog configuration that ensures critical security logs are shipped to a centralized ELK stack).
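
As a concrete illustration of the secrets-management pattern above, here is a minimal sketch in Python using the hvac client for Vault. This is an assumption-laden example rather than a prescribed setup: the address and token come from environment variables for simplicity, and the secret path "apps/orders/db" is hypothetical; a production deployment would typically authenticate via AppRole or cloud IAM rather than a raw token.

```python
import os

import hvac  # HashiCorp Vault API client

# Connect to Vault; VAULT_ADDR and VAULT_TOKEN are assumed to be injected
# by the pipeline, never hard-coded or committed to version control.
client = hvac.Client(
    url=os.environ["VAULT_ADDR"],
    token=os.environ["VAULT_TOKEN"],
)

# Read a database credential from the KV v2 secrets engine.
# The path "apps/orders/db" is a hypothetical example.
secret = client.secrets.kv.v2.read_secret_version(path="apps/orders/db")
db_password = secret["data"]["data"]["password"]
```

The application never sees how the secret is stored or rotated; it simply asks the shared service, which is exactly what makes such shared libraries attractive as preventive controls.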

We should also work with the operations team to create base configuration playbooks or build images for our operating systems, databases, and other infrastructure (e.g., NGINX, Apache, Tomcat), ensuring they are all in a known, secure, low-risk state. The shared source code repository is not only where everyone gets the latest versions, but also where they collaborate with other engineers and where changes to security-sensitive modules are monitored and alerted on.

22.4 Integrating security into the deployment pipeline

Previously, in order to harden applications, we would begin security reviews only after development work was complete. Often, development and operations would then receive a PDF document hundreds of pages long describing the security vulnerabilities found. These problems ended up being hard to fix, either because they were discovered too late in the software life cycle, after the easy opportunities to fix them had passed, or because of project deadline pressure.

Now, we automate as much of this information security testing as possible, so that (ideally) security tests run alongside all the other tests in the deployment pipeline every time a developer or operations engineer commits code, even in the earliest stages of a software project.

Our goal is to give developers and operations staff fast feedback so they are notified whenever they commit a change that introduces a security risk. Security issues can then be detected and fixed quickly as part of daily work, letting us learn while preventing recurrence.

Ideally, these automated security tests would be run in parallel with other static code analysis tools in the deployment pipeline.

Tools like Gauntlt can be integrated into the deployment pipeline to run automated security tests against our applications, our application dependencies, our environment, and more. Notably, Gauntlt defines all of its security tests as Gherkin-syntax test scripts, a format widely used by developers for unit and functional testing.
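
Gauntlt itself expresses attacks in Gherkin, but the underlying idea, security checks that run like any other pipeline test, can be sketched as a plain pytest module. This is an illustrative analogue, not Gauntlt's own format; the staging URL, the probe payload, and the expected response codes are all assumptions for the example.

```python
import requests  # assumed to be available in the pipeline's test image

TARGET = "https://staging.example.com/search"  # hypothetical staging endpoint


def test_basic_sqli_probe_is_rejected():
    """A classic tautology payload should never be accepted as valid input."""
    resp = requests.get(TARGET, params={"q": "' OR '1'='1"}, timeout=10)
    # The application should reject or sanitize the input outright.
    assert resp.status_code in (400, 403, 422)


def test_security_headers_present():
    """Responses should carry the hardening headers we mandate."""
    resp = requests.get(TARGET, timeout=10)
    assert "Content-Security-Policy" in resp.headers
    assert resp.headers.get("X-Content-Type-Options") == "nosniff"
```

Because these run on every commit alongside unit tests, a change that drops a security header fails the build within minutes instead of surfacing in a quarterly review.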

22.5 Ensure application security

Usually, testing during development focuses on the correctness of functionality, concentrating on correct logic flows. This type of testing, often called the happy path, verifies the user's normal flow of operations (and sometimes a few alternative paths), where everything goes as expected, with no exceptions or error conditions. On the other hand, QA staff, information security staff, and fraudsters often focus on the sad paths, which occur when things go wrong, especially in security-related error conditions. (These security-specific conditions are often jokingly called the bad paths.)

For example, consider an e-commerce site where customers enter a credit card number into a form when placing an order. We want to define all the sad and bad paths to ensure that invalid credit cards are rejected, preventing fraud and security breaches such as SQL injection, buffer overflows, and other undesirable outcomes.
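
To make this concrete, here is a minimal sketch of such sad-path and bad-path tests, assuming a hypothetical validate_card_number helper in the order service; the Luhn check and the rejection rules are illustrative, not code from the book.

```python
import re


def validate_card_number(raw: str) -> bool:
    """Hypothetical validator: digits only, plausible length, passes Luhn."""
    digits = re.sub(r"[ -]", "", raw)
    if not digits.isdigit() or not 12 <= len(digits) <= 19:
        return False
    total, double = 0, False
    for d in reversed(digits):
        n = int(d)
        if double:
            n = n * 2
            if n > 9:
                n -= 9
        total += n
        double = not double
    return total % 10 == 0


# Sad-path and bad-path tests: hostile or invalid input must be rejected.
def test_rejects_luhn_failure():
    assert not validate_card_number("4111111111111112")


def test_rejects_sql_injection_payload():
    assert not validate_card_number("4111'; DROP TABLE orders;--")


def test_rejects_overlong_input():
    assert not validate_card_number("4" * 5000)  # guards against buffer abuse
```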

Ideally, these are created as automated unit or functional tests rather than executed manually, so that they can run continuously in the deployment pipeline. We expect testing to include the following:

  • Static analysis: testing we perform in a non-runtime environment, ideally in the deployment pipeline. Typically, a static analysis tool inspects the program code for all possible runtime behaviors and looks for coding flaws, backdoors, and potentially malicious code (this is sometimes called "testing from the inside out"). Examples include Brakeman and Code Climate, as well as searching for banned code functions (e.g., exec()).
  • Dynamic analysis: in contrast to static testing, dynamic analysis consists of tests executed while the program is running. Dynamic tests monitor things such as system memory, functional behavior, response time, and overall system performance. This approach (sometimes called "testing from the outside in") probes the application the way a malicious third party would. Tools include Arachni and OWASP ZAP (Zed Attack Proxy). Some kinds of penetration testing can also be automated with tools such as Nmap and Metasploit and run as part of dynamic analysis. Ideally, automated dynamic testing runs during the automated functional testing phase of the deployment pipeline, or even against services in production. To verify that security measures hold, tools such as OWASP ZAP can be configured as a browser proxy to attack the service while we inspect the network traffic in the test tool.
  • Dependency component scanning: another form of static testing, usually run in the deployment pipeline at build time. It inventories all the packages and libraries that our binaries and executables depend on and verifies that these components, which we generally do not control, are free of known vulnerabilities and malicious binaries. Examples include Gemnasium and bundler-audit for Ruby, Maven dependency checks for Java, and OWASP Dependency-Check (a sketch of such a pipeline step follows this list).
  • Source code integrity and code signing: all developers should have their own PGP keys, created and managed in a system such as keybase.io. Everything committed to the version control system should be signed, which can be configured directly with the open source tools gpg and git. Additionally, every package created by the CI process should be signed and its hash recorded in a centralized logging service for audit purposes.
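
As referenced in the dependency-scanning item above, here is a minimal sketch of a build-time scan for a Python project that wraps the pip-audit tool; the failure policy is an assumption, the exact JSON schema varies across pip-audit versions, and Ruby or Java projects would wrap bundler-audit or OWASP Dependency-Check instead.

```python
import json
import subprocess
import sys


def scan_dependencies() -> int:
    """Fail the build if any dependency has a known vulnerability."""
    result = subprocess.run(
        ["pip-audit", "--format", "json"],  # emits a machine-readable report
        capture_output=True, text=True,
    )
    report = json.loads(result.stdout or "{}")
    vulnerable = [d for d in report.get("dependencies", []) if d.get("vulns")]
    for dep in vulnerable:
        print(f"VULNERABLE: {dep['name']} {dep['version']}", file=sys.stderr)
    return 1 if vulnerable else 0  # non-zero exit code breaks the pipeline


if __name__ == "__main__":
    sys.exit(scan_dependencies())
```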

Additionally, we should define design patterns that help developers write code that resists abuse, such as rate limits for our services and graying out submit buttons once they have been pressed (a rate-limiting sketch follows the list below). OWASP publishes a wealth of useful guidance, such as its Cheat Sheet series, which covers:

  • How to store passwords;
  • How to deal with forgotten passwords;
  • How to handle logging;
  • How to prevent cross-site scripting (XSS) vulnerabilities.
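
Below is a minimal sketch of the rate-limiting pattern mentioned above, implemented as an in-process token bucket; the rate and capacity are arbitrary example values, and a production service would usually enforce limits in a shared layer (an API gateway or a Redis-backed counter) rather than per process.

```python
import time


class TokenBucket:
    """Allow about `rate` requests/second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429 Too Many Requests


# Example: at most 5 requests/second per client, bursting to 10.
limiter = TokenBucket(rate=5, capacity=10)
if not limiter.allow():
    print("429 Too Many Requests")
```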

22.6 Securing the software supply chain

Josh Corman has pointed out that as developers, "we are no longer writing custom software; instead, we assemble the open source components we need, and that software supply chain is one we depend on heavily." In other words, when we use (commercial or open source) components or libraries in our software, we inherit not only their functionality but also any security vulnerabilities they contain.

When selecting software, we check whether projects depend on components or libraries with known vulnerabilities, and we help developers choose their components deliberately, favoring components with a proven track record whose vulnerabilities get fixed quickly (e.g., healthy open source projects). We also watch for multiple versions of the same library in use in production, especially older versions with known vulnerabilities.

22.7 Ensure the security of the environment

In this step, we do whatever work helps harden the environment and reduce its risk. Even though we may have already established known-good configurations, we must use monitoring to verify that all production servers actually match those known-good states. We use automated tests to confirm that all required settings have been applied correctly, including security hardening configuration, database security settings, and key lengths. We also use tests to scan the environment for known vulnerabilities.

Another class of security verification is understanding the actual environment (i.e., the "as-is state"). Tools here include Nmap, to verify that only the expected ports are open, and Metasploit, to verify that known vulnerabilities have been adequately hardened against, for example by scanning for susceptibility to SQL injection attacks. The output of these tools should be stored in an artifact repository and compared to the previous version as part of the functional testing process, so any undesirable change is detected immediately. A sketch of such a check follows.
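
Here is a minimal sketch of the port-drift check just described, shelling out to nmap and comparing the open ports against an expected allowlist; the target host and allowed ports are assumptions for the example, and the output would be archived alongside previous scans for comparison.

```python
import re
import subprocess

TARGET = "10.0.0.12"        # hypothetical production host
ALLOWED_PORTS = {22, 443}   # everything else should be closed


def open_ports(host: str) -> set:
    """Run a fast nmap scan and parse the open TCP ports from its output."""
    out = subprocess.run(
        ["nmap", "-F", host], capture_output=True, text=True, check=True
    ).stdout
    return {int(p) for p in re.findall(r"^(\d+)/tcp\s+open", out, re.MULTILINE)}


def test_no_unexpected_ports_open():
    unexpected = open_ports(TARGET) - ALLOWED_PORTS
    assert not unexpected, f"Unexpected open ports: {sorted(unexpected)}"
```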

22.8 Integrating information security into production telemetry

Internal security controls often fail to detect breaches successfully or promptly, either because of blind spots in monitoring or because no one in the organization examines the relevant telemetry in their daily work.

We deploy the monitoring, logging, and alerting needed to fulfill our information security objectives across our applications and environments, and we centralize them enough to enable easy, meaningful analysis and response.

By integrating security telemetry into the tools used by development, QA, and operations, everyone in the value stream can see how applications and environments behave in the presence of malicious threats, including attackers continually attempting to exploit vulnerabilities, gain unauthorized access, plant backdoors, commit fraud, perform denial of service, and carry out other sabotage.

Exposing to everyone how our services are being attacked in production forces everyone to consider security risks and to design countermeasures into their daily work.

22.9 Building a secure telemetry system into your application

Suspicious user behavior can indicate or trigger fraud and unauthorized access; to detect it, we must build the relevant telemetry into our applications. Examples include the following (a logging sketch follows the list):

  • Successful and unsuccessful user logins;
  • User password reset;
  • User email address reset;
  • User credit card changes.
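
A minimal sketch of emitting such events as security telemetry, using the Python statsd client; the metric names and the agent address are assumptions, and in practice these counters feed the same dashboards that development and operations already watch.

```python
import statsd

# Assumed local StatsD agent that forwards to the central metrics system.
metrics = statsd.StatsClient("localhost", 8125, prefix="app.security")


def on_login(success: bool) -> None:
    metrics.incr("login.success" if success else "login.failure")


def on_password_reset(user_id: str) -> None:
    metrics.incr("password_reset")


def on_credit_card_change(user_id: str) -> None:
    # A spike here alongside failed logins is a classic fraud signal.
    metrics.incr("credit_card_change")
```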

22.10 Establishing secure telemetry systems in your environment

In addition to instrumenting our applications, we also need comprehensive telemetry in our environments so that we can detect early signs of unauthorized access, especially on components running on infrastructure we do not control (e.g., hosted environments, the cloud).

We need to monitor and alert on certain events, including the following (a detection sketch follows the list):

  • Operating system changes (e.g., in production environments, in build infrastructure);
  • Changes to security groups;
  • Configuration changes (e.g., OSSEC, Puppet, Chef, Tripwire);
  • Cloud infrastructure changes (e.g., VPCs, security groups, users, and permissions);
  • XSS attempts (i.e. "cross-site scripting attacks");
  • SQLi attempts (i.e. "SQL injection attacks");
  • Web server errors (for example, 4XX and 5XX errors).
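
As a sketch of the last item, the following scans a web server access log and alerts when the share of 4XX/5XX responses crosses a threshold; the log path, the common log format, and the 5% threshold are assumptions, and a real deployment would run this query inside the centralized logging system rather than a script.

```python
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path
THRESHOLD = 0.05                        # alert if more than 5% are errors

# In common log format the status code follows the quoted request line.
status_re = re.compile(r'"\s(\d{3})\s')

counts = Counter()
with open(LOG_PATH) as f:
    for line in f:
        m = status_re.search(line)
        if m:
            counts[m.group(1)[0]] += 1  # bucket by first digit: 2xx/4xx/5xx

total = sum(counts.values())
errors = counts["4"] + counts["5"]
if total and errors / total > THRESHOLD:
    print(f"ALERT: {errors}/{total} requests returned 4XX/5XX")
```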

We also want to confirm that logging is configured correctly, so that all telemetry is sent to the right place. When we detect attacks, beyond merely recording the events we can also choose to block access and record information about the source and target, which helps us choose the best countermeasures.

22.11 Securing the deployment pipeline

The infrastructure that supports our continuous integration and continuous deployment processes has itself become a new attack surface. For example, if someone compromises the servers running the deployment pipeline, which hold the credentials for our version control system, they can steal the application's source code. Worse, if an account in the deployment pipeline has write access, an attacker can inject malicious changes into the version control system and, from there, into our applications and services.

To adequately protect the integrity of our applications and environments, we must also mitigate the attack vectors against the deployment pipeline itself. The risks include developers introducing code that enables unauthorized access (mitigated by controls such as code testing, code review, and penetration testing) and unauthorized users gaining access to our applications or environments (mitigated by controls such as keeping configurations in a known, good state and patching promptly and effectively).

However, to protect our continuous build, integration, and deployment pipeline, risk-mitigation measures may also include:

  • Harden continuous build and integration servers and ensure they can be rebuilt in an automated way, just like customer-facing production infrastructure, making them far harder to compromise;
  • Review all changes committed to the version control system, either through pair programming at commit time or through a code review process between commit and trunk merge, to prevent the continuous integration server from running uncontrolled code (e.g., unit tests might contain malicious code that enables or triggers unauthorized access);
  • Detect when test code containing suspicious API calls (e.g., unit tests that access the file system or network) is checked into the repository, so it can be immediately quarantined and a code review triggered (a detection sketch follows this list);
  • Ensure that each CI process runs in its own isolated container or virtual machine;
  • Ensure that the version control credentials used by the CI system are read-only.
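
As referenced in the list above, here is a minimal sketch of flagging test code with suspicious API calls, using Python's ast module to inspect imports; the list of flagged modules is an assumption, and a real gate would run as a pre-merge check on the CI server.

```python
import ast
import sys

SUSPICIOUS_MODULES = {"socket", "subprocess", "urllib", "requests"}  # example


def suspicious_imports(path: str) -> set:
    """Return any flagged modules imported by the given test file."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found |= {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & SUSPICIOUS_MODULES


if __name__ == "__main__":
    exit_code = 0
    for path in sys.argv[1:]:
        mods = suspicious_imports(path)
        if mods:
            print(f"QUARANTINE {path}: imports {sorted(mods)}")
            exit_code = 1  # fail the check and trigger a code review
    sys.exit(exit_code)
```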

22.12 Summary

This chapter described ways to integrate information security objectives into every phase of daily work: integrating security controls into the mechanisms we have already created so that all on-demand environments are in a hardened, low-risk state, and integrating security testing into the deployment pipeline while building security telemetry into pre-production and production environments. This improves overall security while improving development and operations efficiency. Next, we will protect the deployment pipeline itself.

23. Protect the deployment pipeline

This chapter explores how to secure deployment pipelines and achieve security and compliance goals within a controlled environment, including change management and segregation of duties.

23.1 Integrate security and compliance into the change approval process

Almost every IT organization of any size has its own change management process, which serves as its primary control for reducing operational and security risks. Compliance managers and security managers rely on change management processes for their compliance requirements, and they typically require evidence that all changes have been appropriately authorized.

If the deployment pipeline is built correctly to reduce deployment risk, most changes will not need to go through a manual approval process because we will rely on controls such as automated testing and proactive production environment monitoring.

In this step, we make sure that security and compliance are successfully integrated into whatever change management process already exists. An effective change management policy recognizes that different kinds of changes carry different risks and should be handled in different ways. ITIL defines these processes and divides changes into the following three types.

  • Standard changes: low-risk changes that follow an established, approved process, and that can also be pre-approved. They include monthly updates to application tax tables or country codes, content and styling changes to websites, and application or operating system patches with well-understood impact. The change requester does not need approval before deploying; such changes can be fully automated but should be logged for traceability.
  • Routine changes: higher-risk changes that require review or approval from an agreed change authority. In many organizations, placing approval responsibility with the Change Advisory Board (CAB) or Emergency Change Advisory Board (ECAB) is unreasonable, because these bodies may lack the expertise to understand the full impact of a change, which often results in intolerably long lead times. The problem is especially acute for large code deployments, which may contain hundreds of thousands (or even millions) of lines of code committed by hundreds of developers over months of work. To authorize a routine change, the CAB will almost certainly require a well-defined request for change (RFC) that provides the information needed to make a decision. An RFC typically includes the desired business outcome, the planned utility and warranty, a business case with risks and alternatives, and a proposed schedule.
  • Emergency changes: potentially high-risk changes that must be put into production immediately in an emergency (e.g., an urgent security patch, or restoring service). They typically require senior management approval, but the documentation may be completed after the fact. A key goal of DevOps practice is to streamline the routine change process so well that it works just as well for emergency changes.

23.2 Reclassify a large number of low-risk changes as standard changes

Ideally, by building a solid deployment process, we will have earned a reputation for fast, reliable, drama-free deployments. On that basis, we should seek agreement from operations and the relevant change authorities that our changes are low-risk enough to be classified as standard changes pre-approved by the Change Advisory Board. That lets us deploy straight to production without further approval, although the changes should of course still be documented correctly.

To demonstrate that our changes are low-risk, a good approach is to show a history of changes over an extended period (e.g., months or quarters) together with a complete list of production issues from the same period. If we can show high change success rates and low MTTR (mean time to recover), we can assert that we have a control environment that effectively prevents deployment errors, and demonstrate that we detect and correct any problems quickly. A sketch of computing these metrics follows.
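
A minimal sketch of computing those two metrics from deployment and incident records; the record format is an assumption for the example, and in practice the data would be exported from the deployment pipeline and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical records joined from the pipeline and the incident tracker.
deployments = [
    {"ok": True},
    {"ok": True},
    {"ok": False, "detected": datetime(2024, 3, 1, 10, 0),
     "recovered": datetime(2024, 3, 1, 10, 25)},
]

total = len(deployments)
failures = [d for d in deployments if not d["ok"]]
success_rate = (total - len(failures)) / total

recovery = [d["recovered"] - d["detected"] for d in failures]
mttr = sum(recovery, timedelta()) / len(failures) if failures else timedelta()

print(f"Change success rate: {success_rate:.1%}")  # -> 66.7%
print(f"MTTR: {mttr}")                             # -> 0:25:00
```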

Even when our changes are classified as standard changes, they still need to be recorded in the change management system (e.g., Remedy or ServiceNow) to preserve visibility. Ideally, deployments are performed automatically by our configuration management and deployment pipeline tools (e.g., Puppet, Chef, Jenkins), and the results are logged automatically. That way, everyone in the organization, DevOps or not, has visibility into our changes, in addition to all the other changes happening in the organization.

Establishing this traceability and context should be easy and should not impose a heavy or time-consuming workload on engineers. Linking to user stories, requirements, or defects is almost always sufficient; more granularity, such as opening a ticket for every code commit in the version control system, is rarely meaningful or necessary, as it would add significant friction to daily work.

23.3 How to handle general changes

Changes that cannot be classified as standard changes are routine changes, which require approval from at least a subset of the Change Advisory Board before deployment. In this case, even if deployment is not fully automated, our goal is still to keep deployments fast.

Here it is important that submitted change requests be as complete and accurate as possible, giving the Change Advisory Board everything it needs for a proper evaluation. A change request that is badly formatted or incomplete gets bounced back, which increases the time it takes to reach production and casts doubt on whether we actually understand the purpose of the change management process.

We can largely automate the creation of complete and accurate RFCs, populating the tickets with the correct change details. For example, we can automatically create a ServiceNow change ticket containing links to the JIRA user stories, the build manifests and test output from the deployment pipeline tool, and the Puppet/Chef scripts that will be run (a sketch follows).
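
A minimal sketch of that automation using the ServiceNow Table API via Python's requests library; the instance URL, the credential handling, and the ticket fields beyond the standard change_request table are assumptions for the example.

```python
import os

import requests

INSTANCE = "https://example.service-now.com"  # hypothetical instance
AUTH = (os.environ["SN_USER"], os.environ["SN_PASS"])


def create_change_ticket(story_url: str, build_url: str, script_url: str) -> str:
    """Open a change_request record pre-populated with pipeline context."""
    payload = {
        "short_description": "Automated deployment: orders-service v1.42",
        "description": (
            f"User story: {story_url}\n"
            f"Build manifest and test output: {build_url}\n"
            f"Deployment scripts: {script_url}"
        ),
        "type": "standard",
    }
    resp = requests.post(
        f"{INSTANCE}/api/now/table/change_request",
        auth=AUTH, json=payload, timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["sys_id"]  # ticket id for traceability
```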

Once the change request is submitted, the relevant members of the Change Advisory Board review, process, and approve it as they would any other change request. If all goes well, the change authority will appreciate the thoroughness and rich detail of our change orders, because we have made it easy for them to quickly verify the information we submit (e.g., by following the links to the artifacts in the deployment pipeline tool). Our goal, however, should be to consistently demonstrate a track record of successful changes, so that eventually they agree our automated changes can safely be classified as standard changes.

23.4 Reduce reliance on separation of duties

For decades, we have used separation of duties as one of our primary controls for reducing the risk of fraud or error in the software development process. Requiring developers to submit their changes to a code librarian for review and approval, and having IT Operations deploy those changes to production, has been accepted practice in most software development life cycles.

When production deployments are infrequent (e.g., annual) and the work is not complex, dividing the work and handing it off is a workable way of doing business. However, as complexity and deployment frequency grow, performing production deployments successfully increasingly requires that everyone in the value stream quickly see the outcomes of their work.

Separation of duties gets in the way of this, slowing engineers down and reducing the feedback they receive on their work. It prevents engineers from taking full responsibility for the quality of their work and reduces the organization's ability to create learning.

Therefore, wherever possible, we should avoid using separation of duties as a control, choosing instead controls such as pair programming, continuous inspection of code check-ins, and code review, which provide the necessary assurance about the quality of our work. Furthermore, with these controls in place, if separation of duties is ever required, we can demonstrate that the controls we have created achieve equivalent outcomes.

23.5 Ensure documentation and evidence are retained for auditors and compliance personnel

As technology organizations increasingly adopt DevOps models, the relationship between IT and audit has become more tense than ever. These new DevOps models challenge traditional thinking about auditing, controls, and risk avoidance.

23.6 Summary

This chapter discussed practices that make information security everyone's responsibility, integrating information security objectives into the daily work of everyone in the value stream. Doing so significantly improves the effectiveness of our controls, letting us better prevent security breaches and detect and recover from them faster. It also greatly reduces the time spent preparing for and passing compliance audits.

23.7 Summary of Part 6

Part 6 explores how DevOps principles can be applied to information security to help us achieve our goals and ensure security is part of everyone's daily routine. Better security ensures we protect data, treat it wisely, and recover before security issues lead to disaster. Most importantly, we can make systems and data more secure than ever before.

24. Take Action - Book Summary

We have discussed DevOps principles and technical practices in detail. In this era of frequent security breaches, ever-shorter delivery cycles, and massive technology transformation, DevOps emerged because technology leaders must solve for security, reliability, and agility at the same time. I hope this book helps readers understand the problem deeply and find solutions to it.

As this book has emphasized throughout, when the conflict between developers and operations staff is managed badly, it worsens day by day: new products and features take ever longer to ship, quality suffers, outages increase, technical debt accumulates, productivity falls, and employee dissatisfaction and burnout grow ever more serious.

DevOps principles and patterns resolve this core conflict. After reading this book, I hope readers understand how a DevOps transformation creates a learning organization, how it accelerates flow and builds world-class reliability and security, and how it raises competitiveness and employee satisfaction.

Practicing DevOps requires new cultural and management norms, and it changes technical practices and architecture as well. Cross-functional collaboration is crucial, spanning management, product management, development, QA, IT operations, information security, and even marketing. Only by working together can all these functions build a safe system of work that lets small teams quickly and independently develop, test, and safely deploy code to customers; only then is rapid technological innovation possible. This maximizes developer productivity, motivation, and satisfaction, and the organization's ability to win in the market.

We know the dangers of resting on our laurels and how hard it is to change daily working habits. We also understand the risks and costs of introducing new ways of working into an organization. And we know that some will dismiss DevOps as merely a drop in the long river of history, soon to be replaced by the next popular method.

But we firmly believe that DevOps will transform the technology industry just as Lean transformed manufacturing in the 1980s. Those organizations that embrace DevOps will win in the market, and those that reject DevOps will pay the price. Organizations that embrace DevOps create learning organizations that are passionate about continuous improvement and innovate to outperform their competitors.

Therefore, DevOps is not just a technical imperative but an organizational one. The key point is that DevOps applies broadly, and it is especially valuable for organizations that must use technology to improve the flow of work while ensuring the quality, reliability, and security of their products.

Don't be cynical about it. People involved in change know they may fail at what they attempt, but they try anyway, and the greatest value of trying is to inspire others by example. Innovation cannot succeed without taking risks; if you haven't upset some managers, you're probably not trying hard enough. Don't let your organization's immune system block or undermine your vision. As Jesse Robbins, Amazon's former "Master of Disaster," put it: "Do the right thing and get fired."

DevOps greatly benefits everyone in the technology value stream. Whether we are developers, operations engineers, QA engineers, information security staff, product managers, or customers, it gives us the joy of building great products. It provides humane working conditions and lets us spend more time with the people we love. It enables teams to work together, to learn, grow, and take pride in their work, and to delight their customers, all while helping the organization succeed.
