Use Terraform and event-driven Amazon CodeBuild to improve the operations and maintenance efficiency of data applications on the cloud

Background Information

When enterprise customers deploy data applications on the cloud, the data development team is usually responsible for the script content, while the cloud resources behind it are managed by a cloud operations team through IaC (Infrastructure as Code). However, when the data development team develops and deploys those scripts, changes to cloud resources are inevitable, such as adding or modifying Glue and Lambda resources. This tightly couples the two teams at their functional boundary: every iteration by the data development team must be submitted as a requirement to the cloud operations team for the corresponding IaC work, which increases the workload of both parties.


Overview of the optimization scheme

To relieve the extra pressure that adding and modifying data application code places on both teams, this article uses a case study to optimize the key steps in developing and deploying data applications: the data development team calls the corresponding Terraform module as an interface and combines it with an Amazon CodePipeline or EventBridge-driven, event-based CI/CD pipeline.

In this case, the cloud operations team is responsible for deploying and maintaining the IaC module, and uses a Terraform Cloud Workspace to release and manage the IaC code. The data development team is responsible for developing Glue scripts for specific ETL scenarios, uses CodeCommit for code management and CodeBuild for the CI/CD steps, and finally chains the CI/CD pipeline together with CodePipeline or EventBridge. Working together, the two teams implement the following scenario:

"The HR department of an enterprise needs to ingest a data source into MySQL for downstream data applications. After the data engineers finish developing the Glue scripts, the Glue template (a Python shell template) developed by the cloud operations team is used to create the Glue script resources in batches. When data engineers later create or modify Glue scripts, the pipeline automatically captures the changes in CodeCommit and synchronizes them to s3. The change in s3 in turn triggers Terraform to create or update the corresponding resources, without involving the IaC Dev/Cloud Ops team."

The optimization plan below clearly defines the responsibility boundary between the cloud operations team and the data development team when developing and maintaining data applications on the cloud.

Implementation steps

(1) Unify processes and conventions

Confirm the key processes and steps between the data development team and the cloud operations team, including how the CI/CD pipeline is implemented, how Glue scripts are uploaded and stored, and the configuration information required by the resources (such as instance type, required IAM permissions, networking, and so on).

(2) Terraform script development

The cloud operations team is responsible for developing the IaC scripts for the Glue resources, including the configuration parameters and the code for adding/changing resources. The developed content is placed in a glue-etl directory, whose contents look like this:

|____glue-etl
| |____output.tf
| |____data.tf
| |____main.tf
| |____Readme.md
| |____policy.tf
| |____variables.tf

The cloud operations team packages glue-etl as a module and publishes it to the corresponding Workspace in Terraform Cloud.

The above glue-etl module contains the following:

  • output.tf contains the parameters output by this module.
  • data.tf contains references to existing resources in the Amazon environment, such as the current region, current user information, the Secrets Manager key string of the database that the Glue scripts need to access, and the subnet group required to deploy the Glue resources.
  • policy.tf contains the IAM policies attached to the IAM role required for Glue execution.
  • variables.tf contains the configuration parameters that callers of this module need to pass in.

For reasons of space, the specific code of these .tf files is omitted.
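Purely as an illustration (this is a sketch, not the module's actual implementation), the core of main.tf could create one Python shell job per entry of the map passed in by the caller; the variable names below follow the module call shown later in this article, and the script path construction is an assumption:

# Illustrative sketch only -- the real glue-etl module code is not published in this article
resource "aws_glue_job" "python_shell" {
  for_each = var.glue_job_name_list_for_python

  name     = each.key
  role_arn = aws_iam_role.glue_job.arn   # role and policies defined alongside policy.tf (not shown)

  command {
    name            = "pythonshell"
    python_version  = "3.9"
    # Path construction is an assumption for illustration
    script_location = "s3://${var.bucket_name}/${var.line_of_business}/${each.value}"
  }

  connections  = var.if_connection ? [var.conn_name] : []
  max_retries  = var.max_retries_for_python
  max_capacity = 0.0625   # DPU setting for a Python shell job

  execution_property {
    max_concurrent_runs = var.max_concurrent_runs_for_python
  }
}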

(3) Monitor s3 content changes

After the cloud operations team finishes developing the glue-etl module and uploads it to the Terraform Workspace, the data development team needs to initialize a .tf file and use the locals keyword to write the script upload path information (the variables bucket_name, job_path_prefix and line_of_business) into that .tf file.

locals {
  bucket_name = "sample-bucket-glueetl"
  job_path_prefix = toset(["hr-mysql-source1-python-scripts"])
  line_of_business = "hr-department"
}
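Because the module is published in Terraform Cloud, this consuming configuration also needs to point at the corresponding Terraform Cloud organization and workspace (and have a valid API token) before terraform init will work. A minimal sketch, with placeholder organization and workspace names:

terraform {
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "example-org"        # placeholder organization
    workspaces {
      name = "glue-etl-data-apps"       # placeholder workspace
    }
  }
}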

The second step is to obtain the storage path of the Glue scripts on s3 through the data.aws_s3_bucket_objects data source provided by the Terraform AWS provider.

data "aws_s3_bucket_objects" "glue_job_objects_for_people_mdm_staging" {
  for_each = local.job_path_prefix
  bucket   = local.bucket_name
  prefix   = "${local.line_of_business}/${each.key}"
}

Next, configure the input parameters required by the Glue module. The following example shows how to map the Glue job name to the uploaded script name through string operations (the mapping rule can be customized; in this example the prefix of the .py file name is used as the Glue job name) and store the result in the job_name_map local variable. In practice you may need to configure more than one local variable as module input.

locals {
  job_name_map = {
    for job_prefix in [
      for job_name in [
        for py_name in data.aws_s3_bucket_objects.glue_job_objects_for_people_mdm_staging["hr-mysql-source1-python-scripts"].keys :
        split("/", py_name)[2]
      ] : split(".", job_name)[0]
    ] : job_prefix => "${job_prefix}.py" if job_prefix != ""
  }
}
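For example, assuming (hypothetically) that the bucket contains the objects hr-department/hr-mysql-source1-python-scripts/job_a.py and hr-department/hr-mysql-source1-python-scripts/job_b.py, the inner loops extract the file names and strip the .py suffix, the if clause filters out the empty key produced by the prefix object itself, and job_name_map evaluates to roughly:

job_name_map = {
  "job_a" = "job_a.py"   # hypothetical script names
  "job_b" = "job_b.py"
}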

Finally, Glue Python shell jobs of a given specification are created in batches by calling the module (glue-etl in this example) published in the Terraform Cloud Workspace.

module "glue-etl-type1" {
  source                         = "app.terraform.io/repo/glue-etl/aws"
  subnet_list                    = ["subnet-1", "subnet-2", "subnet-3"]
  bucket_name                    = local.bucket_name
  line_of_business               = local.line_of_business
  secret_manager_id              = "some-secretmanager-id"
  if_connection                  = true
  conn_name                      = local.connection_name_staging
  glue_job_name_list_for_python  = local.job_name_map
  max_concurrent_runs_for_python = 4
  max_retries_for_python         = 0
}

(4) Implement CodeBuild-driven CI/CD pipeline

This article uses EventBridge to connect CodeCommit and CodeBuild. You can also use Amazon CodePipeline to achieve the same result, depending on your preference. Before starting, make sure the corresponding Amazon CodeCommit repository and CodeBuild project have been initialized.

Create an EventBridge rule that is triggered by CodeCommit repository change events (references created or updated), as shown below.

{
  "source": [
    "aws.codecommit"
  ],
  "detail-type": [
    "CodeCommit Repository State Change"
  ],
  "detail": {
    "event": [
      "referenceCreated",
      "referenceUpdated"
    ]
  }
}
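The rule can be created in the EventBridge console, or, if you prefer to keep it in Terraform as well, with a sketch along these lines (the rule name is a placeholder):

resource "aws_cloudwatch_event_rule" "codecommit_change" {
  name = "codecommit-repo-state-change"   # placeholder name

  event_pattern = jsonencode({
    source        = ["aws.codecommit"]
    "detail-type" = ["CodeCommit Repository State Change"]
    detail = {
      event = ["referenceCreated", "referenceUpdated"]
    }
  })
}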

Configure the Input Transformer for this rule, defining the input paths and the input template respectively, as follows:

{"referenceType":"$.detail.referenceType","region":"$.region","repositoryName":"$.detail.repositoryName","account":"$.account","referenceName":"$.detail.referenceName"}

{"environmentVariablesOverride": [
      {
          "name": "REFERENCE_NAME",
          "value": <referenceName>
       },
      {
          "name": "REFERENCE_TYPE",
          "value": <referenceType>
       },
      {
          "name": "REPOSITORY_NAME",
          "value": <repositoryName>
       },
      {
          "name": "REPO_REGION",
          "value": <region>
       },
       {
          "name": "ACCOUNT_ID",
          "value": <account>
       }
 ]}
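The same input transformer can also be attached as an EventBridge target in Terraform; in the sketch below, the CodeBuild project and the IAM role that allows EventBridge to start builds are placeholders:

resource "aws_cloudwatch_event_target" "trigger_codebuild" {
  rule     = aws_cloudwatch_event_rule.codecommit_change.name
  arn      = aws_codebuild_project.glue_etl_ci.arn      # placeholder CodeBuild project
  role_arn = aws_iam_role.eventbridge_to_codebuild.arn  # role permitted to call codebuild:StartBuild

  input_transformer {
    input_paths = {
      referenceType  = "$.detail.referenceType"
      region         = "$.region"
      repositoryName = "$.detail.repositoryName"
      account        = "$.account"
      referenceName  = "$.detail.referenceName"
    }

    input_template = <<-TEMPLATE
      {"environmentVariablesOverride": [
        {"name": "REFERENCE_NAME", "value": <referenceName>},
        {"name": "REFERENCE_TYPE", "value": <referenceType>},
        {"name": "REPOSITORY_NAME", "value": <repositoryName>},
        {"name": "REPO_REGION", "value": <region>},
        {"name": "ACCOUNT_ID", "value": <account>}
      ]}
    TEMPLATE
  }
}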

Configure buildspec.yml to express the concrete steps of the CI/CD pipeline. In this example, the pipeline includes:

  • Install git-remote-codecommit, the Python dependencies required by the code (installed via a Makefile in this example), and the command-line tools needed by the pipeline (such as Terraform in this example)
  • Run the CI steps for the ETL scripts or .tf files, such as code quality checks, syntax checks, security vulnerability scanning, and unit tests
  • When the CI steps finish, synchronize the updated code in CodeCommit to the s3 path where the Glue content is stored; once s3 receives the updated code, run the following Terraform steps:
      • Terraform syntax checks ( terraform fmt, validate & lint )
      • Resource change check ( terraform plan )
      • Final release ( terraform apply )
The buildspec.yml for AWS CodeBuild is as follows:

version: 0.2

env:
  variables:
    TF_VERSION: "1.0.6"
    
phases:
  install:
    runtime-versions:
      python: 3.8
    commands:
      - pip install git-remote-codecommit
      - make install
  pre_build:
    commands:
      - echo Hello pre build
      - cd /usr/bin
      - "curl -s -qL -o terraform.zip https://releases.hashicorp.com/terraform/${TF_VERSION}/terraform_${TF_VERSION}_linux_amd64.zip"
      - unzip -o terraform.zip
      - cd -
  build:
    commands:
      - echo build
      - make format
      - make lint
      - make test
      - env
      - git clone -b $REFERENCE_NAME codecommit::$REPO_REGION://$REPOSITORY_NAME
      - dt=$(date '+%d-%m-%Y-%H:%M:%S');
      - echo "$dt" 
      - aws s3 sync . s3://sample-bucket-glueetl/hr-mysql-source1-python-scripts/
      - terraform init
      - terraform fmt -recursive
      - terraform validate
      - terraform apply -auto-approve
  post_build:
    commands:
      - echo post build 
      - echo "terraform fmt & validate apply completed on `date`"
      - echo "Makefile completed on `date`"

Upload the buildspec.yml file to the corresponding CodeCommit repository, create a CodeBuild project pointing to that repository, and use EventBridge as the event trigger to monitor CodeCommit content changes and deliver the events to CodeBuild, which completes the CI/CD pipeline. The architecture looks like this:

[Figure: EventBridge captures CodeCommit changes and triggers the CodeBuild CI/CD pipeline]
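If the CodeBuild project itself is also managed as code, a minimal sketch could look like the following (project name, service role, repository URL, and build image are placeholders):

resource "aws_codebuild_project" "glue_etl_ci" {
  name         = "glue-etl-ci"                       # placeholder project name
  service_role = aws_iam_role.codebuild_service.arn  # needs CodeCommit, s3 and Terraform-related permissions

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:5.0"
    type         = "LINUX_CONTAINER"
  }

  source {
    type      = "CODECOMMIT"
    location  = "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/sample-repo"  # placeholder repository URL
    buildspec = "buildspec.yml"
  }
}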

Precautions

  • To implement the above solution, pay attention to the access permissions between the Amazon services involved and make sure the required IAM role execution permissions are sufficient.
  • The method discussed in this article cannot fully automate resource creation for Glue scripts with different configurations; the data development team needs to call the corresponding Terraform module again and repeat the above process as needed.
  • The solution in this article only covers the scenario of using the Amazon Code family of services to manage code versions and releases; external code management components and CI/CD tools are not discussed further.

Summary

Through a concrete case, this article shows how data developers can call remote IaC modules through a Terraform Cloud Workspace and, combined with Amazon CodeCommit and EventBridge-driven Amazon CodeBuild, build a CI/CD pipeline that automatically captures changes to data application scripts and creates the corresponding cloud resources in batches. By automating the resource management and code release process for data applications, the cloud operations team is relieved of the management burden of added and changed code assets: it no longer needs to handle the extra workload caused by code changes in data applications, while the data development team can focus on developing and operating the ETL scripts without worrying about the downstream impact of code changes on cloud resources.


About the authors


Mao Yuanqi

Data Scientist on the Amazon Professional Services team, responsible for statistical learning, machine learning, data mining, and consulting on cloud data platform design. He has served industries including healthcare, finance, and autonomous driving, and has accumulated rich experience in development and operations.


Liang Yu

DevOps consultant on the Amazon Professional Services team, mainly responsible for DevOps technology implementation, with a particular passion for cloud-native services and related technologies. In his spare time, he enjoys sports and traveling with his family.

Article source: https://dev.amazoncloud.cn/column/article/6309c09ed4155422a4610a46?sc_medium=regulartraffic&sc_campaign=crossplatform&sc_channel=CSDN 
