[Continuous Delivery in Practice] For the last mile, you need a release platform built around quality!

Foreword

Release is the last mile of continuous delivery.
Traditionally, the final release of software is a stressful process that requires a lot of manual configuration, manual operation, and teamwork. To release reliably, developers prepare a detailed deployment document and hand the relevant information over to the operations staff who carry out the deployment. The operations staff then run a series of custom scripts to release, and after deployment the testers still have to do detailed manual verification.
Every step depends heavily on human judgment and communication, so the slightest slip can turn human error into a system failure, and both the duration and the outcome of a release are unpredictable. Staying up into the early hours after a release, racking your brains over how to get the freshly deployed application to work, and often having to roll back in the end: scenes like this are all too common.
To address this pain point, besides developing in small iterative steps to reduce delivery complexity, it is essential to have a release system that is easy to use, fast, stable, highly fault tolerant, and able to roll back quickly when needed. This article focuses on some of the best practices from building the release platform at WeDoctor (微医).

Core features

  • Covers mainstream application types: Java, Node.js, Python, PHP, Lua, Android, iOS, and so on
  • Batch release and gray (canary) release
  • Pause and resume of a release
  • Fast rollback to historical versions
  • Start, stop, and restart of applications
  • Aggregated log view across multiple application instances
  • Distributed release nodes
  • Quality gates before release
  • Real-time monitoring and analysis during release
  • Quality and efficiency metrics after release

Overall architecture


  • Typical application release process:
    • After a project is approved, the application is created in WCP application management. The basic information entered includes the development language, JDK version, build tool, deployment method, artifact path, code repository address, and application owner (application information is the basis of all R&D collaboration).
    • Once the application is created, resources are requested for it in WCP resource management, and the relationship between the application and its IPs/domains (covering the test, pre-release, and production environments, among others) is established automatically in the CMDB.
    • Once the application exists, the release platform automatically generates a release job and obtains the basic application and resource information. At release time, a gray release or a batch release can be triggered.
    • Before the release runs, the quality red lines are checked automatically (including the pipeline execution results and confirmation of the release checklist). A release that does not meet the quality red lines is refused.
    • During the release, the platform pauses automatically after the first batch or the gray release and triggers monitoring automatically. If monitoring fails, the release stops; if monitoring passes, the release can continue.
    • After the release, release data is collected and stored, then sent to the quality and efficiency dashboard for measurement (release success rate, release frequency, release duration).
    • If a release goes wrong, functions such as fast rollback are available, and aggregated queries over release logs and application logs help locate the problem quickly.
  • Release platform interface:

Pre-release quality gates

Quality is a built-in property of continuous delivery. In the last mile of release, automating the quality red-line gates through the release platform is one of the platform's key features.
In the pipeline, once developers commit code, unit tests, static code scanning, security testing, and integration testing are triggered automatically, forming a pipeline in which both development and testing take part.
Release gates are an important means of quality assurance. To achieve continuous delivery, we treat the results of the R&D pipeline as the quality red line (a manual release checklist can also be added), which keeps the whole continuous delivery process running smoothly.

 


Common strategies for automated quality gates:

  • R&D pipeline status
    • Unit test results
    • Unit test coverage
    • Static code checks
    • Integration test results
    • Security scan results
  • Release plan status (from the release plan management system)
    • Release window
    • Release review results, etc.
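
The checks listed above can be combined into a single gate decision. The following is a minimal sketch, not the platform's actual implementation: the `GateInput` fields, the `evaluate_quality_gate` function, and the 60% coverage threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateInput:
    """Hypothetical snapshot of what the platform knows before a release."""
    unit_tests_passed: bool
    unit_test_coverage: float        # fraction between 0.0 and 1.0
    static_check_blockers: int       # blocking findings from static analysis
    integration_tests_passed: bool
    security_high_risk_issues: int
    in_release_window: bool
    release_review_approved: bool


def evaluate_quality_gate(g: GateInput, min_coverage: float = 0.6) -> List[str]:
    """Return the violated checks; an empty list means the release may proceed."""
    violations = []
    if not g.unit_tests_passed:
        violations.append("unit tests failed")
    if g.unit_test_coverage < min_coverage:  # threshold is an assumed value
        violations.append("unit test coverage below threshold")
    if g.static_check_blockers > 0:
        violations.append("static check has blocking findings")
    if not g.integration_tests_passed:
        violations.append("integration tests failed")
    if g.security_high_risk_issues > 0:
        violations.append("security scan has high-risk issues")
    if not g.in_release_window:
        violations.append("outside the release window")
    if not g.release_review_approved:
        violations.append("release review not approved")
    return violations
```

Returning the full list of violations rather than a single boolean lets the platform tell the developer everything that must be fixed before the next attempt.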

Quality control during release

To safeguard system stability, every application is configured with corresponding dial-test (synthetic probe) monitoring points on the monitoring platform.
Reducing false alarms during a release, and preventing a major failure during the release process, requires the release platform and the monitoring platform to work together through a set of fine-grained control strategies.

  • Release scenario 1:
    • Description: When a new version is released and a service instance restarts, there is a certain probability that the restart triggers an alarm for that instance, which disturbs the application's developers unnecessarily.
    • Strategy: After the release platform has built and packaged the application and before it performs the actual release, it calls the monitoring platform API to stop the application's monitoring tasks. After the release, it calls the monitoring platform API again to re-enable them.
  • Release scenario 2:

    • Description: While application instances are being released, a system failure occurs for some reason (for example a deployment error, or an obvious bug in the new version).
    • Strategy: Use a batch release strategy. As soon as each instance has been released, monitoring is triggered for that instance. If monitoring detects an anomaly, the batch is marked as failed and the release of all subsequent batches is forcibly aborted, so that no further instances are affected.
  • Logic flow

    It is worth emphasizing that at WeDoctor, dial-test monitoring coverage is an important team metric, because the release platform can only monitor and intervene effectively during a release once monitoring points have been configured for the application.
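
The two strategies above can be sketched together as follows. This is an illustration under assumptions, not the platform's code: `FakeMonitorClient` stands in for the monitoring platform API, and its `pause_alerts`, `resume_alerts`, and `probe` methods are hypothetical names.

```python
from contextlib import contextmanager


class FakeMonitorClient:
    """Stand-in for the monitoring platform API (hypothetical interface)."""

    def __init__(self, unhealthy=()):
        self.unhealthy = set(unhealthy)
        self.calls = []

    def pause_alerts(self, app):
        self.calls.append(("pause", app))

    def resume_alerts(self, app):
        self.calls.append(("resume", app))

    def probe(self, instance):
        """Run the dial test for one instance; True means healthy."""
        self.calls.append(("probe", instance))
        return instance not in self.unhealthy


@contextmanager
def alerts_paused(monitor, app):
    """Scenario 1: silence routine alerts for the duration of the release."""
    monitor.pause_alerts(app)
    try:
        yield
    finally:
        monitor.resume_alerts(app)  # re-enabled even if the release fails


def release_in_batches(app, batches, deploy, monitor):
    """Scenario 2: probe each instance right after deploying it and abort
    all remaining work on the first failure.
    Returns (succeeded, instances_released_so_far)."""
    released = []
    with alerts_paused(monitor, app):
        for batch in batches:
            for instance in batch:
                deploy(instance)
                released.append(instance)
                if not monitor.probe(instance):
                    return False, released
    return True, released
```

Because the probe runs per instance, a bad version is caught after touching a single machine instead of after the whole batch.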

     

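Post-release quality and efficiency metrics

Quality and efficiency measurement is an important part of the R&D collaboration platform. The main indicators are broken down along three dimensions: R&D quality, R&D efficiency, and R&D cost.

Data produced during the release process is fed into the quality and efficiency measurement system for analysis. The key items include:

  • Release frequency
  • Release duration
  • Release success rate, etc.

All teams and developers can use these release metrics to identify their own problems and weak spots and make effective improvements.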
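
As a rough illustration of how the three metrics could be derived from stored release records, here is a minimal sketch. The `ReleaseRecord` structure and the `release_metrics` function are hypothetical; the article does not describe the real data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ReleaseRecord:
    """One finished release, as a metrics system might store it (assumed shape)."""
    app: str
    started_at: datetime
    finished_at: datetime
    succeeded: bool


def release_metrics(records: List[ReleaseRecord], period_days: int) -> Dict[str, float]:
    """Compute release frequency, average duration, and success rate
    for the records collected over `period_days` days."""
    total = len(records)
    if total == 0:
        return {"releases_per_week": 0.0, "avg_duration_seconds": 0.0, "success_rate": 0.0}
    durations = [(r.finished_at - r.started_at).total_seconds() for r in records]
    return {
        "releases_per_week": total * 7 / period_days,
        "avg_duration_seconds": sum(durations) / total,
        "success_rate": sum(1 for r in records if r.succeeded) / total,
    }
```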

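Batch release

Batch release deploys the application in batches, upgrading only some of its instances at a time. If a failure occurs during a batch release, the release is terminated and rolled back, then run again once the problem has been fixed.
We use a fairly simple batch allocation algorithm. Because the company currently runs two IDC data centers, when an application is released in batches the first batch picks one random instance in each of the two data centers, and all remaining instances go into the second batch.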
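
The allocation rule described above can be sketched in a few lines. The `plan_batches` function and the `(ip, idc)` input shape are assumptions made for illustration; the sketch also generalizes to any number of data centers, which the article does not claim.

```python
import random
from collections import defaultdict
from typing import List, Optional, Tuple


def plan_batches(
    instances: List[Tuple[str, str]],
    rng: Optional[random.Random] = None,
) -> List[List[str]]:
    """Split (ip, idc) pairs into release batches.

    The first batch holds one randomly chosen instance per data center;
    every other instance goes into the second batch. Empty batches are dropped.
    """
    if rng is None:
        rng = random.Random()
    by_idc = defaultdict(list)
    for ip, idc in instances:
        by_idc[idc].append(ip)
    # Sort by data center name so the first batch has a stable order.
    first = [rng.choice(ips) for _, ips in sorted(by_idc.items())]
    chosen = set(first)
    second = [ip for ip, _ in instances if ip not in chosen]
    return [batch for batch in (first, second) if batch]
```

Choosing one instance per data center means the first batch exercises the new version in both network environments while exposing only a small share of traffic to it.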

 


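If release pause is selected, the release pauses after the first batch. The remaining instances are released only after someone has manually confirmed that the first-batch instances are healthy, which effectively safeguards release stability.
During a release, the run log can be viewed in real time on the release platform. If a problem is found, you can pause, cancel, or roll back at any time.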

 

 

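  • Best practice

While an instance is being deployed, no requests may be dispatched to it; otherwise users will see 502 errors. An "offline" operation is therefore needed to remove the current machine from the load balancer. After deployment completes, the machine is attached to the load balancer again, a step called "online".
To achieve this, you can build your own Nginx gateway on OpenResty to schedule instances online and offline in real time. The implementation of such a gateway is fairly complex, so it is not covered in detail here.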
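
The offline, deploy, online sequence can be sketched as follows. The gateway itself is out of scope here, so `InMemoryGateway` is a hypothetical stand-in for its admin interface, and the health check before going back online is an added assumption rather than something the article specifies.

```python
from typing import Callable, Iterable


class InMemoryGateway:
    """Hypothetical stand-in for the gateway's upstream admin interface."""

    def __init__(self, instances: Iterable[str]):
        self.in_rotation = set(instances)

    def take_offline(self, instance: str) -> None:
        self.in_rotation.discard(instance)

    def bring_online(self, instance: str) -> None:
        self.in_rotation.add(instance)


def deploy_with_drain(
    gateway: InMemoryGateway,
    instance: str,
    deploy: Callable[[str], None],
    health_check: Callable[[str], bool],
) -> bool:
    """Remove the instance from rotation, deploy, and re-attach it only if healthy.

    An instance whose deployment raises or whose health check fails stays
    offline, so users are never routed to a broken node.
    """
    gateway.take_offline(instance)
    deploy(instance)
    if not health_check(instance):
        return False
    gateway.bring_online(instance)
    return True
```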

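Problem response

When something unexpected happens during a release, the release platform provides some commonly used functions that let developers locate and handle problems, while also keeping developers from logging in to servers directly as far as possible.

  • View logs

This mainly integrates with the company's unified log platform. Application logs can be viewed in real time, and logs from multiple instances are aggregated, which spares you the pain of switching back and forth between instances to hunt for a problem.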
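
To illustrate what an aggregated multi-instance view involves, here is a minimal sketch that merges per-instance log streams by timestamp. The `merge_instance_logs` function and the `(timestamp, message)` input shape are assumptions; the real log platform is not described in the article.

```python
import heapq
from typing import Dict, Iterator, List, Tuple


def merge_instance_logs(
    logs_by_instance: Dict[str, List[Tuple[str, str]]],
) -> Iterator[str]:
    """Merge per-instance logs into one time-ordered stream.

    Each value is a list of (iso_timestamp, message) pairs that is already
    sorted by time. ISO 8601 timestamps in the same format and time zone
    sort correctly as plain strings.
    """
    streams = [
        [(ts, instance, msg) for ts, msg in lines]
        for instance, lines in logs_by_instance.items()
    ]
    # heapq.merge lazily merges already-sorted inputs.
    for ts, instance, msg in heapq.merge(*streams):
        yield f"{ts} [{instance}] {msg}"
```

Tagging every line with its instance keeps the origin visible after merging, which is what removes the need to switch between instances.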

 

 

  • Restart or stop an instance

When an instance fails, it can be quickly restarted or stopped.

 

 

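  • Fast rollback

The release platform keeps a backup of every released version, so when a problem is found in a new version you can quickly roll back to a historical version.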
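
One way to picture the backup bookkeeping behind rollback is a bounded version history per application. This is only a sketch under assumptions: the `VersionStore` class, its method names, and the retention limit are hypothetical, and the article does not say how many backups are kept.

```python
from typing import Dict, List


class VersionStore:
    """Keeps the most recent released versions per application (assumed design)."""

    def __init__(self, keep: int = 5):
        if keep < 2:
            raise ValueError("keep must be at least 2 to allow a rollback")
        self.keep = keep
        self._history: Dict[str, List[str]] = {}

    def record_release(self, app: str, version: str) -> None:
        """Append the new version and drop backups beyond the retention limit."""
        history = self._history.setdefault(app, [])
        history.append(version)
        del history[:-self.keep]

    def rollback_target(self, app: str, steps_back: int = 1) -> str:
        """Return the version released `steps_back` releases before the current one."""
        history = self._history.get(app, [])
        if steps_back < 1 or len(history) <= steps_back:
            raise LookupError(f"no backup {steps_back} release(s) back for {app}")
        return history[-1 - steps_back]
```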

 

 

Jenkins Pipeline

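Within the release platform, Jenkins Pipeline provides the underlying core capabilities: build, packaging, deployment, and distributed scheduling. However, to schedule release operations more flexibly and to manage the relationship between applications and release jobs, we abandoned Jenkins's own UI and instead treat Jenkins as a base engine that the release platform drives through the Jenkins API.
The shared library feature of Jenkins Pipeline lets us manage release scripts under version control by programming in Groovy, so we no longer have to worry about how to manage a pile of messy shell scripts.
Only part of the skeleton code is shown here; for details on using Jenkins shared libraries, see the earlier articles in this series.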

import groovy.json.JsonSlurper

// Entry point of the shared-library step.
def call(Map map) {
    pipeline {
        agent any
        parameters {
            // Java application parameters
            string(name: 'BUILD_TOOL', defaultValue: 'maven', description: 'Build tool')
            string(name: 'MAVEN_VERSION', defaultValue: 'maven3', description: 'Maven build tool version')
            string(name: 'GRADLE_VERSION', defaultValue: 'Gradle3.3', description: 'Gradle build tool version')
            string(name: 'WAR_RELATIVE_PATH', defaultValue: '', description: 'WAR package path')
            string(name: 'WAR_STD_NAME', defaultValue: '', description: 'Standard WAR package name')
            string(name: 'POM_RELATIVE_PATH', defaultValue: '/pom.xml', description: 'POM file path')
            string(name: 'HAS_TEMPLATES', defaultValue: 'false', description: 'Whether template files exist')
            string(name: 'TEMPLATES_RELATIVE_PATH', defaultValue: '', description: 'Template file path')
            string(name: 'JETTY_VERSION', defaultValue: '', description: 'Jetty version')
            string(name: 'GRADLE_TASK', defaultValue: 'war', description: 'Gradle task used for packaging')
            // ......
        }
        tools {
            gradle "${params.GRADLE_VERSION}"
            jdk "${params.LANGUAGE_VERSION}"
            maven "${params.MAVEN_VERSION}"
        }
        stages {
            stage('Deploy to production') {
                steps {
                    script {
                        def pmap = [:]

                        try {
                            // Pass the application parameters on to the helper steps
                            pmap.put('BUILD_TOOL', BUILD_TOOL.trim())
                            pmap.put('WAR_RELATIVE_PATH', WAR_RELATIVE_PATH.trim())
                            pmap.put('WAR_STD_NAME', WAR_STD_NAME.trim())
                            pmap.put('POM_RELATIVE_PATH', POM_RELATIVE_PATH.trim())
                            pmap.put('HAS_TEMPLATES', HAS_TEMPLATES.trim())
                            pmap.put('TEMPLATES_RELATIVE_PATH', TEMPLATES_RELATIVE_PATH.trim())
                            pmap.put('JETTY_VERSION', JETTY_VERSION.trim())
                            pmap.put('GRADLE_TASK', GRADLE_TASK.trim())
                            // ......
                        } catch (MissingPropertyException ex) {
                            println("Catching the MissingPropertyException " + ex.messageWithoutLocationText)
                            // ......
                        }
                        pmap = Utils_EnvConfig(pmap)
                        // ACTION, monitorApiDomain, appIpName and monitorTimeOut are not defined in this excerpt.
                        // Monitoring scheduling before the release
                        Utils_Monitor(pmap.isRestartMonitor, monitorApiDomain, appIpName, monitorTimeOut, true)
                        switch (ACTION) {
                            case "package":
                                java_package(pmap)
                                break
                            case "copy":
                                java_copy(pmap)
                                break
                            case "start":
                            case "restart":
                            case "stop":
                            case "kill":
                                // The original excerpt routes all four actions to java_start
                                java_start(pmap)
                                break
                            case "rollback":
                                java_rollback(pmap)
                                break
                            case "backup":
                                java_backup(pmap)
                                break
                            default:
                                echo "pipeline do nothing please choose one step (package,copy,start,rollback,restart,stop,kill,backup)"
                        }
                        // Monitoring scheduling after the release
                        Utils_Monitor(pmap.isRestartMonitor, monitorApiDomain, appIpName, monitorTimeOut, false)
                    }
                }
            }
        }
    }
}
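
Since the release platform drives Jenkins through its API rather than its UI, a platform-side trigger might look roughly like the sketch below. It relies on Jenkins's standard `buildWithParameters` remote endpoint and HTTP Basic authentication with an API token; the server URL, job name, and credentials are placeholders, and `build_trigger_request` is a hypothetical helper, not the platform's code.

```python
import base64
import urllib.parse
import urllib.request
from typing import Dict


def build_trigger_request(
    jenkins_url: str,
    job_name: str,
    params: Dict[str, str],
    user: str,
    api_token: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a parameterized job."""
    query = urllib.parse.urlencode(params)
    url = (
        f"{jenkins_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}"
        f"/buildWithParameters?{query}"
    )
    request = urllib.request.Request(url, data=b"", method="POST")
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    return request


# Example with placeholder values; ACTION and BUILD_TOOL mirror names used
# in the pipeline excerpt above.
example = build_trigger_request(
    "https://jenkins.example.com",
    "demo-app-release",
    {"ACTION": "package", "BUILD_TOOL": "maven"},
    user="release-bot",
    api_token="not-a-real-token",
)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a running Jenkins server.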

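Conclusion

A good release platform can act as the last line of defense: it performs the final check on the product delivered to online users and keeps software versions that do not meet the requirements out. A well-built automated release platform is also often more effective than any number of written release policies.
We began refactoring this platform in early 2019. It has gradually evolved from a release tool that simply drove shell scripts into a release platform with many built-in quality and efficiency features, and we learned a great deal along the way. This is thanks partly to the advanced engineering ideas of continuous delivery, and partly to building on Jenkins Pipeline and the large amount of mature infrastructure accumulated in-house, which let us get twice the result for half the development effort.
Of course, the platform still has many capabilities that can be strengthened, such as gray release and smarter analysis before and after a release based on more quality indicators. These are all being planned or are already in progress.
The Mid-Autumn Festival is approaching, and I had some spare time to write this article to share with everyone. Happy Mid-Autumn Festival!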

Origin www.cnblogs.com/cay83/p/11512792.html