Build a flexible and highly available continuous integration environment based on Jenkins, Apache Mesos and Marathon

Abstract: Continuous Integration (CI) is a software development practice. Used properly, it greatly improves development efficiency and helps safeguard software quality. This article discusses how to use Jenkins, Apache Mesos and Marathon to build an elastic, highly available continuous integration environment.

 

[Editor's note] Continuous integration is currently a hot topic in software development. In this article, Zhou Weitao, head of the cloud platform at Shuren Technology, explains how to use the open source projects Jenkins, Apache Mesos and Marathon to build an elastic, highly available continuous integration environment. The walkthrough covers the environment settings in detail, deploying a Jenkins master instance on Marathon, configuring the Jenkins Master for elastic scaling, creating a git repo on an internal code base or github, and using marathon to deploy a persistent Jenkins Master.


Continuous Integration (CI) is a software development practice. Used properly, it greatly improves development efficiency and helps safeguard software quality. Jenkins is an open source project that provides an easy-to-use continuous integration system. Mesos is an open source unified resource management and scheduling platform under Apache, often called the kernel of distributed systems. Marathon is a framework registered on Apache Mesos for managing long-running applications. If Mesos is compared to the kernel of the data center, then Marathon is its init or upstart daemon.

This article aims to explore how to use Jenkins, Apache Mesos and Marathon to build a flexible, highly available continuous integration environment.

Why run Jenkins on Apache Mesos

The main purpose of running Jenkins on Apache Mesos, that is, of using Apache Mesos to provide slave resources to Jenkins, is to take advantage of Mesos' elastic resource allocation to improve resource utilization. With the Jenkins-on-Mesos plug-in configured, the Jenkins Master can dynamically request slave nodes from Mesos as needed when jobs are built, and return those nodes to Mesos a period of time after the builds finish.

At the same time, Marathon runs health checks on the applications published on it and automatically restarts an application that crashes unexpectedly. Using Marathon to manage the Jenkins Master therefore keeps the build system as a whole highly available. Moreover, the Jenkins Master itself is deployed and run in the Mesos resource pool through Marathon, which further shares resources and improves utilization.

The following two figures illustrate the Marathon deployment of the Jenkins Master to the Mesos resource pool, and the entire process of the Jenkins Master using the Mesos resource pool to build jobs.

Environment settings

For ease of understanding, I have simplified the architecture of the Mesos/Marathon cluster here and do not consider the high availability of the cluster itself. For how to use zookeeper to configure a highly available mesos/marathon cluster, refer to the official Mesosphere documentation; it is not covered here.

I built a Mesos cluster of 40 nodes, 192.168.3.4 to 192.168.3.43. One node runs Marathon and Mesos-master, and the other 39 nodes are Mesos slaves, as shown below.

Once Marathon, Mesos-Master and Mesos-Slave are configured and started, all of the operations below are carried out on this cluster.

Deploy the master instance of Jenkins on Marathon

Marathon can publish applications either through its web page or through its REST API. Run the following bash command on the 192.168.3.* intranet, and a Jenkins master instance will be started on a Mesos slave through Marathon's REST API.
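The original bash command is not reproduced in this copy of the article. As a rough stand-in, here is a minimal Python sketch of what publishing an application through Marathon's /v2/apps endpoint can look like. The Marathon address is the one used in this article; the start command, the resource sizes and the health check values are assumptions chosen for illustration, not the author's actual settings.

```python
import json
import urllib.request

# Marathon's address as used in this article; adjust for your own cluster.
MARATHON_APPS_URL = "http://192.168.3.4:8080/v2/apps"


def build_jenkins_app(app_id="jenkins", cpus=1.0, mem_mb=1024):
    """Return a Marathon app definition as a plain dict.

    The command is a placeholder: the article does not show how Jenkins is
    launched, so replace it with your own start script.
    """
    return {
        "id": app_id,
        "cmd": "./start-jenkins.sh",  # hypothetical start script
        "cpus": cpus,                 # assumed size, not from the article
        "mem": mem_mb,                # assumed size in MB, not from the article
        "instances": 1,
        # Ask Marathon to probe the Jenkins web UI and restart it on failure.
        # All timing values here are illustrative assumptions.
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/",
                "portIndex": 0,
                "gracePeriodSeconds": 300,
                "intervalSeconds": 30,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


def build_request(app, url=MARATHON_APPS_URL):
    """Build (but do not send) the POST request that creates the app."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deploy(app, url=MARATHON_APPS_URL):
    """Send the request to Marathon and return the HTTP status code."""
    with urllib.request.urlopen(build_request(app, url), timeout=10) as response:
        return response.status
```

Calling deploy(build_jenkins_app()) from a machine on the intranet would submit the definition. Keeping the request construction separate from the network call makes it easy to inspect the JSON before anything is sent.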

If the Jenkins master instance is deployed successfully, visit http://192.168.3.4:8080 in a browser (make sure your browser can reach the intranet, for example by setting a browser proxy). Find jenkins in the list of running tasks and click it to open the details page, where we will see the following picture:

Visit http://192.168.3.4:5050/#/frameworks, find Marathon under Active Frameworks, and click it to open the details page. This page shows which Mesos slave the Jenkins Master is running on, as shown in the following figure:

Click on sandbox

Configure Jenkins Master to achieve elastic scaling

The next step is to configure Jenkins to register as a Mesos framework. Visit http://192.168.3.25:31052/ in a browser to reach the Jenkins Master UI. The screenshots below show my step-by-step configuration.

1. Click "System Settings" under "System Management" (labelled "Configure System" under "Manage Jenkins" in the English UI)

2. Set the Mesos Master to 192.168.3.4:5050 and click "Test Connection". Once the connection is reported as successful, click "Apply" to save the settings.


Once Jenkins has registered with Mesos successfully, visit http://192.168.3.4:5050/#/frameworks and we can find the jenkins framework, as shown in the following figure:
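The same check can be scripted instead of done in the browser. The sketch below assumes that the Mesos master exposes its state as JSON at /master/state.json with a top-level "frameworks" list whose entries carry a "name" field; the exact path differs between Mesos versions, so treat it as an assumption to verify against your own master. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_master_state(master="192.168.3.4:5050"):
    """Download the master's state document (endpoint path is an assumption)."""
    url = "http://%s/master/state.json" % master
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def framework_names(state):
    """Return the sorted names of all frameworks listed in a state document."""
    return sorted(fw.get("name", "") for fw in state.get("frameworks", []))


def is_registered(state, keyword):
    """True if any framework name contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return any(keyword in name.lower() for name in framework_names(state))
```

For example, is_registered(fetch_master_state(), "jenkins") should be true once the registration above has succeeded, under the stated assumptions about the endpoint.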

Now we can start several build jobs at the same time to watch Jenkins scale elastically on Mesos. Create a new project named test at http://192.168.3.25:31052/ and configure its build step to run the shell command top, as shown below:

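Make three copies of this project, named test2, test3 and test4, and start the build jobs of all four projects at the same time. The Jenkins Master requests resources from Mesos; if the allocation succeeds, the Jenkins Master runs the builds on the slave nodes it obtained, as shown in the following figure:

Because we set the number of executors to 2 in the system configuration earlier (that is, at most two jobs can build at the same time), the figure above shows two jobs building while the other two wait in the queue.

The following figure shows that the current Jenkins builds use 0.6 CPU and 1.4 GB of memory in total:

Details of the slave nodes in use:

Configure Jenkins Slave parameters (optional)

When building projects with Jenkins, we often face a situation where different jobs have different resource requirements: some jobs need to run on highly provisioned slave machines, while others do not. To improve resource utilization, we clearly need a way to allocate different resources to different jobs. By setting the slave info of the Jenkins Mesos Cloud plugin, we can easily meet this requirement. The specific configuration is shown in the following figure: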
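The plugin itself is configured through the Jenkins UI, and the screenshot with the author's values is not available here. Purely to illustrate the idea of mapping job labels to differently sized slaves, here is a small hypothetical model in Python. The class, the labels and every number are made up for this sketch and are not the plugin's API or the author's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveProfile:
    """A hypothetical resource profile that a job selects by label."""
    label: str
    cpus: float
    mem_mb: int


# Illustrative profiles only; choose sizes that match your own jobs.
PROFILES = [
    SlaveProfile(label="small", cpus=0.5, mem_mb=512),
    SlaveProfile(label="large", cpus=4.0, mem_mb=8192),
]


def profile_for(job_label, profiles=PROFILES):
    """Return the profile whose label matches the job's label."""
    for profile in profiles:
        if profile.label == job_label:
            return profile
    raise KeyError("no slave profile with label %r" % job_label)
```

A job restricted to the label large would then be built on a slave with the larger allocation, while lightweight jobs keep to the small one and leave more of the pool free.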

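At this point we use Mesos to provide resources to Jenkins elastically and configure the Jenkins Slave parameters to meet the resource needs of different jobs, which improves the overall resource utilization of the cluster. In addition, because Marathon automatically checks the health of the apps running on it and redeploys crashed applications, the system gains partial high availability. Next, let's look at how to solve the data persistence problem.

How to solve the data persistence problem of the Jenkins Master

Marathon redeploys the Jenkins Master to some Mesos slave node after an unexpected crash, but Marathon cannot preserve the application's data, so we need a way to persist the Jenkins Master's data. Because the Jenkins Master stores its data in XML files rather than in a database, we can use the Jenkins plugin SCM Sync configuration plugin to synchronize the Jenkins Master's data to a corresponding repo.

Create a git repo on the internal code base or github

We need to create a git repo named jenkins-on-mesos on the internal code base or a public one, for example [email protected]:wtzhou/jenkins-on-mesos.git. This repo is what the Jenkins plugin SCM Sync configuration plugin uses to synchronize Jenkins data.

In addition, a critical step for SCM-Sync-Configuration is making sure it has permission to pull from and push to the git repo created above. Taking our company's internal environment as an example: when building the Mesos cluster, we first used ansible to add the user core to all Mesos slave nodes and to generate the same ssh keypair on each. We also registered the user core on the internal gitlab and uploaded its public key from the slave nodes, then added core as a developer or owner of the repo [email protected]:wtzhou/jenkins-on-mesos.git. This way every Mesos slave node can pull from and push to the git repo as the user core.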

Use marathon to deploy a persistent Jenkins Master

We first need to wget two files:

Of these, start-jenkins.app.sh needs to be configured.

Edit the following 3 variables:

1. SCM_SYNC_GIT: the address of the git repo configured above, for example [email protected]:wtzhou/jenkins-on-mesos.git

2. APP_USER: marathon deploys jenkins as the user APP_USER, so the SCM-Sync-Configuration plugin synchronizes with the git repo as APP_USER. In our example we therefore set APP_USER=core.

3. MARATHON_PORTAL: the entry point of marathon's REST API, for example http://marathon.dataman.io:8080/v2/apps
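Before running the deployment it can help to sanity-check the three values. The following Python sketch is one hedged way to do that; the function name and the rules it applies are my own assumptions about what a reasonable value looks like, not checks performed by start-jenkins.app.sh.

```python
import re
from urllib.parse import urlparse

# Matches SSH-style git addresses such as user@host:group/repo.git
SSH_GIT_URL = re.compile(r"^[\w.-]+@[\w.-]+:[\w./-]+\.git$")


def check_settings(scm_sync_git, app_user, marathon_portal):
    """Return a list of human-readable problems; empty means all looks fine."""
    problems = []
    if not SSH_GIT_URL.match(scm_sync_git):
        problems.append("SCM_SYNC_GIT should look like user@host:group/repo.git")
    if not app_user or " " in app_user:
        problems.append("APP_USER must be a non-empty user name without spaces")
    parsed = urlparse(marathon_portal)
    if parsed.scheme not in ("http", "https") or not parsed.path.endswith("/v2/apps"):
        problems.append("MARATHON_PORTAL should be an http(s) URL ending in /v2/apps")
    return problems
```

For instance, with a made-up repo address, check_settings("git@git.example.com:team/jenkins-on-mesos.git", "core", "http://marathon.dataman.io:8080/v2/apps") returns an empty list.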

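Next, we can run the command:

to have marathon deploy our Jenkins Master. From then on, any configuration saved on the Jenkins Master and any job created there is synchronized to the repo by SCM-Sync-Configuration, and downloaded to the local node again after the Jenkins Master is redeployed.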

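More about SCM-Sync-Configuration

After SCM-Sync-Configuration finishes initializing (in our environment the initialization is triggered automatically), every time the configuration is updated or a build job is added or edited, we get a prompt page for adding a comment to the new commit message, as shown in the following figure:

Currently, the supported configuration files are:

1. Configuration files of build jobs (/jobs/*/config.xml)

2. The global Jenkins/Hudson system configuration file (/config.xml)

3. Configuration files of basic plugins (/hudson*.xml, /scm-sync-configuration.xml)

4. Configuration files specified manually by the user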
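To make the patterns in items 1 to 3 concrete, here is a small Python sketch that tests whether a path relative to the Jenkins home directory falls under one of them. It only models the patterns listed above and is not the plugin's own matching code; manually specified files (item 4) are left out, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the list above, relative to the Jenkins home directory.
SYNCED_PATTERNS = [
    "jobs/*/config.xml",
    "config.xml",
    "hudson*.xml",
    "scm-sync-configuration.xml",
]


def matches(path, pattern):
    """Match component by component so '*' never crosses a '/' boundary."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


def is_synced(path, patterns=SYNCED_PATTERNS):
    """True if the path matches at least one of the listed patterns."""
    return any(matches(path, pattern) for pattern in patterns)
```

Under this model, the config.xml of the test project created earlier (jobs/test/config.xml) would be synchronized, while a build record such as jobs/test/builds/1/build.xml would not.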

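In addition, the status of scm sync config is shown at the bottom of every page. The figure below is a screenshot taken when synchronization failed; you can check the System Log for the specific error message.

With this, we have also solved the Jenkins Master's data persistence problem. At this point we have truly finished building an elastic, highly available continuous integration environment based on Jenkins, Apache Mesos and Marathon.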

 

 

 
