A Continuous Integration System

Malini Das is a software engineer who is passionate about developing quickly (but safely!) and solving cross-functional problems. She previously worked at Mozilla as a tools engineer and now works at Twitch. You can follow Malini on Twitter or on her blog.

What is a continuous integration system?

In software development, we need a way to ensure that every new feature works correctly and that every bug fix stays fixed. The usual way is to test the code. In most cases developers test directly in their own development environment to make sure a feature is complete and stable, but few people have the time to test in every possible runtime environment. Moreover, as development continues the number of required tests keeps growing, and running the full test suite in a development environment becomes less and less practical. Continuous integration systems exist precisely to resolve this dilemma.

A continuous integration (CI) system is a system designed to test new code. When a new piece of code is committed, the CI system's job is to verify that it does not break the existing tests. To do this, the system must be able to fetch the newly changed code, run the tests automatically, and generate a report. It also needs to be resilient: if any part of the system fails or crashes, the whole system should be able to pick up from where it was interrupted.

The system also needs to handle load, so that we still get test results within a reasonable time even when commits arrive faster than the tests can run. We can achieve this by distributing the testing work across several workers and running them in parallel. This project introduces a small, minimal, and scalable distributed continuous integration system.

Caveats and notes

In this project, Git is used as the hosted code repository that we test against. Only standard source-control commands are used, so if you are not familiar with Git but are familiar with another version control system (VCS) such as svn or Mercurial, you can still follow along.

Due to length limitations and the requirements of unit testing, I simplified the test-discovery mechanism: we only run the tests found in a directory named tests.

Normally a continuous integration system would monitor a remote hosted repository. For convenience, in our example the system monitors a local repository instead of a remote one.

Continuous integration systems do not have to run on a fixed schedule; you can also have them run on every commit, or every few commits. In our example, the CI system runs periodically. If we configure it to run every 5 seconds, then every 5 seconds it tests the most recent commit made since the last check. No matter how many commits occur within those 5 seconds, the system only tests the latest one.

CI systems are designed to respond to changes in a repository. Real-world CI systems can be told about commits by the repository itself; for example, GitHub provides "post-commit hooks" which send a notification to a URL you configure, and the CI system listening at that URL then wakes up and responds. Since that model is awkward to reproduce in our local experimental setup, we use an observer model instead, in which the system actively checks for repository changes rather than waiting to be notified.

A CI system also needs a way to report results (for example, a web page): the test runners submit their results to a results component, and anyone involved in the project can then look the results up directly.

Note that this project covers just one of many possible CI architectures. Within that architecture, we reduce the project to three main components.

Introduction

The most basic continuous integration system has three parts: a repository listener, a test dispatcher, and a test runner. The listener watches the repository; when a commit occurs, it notifies the dispatcher. The dispatcher then assigns a test runner to run the tests against that commit ID.

There are many ways to arrange these three parts. We could run them all in the same process on a single machine, but then our CI system could not handle much load: a burst of commits bringing a large amount of test work would quickly cause a backlog. That arrangement is also not fault tolerant at all; if the machine it runs on fails or loses power, there is no fallback system to pick up the interrupted work. We want a CI system that runs as many test jobs in parallel as the workload demands, and that has a sensible fallback when machines go down unexpectedly.

To build a CI system that can take load and tolerate failures, in this project each of these components runs as its own process. The processes are completely independent of one another, and several instances of each one can run at the same time. This helps greatly when a lot of test work has to happen at once: we can run multiple test runner instances in parallel, each working independently, which effectively clears any backlog in the test queue.

In this project these components run as separate processes that communicate over sockets, which means we could just as well run them on different machines across a network. Each process is assigned a host/port address, and the processes talk to one another by sending messages to those addresses.

With a distributed architecture, we can deal with hardware failures as they happen. We can run the listener, the dispatcher, and the test runners on different machines that communicate over the network; when any one of them has a problem, we can bring a new host online to run the affected process. This gives the system a high degree of fault tolerance.

This project does not include automatic recovery code, since that depends on the architecture of the distributed system you deploy on. In practice, CI systems usually run on distributed infrastructure that supports failover: when one machine fails, a designated standby machine automatically takes over the interrupted work.

In order to facilitate testing of our system, in this project we will manually trigger some processes locally to simulate a distributed environment.

Project file structure

Each component of the project is a Python file: the repository listener (repo_observer.py), the test dispatcher (dispatcher.py), and the test runner (test_runner.py). These three processes communicate with one another over sockets, and the code that implements this communication lives in helpers.py, so each component can import it from there instead of repeating it.
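
helpers.py itself is not reproduced in this article, but the heart of it is a single communicate function. A minimal sketch of what such a function needs to do, assuming plain blocking TCP sockets, looks roughly like this:

import socket

def communicate(host, port, request):
    # Open a TCP connection to the given host/port, send the request
    # string, and return whatever single response the peer sends back.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, int(port)))
    s.send(request)
    response = s.recv(1024)
    s.close()
    return response

All of the status, register, dispatch, ping, runtest, and results messages seen later in the article travel through a call like this.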

In addition, we also used bash scripts. These scripts are used to perform some simple bash and git operations. It is more convenient to directly use bash scripts than to use system-level modules (such as os or subprocess) provided by Python.

Finally, we created a tests directory holding the test cases we want the CI system to run. It contains two samples: one that passes and one that fails.
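
The exact contents of those samples are not important; anything that unittest discovery can find (files matching test*.py) will do. As a purely hypothetical illustration, the cases could be as small as:

import unittest

class TestItPasses(unittest.TestCase):
    # Simulates a change whose tests succeed.
    def test_always_passes(self):
        self.assertTrue(True)

class TestItFails(unittest.TestCase):
    # Simulates a change whose tests fail.
    def test_always_fails(self):
        self.assertEqual(1, 2)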

Default setup

Although our CI system is meant to work in a distributed setting, we will run all of its components on the same computer so that network issues do not get in the way while we study how it works. If you want to try a distributed setup, you can of course run each component on a different host.

The continuous integration system triggers tests by monitoring code changes, so before starting we need to set up a code base for monitoring.

Let's call the repository we are going to test test_repo:

$ mkdir test_repo 
$ cd test_repo 
$ git init

The listener module monitors code updates by checking commits, so we need at least one commit to test the listener module.

Copy the tests folder into test_repo and commit it:

$ cp -r /this/directory/tests /path/to/test_repo/
$ cd /path/to/test_repo
$ git add tests/
$ git commit -m "add tests"

We now have a commit ready for testing on the master branch of our testing repository.

The listener component requires a separate copy of the code to detect new commits. Let's make a copy of the code from the master branch and name it test_repo_clone_obs:

$ git clone /path/to/test_repo test_repo_clone_obs

The test runner also needs a copy of the code so that it can run the relevant tests when a commit occurs. We also make a copy of the code from the master branch and name it test_repo_clone_runner:

$ git clone /path/to/test_repo test_repo_clone_runner

Components

Listener (repo_observer.py)

The listener's job is to watch the repository and notify the dispatcher when a change appears. To keep our CI system compatible with any version control system (not all VCSs have built-in notification hooks), the listener periodically checks the repository for new commits rather than waiting for the VCS to send a notification when a commit is made.

The listener polls the repository periodically, and when a new commit appears it pushes the commit ID that needs testing to the dispatcher. The polling procedure is: first, record the commit that the listener's own clone currently points to; next, update the clone by pulling from the main repository; finally, compare the new latest commit ID with the recorded one. If they differ, a new commit has been made. In our CI system the listener only pushes the most recent commit to the dispatcher, which means that if two commits happen within one polling interval, only the later one is tested. A full CI system would usually test every commit made since the last run, but for simplicity ours only tests the latest one.

The listener has to know which repository to observe. We already created a clone of the repository at /path/to/test_repo_clone_obs for this purpose; the listener uses it to detect changes. To let the listener use this clone, we pass its path as an argument when invoking repo_observer.py. The listener will use this clone to pull the latest code from the main repository.

We also need to give the listener the dispatcher's address so that the messages it pushes actually reach the dispatcher. When starting the listener, you can pass the dispatcher's address with the --dispatcher-server command-line argument; if it is not supplied, the address defaults to localhost:8888.

def poll():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dispatcher-server",
                        help="dispatcher host:port, " \
                        "by default it uses localhost:8888",
                        default="localhost:8888",
                        action="store")
    parser.add_argument("repo", metavar="REPO", type=str,
                        help="path to the repository this will observe")
    args = parser.parse_args()
    dispatcher_host, dispatcher_port = args.dispatcher_server.split(":")

When the listener script is run, it starts by calling poll(). This function parses the command-line arguments and then enters an infinite while loop that periodically checks the repository for changes. The first thing the loop does is invoke the Bash script update_repo.sh.

    while True:
        try:
            # call the bash script that will update the repo and check
            # for changes. If there's a change, it will drop a .commit_id file
            # with the latest commit in the current working directory
            subprocess.check_output(["./update_repo.sh", args.repo])
        except subprocess.CalledProcessError as e:
            raise Exception("Could not update and check repository. " +
                            "Reason: %s" % e.output)

update_repo.sh is used to identify new commits and let the listener know about them. It first records the current commit ID, then pulls the latest code, and then checks the latest commit ID again. If the current ID and the latest ID match, nothing has changed and the listener does nothing. If the commit IDs differ, a new commit has occurred, and update_repo.sh creates a file called .commit_id containing the latest commit ID.

Step by step, update_repo.sh works as follows:

First, the script sources a file called run_or_fail.sh. run_or_fail.sh provides a helper function used by all of our shell scripts: it runs the given command, and if the command fails it prints the supplied error message and exits.

#!/bin/bash

source run_or_fail.sh 

Next, the script removes the .commit_id file if one exists. Because repo_observer.py calls update_repo.sh continuously in a loop, a .commit_id file left over from a previous call would contain a commit ID that has already been tested in an earlier poll, which would cause confusion. So we delete any old .commit_id file first.

rm -f .commit_id

After removing the file (if it existed), the script verifies that the repository we are observing exists, and then resets it to the most recent commit, in case anything has caused it to get out of sync.

run_or_fail "Repository folder not found!" pushd $1 1> /dev/null
run_or_fail "Could not reset git" git reset --hard HEAD

After that, it reads the git log and parses out the most recent commit ID.

COMMIT=$(run_or_fail "Could not call 'git log' on repository" git log -n1)
if [ $? != 0 ]; then
  echo "Could not call 'git log' on repository"
  exit 1
fi
COMMIT_ID=`echo $COMMIT | awk '{ print $2 }'`

Next, pull the repository, get all recent changes, and get the latest commit ID.

run_or_fail "Could not pull from repository" git pull
COMMIT=$(run_or_fail "Could not call 'git log' on repository" git log -n1)
if [ $? != 0 ]; then
  echo "Could not call 'git log' on repository"
  exit 1
fi
NEW_COMMIT_ID=`echo $COMMIT | awk '{ print $2 }'`

Finally, if the newly obtained commit ID does not match the previous ID, we know that a new commit occurred between polls, so our script should store the new commit ID in the .commit_id file.

# if the id changed, then write it to a file
if [ $NEW_COMMIT_ID != $COMMIT_ID ]; then
  popd 1> /dev/null
  echo $NEW_COMMIT_ID > .commit_id
fi

Back in repo_observer.py, once update_repo.sh has run, the listener checks whether the .commit_id file exists. If it does, a new commit occurred since the last poll and we need to notify the dispatcher so it can kick off the tests. The listener first checks on the dispatcher by connecting to it and sending a 'status' request, to make sure it is healthy and ready to accept instructions.

        if os.path.isfile(".commit_id"):
            try:
                response = helpers.communicate(dispatcher_host,
                                               int(dispatcher_port),
                                               "status")
            except socket.error as e:
                raise Exception("Could not communicate with dispatcher server: %s" % e)

If the dispatcher responds with "OK", the listener reads the latest commit ID from the .commit_id file and sends it to the dispatcher with a dispatch:<commit ID> request. The listener then sleeps for 5 seconds and repeats; it likewise waits 5 seconds before retrying after any error.

            if response == "OK":
                commit = ""
                with open(".commit_id", "r") as f:
                    commit = f.readline()
                response = helpers.communicate(dispatcher_host,
                                               int(dispatcher_port),
                                               "dispatch:%s" % commit)
                if response != "OK":
                    raise Exception("Could not dispatch the test: %s" %
                    response)
                print "dispatched!"
            else:
                raise Exception("Could not dispatch the test: %s" %
                response)
        time.sleep(5)

The listener repeats this process forever, until you stop it with a KeyboardInterrupt (Ctrl+C) or send it a kill signal.

Test dispatcher (dispatcher.py)

The test dispatcher is a separate process that hands test jobs out to test runners. It listens on a port for requests from the repository listener and from the test runners. It lets test runners register themselves, and when the listener sends a commit ID, it dispatches a registered test runner to test it. It also handles problems on the runner side gracefully: if a runner dies, the commit ID it was testing is immediately reassigned to another runner.

dispatcher.py starts executing in its serve function. First, it parses the host and port you chose for the dispatcher:

def serve():
    parser = argparse.ArgumentParser()
    parser.add_argument("--host",
                        help="dispatcher's host, by default it uses localhost",
                        default="localhost",
                        action="store")
    parser.add_argument("--port",
                        help="dispatcher's port, by default it uses 8888",
                        default=8888,
                        action="store")
    args = parser.parse_args()

Here we start the dispatcher server and two threads: one running the runner_checker function and one running the redistribute function.

    server = ThreadingTCPServer((args.host, int(args.port)), DispatcherHandler)
    print "serving on %s:%s" % (args.host, int(args.port))

    ...

    runner_heartbeat = threading.Thread(target=runner_checker, args=(server,))
    redistributor = threading.Thread(target=redistribute, args=(server,))
    try:
        runner_heartbeat.start()
        redistributor.start()
        # Activate the server; this will keep running until you
        # interrupt the program with Ctrl+C or Cmd+C
        server.serve_forever()
    except (KeyboardInterrupt, Exception):
        # if any exception occurs, kill the thread
        server.dead = True
        runner_heartbeat.join()
        redistributor.join()

The runner_checker function periodically pings each registered runner to make sure it is still responsive. If a runner becomes unresponsive, it is removed from the pool of registered runners, and the commit ID assigned to it is handed over to the next available runner. While that commit is waiting to be reassigned, its ID is recorded in the pending_commits variable.

    def runner_checker(server):
        def manage_commit_lists(runner):
            for commit, assigned_runner in server.dispatched_commits.iteritems():
                if assigned_runner == runner:
                    del server.dispatched_commits[commit]
                    server.pending_commits.append(commit)
                    break
            server.runners.remove(runner)
        while not server.dead:
            time.sleep(1)
            for runner in server.runners:
                s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                try:
                    response = helpers.communicate(runner["host"],
                                                   int(runner["port"]),
                                                   "ping")
                    if response != "pong":
                        print "removing runner %s" % runner
                        manage_commit_lists(runner)
                except socket.error as e:
                    manage_commit_lists(runner)

The redistribute function is used to dispatch the commit IDs stored in pending_commits. While it runs, it keeps checking pending_commits, and whenever it finds a commit ID there it calls dispatch_tests to assign it to a runner.

    def redistribute(server):
        while not server.dead:
            for commit in server.pending_commits:
                print "running redistribute"
                print server.pending_commits
                dispatch_tests(server, commit)
                time.sleep(5)

The dispatch_tests function looks for an available runner in the registered runner pool. If one is available, it sends that runner a run-test message containing the commit ID. If no runner is currently available, it sleeps for 2 seconds and tries again. Once a commit has been dispatched, the function records in the dispatched_commits variable which runner is testing which commit ID, and if the commit ID was in pending_commits, it removes it from there.

def dispatch_tests(server, commit_id):
    # NOTE: usually we don't run this forever
    while True:
        print "trying to dispatch to runners"
        for runner in server.runners:
            response = helpers.communicate(runner["host"],
                                           int(runner["port"]),
                                           "runtest:%s" % commit_id)
            if response == "OK":
                print "adding id %s" % commit_id
                server.dispatched_commits[commit_id] = runner
                if commit_id in server.pending_commits:
                    server.pending_commits.remove(commit_id)
                return
        time.sleep(2)

The dispatcher service uses SocketServer, a very simple server module from the standard library. The SocketServer module has four basic server types: TCPServer, UDPServer, UnixStreamServer, and UnixDatagramServer. We use a TCP-based socket so that our data transfers are continuous and reliable (UDP offers no such guarantee).

The default TCPServer provided by SocketServer can only handle one connection at a time. So once the dispatcher had a session open with one runner, it could not also accept a connection from the listener; the listener's connection would have to wait for the first session to finish and disconnect. That is not what we want: the dispatcher must be able to talk to all runners and listeners directly and promptly at the same time.

To let the dispatcher hold multiple connections at once, we define a custom ThreadingTCPServer class that adds threading to the default SocketServer behaviour. This means that whenever the dispatcher receives a connection request, it spins up a new thread just for that session, allowing it to serve several connections simultaneously.

class ThreadingTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    runners = [] # Keeps track of test runner pool
    dead = False # Indicate to other threads that we are no longer running
    dispatched_commits = {} # Keeps track of commits we dispatched
    pending_commits = [] # Keeps track of commits we have yet to dispatch

The dispatcher handles each request with the DispatcherHandler class, which inherits from SocketServer.BaseRequestHandler. The base class requires us to define a handle function, which is invoked whenever a connection is made or a request arrives. Our handle function in DispatcherHandler reads the incoming request (self.request holds the request data) and parses the command out of it.

class DispatcherHandler(SocketServer.BaseRequestHandler):
    """
    The RequestHandler class for our dispatcher.
    This will dispatch test runners against the incoming commit
    and handle their requests and test results
    """
    command_re = re.compile(r"(\w+)(:.+)*")
    BUF_SIZE = 1024
    def handle(self):
        self.data = self.request.recv(self.BUF_SIZE).strip()
        command_groups = self.command_re.match(self.data)
        if not command_groups:
            self.request.sendall("Invalid command")
            return
        command = command_groups.group(1)

This handler understands the following commands: status, register, dispatch, and results. status is used to check whether the dispatcher server is up and running.

        if command == "status":
            print "in status"
            self.request.sendall("OK")

For the dispatcher to do anything useful, at least one test runner must be registered. When register is called, the runner's "host:port" pair is stored in a list (the runners attribute on the ThreadingTCPServer object) so that the dispatcher can find that runner later, when it has a commit ID to send out for testing.

        elif command == "register":
            # Add this test runner to our pool
            print "register"
            address = command_groups.group(2)
            host, port = re.findall(r":(\w*)", address)
            runner = {"host": host, "port":port}
            self.server.runners.append(runner)
            self.request.sendall("OK")

dispatch is used by the repository observer to dispatch a test runner against a commit. The format of this command is dispatch:<commit ID>. The dispatcher parses out the commit ID from this message and sends it to the test runner.

        elif command == "dispatch":
            print "going to dispatch"
            commit_id = command_groups.group(2)[1:]
            if not self.server.runners:
                self.request.sendall("No runners are registered")
            else:
                # The coordinator can trust us to dispatch the test
                self.request.sendall("OK")
                dispatch_tests(self.server, commit_id)

The results command is used by a test runner to report the outcome of a finished test run. Its format is results:<commit ID>:<length of results data in bytes>:<results>. The <commit ID> identifies which commit the report belongs to; the <length of results data in bytes> is used to work out how large a buffer is needed for the results data; and <results> holds the actual report.

        elif command == "results":
            print "got test results"
            results = command_groups.group(2)[1:]
            results = results.split(":")
            commit_id = results[0]
            length_msg = int(results[1])
            # 3 is the number of ":" in the sent command
            remaining_buffer = self.BUF_SIZE - \
                (len(command) + len(commit_id) + len(results[1]) + 3)
            if length_msg > remaining_buffer:
                self.data += self.request.recv(length_msg - remaining_buffer).strip()
            del self.server.dispatched_commits[commit_id]
            if not os.path.exists("test_results"):
                os.makedirs("test_results")
            with open("test_results/%s" % commit_id, "w") as f:
                data = self.data.split(":")[3:]
                data = "\n".join(data)
                f.write(data)
            self.request.sendall("OK")

Test runner (test_runner.py)

The test runner is responsible for running the tests for a given commit ID and reporting the results. It communicates only with the dispatcher, which supplies the commit IDs to test and receives the resulting reports.

test_runner.py enters at its serve function, which starts the test runner server along with a thread running the dispatcher_checker function. Because this startup flow is very similar to those of repo_observer.py and dispatcher.py, we won't repeat the details here.
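
One detail worth sketching is registration: before it starts serving, the runner has to tell the dispatcher its own host and port using the register command we saw earlier. A rough sketch of that step (not the project's exact code) might look like this:

import helpers

def register_with_dispatcher(runner_host, runner_port,
                             dispatcher_host, dispatcher_port):
    # Ask the dispatcher to add this runner to its pool. The dispatcher
    # stores the "host:port" pair and replies "OK" if registration worked.
    response = helpers.communicate(dispatcher_host, int(dispatcher_port),
                                   "register:%s:%s" % (runner_host, runner_port))
    return response == "OK"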

The dispatcher_checker function pings the dispatcher every five seconds to make sure it is still up and running. This is mostly a resource-management concern: if the dispatcher goes down, the test runner should shut down too, since otherwise it would keep running to no purpose, with no way to receive new work or to report the results of work already done.

    def dispatcher_checker(server):
        while not server.dead:
            time.sleep(5)
            if (time.time() - server.last_communication) > 10:
                try:
                    response = helpers.communicate(
                                       server.dispatcher_server["host"],
                                       int(server.dispatcher_server["port"]),
                                       "status")
                    if response != "OK":
                        print "Dispatcher is no longer functional"
                        server.shutdown()
                        return
                except socket.error as e:
                    print "Can't communicate with dispatcher: %s" % e
                    server.shutdown()
                    return

The test runner uses the same kind of ThreadingTCPServer as the dispatcher does. It needs threading because the dispatcher not only sends it commit IDs but may also ping it while a test run is in progress.

class ThreadingTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    dispatcher_server = None # Holds the dispatcher server host/port information
    last_communication = None # Keeps track of last communication from dispatcher
    busy = False # Status flag
    dead = False # Status flag

The communication flow starts with the dispatcher sending the runner the commit ID to test. If the test runner is able to run the tests, it sends an acknowledgement back to the dispatcher and closes that connection. So that the runner can keep accepting requests from the dispatcher while the tests are running, the test run itself proceeds on its own thread.

This way, when the dispatcher sends a request (such as a ping) while tests are in progress, the tests are running on another thread and the runner service can still respond, so the runner effectively handles multiple tasks at once. An alternative to this threaded design would be to keep one long-lived connection open between the dispatcher and each runner, but that would cost the dispatcher a lot of memory to maintain the connections and would be vulnerable to network problems, such as a dropped connection.

The test runner accepts two kinds of message from the dispatcher. The first is ping, which the dispatcher uses to verify that the runner is still active.

class TestHandler(SocketServer.BaseRequestHandler):
    ...

    def handle(self):
        ....
        if command == "ping":
            print "pinged"
            self.server.last_communication = time.time()
            self.request.sendall("pong")

The second is runtest, whose format is runtest:<commit ID>. The dispatcher uses it to hand over the commit ID that should be tested. On receiving runtest, the runner checks whether a test run is already in progress; if so, it replies BUSY to the dispatcher. Otherwise it replies OK, marks itself as busy, and calls its run_tests function.

        elif command == "runtest":
            print "got runtest command: am I busy? %s" % self.server.busy
            if self.server.busy:
                self.request.sendall("BUSY")
            else:
                self.request.sendall("OK")
                print "running"
                commit_id = command_groups.group(2)[1:]
                self.server.busy = True
                self.run_tests(commit_id,
                               self.server.repo_folder)
                self.server.busy = False

This function calls a shell script, test_runner_script.sh, which updates the repository clone to the given commit ID. Once the script returns and the clone has been updated successfully, the runner runs the tests with unittest and collects the results in a file. When the tests finish, the runner reads the results file back in and sends the report to the dispatcher.

    def run_tests(self, commit_id, repo_folder):
        # update repo
        output = subprocess.check_output(["./test_runner_script.sh",
                                        repo_folder, commit_id])
        print output
        # run the tests
        test_folder = os.path.join(repo_folder, "tests")
        suite = unittest.TestLoader().discover(test_folder)
        result_file = open("results", "w")
        unittest.TextTestRunner(result_file).run(suite)
        result_file.close()
        result_file = open("results", "r")
        # give the dispatcher the results
        output = result_file.read()
        helpers.communicate(self.server.dispatcher_server["host"],
                            int(self.server.dispatcher_server["port"]),
                            "results:%s:%s:%s" % (commit_id, len(output), output))

Here is the content of test_runner_script.sh:

#!/bin/bash
REPO=$1
COMMIT=$2
source run_or_fail.sh
run_or_fail "Repository folder not found" pushd "$REPO" 1> /dev/null
run_or_fail "Could not clean repository" git clean -d -f -x
run_or_fail "Could not call git pull" git pull
run_or_fail "Could not update to given commit hash" git reset --hard "$COMMIT"

To run test_runner.py, you must point it at a clone of the repository to run tests against; you can use the clone we created earlier, /path/to/test_repo_clone_runner, as its argument. By default, test_runner.py starts its own server on localhost using a port in the range 8900-9000 and tries to reach the dispatcher at localhost:8888. You can change these values with optional arguments: --host and --port set the address and port the test runner's server listens on, and --dispatcher-server sets the dispatcher's address.

Control flow diagram

The figure below gives an overview of the system. It assumes that all three files (repo_observer.py, dispatcher.py, and test_runner.py) are already running, and shows what each process does when a new commit is made.

Running the code

We can run this simple CI system locally, using a different terminal shell for each process. We first start the dispatcher, which runs on port 8888 by default:

$ python dispatcher.py

Opening a new shell, we start the test runner (so it can be registered with the dispatcher):

$ python test_runner.py <path/to/test_repo_clone_runner>

The test runner will automatically assign itself a port in the range 8900-9000. You can create as many test runners as needed.
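
If the defaults do not suit you, the optional flags described earlier can also be passed explicitly; for example, a runner could be started on a specific port like this (the port number here is just an illustration):

$ python test_runner.py --host=localhost --port=8901 --dispatcher-server=localhost:8888 <path/to/test_repo_clone_runner>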

Finally, in another new shell, let's start the repository listener:

$ python repo_observer.py --dispatcher-server=localhost:8888 <path/to/repo_clone_obs>

Now that everything is ready, let's trigger some tests and play around! By design we need to create a new commit to trigger the test. Switch to your main code repository and change whatever you want:

$ cd /path/to/test_repo
$ touch new_file
$ git add new_file
$ git commit -m"new file" new_file

repo_observer.py will then notice the new commit and notify the dispatcher; you can watch the log output of each process in its own shell. When the dispatcher receives the test results, it stores them in a test_results/ folder in this code base, using the commit ID as the filename.
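
Each report is just the plain-text unittest output, so once a run completes you can read it directly, for example:

$ cat test_results/<commit ID>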

Error handling

Some simple error handling is included in this CI system.

If you kill a test_runner.py process, dispatcher.py will detect that the runner is no longer available and remove it from the runner pool.

You can also simulate a network or machine failure by killing a test runner while it is in the middle of running tests. The dispatcher will notice that the runner has gone down, remove it from the runner pool, and give the work that runner was doing to another runner in the pool.

If you kill the dispatcher, the listener will raise an error, and the test runner will notice that the dispatcher is no longer running and shut itself down.

Conclusion

By walking through the role of each process, we have built a basic understanding of how a distributed continuous integration system fits together. Because the processes communicate over sockets, the CI system can be spread across several machines, which improves its reliability and scalability.

The system's functionality is still very basic, and you can extend it in many directions to do much more. Here are a few suggested improvements:

Automatically run tests on every commit

The current system will periodically check for new commits and run tests on the most recent commit. This design can be changed to trigger the test on every commit. You can do this by modifying the periodic checker to fetch all commits that occurred between polls.
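
As a rough sketch of that change (assuming we are willing to call git directly with subprocess rather than through the existing bash scripts), the listener could ask git for every commit between the last ID it saw and the new HEAD, and then dispatch each one:

import subprocess

def commits_since(repo_path, last_seen_id, new_head_id):
    # List every commit after last_seen_id up to and including new_head_id,
    # oldest first, so they can be dispatched in order.
    output = subprocess.check_output(
        ["git", "rev-list", "--reverse",
         "%s..%s" % (last_seen_id, new_head_id)],
        cwd=repo_path)
    return [line.strip() for line in output.splitlines() if line.strip()]

Each returned ID could then be sent to the dispatcher in its own dispatch:<commit ID> message.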

Smarter runner

As it stands, if the test runner detects that the dispatcher has become unresponsive, it stops running, and it does so even in the middle of a test run! It would be better if the runner waited for some period (or indefinitely, if you do not care about its resource usage) for the dispatcher to come back. Then, when the dispatcher recovers, the runner could send it the reports for tests it had already run. That avoids redoing work after a dispatcher failure, which saves significant runner resources once you are testing every commit.
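
A minimal sketch of such a wait, reusing the helpers.communicate call and the server fields shown above, could be a bounded retry loop that dispatcher_checker calls before it gives up and shuts the server down:

import time
import socket
import helpers

def wait_for_dispatcher(server, attempts=12, delay=10):
    # Poll the dispatcher until it answers "OK" or we run out of attempts.
    # Returns True if the dispatcher came back, False otherwise.
    for _ in range(attempts):
        try:
            response = helpers.communicate(server.dispatcher_server["host"],
                                           int(server.dispatcher_server["port"]),
                                           "status")
            if response == "OK":
                return True
        except socket.error:
            pass
        time.sleep(delay)
    return False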

Report display

In a real CI system, test reports are typically sent to a separate reporting service, where people can inspect the details and set up notification rules for failures or other special events. You could build a separate reporting process for this CI system that takes over the report-collection role of the dispatcher. The new process could be a web service (or talk to one), so that reports can be viewed directly in a browser, and it could even use a mail server to alert people when tests fail.
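
As a quick stopgap rather than a real reporting service, you could even expose the existing test_results/ directory over HTTP with Python's built-in module:

$ cd test_results
$ python -m SimpleHTTPServer 8000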

Test runner manager

In the current system we have to run test_runner.py by hand to start each test runner. Instead, you could create a test runner manager process that watches the load of requests coming from the dispatcher and adjusts the number of runners accordingly: it would accept incoming test jobs, start runner instances to handle them, and shut instances down when there is little work.
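
A very small sketch of just the launching part (none of the scaling logic), assuming test_runner.py sits in the current directory and picks its own free port as described above:

import subprocess

def start_runners(count, repo_clone_path):
    # Launch `count` test runner processes. Each one registers itself with
    # the dispatcher and picks its own port in the 8900-9000 range.
    procs = []
    for _ in range(count):
        procs.append(subprocess.Popen(["python", "test_runner.py",
                                       repo_clone_path]))
    return procs

A fuller manager would also track how busy the runners are and call terminate() on the idle ones.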

By following these suggestions, you can make this simple CI system more robust and fault-tolerant, with the ability to integrate with other systems (such as a web-based report viewer).

If you want to see how far the flexibility of modern continuous integration systems goes, I recommend looking at Jenkins, a very powerful open-source CI system written in Java. It provides a basic CI system that can be extended with plugins, and its source code is available on GitHub. Another recommended project is Travis CI, which is written in Ruby and whose source code is also available on GitHub.

This project has been an exercise in understanding how CI systems work and how to build one yourself. You should now have a better sense of what it takes to make a reliable distributed system, and hopefully you can use that knowledge to develop more sophisticated solutions.
