How to improve reporting and monitoring of test automation results

With dozens of releases of our main product per day, it becomes ever more important to understand why end-to-end tests fail. We deal with test defects (that is, failures that have nothing to do with actual defects in the application under test) every day, even though the average success rate of our Selenium-based tests is above 99%. In other words, on a stable master branch running overnight, fewer than 1% of tests fail, but those occasional failures are still inevitable and create uncertainty. This is a well-known issue in UI automation and happens especially often in staging environments where we don't simulate any backend services.

We believe the best answer is to enable every QA engineer and developer to easily and quickly determine whether a given test failure is specific to the branch/PR where it was detected. This requires good reporting tools and dashboards that provide a comprehensive view of test executions across all repository workflows.

Current setup

Our updated end-to-end test execution, reporting and monitoring setup includes:

  • GitHub Actions (GHA) workflow

  • Custom runners for test execution on Google Cloud (GCP)

  • Cluecumber Maven plugin for generating test reports

  • Google Cloud Storage (GCS) bucket for storing test reports

  • Test framework plugin to generate Kafka log entries from Cucumber JSON report files

  • Kafka message broker

  • Logstash for processing Kafka messages

  • Elasticsearch for data storage

  • Kibana for data visualization and exploration

  • Grafana and Slack for alerting on recurring test failures

Test report cloud storage

We previously used an internal Jenkins instance for test execution, and the reports generated by the Cluecumber plugin (HTML pages) could be attached as artifacts to each job run; Jenkins would also serve them as web pages when accessed in a browser.

Now that test execution is triggered via GitHub Actions and test jobs run on a custom runner on GCP, we needed a solution for storing and accessing test reports. Attaching them as compressed artifacts to each GHA workflow run was the first solution, but it was inconvenient: every report had to be downloaded and unpacked, and direct linking from a dashboard became more complex.

We looked at what was already available in the trivago technology stack and chose to leverage Google Cloud Storage: the test reports are uploaded to a GCS bucket and served as web pages via a gcs-proxy application that supports file listings, inspired by https://github.com/springernature/gcs-proxy. This is how we created a "Test Report Server" to suit our needs.
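
As an illustration (this is not the gcs-proxy implementation, and the bucket name, port, and class name below are made up for the example), a minimal Java sketch of such a report server could read objects from the bucket with the Google Cloud Storage client and serve them over HTTP:

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;

public class TestReportServer {

    // Assumed bucket name; the real bucket and port are deployment-specific.
    private static final String BUCKET = "test-report-server";

    public static void main(String[] args) throws Exception {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        server.createContext("/", exchange -> {
            // Map the request path (e.g. /org/repo/workflow/run_id/index.html) to an object name.
            String objectName = exchange.getRequestURI().getPath().substring(1);
            Blob blob = storage.get(BUCKET, objectName);
            if (blob == null) {
                exchange.sendResponseHeaders(404, -1);
                exchange.close();
                return;
            }
            byte[] content = blob.getContent();
            String contentType = blob.getContentType() != null ? blob.getContentType() : "application/octet-stream";
            exchange.getResponseHeaders().set("Content-Type", contentType);
            exchange.sendResponseHeaders(200, content.length);
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(content);
            }
        });

        server.start();
    }
}

The actual gcs-proxy application additionally renders file listings, which is what makes browsing the uploaded report folders convenient.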

After test execution completes, the current process includes:

  • Logging test results (from the Cucumber JSON report) to Kafka (see the sketch after this list)

  • Generating HTML test reports with attachments using the Cluecumber plugin

  • Uploading the generated test report folder to the "Test Report Server" GCS bucket

  • Sharing links to the test reports as GitHub status badges, PR comments, Slack messages, etc., as required by the workflow
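
The Kafka logging step is handled by our test framework plugin. As a simplified sketch (the topic name, broker address, project value, and field names are assumptions rather than the actual schema), turning a Cucumber JSON report into one Kafka message per scenario could look roughly like this:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.File;
import java.util.Properties;

public class CucumberResultsToKafka {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Path to the Cucumber JSON report produced by the test run.
        JsonNode features = mapper.readTree(new File(args[0]));

        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (JsonNode feature : features) {
                for (JsonNode scenario : feature.path("elements")) {
                    // A scenario counts as failed if any of its steps did not pass.
                    boolean failed = false;
                    for (JsonNode step : scenario.path("steps")) {
                        if (!"passed".equals(step.path("result").path("status").asText())) {
                            failed = true;
                        }
                    }
                    ObjectNode entry = mapper.createObjectNode();
                    entry.put("doctype", "regression_tests_warp");         // matches the Grafana query below
                    entry.put("project", "warp_core_desktop");             // assumed project value
                    entry.put("scenario", scenario.path("name").asText());
                    entry.put("status", failed ? "FAILED" : "PASSED");
                    entry.put("report_link", System.getenv("REPORT_URL")); // assumed environment variables
                    entry.put("base_url", System.getenv("BASE_URL"));
                    producer.send(new ProducerRecord<>("test-results", entry.toString())); // assumed topic name
                }
            }
        }
    }
}

Having one message per scenario, carrying the report link and the base URL, is what later enables the Kibana drill-down and the Grafana aggregation described below.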

The upload step is simple and only requires running a gsutil command in a GitHub Actions job:

name: Upload test report to the GCS Bucket
id: uploadTestReport
if: ${{ always() }}
run: |
  gcloud config set auth/impersonate_service_account [email protected]
  gsutil -m rsync -r "${{ env.REPORT_LOCAL_PATH }}" "${{ env.DESTINATION_BUCKET }}/${{ github.repository }}/${{ github.workflow }}/${{ github.run_id }}"

We decided to organize the reports by repository, workflow, and run ID. This has proven very effective and scales easily as more teams within the company start using our Test Report Server in their projects, storing end-to-end and API automation test reports as well as Webpack bundle analyzer reports.

Kibana and Grafana

Since the log entries sent to Kafka include the test report link, we can not only view failure messages in the Kibana dashboard but also open the full test report with a single click, giving immediate access to step execution details, screenshots, and video recordings.

Different sets of panels and filters allow each of our QA engineers and developers to quickly focus on a certain branch or set of test results. We currently have a master dashboard template that is reusable across different projects and testing workflows, but the availability of multiple fields with detailed data also allows each team to build their own dashboard with the information and filtering relevant to them.

A recent addition to this setup is using Grafana to trigger notifications to a Slack channel when a single test scenario fails across multiple branches/PRs. This way we avoid having to actively scan for flaky tests during the day, and it complements the defect detection jobs that run overnight on the application's master branch.

The Grafana query looks like this:

    doctype: "regression_tests_warp" AND (project: "warp_core_mobile" OR project: "warp_core_desktop") AND status: FAILED
We aggregate by scenario name and apply a threshold to the unique count of the base URL, which tells us on how many different preview machines a scenario failed. Since the base URL is tied to the PR/branch name, this gives an easy way to tell whether a failure is specific to one branch or widespread.
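
In plain Java terms (the threshold and the record fields here are assumptions; in practice this aggregation is evaluated by the Grafana alert rule against Elasticsearch), the check boils down to grouping failed results by scenario name and counting distinct base URLs:

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class FlakyScenarioCheck {

    // One flattened test result entry; the field names mirror the assumed Kafka schema above.
    record TestResult(String scenario, String status, String baseUrl) {}

    // Assumed threshold: alert once a scenario has failed on this many different base URLs.
    static final int BRANCH_THRESHOLD = 3;

    static List<String> scenariosFailingAcrossBranches(List<TestResult> results) {
        // Group FAILED results by scenario name and collect the distinct base URLs per scenario.
        Map<String, Set<String>> failedBaseUrlsByScenario = results.stream()
                .filter(r -> "FAILED".equals(r.status()))
                .collect(Collectors.groupingBy(
                        TestResult::scenario,
                        Collectors.mapping(TestResult::baseUrl, Collectors.toSet())));

        // A scenario failing on many base URLs (i.e. preview machines/branches) is likely flaky
        // or points to a widespread defect rather than a branch-specific one.
        return failedBaseUrlsByScenario.entrySet().stream()
                .filter(entry -> entry.getValue().size() >= BRANCH_THRESHOLD)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}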

If the query conditions are met, an alert notification is sent as a Slack message, indicating the failed test and including a quick link to the more detailed Kibana dashboard.
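
Grafana's Slack integration delivers this notification for us; purely as an illustration (the webhook URL, scenario name, and dashboard link below are placeholders), posting an equivalent message to a Slack incoming webhook from Java would look roughly like this:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SlackAlertExample {

    public static void main(String[] args) throws Exception {
        // Placeholder webhook URL; in our setup the real one is configured in Grafana, not in code.
        String webhookUrl = "https://hooks.slack.com/services/XXX/YYY/ZZZ";

        // Example message body: the failed scenario plus a link to the detailed Kibana dashboard.
        String payload = """
                {"text": "Scenario 'Example scenario' failed on 3 branches. Details: https://kibana.example.com/app/dashboards"}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Slack responded with HTTP " + response.statusCode());
    }
}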

Conclusion

Our approach to test result reporting and monitoring has always relied on Kafka and the ELK stack. By continuing to iterate on it, providing fast and extensive feedback and adding Grafana alerting, we have managed to significantly increase trust in test automation. At the same time, the cloud test report storage has proven to be a valuable resource for other teams adopting it, because it is not tied to a specific test type, test framework, or reporting tool.
