Seventeen Years in the Industry: Let's Talk About Testing

Author: Xiaoxia Ant Digital Financial Line Technical Team

A career-stage summary from a senior tester with 17 years of experience, now at Ant Group: it examines testing work and testing technology from the perspectives of product form, R&D model, and organization, looks at the structure of testing capability from the individual's perspective, and discusses personal growth. This 40,000-character article is packed with substance. The editor sincerely hopes everyone can set aside some uninterrupted time to read it carefully; these real accounts of technical growth should offer some useful inspiration.

Before we begin

When I was still in school, I loved a comic series by Zhu Deyong called "About Going to Work." So in my first year after graduating and joining a company, when my boss asked me to prepare my first PPT, I happily gave it the grand title "About Testing", which my boss promptly ordered me to change to "Sharing Test Experience from Project xx."

Time flies. Last week marked my 6th anniversary of joining Ant, and in 10 days it will be 17 years since I started practicing testing. I can finally use this topic to tie together my work experience, lessons, and thinking over the years and share them with everyone.

If you have worked in the testing and quality industry for several years and carry questions or confusions like these:

  1. I keep grinding through projects and feel I'm not accumulating anything. What should I do?

  2. I want to do something new, but I don't know what to do.

  3. What does career development planning in quality look like? How do I get promoted to the next level?

  4. Why do quality teams keep splitting and merging?

  5. My growth wasn't obvious this year. Should I change teams, or change roles?

I hope this article can offer some perspectives and ideas, and help you see through the surface to the problem underneath.


This article is divided into six parts. The first three look at testing work and testing technology from the angles of product form and R&D model. Parts four and five look at testing work and the testing organization from an organizational perspective. Part six looks at the structure of testing capability from an individual perspective and briefly discusses personal growth.


1. The product form determines the testing work

TL;DR: Different product forms lead to different testing priorities, and testing methods and tools have evolved accordingly. The testing methodology for traditional engineering products is relatively mature, so for a new business a comprehensive and effective basic testing system can be established quickly by drawing on existing methods and tools; the focus of testing work then usually falls on testing technology specific to the business's characteristics. Beyond engineering products, the testing of intelligent services such as data and algorithm models has become a hot spot in the testing industry in recent years, with huge room for development.

"Testing, so to speak, takes verifying that the product meets its requirements as the goal, and finds product bugs by running the product."

Over the years, the object of testing, the product itself, has kept evolving. Let's first analyze the similarities and differences in testing work across some of the product forms that have emerged along the way.

1. Stand-alone software

The earliest software testing books started from the testing of stand-alone software; the most typical examples are Microsoft's Office products. Given how stand-alone software was installed, and how rarely software was updated back then, the testing work needed to focus not only on conventional functional and performance testing but also on installation package testing, system compatibility testing, and, very importantly, reproducing bugs from user feedback. Test cases were designed from the PRD, user operations were simulated, and functions were verified through UI interaction. This is also why testing earned the nickname "click, click, click."

2. Web services

The first product I tested, starting in 2005, was search, a typical web service.

Web service testing splits into front-end and back-end testing. Front-end page tests can still execute cases by simulating user operations; back-end tests have no user interface, so test tools must be developed to help initiate cases and query results. The concept of interface testing began to emerge at this time, emphasizing the testability of back-end applications.
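To make "interface testing" concrete, here is a minimal sketch in Python: a throwaway HTTP backend stands in for the application under test, and the test drives it purely through its interface, with no UI involved. The endpoint path, response fields, and the `code == 0` success convention are all hypothetical, not any real search service's API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class SearchHandler(BaseHTTPRequestHandler):
    """Stand-in backend: a hypothetical /search endpoint returning JSON."""
    def do_GET(self):
        body = json.dumps({"query": self.path,
                           "results": ["doc1", "doc2"],
                           "code": 0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

def run_interface_test():
    """Start the backend, fire a request at its interface, verify the response."""
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with urlopen(f"http://127.0.0.1:{server.server_port}/search?q=test") as resp:
            assert resp.status == 200           # transport-level check
            payload = json.loads(resp.read())
            assert payload["code"] == 0         # business-level check
            assert len(payload["results"]) > 0  # results are non-empty
            return payload
    finally:
        server.shutdown()
```

The point of the sketch is the shape of the work: because there is no interface to click, the test tool itself must initiate the case and read back the result.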

At the same time, compared with the clear-cut functions of stand-alone software, what matters more in a search product is the implementation of the search strategy. Use case design cannot be limited to the PRD; the internal logic of the system must be considered comprehensively, such as the selection of search queries (test data) and the verification of crawling and ranking strategies. Test case design should take both the PRD and the technical design as comprehensive inputs.

Other more function-oriented web services, such as forums, download sites, and portals, can have most of their use cases executed relatively easily through front-end pages, plus compatibility tests against mainstream browsers. Front-end page record-and-playback began to develop at this time.

Performance testing began to split between front end and back end: the back end emphasizes application processing speed and hardware-utilization metrics, while the front end emphasizes page response speed.

3. Client software

Typical products: the PC versions of social software, music download software, and input methods. The client is installed and deployed on the user's computer, somewhat like stand-alone software, so this part of testing must account for differences in user environments and maximize simulated coverage. Unlike the stand-alone era, however, because the client stays connected to the server, client logs can be automatically sent back to assist monitoring and diagnosis, which is far more convenient than reproducing bugs used to be.

On the server side, interface testing remained the main method, and interface automation gradually developed.

Performance testing continued to be refined, requiring multi-dimensional metric evaluation for every user-facing function.

4. Mobile apps

The rise of mobile apps began around 2010, and in essence they resemble client software. However, because the test environment changed from PC to mobile phone, automated testing tools based on UI record-and-playback went through a wave of renewal at the time. The channel-based release and update model of mobile apps drove the development of grayscale release and grayscale update techniques. User behavior logs assist in problem discovery and diagnosis on the one hand, and serve as a reference for product iteration on the other. Log upload must consider the user's mobile data usage; the standard approach is to wait until the user is on Wi-Fi before uploading.

Compatibility testing should cover the top 20 to 100 phone models.

In the mobile Internet era, performance testing became very important; users are clearly less patient on their phones than on PCs. Attention also turned to phone-specific metrics such as battery, data traffic, heat, and weak-network behavior.

5. Hardware

Around 2014 smart hardware exploded, and by chance I also got a little involved in hardware testing: TV boxes. In essence it is similar to an app, with the added distinction between firmware and software. At the same time, the quality of the delivered hardware must be considered, and you can work with manufacturers to assure batch quality.

6. Mobile web services

Compatibility shifts from covering PC browser versions to covering combinations of mobile browsers and phone models. With the rise of H5, pages are no longer limited to display inside a browser.

Performance testing around the page became more refined. For example, page-load metrics include first character painted, page frame rendered, first screen rendered, fully rendered, and so on, all in service of a fast user experience.

7. International business

Note: this category is not orthogonal to the ones above; any of the product forms above may be an internationalized business. If you build a product for only one specific country, you need to focus on a localized environment, localized network simulation, time-zone logic verification, localized copy verification, localization regulatory compliance, and so on.

If the internationalized business serves multiple countries and languages, that is, a product must support at least two languages at the same time and may expand to more at any moment, then product design and development must focus on separating language-independent and language-dependent functional architecture, and the testing process must support internationalization, localization, and fast concurrent testing of multiple versions and languages. Changing one line of code and then testing and verifying 10 language versions becomes a nightmare for testing efficiency. The flip side of a nightmare is an opportunity: there is a lot of room to improve efficiency through architecture design, use case automation, and separating case logic from language.
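The "separate case logic from language" idea above can be sketched in a few lines: test logic is written once against resource keys, and each locale only contributes a lookup table. The locales, keys, and strings here are all made up for illustration.

```python
# Locale resource tables: the only language-dependent part of the test.
RESOURCES = {
    "en-US": {"login.button": "Log in",  "login.error": "Wrong password"},
    "de-DE": {"login.button": "Anmelden", "login.error": "Falsches Passwort"},
    "ja-JP": {"login.button": "ログイン",  "login.error": "パスワードが違います"},
}

def render(key, locale):
    """Stand-in for the system under test: resolve localized copy by key."""
    return RESOURCES[locale][key]

def check_locale(locale):
    """Language-independent case logic: every key must resolve to non-empty copy."""
    failures = []
    for key in RESOURCES["en-US"]:   # en-US serves as the reference key set
        if not render(key, locale):
            failures.append((locale, key))
    return failures

def run_all_locales():
    # One loop covers N languages; adding a locale adds zero new case code.
    return {loc: check_locale(loc) for loc in RESOURCES}
```

With this shape, "one code change, 10 language verifications" is one automation run rather than 10 manual passes.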

8. Microservice distributed architecture

The essence is server-side testing; interface testing, integration joint debugging, and system testing still apply. But the nature of distributed architecture, and the surge in the number of microservice applications and interfaces behind a single user-facing function, bring specific problems and solutions: generic mocks to improve mock development efficiency, continuous governance of test environment stability, coverage of exception cases for interface and asynchronous calls, full-link performance testing run in production, and, as interface automation costs surged, the development of interface-traffic record-and-playback.
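As a small illustration of the "generic mock" point: in a microservice setup the service under test calls a downstream dependency that may be unavailable or hard to force into failure modes; a mock replaces the remote call so both the happy path and the rejection path can be covered. The names (`pay`, `risk_check`) are illustrative, not a real API.

```python
from unittest import mock

def risk_check(order_id):
    """Stand-in for a remote downstream service, unreachable in the test env."""
    raise RuntimeError("real downstream not available in the test environment")

def pay(order_id):
    """Service under test: pays only if risk control approves."""
    verdict = risk_check(order_id)
    if verdict["pass"]:
        return {"order_id": order_id, "status": "PAID"}
    return {"order_id": order_id, "status": "REJECTED"}

def test_pay_paths():
    # Happy path: downstream approves.
    with mock.patch(f"{__name__}.risk_check", return_value={"pass": True}):
        assert pay("o-1")["status"] == "PAID"
    # Rejection path: easy to cover only because the mock controls the verdict.
    with mock.patch(f"{__name__}.risk_check", return_value={"pass": False}):
        assert pay("o-2")["status"] == "REJECTED"
    return True
```

The real value of making this generic is that as the number of interfaces surges, mocks stop being hand-written per case and become a shared capability.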

9. Cloud products/services

Everything to the cloud! The cloud migration of the object under test brings the cloud migration of the test process, and of the test tools and test platforms along with it.

If a cloud product is deployed on a single cloud, the difference from traditional testing lies mainly in environment-related work (test environment, joint-debugging environment, performance-test environment, test tool platforms on the cloud, and so on). If a cloud product is deployed across multiple clouds and versions, then cloud integration testing, multi-cloud adaptation, customer version management, deployment-plan verification, disaster recovery and multi-active setups, version management, and problem reproduction and diagnosis all need to be considered.

The product form of cloud businesses has not changed much; the business models behind them, such as SaaS and toB, have a bigger impact on testing.

10. Big data/algorithm model/intelligent business

At the end of 2012 came the classic book "Big Data: A Revolution That Will Transform How We Live, Work, and Think," which made the point that the biggest change of the big data era is the weakening of causality in favor of correlation. The changes this brings to products are: first, there must be a lot of data; second, on top of that big data, intelligent methods such as machine learning are used to find correlations between data and outcomes (rather than explicit, hand-written logic) and provide services externally. The products I have worked with include intelligent search, personalized recommendation, voice and image recognition, intelligent dialogue, and so on.

This product form has had a huge impact on testing: because the internal logic of the object under test is no longer clearly visible, test case design can no longer follow the idea of covering implementation logic. Even switching to data coverage, it is hard to divide the data into equivalence classes, and code coverage loses its meaning when measured against the node combinations of a neural network. The natural countermeasure is to test with big data itself, for example splitting the data in two, one part as the training/validation set and the other as the test set. But in actual development this testing process looks very much like the training and validation process, and there is no particular need to hand it over to a testing team to execute. Moreover, because customers are fairly tolerant of quality here, many intelligent businesses can run bucketed (A/B) verification, evaluation, and iteration directly online, and the whole R&D process can easily bypass the testing team. (So some intelligent businesses really have no testing team...)

Of course, testing can still do a lot of things.

1) Separate out the data part for data quality assurance.

Behind intelligent business lies big data, and there are also pure data services: massive datasets such as maps, or documents delivered to institutions in data form. The output data of these pure data services, and both the input and output data of intelligent services, can be treated as data-type test objects for comprehensive quality assurance. Compared with the use-case-centered, run-the-object style of testing on the server, client, and app sides, the data field leans toward testing the results (the data). I conclude the reasons are as follows. First, some data is collected or purchased, so its processing pipeline is outside the scope of testing. Second, even when the processing pipeline is in scope, designing use cases for the processing logic and precisely covering the data structures is too costly, so it is better to test the resulting data directly. Third, the output of a data service is naturally different every time; unlike an application service that can run for a long time after one pre-release test, a data service needs test coverage and quality assurance for every batch of data it produces while running (testing once is not enough).

Data quality methodology and tooling have flourished across the Alibaba ecosystem in recent years, including but not limited to data rule verification, data inspection, data monitoring, data attack-and-defense drills, and data link stability assurance. Building on the characteristics of big data, a wave of intelligent testing solutions has also emerged, such as automatic derivation of verifications and automated attack-and-defense.
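A minimal sketch of what "data rule verification" looks like in practice: declarative rules applied to every batch of result data, producing a violation report per batch, which matches the point above that data services must be checked in the running state rather than once before release. The rules and records are illustrative.

```python
# Declarative rules: (name, predicate over one record).
RULES = [
    ("user_id not null", lambda r: r.get("user_id") is not None),
    ("amount >= 0",      lambda r: r.get("amount", -1) >= 0),
    ("currency in set",  lambda r: r.get("currency") in {"CNY", "USD"}),
]

def verify_batch(records):
    """Run every rule over every record; return this batch's violations."""
    violations = []
    for i, record in enumerate(records):
        for name, rule in RULES:
            if not rule(record):
                violations.append((i, name))
    return violations

batch = [
    {"user_id": 1,    "amount": 10.0, "currency": "CNY"},
    {"user_id": None, "amount": -3.0, "currency": "JPY"},  # violates all 3 rules
]
```

In a real system the rule set would be maintained as configuration and wired into monitoring, so each batch produces an automatic quality report.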

2) Business effect evaluation for intelligent business is a large field in itself. Different intelligent business forms call for different evaluation and analysis angles, which tests a tester's understanding of the business, the data, and the algorithm implementation. First define the evaluation dimensions, metrics, and their precise definitions; then delineate the evaluation data range for evaluation and analysis; then run attribution analysis on metrics that miss expectations; finally, consolidate the whole process into a tool platform that can be integrated directly into the R&D process.

3) Improve efficiency around the characteristics of intelligent business R&D, for example dataset labeling/management/evaluation platforms, badcase attribution analysis tools, and release-process quality gates. This work plays a big role in the early stages of an intelligent business.

4) The most challenging part is testing the algorithm model itself. Under cost-effectiveness constraints, no mature, complete methodology has emerged. A more realistic approach is to solve local problems based on business characteristics and balance input against output. My team is experimenting with metamorphic testing, badcase attribution analysis, and the like.
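To give a flavor of metamorphic testing: when there is no oracle for a single model output, we instead assert a relation that must hold between outputs of related inputs. Below, a toy term-frequency scorer stands in for a real relevance model (everything here is illustrative), and the metamorphic relation is "duplicating a query term must not change the ranking."

```python
def score(doc, query_terms):
    """Toy relevance model: term-frequency score (stands in for a real model)."""
    words = doc.lower().split()
    return sum(words.count(t) for t in set(query_terms))

def rank(docs, query_terms):
    """Rank documents by descending score."""
    return sorted(docs, key=lambda d: score(d, query_terms), reverse=True)

def metamorphic_duplicate_term(docs, query_terms):
    """Metamorphic relation: rank(q) == rank(q + q). No oracle needed:
    we never say what the 'correct' ranking is, only that it must be stable
    under this input transformation."""
    return rank(docs, query_terms) == rank(docs, query_terms + query_terms)

docs = ["pay with ant", "ant ant group", "unrelated text"]
```

The technique scales to real models by choosing relations from the business domain, such as "adding an irrelevant document must not demote the top result."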

Intelligent business has been developing for over a decade and is currently growing fast, covering more and more industries, businesses, and products. In the most optimistic future, almost every business will evolve into an intelligent business (just as almost every industry can call itself "Internet+"). There will therefore be a wave of upgrades and innovation in quality assurance technology for intelligent services, driving the testing industry forward. My team is also working hard to explore this direction; anyone interested is welcome to reach out by email or DingTalk to discuss.

The above lists some of the product forms and testing work I have been exposed to. As you can see, the product form determines the content of the testing work, which in turn determines the testing technology; the industry maturity of a product form also determines the maturity of the industry's testing technology and tools. When starting to test a new business, use the industry's or company's mature testing tools to save organizational cost, and spend precious testing manpower on testing specific to the business's characteristics. For businesses with mature product forms, quickly establish an efficient initial testing system; for novel product forms, strengthen pre-research on testing technology to make full use of the potential room for development.

2. The R&D model's requirements on testing work

TL;DR: The R&D model imposes more concrete requirements on testing work. The more agile the R&D model, the higher the demands on the technical accumulation of the testing team; ultimately the testing work must be strung together through automation and continuous integration.

In addition to the product form, another decisive influence on the testing work is the R&D mode.

Around 2009-2011 there was a clear wave of agile in R&D models: from big-version waterfall development to continuous integration and agile development. But as the agile mania receded, many teams returned to rationality, no longer pursuing extreme agility (n releases per day) as the goal, but an iteration rhythm balanced against business demands (weekly or biweekly releases plus monthly large-project releases).

Here we do not comment on the good or bad applicability of the R&D model, but only analyze the impact of the R&D model on testing.

Let me first define the difference between what I mean by waterfall and agile development:

In big-waterfall mode, the development process and the testing process are independent: overall testing is carried out after all requirements are implemented. The test process includes the full set of requirement review, system design review, test analysis review, interface testing, integration testing, system testing, pre-release verification, and other steps, and everything ships together in one big release after the whole test process completes.

In agile development (taking weekly release as an example), all requirements are split into n stories (user stories), development proceeds story by story, and each story is tested as soon as it is finished. Once a story passes testing it can be released directly. While testing works on the current story, development is already building the next one, forming a staggered pipeline, and the cycle from requirement to release for each story can shrink to a week. Because development and testing run staggered in parallel, the cycle to release all requirements = the time to develop all stories + the time to test one story.

Using a three-layer cake as an analogy: big waterfall bakes the entire cake (the big version) layer by layer and then hands the whole thing over to testing, so there is a clear testing boundary between development and testing. Agile development splits the work into many small cakes (stories); each one is tested as soon as it is finished and released as soon as it passes. While testing tastes the current small cake, development is already baking the next, forming a staggered pipeline. When the last small cake is finished, testing only has to test that one to complete the release of all the cakes.
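The timing claim above can be checked with quick arithmetic, under the idealized assumptions that every story takes the same development time and the same test time, and that testing a story never takes longer than developing the next one (so the pipeline never stalls):

```python
def waterfall_cycle(n_stories, dev_t, test_t):
    """Big waterfall: develop everything, then test everything."""
    return n_stories * dev_t + n_stories * test_t

def agile_cycle(n_stories, dev_t, test_t):
    """Staggered pipeline: total = develop all stories + test only the last one
    (valid when test_t <= dev_t, so testing never blocks the pipeline)."""
    return n_stories * dev_t + test_t

# 6 stories, 5 days of development and 3 days of testing each:
# waterfall: 6*5 + 6*3 = 48 days; agile pipeline: 6*5 + 3 = 33 days.
```

The gap widens with more stories, which is exactly why keeping each story's test time short and predictable becomes the critical constraint discussed next.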

Sounds like a great idea for being agile, right?

In practice, agile is very hard to implement in terms of stories and architecture. Can the requirements be split into independent stories of roughly equal size (to keep development and testing staggered)? Architecturally, can the three-layer cakes be built one at a time, with each small cake decoupled from the others? How do you prevent a change to a bottom layer from affecting every cake on top of it?

Assuming those problems can be solved, the biggest impact on testing lies in keeping each story's test cycle controllable. The test process covers new features, regression, performance, and more; for a mature product the regression suite can be very large, and bugs found during testing add further uncertainty. The answer follows naturally: automating the whole test process is the strongest guarantee of the cycle. Automation converts the unpredictable time of humans into the predictable, accelerable running time of machines; the more of the process automation covers, the more certain the cycle becomes. The cost of developing and maintaining the automation itself should be hidden inside the automation's running time, or better, inside the preparation time before each story is tested.

Having said all this, the conclusion is actually simple: the more agile the release, the faster testing must be, the higher the required degree of automation, and the higher the required automation capability of the team.

Perhaps this is why Internet testers are more technical than traditional software testers. The Internet philosophy of rapid trial and error, small steps, and fast iteration with users demands release efficiency, which drives the development of testing technology. In traditional software companies, including some traditional financial institutions, testers can still deliver tests with a small amount of automation plus manual clicking within release cycles of several months.

I have worked with extremely agile teams (multiple releases per day) whose automated testing had developed to the extreme: automated use case coverage above 90%, with full automation runs taking hours or even minutes. This drives technical practices such as keeping automated use cases, code, and configuration under the same source management, one-click environment construction, coverage-based quality gates, and CI (continuous integration) pipelines.

Back to reality: business development does not necessarily require such extreme agility. Extreme agility trades out-of-cycle investment for in-cycle speed, which raises all kinds of technical costs (machine resources, architecture capability, personnel capability), so the iteration rhythm usually settles at the weekly level as a more balanced state. Correspondingly, testers use automation cost-benefit as the principle for choosing what to automate. The benefit depends on the time saved per run, multiplied by the number of times the automation is repeated; the cost comes from writing the automation and maintaining it as the program under test changes. For example: performance testing, interface testing, and unit testing are chosen because they are run repeatedly; record-and-playback is chosen because it is cheap to write; code scanning spreads its cost across a wide range of applicable businesses; frequently changing interfaces are not suited to UI automation; and so on.
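The cost-benefit rule above reduces to simple arithmetic, sketched here with made-up numbers: automate a case only if the time it saves over its expected lifetime exceeds what it costs to write and maintain.

```python
def automation_roi(runs, manual_minutes, write_minutes, maint_minutes_per_run):
    """Net minutes saved by automating one case. Positive => worth automating."""
    saved = runs * manual_minutes                 # benefit grows with repetition
    cost = write_minutes + runs * maint_minutes_per_run
    return saved - cost

# A stable interface regression case run 100 times: clearly positive.
interface_case = automation_roi(runs=100, manual_minutes=10,
                                write_minutes=120, maint_minutes_per_run=1)

# A volatile UI case run only 5 times with heavy maintenance: negative.
ui_case = automation_roi(runs=5, manual_minutes=10,
                         write_minutes=240, maint_minutes_per_run=30)
```

The two examples mirror the text: repetition favors interface and unit automation, while frequently changing UIs push maintenance cost past any savings.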

3. Evolution of testing technology

TL;DR: Testing technology keeps evolving along two directions, quality and efficiency, giving quality engineers broad room for technical development.

The requirements put forward by the business (product form and R&D model) for the testing work must be answered by the testing technology.

There are two types of business requirements on testing work: one is quality, the other is efficiency (including speed and cost). Testing technology accordingly divides into two directions:

1. How to guarantee test quality: testing completely and testing correctly

The most basic methodologies, equivalence class partitioning and boundary value analysis, need no repetition here. In business practice, user-operation equivalence class coverage and code coverage are the two important measurements, and the various measurement methods that evolved from them all aim to drive coverage toward 100% in these two directions.
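For readers newer to these basics, here is a compact sketch of equivalence class partitioning plus boundary value analysis against a hypothetical rule (a transfer amount must lie in [0.01, 50000.00]; the rule and values are made up):

```python
def validate_amount(amount):
    """System under test (illustrative): accept amounts in [0.01, 50000.00]."""
    return 0.01 <= amount <= 50000.00

# One representative per equivalence class, plus the boundaries of each class.
CASES = [
    (-5.00,    False),  # class: negative amounts
    (0.00,     False),  # boundary just below the valid range
    (0.01,     True),   # lower boundary of the valid class
    (100.00,   True),   # representative of the valid class
    (50000.00, True),   # upper boundary of the valid class
    (50000.01, False),  # boundary just above the valid range
]

def run_cases():
    """Execute every case; each entry is (input, did_it_behave_as_expected)."""
    return [(amt, validate_amount(amt) == expected) for amt, expected in CASES]
```

Six cases cover what exhaustive input enumeration never could, which is the whole point of partitioning: one representative per class, plus the boundaries where bugs cluster.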

The two important sources of use case design are the PRD and the technical design. The PRD represents user operation behavior; the technical design represents the system's implementation logic. After more than 20 years of Internet service development, user-facing functions have grown ever more powerful and the implementation logic behind them ever more complex: a single operation may hide thousands of equivalence classes behind it (think of the discount strategies when placing an order on Double Eleven).

To drive toward that goal of 100% coverage, testers need a deep understanding of the object under test, decomposing equivalence classes along multiple dimensions such as function, code, normal logic, and exception handling, and finally designing runnable use cases to cover each equivalence class.

The development of testing technology is also based on this:

Understanding the object under test: PRD analysis, system link analysis, change impact analysis, etc.

Equivalence class decomposition & coverage: code coverage measurement, fuzz testing, traffic scenario clustering, code rule scanning, use case generation, etc.

Identifying and judging use case verification relationships: code data lineage, automatic verification derivation, test result judgment, etc.

2. How to test quickly and at low cost

On the basis of complete testing, improving efficiency and reducing costs are the eternal demands.

Here is a very clear technological evolution path:

Manual testing -> test tooling -> automated testing -> test platforms/services -> intelligent testing

1) Test tools

Testing work is naturally repetitive: repetition between versions is regression; repetition between different functional cases is environment and data initialization; repetition between different equivalence-class cases of the same function is use case execution and result verification. The same case is also executed repeatedly across multiple test rounds within a version, and between development self-testing and testing work. The most extreme repetition is compatibility testing, where every step repeats.

If manual testing is decomposed into environment construction, data preparation, use case execution, result reading, result verification, and report writing, then each step can be addressed with tools to improve the efficiency of manual execution.
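The decomposition above, and the automation step described next, can be sketched together: each manual step becomes a callable "tool", and chaining them yields one-click execution of a case. All names and the toy balance logic are hypothetical.

```python
def build_env():
    """Environment construction: a throwaway in-memory 'database'."""
    return {"db": {}}

def prepare_data(env):
    """Data preparation: seed the account under test."""
    env["db"]["user"] = {"balance": 100}

def execute_case(env):
    """Use case execution: withdraw 30 from the account."""
    env["db"]["user"]["balance"] -= 30

def read_result(env):
    """Result reading: fetch the post-operation balance."""
    return env["db"]["user"]["balance"]

def verify(actual):
    """Result verification: expected balance after the withdrawal."""
    return actual == 70

def report(passed):
    """Report writing: a one-line summary."""
    return f"case {'PASSED' if passed else 'FAILED'}"

def run_case():
    """One-click execution: the chained pipeline a human used to walk manually."""
    env = build_env()
    prepare_data(env)
    execute_case(env)
    passed = verify(read_result(env))
    return report(passed)
```

Once every case has this shape, batch execution and hooking the batch to code submission (continuous integration) is just a loop over `run_case`-style entry points.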

2) Automation

Once each step has been tooled, the steps can be further strung together with tools and scripts to achieve one-click execution of a single use case, realizing single-case automation. After resolving interference and dependencies between use cases and achieving batch execution of all use cases, automation can be linked to code submission to realize continuous integration.

3) Test platformization/service

As use case automation deepens, its costs grow, coverage grows, and so does the repetition between use cases. When the business and the team grow to a certain scale, the general-purpose capabilities of testing can be further abstracted, and testing capabilities reusable across teams and businesses can be provided to a wider audience in a more flexible platform/service mode, supporting technology teams of hundreds to thousands of people (dev + test): automated testing frameworks, generic mocks, generic scanning rules, generic verifications, code scanning, performance testing platforms, and so on.

4) Intelligent test

Different from testing intelligent business as discussed above, intelligent testing means using intelligent technology to do the testing. In 2019 I had the honor of leading the intelligent testing group within the economy-wide quality group; we did some exploration of intelligent testing and also tried to define standard levels for it. Our team divided testing activities into three major areas and eight sub-fields, decomposed the room for intelligence in each sub-field, and mapped them to practical cases.

Standards are formulated to guide the direction of technical development. The standard implicitly captures the hard problems of intelligent testing: using algorithm models and machine learning to remove the human-dependent steps of the testing process, such as scenario planning coverage and result judgment, so that no human intervention is needed.

Testing without human intervention means: test efficiency = machine execution efficiency, with the share of the cycle occupied by humans tending to zero. Explorations visible so far include automatic use case generation, automatic use case verification (automatic UI verification via image recognition, automatic data verification via derived data relationships, etc.), automatic localization of use case failures, and automatic use case repair.
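The "automatic data verification via derived data relationships" idea can be illustrated in miniature: learn simple invariants (non-null, value range) from a trusted baseline batch, then apply them as checks on the next batch. This is a deliberately tiny stand-in for the real technique; field names and values are made up.

```python
def derive_rules(baseline, field):
    """Learn invariants for one field from a trusted baseline batch."""
    values = [r[field] for r in baseline if r.get(field) is not None]
    return {
        "not_null": all(r.get(field) is not None for r in baseline),
        "min": min(values),
        "max": max(values),
    }

def apply_rules(rules, batch, field):
    """Verify a new batch against the derived rules; return violations."""
    violations = []
    for i, r in enumerate(batch):
        v = r.get(field)
        if rules["not_null"] and v is None:
            violations.append((i, "null"))
        elif v is not None and not (rules["min"] <= v <= rules["max"]):
            violations.append((i, "out_of_range"))
    return violations

baseline = [{"amount": 10}, {"amount": 80}, {"amount": 45}]
rules = derive_rules(baseline, "amount")        # learns: not null, range 10..80
new_batch = [{"amount": 50}, {"amount": 200}, {"amount": None}]
```

The human contribution shrinks to curating the baseline; the verification itself is derived rather than hand-written, which is exactly the step the standard identifies as hard.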

Intelligent testing has been tried in the industry for several years, and I have yet to see a practical case that fully replaces humans. I remain optimistic about the direction: as testing processes become more standardized, test data assets accumulate, algorithm-model technology develops, and testers' modeling skills improve, intelligent testing will certainly contribute more and more.

4. Evolution of Quality Responsibilities

TL;DR: With industry demands and technological development, the extension of the concept of testing is gradually expanding, and so is the scope of responsibilities that quality engineers take on. The quality teams of Alibaba and Ant have gradually taken on the larger responsibility of technical risk, opening new space for the career development of quality engineers.

Having discussed the testing work itself, we should also talk about how the testing role has evolved.

Analogous to testing technology, the industry shows an obvious evolution route for testing responsibilities, and benchmarking against the top Internet companies, the same progression can be seen in each company's own development.

Testing -> Quality -> Quality + Efficiency -> Quality & Efficiency + Technical Risk

1. Testing vs Quality

In the preceding sections, "testing" and "quality" were used almost synonymously, but early in the industry's development, the shift from testing to quality marked a major difference between two responsibilities: delivering a test process versus delivering quality results. Once a team is called a quality team, it can break beyond the limits of testing work and pursue quality assurance from many angles; R&D process management and SQA (software quality assurance) both fall within a quality team's remit. Combined with business needs, a quality team can also take on work such as business quality evaluation and competitive comparison.

2. Efficiency

Once service quality reaches a stable state, efficiency requirements will inevitably be raised. Efficiency can be divided into two scopes: quality efficiency and R&D efficiency.

Because of its repetitive nature, quality efficiency is usually addressed first. The common approach is to rely on the DevOps process to improve tool-platform efficiency and to build reusable capabilities for every high-cost, highly repetitive step. This work was already covered in the "how to test faster" part of the testing-technology discussion, so I will not repeat the analysis here.

The scope of R&D efficiency is wider. By identifying repetition across the overall R&D work, we can find room for improvement and build capabilities. For example, if similar marketing campaigns require repeated development, we can build low-code marketing configuration plus self-service pre-launch acceptance capabilities for operations; if similar institutions and merchants are onboarded repeatedly, we can build institutional self-service onboarding and self-service joint-debugging capabilities. Work that falls on the R&D side is usually taken on by the R&D role; work that falls on the quality side can be taken on by the quality role.

3. Technical Risk

This is the field in which I have learned the most since joining Ant, and it is also a new trend in the industry in recent years.

Let me share a personal experience from about 9 years ago:

At that time I was the leader of a quality team whose business scope included a website with tens of millions of PVs. One day my boss (the leader of the larger quality organization) called and said: "Is the website down? People say it can't be opened, and when I tried, the home page was blank too." I checked around and replied: "It's not a bug in the program, and no project has been released in the past few days. Apparently a server went down." My boss said: "And then?" I said: "No bug, no missed test — it has nothing to do with our team, and operations is handling it." My boss then spent some time helping me understand: how could the website going down have nothing to do with quality? The quality team should not be unaware that the website is down; the quality team should do something to make product quality better — after all, we are called the quality team... Later we ran a series of online-quality initiatives: fine-grained business monitoring, hierarchical alerting, automatic localization for high-frequency alert rules, and so on. In today's Ant terminology, this is roughly technical risk: high availability, stability, security governance, emergency response, and similar work.
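The "fine-grained business monitoring, hierarchical alerting" work mentioned above can be sketched in a few lines. This is a minimal, hypothetical illustration — the metric names, thresholds, and severity routing are invented for the example, not Ant's actual monitoring stack:

```python
# A minimal sketch of hierarchical alerting on business metrics.
# All metric names and thresholds here are hypothetical.
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str        # business metric to watch, e.g. an order success rate
    threshold: float   # fire when the observed value drops below this
    severity: str      # "warn" pages the on-call; "critical" also pages leads

RULES = [
    AlertRule("order_success_rate", 0.995, "warn"),
    AlertRule("order_success_rate", 0.95, "critical"),
    AlertRule("homepage_availability", 0.999, "critical"),
]

def evaluate(metrics: dict) -> list:
    """Return (metric, severity) pairs for every breached rule, so
    different severities can be routed to different channels."""
    fired = []
    for rule in RULES:
        value = metrics.get(rule.metric)
        if value is not None and value < rule.threshold:
            fired.append((rule.metric, rule.severity))
    return fired

alerts = evaluate({"order_success_rate": 0.94, "homepage_availability": 1.0})
print(alerts)  # both the warn and critical thresholds are breached here
```

The point of the hierarchy is that a dip below 99.5% wakes only the on-call, while a crash below 95% escalates — so the team is never "unaware that the website is down" again.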

Why do I tell this story? It affected me deeply: we were a quality team, and we tested every line of code released online, yet quality problems that seriously affected users still occurred in production. What should the quality team do — define the responsibility, take it on, and be accountable for this part of quality? At the time we defined it as online quality: any problem that affects quality online, whether caused by program bugs or by design, is within our scope. To that end, we turned the perspectives of finding problems, locating them quickly, and recovering from them into a methodology, and established joint processes with other roles. For me, this responsibility was called online quality at my former company, and is called technical risk at Ant.

(Note 1: There is a conceptual conflict here. If online quality = technical risk, then quality (offline + online) includes technical risk. But at the same time, strongly influenced by the experience above, in my mind technical risk = everything that can go wrong online, so technical risk conversely covers the traditional concept of quality, because program quality is itself a source of online problems.)

(Note 2: Since responsibility for technical risk is bound to be shared across multiple teams and roles, and major technical organizations differ in how they set up teams, roles, and responsibilities, what "technical risk" refers to — the things, the problems, and the collaborating teams — also differs slightly depending on context.)

Technical risk work can be divided by stage into risk identification, prevention, discovery, stopping the bleeding, localization, recovery, contingency-plan drills, red-blue attack and defense, and so on. As with unit testing, interface testing, integration testing, system testing, and user testing in the testing phase, these stage divisions only represent front-to-back dependencies, not strict serialization. I believe readers of this article already understand technical risk well, so I will not expand on the specific methodology here; I will just list a few key points I want to emphasize:

1) Risk identification should be analyzed from two perspectives: how problems finally manifest, and where problems are introduced.

a. Since technical risk = online problems, the manifestation of online problems is naturally the most important perspective. In 2016 Ant defined six areas of technical risk: high availability, funds security, data quality, performance and capacity, cost, and security. The first four types, which involve the quality role the most, are defined by how problems manifest. Although these concepts are not completely orthogonal (for example, performance and capacity problems manifest as high-availability problems, and data quality problems can manifest as funds-security problems), they are easy to use in practice to distinguish the main types of technical risk.

b. The sources that introduce problems are varied. There is a classic saying: "No change, no harm." Every part of the link that provides the service — gateways, applications, configurations, middleware, cloud, machine-room hardware, networks, and so on — can introduce changes that cause online problems. (Even if you change nothing yourself, the optical cable may change for you -_-)

c. In risk identification, you must think in both directions, forward and backward, to ensure completeness: reason from possible problems back to their causes (a service outage may be due to a bug, a machine, a machine room, or the network), and reason from possible changes forward to the problems they may cause (a misconfiguration may lead to a service outage, a wrong amount, or copy that triggers customer complaints).

2) Build prevention capabilities against the sources of problems, and build discovery capabilities against the manifestations of problems.

3) Stopping the bleeding and localization-plus-recovery are two lines of work that should be decoupled and run in parallel.

In the testing mindset, fixing the bug after locating it = solving the problem. In technical risk, however, stopping the bleeding and recovery are two different concepts. Stopping the bleeding only means the problem is no longer being continuously triggered and the impact no longer expands; the consequences that have already occurred remain (stuck bills, wrong data, erroneous fund receipts and payments). Recovery means fully returning to the normal state before the problem. Because the impact of technical risk grows with time, we must prioritize stopping the bleeding whenever it and recovery cannot be done at the same time, and the two must be decoupled.

4) Contingency plans need to be evaluated and drilled regularly to stay effective; an error in a contingency plan can cause a secondary failure.

5) Red-blue attack and defense verifies the whole chain of technical risk capabilities — discovery, stopping the bleeding, recovery, and contingency plans. Like regression testing, it assures the technical risk prevention and control capability of both the online system and the people responsible for it.

6) Cultural awareness matters. Compared with offline testing, technical risk involves far more people-centric work: risk analysis, emergency response speed, emergency decision-making and judgment, and multi-department collaboration (customer service, operations, product, PR, GR, business, finance, etc.). And because real incidents are harmful, experience cannot be accumulated purely through live combat. Therefore, in daily work the team must emphasize awe of risk, strictly follow the process, take every attack-and-defense drill seriously, and maintain the inheritance of risk culture through continuous cultural operation.
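Point 3 above — decoupling "stop the bleeding" from recovery — can be sketched as two independent steps that run in parallel. The feature flag and the repair function below are hypothetical stand-ins, not a real incident-response framework:

```python
# A minimal sketch of decoupling "stop the bleeding" from recovery.
# The flag names and record shapes are invented for illustration.
import threading

feature_flags = {"new_billing_path": True}
repaired_records = []

def stop_bleeding():
    """Fast mitigation: disable the faulty change so the problem stops
    being triggered. This does NOT fix damage already done."""
    feature_flags["new_billing_path"] = False

def recover(bad_records):
    """Slower, careful work: repair the consequences (e.g. correct
    mis-billed records) and return to the pre-incident state."""
    for record in bad_records:
        repaired_records.append({**record, "amount": record["expected"]})

# Stop the bleeding immediately; run recovery on a separate thread
# rather than waiting for root-cause analysis and data repair.
stop_bleeding()
worker = threading.Thread(
    target=recover,
    args=([{"id": 1, "amount": 99, "expected": 9}],))
worker.start()
worker.join()
print(feature_flags["new_billing_path"], repaired_records)
```

The design point is that mitigation must never block on repair: flipping the flag takes seconds and caps the impact, while repairing the mis-billed records can safely take hours afterward.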

The technology of technical risk also evolves around these stages; like testing, each stage is going through a journey from manual to automated to intelligent. Since risk manifests in the online production environment, the basic data of risk (change orders, operation orders, monitoring items, verification rules, call links, data lineage, etc.) is easier to standardize than the basic data of testing. In recent years the accumulation of risk data has paved the way for intelligent risk management, and many good technical practices have emerged.

As discussed above, quality and technical risk are now two concepts that go hand in hand. As a quality person, do not miss the exploration and development in the field of technical risk — from taking on the responsibility, to accumulating experience, to building capabilities, do it step by step.

5. Quality organization design

TL;DR: Only by understanding your own quality team can you better evaluate your present and plan your future.

Assignment of responsibilities comes from organizational design, and organizational design comes from business strategy. To do quality work well, you also need to understand quality organization design from a higher vantage point and see where you stand. The following represents only my personal thinking and analysis; criticism is welcome.

Note: In line with common industry practice, the "quality team" is used below as the representative team that takes on the responsibilities described above.

In recruiting conversations, apart from the business and the technology, candidates are often most concerned about the team: how big is this quality team? This simple question is really asking: at what level of the organizational structure is this quality team positioned?

Since there is a natural mapping between testing work and development work, the structure of the quality team usually mirrors the structure of the development team: from the perspective of upstream/downstream collaboration, ensuring that each development unit interfaces with exactly one test unit saves coordination cost between development and testing. So when someone asks how big a quality team is, the subtext is: how big a development organization does this quality team serve? How senior is the leader that this quality team's leader reports to? The answer is obvious: the larger the quality team, the higher the organizational level it belongs to, and the larger the scope of responsibilities it can take on.

The quality team of any large company that has been through more than five years of development has basically experienced cycles of merging and splitting:

A start-up usually begins with a few developers and no full-time testers. Once a product takes shape, full-time testing begins, and during the product's growth the test team and the development team grow together. Small test teams are aggregated into a large product-matrix test team, and may even continue to aggregate upward into a multi-business test team. When multi-business development reaches a certain stage, because of the various frictions of running many business lines, the multi-business test team gets split, returning to single-business test teams supporting each business, or even smaller unit teams. When the businesses develop further and more cross-business collaboration and linkage are needed, the test teams are aggregated again for more holistic unified planning.

Generally speaking, the merged organizational form is good for taking on wider horizontal responsibilities. It emphasizes overall consistency of quality, efficiency, and technical-risk strategy across a large technical organization, promotes full reuse of capabilities, and avoids duplicated investment; it also makes it easier for promising testers to rotate across businesses at the right time and grow their abilities. But if the large technical organization supports multiple businesses with clearly different strategic goals, stages, and development styles, the unified strategy will tend to favor the dominant business type, which is likely to chafe the outlier businesses; the bigger the differences, the bigger the discomfort and the greater the potential harm to the business. If the quality team instead abandons the unified strategy and emphasizes per-business strategies and differentiated capability building, then the reuse advantages of a large team cannot be realized, and the conflicts between businesses (in stage, scale, and personnel capacity) will ultimately surface as conflicts inside the quality team, putting great, even unbearable, pressure on its internal management.

The split organizational form helps the quality team stay close to the business, adjust strategy flexibly, and quickly build targeted capabilities; it suits fast-developing, fast-changing businesses best. The disadvantage is that, limited by team size and the reserve of personnel capability, the pace of testing-technology development and capability accumulation suffers. Testers who serve a single business for a long time have a limited field of view, which also affects their capability accumulation and growth. Moreover, with poor information flow and uncoordinated decision-making across multiple quality teams, duplicated capability building easily occurs, meaning duplicated investment at the organizational level.

In general, the organizational design of the quality team affects two aspects of organizational management — information flowing upward and decisions executing downward. A reasonable design balances information overload against information loss, and improves the efficiency of decision execution, feedback, and adjustment. Whatever its status, the quality team is best positioned when it receives input from both the business line and the functional line, and, based on the demands behind the organizational form, makes concrete strategic decisions that take both lines into account and balance their potential conflicts. No organizational form is inherently better or worse; the one that suits the business and the organization is the best. At the same time, whatever form the team is in, we should play to its current advantages, avoid its disadvantages, and make quality an accelerator for the business rather than a speed bump.

Business strategy determines the design of the technical organization; within the technical organization the test-to-dev ratio is set, and with it the size of the quality team is basically determined. Let me briefly discuss this very important metric in quality organization design: the test-to-dev ratio.

More than ten years ago, I asked a boss: what is the appropriate test-to-dev ratio? He said: the test-to-dev ratio is a magic number — I can set it to anything, and then it's up to you to make it work.

After mulling this over for many years, I arrived at the following:

1) The test-to-dev ratio can indeed be set to any value.

2) Like any magic number, it seems meaningless at first glance — you don't know why it is what it is — but a great deal of information hides behind it.

3) Whatever the ratio, quality results must be delivered.

4) The ratio may keep changing, and quality results must still be guaranteed as it changes.

5) A good boss should clearly explain the thinking behind the number to the team. Never be lazy about important communication; consider how members feel, or the resentment may last a long time.

As always: business strategy determines organizational design. The test-to-dev ratio can be weighed from the following perspectives.

1) Quality attributes of the business itself

a. How much damage does a quality problem do to the business?

b. Can good quality bring growth to the business?

c. What are the quality requirements of the industry environment?

2) Current business quality requirements

a. Quality, efficiency, and cost form an impossible triangle. What is the business's current trade-off strategy?

b. How does the business's quality compare with competing products?

3) Current quality water level

a. Quality results

b. Quality capability

c. Quality organization capability

4) Looking forward to changes

a. Changes in the industry's quality requirements

b. Changes in the business's development stage

c. Challenges that new technology development poses to quality technology

5) Other organizational considerations such as personnel recruitment, team development, and industry competitiveness.

Someone asked: does a low test-to-dev ratio mean the business does not value quality? First, a misunderstanding must be cleared up: the quality role takes on quality work, but that does not mean all quality work is taken on by the quality role. The test-to-dev ratio is a headcount ratio of full-time quality personnel to non-quality personnel, not a ratio of quality workload to non-quality workload. Another influencing factor is how much quality work is carried by non-quality roles, such as developer self-testing, product verification, and operations acceptance.

Whatever the ratio, the quality team must coordinate the quality work of all roles: formulate quality strategy, provide tool platforms, design quality processes, and supervise each role's execution, achieving an overall balance of quality, efficiency, and cost. So the answer is naturally no: test-to-dev ratios of different businesses are not comparable, and the ratio is not necessarily related to how much the business values quality.

(Some also ask: within the same business, does the history of the test-to-dev ratio indicate how much quality is valued? The answer is again no. Even within the same business, it depends on the development stage, test maturity, the capability of the corresponding developers, the actual responsibilities of the test team, and other factors.)

A little trick: to see whether a business values quality, look at how much say the quality role has in the quality work of other roles.

Once the position and size of the quality team in the organizational structure are determined, the team should be given clear responsibilities. Referring to quality, efficiency, and technical risk as discussed above, the specific responsibilities in these three directions must be determined, and the corresponding strategic goals decomposed down to the quality team.

The responsibilities a particular quality team takes on are often reflected in its name: for example, the sequence of renames Quality Department -> Quality Technology Department -> Quality and Technical Risk Department represents an upgrade of team responsibilities.

Summary: quality organization design is decomposed layer by layer, from business strategy to technology strategy to quality-and-risk strategy, and from organizational position to size and ratio to assignment of responsibilities. No organizational design is inherently good or bad; the one that fits is the best. In the Internet context, rapid change is a given. Only by understanding the quality organization design you are in can you ride the trend, play to your advantages, and deliver quality value.

6. Career development plan for quality personnel

TL;DR: Understand the capability structure your level requires, then break yourself down and analyze accordingly.

Finally, let’s talk about the career development plan of quality personnel.

In fact, the main motivation for writing this article comes from the question my quality colleagues have asked me most over the years. To answer it, as noted above, you must first see clearly the environment you are in (industry, company, business, team), then see yourself clearly — and finally you can answer your own career development question yourself.

We discussed the development of testing technology from the angles of product form and R&D model, and organizational design from the angles of roles and responsibilities. Parts 1-3 are about what quality people can do; parts 4-5 are about what the organization needs quality people to do. Only in this part do I return to the individual: what am I already doing, what else can I do, and what will I do in the future?

After years of development, major companies have established stable professional promotion systems for quality. Behind the promotion standards is a capability structure. To see ourselves clearly, the most convenient way is to break ourselves down along that capability structure, map ourselves onto the specific criteria of each promotion level, and analyze accordingly.

Quality professional capability can be broken down into three dimensions:

1) Understanding of business and architecture. This is the foundational capability of the quality role, reflected in understanding the business model, product design, and process logic, and, on the technical side, the architecture design, technology stack, and implementation details. This understanding serves not only quality, efficiency, and risk work; one should also participate in product and architecture design as early as possible from a quality perspective, delivering quality value at the earliest stage.

2) Quality technology capability. This is the core capability of the quality role: from test analysis to test strategy, from manual to automated, from building basic test capabilities to landing innovative test technology. One must follow industry technology keenly, keep broadening one's horizon, push cutting-edge technology into practice, and provide quality technical solutions suited to the characteristics of the business.

3) General technical capabilities — engineering, data, and algorithms. These are the catalyst capabilities of the quality role. Engineering capability ranges from writing automated test-case code, to building quality tools and platforms, to designing and delivering quality architecture; data and algorithm capability is reflected in how to test intelligent business, and how to test with intelligence.

As the industry develops, each of these three dimensions offers plenty of technical depth, and it is not easy to develop all of them evenly in the short term. A very important principle in capability planning is to play to your strengths. My personal suggestion: early on, dabble appropriately in multiple technical directions (make full use of the opportunities the team gives you) and get a sense of your level, potential, and interest in each; then choose a direction — ideally one you are both good at and fond of — cultivate it deeply for 2-3 years, and then repeat this evaluate-and-choose cycle.

How do you assess what level your capability has reached? Remember one principle: you get what you earn. Only results can prove capability.

The threshold requirement is to be responsible for large projects and complex businesses (how large and how complex depends on the level being benchmarked). The advanced requirement is to bring technical methods, systems, and innovation to the process. The ultimate requirement is to prove that the results achieved above are attributable to you.

Let me share a few practical tips:

Each year, summarize the hardest thing you did and compare it with the hardest thing of the previous year: is it harder?

Each year, look back at the best skill you used and compare it with the best skill of the previous year: is it better?

Each year, think about your current job and ask: could the you of two years ago have done it? If you are raising your hand, please pay attention to your growth self-assessment.

Personal growth and organizational development are positively correlated most of the time. If the two diverge, one of them must change. Everyone is the first person responsible for their own career development: take the proactive step — talk to your supervisor, align understandings, get feedback, and seek support.

Finally, the career development of quality personnel can jump outside the quality line. Given the nature of quality work — diverse business exposure and broad professional capabilities — as long as you are determined to adjust your capability structure, there are many successful cases every year of transfers into business, R&D, efficiency, project management, and technical risk.

Easter egg

Returning to the questions at the beginning of the article: most of the questions I receive are really the questioner asking about themselves — even questions about the industry are ultimately about themselves. So the person answering must know something about the person asking in order to answer accurately.

Here are three methods, ranked from easy to hard.

The easiest, which I call the snapshot method: talk to your supervisor. Any supervisor who has worked with you for more than half a year is surely the person who knows (and cares) most about your current state, so the analysis and feedback will naturally be more precise. Treat your supervisor as a mentor and friend, and do every task well.

A slightly more troublesome method, which I call the historical review method: analyze your growth since the start of your career, focusing on the key events — which skills improved significantly, and what kind of situation gave you that opportunity. To have grown all the way to today (working at Alibaba/Ant), you must have done something right. Extract the keys to your growth, use history as a mirror, and look for chances to trigger them again.

The hardest method, which I call the future projection method: find relatively senior peers you trust and respect (including your supervisor), discuss the future of the business and the industry together, and analyze what requirements that future places on your job and skills. Begin with the end in mind, and develop yourself with purpose.


Origin blog.csdn.net/AlibabaTech1024/article/details/124020045