Internet back-end technology encyclopedia!

1. System development

High Cohesion/Low Coupling

high cohesion

High cohesion means that a software module consists of highly related code and is responsible for only one task, which is often referred to as the single responsibility principle. A module's cohesion reflects how tightly its internal elements are connected.

low coupling

The more closely modules are linked, the stronger the coupling and the less independent each module is. The degree of coupling between modules depends on the complexity of their interfaces, how they call each other, and what information they pass. In a complete system, modules should be as independent of one another as possible.

Generally, the higher the cohesion within the modules of a program, the lower the coupling between them.

over design

Over-design means doing too much future-oriented design, or complicating relatively simple things: excessively pursuing modularity, extensibility, design patterns, and so on, which adds unnecessary complexity to the system.

premature optimization

"Premature" does not mean "early in the development process"; it means optimizing before it is clear where future requirements will go. Such optimizations may prevent you from implementing new requirements well, and your guesses about what needs optimizing may simply be wrong, so you gain nothing except more complicated code.

The correct way is to first implement your requirements well, write enough test cases, then profile to find the performance bottlenecks, and only then optimize.

Further reading: Why premature optimization is the root of all evil?
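
The "measure first" workflow can be sketched with Python's built-in cProfile (the slow_sum function here is only an illustrative profiling target):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive implementation, used only as a profiling target.
    total = 0
    for i in range(n):
        total += i
    return total

def profile_report(func, *args):
    # Run func under cProfile and return its result plus a stats report
    # sorted by cumulative time, so the hottest calls surface first.
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_report(slow_sum, 100_000)
print(report)  # optimize only what this report shows to be a bottleneck
```

Only after a report like this identifies the true hotspot is it worth rewriting anything.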

Refactoring

Refactoring improves the quality and performance of software by adjusting program code, making the program's design patterns and structure more reasonable and improving the software's extensibility and maintainability.

Further reading: Techniques for refactoring code

broken window effect

Also known as the broken window theory (Broken Windows Theory), this is a theory from criminology. It holds that if bad phenomena in an environment are allowed to persist, they induce people to imitate them or even make them worse. Take a building with a few broken windows: if those windows are not repaired, vandals will tend to break more. Eventually they may even break into the building and, if it is unoccupied, settle there or set fires.

Applied to software engineering: hidden flaws in system code or architecture design must not be allowed to linger, or they will grow more severe over time. Conversely, a high-quality system itself makes people involuntarily write high-quality code.

principle of mutual distrust

In the entire chain from upstream to downstream of a program, no single point can be assumed absolutely reliable: any point, including the machine and network, the service itself, dependent environments, inputs, and requests, may fail or behave unpredictably at any time. Therefore, defend at every point.

Further reading: The principle of distrust in the programming world

Persistence

Persistence is the mechanism for transitioning program data between temporary and persistent states. In layman's terms, temporary data (such as data in memory, which cannot be stored permanently) is persisted as durable data (for example, saved to a database or local disk, where it can be kept long-term).

critical section

A critical section represents a shared resource or shared data that multiple threads can use, but only one thread may use it at a time. Once a critical-section resource is occupied, any other thread that wants to use it must wait.
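
A minimal Python sketch of a critical section guarded by a mutex (the counter and thread counts are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()  # guards the critical section below

def increment(times):
    global counter
    for _ in range(times):
        with lock:          # only one thread may hold the lock at a time
            counter += 1    # critical section: mutation of shared state

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 40000 with the lock; may be lower without it
```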

blocking/non-blocking

Blocking and non-blocking usually describe how multiple threads interact. For example, if one thread occupies a critical-section resource, all other threads that need that resource must wait at the critical section, and waiting suspends them; this situation is known as blocking. If the occupying thread never releases the resource, every thread blocked at that critical section can no longer work. Non-blocking, by contrast, allows multiple threads to enter the critical section at the same time without being suspended.

Synchronous Asynchronous

Usually synchronous and asynchronous refer to function/method call aspects.

Synchronous means that when a function call is issued, the call does not return until the result is obtained. An asynchronous call returns immediately, but the immediate return does not mean the task is complete: a background thread (or other mechanism) continues the task, and the caller is notified via a callback or some other means when the task finishes.
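
The contrast can be sketched in Python with a background thread and a callback (the URL and function names are illustrative):

```python
import threading
import time

def fetch_sync(url):
    # Synchronous: the caller blocks until the result is ready.
    time.sleep(0.01)  # stand-in for real I/O
    return f"data from {url}"

def fetch_async(url, callback):
    # Asynchronous: returns immediately; a background thread runs the
    # task and invokes the callback when it is done.
    def worker():
        callback(fetch_sync(url))
    t = threading.Thread(target=worker)
    t.start()
    return t  # the caller is free to do other work meanwhile

results = []
t = fetch_async("example.com", results.append)
# ... caller does other work here ...
t.join()  # wait only when the result is actually needed
print(results[0])
```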

concurrent/parallel

Parallel

Parallelism means that at the same instant, multiple instructions execute on multiple processors simultaneously. So from both a micro and a macro point of view, the tasks execute together.

concurrency

Concurrency means that only one instruction executes at any instant, but the instructions of multiple processes are rotated rapidly: time is divided into slices, and the processes execute alternately in quick succession, so that at the macro level they appear to run simultaneously even though at the micro level they do not.

Further reading: The difference between concurrency and parallelism

2. Architecture design

High Concurrency

With the advent of distributed systems, high concurrency (High Concurrency) usually refers to designing a system so that it can process many requests in parallel at the same time. Generally speaking, high concurrency means that at the same point in time, many users access the same API or URL simultaneously. It often arises in business scenarios with large numbers of active users and a high concentration of users.

Extended reading: High concurrency (horizontal expansion, vertical expansion)

High Availability

High availability (HA, High Availability) is one of the factors that must be considered in distributed-system architecture design. It usually means that a system is specially designed to reduce downtime and keep its services highly available.

Further reading: what is high availability

read-write separation

To improve the stability of database services, many databases provide a dual-machine hot-backup capability. With read-write separation, the first database server is the production server that handles external inserts, updates, and deletes (writes), while the second database server mainly serves read operations.
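
A toy sketch of the routing idea, assuming write statements are recognized by their leading SQL verb (all names here are illustrative):

```python
import itertools

class ReadWriteRouter:
    # Writes go to the primary; reads are spread over the replicas.
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        verb = sql.strip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.primary      # all writes hit the primary
        return next(self._replicas)  # reads rotate over replicas

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
r1 = router.route("SELECT * FROM users")
w1 = router.route("INSERT INTO users VALUES (1)")
r2 = router.route("SELECT name FROM users")
print(r1, w1, r2)
```

Real middleware also has to handle replication lag (a read right after a write may need to go to the primary), which this sketch ignores.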

Cold Standby/Hot Standby

cold standby

Two servers: one runs, and the other stands by without running. Once the running server goes down, the backup server is brought up. The cold-standby solution is relatively easy to implement, but its disadvantage is that the backup machine does not take over automatically when the host fails; services must be switched manually.

Hot Standby

This is the so-called active/standby mode: server data, including database data, is written to two or more servers at the same time. When the active server fails, the standby machine is activated via software diagnosis (usually heartbeat detection), ensuring the application fully resumes normal use within a short time. When one server goes down, traffic automatically switches to the standby machine.
Extended reading: High Availability of Integrated Platform - Dual Machine Cold Standby and Hot Standby

Remote multi-active

Remote multi-active generally refers to establishing independent data centers in different cities. "Active" is in contrast to cold backup: a cold backup replicates the full data set but normally carries no business traffic and is used only when the primary data center fails and traffic is switched over. "Multi-active" means that these data centers also carry traffic in daily business and provide business support.

Further reading: Summary of the industry's remote multi-active high-availability architecture design scheme

Load Balance

Load balancing is a service that distributes traffic across multiple servers. It automatically spreads an application's externally facing capacity over multiple instances, improving the availability of the application system by eliminating single points of failure and achieving a higher level of fault tolerance, thereby seamlessly providing the capacity needed to distribute application traffic and delivering efficient, stable, and secure service.

Further reading: Everything about load balancing: summary and reflection
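
The simplest distribution policy, round-robin, can be sketched in a few lines (the server addresses are placeholders):

```python
import itertools

class RoundRobinBalancer:
    # Hands successive requests to backend servers in rotation,
    # spreading load evenly when servers are homogeneous.
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [lb.pick() for _ in range(6)]
print(picks)
```

Production balancers layer health checks, weights, and least-connection policies on top of this basic rotation.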

static and dynamic separation

Dynamic and static separation refers to the architecture design method of separating static pages from dynamic pages or static content interfaces from dynamic content interfaces in the web server architecture to improve the access performance and maintainability of the entire service.

Further reading: static and dynamic separation architecture

cluster

The concurrent capacity of a single server is always limited. When a single server reaches its performance bottleneck, multiple servers are combined to provide the service. This combination is called a cluster, and each server in it is called a "node" of the cluster. Every node can provide the same service, multiplying the concurrent processing capacity of the whole system.

distributed

A distributed system splits a complete system into many independent subsystems by business function; each subsystem is called a "service". The distributed system routes requests to different subsystems, letting different services handle different requests. In a distributed system, subsystems run independently; they communicate over the network to share data and compose services.

CAP theory

CAP theory states that in a distributed system, Consistency, Availability, and Partition tolerance cannot all hold at the same time.

  • Consistency: It requires that at the same point in time, all data backups in the distributed system are the same or are in the same state.

  • Availability: After some nodes of the system cluster go down, the system can still correctly respond to user requests.

  • Partition tolerance: The system is able to tolerate failures in network communication between nodes.

Simply put, a distributed system can support at most two of the three properties. But since the system is distributed, it is necessarily partitioned, and partition failures cannot be 100% avoided. Therefore, the real choice is between consistency and availability.

In distributed systems, we often prioritize availability over consistency. How, then, to achieve high availability? That is where the BASE theory comes in, which further extends CAP.

BASE theory

BASE theory states:

  • Basically Available

  • Soft state

  • Eventually consistent

BASE is the result of a trade-off between consistency and availability in CAP. Its core idea: when strong consistency cannot be achieved, each application can adopt an appropriate method, according to its own business characteristics, to bring the system to eventual consistency.

Scale horizontally/Scale vertically

Horizontal expansion Scale Out

Spread the load by adding more servers or program instances, thereby improving storage capacity and computing power.

Vertical expansion Scale Up

Improve single-machine processing capability. There are two ways to scale vertically:
1. Enhance single-machine hardware performance, for example: more CPU cores (e.g. 32 cores), a better network card (e.g. 10 GbE), better disks (e.g. SSD), larger disk capacity (e.g. 2 TB), and more memory (e.g. 128 GB);
2. Improve single-machine software or architecture performance, for example: use a cache to reduce I/O, use asynchrony to increase single-service throughput, and use lock-free data structures to reduce response time.

Parallel expansion

Similar to horizontal scaling.

The nodes of a cluster are parallel peer nodes. When expansion is needed, more nodes can be added to raise the cluster's service capacity. Generally speaking, the key paths of a server (such as login, payment, and core business logic) need to support dynamic parallel expansion at runtime.

Elastic expansion

It refers to the dynamic online expansion of the deployed cluster.

An elastic scaling system can, according to the actual business environment and a given policy, automatically add nodes (storage, compute, and network nodes) to increase system capacity, improve system performance, or enhance system reliability, or achieve all three at once.

State Synchronization/Frame Synchronization

state synchronization

State synchronization means the server computes all game logic and broadcasts the results, while the client only sends player operations and renders the received results. State synchronization offers high security, convenient logic updates, and fast reconnection, but development efficiency is lower, network traffic grows with game complexity, and the server bears greater load.

frame sync

The server only forwards the message without any logical processing. The number of frames per second of each client is the same, and the same input data is processed in each frame. Frame synchronization needs to ensure that the system has the same output under the same input. The development efficiency of frame synchronization is high, the traffic consumption is low and stable, and the pressure on the server is very small. However, the network requirements are high, the disconnection and reconnection time is long, and the client computing pressure is high.
Extended reading: The difference between frame synchronization and state synchronization

3. Network communication

connection pool

Establish a connection buffer pool in advance, and provide a set of connection usage, allocation, and management strategies, so that the connections in the connection pool can be reused efficiently and safely, avoiding the overhead of frequent connection establishment and closure.
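
The borrow/release cycle can be sketched with a bounded queue (FakeConnection stands in for a real database or network connection):

```python
import queue

class FakeConnection:
    # Stand-in for a real, expensive-to-create connection.
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    # Pre-creates connections; acquire/release reuses them instead of
    # opening and closing a connection per request.
    def __init__(self, size):
        self._pool = queue.Queue(maxsize=size)
        for i in range(size):
            self._pool.put(FakeConnection(i))

    def acquire(self, timeout=1.0):
        return self._pool.get(timeout=timeout)  # blocks if pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()  # reuses the connection c1 just returned
print(c3.conn_id)
```

Real pools add connection health checks and lifetime limits on top of this reuse loop.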

Disconnect and reconnect

Due to network fluctuations, a user may intermittently lose connection to the server. After the network recovers, the server tries to restore the user to the state and data at the time of the last disconnection.

session hold

Session persistence is a mechanism on the load balancer that recognizes the affinity between a client and a server and ensures that a series of related requests is sent to the same machine while load balancing is performed. Put plainly: multiple requests within one session all land on the same machine.

Extended reading: session (session) retention mechanism
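
One simple way to implement stickiness is to hash a client identifier so the same client always maps to the same backend (the IPs and server names below are placeholders):

```python
import hashlib

def sticky_server(client_ip, servers):
    # Hash the client identifier; the same input always yields the
    # same backend, regardless of request order or timing.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["app-1", "app-2", "app-3"]
first = sticky_server("192.168.1.7", servers)
second = sticky_server("192.168.1.7", servers)
print(first == second)  # the same client lands on the same machine
```

Cookie-based stickiness is the other common approach; hashing breaks sessions when the server list changes, which consistent hashing mitigates.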

long connection/short connection

Usually refers to the long connection and short connection of TCP.

A long (persistent) connection means that after a TCP connection is established, it is kept open: the two sides typically exchange heartbeats to confirm each other's liveness, multiple business data transfers occur over it, and the connection is generally not actively closed.

A short connection generally means that after a connection is established, a transaction (such as: http request) is executed, and then the connection is closed.

Extended reading: What exactly are HTTP long connections and short connections?

Flow Control / Congestion Control

flow control

Flow control prevents the sender from sending so fast that it exhausts the receiver's resources and leaves the receiver no time to process.

congestion control

Congestion control prevents the sender from sending so fast that the network cannot keep up. Congestion degrades the performance of part or all of the network and, in severe cases, brings network communication to a standstill.
Extended reading: TCP flow control (sliding window) and congestion control (working process of congestion control)

Thundering herd effect

Also called the thundering herd problem. In short: when multiple processes (or threads) are blocked waiting for the same event (sleeping), the occurrence of that event wakes all of them up, but in the end only one process (thread) obtains "control" of the event and handles it; the others fail to obtain control and can only go back to sleep. This phenomenon, and the performance it wastes, is called the thundering herd.

Further reading: Linux Thundering Herd Effect Detailed Explanation

NAT

NAT (Network Address Translation, Network Address Translation) is to replace the address information in the header of the IP packet. NAT is usually deployed at the egress of an organization's network, and provides public network accessibility and upper-layer protocol connectivity by replacing the internal network IP address with the egress IP address.

Further reading: Detailed NAT

4. Abnormal failure

downtime

Downtime generally refers to a host crashing unexpectedly. By extension, situations such as a database deadlock, or individual services on a server hanging, may also be described as downtime.

coredump

When a program crashes and is interrupted abnormally, the OS stores the program's current state in a coredump file. Usually the coredump contains the program's memory, register state, stack pointer, memory-management information, and so on at the time it was running.

Extended reading: Introduction to coredump and summary of coredump reasons

Cache penetration/breakdown/avalanche

cache penetration

Cache penetration refers to querying data that definitely does not exist. On a cache miss the data must be queried from the database, and since nothing is found, nothing is written to the cache, so every request for the non-existent data goes to the database, putting pressure on it.

cache breakdown

Cache breakdown occurs when a hotspot key expires at some point in time while there is a large number of concurrent requests for that key, so a flood of requests hits the DB.

cache avalanche

Cache avalanche means a large amount of cached data expires at the same time while query volume is huge, overloading the database or even bringing it down.
The difference from cache breakdown: breakdown is the expiry of one hot key; avalanche is the simultaneous expiry of many keys.
Extended reading: Thoroughly master cache breakdown, cache penetration, and cache avalanche in ten minutes
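
One common defense against cache penetration is to cache the absence of a key ("null caching"), so repeated lookups for non-existent data stop hammering the database. A minimal sketch, with dicts standing in for a real cache and database:

```python
cache = {}
db = {"user:1": "Alice"}   # toy "database"
db_queries = 0
NULL = object()            # sentinel marking "known to be absent"

def get(key):
    # Cache even the absence of a key, so repeated lookups for
    # non-existent data do not reach the database again.
    global db_queries
    if key in cache:
        value = cache[key]
        return None if value is NULL else value
    db_queries += 1
    value = db.get(key)
    cache[key] = NULL if value is None else value
    return value

get("user:999")  # miss: goes to the database once
get("user:999")  # absence is now cached; no second DB query
print(db_queries)
```

Bloom filters are the other standard defense, rejecting impossible keys before the cache is even consulted.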

500/501/502/503/504/505

500

Internal Server Error. The server encountered an unexpected condition and could not complete the request.
Possible causes:
1. A program error, for example an ASP or PHP syntax error;
2. High concurrency hitting system resource limits, for example too many open files.

501

Not Implemented. The server does not understand or support the functionality required to fulfill the HTTP request.

502

Bad Gateway. The gateway or proxy received an invalid response from the upstream server, often because backend processes are insufficient; for example, a php-fpm process began handling the request but terminated for some reason before finishing it.
Possible causes:
1. On an Nginx server, too few php-cgi processes;
2. PHP execution time too long;
3. The php-cgi process died.

503

Service Unavailable. The server is currently unable to handle the request, typically because of maintenance or overload. This is only a temporary state; if it persists, contact the server provider.

504

Gateway Timeout. A 504 indicates that the gateway or proxy did not receive a timely response from the upstream server: for example, Nginx forwarded the request to php-fpm, but php-fpm failed to respond within the configured timeout. This is generally related to the timeout settings in nginx.conf.

505

HTTP Version Not Supported. The server does not support the HTTP protocol version used in the request.
Except for 500, which may indicate an error in the application code, the other errors can generally be interpreted as problems with the server or its configuration.

Memory overflow/memory leak

out of memory

Memory overflow (Out Of Memory, OOM) means that when a program requests memory, there is not enough memory available to satisfy the request; or, put another way, you were given storage space sized for an int but tried to store a long in it. When memory is insufficient, an OOM error is raised; that is what is called a memory overflow.

memory leak

A memory leak is dynamically allocated heap memory that the program fails to release, or cannot release, for some reason. It wastes system memory and leads to serious consequences such as slowing the program down or even crashing the system.
Extended reading: What is memory overflow, what is memory leak

handle leak

A handle leak occurs when a process opens a file (or other kernel object) via a system call but fails to release the handle afterwards.

Typical symptoms of a handle leak: the machine slows down, CPU usage soars, and the CPU usage of the leaking cgi or server process rises.

Extended reading: http://km.oa.com/group/19143/articles/show/162768

deadlock

Deadlock refers to a blocking situation in which two or more threads, during execution, wait on each other due to competition for resources or mutual communication. Without external intervention, all of them remain blocked and none can proceed; the system is then said to be in a deadlock state.

Further reading: Deadlock Baidu Encyclopedia
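
A standard way to avoid deadlock is to acquire locks in one global order, which removes the circular-wait condition. A minimal Python sketch (ordering by object id here is just one possible global order):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer(locks, results, name):
    # Acquire locks in a single global order (sorted by id here):
    # no matter what order callers list them in, circular wait,
    # one of the four deadlock conditions, cannot form.
    first, second = sorted(locks, key=id)
    with first:
        with second:
            results.append(name)

results = []
t1 = threading.Thread(target=transfer, args=([lock_a, lock_b], results, "t1"))
t2 = threading.Thread(target=transfer, args=([lock_b, lock_a], results, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # both threads finish; opposite-order acquisition would risk deadlock
```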

Soft Interrupt/Hard Interrupt

hard interrupt

The interrupts we usually refer to are hard interrupts (hardirq), generated automatically by peripherals attached to the system (such as network cards and disks). They mainly notify the operating system of changes in peripheral state.

soft interrupt

1. Usually an interrupt of the kernel raised by the hard-interrupt service routine;
2. To meet real-time requirements, interrupt handling must be as fast as possible. To achieve this, Linux splits the work: when an interrupt occurs, the hard interrupt handles the work that can be finished quickly, and the long-running event processing is completed after the interrupt by the soft interrupt (softirq).
Extended reading: https://www.cnblogs.com/widic/p/7392485.html

glitch

A glitch (spike) is a brief moment in which a server's performance metrics (such as traffic, disk I/O, and CPU usage) are far higher than in the periods immediately before and after. Glitches indicate that the server's resource utilization is uneven and insufficient, and they can easily trigger other, more serious problems.

replay attack

In a replay attack, the attacker re-sends a packet that the destination host has already received in order to deceive the system. It mainly targets the identity-authentication process and undermines its correctness. It is a class of attack that maliciously or fraudulently repeats a valid data transmission, either by the originator or by an adversary who intercepts and re-sends the data. Attackers typically capture authentication credentials via network sniffing and then re-send them to the authentication server.
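
A common countermeasure combines a freshness window with a nonce (number used once); a replayed request fails one of the two checks. A minimal sketch (the window size and field names are illustrative):

```python
import time

seen_nonces = {}
WINDOW = 60  # seconds a request is considered fresh

def accept_request(nonce, timestamp, now=None):
    # Reject stale timestamps and any nonce seen before: a replayed
    # packet either falls outside the window or reuses its nonce.
    now = time.time() if now is None else now
    if abs(now - timestamp) > WINDOW:
        return False          # too old (or from the future)
    if nonce in seen_nonces:
        return False          # replay: nonce already used
    seen_nonces[nonce] = timestamp
    return True

now = time.time()
ok1 = accept_request("abc123", now, now=now)  # first use: accepted
ok2 = accept_request("abc123", now, now=now)  # replay: rejected
print(ok1, ok2)
```

In a real protocol the nonce and timestamp would also be covered by a signature, so the attacker cannot simply substitute fresh values.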

network island

A network island is the situation, in a cluster environment, where some machines lose network connectivity with the rest of the cluster, splitting off into a small cluster with inconsistent data.

data skew

In a cluster system, the cache is generally distributed: different nodes are responsible for different ranges of cached data. If the cached data is not dispersed well enough, a large amount of it ends up concentrated on one or a few service nodes; this is called data skew. Generally speaking, data skew is caused by a poor load-balancing implementation.

Split brain

Split brain refers to a system split caused by unreachable networking between some nodes of a cluster. The split-off small clusters each continue to provide service according to their own state, so the original cluster gives inconsistent responses at the same time, nodes compete with each other for shared resources, the system falls into chaos, and data is corrupted.

5. Monitoring alarm

service monitoring

The main purpose of service monitoring is to accurately and quickly find out when a service has a problem or is about to have a problem, so as to reduce the scope of impact.

There are generally many means of service monitoring, which can be divided into the following levels:

  • System layer (CPU, network status, I/O, machine load, etc.)
  • Application layer (process status, error logs, throughput, etc.)
  • Business layer (service/interface error codes, response time)
  • User layer (user behavior, public-opinion monitoring, front-end instrumentation)

Full link monitoring

Service dialing

Service dial testing is a monitoring method for detecting service (application) availability. Dial-test nodes periodically probe the target service, measuring mainly availability and response time. There are usually multiple dial-test nodes in different locations.

node detection

Node detection is a monitoring method used to discover and track network availability and smoothness between nodes in different computer rooms (data centers). It is mainly measured by response time, packet loss rate, and hop count. The detection method is generally ping, mtr, or other proprietary protocols.

Alarm filtering

Filter out predictable alarms so they do not enter the alarm statistics, for example HTTP 500 errors caused by a small number of crawler visits, or a business system's custom exception messages.

Alarm Deduplication

After an alarm has been sent to the person on call, the same alarm is not sent again until the alarm condition recovers.

Alarm suppression

In order to reduce the interference caused by system jitter, it is also necessary to implement suppression. For example, the instantaneous high load of the server may be normal, and only the high load that lasts for a period of time needs to be paid attention to.

Alarm recovery

Development/operation and maintenance personnel not only need to receive alarm notifications, but also need to receive notifications that the fault is eliminated and the alarm returns to normal.

Alarm merge

Merge multiple identical alarms generated at the same time. For example, if multiple sub-service loads are too high in a microservice cluster at the same time, they need to be merged into one alarm.

Alarm Convergence

Sometimes one alarm is accompanied by others. In that case, an alarm should be raised only for the root cause, with the other alarms converged into sub-alarms and sent together in the same notification. For example, a CPU-load alarm on a cloud server is often accompanied by availability alarms from all the systems it hosts.

self-healing

Real-time detection of alarms, pre-diagnosis and analysis, automatic recovery of faults, and opening up of peripheral systems to achieve a closed-loop of the entire process.

6. Service Governance

microservice

Microservice architecture is an architectural pattern that advocates dividing a single application into a group of small services that coordinate and cooperate with each other to deliver ultimate value to users. Each service runs in its own process, and services communicate via a lightweight mechanism (usually an HTTP-based RESTful API). Each service is built around a specific business capability and can be independently deployed to production, production-like environments, and so on.

Extended reading: God tells you how to understand the microservice framework

service discovery

Service discovery means using a registry to record the information of all services in a distributed system, so that other services can quickly find the registered ones. Service discovery is the core module supporting large-scale SOA and microservice architectures, and it should be as highly available as possible.

Further reading: Service Governance - Service Discovery

Traffic peak clipping

If you look at the request monitoring curve of a lottery or flash-sale (seckill) system, you will find that it peaks during the period when the event is open; when the event is not open, request volume and machine load are relatively stable. To save machine resources, we cannot permanently provision the maximum capacity needed for short-lived peaks. Therefore, technical means are needed to dampen the instantaneous request peak and keep the system's throughput controllable under peak load.

Peak clipping can also be used to eliminate glitches, making server resource utilization more balanced and sufficient.

Common peak-shaving strategies include queuing, frequency limiting, hierarchical filtering, and multi-level caching.

Extended reading: High Concurrency Architecture Series: What is traffic peak clipping? How to solve the peak-shaving scenario of seckill business
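
The queuing strategy can be sketched as a bounded buffer: a burst is absorbed into the queue, workers drain it at their own pace, and requests beyond capacity are shed (the capacity and batch size are arbitrary):

```python
import queue

class RequestBuffer:
    # Absorb a burst into a bounded queue; workers drain it at a steady
    # rate, and requests beyond capacity are shed instead of crashing
    # the backend.
    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def submit(self, request):
        try:
            self._q.put_nowait(request)
            return True           # accepted, will be processed later
        except queue.Full:
            return False          # shed: system is at peak capacity

    def drain(self, batch_size):
        batch = []
        while len(batch) < batch_size and not self._q.empty():
            batch.append(self._q.get_nowait())
        return batch

buf = RequestBuffer(capacity=3)
accepted = [buf.submit(i) for i in range(5)]  # burst of 5 requests
print(accepted)                 # [True, True, True, False, False]
drained = buf.drain(batch_size=2)  # workers consume at their own pace
print(drained)                  # [0, 1]
```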

version compatible

When upgrading a version, it is necessary to consider whether the new data structures can understand and parse old data after the upgrade, and whether the newly modified protocol can understand the old protocol and handle it appropriately as expected. This requires designing for version compatibility during service design.

overload protection

Overload means that the current load has exceeded the maximum processing capacity of the system. The occurrence of overload will cause some services to be unavailable. If it is not handled properly, it is very likely to cause the service to be completely unavailable, or even an avalanche. Overload protection is just a measure for this abnormal situation to prevent the service from being completely unavailable.

Service circuit breaker

A service circuit breaker works like the fuse in a home: when a service is unavailable or its responses time out, calls to that service are temporarily stopped to prevent an avalanche across the whole system.
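
A minimal sketch of the pattern (thresholds, timeouts, and the failing downstream are illustrative; real implementations add per-state metrics and thread safety):

```python
import time

class CircuitBreaker:
    # After `threshold` consecutive failures the breaker opens and
    # short-circuits calls for `reset_timeout` seconds, then allows
    # one trial call (half-open) before closing again.
    def __init__(self, threshold=3, reset_timeout=30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, now=None):
        now = time.time() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None   # half-open: allow one trial call
            self.failures = 0
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now
            raise

def always_fails():
    raise IOError("downstream unavailable")

breaker = CircuitBreaker(threshold=2, reset_timeout=30.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(always_fails, now=100.0)
    except IOError:
        outcomes.append("failed")      # real downstream failure
    except RuntimeError:
        outcomes.append("rejected")    # breaker short-circuited the call
print(outcomes)  # ['failed', 'failed', 'rejected', 'rejected']
```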

service downgrade

Service degradation means that when server pressure rises sharply, some services and pages are strategically degraded according to current business conditions and traffic, freeing server resources to guarantee the normal operation of core tasks. Degradation is often tiered, with different handling for different severity levels.

By service behavior: requests can be refused, delayed, or sometimes served only for a random subset of users.

By service scope: a single feature can be cut off, or entire modules can be cut off.

In short, service degradation requires different strategies for different business needs. The main idea is that a degraded service, while impaired, is better than no service at all.
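One simple degradation pattern is a wrapper that serves the full result under normal load and falls back to a cheaper, pre-computed result when load crosses a threshold. This is an illustrative sketch only; the names (`personalized_feed`, `hot_list`) and the load metric are hypothetical:

```python
def with_degradation(primary, fallback, load_fn, threshold):
    """Serve primary() under normal load; degrade to the cheap
    fallback() when the current load exceeds the threshold."""
    def wrapped(*args):
        if load_fn() > threshold:
            return fallback(*args)   # degraded, but still serving
        return primary(*args)
    return wrapped

# Hypothetical example: a personalized feed degrades to a cached hot list.
current_load = {"value": 0.2}

def personalized_feed(user_id):
    return f"personalized feed for {user_id}"

def hot_list(user_id):
    return "cached hot list"

feed = with_degradation(personalized_feed, hot_list,
                        lambda: current_load["value"], threshold=0.8)
```

Users still get a feed either way; under pressure it is simply a less personalized one, which matches the "damaged service is better than nothing" principle above.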

Circuit breaking vs. degradation

Similarities

The goal is the same: both start from availability and reliability and aim to prevent the system from crashing;
the user experience is similar: in the end, users see that some functionality is temporarily unavailable.

Differences

The trigger is different: circuit breaking is generally caused by the failure of a specific (downstream) service, while degradation is generally decided from the overall system load.

Service throttling

Rate limiting can be regarded as a form of service degradation. It restricts a system's input and output traffic in order to protect the system. A system's throughput can generally be measured; to keep the system stable, once traffic reaches the configured threshold, measures must be taken to limit it, for example delaying requests, rejecting them, or rejecting a portion of them.
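The classic implementation of this idea is the token bucket: tokens accumulate at a fixed rate up to a capacity, each request consumes one, and requests that find the bucket empty are rejected (or delayed). A minimal sketch with a manual clock so the behavior is deterministic:

```python
class TokenBucket:
    """Token-bucket rate limiter: tokens accumulate at `rate` per second
    up to `capacity`; each request consumes one token or is rejected."""

    def __init__(self, rate, capacity, clock):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Manual clock keeps the example deterministic.
now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(5)]   # only the first 3 pass
now[0] = 2.0                                 # 2 seconds later: 2 new tokens
later = [bucket.allow() for _ in range(3)]   # 2 pass, 1 rejected
```

The capacity lets the limiter absorb small bursts while the refill rate enforces the long-term average; production systems typically implement the same logic in Redis or at the gateway so that it applies across instances.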

Further reading: a discussion of rate limiting in high-concurrency systems

fault shielding

Faulty machines are removed from the cluster, ensuring that new requests are not routed to them.
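In a load balancer this amounts to routing only over the healthy subset of instances. A minimal sketch, assuming a hypothetical in-process pool where health is flagged manually (real systems drive these flags from periodic health checks):

```python
import itertools

class Pool:
    """Round-robin over healthy instances only: a machine marked faulty
    is shielded from new requests until it is marked recovered."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.faulty = set()
        self._rr = itertools.count()   # round-robin cursor

    def mark_faulty(self, server):
        self.faulty.add(server)

    def mark_recovered(self, server):
        self.faulty.discard(server)

    def pick(self):
        healthy = [s for s in self.servers if s not in self.faulty]
        if not healthy:
            raise RuntimeError("no healthy instances")
        return healthy[next(self._rr) % len(healthy)]
```

While "b" is marked faulty, every pick lands on "a" or "c"; once it recovers it rejoins the rotation.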

7. Testing methods

Black Box/White Box Testing

black box testing

Black-box testing ignores the internal structure and logic of the program; it mainly verifies that the system's functionality meets the requirements specification. Typically each test case has an input value and an expected output value to compare against the actual result.

white box testing

White-box testing is mainly used in the unit-testing stage and targets the code level. It exercises the internal logic of the program; its coverage criteria include statement coverage, decision (branch) coverage, condition coverage, path coverage, and condition-combination coverage.
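The difference between these criteria is easiest to see on a tiny function. In this illustrative sketch, calling `grade(75)` alone leaves the `raise` and the `"fail"` return unexecuted, so full statement coverage already needs more cases; decision (branch) coverage further requires that every condition evaluate both True and False:

```python
def grade(score):
    if score < 0 or score > 100:   # decision 1: input validation
        raise ValueError("score out of range")
    if score >= 60:                # decision 2: pass/fail
        return "pass"
    return "fail"

# grade(75) exercises only the happy path.
# grade(30) covers the "fail" return (decision 2 = False).
# grade(-1) covers the raise (decision 1 = True).
# Together the three inputs achieve both statement and branch coverage;
# condition coverage would additionally need grade(101), so that each of
# the two sub-conditions in decision 1 is individually true at least once.
```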

Unit/Integration/System/Acceptance Tests

Software testing is generally divided into four stages: unit testing, integration testing, system testing, and acceptance testing.

unit test

Unit testing checks and verifies the smallest verifiable unit in the software, such as a module, a function, or a method.
Unit tests have the smallest granularity. They are generally written by the development team using white-box methods, mainly to verify that the unit conforms to its "design".
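A concrete illustration using Python's standard `unittest` framework, with a deliberately tiny unit under test (the function and its cases are hypothetical):

```python
import unittest

def word_count(text):
    """Unit under test: count whitespace-separated words."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    def test_simple_sentence(self):
        self.assertEqual(word_count("high cohesion low coupling"), 4)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

    def test_extra_whitespace(self):
        # str.split() with no argument collapses runs of whitespace.
        self.assertEqual(word_count("  a   b  "), 2)

if __name__ == "__main__":
    # exit=False keeps the process alive after the run; argv avoids
    # unittest parsing the host script's command-line arguments.
    unittest.main(exit=False, argv=["word_count_test"])
```

Note how the cases cover the normal path plus the edge cases (empty input, irregular whitespace), which is exactly where unit-level bugs tend to hide.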

Integration Testing

Integration testing, also called assembly testing, is usually built on top of unit testing: all program modules are tested together in an orderly, incremental manner.
Integration testing sits between unit testing and system testing and acts as a "bridge". It is generally performed by the development team using a mix of white-box and black-box methods, verifying both the "design" and the "requirements".

System test

In system testing, software that has passed integration testing is combined, as one part of the computer system, with the system's other parts, and a series of rigorous tests is run in the actual operating environment to uncover latent problems and ensure the system runs correctly.
System testing has the largest granularity. It is generally performed by an independent test team using black-box methods, mainly to verify that the system meets the "requirements specification".

Acceptance Test

Acceptance testing, also known as delivery testing, is formal testing carried out against user needs and business processes to determine whether the system meets the acceptance criteria; users, customers, or other authorized parties then decide whether to accept the system.
Acceptance testing is similar to system testing; the main difference is who performs it: acceptance testing is carried out by the users.
Further reading: Introduction to unit testing, integration testing, system testing, regression testing, and user acceptance testing

Regression Testing

When defects are found and fixed, or new functionality is added to the software, the software is retested. Regression testing checks that the found defects have indeed been corrected and that the modifications have not introduced new problems.

smoke test

The term originates from the hardware industry: after making a change or repair to a piece of hardware or a hardware component, power the device on directly; if no smoke appears, the component passes. In software, "smoke testing" describes the process of validating code changes before they are merged into the product's source tree.

Smoke testing is a strategy for quickly verifying the basic functionality of a software build during development. It is a means of confirming the software's core functions, not an in-depth test of the build.

For example, a smoke test of a login system only needs to verify the core login flow by entering a correct username and password. Edge cases such as special characters in the input fields can wait until after the smoke test passes.
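The login example above can be sketched as a single happy-path check. The `login` function and its credentials here are hypothetical stand-ins for the real service:

```python
def login(username, password):
    """Stand-in for the real login service (hypothetical credentials)."""
    return username == "alice" and password == "s3cret"

def smoke_test_login():
    # The smoke test checks only the core happy path; edge cases
    # (special characters, lockout, etc.) belong to later, deeper passes.
    assert login("alice", "s3cret"), "core login flow is broken"

smoke_test_login()
```

If this one assertion fails, the build is not worth deeper testing, which is precisely the gatekeeping role a smoke test plays.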

Performance Testing

Performance testing uses automated tools to simulate normal, peak, and abnormal load conditions and measure the system's performance metrics. Load testing and stress testing are both forms of performance testing and can be combined.

Load testing determines how the system performs under various workloads; the goal is to measure how the system's performance metrics change as the load gradually increases.

Stress testing determines a system's bottleneck, or the point at which performance becomes unacceptable, thereby establishing the maximum service level the system can provide.

Benchmarks

A benchmark is also a kind of performance test. Benchmarks measure the maximum actual performance of the machine's hardware as well as the effect of software optimizations, and can also identify CPU or memory efficiency problems in a specific piece of code. Many developers use benchmarks to compare different concurrency models, or to help size worker pools for maximum system throughput.
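At the code level, a micro-benchmark can be as simple as timing two equivalent implementations with the standard `timeit` module. A minimal sketch comparing two ways of building a string (the functions are illustrative only, and absolute timings will vary by machine):

```python
import timeit

def concat_plus(n):
    # Repeated += on a string; historically slow because strings are immutable.
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n):
    # Single join over a generator; the usual idiomatic alternative.
    return "".join("x" for _ in range(n))

# timeit runs each callable `number` times and returns total elapsed seconds.
t_plus = timeit.timeit(lambda: concat_plus(1000), number=200)
t_join = timeit.timeit(lambda: concat_join(1000), number=200)
```

Before trusting such numbers, always confirm the two variants produce the same result, and run on a quiet machine; micro-benchmarks are very sensitive to interpreter version and background load.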

A/B testing

An A/B test compares two or more randomly assigned samples of similar size. If the difference in the target metric between the experimental group and the control group is statistically significant, this indicates that the experimental group's feature caused the observed outcome, helping you test a hypothesis or make a product decision.

Further reading: How to get A/B testing right
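For a conversion-rate metric, "statistically significant" is commonly checked with a two-proportion z-test. A minimal stdlib-only sketch (the conversion numbers are made up for illustration):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; p-value is the two-tailed probability.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Control: 100 of 1000 users converted (10%); variant: 130 of 1000 (13%).
z, p = two_proportion_z(100, 1000, 130, 1000)
```

Here z is roughly 2.1 and p is about 0.035, so at the conventional 0.05 threshold the 3-point lift would be judged significant. In practice, also fix the sample size in advance rather than peeking at the p-value as data arrives.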

code coverage testing

Code coverage (Code coverage) is a software-testing metric that describes what proportion of a program's source code is exercised by the tests; the resulting ratio is called the code coverage rate.

In unit testing, code coverage is often used as an indicator of test quality, and sometimes even to assess the completion of the testing task, for example by requiring 80% or 90% coverage. As a result, testers can end up designing cases merely to cover code rather than to find defects.

Extended reading: Talking about unit test code coverage

8. Release deployment

DEV/PRO/FAT/UAT

DEV

Development environment
The development environment is used by developers for debugging; versions change frequently here.

FAT

Feature Acceptance Test environment
The functional acceptance test environment, where software testers carry out their testing.

UAT

User Acceptance Test environment
The user acceptance test environment is used to verify functionality under production-like conditions and can serve as a pre-release (staging) environment.

PRO

Production environment
The formal, live online environment.
Further reading: https://www.cnblogs.com/chengkanghua/p/10607239.html

Gray release

Grayscale (gray) release means that during a version upgrade, product features are first rolled out to a subset of users selected by partition control, whitelists, and similar mechanisms, while the remaining users stay on the old version. If the upgraded users report no problems after a period of time, the scope is gradually expanded until the new features are open to all users. Gray release preserves the stability of the overall system: problems can be found and fixed during the initial gray phase, limiting their impact.
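A common way to implement the partitioning is deterministic hash bucketing: hash the user id into a bucket from 0 to 99, and enable the feature for users whose bucket falls below the current rollout percentage. A minimal sketch (feature name and cohort sizes are illustrative):

```python
import hashlib

def in_gray_release(user_id, feature, percent):
    """Deterministically bucket a user into [0, 100) by hashing; users
    below `percent` see the new feature. The same user always lands in
    the same bucket, so widening the rollout never flip-flops anyone."""
    digest = hashlib.md5(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Widening the rollout only ever adds users, never removes them.
cohort_10 = {u for u in range(1000) if in_gray_release(u, "new_ui", 10)}
cohort_50 = {u for u in range(1000) if in_gray_release(u, "new_ui", 50)}
```

Salting the hash with the feature name keeps different experiments from always selecting the same users; the 10% cohort is by construction a subset of the 50% cohort, which is exactly the "gradually expand the scope" property described above.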

Rollback

Rollback refers to restoring a program or data to the last known-good state (or the last stable version) when the program or data is processed incorrectly.


Origin blog.csdn.net/heshihu2019/article/details/132632296