A review of formal methods

References:

  1. Wang Ji, Zhan Naijun, Feng Xinyu, Liu Zhiming. Overview of formal methods. Journal of Software, 2019, 30(1): 33-61. http://www.jos.org.cn/1000-9825/5652.htm

Definition of formal methods

Formal methods are techniques for describing, developing, and verifying computer hardware and software systems on a rigorous mathematical foundation. That foundation is formal logic, which unites three elements: a formal language, its semantics, and a proof (reasoning) system.

Formal methods have been used successfully in many kinds of hardware design, especially chip design. Because software systems far exceed hardware systems in complexity and uncertainty, formal methods are much less widely used in software development.

The sections below cover the basic concepts, a brief history, and the overall framework of formal methods.

1 Basic concepts of formal methods

Formal methods are techniques for the formal specification, development, and verification of computer software (and hardware) systems, based on rigorous mathematical foundations.

  • Specification: describing the software system under development in a formal language. Specifications correspond to artifacts at different stages of the software life cycle and describe the system's models and properties at different levels of abstraction, such as the requirements model, the design model, and even the code and its execution model.
  • Development: building on formal specification and verification, formal development constructs equivalence-preserving transformations and refinement relations between formal specifications and proves them correct. Guided by a formal model of the system, the specification is refined step by step into a system that meets its requirements; this is also known as correct-by-construction development.
  • Verification: formal verification proves logical relationships between different formal specifications. These relationships capture the correctness requirements that must hold between software artifacts at different stages of development. For example, formal verification constructs a proof that "the system design model satisfies certain specific properties".

The main difference between formal methods and other software development methods [4] is that the language used to describe software and its properties is unambiguous, and the methods for constructing and verifying software are rigorous.

1.1 Development history

Formal methods have a long history. Their emergence and early development were driven mainly from two perspectives: the theoretical research perspective, which sought a mathematical foundation for program design, and the software engineering perspective, which sought rigorous quality assurance for software development.

1.1.1 Basic research around formal language and formal semantics (1930~present)

A formal language is a language in which every expression or statement is fully defined and generated from a symbolic alphabet by recursive grammar rules. The languages of formal logic, such as propositional logic, predicate logic, and Boolean algebra, are formal languages.
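
To make "generated by recursive grammar rules" concrete, here is a minimal sketch: a recursive-descent recognizer for a toy propositional language. The grammar (three atoms, negation `~`, and fully parenthesized `&`/`|`) and the function names `parse` and `well_formed` are assumptions for illustration, not taken from the source. Each branch of `parse` mirrors one alternative of the BNF rule.

```python
# Grammar (BNF) of a toy propositional language, assumed for illustration:
#   <formula> ::= <atom> | "~" <formula>
#               | "(" <formula> "&" <formula> ")"
#               | "(" <formula> "|" <formula> ")"
#   <atom>    ::= "p" | "q" | "r"
ATOMS = {"p", "q", "r"}


def parse(s: str, i: int = 0) -> int:
    """Recognize one <formula> starting at s[i] and return the index just
    after it; raise ValueError if no well-formed formula starts there."""
    if i >= len(s):
        raise ValueError("unexpected end of input")
    c = s[i]
    if c in ATOMS:                     # <atom>
        return i + 1
    if c == "~":                       # "~" <formula>
        return parse(s, i + 1)
    if c == "(":                       # "(" <formula> op <formula> ")"
        j = parse(s, i + 1)
        if j >= len(s) or s[j] not in "&|":
            raise ValueError(f"expected '&' or '|' at position {j}")
        k = parse(s, j + 1)
        if k >= len(s) or s[k] != ")":
            raise ValueError(f"expected ')' at position {k}")
        return k + 1
    raise ValueError(f"unexpected symbol {c!r} at position {i}")


def well_formed(s: str) -> bool:
    """True iff the whole string is derivable from <formula>."""
    try:
        return parse(s) == len(s)
    except ValueError:
        return False


print(well_formed("(p&~(q|r))"))  # True
print(well_formed("p&q"))         # False: binary connectives need parentheses
```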

  • In the 1930s, Church used formal language definitions to study computation and algorithms and proposed a computational model, the Lambda calculus, which later became the theoretical basis for functional programming languages, type theory, and operational semantics. In fact, the Lambda calculus itself can be regarded as a programming language.
  • In the late 1950s, work on defining high-level programming languages began, leading to Backus-Naur Form (BNF), which was used to define ALGOL 60 and gave a recursive, abstract description of language syntax. Formal languages are used not only to define languages but also in system software development, for example in the UNIX tools yacc and grep. As formal languages were being defined, how to define the meaning of a program became a central concern.
  • The study of formal semantics gradually formed four major schools: operational semantics, denotational semantics, algebraic semantics, and axiomatic semantics.
  • In the 1960s, Petri proposed Petri nets as a mathematical modeling language for distributed systems. For concurrent systems, Hoare proposed Communicating Sequential Processes (CSP), Milner proposed the Calculus of Communicating Systems (CCS), and Hennessy and Lin proposed symbolic bisimulation theory for message-passing processes.
  • As the forms of software keep changing, formal modeling languages keep developing. For reactive systems, for example, Pnueli introduced linear temporal logic (LTL) in 1977, and Clarke and Emerson established computation tree logic (CTL) in 1981. Building on the traditional description of reactive systems, TPTL, timed automata, timed regular expressions, Timed CSP, and Timed CCS were developed for real-time systems. Hardware description languages, architecture description languages, and languages for modeling and simulating communication and control have also appeared.

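
The remark that the Lambda calculus can itself be regarded as a programming language can be illustrated with Church numerals, which encode natural numbers purely as functions. The sketch below uses Python lambdas as a stand-in for lambda-calculus terms (an assumption: Python is not the pure calculus), and `to_int` is a hypothetical helper that decodes a numeral for display.

```python
# Church numerals: the numeral n is the function that applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n) -> int:
    """Hypothetical decoding helper: count applications of +1 starting at 0."""
    return n(lambda k: k + 1)(0)


two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
print(to_int(mul(two)(three)))  # 6
```

Arithmetic here is nothing but function application, which is the sense in which the calculus computes.
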
1.1.2 Methodological research around formal specification and development (1970~present)

Programming languages and their semantics are poorly suited to describing the artifacts produced at each stage of the software development process, from requirements documents to program code, at their different levels of abstraction, and to proving those artifacts correct. Researchers therefore began designing high-level, abstract formal specification languages, giving rise to formal development methods based on formal specification languages.

1.1.3 Engineering research around formal verification technology (1980~present)

Once a formal specification has been written, the key question is how to develop a correct system from it. **Formal verification asks how to prove that specifications at different levels of abstraction are equivalent or stand in a refinement relation, and how to verify that a model satisfies its formal specification (the required properties).** This is both a scientific problem and a practical one that formal methods must solve to ensure the correctness of software development.

Research on automating verification uncovered many negative results, such as undecidability, NP-completeness, and state-space explosion. These problems, however, spurred the development of a variety of reduction techniques, and the automation and scalability of tools have improved significantly. Formal verification has been highly successful in hardware verification and has steadily entered the development of high-assurance software such as embedded and safety-critical software.

1.1.4 Multidisciplinary research targeting verifiable software (2000~present)

Formal methods have played an important role in computer software and hardware development and quality assurance. Program synthesis combined with artificial intelligence and big data, and software automation combined with formal reasoning, have returned to the forefront. Cross-disciplinary applications of formal methods in cybersecurity, quantum computing, biological computing, and other areas have also received wide attention.

1.2 Basic system of formal methods

Formal methods form a whole made up of formal specification languages (including formal semantics and model theory), formal specification (including refinement and synthesis), formal verification, formal tools, and so on.

2 Formal specifications

A formal specification is a rigorous description, written in a formal specification language, of either a system model or the properties the system must satisfy. The former is a model specification; the latter is a property specification.

2.1 Formal specification language

A formal specification language is a language defined by rigorous recursive grammar rules; a sentence that satisfies these rules is called a well-formed specification.

2.1.1 Model specification language

A model specification language uses mathematical structures to describe the state changes or event traces of a system. It directly defines the structure, functional behavior, and even non-functional behavior (such as timing) of the system model being described.

Model specifications provide models at different levels of abstraction during development, with corresponding logical reasoning systems that support their decomposition and composition, so that specifications can be transformed and refined between levels. The main categories are as follows.

  1. Algebraic specification languages: an algebraic specification consists of symbols denoting sorts, operation symbols over those sorts, and equational axioms in many-sorted equational logic. Its advantage is a very sound mathematical foundation: the result of any sequence of operations can be computed automatically by executing the equations.

  2. Structured specification language: data type specification and program structure

  3. Process algebras (calculi): process algebras such as CCS [15], CSP [13], and ACP [79] emerged for the design and development of concurrent and distributed systems. CCS and CSP abstract away, as far as possible, the data state and data computation of concurrent communicating systems and concentrate on describing communication, synchronization, and the relationship between them; they are event-based specification languages. A CCS specification is an expression in CCS syntax, and its semantics, defined through structural operational semantics, describe how the communicating process evolves. The state transition rules of CCS expressions defined this way form a formal system for deriving various equivalences between CCS expressions, which can be expressed as different bisimulation relations.

    To handle other characteristics of concurrent systems, such as information security, mobility, real-time, hybrid, probabilistic, and stochastic behavior, these concurrency models have been extended in many ways. For example, for real-time systems, Reed and Roscoe extended CSP into Timed CSP [23]; for hybrid systems, He Jifeng, Zhou Chaochen, and others extended CSP into Hybrid CSP. For mobile computing, Milner proposed the π-calculus, which Cardelli and Gordon further developed into the Ambient calculus; for information security, Abadi et al. extended the π-calculus into the spi-calculus. Milner also tried to unify these concurrent computation models using category theory and proposed the theory of bigraphs.

  4. Specifications based on transition systems: **transition systems represent system behavior naturally. Typical specification languages based on transition systems include Petri nets [12] and Statecharts [87].** Such languages often have a graphical notation and are called visual specification languages.

    To model non-functional requirements, people have extended labelled transition systems in many ways. Taking automata as an example, later extensions include timed automata [21], hybrid automata [88], probabilistic timed automata [89], stochastic hybrid automata [90], and others. These models are no longer confined to computing and are widely used in control, biology, physics, chemistry, and other fields.
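
The bisimulation relations mentioned under process algebra can be computed for small finite labelled transition systems. The following is a minimal sketch, assuming strong bisimulation and an explicit dictionary encoding of the LTS; it uses naive greatest-fixed-point refinement rather than an efficient partition-refinement algorithm. The example compares the standard pair `a.(b + c)` and `a.b + a.c`, which have the same traces but are not bisimilar; the state names are hypothetical.

```python
from itertools import product

# A labelled transition system: state -> set of (action, successor) pairs.
LTS = dict[str, set[tuple[str, str]]]


def bisimilar(lts: LTS, s: str, t: str) -> bool:
    """Strong bisimilarity as a greatest fixed point: start from all pairs of
    states and delete pairs until every remaining pair can match each
    other's transitions, action for action."""
    rel = set(product(lts, lts))
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            forth = all(any(b == a and (p2, q2) in rel for b, q2 in lts[q])
                        for a, p2 in lts[p])
            back = all(any(a == b and (p2, q2) in rel for a, p2 in lts[p])
                       for b, q2 in lts[q])
            if not (forth and back):
                rel.discard((p, q))
                changed = True
    return (s, t) in rel


lts: LTS = {
    # P = a.(b + c)
    "P0": {("a", "P1")}, "P1": {("b", "P2"), ("c", "P3")},
    "P2": set(), "P3": set(),
    # Q = a.b + a.c
    "Q0": {("a", "Q1"), ("a", "Q2")}, "Q1": {("b", "Q3")}, "Q2": {("c", "Q4")},
    "Q3": set(), "Q4": set(),
    # S = a.(b + c) + a.(b + c), reusing P's states where possible
    "S0": {("a", "P1"), ("a", "S1")}, "S1": {("b", "P2"), ("c", "P3")},
}
print(bisimilar(lts, "P0", "Q0"))  # False: same traces, different branching
print(bisimilar(lts, "P0", "S0"))  # True
```

Every pair in the true bisimulation survives each deletion round, so when nothing more can be deleted the remaining relation is exactly the largest bisimulation.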

2.1.2 Property specification language

A property specification language is based on a program logic and uses logical formulas to describe a set of properties that define the desired system behavior.
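
As a small illustration of a property specification, the sketch below evaluates linear temporal logic formulas over a finite trace of states. It assumes finite-trace semantics (where, for instance, `X` is false at the last position), which differs from the standard infinite-trace semantics of LTL; the tuple encoding of formulas, the function `holds`, and the `req`/`grant` propositions are hypothetical.

```python
State = dict[str, bool]


def holds(phi: tuple, trace: list[State], i: int = 0) -> bool:
    """Evaluate an LTL formula at position i of a finite trace
    (finite-trace semantics; formulas are nested tuples)."""
    op = phi[0]
    if op == "ap":                     # atomic proposition
        return trace[i][phi[1]]
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "and":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "implies":
        return not holds(phi[1], trace, i) or holds(phi[2], trace, i)
    if op == "X":                      # next: false at the last position
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == "F":                      # eventually
        return any(holds(phi[1], trace, k) for k in range(i, len(trace)))
    if op == "G":                      # always
        return all(holds(phi[1], trace, k) for k in range(i, len(trace)))
    if op == "U":                      # phi1 until phi2
        return any(holds(phi[2], trace, k)
                   and all(holds(phi[1], trace, j) for j in range(i, k))
                   for k in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")


# "Every request is eventually granted": G (req -> F grant)
response = ("G", ("implies", ("ap", "req"), ("F", ("ap", "grant"))))
ok = [{"req": True, "grant": False}, {"req": False, "grant": True}]
bad = ok + [{"req": True, "grant": False}]
print(holds(response, ok))   # True
print(holds(response, bad))  # False: the last request is never granted
```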

The properties that the system

The earliest program logic for sequential programs was Floyd-Hoare logic (see also the Zhihu note "System Design and Verification Study Notes - Floyd-Hoare Logic", zhihu.com).
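
A Floyd-Hoare triple `{P} C {Q}` holds when P implies the weakest precondition `wp(C, Q)`. The sketch below is a minimal illustration under stated assumptions: it represents each command by its wp transformer (Hoare's assignment axiom gives `wp(x := e, Q) = Q[e/x]`, and sequencing composes transformers), then checks `P => wp(C, Q)` by enumerating states over a small integer range. That enumeration is a bounded check, not a proof; verification tools typically discharge the implication with a theorem prover or SMT solver. The names `assign`, `seq`, and `valid_on` are hypothetical.

```python
from itertools import product
from typing import Callable

State = dict[str, int]
Pred = Callable[[State], bool]
Expr = Callable[[State], int]
Cmd = Callable[[Pred], Pred]  # a command, represented by its wp transformer


def assign(var: str, expr: Expr) -> Cmd:
    """`var := expr`, via Hoare's assignment axiom wp(x := e, Q) = Q[e/x]."""
    return lambda post: (lambda s: post({**s, var: expr(s)}))


def seq(*cmds: Cmd) -> Cmd:
    """wp(C1; C2; ...; Cn, Q) = wp(C1, wp(C2, ... wp(Cn, Q)))."""
    def wp(post: Pred) -> Pred:
        for cmd in reversed(cmds):
            post = cmd(post)
        return post
    return wp


def valid_on(pre: Pred, cmd: Cmd, post: Pred, names: list[str],
             domain=range(-5, 6)) -> bool:
    """Check {pre} cmd {post} by testing pre => wp(cmd, post) on every state
    over a small finite domain. This is a bounded check, not a proof."""
    weakest = cmd(post)
    for values in product(domain, repeat=len(names)):
        s = dict(zip(names, values))
        if pre(s) and not weakest(s):
            return False
    return True


swap = seq(assign("t", lambda s: s["x"]),
           assign("x", lambda s: s["y"]),
           assign("y", lambda s: s["t"]))
# {x = 1 and y = 2}  t := x; x := y; y := t  {x = 2 and y = 1}
print(valid_on(lambda s: s["x"] == 1 and s["y"] == 2, swap,
               lambda s: s["x"] == 2 and s["y"] == 1, ["x", "y", "t"]))  # True
# {x > 0}  y := x - 1  {y > 0}  fails at x = 1
print(valid_on(lambda s: s["x"] > 0, assign("y", lambda s: s["x"] - 1),
               lambda s: s["y"] > 0, ["x", "y"]))  # False
```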

2.2 Formal Semantics

Formal semantics originated in the study of programming-language semantics. Using mathematical structures to define the semantics of programming languages was later extended to all kinds of formal specification languages, forming the field of formal semantics. Formal semantics studies the mathematical foundations and construction methods for the semantics of formal specification languages, and provides mathematical means to study the expressiveness, soundness, and completeness of formal languages. Depending on the mathematical structures and semantic representations used, approaches to formal semantics fall into four categories: operational semantics, denotational semantics, algebraic semantics, and axiomatic semantics.

  1. Operational semantics: operational semantics defines a language's semantics with an abstract interpreter (sometimes called an abstract machine or abstract function), focusing on simulating how the computer system operates while processing data.

  2. Denotational semantics: denotational semantics maps the basic syntactic constructs of a language to mathematical objects (called denotations) and defines the language's semantics through operations on those objects.

    For example, for Timed CSP, the trace model and the failures-divergences-traces model were extended into a timestamped trace model and a timestamped failures-divergences-traces model, respectively [23, 125].

  3. Algebraic semantics: algebraic semantics uses algebraic structures to define the semantics of computer languages (especially algebraic specification languages) and was developed on the basis of abstract data types. It is closely tied to algebraic specification, is mainly used for reasoning about program correctness in formal development based on algebraic specifications, and works at a relatively high level of abstraction.


  4. Axiomatic semantics: axiomatic semantics describes program semantics directly in formal logic. The basic idea is to add, on top of an existing formal logic system, basic propositions (program axioms) that all programs must satisfy.
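
Operational and denotational semantics can be contrasted on a toy expression language. In this sketch (the language and the names `denote`, `step`, and `run` are assumptions for illustration), `denote` maps each expression compositionally to an integer, while `step` gives a structural operational semantics that reduces the leftmost reducible subexpression one step at a time. For this language both yield the same result.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def denote(e: Expr) -> int:
    """Denotational semantics: map each expression compositionally to an integer."""
    if isinstance(e, Num):
        return e.n
    if isinstance(e, Add):
        return denote(e.left) + denote(e.right)
    return denote(e.left) * denote(e.right)


def step(e: Expr) -> Expr:
    """Structural operational semantics: perform one reduction step,
    always reducing the leftmost non-value subexpression first."""
    if isinstance(e, Num):
        raise ValueError("a number is a value and cannot take a step")
    if not isinstance(e.left, Num):
        return type(e)(step(e.left), e.right)
    if not isinstance(e.right, Num):
        return type(e)(e.left, step(e.right))
    a, b = e.left.n, e.right.n
    return Num(a + b if isinstance(e, Add) else a * b)


def run(e: Expr) -> int:
    """Apply `step` until the expression becomes a value."""
    while not isinstance(e, Num):
        e = step(e)
    return e.n


e = Mul(Add(Num(1), Num(2)), Num(4))   # (1 + 2) * 4
print(step(e))            # Mul(left=Num(n=3), right=Num(n=4))
print(denote(e), run(e))  # 12 12
```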

2.3 Formal development and software construction

Formal specification and development methods follow the basic principles of software development, including separation of concerns and stepwise refinement. Building on formal specification and verification, formal software construction includes multi-view modeling with formal specifications, refinement between specifications at different levels of abstraction, program synthesis, and so on.

2.3.1 Formal development based on specifications

Formal specification can use the same or different specification languages to describe a system from different perspectives; this is the multi-view specification method, as shown in Figure 2. For example, a system (requirements) specification can include a data model specification, a data function specification, an interaction and communication protocol specification, and a dynamic state-transition behavior model.

2.3.2 Program synthesis

Program synthesis is the technique of automatically generating, in a given programming language, programs that satisfy a program specification.
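
A minimal sketch of one style of program synthesis, enumerative synthesis from input-output examples: enumerate expressions of a toy grammar in order of size and return the first one consistent with every example. The grammar, the size bound, and the names `exprs` and `synthesize` are assumptions for illustration. Practical synthesizers add pruning (for example, discarding expressions that behave identically on the examples) or use constraint solving, which this sketch omits.

```python
from itertools import product
from typing import Callable, Iterator, Optional

Fn = Callable[[int], int]


def plus(a: Fn, b: Fn) -> Fn:
    return lambda x: a(x) + b(x)


def times(a: Fn, b: Fn) -> Fn:
    return lambda x: a(x) * b(x)


def exprs(size: int) -> Iterator[tuple[str, Fn]]:
    """All (source, function) pairs with exactly `size` nodes in the toy
    grammar  E ::= x | 1 | 2 | (E + E) | (E * E)."""
    if size == 1:
        yield "x", lambda x: x
        yield "1", lambda x: 1
        yield "2", lambda x: 2
        return
    for left in range(1, size - 1):
        right = size - 1 - left
        for (ls, lf), (rs, rf) in product(exprs(left), exprs(right)):
            yield f"({ls} + {rs})", plus(lf, rf)
            yield f"({ls} * {rs})", times(lf, rf)


def synthesize(examples: list[tuple[int, int]],
               max_size: int = 7) -> Optional[str]:
    """Return the smallest expression consistent with every (input, output) example."""
    for size in range(1, max_size + 1):
        for src, f in exprs(size):
            if all(f(x) == y for x, y in examples):
                return src
    return None


print(synthesize([(0, 1), (1, 3), (2, 5)]))  # an expression equal to 2*x + 1
```

Enumerating by size means the first hit is a smallest consistent program, a common bias toward simple, general answers.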

3 Formal verification

The most important use of formal methods is verifying formal specifications. Formal verification commonly takes one of two forms. The first reasons about "whether the system's model specification satisfies its property specification"; here the model specification tends to be operational and the property specification descriptive. The second reasons about "whether one model specification of the system refines, or is equivalent to, another model specification". The main formal verification methods are deductive theorem proving and algorithmic model checking.

3.1 Theorem proving

Formal verification based on theorem proving treats the assertion "the system satisfies its specification" as a logical proposition and proves it by deductive reasoning with a set of inference rules. Most theorem-proving-based verification takes a program logic as its theoretical basis, but program logic is not the only route: for example, one can express properties such as the safety and correctness of program execution directly in terms of the program's operational semantics and prove the corresponding theorems.

Floyd-Hoare logic [10,93] is a classic theorem-proving-based verification system whose targets are sequential programs. Owicki and Gries proposed a general verification method for concurrent programs [96], and Jones extended it into the Rely-Guarantee method [98] to address compositionality.

Depending on the proof method and the degree of automation, theorem-proving-based verification falls into two categories: automatic verification with automated theorem provers, and semi-automatic verification through human-computer interaction.
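
For propositional logic, fully automatic proving is possible by brute force: a formula is valid exactly when it is true under every assignment. The sketch below is a toy truth-table prover (the tuple encoding and the names `atoms`, `evaluate`, and `prove` are assumptions) that returns `None` for a valid formula or a falsifying assignment otherwise. Automated provers for richer logics rely on techniques such as resolution or SMT solving instead, since truth tables grow exponentially and do not extend beyond propositional logic.

```python
from itertools import product
from typing import Optional


def atoms(phi: tuple) -> set[str]:
    """Propositional variables occurring in a formula."""
    if phi[0] == "var":
        return {phi[1]}
    return set().union(*(atoms(sub) for sub in phi[1:]))


def evaluate(phi: tuple, env: dict[str, bool]) -> bool:
    """Truth value of a formula under one assignment."""
    op = phi[0]
    if op == "var":
        return env[phi[1]]
    if op == "not":
        return not evaluate(phi[1], env)
    if op == "and":
        return evaluate(phi[1], env) and evaluate(phi[2], env)
    if op == "or":
        return evaluate(phi[1], env) or evaluate(phi[2], env)
    if op == "imp":
        return not evaluate(phi[1], env) or evaluate(phi[2], env)
    raise ValueError(f"unknown connective {op!r}")


def prove(phi: tuple) -> Optional[dict[str, bool]]:
    """Return None if phi is valid (true under every assignment);
    otherwise return a falsifying assignment as a counterexample."""
    names = sorted(atoms(phi))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if not evaluate(phi, env):
            return env
    return None


p, q = ("var", "p"), ("var", "q")
modus_ponens = ("imp", ("and", ("imp", p, q), p), q)
print(prove(modus_ponens))   # None: valid
print(prove(("imp", p, q)))  # {'p': True, 'q': False}
```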

3.2 Model checking

In theorem proving, the property to be verified is proved directly as a mathematical theorem; this is also called deductive verification. The counterpart to deductive verification is model checking [175-177], proposed independently by Clarke and Emerson and by Queille and Sifakis in the early 1980s [19,176]. **Its basic idea is that checking whether a given structure satisfies a formula is much easier than proving that the formula holds in all structures; on this basis it created, for concurrent systems, a new form of verification that checks whether formulas are satisfied on finite-state models [178].**

Model checking checks whether a system's semantic model satisfies its property specification by automatically and exhaustively traversing the finite state space of the model. The most common form is temporal-logic model checking: the system is usually described by a model specification whose behavior is given by operational semantics, using automata, labelled transition systems, and similar formalisms, while the properties to check are property specifications written in temporal logic. If the model does not satisfy a property, the model-checking algorithm returns a counterexample, a system behavior that violates the property, which the user can analyze to debug the system; if model checking finds no counterexample, the model satisfies the checked property.
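
The explicit-state side of this process can be sketched as a breadth-first search over reachable states that checks an invariant and reconstructs a counterexample path when it fails. The two-process lock protocol below is a hypothetical example, deliberately broken because testing and setting the lock are separate steps, so the search finds an interleaving in which both processes reach the critical section. The names `successors`, `check_invariant`, and `mutual_exclusion` are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Iterator, Optional

State = tuple[str, str, int]  # (pc of process 0, pc of process 1, lock bit)


def successors(s: State) -> Iterator[State]:
    """Interleaving semantics of a hypothetical, deliberately broken lock:
    testing the lock and setting it are two separate steps."""
    pcs, lock = list(s[:2]), s[2]
    for i in (0, 1):
        pc = pcs[i]
        if pc == "idle":
            nxt, new_lock = "test", lock
        elif pc == "test" and lock == 0:
            nxt, new_lock = "set", lock
        elif pc == "set":
            nxt, new_lock = "crit", 1
        elif pc == "crit":
            nxt, new_lock = "idle", 0
        else:
            continue                   # blocked in "test" while the lock is held
        new_pcs = pcs.copy()
        new_pcs[i] = nxt
        yield (new_pcs[0], new_pcs[1], new_lock)


def check_invariant(init: State,
                    inv: Callable[[State], bool]) -> Optional[list[State]]:
    """Explicit-state BFS over reachable states. Returns None if `inv` holds
    in every reachable state, else a shortest path to a violating state."""
    parent: dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not inv(s):
            path = []
            cur: Optional[State] = s
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


def mutual_exclusion(s: State) -> bool:
    return not (s[0] == "crit" and s[1] == "crit")


trace = check_invariant(("idle", "idle", 0), mutual_exclusion)
for state in trace or []:
    print(state)  # a shortest interleaving ending with both processes in "crit"
```

Because BFS explores states in order of distance from the initial state, the counterexample it returns is a shortest one, which tends to make debugging easier.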

3.2.1 Basic approach

The core of model checking is the strategy and algorithm for traversing a finite state space. There are two main kinds: explicit methods, which traverse the state space by computing states, and implicit methods, which traverse it through fixed-point computation. Both amount to an exhaustive search of a finite state space, so the key issue in model checking is how to cope with state explosion and complete the search within a representable state space and reasonable time. There are three main approaches to this problem:

(1) Structural methods: exploit the structure of the syntactic description (model) that defines the system to alleviate state-space explosion. Typical methods include symmetry-based model checking, on-the-fly state-space search, partial-order model checking, and parameterized model checking;
(2) Symbolic methods: encode the states and transitions of the model's transition structure as logical formulas. This symbolic encoding compresses the data structures that represent sets of states, and operations on state transitions become correspondingly efficient. Symbolic encodings are often based on BDDs, propositional formulas, or quantifier-free first-order constraints;
(3) Abstraction methods: **reduce the state space of a complex system to a smaller homomorphic image that over-approximates the original, turning verification of the original system into a problem that model checking can handle; predicate abstraction is an example.** As a more general approach, abstract interpretation is a theory of soundly approximating the formal semantics of programs with monotone functions over ordered sets, and it provides a general framework for automatic program analysis.
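
The implicit, fixed-point style of traversal can be illustrated by computing the CTL operator `EF goal` as the least fixed point of `Z = goal ∪ Pre(Z)`, and `AG safe` through the duality `AG safe = ¬EF ¬safe`. In this minimal sketch, state sets are plain Python sets; symbolic model checkers would represent them with BDDs or formulas instead. The transition relation is assumed to be total (every state has a successor), as CTL semantics usually requires, and the four-state example and function names are hypothetical.

```python
Trans = set[tuple[int, int]]


def pre_image(trans: Trans, target: set[int]) -> set[int]:
    """States with at least one successor inside `target`."""
    return {s for (s, t) in trans if t in target}


def ef(trans: Trans, goal: set[int]) -> set[int]:
    """CTL  EF goal  as the least fixed point of  Z = goal ∪ Pre(Z)."""
    z: set[int] = set()
    while True:
        z_next = goal | pre_image(trans, z)
        if z_next == z:
            return z
        z = z_next


def ag(states: set[int], trans: Trans, safe: set[int]) -> set[int]:
    """CTL  AG safe  via the duality  AG safe = ¬EF ¬safe."""
    return states - ef(trans, states - safe)


states = {0, 1, 2, 3}
trans = {(0, 1), (1, 0), (1, 2), (2, 2), (3, 3)}  # total: every state has a successor
print(ef(trans, {2}))                # states that can reach state 2: {0, 1, 2}
print(ag(states, trans, {0, 1, 3}))  # states that can never reach state 2: {3}
```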

3.2.2 Software model checking

Software systems are in general infinite-state systems, and even when the state space is finite, its size often far exceeds what current computers can handle. While model checking of hardware has been highly successful, software model checking still faces severe challenges.

When model checking an abstraction yields a counterexample, the first step is to check whether the counterexample path is feasible, usually by solving the constraints that encode the path. If it is infeasible, that is, the path formula is unsatisfiable, the counterexample is spurious, and the abstraction can be refined syntactically by adding or removing suitable predicates.

In software model checking, extracting program models with static analysis, symbolic execution, and similar methods, as well as static and dynamic techniques such as path-based model checking, are important ways to improve the scalability of model checking [3]. In recent years, effectively combining model checking with theorem proving has also become a promising direction.
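
The feasibility check can be sketched by executing an abstract counterexample path concretely. Here a path is a hypothetical list of `assume` and `assign` steps, and instead of solving the path formula with a constraint solver, as described above, the sketch searches a small bounded input domain; `None` therefore only means "no witness in this domain", not a proof of infeasibility. The names `follows_path` and `find_witness` are assumptions for illustration.

```python
from itertools import product
from typing import Optional

Env = dict[str, int]
# A path step is ("assume", predicate) or ("assign", variable, expression).


def follows_path(path: list[tuple], env: Env) -> bool:
    """Execute the path concretely; False as soon as an `assume` fails."""
    env = dict(env)
    for step in path:
        if step[0] == "assume":
            if not step[1](env):
                return False
        else:                          # ("assign", var, expr)
            env[step[1]] = step[2](env)
    return True


def find_witness(path: list[tuple], inputs: list[str],
                 domain=range(-20, 21)) -> Optional[Env]:
    """Search a bounded input domain for initial values that follow `path`.
    None means "no witness in this domain", not a proof of infeasibility."""
    for values in product(domain, repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if follows_path(path, env):
            return env
    return None


# Abstract counterexample: x > 5, then y := x - 5, then reach the error if y < 0.
spurious = [("assume", lambda e: e["x"] > 5),
            ("assign", "y", lambda e: e["x"] - 5),
            ("assume", lambda e: e["y"] < 0)]
genuine = [("assume", lambda e: e["x"] > 5),
           ("assign", "y", lambda e: e["x"] - 5),
           ("assume", lambda e: e["y"] > 10)]
print(find_witness(spurious, ["x"]))  # None: refine the abstraction
print(find_witness(genuine, ["x"]))   # {'x': 16}
```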

4 Application of formal methods

Formal methods have been very successful in the design and application of industrial hardware systems. Formal methods were applied to software earlier than to hardware, but their industrial impact on software is much smaller. The main reasons are that software systems are far more complex than hardware, and formal tools for software, especially formal verification tools, are far less mature than those for hardware.

Because formal methods themselves incur costs, the economics of applying them must be weighed. Formal methods are applied more widely in safety-critical systems (aviation, aerospace, nuclear, railway, etc.) and are covered by some software safety assurance standards.

Depending on the degree of formalization, applying formal methods first requires deciding whether to apply them to the entire system or only to key parts. Once the scope or boundary of application is set, formal methods can be applied to varying degrees in the relevant parts.

Over roughly the past 10 years, formal methods based on interactive theorem proving have made major breakthroughs in verified system software, for three reasons. First, the growing value of foundational software within information systems makes the cost of heavyweight formal methods acceptable to some extent. Second, compared with application software, the boundaries and functions of the core parts of system software are relatively stable, so once a verification is complete it can be shared with the community. Third, the automation of formal verification tools has improved significantly, and system software in turn serves as a whetstone for developing formal methods.

Formal methods can not only ensure the reliability and security of the system software itself, but in turn can provide important inspiration and support for the optimization of the system structure.

5 Challenges and future of formal methods

Grounding methods in mathematics, or giving methods a mathematical foundation, is the only path by which engineering methods mature and become rational. From an application standpoint, the vision of engineering practice is to keep increasing the mechanization and automation of software development, improve software quality and productivity, and reduce cost as far as possible. Although there is consensus on the role of formal methods in improving software quality, their impact on the productivity and cost of large-scale software is still not clearly understood, and their recognition and adoption are progressing slowly. In existing large-scale applications, most users are people with solid training in formal methods, or even the developers of the methods, techniques, and tools themselves. Some software engineering practice shows that, apart from treating programs themselves as formal specifications, engineers are unwilling to write large amounts of formal specification, believing that formal methods are complex and to some extent add to the design complexity of the software system. The primary challenge for formal methods is therefore to develop forms of application, including technical forms and tool forms, that improve their ease of use, effectiveness, and scalability and lower the barrier to adopting them.

Programming languages and program correctness are the original sources of formal methods. Researching and applying formal methods, techniques, and tools for programming languages and code is an important direction, and much research has been done on verification for real programming languages. Formal verification will increasingly center on program code: verification will become part of the programming environment, like program testing and code recommendation, and programming languages and specification languages will tend to merge. Many ideas from formal methods have strongly influenced programming-language design, and many new languages originate in ideas and applications from formal methods. The success of the Rust language [236] is a representative case of formal-methods research supporting systems development. Rust targets systems programming: on the one hand, it supports concurrency and manual memory allocation and release; on the other, it draws on type systems, linear logic, and concurrent separation logic, introducing memory ownership and ownership transfer to prevent memory errors and common data races in concurrent programs. Rust is now widely used, and influential systems written in it include browsers, operating systems, and various other tools. Along with this trend, support for visual programming and domain-specific features will further improve the usability and feasibility of new languages.

In the past 10 years, formal methods have entered a period of revival. Both the combination of lightweight formal methods with mainstream methods and the application of heavyweight formal methods to industrial-grade software have made great progress and achieved success [237]. Tools play a decisive role behind these successes: once a system is modeled in a formal specification language, tools can be used for semantic analysis, and tools also relieve the pressure caused by problem scale. Building more usable and robust tools that support parallel semantic analysis and verification of large-scale specifications, building reusable formal specification libraries and method communities, and advancing formal tools and reusable library infrastructure, including tool integration, making tools invisible to users, and turning specifications and proofs into reusable assets, will therefore be key directions for formal methods.

Changes in the forms of the systems and environments being specified, developed, and verified drive the development of formal methods. Because the goal of formal methods is to describe, develop, and verify high-quality software systems, changes in the form and state of software and hardware directly affect formal methods. For example, one important thread in the development of formal methods runs from sequential programs to parallel programs, hybrid systems, cyber-physical systems, and even human-cyber-physical systems; hybrid systems in a human-cyber-physical society pose comprehensive challenges to the foundations, methods, techniques, and tools of formal methods [238,239].

Software is becoming social infrastructure, and formal methods play a very important role in the reliability of the basic software and hardware of computer systems; this role in critical information infrastructure is the most widely recognized application of formal methods. In software infrastructure, full-stack verified software will keep advancing and may gradually reach practical mainstream operating systems. For example, to ensure the reliability of its cloud service infrastructure, Amazon used TLA+ to formally verify key algorithms of its S3 cloud storage service and found many defects [240]. In 2017, the Linux Foundation announced that it would formally verify some Linux kernel modules to improve system security [241]. Information security research based on formal methods is clearly another direction [242]. Looking ahead to future software infrastructure, correctness and security verification of blockchains and smart contracts is booming [243,244].

​ In an era where software defines everything, formal methods will define software. How formal methods integrate with other software development methods and domain-specific integration is particularly important. Corresponding to the changes in the characteristics of software forms and changes in quality requirements in the era of software definition, Formal methods need to adapt to more complex, open, dynamic, and continuously evolving software forms in terms of basic concepts, protocols, development and verification technologies and tools. For example, under the fusion of humans, machines, and things, informal methods need to be handled accurately and appropriately. From requirements to formal specifications, from formal abstraction to boundary modeling of non-formal scenarios and the real world, a large number of non-functional specifications include specifications of socialized human factors, specifications, reasoning and behavior of new software structures and behaviors such as autonomous adaptive self-organization. Verification, etc. In the development of formal methods, mathematics and formal methods have a close interaction. Mathematics provides the basis for formal models and reasoning, and formal methods also promote the development of mathematics. Formal methods can mechanically, Write reliable proofs of complex mathematical problems efficiently and accurately, and even help solve some long-standing mathematical problems, such as the four-color theorem[245], Robbins conjecture[246], and Kepler's conjecture. )[247] (The original proof exceeds 300 pages, and the officially published proof is nearly 130 pages. Its correctness cannot be guaranteed[248]) etc. Formal (engineering) mathematics[249] is essential for building high-confidence intelligent manufacturing The software environment also has important value.
Formal methods and artificial intelligence are closely related: theorem proving and constraint solving are core topics of the symbolic school of artificial intelligence. Using other advances in artificial intelligence to strengthen formal methods is a direction worth attention. Machine learning could, for example, help construct formal specifications, discover invariants, or recommend proof strategies for formal verification, and it could assist specification refinement and program synthesis. Program synthesis now intersects with machine learning, and synthesis methods that combine deep learning with framework generation have emerged. Conversely, machine learning software consists of programs too, so studying formal methods for it is very valuable [250, 251]. Topics worth exploring include the formal semantics, verification, and debugging of probabilistic programs, and the verification of big-data processing programs. Others are the formal specification and robustness verification of deep learning programs, the use of formal methods to devise better training methods, and the interpretability of machine learning.
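
Robustness verification asks whether every input within a small distance of a given input receives the same classification. One simple and sound (but incomplete) technique is interval bound propagation, which pushes an input box through the network using interval arithmetic. The sketch below is a minimal illustration of that technique under stated assumptions: a fully connected ReLU network given as plain lists, an L-infinity perturbation of radius `eps`, and a made-up two-neuron example network. It is not the method of [250, 251], and all names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias)


def affine_bounds(weights: List[List[float]], bias: List[float],
                  box: List[Interval]) -> List[Interval]:
    """Sound bounds of W x + b over the box: a positive weight takes the same
    end of its input interval, a negative weight takes the opposite end."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (xl, xu) in zip(row, box):
            if w >= 0:
                lo += w * xl
                hi += w * xu
            else:
                lo += w * xu
                hi += w * xl
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to both interval ends."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]


def verify_robust(layers: List[Layer], x: List[float],
                  eps: float, label: int) -> bool:
    """True only if, for every input within L-infinity distance eps of x,
    output `label` provably beats every other output. False means 'not
    proved', which is not the same as 'not robust' (bounds may be loose)."""
    box = [(xi - eps, xi + eps) for xi in x]
    for i, (w, b) in enumerate(layers):
        box = affine_bounds(w, b, box)
        if i < len(layers) - 1:              # ReLU after hidden layers only
            box = relu_bounds(box)
    target_lo = box[label][0]
    return all(target_lo > hi for j, (_, hi) in enumerate(box) if j != label)


# Hypothetical network: outputs (relu(x0 - x1), relu(x1 - x0)).
layers: List[Layer] = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer, then ReLU
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # output layer
]
print(verify_robust(layers, [1.0, 0.0], 0.1, label=0))  # True
print(verify_robust(layers, [1.0, 0.0], 0.6, label=0))  # False
```

At radius 0.6 the answer is correctly negative: the input (0.4, 0.6) lies inside the box and is classified as label 1. For deeper networks, interval bounds loosen quickly, which is why research tools use tighter relaxations or complete solvers.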

Among new computing models, the theory of quantum programming [252] has become a new topic in the development of formal methods. Formal methods have been applied to the semantic analysis of quantum programming languages and to reasoning about their key properties, and program logics and model-checking methods for quantum computing have emerged. Quantum programs differ greatly from classical ones, above all because of superposition and entanglement. As a result, work on systematic formal methods for quantum computing and on effective verification techniques for it has only just begun.

The pervasiveness of computational thinking has also led formal methods to intersect with other disciplines. In biology, for example, computational modeling and analysis have become an important method [253], as in the modeling and analysis of the temporal behavior of naïve T cell differentiation [254]. These studies have advanced formal methods for hybrid systems and contributed to the medical and life sciences. They have also pointed out a clear direction for the integration of medicine and engineering [255].

Education is an important driver of the sustainable development of formal methods. Their limited usability and scalability make the learning curve long and raise the barrier to intensive use, which seriously restricts their adoption in software development. Meanwhile, the trustworthiness of computing systems is becoming ever more important, and the computer science and software engineering curricula formulated by ACM and IEEE both include program correctness [256, 257]. A survey of formal methods education in China concludes that awareness of formal methods in professional education needs to be strengthened [258]. Even lightweight application of formal methods can significantly improve understanding of system requirements and design. A program is itself a formal specification that can be processed mechanically and automatically, whether compiled or executed. In fact, every software developer already uses formal methods to some extent; what differs is the degree of formalization. While cultivating computational thinking, educators should therefore add discussion of formal concepts to foundational courses such as programming and data structures. Later courses such as discrete mathematics, algorithms, and software engineering should highlight how formal methods relate to and combine with mainstream methods. Both steps are important for promoting and improving formal methods.
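
The paragraph above calls a program a mechanically processable formal specification and praises lightweight formal methods. A classroom-style illustration of both ideas is to write a function's precondition, loop invariant, and postcondition as executable checks. The sketch below is hypothetical and not drawn from the survey. It uses plain Python `assert` statements as lightweight contracts on a binary search.

```python
from typing import List


def binary_search(a: List[int], key: int) -> int:
    """Return an index i with a[i] == key, or -1 if key does not occur in a."""
    # Precondition: a is sorted in non-decreasing order.
    assert all(a[i] <= a[i + 1] for i in range(len(a) - 1)), "precondition"
    lo, hi = 0, len(a)                        # unexplored window is a[lo:hi]
    while lo < hi:
        # Loop invariant: key occurs neither in a[:lo] nor in a[hi:].
        assert key not in a[:lo] and key not in a[hi:], "loop invariant"
        mid = (lo + hi) // 2
        if a[mid] < key:
            lo = mid + 1                      # sortedness: a[:mid + 1] < key
        elif a[mid] > key:
            hi = mid                          # sortedness: a[mid:] > key
        else:
            result = mid
            break
    else:                                     # window empty without a hit
        result = -1
    # Postcondition: a hit points at key; a miss means key is absent.
    assert a[result] == key if result != -1 else key not in a, "postcondition"
    return result


print(binary_search([1, 3, 5, 7], 5))  # 2
print(binary_search([1, 3, 5, 7], 4))  # -1
```

The invariant check costs linear time per iteration, so such assertions belong in teaching and testing rather than production. Python drops `assert` statements when run with `-O`. Fully formal treatment would prove these conditions once for all inputs, for example in a program verifier, instead of checking them on each run.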

Origin blog.csdn.net/qq_40893490/article/details/127076478