Foreword: The National Computer Technology and Software Professional Technical Qualification (Level) Examination (hereinafter the IT Professional Qualification Examination) is a state-level, authoritative computer science and technology examination and vocational skills certification, administered by the Ministry of Personnel of the People's Republic of China and sponsored by the National Computer Network and Information Security Management Center. It mainly gives enterprises, institutions and training organizations a way to test and certify professional skills in computing and software.
Within the IT Professional Qualification Examination, the software designer examination (the "soft exam" for short) is an important category, and its certificate is regarded as an important professional credential for people working in the software industry.
The intermediate software designer exam is one level of the soft exam and corresponds to the professional level of software engineer. It is a professional qualification in advanced software design and development, and one of the core competency certifications for software professionals. Its content covers software development, requirements analysis, software testing, software project management, software quality assurance and other topics. Passing it shows that a candidate has the professional knowledge and practical experience to design and develop software, play an important role in software projects, and deliver high-quality software solutions for enterprises and organizations.
The following knowledge points for the software designer exam were compiled by past candidates, in the hope of helping you achieve a good result.
Chapter 1 Computer System Knowledge
A computer system consists of two parts: hardware and software.
• Five components of computer hardware system
Controller, arithmetic unit, memory, input devices, output devices
Memory is divided into internal memory (main memory: small capacity, fast, holds temporary data that is lost on power failure) and external storage (hard disk, CD: large capacity, slow, long-term data storage)
Input devices and output devices are collectively called peripherals
Host: CPU + main memory
• Central Processing Unit CPU
CPU composition: arithmetic unit, controller, register group (fastest access speed), and internal bus
CPU functions: program control, operation control, timing control, data processing
Composition of the arithmetic unit (often tested):
Arithmetic logic unit (ALU): performs arithmetic and logic operations on data
Accumulator (AC): provides a work area for the ALU and temporarily holds a source operand or the result of an operation
Data buffer register (DR): temporarily holds an instruction or data word being transferred to or from memory
Program status word register (PSW): holds the condition flags set by the result of an instruction, such as the overflow flag
Arithmetic unit function: performs arithmetic and logic operations
Controller:
Instruction register (IR): temporarily holds the instruction the CPU is currently executing
Program counter (PC): holds the address of the next instruction to be executed
Address register (AR): holds the memory address the CPU is currently accessing
Instruction decoder (ID): decodes the instruction opcode
Controller function: controls the work of the entire CPU, including program control and timing control; this is the most important part of the CPU
Programmers can access the general-purpose registers (for data), the status register and the program counter, but cannot access the instruction register
• Data base conversion
Binary; hexadecimal is written as 0x18F or 18FH
Base R to decimal, example: base-6 number 5043 to decimal => 3*6^0 + 4*6^1 + 0*6^2 + 5*6^3 => 1107
Decimal to base R, example: decimal 200 to base 6 => 200/6 = 33 remainder 2 => 33/6 = 5 remainder 3 => 5/6 = 0 remainder 5 => reading the remainders in reverse order gives 532
Base m to base n: convert via decimal
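The two conversion procedures above (weighted sum, and repeated division with the remainders read in reverse) can be sketched as follows. This is a minimal illustration; the function names and the DIGITS table are assumptions made for the example, and the functions do not validate that each digit is legal for the given base.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for bases up to 16 (assumed limit)

def to_decimal(text: str, base: int) -> int:
    """Base R -> decimal: accumulate digit * base^position, left to right."""
    value = 0
    for ch in text.upper():
        value = value * base + DIGITS.index(ch)
    return value

def from_decimal(value: int, base: int) -> str:
    """Decimal -> base R: divide repeatedly, then read the remainders backwards."""
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        value, r = divmod(value, base)
        remainders.append(DIGITS[r])
    return "".join(reversed(remainders))

print(to_decimal("5043", 6))                   # base 6 -> decimal
print(from_decimal(200, 6))                    # decimal -> base 6
print(from_decimal(to_decimal("18F", 16), 2))  # base 16 -> base 2 via decimal
```

The last call shows the "convert via decimal" route for going from base m to base n.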
• Representation of numbers (rarely tested)
Smallest unit of data: b (bit)
Smallest unit of storage: 1 B (byte) = 8 b
1KB=1024B; 1MB=1024KB; 1GB=1024MB
Machine number: the form in which a value is represented inside the computer. It uses the binary system, the sign is represented by 0 or 1, and the radix point is implicit and occupies no bit. For example, +0 is 0 0000000 and -0 is 1 0000000, where the first bit is the sign and the last seven bits are the value bits
Fixed-point representation: divided into pure fractions and pure integers; the radix point does not occupy storage space
Pure fraction: the radix point is agreed to lie before the most significant value bit of the machine number
Pure integer: the radix point is agreed to lie after the least significant value bit of the machine number
True value: the actual value corresponding to the machine number
• How numbers are encoded (rarely tested)
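Sign-magnitude (original code): the ordinary binary representation of a number with a sign bit, for example +0 = 0 0000000 and -0 = 1 0000000
Ones' complement (inverse code): for a positive number it equals the sign-magnitude form; for a negative number, invert every bit of the sign-magnitude form except the sign bit (for the values above, +0 = 0 0000000 and -0 = 1 1111111)
Two's complement: for a positive number it equals the sign-magnitude form; for a negative number, invert every bit except the sign bit and then add 1 to the lowest bit, propagating any carry. Both +0 and -0 become 0 0000000: the carry out of the top bit of -0 overflows and is discarded
Excess (shifted) code: used for the exponent in floating-point arithmetic; for both positive and negative numbers it is obtained by inverting the first (sign) bit of the two's complement form
Computer systems usually use two's complement to represent and operate on data, because it simplifies the design of the arithmetic hardware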
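The four encodings can be compared side by side with a short sketch. The function name, the dictionary keys, and the default 8-bit width are assumptions made for this example; the steps follow the rules stated above (invert all bits except the sign bit, add 1, flip the sign bit).

```python
def encodings(value: int, bits: int = 8) -> dict:
    """Return the four encodings of `value` as bit strings of width `bits`."""
    sign_bit = 1 << (bits - 1)          # e.g. 1000 0000 for 8 bits
    mask = (1 << bits) - 1              # e.g. 1111 1111 for 8 bits
    magnitude = abs(value)
    if magnitude >= sign_bit:
        raise ValueError("magnitude does not fit in the value bits")
    if value >= 0:
        sign_mag = ones = twos = magnitude      # positive: all three are identical
    else:
        sign_mag = sign_bit | magnitude         # set the sign bit
        ones = sign_mag ^ (sign_bit - 1)        # invert every bit except the sign bit
        twos = (ones + 1) & mask                # add 1, discard any carry out
    excess = twos ^ sign_bit                    # flip the sign bit of two's complement
    width = f"0{bits}b"
    return {
        "sign_magnitude": format(sign_mag, width),
        "ones_complement": format(ones, width),
        "twos_complement": format(twos, width),
        "excess": format(excess, width),
    }

for name, pattern in encodings(-5).items():
    print(f"{name:16s} {pattern}")
```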
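• Floating-point representation (often tested)
Floating-point number N = F * 2^E, where E is the exponent (a signed pure integer) and F is the mantissa (a signed pure fraction), similar to decimal scientific notation. For example, 101.011 = 0.101011 * 2^3
The range of values a floating-point number can represent is determined by the exponent; the precision is determined by the mantissa
Floating-point arithmetic takes three steps: 1. align the exponents, converting both operands to the same exponent before computing; the smaller exponent is aligned to the larger one, otherwise mantissa precision is lost; 2. compute on the mantissas; 3. normalize the result
Floating-point storage format: | exponent sign | exponent | mantissa sign | mantissa |. The mantissa usually uses two's complement and the exponent uses excess code
Normalized floating-point number: the absolute value of the mantissa is restricted to [0.5, 1)
Range of a floating-point number: let M be the number of mantissa bits in two's complement (including the mantissa sign) and R the number of exponent bits in two's complement (including the exponent sign). Then the largest positive number is +(1 - 2^(-M+1)) * 2^(2^(R-1) - 1), and the smallest negative number is -1 * 2^(2^(R-1) - 1)
Fixed-point versus floating-point: fixed-point representation is divided into fixed-point integers and fixed-point fractions, and its radix point occupies no storage bit; with the same total number of bits, floating-point can represent larger numbers
For a machine word length of n, the range of a fixed-point fraction is the range of a fixed-point integer divided by 2^(n-1)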
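As a quick numeric check of the range formula above, the sketch below evaluates it for a given M (mantissa bits including sign) and R (exponent bits including sign). The function name and the example widths M = 8 and R = 4 are assumptions chosen for illustration, not values from the notes.

```python
def float_range(mantissa_bits: int, exponent_bits: int):
    """Largest positive and smallest negative value for the stated format."""
    max_exponent = 2 ** (exponent_bits - 1) - 1          # 2^(R-1) - 1
    largest_positive = (1 - 2.0 ** (-(mantissa_bits - 1))) * 2.0 ** max_exponent
    smallest_negative = -1.0 * 2.0 ** max_exponent
    return largest_positive, smallest_negative

# Hypothetical format: 8 mantissa bits and 4 exponent bits, signs included.
print(float_range(8, 4))
```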
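• Addressing
Immediate addressing: the operand is contained in the instruction
Direct addressing: the operand is in a memory cell, and the instruction gives the address of that cell
Register addressing: the operand is in a register, and the instruction gives the name of that register; faster than direct addressing because registers are "closer" to the CPU
Register indirect addressing: the operand is in a memory cell, a register holds the operand's memory address, and the instruction holds the register name; the addressing path is instruction -> register -> memory
Indirect addressing: the instruction gives the address of the operand's address
Addressing speed: immediate addressing > register addressing > direct addressing > register indirect addressing > indirect addressing
Purpose of offering different addressing modes: to expand the addressable space and improve programming flexibility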
• Check codes
Code distance: the minimum number of bit positions in which any two legal codewords of a coding system differ
Parity check code: one check bit is added so that the number of 1s in the codeword is odd (odd parity) or even (even parity); the code distance is 2. Parity can only detect errors, not correct them
Hamming code: a check method that uses several parity bits to detect and correct errors; the code distance is 3. With n data bits and k check bits, n and k must satisfy 2^k - 1 >= n + k
Cyclic redundancy check (CRC): can detect errors but cannot correct them; the code distance is 2. A codeword consists of k data bits + r check bits, and the check bits are generated from the data bits. The more check bits, the stronger the checking ability. CRC computation uses modulo-2 arithmetic
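The three check codes above can be illustrated with a small sketch: an even-parity bit, the smallest k satisfying 2^k - 1 >= n + k, and CRC check bits obtained by modulo-2 (XOR) long division. The function names and the sample data and generator bit strings are assumptions chosen for the example.

```python
def even_parity_bit(bits: str) -> str:
    """Check bit that makes the total number of 1s even."""
    return str(bits.count("1") % 2)

def hamming_check_bits(n: int) -> int:
    """Smallest k with 2^k - 1 >= n + k for n data bits."""
    k = 1
    while 2 ** k - 1 < n + k:
        k += 1
    return k

def crc_remainder(data: str, generator: str) -> str:
    """Modulo-2 division: append r zeros, XOR the generator in, keep the last r bits."""
    r = len(generator) - 1
    work = [int(b) for b in data] + [0] * r
    gen = [int(b) for b in generator]
    for i in range(len(data)):
        if work[i] == 1:                 # only "subtract" when the leading bit is 1
            for j, g in enumerate(gen):
                work[i + j] ^= g         # modulo-2 subtraction is XOR
    return "".join(str(b) for b in work[-r:])

print(even_parity_bit("1011001"))
print(hamming_check_bits(8))
print(crc_remainder("11010011101100", "1011"))
```

The sender would transmit the data bits followed by the returned CRC bits; the receiver repeats the division and expects a zero remainder.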
• RISC and CISC
RISC: reduced instruction set computer; few instructions, low complexity, fixed instruction length, few addressing modes, many general-purpose registers, supports pipelining, uses hard-wired (combinational logic) control
CISC: complex instruction set computer; many complex instructions, variable instruction length, many complex addressing modes, comparatively few general-purpose registers, less suited to pipelining, uses microprogrammed control
• Pipeline technology
Pipeline: a quasi-parallel implementation technique in which the execution of multiple instructions overlaps
An instruction is divided into three stages: fetch -> analyze -> execute
Execution time of one complete instruction = fetch time + analysis time + execution time
Pipeline cycle: the duration of the longest stage. For example, with fetch 1 ms -> analyze 3 ms -> execute 2 ms, the pipeline cycle is 3 ms, which means that after the first instruction each further instruction completes just 3 ms later
Total pipeline time: theoretical formula: time of one complete instruction + (number of instructions - 1) * pipeline cycle; practical formula: every stage of the first instruction is charged a full pipeline cycle, giving (number of stages + number of instructions - 1) * pipeline cycle
Throughput: the number of tasks the pipeline completes per unit time, calculated as number of instructions / total pipeline time; the maximum throughput is 1 / pipeline cycle
Asynchronous control lengthens execution time and reduces performance, because a completion signal must be sent after each operation
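The pipeline formulas above can be checked numerically with a short sketch. The function name is an assumption; the stage times are the 1 ms, 3 ms and 2 ms example from the notes, and the instruction count of 100 is chosen only for illustration.

```python
def pipeline_times(stage_times, n_instructions):
    """Return (pipeline cycle, theoretical total time, practical total time)."""
    cycle = max(stage_times)                                  # longest stage
    theoretical = sum(stage_times) + (n_instructions - 1) * cycle
    practical = (len(stage_times) + n_instructions - 1) * cycle
    return cycle, theoretical, practical

cycle, theoretical, practical = pipeline_times([1, 3, 2], 100)
print(cycle, theoretical, practical)        # times in ms
print(100 / theoretical)                    # throughput, instructions per ms
print(1 / cycle)                            # maximum throughput
```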
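• Memory
By location: internal memory; external storage
By working mode: read/write memory (RAM); read-only memory (ROM)
By access method: content-addressed memory (for example, associative memory) and address-addressed memory
By addressing mode: random access memory, sequential access memory, direct access memory
Flash memory: a kind of read-only memory (ROM) that is erased in units of blocks; a USB flash drive is a familiar example
Virtual memory: composed of main memory + auxiliary storage
Storage hierarchy, from the inside out: CPU internal general-purpose registers > Cache (SRAM, static random access memory) > main memory (DRAM, dynamic random access memory, which needs periodic refresh to keep its data) > external storage
Spatial locality: if a memory cell is accessed, nearby cells are likely to be accessed in the near future
Temporal locality: if a memory cell is accessed, the same cell is likely to be accessed again later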
• Cache
Cache: located between the CPU and main memory, it holds the currently most active programs and data (a copy of part of main memory). It is 5 to 10 times faster than main memory and is transparent to the programmer (the programmer cannot manipulate it)
The larger the cache, the higher the hit rate, which gradually approaches 100%; however, cache cost and hit time also grow with capacity
Replacement algorithms: the goal is a higher cache hit rate
Random replacement
First in, first out (FIFO)
Least recently used (LRU)
Optimal replacement (OPT)
Address mapping methods in the cache
Address mapping: the CPU issues main memory addresses while working; to read or write data in the cache, the main memory address must be converted into a cache address. This conversion is called address mapping
Direct mapping: main memory is divided into regions and the cache into blocks, with each region containing the same number of blocks as the cache; the correspondence between a block of any region and a cache block is fixed. The hardware is simple, but the conflict rate is high
Fully associative mapping: main memory is not divided into regions; main memory and cache are divided into blocks of the same size, and a cache block can hold any main memory block. The circuit is hard to design and suits only small caches; the conflict rate is low
Set-associative mapping: a compromise between direct and fully associative mapping. Blocks are grouped into sets first; mapping between sets is direct, and mapping within a set is fully associative
Conflict probability: fully associative < set-associative < direct
The mapping between cache and main memory is done automatically by hardware
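To make the effect of a replacement algorithm on the hit rate concrete, here is a minimal simulation of two of the policies listed above, FIFO and LRU, on a fully associative cache. The function name, the access sequence, and the capacity of 3 blocks are assumptions chosen for the example; real caches implement this in hardware.

```python
from collections import OrderedDict

def count_hits(accesses, capacity, policy="LRU"):
    """Count cache hits for a sequence of main memory block numbers."""
    cache = OrderedDict()              # insertion order doubles as eviction order
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)      # evict the oldest entry
            cache[block] = True
    return hits

sequence = [1, 2, 3, 1, 4, 1, 2]
print("FIFO hits:", count_hits(sequence, 3, "FIFO"))
print("LRU hits: ", count_hits(sequence, 3, "LRU"))
```

With FIFO the eviction order is purely the load order; LRU differs only in that a hit moves the block to the back of the queue.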
• Interrupts
Interrupt: when an event needs urgent handling, the CPU suspends the currently running program, executes the relevant handler, and returns to the original program when handling is finished. This process is called an interrupt
Interrupt vector: provides the entry address of the interrupt service routine
Interrupt response time: the time from when an interrupt request is issued to when the interrupt service routine is entered
Purpose of saving the context: so that the interrupted program can be resumed correctly and continue executing
• Input/output (I/O) control modes
Programmed polling (direct program control):
The CPU and I/O can only work serially; the CPU has to keep polling the device status, so it stays busy for long periods and CPU utilization is low
Only one word is read or written at a time
Data is placed into memory by the CPU
Interrupt-driven mode:
The I/O device actively notifies the CPU through an interrupt signal that the I/O operation has completed
The CPU and I/O peripherals can operate in parallel, improving CPU utilization
Only one word is read or written at a time
Data is placed into memory by the CPU
Direct memory access (DMA) mode:
The CPU issues a read or write command to the I/O module and can then do other work. The I/O module sets up a direct data path to memory and, once the operation is finished, informs the CPU through an interrupt signal
The CPU and I/O peripherals can work in parallel
Data is placed directly into memory by the peripheral
The unit of one transfer is a block rather than a word
CPU intervention is needed only at the beginning and end of a block transfer
The CPU responds to a DMA request at the end of a bus cycle, and each data transfer occupies one memory cycle
An interrupt request from an I/O device is a maskable interrupt; a power failure is a non-maskable interrupt
• Bus
Bus: a group of signal lines connecting the components of a computer; it is the shared channel the computer uses to transmit information
Bus classification: data bus, address bus, control bus
Advantages of a bus: it simplifies the system structure, reduces the number of connecting lines, eases interface design, eases fault diagnosis and maintenance, and reduces cost
Bus bandwidth calculation: clock frequency * bytes transferred per clock cycle
Address bus width: the width of the address bus determines the addressing capability of the CPU and is tied to memory size; a larger memory needs a wider address bus. For example: memory capacity 4 GB = 2^32 B -> address bus width is 32
Data bus width: the data bus width equals the word length of the processor
PCI: parallel internal (system) bus; SCSI: parallel external bus
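The two bus calculations above can be sketched in a few lines. The function names are assumptions, the address width assumes byte-addressable memory, and the bandwidth example (200 MHz clock, 32-bit bus, 5 clock cycles per transfer) uses made-up figures in the style of an exam question.

```python
def address_bus_width(capacity_bytes: int) -> int:
    """Bits needed to address every byte of a memory of the given size."""
    return (capacity_bytes - 1).bit_length()

def bus_bandwidth(clock_hz: int, bus_width_bits: int, cycles_per_transfer: int = 1) -> float:
    """Bytes per second: clock frequency * bytes moved per clock cycle."""
    bytes_per_cycle = (bus_width_bits / 8) / cycles_per_transfer
    return clock_hz * bytes_per_cycle

print(address_bus_width(4 * 1024 ** 3))          # 4 GB of memory
print(bus_bandwidth(200_000_000, 32, 5) / 1e6)   # in MB/s (10^6 bytes)
```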
• Encryption technology and authentication technology
Problem solved by encryption: eavesdropping
Problems solved by authentication: tampering, impersonation, repudiation
Encryption technology
Symmetric encryption: the same key is used for encryption and decryption, so there is only one key; encryption and decryption are fast and suit large amounts of plaintext, but key distribution is a weakness
Asymmetric encryption: encryption and decryption use different keys, so there are two keys (a public key and a private key); encryption and decryption are slow, key distribution is not a problem, and neither key can be derived from the other
Hybrid encryption: combines symmetric and asymmetric encryption. The sender first encrypts the bulk plaintext with a symmetric key, then encrypts that symmetric key with the receiver's public key and sends it along with the ciphertext. The receiver uses its private key to recover the symmetric key, then uses the symmetric key to decrypt the ciphertext
Authentication technology
Message digest: the sender computes a hash digest of the plaintext and sends it together with the ciphertext; the receiver decrypts the message, hashes the recovered plaintext, and compares the two digests. If they match, the message has not been tampered with
Digital signature: on top of the digest, the sender signs the digest with its private key and the recipient verifies the signature with the sender's public key. This reveals tampering and impersonation, prevents repudiation, and verifies the authenticity of the message source
Digital certificate: a third-party certificate authority (CA) signs the user's public key with the CA's private key to ensure the public key has not been tampered with; the receiver verifies it with the CA's public key and obtains the sender's public key
Digital certificates authenticate user identities; digital signatures protect against tampering, impersonation and repudiation
Encryption algorithms
Symmetric encryption algorithms (also called private-key or shared-key encryption algorithms)
DES, 3DES, RC-5, IDEA, AES, RC4
Asymmetric encryption algorithms (also called public-key encryption algorithms)
ECC, RSA, DSA
Hash functions: MD5 digest algorithm, SHA-1 secure hash algorithm
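The digest comparison described under authentication technology can be sketched with Python's standard hashlib module. The function names and sample messages are assumptions; SHA-256 is swapped in here in place of the MD5 and SHA-1 algorithms the notes list, and the sketch covers only the tamper check, not signing or encryption.

```python
import hashlib

def digest(message: bytes) -> str:
    """Hex digest of the message (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(message).hexdigest()

def is_unmodified(received_message: bytes, received_digest: str) -> bool:
    """Receiver side: re-hash the message and compare with the digest that was sent."""
    return digest(received_message) == received_digest

sent = b"transfer 100 to account 42"
sent_digest = digest(sent)

print(is_unmodified(sent, sent_digest))                            # untouched message
print(is_unmodified(b"transfer 900 to account 42", sent_digest))   # tampered message
```

A bare digest only detects changes if the digest itself is protected, which is what signing it with a private key adds.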
• System reliability
Suppose a system consists of N subsystems with reliabilities R1, R2, R3, ...
Series system reliability: R = R1 * R2 * R3
Parallel system reliability: R = 1 - (1-R1)(1-R2)(1-R3)
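Both reliability formulas generalize to any number of subsystems, as in this short sketch. The function names and the sample reliability of 0.9 per subsystem are assumptions chosen for illustration.

```python
import math

def series_reliability(parts):
    """Series: every subsystem must work, so multiply the reliabilities."""
    return math.prod(parts)

def parallel_reliability(parts):
    """Parallel: the system fails only if every subsystem fails."""
    return 1 - math.prod(1 - r for r in parts)

parts = [0.9, 0.9, 0.9]
print(round(series_reliability(parts), 6))
print(round(parallel_reliability(parts), 6))
```

Mixed systems can be evaluated by nesting the two calls, for example a series chain in which one element is itself a parallel group.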
• Other
The number of bits in the instruction register depends on the instruction word length
Storage hierarchy by speed, fastest first: CPU internal general-purpose registers > Cache > main memory > external storage
Safety requirements:
Physical Line Security – Computer Room Security
Network Security – Intrusion Detection
System Security – Vulnerability Patch Management
Application Security – Database Security
Chapter 2 Programming Languages
• Low-level language vs. high-level language
Low-level languages: machine language and assembly language
High-level languages: application-oriented programming languages such as C, Java and Python; they are closer to natural language and improve programming efficiency
A program written in a high-level language or in assembly language is called a source program; a source program cannot be executed directly on the computer
• Interpreter
When translating the source program, no independent object program is generated
Both the interpreter and the source program take part in the running of the program
• Compiler
Translates the source program into an independently stored object program
What runs on the machine is an object program equivalent to the source program
Neither the compiler nor the source program takes part in the running of the object program
• Data components of programming languages
Identifier: a token consisting of digits, letters and underscores
Constants and variables: distinguished by whether the value can change while the program runs. Constants are stored in a constant pool and do not have their own storage unit
Global and local variables: distinguished by the scope of the data in the program code
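Control structures: sequence, loop, selection
Data types serve three purposes in a program: 1. they make it easy to allocate storage for data; 2. they make it easy to check the data objects taking part in expression evaluation; 3. they define the value range of a data object and the operations it supports
Left-associative expressions are evaluated from left to right, for example: a && b
Right-associative expressions are evaluated from right to left, for example: x = y = z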
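• Call by value and call by reference
Function definition: function header (return type, function name, formal parameters) + function body ({})
Call by value: the value of the actual argument is passed to the formal parameter; the argument can be a constant, a variable or an expression; data cannot flow in both directions between argument and parameter
Call by reference: the address of the actual argument is passed to the formal parameter; the argument must have an address, so it cannot be a constant (value) or an expression; data can flow in both directions, that is, changing the formal parameter also changes the actual argument
• Translation phases of compilers and interpreters
Phases of compilation: lexical analysis – syntax analysis – semantic analysis – intermediate code generation (optional) – code optimization (optional) – object code generation
Phases of interpretation: lexical analysis – syntax analysis – semantic analysis
The phases that neither a compiler nor an interpreter can omit or reorder are lexical analysis, syntax analysis and semantic analysis
The compiler's intermediate code generation and code optimization phases are not mandatory and can be omitted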
• The role of the symbol table
Throughout compilation, the type and attribute information of the symbols in the source program is continuously collected, recorded in the symbol table, and used
The table records the necessary information about each symbol in the source program, to assist semantic correctness checking and code generation
• Lexical analysis
Treats the source program as a multi-line character string, scanning it character by character from top to bottom and left to right, and recognizing word symbols (keywords, identifiers, operators, and so on)
The words recognized by lexical analysis are usually output as pairs: the word category and the value of the word
Lexical analysis is based on the lexical rules of the language
The input of lexical analysis is the source program, and the output is a stream of tokens
Main function of lexical analysis: to check whether the characters that make up the program, and the symbols those characters form under the construction rules, conform to the rules of the language
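A lexer that emits (category, value) pairs can be sketched with Python's standard re module. This is a toy illustration: the token categories, the keyword set, and the tiny operator alphabet are all assumptions made for the example, not the lexical rules of any particular language.

```python
import re

# Hypothetical lexical rules for a tiny language; order matters for alternation.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=()]"),
    ("SKIP",       r"\s+"),        # whitespace, including newlines
    ("MISMATCH",   r"."),          # any other character is illegal
]
KEYWORDS = {"if", "else", "while", "return"}
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right and return a list of (category, value) pairs."""
    tokens = []
    for match in PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"illegal character {text!r}")
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

for token in tokenize("while x = y1 + 42"):
    print(token)
```

An illegal character raises SyntaxError, which corresponds to the lexical phase rejecting characters that do not conform to the language's rules.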
• Syntax analysis
Building on lexical analysis, the sequence of word symbols is decomposed into grammatical units such as 'expression', 'statement' and 'program' according to the grammar rules of the language
If there are no syntax errors, syntax analysis constructs a syntax tree; otherwise it reports the error and gives diagnostic information
The input of syntax analysis is the token stream produced by lexical analysis, and the output is the syntax tree
Main function of syntax analysis: to analyze whether the structure of each statement is legal and to find all syntax errors in the program
• Semantic analysis
Used to check whether the source program contains static semantic errors , mainly used for type analysis and inspection
The input of semantic analysis is the syntax tree generated in the syntax analysis stage
Not all semantic errors can be found in the semantic analysis phase: only static semantic errors are detected, while dynamic semantic errors can be found only at run time
A typical dynamic semantic error is an infinite loop
• Object code generation
The task of this phase is to turn the intermediate code into absolute instruction code, relocatable instruction code, or assembly code for a specific machine
The work of the object code generation phase is closely tied to the specific machine
Register allocation happens in the object code generation phase
• Intermediate code generation
Intermediate code is generated from the output of semantic analysis. It is a simple notation that can take many forms, and its defining feature is that it is independent of the specific machine
Semantic analysis and intermediate code generation are both based on the semantic rules of the language
Common intermediate codes: postfix notation, three-address code, triples, quadruples, trees (graphs) and other forms
Different high-level languages can be compiled into the same intermediate code
Intermediate code is platform-independent
Intermediate code facilitates machine-independent optimization and the portability of compilers
• Regular expressions
Regular expressions are a tool for lexical analysis
They are roughly comparable to the basic regular expression rules in JS. The main thing to remember is that * means zero or more repetitions (a count in [0, infinity)); to answer a question, substitute each option into the expression and check whether the string conforms to the rules
• Finite Automata
A finite automaton is a tool for lexical analysis that can correctly recognize regular sets
States are divided into initial states and final states; a state can be both an initial state and a final state
A string is recognized successfully when the automaton can follow a transition for every character and the state reached at the end is a final state
Deterministic finite automaton (DFA): for each state and input character, the next state is unique
Nondeterministic finite automaton (NFA): for some state and input character, the next state is not unique
The difference between the two: given an input digit or letter, a deterministic automaton has exactly one way to move, whereas a nondeterministic automaton may have several
ε: the empty string
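A deterministic finite automaton is small enough to simulate with a transition table, as sketched below. The function name and the example automaton (one that accepts strings over a and b ending in "abb") are assumptions chosen for illustration.

```python
def dfa_accepts(transitions, start, finals, text):
    """Run a DFA: follow exactly one transition per character, then check the state."""
    state = start
    for ch in text:
        key = (state, ch)
        if key not in transitions:      # no edge for this character: the run is stuck
            return False
        state = transitions[key]
    return state in finals              # accepted only if we stop in a final state

# Hypothetical DFA for the regular set (a|b)*abb; state 3 is the final state.
ABB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

for word in ["abb", "aabb", "ab", "abc"]:
    print(word, dfa_accepts(ABB, 0, {3}, word))
```

Because each (state, character) key maps to a single next state, the table itself enforces determinism; an NFA would need a set of next states per key.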
• Context Free Grammar
Context-free grammars are widely used to express the grammar rules of programming languages
To solve questions: start from the start symbol and derive toward the terminal strings given in the options, trying one option at a time
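• Postfix and infix notation
Infix notation is the usual way of writing an expression: 1*2
Postfix notation puts the operator after its operands: 12*
Infix to postfix: following the precedence order (), then * /, then + -, convert one sub-expression at a time into postfix form; operators of equal precedence are converted from left to right
Postfix to infix: use a stack (first in, last out)
In-order traversal of the syntax tree (left, root, right) produces the infix form
Post-order traversal of the syntax tree (left, right, root) produces the postfix form
Postfix notation is also called reverse Polish notation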
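The conversion and the stack-based evaluation can be sketched as follows. The conversion uses the shunting-yard technique, which is a standard stack-based substitute for the sub-expression-by-sub-expression hand method described above; the function names and the space-separated token format are assumptions made for the example.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def to_postfix(tokens):
    """Infix token list -> postfix token list (shunting-yard, left-associative)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence first (left to right).
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the "("
        else:
            output.append(tok)               # operand
    while stack:
        output.append(stack.pop())
    return output

def eval_postfix(tokens):
    """Evaluate postfix with a stack: operands are pushed, operators pop two."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()

postfix = to_postfix("( 1 + 2 ) * 3 - 4 / 2".split())
print(" ".join(postfix))
print(eval_postfix(postfix))
```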
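• Other
Decompilation usually cannot restore an executable file to high-level source code; it can only produce a functionally equivalent assembly program
A dynamic language is one in which a program can change its own structure at run time
Pointer variable: a variable is an abstraction of a memory cell and is used to hold data in a program; when a variable stores the address of a memory cell, it is called a pointer variable
The node space of a linked list has to be requested and released by the programmer, so the data space should use a heap storage allocation strategy
Characteristics of visual programming:
Based on object-oriented ideas; introduces the concepts of controls and event-driven execution
Development proceeds by drawing the interface first, then writing program code for events
Designers need to write little or no code
During compilation, storage units are allocated to variables using logical addresses, which are mapped to physical addresses when the program runs
The storage space of global variables in C is in the static data area
Syntax-directed translation is a static semantic analysis method
Syntax analysis methods:
Both recursive descent analysis and predictive analysis are top-down analysis methods
Shift-reduce analysis is a bottom-up analysis method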
Chapter 3 Intellectual Property
• Copyright
Copyright is divided into personal (moral) rights and property rights
Personal rights comprise: the right of publication, the right of authorship, the right of modification, and the right to protect the integrity of the work
Apart from the right of publication, these rights are protected in perpetuity; the right of publication lasts for the author's lifetime plus 50 years after death
Territoriality of intellectual property: an IP right is valid only in the country that grants it and is not protected in other countries
• Computer software copyright
Computer software copyright is protected by the "Copyright Law of the People's Republic of China" and "Computer Software Protection Regulations"
The subjects of computer software copyright are citizens
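The object of computer software copyright is the protected computer program (source program and object program) and the related documentation (flowcharts, specifications, user manuals)
Personal rights in software copyright: the right of publication and the right of developer identity (right of authorship, perpetual)
Term of protection for software copyright: 50 years from the date development of the software is completed; when the term expires, all rights other than the right of developer identity terminate
The "Computer Software Protection Regulations" were promulgated by the State Council
Identifying infringement: publishing, registering, signing, modifying, translating, copying, selling or renting the software without the copyright holder's consent
• Service works (works made in the course of employment)
A service software work is a computer software work developed by a citizen, while employed by an organization, in order to carry out the work of that organization
Copyright in software that a citizen develops while employed by an organization belongs to the organization. If developing the software was not part of the job, the copyright does not belong to the organization; however, if the organization's equipment was used, the copyright cannot belong to the individual
For a service software work, the developer has only the right of authorship
• Commissioned development
Copyright in a commissioned work is determined by the contract between the commissioning party and the commissioned party; if the contract does not cover it, the copyright belongs to the commissioned party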
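• Trade secrets
Basic content of trade secrets: business secrets and technical secrets
Conditions that make something a trade secret:
It is non-public: not known to the public
It is practical: able to bring benefits to the rights holder
It is confidential: confidentiality measures have been taken
• Patent rights
Applications are made in writing, one invention per application
Term of patent protection: 20 years (10 years for utility model patents)
The term runs from the filing date. When two or more parties apply, the first to file obtains the patent; if they file on the same day, the parties decide by negotiation
• Trademark rights
Valid for 10 years from the date of approval; may be renewed before expiry, for 10 years each time
Whoever registers first holds the trademark; if registered at the same time, whoever used it first holds it; if registered at the same time and neither party has used it, the matter is decided by negotiation or by drawing lots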
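• Software licensing
Exclusive license: the software copyright holder may not license anyone else, and may not use the software itself
Sole license: the software copyright holder may not license anyone else, but may use the software itself
Ordinary license: the software copyright holder may license others, and may also use the software itself
• The translation right in software copyright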
The right to convert the original software from one natural language to another
Chapter 4 Database Knowledge
• Taxonomy of data models
Conceptual data model: a data model abstracted from the information world and independent of any computer system; generally expressed with the entity-relationship method (E-R method) and used to model the information world
Structural data model (the data model of a DBMS): directly oriented to the logical structure of the database; exams generally cover the relational model and its relational schemas
• Common terms of the conceptual data model
Entity: something that exists objectively and can be distinguished from other things, such as a person, an organization, or an external system
Attribute: describes a characteristic of an entity, such as a person's name or an organization's address
Code | key: an attribute or set of attributes that uniquely identifies an entity
Domain: the value range of an attribute
Relationship: a correspondence between entities
• Three types of relationships between entities
One-to-one (one class has one class monitor)
One-to-many (one class has many students)
Many-to-many (a teacher can teach several classes, and a class can have several teachers)
• E-R diagram (entity-relationship)
Entity – drawn as a rectangle
Attribute – drawn as an ellipse
Relationship – drawn as a diamond
The shapes are connected with undirected edges, labeled 1:n, n:1 or n:m to show the type of relationship
Structural data models are mainly divided into: hierarchical model, network model, relational model and object-oriented model
• Three-level schema and two-level mapping
External schema (user schema or subschema): corresponds to external views and interacts with users
Conceptual schema: corresponds to the base tables of the database
Internal schema (storage schema): the actual database storage files
Conceptual/internal mapping: converts between the conceptual schema and the internal schema, and provides physical data independence
External/conceptual mapping: converts between the external schema and the conceptual schema, and provides logical data independence
• Basic terms in the relational model
Relation: a relation is a two-dimensional table
Tuple: a row of the table is a tuple, corresponding to one record
Attribute: a column of the table is an attribute; the first row of the column is the attribute name, and the other rows are attribute values
Domain: the value range of an attribute
Relation schema: a description of the relation, in the format: relation name (attribute 1, attribute 2, ... attribute n)
Candidate key: an attribute or combination of attributes that uniquely identifies a tuple
Primary key: a relation may have several candidate keys; one of them is chosen as the primary key
Foreign key: an attribute or attribute group that is not a key of this relation but is the primary key of another relation is called a foreign key of this relation
All-key: a candidate key that contains every attribute of the relation
Superkey: an attribute set that contains a candidate key
Prime attributes: the attributes that appear in any candidate key are prime attributes; all others are non-prime attributes
• Relational Model Integrity Constraints
Entity integrity: the value of the primary key cannot be null
Referential integrity: the value of a foreign key can be null, but if it has a value, that value must exist in the referenced table
User-defined integrity: constraints that the user defines for particular data
• Relational algebra operations
∪ (union): R ∪ S, all tuples (records) of R and S merged, with duplicate records removed
- (difference): R - S, the records of R that do not exist in S
∩ (intersection): R ∩ S, the records that exist in both R and S
× (Cartesian product): R × S, each record in R is concatenated with each record in S to form a new record
π (projection): π1,3(R) extracts columns 1 and 3 of relation R to form a new relation (the column numbers 1, 3 can also be replaced by attribute names)
σ (selection): σ1=5(R) selects from relation R all records whose value in column 1 equals the value in column 5, forming a new relation
Join: selects the rows of the Cartesian product of two relations that satisfy a condition
Equijoin: the join condition is that the values of two columns are equal
Natural join ( |><| ): no join condition is written; the natural join selects the records whose same-named attributes have equal values in the two relations, and removes the duplicate attribute columns from the resulting relation
Left outer join: starts from the natural join of R and S, then adds the records of the left relation R that were lost in the natural join, filling the attributes from the right relation with NULL
Right outer join: the mirror image of the left outer join
Full outer join: the combined result of the left outer join and the right outer join
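The natural join and left outer join can be sketched by treating a relation as a list of dictionaries, one dictionary per tuple. The function names and the sample students and scores relations are assumptions made for the example; None stands in for NULL, and no database engine is involved.

```python
def natural_join(r, s):
    """Pair up tuples whose same-named attributes agree; shared columns appear once."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})
    return result

def left_outer_join(r, s):
    """Natural join plus the unmatched tuples of r, padded with None for s's attributes."""
    s_attrs = set().union(*(b.keys() for b in s))
    result = natural_join(r, s)
    for a in r:
        matched = any(all(a[k] == b[k] for k in a.keys() & b.keys()) for b in s)
        if not matched:
            result.append({**{k: None for k in s_attrs}, **a})
    return result

students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
scores = [{"sid": 1, "course": "DB", "grade": 90}]

print(natural_join(students, scores))
print(left_outer_join(students, scores))
```

A right outer join is the same call with the arguments swapped, and a full outer join combines both results without duplicating the matched tuples.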
• Relational schema
R<U, D, dom, F>: R is the relation name, U the attribute set, D the attribute domains, dom the attribute-to-domain mapping, and F the set of data dependencies among the attributes. D and dom can usually be omitted, for example: R<{A, B, C, D}, {A->B, A->C, C->D}>. "->" can be read as "derives" or "determines"
Full functional dependency: for example (student number, course number) -> grade; neither the student number alone nor the course number alone can determine the grade, so this is a full functional dependency
Partial functional dependency: for example (student number, course number) -> name; the student number alone can already determine the name, so this is a partial functional dependency
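Transitive dependency: if A->B and B->C, then C is transitively dependent on A
Attribute closure computation (choosing the primary key): from the dependencies, pick an attribute or a combination of attributes that can derive the entire attribute set. For example, in R<{A, B, C, D}, {A->B, A->C, C->D}> the attribute A can derive all of {A, B, C, D}, so A is the primary key; if several choices qualify, there are several candidate keys
Redundant functional dependency: for example, in {A->B, B->C, A->C, C->D}, A->C is redundant because A->B->C already gives it by transitivity; the transitive chain takes precedence (the author's own understanding)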
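The closure computation and the redundancy check can be sketched in a few lines. This is an illustrative sketch using the two dependency sets from the notes above; the function names closure and is_redundant are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under the functional dependencies.

    fds is a list of (left, right) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If everything on the left is already derivable, so is the right side.
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def is_redundant(fd, fds):
    """An FD is redundant if the remaining FDs already imply it."""
    rest = [f for f in fds if f != fd]
    left, right = fd
    return right <= closure(left, rest)

U = {"A", "B", "C", "D"}
F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C"}, {"D"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"}), ({"C"}, {"D"})]

print(closure({"A"}, F) == U)            # A derives every attribute, so A is a key
print(is_redundant(({"A"}, {"C"}), G))   # A->C follows from A->B and B->C
```

An attribute set is a candidate key when its closure equals U and no proper subset has the same property.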
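• Normal forms – exam techniques
1NF -> 2NF: eliminate partial functional dependencies of non-prime attributes on the key
2NF -> 3NF: eliminate transitive functional dependencies of non-prime attributes on the key
3NF -> BCNF: eliminate partial and transitive functional dependencies of prime attributes on the key
BCNF -> 4NF: eliminate non-trivial multivalued dependencies that are not functional dependencies
Problem-solving techniques:
1NF is generally satisfied, unless there is an attribute that can be subdivided further, such as salary (basic salary, overtime pay, net pay)
Use the functional dependency set to find the "key", and the prime and non-prime attributes (any attribute that can be derived from others is neither a prime attribute nor a key)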
Check whether any non-prime attribute is partially functionally dependent on the key, that is, whether a non-prime attribute can be derived from part of the key (a dependency of the form "member of the key + non-prime attribute A -> non-prime attribute B" does not count as a partial functional dependency); if there is none, the relation is at least 2NF
Check whether there is any transitive functional dependency (such as A->B, B->C; or the pseudo-transitive case: from A->B and BC->D one gets AC->D); if there is none, the relation is at least 3NF
Check whether any prime attribute is partially or transitively dependent on a candidate key; if there is none, the relation is at least BCNF
Check whether there is a multivalued dependency whose left side is a key, such as A->->B, A->->C where A is a key; if so, the relation conforms to 4NF
• Determining whether a decomposition is a lossless join
Join the decomposed relations directly with the natural join and check whether the result is identical to the original relation; if it is not, the decomposition is lossy
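The "rejoin and compare" check can be sketched on a concrete instance. This is a minimal sketch with an assumed sample relation R(A, B, C); a check on one instance can expose a lossy decomposition, but it does not prove losslessness for every possible instance.

```python
def project(rel, attrs):
    """Projection with duplicates removed; records are dicts."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rel}]

def natural_join(r, s):
    """Natural join on all same-named attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def as_set(rel):
    """Order-independent view of a relation, used for comparison."""
    return {frozenset(rec.items()) for rec in rel}

def rejoins_to_original(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly rel."""
    joined = natural_join(project(rel, attrs1), project(rel, attrs2))
    return as_set(joined) == as_set(rel)

R = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 1, "C": 2}]
print(rejoins_to_original(R, ["A", "B"], ["A", "C"]))
print(rejoins_to_original(R, ["A", "B"], ["B", "C"]))
```

Splitting on the shared attribute A reproduces R, while splitting on B (which has the same value in both records) produces spurious records, so that decomposition is lossy.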
• Determining whether a decomposition preserves functional dependencies
Check whether every dependency of the original functional dependency set is still present in the decomposed relations; if any is missing, the dependencies are not preserved
• Database design steps
User requirements analysis – data flow diagram;
Conceptual design – ER diagram;
Logical design – relational schema
• ER model
Entity: drawn as a rectangle;
Weak entity: drawn as a double-bordered rectangle; an entity whose existence depends on another entity is a weak entity
Relationship: drawn as a diamond; the relationship type (1:n, n:1, n:m) is marked on the undirected edges, and a relationship involving a weak entity is drawn as a double-bordered diamond
Attribute: drawn as an ellipse;
Simple attribute: an attribute that cannot be subdivided; composite attribute: an attribute that can be subdivided into smaller parts
Multi-valued attribute: drawn as a double-bordered ellipse; an attribute that can hold a set of values
Derived attribute: drawn as a dashed ellipse; an attribute that can be computed from other attributes
• Types of conflicts between ER diagrams
Attribute conflict: attribute type, value range, data unit conflict
Naming conflicts: attributes with the same meaning are named differently in different ER diagrams
Structural conflict: the same object is abstracted as an entity in one ER diagram and as an attribute in another, or the same entity has different attributes in different ER diagrams
• Relational model conversion
Converting a 1:1 relationship into a relation schema: put the attributes of the relationship into either entity, and put the primary key of the other entity into that entity as well
Converting a 1:n relationship into a relation schema: put the attributes of the relationship into the entity on the n side, and put the primary key of the other entity into that entity as well
Converting an n:m relationship into a relation schema: turn the relationship into a new relation of its own, with the primary keys of the participating entities combined as the primary key of the new relation
Requirement for normalizing relation schemas: at least 3NF must be satisfied
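• Properties of transaction management
Atomicity: a transaction is atomic; either all of its operations are performed or none are
Consistency: executing a transaction must take the database from one consistent state to another consistent state
Isolation: transactions are isolated from each other; when several transactions run concurrently, the changes made by any transaction are invisible to the other transactions until it commits successfully
Durability: once a transaction commits successfully, its updates to the database remain in effect permanently, even if the database crashes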
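Atomicity is easy to observe with the sqlite3 module from the Python standard library. The sketch below is illustrative: the account table, its CHECK constraint and the transfer function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "a", "b", 150)  # would overdraw 'a', so nothing is applied
ok = transfer(conn, "a", "b", 30)
print(failed, ok, balances(conn))
```

In the failed transfer the credit to account b has already been executed when the debit violates the constraint; the rollback undoes it, which is the "either all or none" behavior described above.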
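• Database backup methods
Static dump and dynamic dump: a dynamic dump allows the database to be accessed and modified while the dump is running, a static dump does not
Full dump and incremental dump: an incremental dump only dumps the data updated since the previous dump
Log file: every operation on the database is written to the log file; when a failure occurs, the log file is used to undo transactions and roll back to the data state before the transaction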
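• Locking
Exclusive lock: once this lock is held, no other exclusive lock or shared lock can be added until it is released
Shared lock: once this lock is held, no exclusive lock can be added, but further shared locks can; an exclusive lock can be added only after the lock is released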
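The two rules above amount to a small compatibility table. A minimal sketch, writing S for a shared lock and X for an exclusive lock (the names COMPATIBLE and can_grant are illustrative):

```python
# Can the requested lock be granted while another transaction holds a lock
# on the same data item? Key: (mode already held, mode requested).
COMPATIBLE = {
    ("S", "S"): True,   # shared held, shared requested: allowed
    ("S", "X"): False,  # shared held, exclusive requested: must wait
    ("X", "S"): False,  # exclusive held: nothing else is allowed
    ("X", "X"): False,
}

def can_grant(held, requested):
    """held is the list of lock modes other transactions hold on the item."""
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant(["S", "S"], "S"))
print(can_grant(["S"], "X"))
```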
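• Distributed databases
Fragmentation transparency: the user does not need to know how a table is split into fragments for storage
Replication transparency: when replication is used, the user does not need to know which nodes the data is replicated to, or how it is replicated
Location transparency: the user does not need to know the physical location where the data is stored
Logical transparency: the user does not need to know which data model is used locally
Sharing: data stored on different nodes is shared
Autonomy: each node manages its local data independently
Availability: if one site fails, the system can use the replicas at other sites, so the whole system does not go down
Distribution: data is stored at different sites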
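• Stored procedures
A stored procedure is a set of SQL statements in a large database system that accomplishes a specific function
By offering stored procedures for third parties to call, with the data to be updated passed in as arguments, the system avoids exposing its table structure to third parties, which protects the system's data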
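Chapter 5 Object-Oriented Basics
• Basic object-oriented concepts
How to recognize object orientation: objects + classification + inheritance + communication through messages
Class: defines a group of broadly similar objects; a class is an abstraction of a group of objects
Classes fall into three kinds
Entity class: represents real entities in the real world, such as people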
Interface class (boundary class): provides a way for users to cooperate and interact with the system, such as QR codes, bar codes, etc.
Control class: controls the flow of activities and acts as a coordinator
Object: a basic run-time entity that has both attributes and behavior (operations on data); an object usually consists of an object name, attributes, and methods
Message: a construct for communication between objects
Overloading: a set of methods with the same name whose parameters differ in type or number
Encapsulation: an information-hiding technique whose purpose is to separate the users of an object from its producer, and the definition of an object from its implementation
Inheritance: a mechanism for sharing data and methods between parent classes and subclasses. A parent class can have several subclasses, which are special cases of the parent class; the parent class describes the common attributes and methods of its subclasses
A subclass with one parent class uses single inheritance; a subclass with several parent classes uses multiple inheritance, which may lead to ambiguous members in the subclass
Overriding: in an inheritance relationship, the subclass implements a method inherited from the parent class in a more specific way
Polymorphism: an object has to respond when it receives a message, and different objects can produce completely different results for the same message; this phenomenon is called polymorphism
Polymorphism is supported by inheritance: using the inheritance hierarchy, messages with general functionality are placed at the higher levels and their different implementations at the lower levels, so that objects created from the lower-level classes can respond to the same message differently
There are four types of polymorphism:
Universal polymorphism: parametric polymorphism (the most widely used) and inclusion polymorphism (common example: subtyping)
Ad hoc polymorphism: overloading polymorphism (the same name has different meanings in different contexts) and coercion polymorphism
• Static binding and dynamic binding
Binding is the process of associating a method call with the class in which the called method is defined
Static binding: the method to be called can be determined at compile time, before the program runs
Dynamic binding: the binding is made at run time according to the actual type of the object; dynamic binding supports inheritance and polymorphism
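Overriding, polymorphism and dynamic binding can be seen together in a short sketch. The Shape classes are illustrative; note that Python resolves every method call at run time, so it shows dynamic binding only, whereas a compiled language such as Java or C++ can also bind some calls statically.

```python
class Shape:
    def area(self):
        raise NotImplementedError

    def describe(self):
        # Which area() runs is decided at run time by the object's actual class.
        return f"{type(self).__name__} with area {self.area()}"

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # overrides Shape.area
        return self.side * self.side

class Rect(Shape):
    def __init__(self, w, h):
        self.w, self.h = w, h

    def area(self):  # overrides Shape.area
        return self.w * self.h

# The same message (describe) sent to different objects gives different results.
shapes = [Square(2), Rect(2, 3)]
print([s.describe() for s in shapes])
```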
• Object-oriented design principles
Single Responsibility Principle: a class should have only one reason to change, that is, only one kind of responsibility
Open-Closed Principle: software entities should be open for extension but closed for modification
Liskov Substitution Principle: a subclass must be able to substitute completely for its parent class
Dependency Inversion Principle: abstractions should not depend on details, details should depend on abstractions; high-level modules should not depend on low-level modules, both should depend on abstractions
Interface Segregation Principle: an interface belongs to its clients, not to the class hierarchy in which it resides; depend on abstractions, not on concrete details
Common Closure Principle: a change that affects a package affects all the classes in that package, but does not affect other packages
Common Reuse Principle: if you reuse one class in a package, you reuse all the classes in the package
• Object-oriented analysis
The purpose of object-oriented analysis: to gain an understanding of the application problem
The five activities of object-oriented analysis: identify objects, organize objects, describe the interactions between objects, determine the operations of objects, define the internal information of objects
Identifying objects: objects are defined from the problem domain, with naturally occurring nouns taken as objects
Defining the domain model is one of the key steps of object-oriented analysis. It describes the object domain from the perspective of object classification, including the concepts, attributes, and important associations; the results are organized as class diagrams
• Object-oriented design
Based on the results of object-oriented analysis, the analysis results are transformed into a design model that defines the blueprint for building the system
Object-oriented design: obtain a solution to the problem and realize the system, paying attention to details at the technical and implementation level
The five activities of object-oriented design: identify classes and objects, define attributes, define services, identify relationships, identify packages
• Object-oriented programming
In essence, choose an object-oriented language and program with concepts such as objects and classes
• Object-oriented testing
The four levels of object-oriented testing: algorithm level, class level, template level, system level
Chapter 6 UML
• UML concepts
The UML vocabulary consists of three building blocks: things (abstractions of the most representative parts of a model), relationships (which tie things together), and diagrams (which group related things)
• UML things
Structural things: the static parts of the model, the nouns, describing conceptual or physical elements; they include classes, interfaces, use cases, components, etc.
Behavioral things: the dynamic parts of the model, the verbs, describing behavior across time and space; they include interactions, state machines, activities, etc.
Grouping things: the organizational parts of the model; the most important grouping thing is the package, which can contain structural things, behavioral things, and other grouping things
Annotational things: the explanatory parts of the model, used to describe, illustrate, and annotate; the note is the most important annotational thing
• UML relationships
Dependency: a semantic relationship between two things, in which a change to one thing (the independent thing) affects the semantics of the other thing (the dependent thing)
- Drawn as a dashed line + arrow
- A (dependent thing) ·······> B (independent thing): A depends on B, and a change to B causes a semantic change in A
Association: a structural relationship that describes a set of links, a link being a connection between objects. Aggregation and composition are special kinds of association that describe the relationship between a whole and its parts
- One-way association: solid line + arrow, A —————> B. It lets one class know the attributes and methods of another class: if class A depends on a B object and B is a member variable of A, then there is an association between A and B. An association can be one-way or two-way (a two-way association is drawn without arrows)
- Bidirectional association: solid line, A ——————— B. The multiplicity of the association is written above the solid line and indicates how many instances of class B can be associated with one instance of class A, and how many instances of class A with one instance of class B. Multiplicity is usually written in forms such as 1, * or 1..*. For a many-to-many bidirectional association, the association can generally be extracted into an "association class"
- Aggregation: solid line + hollow diamond, A (part) ———————<> B (whole). The life cycles of the whole and the part are not synchronized: when the whole disappears, the part can still exist
- Composition: solid line + solid diamond, A (part) —————<+> B (whole) (the solid diamond is hard to draw, so <+> is used instead). The life cycles of the whole and the part are synchronized: when the whole disappears, the part disappears too
Generalization: a relationship between the "special" and the "general". An object of the special element (child) can substitute for an object of the general element (parent), and the child shares the structure and behavior of the parent
- Drawn as a solid line + hollow triangle arrow
- A (special, child) ————————|> B (general, parent)
Realization: a semantic relationship between classifiers, in which one classifier specifies a contract that another classifier guarantees to carry out
- Typical uses: 1. between interfaces and the classes or components that realize them; 2. between use cases and the collaborations that realize them
- Drawn as a dashed line + hollow triangle arrow
- A (class) ········|> B (interface)
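• Class diagram (static)
Class diagram: shows a set of classes, interfaces, collaborations and the relationships among them; a class diagram can contain notes and constraints, and can also contain packages or subsystems
Three ways a class diagram is used to model the static design view
Modeling the vocabulary of the system
Modeling simple collaborations
Modeling a logical database schema
Composition of a class in a class diagram
First compartment: class name
Second compartment: attributes (attribute name 1: attribute type)
Third compartment: methods (method name 1(): return type); attribute and method names can be preceded by a modifier: + public; - private; # protected
• Object diagram (static)
Object diagram: shows a set of objects and the relationships among them at a given moment
Difference between an object in an object diagram and a class in a class diagram: an object has two compartments, the first being the object name (object name : class name) and the second the attributes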
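• Use case diagram (static)
Use case diagram: shows a set of use cases, actors, and the relationships among them
Actor: an external entity that interacts with the system; it can be a user, or an external system, piece of infrastructure, etc. that takes part in the interaction
Use case: system behavior described from the user's point of view; a use case is a class that stands for a category of functionality, not one specific realization of that functionality
Include relationship: a relationship between use cases, drawn as a dashed line + arrow + «include»: A (base use case) ···«include»···> B (included use case). The arrow points to the included use case; if use case A includes use case B, then whenever A is executed, B is executed too
Extend relationship: a relationship between use cases, drawn as a dashed line + arrow + «extend»: A (extending use case) ···«extend»···> B (extended use case). The arrow points to the extended use case; when the extended use case B is executed it may run into special or optional situations, and these are handled by the extending use case
Telling include and extend apart:
Include: using one use case necessarily uses the other use case
Extend: executing one use case does not necessarily execute the other use case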
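• Sequence diagram (dynamic)
Sequence diagram: describes the interactions between objects organized in time order; a sequence diagram breaks one use case down into its detailed steps
Graphically, the objects taking part in the interaction are placed at the top of the diagram and arranged horizontally, with the object that initiates the interaction on the left; the messages sent and received between the objects are then arranged from top to bottom in time order
Object lifeline: a dashed line running vertically down from an object, showing that the object exists over a period of time
Focus of control: a tall, thin rectangle on the object lifeline, showing the period during which the object performs an action
A call message is drawn as a solid line + arrow, a return message as a dashed line + arrow
The object that executes the method for a call message is the one the arrow points to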
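• Communication diagram (dynamic)
A communication diagram, also called a collaboration diagram, emphasizes the structural organization of the objects that send and receive messages
A communication diagram differs from a sequence diagram in that it has paths and sequence numbers: many messages can be shown along the same link, and each message has a unique sequence number
Communication diagrams and sequence diagrams are isomorphic and can be converted into each other
A communication diagram shows the message flow between objects and its order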
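• State diagram (dynamic)
A state diagram shows a state machine, which consists of states, transitions, events and activities
State diagrams model the dynamic aspects of a system
They are used when modeling the dynamic aspects of a system, class or use case, usually to model reactive objects
State: any observable mode of system behavior; one state represents one mode of behavior of the system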
States are divided into the initial state (solid circle), final states (a solid circle with a ring around it) and intermediate states
A state in a state diagram is drawn as a rounded rectangle: the first compartment is the state name, the middle compartment holds the state variables (optional), and the last compartment is the activity table (also optional)
An arrowed line between states denotes a "transition"; when the event on the arrow occurs, the transition begins
A state diagram can have only one initial state, and can have several final states or none
Activity: written as "event name/action expression" in the activity table of the state. There are three standard events:
entry: entry action, executed immediately on entering the state
do: internal activity, takes a finite amount of time and can be interrupted
exit: exit action, executed immediately on leaving the state
Event: something that occurs at a specific moment; it is an abstraction of whatever causes the system to act and to move from one state to another
A transition is written as "event(guard condition)/action"
A transition connects two states
Events trigger transitions
Activities (actions) can be executed within a state or during a state transition
A guard condition is a Boolean expression
A state transition takes place when the "event" occurs and the "guard condition" is true; the "action" is executed only once the transition starts
Composite state: a group of states and transitions enclosed in one rectangle, which appears as a single state in another state diagram
All sub-states in the composite state complete before control moves to a state outside the composite state
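The "event(guard condition)/action" rule can be sketched as a small table-driven state machine. The Door class, its states and its events are hypothetical and chosen only to illustrate how a guard blocks a transition.

```python
class Door:
    """States: 'closed', 'open', 'locked'. Each transition is event(guard)/action."""

    def __init__(self):
        self.state = "closed"  # the single initial state
        self.log = []
        # (current state, event) -> (guard, action, next state)
        self.transitions = {
            ("closed", "push"): (lambda ctx: True, "swing open", "open"),
            ("open", "pull"): (lambda ctx: True, "swing shut", "closed"),
            ("closed", "lock"): (lambda ctx: ctx.get("has_key", False), "turn key", "locked"),
            ("locked", "unlock"): (lambda ctx: ctx.get("has_key", False), "turn key back", "closed"),
        }

    def fire(self, event, **ctx):
        """Take the transition only if the event matches and the guard is true."""
        entry = self.transitions.get((self.state, event))
        if entry is None:
            return False
        guard, action, target = entry
        if not guard(ctx):
            return False
        self.log.append(action)  # the action runs as part of the transition
        self.state = target
        return True

door = Door()
print(door.fire("lock"), door.state)                # guard is false, no transition
print(door.fire("lock", has_key=True), door.state)  # guard is true, door is locked
```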
• Activity diagram (dynamic)
An activity diagram shows the flow from one activity to another within the system. It is important for modeling the functions of the system and emphasizes the flow of control between objects
An activity diagram consists of a start node, end nodes, activities (rounded rectangles), flows (solid lines + arrows), concurrent forks, concurrent joins, branches, and guard expressions on the branches
Activity diagrams are used to model workflows and operations
Activities connected directly after a concurrent fork can be executed concurrently
• Component diagram (static)
A component diagram shows the organization of a set of components and the dependencies among them
It is related to class diagrams: a component typically maps to one or more classes, interfaces, or collaborations
Component diagrams have special notation: components are connected through provided interfaces (hollow circles) and required interfaces (arcs). The provided interface supplies the implementation of a method, and the required interface calls that method
• Deployment diagram
Deployment diagrams are used to model the physical aspects of the system
Deployment diagrams use three-dimensional box symbols; the "artifact" label denotes an artifact (product)
Deployment diagrams show the relationship between the system's software and hardware and are used in the implementation phase
The relationships between deployed components are similar to package dependencies
• Summary
Static modeling: class diagram, object diagram, use case diagram
Dynamic modeling: sequence diagrams, communication diagrams, state diagrams, activity diagrams
Physical Modeling: Component Diagrams, Deployment Diagrams
Interaction diagrams: sequence diagrams, communication diagrams (collaboration diagrams)
Chapter 7 Design Patterns
• Creational Design Patterns
Concept: creational design patterns abstract the instantiation process and help make a system independent of how its objects are created, composed, and represented
• Simple Factory pattern
Define a factory class that returns instances of different classes depending on the argument passed in; these classes all inherit from the same parent class
The method that creates instances in the factory class is usually a static method, so the pattern is also called a static factory
Users do not need to know how the product objects are created
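A minimal sketch of a simple (static) factory follows; the Shape product classes and the ShapeFactory name are illustrative.

```python
class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):
        return "circle"

class Square(Shape):
    def draw(self):
        return "square"

class ShapeFactory:
    @staticmethod
    def create(kind):
        """Return an instance of the product class selected by the argument."""
        products = {"circle": Circle, "square": Square}
        if kind not in products:
            raise ValueError(f"unknown shape: {kind}")
        return products[kind]()

# The caller names the product it wants and never calls a constructor itself.
print(ShapeFactory.create("circle").draw())
```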
• Factory Method pattern (Factory Method)
Building on the simple factory, define a factory interface for creating objects and let the subclasses that implement the interface decide which class to instantiate
The factory method pattern defers the instantiation of a class to subclasses
Applicability:
When a class does not know the class of the objects it must create
When a class wants its subclasses to specify the objects it creates
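The deferral to subclasses can be sketched as follows. The Application and Logger classes are hypothetical; the point is that run() is written once while each subclass chooses the product.

```python
from abc import ABC, abstractmethod

class ListLogger:
    """Hypothetical product: collects log lines in a list."""
    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)

class UpperLogger(ListLogger):
    def write(self, text):
        super().write(text.upper())

class Application(ABC):
    @abstractmethod
    def create_logger(self):
        """Factory method: subclasses decide which logger class to instantiate."""

    def run(self):
        logger = self.create_logger()  # the concrete class is unknown here
        logger.write("started")
        return logger

class PlainApp(Application):
    def create_logger(self):
        return ListLogger()

class LoudApp(Application):
    def create_logger(self):
        return UpperLogger()

print(PlainApp().run().lines, LoudApp().run().lines)
```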
• Abstract Factory pattern (Abstract Factory)
Intent: provide an interface for creating families of related or interdependent objects without specifying their concrete classes
It can be understood as: a "factory method" creates one object, an "abstract factory" creates a family of objects
Applicability (keywords: independent of products, one of several product families, joint use, interface without implementation):
When a system should be independent of how its products are created, composed and represented
When a system is to be configured with one of several product families
When a family of related product objects is designed to be used together
When providing a product class library and only the interfaces, not the implementations, should be revealed
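A minimal sketch of an abstract factory, with two hypothetical product families (light and dark widgets). To keep the example short the products are plain strings; in a real design they would be classes of their own.

```python
from abc import ABC, abstractmethod

class WidgetFactory(ABC):
    """Interface for creating a family of related products."""

    @abstractmethod
    def create_button(self): ...

    @abstractmethod
    def create_checkbox(self): ...

class LightFactory(WidgetFactory):
    def create_button(self):
        return "light button"

    def create_checkbox(self):
        return "light checkbox"

class DarkFactory(WidgetFactory):
    def create_button(self):
        return "dark button"

    def create_checkbox(self):
        return "dark checkbox"

def build_form(factory):
    """Client code sees only the factory interface, never a concrete class."""
    return [factory.create_button(), factory.create_checkbox()]

print(build_form(DarkFactory()))
```

Swapping the factory object switches the whole product family at once, which is what "configured with one of several product families" refers to.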
• Builder pattern (Builder)
Intent: separate the construction of a complex object from its representation, so that the same construction process can create different representations
Understanding: define an abstract builder; define several concrete builder classes that inherit from it (each encapsulating the complex algorithm for building the object); and define a director (which assembles the object) that drives a builder, so that combining the director with different builder classes creates different product representations
Applicability (keywords: complex creation algorithm, different representations):
When the complex algorithm for creating an object should be independent of the object's parts and how they are assembled
When the construction process must allow different representations of the object being constructed
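The director and builder roles can be sketched as follows; HtmlBuilder, TextBuilder and Director are illustrative names.

```python
class HtmlBuilder:
    def __init__(self):
        self.parts = []

    def add_title(self, text):
        self.parts.append(f"<h1>{text}</h1>")

    def add_item(self, text):
        self.parts.append(f"<p>{text}</p>")

    def result(self):
        return "".join(self.parts)

class TextBuilder:
    def __init__(self):
        self.parts = []

    def add_title(self, text):
        self.parts.append(text.upper())

    def add_item(self, text):
        self.parts.append("- " + text)

    def result(self):
        return "\n".join(self.parts)

class Director:
    """Runs the same construction steps against whatever builder it is given."""

    def __init__(self, builder):
        self.builder = builder

    def construct(self, title, items):
        self.builder.add_title(title)
        for item in items:
            self.builder.add_item(item)
        return self.builder.result()

print(Director(HtmlBuilder()).construct("menu", ["tea", "cake"]))
print(Director(TextBuilder()).construct("menu", ["tea", "cake"]))
```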
• Prototype pattern (Prototype)
Intent: use prototype instances to specify the kinds of objects to create, and create new objects by copying these prototypes
Applicability (keywords: independent of products, specified at run time, few combinations of state):
When a system should be independent of how its products are created, composed and represented
When the classes to instantiate are specified at run time, for example by dynamic loading
When instances of a class can have only one of a few different combinations of state
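In Python the copying step can be sketched with copy.deepcopy from the standard library; the Document class is illustrative.

```python
import copy

class Document:
    def __init__(self, title, tags):
        self.title = title
        self.tags = tags

    def clone(self):
        """Create a new object by copying this prototype."""
        return copy.deepcopy(self)

prototype = Document("template", ["draft"])
report = prototype.clone()
report.title = "report"
report.tags.append("2024")

# The deep copy means changes to the clone do not leak back into the prototype.
print(prototype.tags, report.tags)
```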
• Singleton pattern (Singleton)
Intent: ensure that a class has only one instance and provide a global access point to it
Applicability (keywords: only one instance):
When a class can have only one instance and clients access it from a well-known access point
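One common way to sketch a singleton in Python is to override __new__. The Config class is illustrative, and this simple version is not thread-safe.

```python
class Config:
    _instance = None

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}
        return cls._instance

a = Config()
b = Config()
a.values["debug"] = True
print(a is b, b.values)
```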
• Structural Design Patterns
Concept: deals with how to combine classes and objects to obtain larger structures
• Adapter pattern (Adapter)
Intent: convert the interface of a class into another interface that clients expect, so that classes that could not work together because of incompatible interfaces can work together
Applicability (keywords: interface does not match):
You want to use an existing class, but its interface does not meet the requirements
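A minimal object-adapter sketch; the sensor class and its fixed reading are hypothetical.

```python
class FahrenheitSensor:
    """Existing class whose interface does not match what the client expects."""

    def read_fahrenheit(self):
        return 212.0

class CelsiusAdapter:
    """Wraps the sensor and exposes the read_celsius() interface the client wants."""

    def __init__(self, sensor):
        self.sensor = sensor

    def read_celsius(self):
        return (self.sensor.read_fahrenheit() - 32) * 5 / 9

print(CelsiusAdapter(FahrenheitSensor()).read_celsius())
```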
• Bridge pattern (Bridge)
Intent: separate an abstraction from its implementation so that the two can vary independently
Understanding: split a class that has several dimensions of variation into several interrelated classes connected by aggregation or composition, each of which can change independently
Applicability (keywords: no fixed binding, extension by subclassing, no impact on clients and no recompilation, many classes in the hierarchy):
You do not want a fixed binding between an abstraction and its implementation
Both the abstraction and its implementation should be extensible by subclassing
Changes to the implementation of an abstraction should have no impact on clients, whose code should not need recompiling
There is a class hierarchy in which many classes would have to be generated
• Composite pattern (Composite)
Intent: compose objects into tree structures to represent part-whole hierarchies, so that users can treat individual objects and compositions of objects uniformly
Understanding: think of the relationship between folders and files
Applicability (keywords: part-whole, uniform use of composite objects):
You want to represent part-whole hierarchies of objects
You want users to ignore the difference between compositions of objects and individual objects, and to treat all objects in the composite structure uniformly
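The folder and file analogy can be sketched directly; the class and method names are illustrative.

```python
class File:
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def total_size(self):
        return self.size

class Folder:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)
        return self  # returning self allows calls to be chained

    def total_size(self):
        # A folder treats files and sub-folders the same way.
        return sum(child.total_size() for child in self.children)

root = Folder("root").add(File("a", 10)).add(Folder("sub").add(File("b", 5)))
print(root.total_size())
```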
• Decorator pattern (Decorator)
Intent: dynamically attach additional responsibilities to an object
Applicability (keywords: add responsibilities, withdraw responsibilities, subclassing not possible):
Add responsibilities to individual objects dynamically and transparently, without affecting other objects
Handle responsibilities that can be withdrawn
When extension by subclassing is not possible
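A minimal sketch in which responsibilities are added by wrapping an object at run time; the Coffee example is illustrative.

```python
class Coffee:
    def cost(self):
        return 10

    def label(self):
        return "coffee"

class Extra:
    """Base decorator: wraps a component and forwards every call to it."""

    def __init__(self, inner):
        self.inner = inner

    def cost(self):
        return self.inner.cost()

    def label(self):
        return self.inner.label()

class Milk(Extra):
    def cost(self):
        return self.inner.cost() + 2

    def label(self):
        return self.inner.label() + " + milk"

class Sugar(Extra):
    def cost(self):
        return self.inner.cost() + 1

    def label(self):
        return self.inner.label() + " + sugar"

order = Sugar(Milk(Coffee()))
print(order.label(), order.cost())
```

Dropping a wrapper withdraws its responsibility again, and the plain Coffee object is never modified.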
• Facade pattern (Facade)
Intent: provide a unified interface to a set of interfaces in a subsystem; the facade defines a higher-level interface that makes the subsystem easier to use
Applicability (keywords: simple interface to a subsystem, many dependencies, entry point for each layer):
When you want to provide a simple interface to a complex subsystem
When there are many dependencies between client programs and the implementation classes of an abstraction
When you need to build a layered subsystem, use the facade pattern to define the entry point to each layer
• Flyweight pattern (Flyweight)
Intent: use sharing to support large numbers of fine-grained objects efficiently
Understanding: a flyweight factory creates and maintains the instances; only one instance is created for each kind, and later requests return the instance that already exists
Applicability (keywords: many objects, storage overhead, extrinsic state, groups of objects replaced):
An application uses a large number of objects
The large number of objects causes high storage overhead
Most of an object's state can be made extrinsic
Many groups of objects can be replaced by relatively few shared objects once the extrinsic state is removed
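A minimal sketch of a flyweight factory; the Glyph example is illustrative. The character is the shared intrinsic state, and the position is extrinsic state passed in by the caller.

```python
class Glyph:
    """Flyweight: holds only the intrinsic (shared) state, the character."""

    def __init__(self, char):
        self.char = char

    def render(self, x, y):
        # The position is extrinsic state supplied by the caller on each call.
        return f"{self.char}@({x},{y})"

class GlyphFactory:
    def __init__(self):
        self._pool = {}

    def get(self, char):
        # Create each distinct glyph once and hand out the same instance afterwards.
        if char not in self._pool:
            self._pool[char] = Glyph(char)
        return self._pool[char]

    def count(self):
        return len(self._pool)

factory = GlyphFactory()
glyphs = [factory.get(c) for c in "banana"]
print(len(glyphs), factory.count())
```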
• Proxy pattern (Proxy)
Intent: provide a proxy for another object in order to control access to that object
Applicability (keywords: more than a simple pointer):
When a more versatile or sophisticated reference to an object is needed than a simple pointer
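One kind of proxy is a virtual proxy that delays creating an expensive object until it is first used. A minimal sketch with hypothetical Image classes, where the counter stands in for the expensive work:

```python
class Image:
    loads = 0  # counts how many real images have been created

    def __init__(self, path):
        self.path = path
        Image.loads += 1  # stands in for an expensive load from disk

    def show(self):
        return f"showing {self.path}"

class ImageProxy:
    """Same interface as Image, but creates the real object only when needed."""

    def __init__(self, path):
        self.path = path
        self._real = None

    def show(self):
        if self._real is None:
            self._real = Image(self.path)
        return self._real.show()
```

Creating an ImageProxy costs nothing; the real Image is built on the first show() call and reused afterwards.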
• Behavioral Design Patterns
Concept: behavioral patterns deal with algorithms and the assignment of responsibilities between objects; they describe the patterns of communication between objects, and use inheritance to distribute behavior
• Chain of Responsibility pattern (Chain of Responsibility)
Intent: give more than one object a chance to handle a request, which avoids coupling the sender of the request to its receiver; the objects are linked into a chain and the request is passed along the chain until an object handles it
Applicability (keywords: handler determined automatically, receiver not specified, handlers specified dynamically):
More than one object may handle a request, and which one handles it is determined automatically at run time
You want to issue a request to one of several objects without specifying the receiver explicitly
The set of objects that can handle a request should be specified dynamically
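A minimal sketch of a chain; the approval limits and role names are hypothetical.

```python
class Handler:
    def __init__(self, limit, name, successor=None):
        self.limit = limit
        self.name = name
        self.successor = successor

    def handle(self, amount):
        if amount <= self.limit:
            return f"{self.name} approves {amount}"
        if self.successor is not None:
            # Cannot handle it here, so pass the request along the chain.
            return self.successor.handle(amount)
        return f"nobody can approve {amount}"

chain = Handler(100, "clerk", Handler(1000, "manager", Handler(10000, "director")))
print(chain.handle(50))
print(chain.handle(500))
```

The sender only knows the head of the chain; which object finally handles the request is decided at run time.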
• Command pattern (Command)
Intent: encapsulate a request as an object, so that clients can be parameterized with different requests, requests can be queued or logged, and undoable operations can be supported
Applicability (keywords: parameterize, specify and queue, undo, change log):
Abstract the action to be performed in order to parameterize objects
Specify, queue and execute requests at different times
Support undo operations and change logs
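A minimal sketch of commands with undo; AppendCommand and Invoker are illustrative names, and a plain list stands in for the receiver.

```python
class AppendCommand:
    """A request wrapped as an object: append some text to a target list."""

    def __init__(self, target, text):
        self.target = target
        self.text = text

    def execute(self):
        self.target.append(self.text)

    def undo(self):
        self.target.pop()

class Invoker:
    def __init__(self):
        self.history = []  # executed commands, newest last

    def run(self, command):
        command.execute()
        self.history.append(command)

    def undo_last(self):
        if self.history:
            self.history.pop().undo()

doc = []
invoker = Invoker()
invoker.run(AppendCommand(doc, "a"))
invoker.run(AppendCommand(doc, "b"))
invoker.undo_last()
print(doc)
```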
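• Interpreter pattern (Interpreter)
Intent: given a language, define a representation of its grammar, and define an interpreter that interprets sentences in the language
Applicability (keywords: interpret, abstract syntax tree):
When a language needs to be interpreted and its sentences can be represented as abstract syntax trees
• Iterator pattern (Iterator)
Intent: provide a way to access the elements of an aggregate object sequentially without exposing the object's internal representation
Applicability (keywords: aggregate without exposure, multiple traversals, uniform interface):
Access the contents of an aggregate object without exposing its internal representation
Support multiple traversals of aggregate objects
Provide a uniform interface for traversing different aggregate structures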
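Python's iteration protocol makes the Iterator pattern easy to sketch. The binary Tree class below is illustrative; it offers two traversals without callers touching its internal structure.

```python
class Tree:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        """In-order traversal; callers never access left/right directly."""
        if self.left is not None:
            yield from self.left
        yield self.value
        if self.right is not None:
            yield from self.right

    def preorder(self):
        """A second traversal order over the same aggregate."""
        yield self.value
        for child in (self.left, self.right):
            if child is not None:
                yield from child.preorder()

tree = Tree(2, Tree(1), Tree(3))
print(list(tree), list(tree.preorder()))
```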
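• Mediator pattern (Mediator)
Intent: use a mediator object to encapsulate how a set of objects interact; the mediator keeps the objects from referring to each other explicitly, which loosens their coupling and lets their interaction be changed independently
Applicability (keywords: complex communication, dependencies hard to understand):
A set of objects is well defined but communicates in complex ways, and the resulting interdependencies are unstructured and difficult to understand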
• Memento pattern (Memento)
Intent: capture the internal state of an object and save it outside the object without breaking encapsulation, so that the object can later be restored to the saved state
Applicability (keywords: state at a given moment, without breaking encapsulation):
The (partial) state of an object at a given moment must be saved so that it can be restored later when needed
If an interface let other objects obtain this state directly, it would expose the object's implementation details and break its encapsulation
• Observer pattern (Observer)
Intent: define a one-to-many dependency between objects, so that when one object changes, all the objects that depend on it are notified and updated automatically
Applicability (keywords: change other objects, no tight coupling):
When a change to one object requires changing other objects, and it is not known how many objects need to be changed
When an object must notify other objects but cannot make assumptions about who those objects are, that is, when these objects should not be tightly coupled
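A minimal sketch in which observers are plain callables registered with a subject; the Subject class is illustrative.

```python
class Subject:
    def __init__(self):
        self._observers = []
        self._value = None

    def attach(self, callback):
        """Register an observer; the subject knows nothing else about it."""
        self._observers.append(callback)

    def set_value(self, value):
        self._value = value
        for notify in self._observers:
            notify(value)  # every dependent is informed of the change

seen = []
subject = Subject()
subject.attach(seen.append)
subject.attach(lambda v: seen.append(v * 2))
subject.set_value(3)
print(seen)
```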
• State pattern (State)
Intent: allow an object to change its behavior when its internal state changes; the object appears to have changed its class
Applicability (keywords: state changes behavior, multi-branch conditionals depend on state):
An object's behavior is determined by its state, and the behavior must change at run time according to that state
An operation contains a large multi-branch conditional statement, and the branches depend on the object's state
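A minimal sketch in which each state is a class of its own, replacing a multi-branch conditional on the current state; the traffic light example is illustrative.

```python
class Green:
    def next(self):
        return Yellow()

    def may_go(self):
        return True

class Yellow:
    def next(self):
        return Red()

    def may_go(self):
        return False

class Red:
    def next(self):
        return Green()

    def may_go(self):
        return False

class TrafficLight:
    def __init__(self):
        self.state = Red()

    def advance(self):
        # Swapping the state object changes how the light behaves.
        self.state = self.state.next()

    def may_go(self):
        return self.state.may_go()

light = TrafficLight()
answers = [light.may_go()]
light.advance()
answers.append(light.may_go())
print(answers)
```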
• Strategy pattern (Strategy)
Intent: define a family of algorithms, encapsulate each one, and make them interchangeable; the pattern lets the algorithms vary independently of the clients that use them
Applicability (keywords: one of many behaviors, algorithm variants, avoid exposing data structures, multiple conditional statements):
Many related classes differ only in their behavior; strategies provide a way to configure a class with one of many behaviors
Different variants of an algorithm are needed
Clients that use the algorithm should not know its data structures, which avoids exposing them
A class defines many behaviors that appear as multiple conditional statements in its operations; the conditional branches are moved into their own strategy classes to replace these statements
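In Python a strategy can be as small as a function passed to the context object. A minimal sketch with hypothetical pricing strategies, with prices as integers to avoid rounding issues:

```python
def no_discount(price):
    return price

def member_discount(price):
    return price * 90 // 100  # 10% off, integer arithmetic

def clearance_discount(price):
    return price // 2

class Checkout:
    def __init__(self, pricing):
        self.pricing = pricing  # the interchangeable algorithm

    def total(self, prices):
        return sum(self.pricing(p) for p in prices)

print(Checkout(no_discount).total([100, 200]))
print(Checkout(member_discount).total([100, 200]))
```

Checkout contains no conditional on the customer type; choosing a different function changes the behavior.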
• Template Method pattern (Template Method)
Intent: define the skeleton of an algorithm in an operation and defer some steps to subclasses, so that subclasses can redefine certain steps of the algorithm without changing its structure
Applicability (keywords: invariant parts, avoid code duplication, control subclass extensions):
Implement the invariant parts of an algorithm once and leave the variable behavior to subclasses
Common behavior among subclasses should be factored out and centralized in a common parent class to avoid code duplication
Control subclass extensions: a template method allows extensions only at specific points
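A minimal sketch in which render() is the template method and subclasses fill in individual steps; the Report classes are illustrative.

```python
class Report:
    def render(self, rows):
        """Template method: the order of the steps is fixed here."""
        lines = [self.header()] + [self.line(r) for r in rows] + [self.footer()]
        return "\n".join(lines)

    def header(self):
        return "REPORT"

    def footer(self):
        return "END"

    def line(self, row):
        raise NotImplementedError  # the step every subclass must supply

class CsvReport(Report):
    def line(self, row):
        return ",".join(str(v) for v in row)

class TsvReport(Report):
    def header(self):
        return "TSV"

    def line(self, row):
        return "\t".join(str(v) for v in row)

print(CsvReport().render([(1, 2), (3, 4)]))
```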
• Visitor pattern (Visitor)
Intent: represent an operation to be performed on the elements of an object structure; it allows new operations to be defined without changing the classes of the elements on which they operate
Applicability (keywords: operations depend on concrete classes, unrelated operations, new operations):
An object structure contains objects of many classes with differing interfaces, and the user wants to perform operations on these objects that depend on their concrete classes
Many distinct and unrelated operations need to be performed on the objects in an object structure, and you want to avoid polluting their classes with these operations
The classes defining the object structure rarely change, but new operations over the structure often need to be defined
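A minimal sketch with a tiny expression tree; Num, Add and the two visitors are illustrative. New operations are added as new visitor classes, while the element classes stay unchanged.

```python
class Num:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_num(self)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)

class Evaluate:
    """One operation over the structure: compute the value."""

    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)

class Show:
    """Another operation, added without touching Num or Add."""

    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"

expr = Add(Num(1), Add(Num(2), Num(3)))
print(expr.accept(Show()), "=", expr.accept(Evaluate()))
```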
Chapter 8 Information Security
• Firewall
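A firewall is a filtering and blocking mechanism built on the boundary between the internal and external networks; it assumes that the internal network is secure and trusted and that the external network is insecure and untrusted
A firewall applies security processing to any communication passing through the controlled trunk, for example: control, auditing, alarm, response
DMZ (screened subnet firewall): located between the internal and external networks, usually serving as an isolation zone where public servers such as web, Email, and FTP servers can be placed
Packet filtering firewall: uses a packet filter to control access between sites and networks based on the information in the packet headers
A packet filtering firewall is completely transparent to users, fast, and provides low-level control
A packet filtering firewall sits between the network layer and the data link layer
Every IP field is checked: source address, destination address, protocol, port
Disadvantages: cannot prevent hacker attacks, does not support application layer protocols, coarse access control granularity, cannot handle new security threats
Application proxy gateway firewall: completely cuts off direct communication between the internal and external networks; access in either direction must be forwarded by application layer proxy software
Advantages: can inspect protocol features at the application, transport, and network layers, with strong packet inspection capability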
Disadvantages: Difficult to configure, slower processing speed
Stateful inspection firewall: combines the advantages of the packet filtering firewall and the application proxy gateway firewall, namely speed and security
• Viruses
Computer virus characteristics: transmissibility, concealment, infectivity, latency, triggerability, destructiveness
Virus name prefixes: Worm: worm; Trojan: Trojan horse; Backdoor: backdoor virus; Macro: macro virus
Objects of macro virus infection: text documents, spreadsheets
Trojan software: Glacier
Trojan horse infection process: through software downloads, bundling, and similar means, a Trojan server program is installed on the user's host; it establishes a network connection with the Trojan client on the attacker's host, so that the attacker can steal or destroy data on the user's host
Worms: Happy Time, Panda Burning Incense, Code Red, Love Bug, Stuxnet
• Cyber attacks
Denial of service attack (DoS attack): By continuously sending requests to the target, the attacker leaves the target server with no resources to accept normal requests, thereby "making the computer or network unable to provide normal services"
Replay attack: The attacker resends a message that the target host has already received; mainly used against the identity authentication process to undermine the correctness of authentication; it can be prevented by adding a timestamp to the message
Password intrusion attack: Use the accounts and passwords of legitimate users to log in to the target host, and then carry out attack activities
Trojan horse: After the user downloads software containing a Trojan horse, the Trojan program initiates a connection request to the hacker; once the connection is established, the hacker can carry out the attack
Port spoofing attack: Use port scanning to find system vulnerabilities and carry out attacks
Network monitoring: The attacker can receive all information transmitted on the same physical channel of a network segment and intercept account names and passwords
IP spoofing attack: Forge the source IP address to impersonate another system or sender
SQL injection attack: Inject SQL query code to obtain database privileges, and thereby steal or modify information
Intrusion detection technology: expert system, model detection, simple matching
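The SQL injection entry above can be made concrete with a small sketch using Python's built-in sqlite3 module. The table, the user name, and the malicious input are all hypothetical; the point is only to contrast string concatenation with a parameterized query.

```python
import sqlite3

# In-memory database with one hypothetical account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

# Attacker-supplied "password" that rewrites the WHERE clause.
malicious = "' OR '1'='1"

# Vulnerable: user input is concatenated into the SQL text.
unsafe_sql = (
    "SELECT name FROM users WHERE name = 'alice' AND password = '"
    + malicious
    + "'"
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: placeholders keep the input as data, never as SQL code.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ? AND password = ?",
    ("alice", malicious),
).fetchall()
```

With concatenation the condition becomes `... AND password = '' OR '1'='1'`, which is true for every row, so the login check is bypassed; the parameterized query returns no rows.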
• Internet Security
SSL (Secure Sockets Layer): a transport layer security protocol, port number 443
TLS (Transport Layer Security): also a transport layer security protocol, the successor to SSL 3.0
SSH: a secure protocol for establishing connections between terminal devices and remote sites, built on the application layer and the transport layer
HTTPS: HTTP encrypted with SSL
MIME: an e-mail extension protocol, not a security protocol
PGP: a mail security protocol that uses RSA asymmetric encryption
IPSec: Encrypts IP datagrams
ARP: Address Resolution Protocol, which resolves IP addresses into physical addresses
Telnet: an insecure remote login protocol
WEP: Wired Equivalent Privacy
TFTP: Trivial File Transfer Protocol
PP2P: link encryption
RFB: Remote Framebuffer, a protocol for remote access to graphical user interfaces
IGMP: Internet Group Management Protocol
Five basic elements of information security: confidentiality, integrity, availability, controllability, and auditability
Chapter 9 Computer Networks
• Network equipment
Interconnection devices at the physical layer: repeaters (Repeaters) and hubs (Hubs), where the hub can be regarded as a multi-port repeater
Data link layer interconnection equipment: bridge (Bridge) and switch (Switch), where the switch is a multi-port bridge
Network Layer Interconnection Devices: Routers
Application Layer Interconnect Devices: Gateways
Physical layer devices can isolate neither broadcast domains nor collision domains; data link layer devices can isolate collision domains but not broadcast domains; network layer devices can isolate both broadcast domains and collision domains
• Classification of TCP/IP protocol suite
Network layer protocols
IP: a best-effort communication protocol; transmitted data may be lost
ICMP: Internet Control Message Protocol, which uses IP to carry its messages
Transport layer protocols: TCP and UDP, both built on IP
Application layer protocols
FTP: File Transfer Protocol; the data port is 20 and the control port is 21
SNMP: Simple Network Management Protocol
Easy to remember: 1. Protocols with "IP" or "AP" in their names belong to the network layer. 2. Application layer protocols with "T" in their names are based on TCP, except TFTP, which is based on UDP; of those without "T", only POP3 is based on TCP and the others are based on UDP
• IP, TCP, and UDP
The service provided by IP is connectionless (data is sent without first confirming that the target system is ready to receive) and unreliable (the target system does not acknowledge successfully received packets)
TCP: a connection-oriented, reliable transmission control protocol that uses a three-way handshake
TCP functions: reliable transmission, connection management, error checking, retransmission, flow control (variable-size sliding window protocol), port addressing
UDP: a connectionless, unreliable transport protocol that supports communication between application processes and helps improve transmission efficiency
UDP functions: port addressing
• Email service protocols
SMTP: Simple Mail Transfer Protocol, used to send mail, port 25, based on TCP, can only transfer text and ASCII files
SMTP communicates in C/S mode, that is, client/server mode
MIME: mail attachment extension type
PEM: Privacy Enhanced Mail
POP3: the protocol used to receive mail, based on TCP, port 110, communicates in C/S mode
• Address Resolution ARP RARP
ARP: Address Resolution Protocol, which converts IP addresses into physical addresses (MAC address, unique for each network card)
RARP: Reverse Address Resolution Protocol, which converts physical addresses into IP addresses
ARP process when PC1 communicates with PC2
PC1 queries its ARP cache
If the cache holds an entry for PC2's IP address, PC1 uses the corresponding physical address to send the data directly
If there is no cached entry, PC1 broadcasts an ARP request packet on the LAN
The computer on the LAN that owns the requested IP address responds with an ARP reply containing its physical address
PC1 stores the IP address and the physical address from the reply in its ARP cache
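The lookup steps above can be modelled with a toy simulation. This is not real networking code: the function name resolve, the dictionaries, and the addresses are hypothetical stand-ins for the ARP cache and the hosts on the LAN.

```python
def resolve(ip, arp_cache, lan_hosts):
    """Return (mac, used_broadcast) for the given IP address.

    arp_cache and lan_hosts are plain dicts mapping IP -> MAC; lan_hosts
    stands in for the machines that would hear the broadcast.
    """
    # Step 1: look in the local ARP cache first.
    if ip in arp_cache:
        return arp_cache[ip], False
    # Step 2: no entry, so "broadcast" a request; only the owner replies.
    for host_ip, host_mac in lan_hosts.items():
        if host_ip == ip:
            # Step 3: remember the answer for later lookups.
            arp_cache[ip] = host_mac
            return host_mac, True
    return None, True


lan = {
    "192.168.1.2": "aa:bb:cc:00:00:02",
    "192.168.1.3": "aa:bb:cc:00:00:03",
}
cache = {}
first = resolve("192.168.1.2", cache, lan)   # needs a broadcast
second = resolve("192.168.1.2", cache, lan)  # answered from the cache
```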
• DHCP
Dynamic host configuration protocol, centralized management and allocation of IP addresses, so that hosts in the network can obtain IP addresses, gateway addresses, DNS addresses, DHCP server addresses, etc.
• URL
Protocol name://hostname.domain name.domain name suffix.domain name category/directory/web page file
• DNS domain name query order
Local hosts file -> local DNS cache -> local DNS server -> root domain name server
• Query order of the primary domain name server after receiving a request
Local cache -> local hosts file -> local database -> forwarding domain name server
• IP addresses and subnetting
The network in the Internet is divided into 5 categories A, B, C, D, E
Class A network, the network address has 8 bits (the first bit is 0), and the rest are host addresses
Class B network, the network address has 16 bits (the first two bits are 10), and the rest are host addresses
Class C network, the network address has 24 bits (the first three bits are 110), and the rest are host addresses
Subnet mask: used to determine whether a packet stays within the local network or is forwarded elsewhere by a router. 1 bits mark the network address and 0 bits mark the host address. For example, the class C mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0
Subnetting:
Divide a network into multiple subnets: take part of the host number as the subnet number; taking k bits yields 2^k subnets
Merge multiple networks into one large network: take part of the network number and use it as host number
In 222.125.80.128/26, /26 means a 26-bit network address and a 32-26 = 6-bit host address
A host part of all 1s is the broadcast address; a host part of all 0s is the network address
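The /26 example can be checked with Python's standard ipaddress module. Borrowing k = 2 host bits in the last step is an assumption added purely to illustrate the 2^k rule.

```python
import ipaddress

net = ipaddress.ip_network("222.125.80.128/26")

host_bits = 32 - net.prefixlen                  # 32 - 26 host bits
netmask = str(net.netmask)                      # 26 one-bits as dotted decimal
network_address = str(net.network_address)      # host part all 0s
broadcast_address = str(net.broadcast_address)  # host part all 1s

# Borrow k = 2 host bits as subnet number: 2^2 = 4 subnets of /28.
subnets = [str(s) for s in net.subnets(prefixlen_diff=2)]
```

Here the 6 host bits give 2^6 = 64 addresses, from 222.125.80.128 (network) to 222.125.80.191 (broadcast).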
• IPv6
IPv4 addresses are 32 bits; IPv6 addresses are 128 bits, so the address space is effectively inexhaustible
• Wireless networks
Among wireless networks, Bluetooth has the smallest coverage and the shortest communication distance
• Windows commands
ipconfig: Display IP addresses, subnet masks, and default gateway values for all network adapters
ipconfig /release: release the IP address
ipconfig /flushdns: clear (flush) the DNS cache
ipconfig /displaydns: display the DNS cache
ipconfig /registerdns: the DNS client manually registers with the server
ipconfig /all: display the complete TCP/IP configuration of all network adapters, including whether DHCP is enabled
ipconfig /renew: the DHCP client refreshes its lease and re-applies for an IP address
• Routing
Five route types
Host route: subnet mask 255.255.255.255
Directly connected network route
Remote network route
Default route: destination network and netmask are both 0.0.0.0
Persistent route
When the server receives an IP packet, it first looks for a host route, then a network route (directly connected network, remote network), and finally the default route
If a router receives multiple routes for a destination, it compares the administrative distances of the routes and uses the one with the smallest administrative distance.
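The lookup order (host route, then network route, then default route) can be sketched as "the most specific mask wins". Modelling it as a longest-prefix match is an assumption of this sketch, and the routing table entries and next-hop labels are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination/prefix, description of next hop).
routes = [
    ("0.0.0.0/0", "default route"),
    ("10.1.0.0/16", "remote network route"),
    ("192.168.1.0/24", "directly connected network route"),
    ("10.1.2.3/32", "host route"),
]


def lookup(destination):
    """Pick the matching route with the longest (most specific) prefix."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, description in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, description))
    # /32 host routes beat network routes, which beat the /0 default route.
    return max(matches)[1]


host_hit = lookup("10.1.2.3")
network_hit = lookup("10.1.9.9")
default_hit = lookup("8.8.8.8")
```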
• Other
Network Availability: Percentage of Network Time Available to Users
Campus Subsystem: Communication system linking each building
DNS for load balancing: set up multiple host records for the same domain name, enable round robin, add host records for each web server
To let two IPv6 sites communicate across the existing IPv4 network, "tunneling technology" is required
To let IPv6 communicate with IPv4, "translation technology" is required
Having the DNS server and the computer in different subnets does not make the network inaccessible; as long as a route reaches the DNS server, name resolution works normally
The main function of the core layer of the hierarchical LAN model: forwarding packets from one area to another at high speed
The default gateway must be in the same subnet as the current IP address
Chapter 10 Operating Systems
• Role of the operating system
A computer system consists of software and hardware; a computer without any software is called bare metal
Position of the operating system: computer hardware -> operating system -> system software -> application software -> user
All other software, such as compilers, assemblers, and database management systems, as well as the large body of application software, is built on top of the operating system
The operating system can be regarded as the interface between the user and the computer
• Process management
A process is the basic unit of resource allocation and independent operation
The focus of process management is to study the concurrency characteristics between processes, as well as the problems arising from mutual cooperation and resource competition between processes
• Precedence graph
A directed acyclic graph composed of nodes and one-way edges; nodes represent the operations of each program segment, and one-way edges represent the predecessor-successor relationship: Pi (predecessor) ---> Pj (successor) means Pj can execute only after Pi has finished
For a precedence graph with n arrows, n semaphores are needed, assigned to the arrows in ascending order. The process at the head of an arrow performs the P operation before it starts; the process at the tail of the arrow performs the V operation after it finishes.
The main characteristics of program sequential execution: sequentiality, closure (exclusive resources), reproducibility
The main features of concurrent execution of programs: loss of program closure, no one-to-one correspondence between programs and machine execution activities, mutual constraints between concurrent programs
• Three-state model of process
In a multiprogramming system, processes take turns running on the processor and are usually in one of three states: running, ready, or blocked
Running: a process that is currently executing on the processor (CPU) is in the running state
Ready: a process has obtained all resources except the processor (CPU) and can run as soon as it gets the processor
Blocked (also called waiting or sleeping): the process has stopped running while it waits for some event (such as I/O completion) to occur
• Synchronization and mutual exclusion
Synchronization is the direct constraint between cooperating processes
Mutual exclusion is the indirect constraint between processes competing for critical resources
• Critical Section Management Principles
Enter when free: if no process is in the critical section, a process is allowed to enter, and it may stay in the critical section only for a limited time
Wait when busy: if a process is already in the critical section, other processes must wait
Bounded waiting: a process waiting outside must be guaranteed entry within a limited time
Yield while waiting: when a process cannot enter its critical section, it must release the CPU immediately to avoid busy waiting
• Semaphore mechanism
Physical meaning of the semaphore S: when S>=0, S is the number of available resources; when S<0, the absolute value of S is the number of processes waiting for the resource
• PV operations: a common way to achieve synchronization and mutual exclusion
P requests a resource (S = S-1, that is, one unit is requested from semaphore S; if S<0 after the request, the process enters the blocked state and is inserted into the blocking queue)
V releases a resource (S = S+1, that is, one unit is returned to the semaphore; if S<=0 after the release, one process is woken from the blocking queue and inserted into the ready queue)
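The bookkeeping behind P and V can be traced with a small single-threaded model. This is a teaching sketch, not a usable synchronization primitive: the class name CountingSemaphore and the process labels are hypothetical, and "blocking" is represented only by moving a label between queues.

```python
from collections import deque


class CountingSemaphore:
    """Toy model of semaphore S with its blocking and ready queues."""

    def __init__(self, value):
        self.value = value
        self.blocked = deque()
        self.ready = deque()

    def P(self, process):
        self.value -= 1                  # S = S - 1
        if self.value < 0:
            self.blocked.append(process)  # no resource left: block
            return False
        return True                      # resource granted

    def V(self):
        self.value += 1                  # S = S + 1
        if self.value <= 0:
            # Someone was waiting: wake one process and make it ready.
            self.ready.append(self.blocked.popleft())


s = CountingSemaphore(1)
got_a = s.P("A")            # S: 1 -> 0, A proceeds
got_b = s.P("B")            # S: 0 -> -1, B blocks
got_c = s.P("C")            # S: -1 -> -2, C blocks
waiting = -s.value          # |S| processes are waiting
s.V()                       # S: -2 -> -1, B is woken
```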
• PV operation realizes process mutual exclusion
Set the initial value of the semaphore mutex to 1; execute P(mutex) before entering the critical section and V(mutex) when leaving it. Bracketing the critical code with this PV pair makes it mutually exclusive
• PV operation realizes process synchronization
Single buffer synchronization (the buffer holds only one product): there is a producer and a consumer, and two semaphores are needed. S1, initial value 1, is the number of products that can still be placed in the buffer; S2, initial value 0, is the number of products that can be taken out of the buffer. Each production performs P(S1) before and V(S2) after; each consumption performs P(S2) before and V(S1) after
Multi-buffer synchronization (the buffer holds multiple products): on top of the single-buffer scheme, add a mutex semaphore S with initial value 1 to guard access to the buffer (the buffer is a mutually exclusive resource). Each production becomes P(S1) P(S) ... V(S) V(S2), and each consumption becomes P(S2) P(S) ... V(S) V(S1), with the PV operations on S placed in the middle
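A sketch of the multi-buffer scheme using Python's threading.Semaphore, where acquire plays the role of P and release the role of V. The capacity of 3 and the 10 items are arbitrary assumptions; the names empty, full, and mutex correspond to S1, S2, and S in the text.

```python
import threading
from collections import deque

CAPACITY = 3   # assumed buffer size
ITEMS = 10     # assumed number of products

buffer = deque()
empty = threading.Semaphore(CAPACITY)  # S1: free slots in the buffer
full = threading.Semaphore(0)          # S2: products available
mutex = threading.Semaphore(1)         # S: exclusive access to the buffer
consumed = []


def producer():
    for item in range(ITEMS):
        empty.acquire()      # P(S1)
        mutex.acquire()      # P(S)
        buffer.append(item)
        mutex.release()      # V(S)
        full.release()       # V(S2)


def consumer():
    for _ in range(ITEMS):
        full.acquire()       # P(S2)
        mutex.acquire()      # P(S)
        consumed.append(buffer.popleft())
        mutex.release()      # V(S)
        empty.release()      # V(S1)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With one producer and one consumer, the FIFO buffer hands the items over in production order, and the buffer never holds more than CAPACITY items.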
• deadlock
Deadlock caused by improper allocation of resources of the same type: if resources are allocated to the processes in turn, then after several rounds no process may have reached the number of resources it needs. At that point every process is waiting for more resources, and a deadlock forms
Avoiding this deadlock: let m be the total number of resources, n the number of processes, and k the number of resources each process needs. If m >= n * (k-1) + 1, deadlock cannot occur
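The condition can be turned into a two-line check. The reasoning is that in the worst case every process holds k-1 resources; one extra resource then lets some process finish and release what it holds. The function names and the sample numbers (3 processes, 5 resources each) are illustrative.

```python
def min_resources_without_deadlock(n, k):
    """Smallest m that satisfies m >= n * (k - 1) + 1."""
    return n * (k - 1) + 1


def cannot_deadlock(m, n, k):
    """True if m resources are enough to rule out deadlock."""
    return m >= min_resources_without_deadlock(n, k)


# 3 processes, each needing 5 resources: 3 * 4 + 1 = 13 are required.
needed = min_resources_without_deadlock(3, 5)
```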
• Process-resource graph
Pi represents a process and Ri a resource type; each Ri can contain multiple resources; an arrow pointing to a process indicates an allocated resource; an arrow pointing to a resource indicates a resource request
Handle allocations first and then requests. A process whose requests cannot be satisfied after the allocations are accounted for is "blocked"
Whether the graph can be reduced: depends on whether some process can finish, release its resources, and thereby allow the remaining processes to finish
A reducible graph means no deadlock
• Deadlock avoidance
Deadlock handling strategies: ostrich strategy (ignore), prevention, avoidance, detection and recovery
Deadlock avoidance algorithm: the banker's algorithm. Before each allocation it checks whether the system would still be safe afterwards (safe means there exists some order in which all processes can complete). Resource utilization is high, but the checking adds overhead
Banker's algorithm calculation: 1. First calculate the number of resources each process still needs, 2. Then calculate the number of remaining resources
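A compact sketch of the safety check at the heart of the banker's algorithm. The function name find_safe_sequence and the sample matrices (five processes, three resource types) are assumptions chosen for illustration, not data from the notes.

```python
def find_safe_sequence(available, max_need, allocation):
    """Return one safe completion order of process indices, or None."""
    # Step 1: resources each process still needs = max - allocated.
    need = [
        [m - a for m, a in zip(max_row, alloc_row)]
        for max_row, alloc_row in zip(max_need, allocation)
    ]
    work = list(available)            # Step 2: resources currently free
    finished = [False] * len(allocation)
    sequence = []
    progressed = True
    while progressed:
        progressed = False
        for i, row in enumerate(need):
            if not finished[i] and all(n <= w for n, w in zip(row, work)):
                # Process i can finish and hand back what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                sequence.append(i)
                progressed = True
    return sequence if all(finished) else None


max_need = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
safe = find_safe_sequence([3, 3, 2], max_need, allocation)
unsafe = find_safe_sequence([0, 0, 0], max_need, allocation)
```

A request is granted only if the state that would result from it still has a safe sequence; otherwise the requesting process waits.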
• threads
Creating, destroying, and switching processes costs the system considerable time and space, so the number of processes and the frequency of process switching must be kept low; threads were introduced to address this
A thread is the basic unit of scheduling and dispatching, a process is the unit of independent resource allocation, and a thread is an entity within a process
Threads are not visible to one another, but they can share the resources of their process
• Principle of locality
Temporal locality: once an instruction has been executed, it is likely to be executed again soon; once a storage unit has been accessed, it is likely to be accessed again soon
Spatial locality: once a program accesses a storage unit, nearby storage units are likely to be accessed soon as well
• Related question type: page "elimination" (replacement) questions
Only pages that are in memory can be eliminated
Eliminate pages that have not been accessed first
Among those, eliminate pages that have not been modified first
• Paging storage management
The address structure of pure paging storage management: n-bit page number + m-bit in-page address
A page size of 4K determines the m-bit in-page address: 2^m = 4 * 1024 => m = 12
Converting a logical address to a physical address with the page table: the logical address has the pure paging structure of n-bit page number + m-bit in-page address; the in-page address stays unchanged, and the page number is replaced with the corresponding physical block number from the page table
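The conversion can be written out with bit operations. The 4K page size comes from the text; the page table contents and the sample logical address are hypothetical.

```python
PAGE_SIZE = 4 * 1024
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 2^12 = 4096, so 12 bits


def translate(logical_address, page_table):
    """Swap the page number for its physical block number."""
    page_number = logical_address >> OFFSET_BITS
    offset = logical_address & (PAGE_SIZE - 1)   # in-page address, unchanged
    block_number = page_table[page_number]
    return (block_number << OFFSET_BITS) | offset


# Hypothetical page table: page number -> physical block number.
page_table = {0: 5, 1: 2, 2: 8}
physical = translate(0x1A3C, page_table)  # page 1, in-page address 0xA3C
```

Page 1 maps to block 2, so the logical address 0x1A3C becomes the physical address 0x2A3C.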
• Segment page storage management
Segment page storage management address structure: n-bit segment number + k-bit in-segment page number + m-bit in-page address
• Single buffer
The buffer can hold only one "job": data can be input when the buffer is empty and transferred out when the buffer holds a job
I/O device --input (T)--> buffer --transfer (M)--> working area (processing C)
Time to process n jobs with a single buffer: (T+M)*n + C
• double buffer
There are two buffers, each of which can store a "job"
Time to process n jobs with a double buffer: T*n + M + C
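The two formulas can be compared on sample numbers. The functions simply encode the formulas as stated in these notes; the values n = 10, T = 10, M = 5, C = 2 are assumptions for illustration.

```python
def single_buffer_time(n, T, M, C):
    """(T + M) * n + C, as given for a single buffer."""
    return (T + M) * n + C


def double_buffer_time(n, T, M, C):
    """T * n + M + C, as given for a double buffer."""
    return T * n + M + C


single = single_buffer_time(10, 10, 5, 2)
double = double_buffer_time(10, 10, 5, 2)
```

With these values the single buffer takes (10+5)*10 + 2 = 152 time units and the double buffer 10*10 + 5 + 2 = 107, because input into the second buffer overlaps with transfer and processing of the first.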
• Disk scheduling algorithm
First come, first served (FCFS): serve disk requests in the order in which they arrive; the average seek length is large
Shortest seek time first (SSTF): serve the request whose track is closest to the current head position first, regardless of arrival order
Scanning algorithm or elevator scheduling algorithm (SCAN): starting from the current head position, select the nearest requested cylinder along the head's direction of movement; if no requested cylinder remains in that direction, reverse direction and again select the nearest cylinder
Cyclic scanning algorithm (CSCAN): based on the scanning algorithm, but instead of selecting the nearest cylinder after turning around, the head moves back to the innermost end and scans again
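FCFS and SSTF can be compared by totalling head movement. The starting track (53) and the request queue are a commonly used textbook example assumed here for illustration; the function names are hypothetical.

```python
def fcfs_distance(start, requests):
    """Total head movement when requests are served in arrival order."""
    total, position = 0, start
    for track in requests:
        total += abs(track - position)
        position = track
    return total


def sstf_distance(start, requests):
    """Total head movement when the nearest pending track is served first."""
    pending = list(requests)
    total, position = 0, start
    while pending:
        nearest = min(pending, key=lambda track: abs(track - position))
        total += abs(nearest - position)
        position = nearest
        pending.remove(nearest)
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_total = fcfs_distance(53, queue)
sstf_total = sstf_distance(53, queue)
```

For this queue FCFS moves the head across 640 tracks, while SSTF needs only 236.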
• Rotation scheduling algorithm
The disk never stops rotating. Once a sector has passed under the head, the record in that sector has been read; the disk keeps rotating during the time it takes to process the record
If n records are processed sequentially, total time = (read time + time of one full rotation)*(n-1) + read time of one sector + processing time of one sector
Optimized processing: rearrange the sectors so that the position where the head rests after a record has been processed is the start of the sector holding the next record; time spent = (read time + processing time)*n
• Multi-level index structure
Direct address index: the index starts from 0, and an address entry points to a disk data block
Level-1 indirect address index: An address item points to a disk index block (also called a first-level index block), and a disk index block contains many address items, and the address item in the disk index block points to a disk data block
Second-level indirect address index: compared with the first-level indirect address index, there is one more level of disk index blocks
• file directory
To support access by name, the system keeps a data structure for describing and controlling each file, containing at least the file name and the physical address where the file is stored. This structure is called the file control block (FCB)
The file directory is composed of file control blocks for file retrieval
The file control block contains three types of information
Basic information: file name, file physical address, file length, number of file blocks, etc.
Access Control Information: File Access Permissions
Usage Information: Created Date, Last Modified Date, Current Usage Information
If the system crashes while a directory file is being modified, the impact on the system is severe
• Directory structure
Multi-level directory structure: an inverted rooted tree, also known as a tree directory structure
Full path name: from the root directory (for example D:\) down to the file name
Absolute path: starts from the root directory and ends with /
Relative path: starts from the current directory and ends with /
• Bitmap
The bitmap uses one binary bit to record the usage of each physical block: 0 means free and 1 means used
The size of the bitmap is determined by the size of the disk space (the number of physical blocks). The bitmap has strong descriptive power and suits all kinds of physical structures
If the word length of the computer system is n bits, word 0 of the bitmap corresponds to physical blocks 0 to n-1, word 1 corresponds to physical blocks n to 2n-1, and so on
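Locating a block in the bitmap is a division with remainder. The 32-bit word length and the convention that bit 0 is the least significant bit of a word are assumptions of this sketch; exam questions state their own word length and numbering.

```python
WORD_BITS = 32  # assumed word length n


def locate(block_number):
    """Return (word index, bit index) of a physical block in the bitmap."""
    return divmod(block_number, WORD_BITS)


def first_free_block(words):
    """Return the number of the first block whose bit is 0, or None."""
    for word_index, word in enumerate(words):
        for bit_index in range(WORD_BITS):
            if not (word >> bit_index) & 1:   # 0 means free
                return word_index * WORD_BITS + bit_index
    return None


position = locate(100)                             # block 100
free_block = first_free_block([0xFFFFFFFF, 0b0111])
```

Block 100 falls in word 3 (blocks 96 to 127) at bit 4; in the sample bitmap, word 0 is full and the first free bit is bit 3 of word 1, which is block 35.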
• Other
Variable partition allocation scheme: if the area occupied by process P has a free area adjacent above or below it, then after P is released the free areas are merged into one
When the user provides input to an application through the mouse or keyboard, the interrupt handler is the first to obtain the keyboard or mouse input
The "real-time" in real-time operating system means that the computer can process external information fast enough and respond within the time allowed by the controlled object.
Hierarchy of the I/O system: hardware -> interrupt handler -> device driver -> device-independent software -> user process
The I/O software hides the implementation details of I/O operations, which makes it convenient for users
In disk scheduling management, arm-movement scheduling is performed first and rotation scheduling second. Accessing information on different cylinders requires arm-movement scheduling first; accessing the same track requires only rotation scheduling.
Chapter 11 Structured Development
• Modularity
Modularization means decomposing the software to be developed into several small, simple parts (modules), each of which is developed and tested independently
• Module independence
Module independence means that each module completes a relatively independent, specific sub-function and has simple connections with other modules. There are two criteria for measuring module independence: cohesion and coupling
• Coupling
Coupling is a measure of the relative independence (closeness of interconnection) between modules, the higher the degree of coupling, the weaker the independence of the modules
The seven types of coupling, ordered from low to high
No direct coupling: no direct relationship between two modules (no calls, no information passed)
Data coupling: there is a call relationship between two modules, and simple data values are passed (pass by value)
Stamp coupling: there is a call relationship, and a data structure is passed
Control coupling: there is a call relationship, and a control variable is passed that makes the callee selectively execute a certain function
External coupling: modules are connected through an environment outside the software
Common coupling: modules interact through a common data environment
Content coupling: a module directly uses the internal data of another module, or enters another module through an abnormal entry
• Cohesion
Cohesion is a measure of how closely the elements within a module are bound to each other. The higher the cohesion, the stronger the independence of the module.
The seven types of cohesion, ordered from low to high
Coincidental cohesion (accidental cohesion): there is no connection between the elements within the module
Logical cohesion: the module performs several logically similar functions, and a parameter determines which one is executed
Temporal cohesion: a module formed by grouping actions that must be executed at the same particular time
Procedural cohesion: a module completes multiple tasks, and these tasks must be performed in a specified order
Communicational cohesion: all processing elements within the module operate on the same data structure
Sequential cohesion: the processing elements in the module are closely related to a single function and must be executed in sequence
Functional cohesion: all elements in the module work together to complete a single function, and none can be removed
• System structure design principles
Decomposition-coordinating principle: breaking down large problems into smaller ones
Top-down principle: grasp the main function of the system and decompose it hierarchically from top to bottom
Information hiding-abstract principle: the upper layer specifies what the lower layer does and coordinates between modules, but does not specify how to do it
Consistency principle: unified specification, unified standard, unified file mode
The principle of clarity: each module has clear functions and interfaces, eliminates multiple functions and useless interfaces, avoids pathological links, and eliminates interface complexity
High cohesion and low coupling
Moderate fan-in and fan-out: the number of modules a module calls is its fan-out, and the number of modules that call it is its fan-in
The module size is moderate: if it is too large, the decomposition is insufficient; if it is too small, the independence of the module may be reduced
The scope of effect of a module should lie within its scope of control
• System Documentation
System documentation is the "trace" of the system construction process, a guide for system maintainers, and a communication tool between developers and users
System documentation serves as a link between system developers, project managers, system maintainers, system evaluators, and users
Users – system analysts: feasibility study report, overall planning report, system development contract, system solution specification
System developers – project managers: system development plan, system development monthly report, system development summary report, project management documents
System testers – system developers: system solution specification, system development contract, system design specification, test plan document
System Developers – Users: User Manuals, Operating Guides
System developers – system maintainers: system design specification, system development summary report, technical manual
Users – maintenance personnel: system operation report, maintenance modification suggestions
• Data Flow Diagram
basic graphic elements
External entity: rectangle, generally represented by Ei
Data storage: two horizontal lines or a rectangle with missing sides, generally represented by Di
Data flow: directed edge, start point ---> end point
Processing: rounded rectangle or circle, generally represented by Pi
The top-level data flow diagram describes the input and output of the system, and the 0-level data flow diagram is a subdivision of the top-level data flow diagram
External entities: people, things, external systems outside the current system
People: students, teachers, staff, supervisors
Things: sensors, controllers, vehicles, procurement departments
External systems: payment system, vehicle transaction system, inventory management system
Data storage: store data, provide data
Store processed output data and provide processed input data
For example: xxx table, xxx file
Processing: process the input data to get the output data
A process has at least one input data stream and one output data stream
A process with input but no output is called a "black hole"
A process with output but no input is called a "white hole"
A process whose input data is insufficient to produce its output is called a "grey hole"
Data flow: either the start point or the end point of a data flow must be a process
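The black hole and white hole rules lend themselves to a mechanical check. In this sketch a data flow diagram is reduced to a set of process names and a list of (source, target) flows; the node names follow the Ei/Di/Pi convention above, and the sample diagram is hypothetical.

```python
def check_processes(processes, flows):
    """Report processes that break the input/output rule."""
    problems = {}
    for process in processes:
        has_input = any(target == process for _, target in flows)
        has_output = any(source == process for source, _ in flows)
        if has_input and not has_output:
            problems[process] = "black hole"
        elif has_output and not has_input:
            problems[process] = "white hole"
    return problems


def flows_without_process(processes, flows):
    """Flows whose start and end are both non-processes (not allowed)."""
    return [
        (source, target)
        for source, target in flows
        if source not in processes and target not in processes
    ]


processes = {"P1", "P2", "P3"}
flows = [("E1", "P1"), ("P1", "D1"), ("D1", "P2"), ("P3", "E1"), ("E1", "D1")]
problems = check_processes(processes, flows)
invalid_flows = flows_without_process(processes, flows)
```

Grey holes are not detected here, since spotting them requires knowing what each data flow contains.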
• Balance between parent graph and child graph
Every data flow in the parent graph must also appear in the child graph. In practice, compare the child graph against the parent graph flow by flow to find any data flow that is in the parent graph but missing from the child graph.
Tips for finding missing data flows
Balance between parent graph and child graph
Every process needs both an input data flow and an output data flow
Data conservation (check the problem description for what is missing from the diagram)
Data Modeling – ER Diagram; Behavioral Modeling – UML; Functional Modeling – Data Flow Diagram
• Data dictionary
The data dictionary describes each data flow, file, process, and data item that makes up the data flow diagram
The data dictionary has four categories of entries: data flow, data storage, basic processing, data item
Data items are the smallest elements that make up data flows and data stores; external entities are not described in the data dictionary
Common processing logic description methods: structured language, decision table, decision tree
• Structured development methodology
The general guiding ideology is top-down, layer-by-layer decomposition (from abstraction to concrete)
The basic principle is the decomposition and abstraction of functions
The earliest method proposed in software engineering, especially suitable for problems in the field of data processing
It is not suitable for solving large-scale and particularly complex projects, and it is difficult to adapt to changes in requirements
• Structured design
Architecture Design: Define the main structural elements and relationships of the software
Data design: determine the file system structure and database table structure
Interface design: describe the external interface used by the software, and the internal interface between various components
Process Design: Define the algorithms and internal data structures within each component
Interface Design Golden Principles
Put the user in control
Reduce user memory burden
Keep the interface consistent
Problems needing attention when constructing hierarchical data flow diagrams
Name elements appropriately
Draw data flows, not control flows
Avoid giving one process too many data flows
Decompose as evenly as possible
Chapter 12 Software Engineering
• CMM (Capability Maturity Model)
The CMM divides software process improvement into the following five levels
Initial level: disorganized, with no clearly defined steps
Repeatable level: Basic project management processes and practices are established with the necessary process discipline to repeat previous success on similar projects
Defined level: software process documented, standardized
Managed level: Detailed metrics for software process and product quality are developed
Optimizing level: quality analysis is strengthened, and the process is continuously improved through feedback on process quality and through new ideas and technologies
• CMMI (Capability Maturity Model Integration)
Staged model: similar in structure to CMM, focusing on the maturity of the organization
Initial: the process is unpredictable and poorly controlled
Managed: the process serves the project
Defined: the process serves the organization
Quantitatively managed: the process is measured and controlled
Optimizing: the focus is on process improvement
Continuous model, focusing on the capabilities of each process area
CL0 (incomplete): Indicates that one or more objectives of the process area have not been met
CL1 (Performed): the specific goals of the process area are met; the process transforms identifiable input work products into identifiable output work products
CL2 (Managed): the process is institutionalized as a managed process, with the focus on individual process instances
CL3 (Defined): the process is institutionalized as a defined process, with the focus on organization-wide standardization and deployment
CL4 (Quantitatively managed): the process is institutionalized as a quantitatively managed process
CL5 (Optimizing): the process is institutionalized as an optimizing process that is continuously improved
• Waterfall model
The waterfall model is a model that defines each activity in the software life cycle as a number of stages linked in a linear order
Requirements analysis → design → coding → testing → operation and maintenance: the stages follow one another in a fixed order
The waterfall model assumes that the requirements of the system to be developed are complete, concise, and consistent, and can be fixed before design and implementation
Guide the development process with project phase review and document control
Advantages:
Easy to understand and low management cost
Emphasizes early planning, requirements investigation, and product testing
Suitable for systems whose requirements are clear, largely fixed, and unlikely to change
Disadvantages:
Customers must state their requirements completely, correctly, and clearly
Progress is hard to assess during the first two or three stages
A large amount of integration and testing work piles up near the end of the project
System capabilities cannot be demonstrated until the end of the project
Errors in requirements or design are found only late in the project
Project risk control is weak
The V-model is a variant of the waterfall model that emphasizes quality assurance, pairing each development stage with a corresponding level of testing
• Incremental model
Combines the basic components of the waterfall model with the iterative nature of prototyping. It assumes that requirements can be split into a series of increments, each of which can be developed separately
The first increment is often the core product; customer use and evaluation of each increment feed into the features and functionality of the next increment
Each increment releases an operational version
Advantages:
Has all the benefits of the waterfall model
The first deliverable version takes little cost and time
Developing the small system represented by an increment carries little risk
Disadvantages:
If changes in user requirements are not planned for, the initial increments may destabilize later increments
Managing cost, schedule, and complexity becomes harder
• Evolutionary models
The evolutionary model is an iterative process model that enables software developers to gradually develop more complete software versions
Evolutionary models are particularly suitable when accurate knowledge of the software requirements is lacking
Typical evolutionary models are the prototype model and the spiral model
The difference between the evolutionary and incremental models: the incremental model develops a small functional module each time, while the evolutionary model develops a version of the entire product each time
• Prototype model
The prototype model suits situations where user requirements are unclear and change frequently, and where the system is neither too large nor too complex
A prototype does not have to meet all the constraints of the target software; the aim is to build a tangible system framework quickly and at low cost
Steps: communication → quick planning → quick design (modeling) → prototype construction → deployment, delivery, and feedback; the cycle then repeats
Prototyping begins with communication, whose purpose is to define the overall objectives of the software and capture user requirements effectively
The prototype model is not suitable for large-scale system development
• Spiral model
The spiral model combines the waterfall model and the evolutionary model, adding the risk analysis that both models ignore
Each spiral cycle roughly corresponds to the waterfall model
Each spiral cycle is divided into four steps
Planning: determine the software goals, develop an implementation plan, and clarify the constraints on project development
Risk analysis: identify and resolve risks
Engineering: develop the software and verify the products of the phase
User evaluation: propose corrections and draw up the plan for the next cycle
The spiral model is characterized by its added risk analysis and suits large, high-risk systems with changing requirements
Disadvantages: Too many iterations will increase development costs and delay submission time
• Fountain model
The fountain model is driven by user requirements and by objects, and suits object-oriented development methods
The fountain model overcomes the waterfall model's lack of support for software reuse and for integrating multiple development activities
Development under the fountain model is iterative and seamless
Seamless means that there are no clear boundaries between development activities (analysis, design, coding), so activities may overlap and iterate
Advantages:
High development efficiency, saving development time
Disadvantages:
Development phases overlap, requiring a large number of developers and raising management costs
Strict document management is required, and reviews are difficult
• Unified Process Model ( UP )
A use-case-driven and risk-driven, architecture-centric, iterative and incremental development process
Each iteration runs 5 core workflows
The unified process has four technical phases, each ending in a milestone
Inception phase: focus on project start-up; milestone: lifecycle objectives
Elaboration phase: requirements analysis and architecture evolution; milestone: lifecycle architecture
Construction phase: focus on building the system and producing the implementation model; milestone: initial operational capability
Transition phase: focus on delivering the software, producing software increments; milestone: product release
• Agile development
The overall goal is to deliver valuable software as early as possible and continuously
Agile development enables users to add or change requirements later in the development cycle
Extreme Programming (XP)
XP is a lightweight (agile), efficient, low-risk, flexible, predictable, and scientific software development method
The Four Values of XP: Communication, Simplicity, Feedback, Courage
5 principles: rapid feedback, assume simplicity, incremental change, embracing change, and quality work
12 best practices: planning game, small releases, metaphor, simple design, test first, refactoring, pair programming, collective code ownership, continuous integration, 40-hour week, on-site customer, and coding standards
Crystal: every project needs its own set of strategies, conventions, and methodologies
Scrum: uses an iterative approach in which each 30-day iteration is a sprint, implementing the product in order of requirement priority
Adaptive Software Development (ASD): 6 basic principles
A mission serves as the guide
Features are seen as the key points of customer value
"Redo" is as important as "do"
Changes are regarded not as corrections but as "adjustments"
Fixed delivery times force consideration of the critical requirements of each release
Risk is embraced as well
Agile Unified Process (AUP): builds systems on the principle of "serial in the large, iterative in the small"
Each AUP iteration performs the following activities: modeling, implementation, testing, deployment, configuration and project management, environment management
• Requirements analysis
Software requirements: Refers to the user's expectations of the target software in terms of functionality, behavior, performance, design constraints, etc.
Functional requirements: Consider what the system does and when
Performance requirements
User or human factors: how difficult the system is for users to understand and use, and how likely they are to misuse it
Environmental requirements: hardware or software environment, machine model, operating system, platform
Interface requirements: input from and output to other systems, data formats, storage media
Documentation requirements: which documents are needed and who they are for
Data requirements: format of data received and sent
Resource usage requirements: computer resources needed for operation, staff needed for maintenance
Security and confidentiality requirements: data isolation, system backup
Reliability requirements: error isolation, restart after an error
Software cost consumption and development progress
Other non-functional requirements
• Outline design (high-level design)
Design the overall structure of the software system: Divide the complex system into modules by function, and determine the function, calling relationship, interface, structure and quality of the module
Data structure and database design: database design (conceptual design ER, logical design, physical design)
Write high-level design documents: high-level design specification, database design specification, user manual, revised test plan
Review
• Detailed design
Detailed algorithm design for each module
Design the data structure in the module
Physically design the database
Other designs
Write a detailed design specification
Review
• System testing
The meaning of system testing: testing is the process of executing a program in order to find errors. A successful test is one that uncovers an error not found so far.
The purpose of testing: to find potential errors and defects with the minimum manpower and time
Basic principles of testing:
Early and continuous testing throughout the entire development phase
Avoid testing by the original developer
When testing, there must be expected output results and compare them with the test results
Pay attention to the consequences of invalid inputs or operations
Check not only whether the program does what it should, but also whether it does what it should not
Test strictly according to the plan to avoid randomness
Keep test cases and related documents properly
Subsequent tests are modified on the basis of previous tests
The test objective of system testing comes from the requirement analysis stage
• Unit testing
Unit test, also known as module test, starts to execute after the module is written and there are no compilation errors
Unit testing focuses on the processing logic and data structures inside the module
Unit tests check 5 characteristics of modules
Module interface: ensure that data flows into and out of the module under test correctly; check whether formal and actual parameters match, how global variables are used, I/O formats, file handling, etc.
Local data structures: check variable definition and use
Important execution paths
Error handling
Boundary conditions
Unit testing process
Because modules do not run on their own and call one another, two kinds of auxiliary modules must be written for testing
Driver module: acts as a main program that receives the test case data, passes it to the module under test, and outputs the test results
Stub module: replaces a submodule called by the module under test and checks the data the module under test passes to it
High cohesion simplifies unit testing
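To make the driver/stub roles concrete, here is a small sketch. The module under test (`order_total`), the stub class, and the test data are all invented for illustration; the only point is who calls whom.

```python
def order_total(amount, region, tax_lookup):
    """Module under test (hypothetical): adds tax obtained from a submodule."""
    rate = tax_lookup(region)
    return round(amount * (1 + rate), 2)


class TaxLookupStub:
    """Stub: replaces the real tax submodule, returns a canned rate,
    and records the input it received from the module under test."""

    def __init__(self, canned_rate):
        self.canned_rate = canned_rate
        self.calls = []

    def __call__(self, region):
        self.calls.append(region)
        return self.canned_rate


def driver(test_cases):
    """Driver: plays the main program. It feeds each test case to the
    module under test and reports (actual result, passed?)."""
    results = []
    for amount, region, canned_rate, expected in test_cases:
        stub = TaxLookupStub(canned_rate)
        actual = order_total(amount, region, stub)
        passed = actual == expected and stub.calls == [region]
        results.append((actual, passed))
    return results


driver_results = driver([
    (100.0, "EU", 0.2, 120.0),
    (50.0, "US", 0.0, 50.0),
])
```

Top-down integration needs only stubs like `TaxLookupStub`; bottom-up integration needs only drivers like `driver`.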
• Integration testing
Integration testing combines modules as the system specification requires and tests them together, aiming to find interface-related errors
Possible problems after integration:
Data lost across modules
The functionality of one module has harmful effects on other modules
After the module is integrated, the expected function is not achieved
Problem with global data structure
Error accumulation after module combination
Top-down integration testing: an incremental approach in which integration starts from the main control module and moves down the control hierarchy; no driver modules are needed, but stub modules must be written
Bottom-up integration testing: integration starts from the lowest-level atomic modules and moves up the control hierarchy; no stub modules are needed, but driver modules must be written
• Regression testing
Software changes may cause problems with the original normal functions. At this time, regression testing is required to re-execute some subsets that have been tested to ensure that no undesired side effects are propagated.
Regression testing helps ensure that unintentional behavior or additional bugs are not introduced
• Smoke test
Integrate the software components that have been translated into code into a build
Design a series of tests to expose bugs that prevent the build from performing its function correctly
• Test method
Static testing: The program under test does not run on the machine, and the program is tested by manual detection and computer-aided static analysis
Dynamic testing: find errors by running the program, black box testing and white box testing can be used
Test case: consists of test input data and the expected output. A test case design should include both valid and invalid input conditions
• Black box testing
Regardless of the internal structure and characteristics of the software, treat the software as a black box and test the external characteristics of the software
Common black box testing techniques
Equivalence class division: Divide the program input and output into several equivalence classes, and then take a representative data from each equivalence class as a test case, and test valid equivalence classes and invalid equivalence classes at the same time
Boundary Value Analysis: Inputs are more prone to errors at the border than in the middle, should test borderline values and values just beyond the borderline
Error guessing: infer likely errors from experience
Cause-effect graph: build a cause-effect graph and convert it into a decision table
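As an illustration of equivalence classes and boundary value analysis, the sketch below assumes a made-up rule: an age field is valid when it lies in the closed range 18 to 60. The helper names and the choice of representatives are assumptions, not part of any standard.

```python
def is_valid_age(age, lo=18, hi=60):
    """Hypothetical function under test: accepts ages in [lo, hi]."""
    return lo <= age <= hi


def equivalence_representatives(lo, hi):
    """One representative per class: one valid class, two invalid classes."""
    return {
        "invalid_low": lo - 10,
        "valid": (lo + hi) // 2,
        "invalid_high": hi + 10,
    }


def boundary_values(lo, hi):
    """Values on each boundary and just beyond it."""
    return [lo - 1, lo, hi, hi + 1]


representatives = equivalence_representatives(18, 60)
boundaries = boundary_values(18, 60)
boundary_results = [is_valid_age(v) for v in boundaries]
```

Three representatives plus four boundary values give seven test inputs instead of testing every possible age.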
• McCabe measure
Measures program complexity through cyclomatic complexity, which is based on the number of cycles in the program graph of a module
The formula is V(G) = m - n + 2, where G is the program graph, V(G) is its cyclomatic complexity, m is the number of directed arcs, and n is the number of nodes
It can also be found by finding how many closed regions k there are in the graph, then V(G) = k + 1
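A quick worked example of the formula, as a sketch: the program graph below is a made-up one (an if/else followed by a while loop), given as a list of directed arcs. The code assumes the graph is a single connected component with one entry and one exit.

```python
def cyclomatic_complexity(edges):
    """V(G) = m - n + 2 for a connected program graph given as arcs."""
    nodes = {node for edge in edges for node in edge}
    m = len(edges)   # number of directed arcs
    n = len(nodes)   # number of nodes
    return m - n + 2


# 1: if test, 2: then branch, 3: else branch,
# 4: while test, 5: loop body, 6: exit
program_graph = [
    (1, 2), (1, 3),   # if / else
    (2, 4), (3, 4),   # both branches join at the loop test
    (4, 5), (5, 4),   # loop body and back arc
    (4, 6),           # loop exit
]
v = cyclomatic_complexity(program_graph)
```

Here m = 7 and n = 6, so V(G) = 3. The graph has two closed regions (the if/else diamond and the loop), and 2 + 1 = 3 agrees.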
• White box testing
White box testing is also called structural testing, and test cases are designed according to the internal structure of the program
Common techniques for white box testing are: logic coverage, loop coverage, basic path testing
White box testing principles:
All independent paths in a program module are executed at least once
Each logical decision is exercised at least once for both "true" and "false"
Each loop is executed at least once at its boundary conditions and under general conditions
Test the validity of program internal data structures
Logic coverage
Statement coverage: choose enough test data that every statement in the program under test executes at least once. Because it only guarantees that statements run, some decision branches may be missed, so it is a very weak form of logic coverage
Decision coverage: design enough test cases that every decision expression takes both "true" and "false" at least once, so every branch runs at least once; also known as branch coverage
Condition coverage: design test cases so that every possible value of each logical condition in every decision occurs at least once
Decision/condition coverage: satisfies both decision coverage and condition coverage
Condition combination coverage: every combination of condition values in each decision is tested; for example, a>0 && b>0 requires the four cases T&&T, F&&F, T&&F, F&&T
Path coverage: the test cases cover every possible path through the program
How to cover all the paths: every path must be traversed at least once
If there is pseudocode, convert it into a flowchart first, and then calculate
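The differences between the coverage levels are easiest to see on the a>0 && b>0 decision. This sketch (the function name `coverage` is made up) reports which criteria a set of (a, b) test inputs satisfies for that single decision.

```python
from itertools import product


def coverage(tests):
    """Report which logic coverage criteria the (a, b) test inputs
    achieve for the single decision `a > 0 and b > 0`."""
    decisions = {(a > 0 and b > 0) for a, b in tests}
    cond_a = {a > 0 for a, b in tests}
    cond_b = {b > 0 for a, b in tests}
    combos = {(a > 0, b > 0) for a, b in tests}
    both = {True, False}
    return {
        "decision": decisions == both,
        "condition": cond_a == both and cond_b == both,
        "condition_combination": combos == set(product([True, False], repeat=2)),
    }


only_conditions = coverage([(1, -1), (-1, 1)])
decision_and_condition = coverage([(1, 1), (-1, -1)])
all_combinations = coverage([(1, 1), (1, -1), (-1, 1), (-1, -1)])
```

The first test set gives each condition both values, yet the decision is never true. That is why condition coverage alone does not imply decision coverage, and why decision/condition coverage is listed as a separate criterion.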
• Operation and maintenance
Software maintenance is the last stage of the software life cycle, not part of the system development process
System maintainability evaluation indicators: understandability, testability, modifiability
Documentation is a determinant of software maintainability
Maintainability needs to be considered at the development stage, and at every stage thereafter
• Software Documentation
Writing high-quality documentation improves software development quality
Documentation is also a part of software products, and software without documentation cannot be called software
The preparation of software documentation occupies a prominent position in software development, and a considerable workload
High-quality documentation is of great significance to the benefits of software products
Overall the software documentation has to be fair
• Software maintenance content
Corrective maintenance: correcting errors that were not found during system development and testing
Adaptive maintenance: modifications made to adapt to changes in the environment and in management needs
Perfective maintenance: changes made to extend functionality and improve performance
Preventive maintenance: proactive prevention to adapt to future software and hardware environment changes
• Software reliability, availability, maintainability formulas
Reliability: the probability of failure-free operation under given conditions for a given time interval; MTTF / (1 + MTTF), where MTTF is the mean time to failure
Availability: the probability that the system is operating correctly at a given point in time; MTBF / (1 + MTBF), where MTBF is the mean time between failures
Maintainability: the probability of completing a maintenance activity within a given time using the specified processes and resources; 1 / (1 + MTTR), where MTTR is the mean time to repair
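The three formulas, exactly as these notes state them, can be written as one-line functions. The input values below are invented, and the time unit is whatever the exam question uses; the results depend on that unit, so this is a sketch of the arithmetic only.

```python
def reliability(mttf):
    """MTTF / (1 + MTTF), with MTTF the mean time to failure."""
    return mttf / (1 + mttf)


def availability(mtbf):
    """MTBF / (1 + MTBF), with MTBF the mean time between failures."""
    return mtbf / (1 + mtbf)


def maintainability(mttr):
    """1 / (1 + MTTR), with MTTR the mean time to repair."""
    return 1 / (1 + mttr)


r = reliability(99)        # 99 / 100
a = availability(999)      # 999 / 1000
m = maintainability(1)     # 1 / 2
```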
• Communication path calculation
n programmers, no master programmer: (n-1)*n/2
n programmers, one master programmer: n-1
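Both communication path formulas as a small sketch (the function names are made up); the team size of 8 is just an example.

```python
def paths_without_master(n):
    """Every pair of programmers communicates directly: n(n-1)/2 paths."""
    return n * (n - 1) // 2


def paths_with_master(n):
    """Everyone communicates only with the master programmer: n-1 paths."""
    return n - 1


flat_team = paths_without_master(8)
led_team = paths_with_master(8)
```

For 8 programmers that is 28 paths without a master programmer and 7 with one.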
• Software project estimation
COCOMO estimation model: accurate and easy-to-use cost estimation model, divided into basic COCOMO, intermediate COCOMO and detailed COCOMO according to the degree of refinement
Basic COCOMO model: static univariate model
Intermediate COCOMO model: Static multivariate model, which divides the system model into two levels: system and component
Detailed COCOMO model: divide the software system model into three parts: system, subsystem and module
COCOMO II estimation model: a hierarchy of estimation models divided into three stages
Application composition model: used in the early stages of software development, estimating with object points
Early design model: used once the requirements have stabilized and the basic software architecture is established, estimating with function points
Post-architecture model: used during software construction, estimating with lines of code
• Gantt chart
A simple horizontal bar chart that lays out project tasks against the calendar. The horizontal axis is the calendar timeline and each bar represents a task: its two ends mark the start and end dates of the task, its length is the task's duration, and bars that overlap in time represent tasks running concurrently
Advantages: clearly shows the start and end time of each task, task progress, and which tasks run in parallel
Disadvantages: does not clearly show dependencies between tasks, does not reveal the critical tasks of the project, and does not show where the plan has slack
• PERT chart
A PERT graph is a directed graph
The arrows in the figure indicate "tasks", and the time on the arrows indicates the time required to complete the "tasks"
The nodes in the graph represent "events": an event marks the end of the tasks flowing into the node and the start of the tasks flowing out of it
The task pointed to by the current node will start when all tasks flowing into this node have finished
The event itself does not consume time and resources, it only represents a point in time
The "event" node consists of three parts: event number, the earliest time when the event occurs, and the latest time
Start node: a node with no incoming tasks; there may be several, and the earliest time of a start node is 0
End node: the node with no outgoing tasks; there can be only one, and its latest time equals its own earliest time
Earliest times are calculated forward from the start node to the end node; latest times are calculated backward from the end node to the start node
Earliest time: the earliest moment at which the tasks leaving the event node can start. When there are several incoming tasks, take the largest calculated value
Earliest time of a node = earliest time of the preceding node + duration of the incoming task
Latest time: the tasks leaving the node must start by this moment, otherwise the project cannot finish on schedule. When there are several outgoing tasks, take the smallest calculated value
Latest time of a node = latest time of the node the outgoing task points to - duration of the outgoing task
Slack time: how much leeway a task has without delaying the overall schedule; it is written beneath the task
Slack time of a task = latest time of the node at the arrow's head - duration of the task - earliest time of the node at the arrow's tail
Critical path: the path from the start node to the end node on which every task has slack time 0
Advantages of the PERT chart: it gives the start and end times of tasks, the dependencies between tasks, and the critical path
Disadvantages of the PERT chart: it cannot show the parallel relationships between tasks
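The earliest time, latest time, and slack calculations above can be sketched as a forward pass and a backward pass over the graph. The function `pert` and the five-event example network are made up for illustration; tasks are given as (tail event, head event, duration), and the graph is assumed to be acyclic with a single end node.

```python
from collections import defaultdict


def pert(tasks):
    """Return (earliest, latest, slack) for a PERT network.

    tasks: list of (from_event, to_event, duration) arrows.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    events = set()
    for u, v, d in tasks:
        succ[u].append((v, d))
        pred[v].append((u, d))
        events.update((u, v))

    # Topological order: an event is ready once all incoming tasks are seen.
    indegree = {e: len(pred[e]) for e in events}
    ready = sorted(e for e in events if indegree[e] == 0)
    order = []
    while ready:
        e = ready.pop(0)
        order.append(e)
        for v, _ in succ[e]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Forward pass: largest (earliest of predecessor + task duration).
    earliest = {}
    for e in order:
        earliest[e] = max((earliest[u] + d for u, d in pred[e]), default=0)

    # Backward pass: smallest (latest of successor - task duration);
    # the end node's latest time equals its own earliest time.
    finish = max(earliest.values())
    latest = {}
    for e in reversed(order):
        latest[e] = min((latest[v] - d for v, d in succ[e]), default=finish)

    slack = {(u, v): latest[v] - d - earliest[u] for u, v, d in tasks}
    return earliest, latest, slack


example_tasks = [(1, 2, 3), (1, 3, 2), (2, 4, 4), (3, 4, 6), (4, 5, 2)]
earliest, latest, slack = pert(example_tasks)
critical_tasks = [task for task, s in slack.items() if s == 0]
```

In this example the project takes 10 time units, and the tasks with zero slack form the critical path 1 → 3 → 4 → 5.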
• Project Activity Diagram
Similar to a PERT chart: the vertices represent milestones, the edges linking vertices represent activities, and the numbers on the edges give the time each activity requires
Except that the graphics and naming are different from the PERT chart, other calculations are basically similar
• Software configuration management
The main goals of software configuration management: change identification, change control, version control, ensuring correct implementation of changes, change reporting
Software configuration management content: version management, configuration support, change support, process support, team support, change reporting, audit support
Software configuration management content (second edition): software configuration identification, change management, version control, system build, configuration audit, configuration status reporting
The configuration database is divided into three categories: development library, controlled library, product library
• Risk management
Characteristics of software risk: uncertainty (the risk may or may not occur) and loss (if it occurs, it brings unwanted consequences)
Classification of Software Risks
Project risk: threatens the project plan by delaying the schedule and increasing cost. Examples: uncertainty in budget, schedule, personnel, resources, and in project complexity, size, and structure
Technical risk: threatens the quality and delivery time of the software. Examples: design, implementation, interface, and maintenance problems
Business risk: threatens the viability of the software
• Risk identification
Systematically point out the threats to the project plan (estimates, schedule, resource allocation, etc.). Having identified the known and predictable risks, the project manager should avoid them where possible and control them when necessary
One way to identify risks is to build a "risk item checklist"
Risk item checklist format: List the relevant characteristics of each type, and finally give a set of risk factors and driving factors and probability of occurrence, risk factors include performance, cost, support, progress
• Risk prediction
Also known as risk estimation, it is evaluated through two aspects, 1. The probability of risk occurrence 2. The consequences of risk occurrence
Risk Prediction Activities:
Establish a scale or standard to reflect the possibility of risk occurrence
Describe the consequences of the risk
Estimate the impact of risks on projects and products
Label the accuracy of risk predictions to avoid misinterpretation
Risk Prediction Techniques: Building a Risk Table
Three factors affect the consequences of risk: the nature, scope, and timing of the risk
Overall risk exposure = probability of risk occurrence * cost brought to the project by risk occurrence
Risk assessment: A useful technique for risk assessment is to define risk reference levels
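The risk exposure formula can be applied to a small risk table as follows. This is only a sketch: the risks, probabilities, and costs are invented numbers, and ranking by exposure is one common way to decide which risks to handle first.

```python
def risk_exposure(probability, cost):
    """Risk exposure = probability of occurrence * cost if it occurs."""
    return probability * cost


# Hypothetical risk table: (risk, probability, cost to the project)
risk_table = [
    ("Requirements change late", 0.5, 10000),
    ("Key developer leaves", 0.25, 40000),
    ("Third-party component delayed", 0.125, 8000),
]

# Rank risks from highest to lowest exposure.
ranked = sorted(
    ((name, risk_exposure(p, c)) for name, p, c in risk_table),
    key=lambda item: item[1],
    reverse=True,
)
```

A low-probability risk can still rank first when its cost is high, as the second row shows.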
• Risk control
The purpose of risk control: to assist the project team to establish strategies to deal with risks
Risk avoidance: the best way to deal with risks is to actively avoid risks
Risk Monitoring: Project managers should monitor certain factors that can provide an indication of whether risks are becoming lower or higher
RMMM Plan: All risk analysis work is documented and used by the project manager as part of the overall project plan
Risk mitigation is a problem-avoidance activity, and risk monitoring is a project-tracking activity
Another task of risk monitoring is to trace problems back to their "origin"
• Software quality
ISO/IEC 9126 software quality model: consists of three layers: quality characteristics, quality sub-characteristics, and quality metrics
Functionality: those functions that satisfy stated or implied needs
Suitability: suitability for the relevant software attributes
Accuracy: being able to get correct results
Interoperability: the ability to interact with other systems
Compliance: Comply with relevant standards and regulations
Security: Avoid unauthorized access, and accidental access
Reliability: the ability of software to maintain a level of performance over time
Maturity: the frequency with which software faults cause failures
Fault Tolerance: Maintaining a specified level of performance in the event of software errors or violations of specified interfaces
Recoverability: the ability to recover after a failure
Usability: the effort needed to learn and use the software
Understandability
Learnability
Operability
Efficiency: the performance level of the software and the resources it uses
Time behavior: response and processing time
Resource behavior: amount of resources used
Maintainability
Analyzability: the cost of diagnosing defects
Changeability: the cost of correcting defects
Stability: the ability to avoid the risk of unexpected effects from modifications
Testability
Portability
Adaptability: the effort needed to move the software to different environments
Installability: the cost of installing the software in a specified environment
Consistency: the software behaves consistently after installation in different environments
Replaceability
• Software review (rarely tested)
Design quality: the design specifications meet the user's requirements
Program Quality: The program is correctly executed according to the conditions stipulated in the specification
Design quality review content:
Evaluate whether the software specifications meet user requirements
Review reliability, that is, whether input exceptions can be avoided
Review the implementation of confidentiality measures
Review performance implementation
Review software for modifiability, extensibility, interchangeability, and portability
Review software testability
Review software reusability
Program quality review content: Review from the perspective of developers, directly related to development technology, focusing on the structure of the software itself
Functional structure
versatility of function
Module Hierarchy
Module structure: the correspondence between control flow structure, data flow structure, module structure and functional structure
Structure of the process
Interface with the operating environment: interface with hardware, interface with users
The central activity in completing a quality assessment is the technical review, which aims to uncover quality issues
• Software fault tolerance technology
Minimize the impact of unavoidable errors
Definition of Fault Tolerant Software
Software that can mask its own errors
Software that can recover, to some extent, from an error state to a normal state
Software that still completes its intended function, to some extent, after a fault occurs
Software that has some degree of fault tolerance
The general method of fault tolerance: the main means of achieving fault tolerance is redundancy, that is, resources beyond those strictly needed to implement the system's functions
• Four types of redundancy technology
Structural redundancy: static redundancy, dynamic redundancy, hybrid redundancy
Information redundancy: a part of information added to detect or correct errors in information operation or transmission
Temporal redundancy: repeated execution of a program or instruction to eliminate the effects of transient errors
Redundancy support techniques:
Additional technologies for shielding hardware errors: 1. Redundant storage of key programs and data. 2. Detection, voting, switching, reconstruction, error correction, recalculation
Additional technologies for shielding software errors: 1. Storage and recall of redundant backup programs. 2. Implement error checking and error recovery. 3. Firmware required to implement fault-tolerant software
• Software tools
software development tools
Requirements Analysis Tool
Design Tools
Coding and Debugging Tools
test tools
Software Maintenance Tool
version control tool
Document Analysis Tool
Development Repository Tool
reverse engineering tools
reengineering tool
• Other
High-quality documentation characteristics: pertinence, precision, clarity, completeness, flexibility, traceability
The three elements of software engineering: methods, tools, and processes
Less complex software in the field of data processing is suitable for a structured development approach
Software debugging method:
Trial method: guess where the problem is, and obtain error clues through output statements
Backtracking method: starting from the location where the problem was found, trace the code backwards along the control flow of the program
Binary search method: locate the problem by repeatedly halving the range in which the error can lie
Induction method: collect correct and incorrect data, analyze the relationship between them, and propose hypothetical causes of the error
Deductive method: list all possible causes of the error, then eliminate them one by one through testing
Chapter 13 Data Structures and Algorithms
• Complexity
Big O notation: the number of times the basic operation is executed (its frequency) is used as the measure of the algorithm's running time. Generally only the order of magnitude needs to be estimated
O(1) < O(log2 n) < O(n) < O(nlog2 n) < O(n^2) < O(n^3) < O(n!) < O(n^n)
Constant order < logarithmic order < linear order < linear logarithmic order < square order < cubic order < factorial order < n^n order
Complexity calculation rules: for a sum of terms, keep only the highest-order term; for a product of terms, keep every factor; for mixed sums and products, apply both rules following the usual order of operations; replace every constant coefficient with 1
Time complexity is determined mainly by the loops
Space complexity depends on whether new storage, such as an array, is allocated
• Asymptotic notation
O(g(n)): the asymptotic upper bound. 10n^2+4n+2 = O(n^2) holds, because the order in the parentheses of an upper bound must be greater than or equal to the order of the expression on the left
Ω(g(n)): the asymptotic lower bound. 10n^2+4n+2 = Ω(n^3) does not hold, because the order in the parentheses of a lower bound must be less than or equal to the order of the expression on the left
Θ(g(n)): the asymptotically tight bound. 10n^2+4n+2 = Θ(n^3) does not hold, because the order in the parentheses of a tight bound must equal the order of the expression on the left
• Time and space complexity of recursion
Time complexity of recursion = number of recursions * time complexity of each recursion
Recursive space complexity: If there is a variable declaration assignment in the recursion, it is equivalent to an array whose length is the number of recursions
Master method for recurrences:
If the problem gives a recurrence of the form T(n) = aT(n/b) + f(n), the following method can be tried
For example, find the complexity of T(n) = 2T(n/2) + n lg n
Matching it against the form gives a = 2; b = 2; f(n) = n lg n
If f(n) contains a lg factor, apply f(n) = Θ(n^(log_b a) lg^k n). Substituting the values gives n lg n = Θ(n^(log_2 2) lg^k n), so k = 1. Then apply T(n) = Θ(n^(log_b a) lg^(k+1) n), which gives T(n) = Θ(n lg^2 n)
If f(n) contains no lg factor and grows polynomially slower than n^(log_b a), then T(n) = Θ(n^(log_b a)) directly; for example, T(n) = 4T(n/2) + n gives Θ(n^2)
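As a second worked instance of the lg rule, here is a recurrence chosen purely for illustration (it does not come from the exam material):

```latex
T(n) = 4T(n/2) + n^{2}\lg n,\qquad a = 4,\ b = 2,\ n^{\log_b a} = n^{\log_2 4} = n^{2}
f(n) = n^{2}\lg n = \Theta\!\left(n^{\log_b a}\,\lg^{k} n\right) \;\Rightarrow\; k = 1
T(n) = \Theta\!\left(n^{\log_b a}\,\lg^{k+1} n\right) = \Theta\!\left(n^{2}\lg^{2} n\right)
```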
• Linear table
Linear relationship: a data relationship with a single predecessor and successor, the elements are arranged one after the other
Linear table: the simplest and most common linear data structure, usually expressed as (a1,a2,…an)
Linear table features: the first element and the last element are each unique; the first element has only a successor, the last element has only a predecessor, and every other element has exactly one predecessor and one successor
Linear table sequential storage: a group of contiguous storage units stores the data of the linear table in order, that is, logically adjacent elements are also physically adjacent
Advantages: Random access to elements in the table, high query efficiency
Disadvantages: insertion and deletion must move elements and are therefore inefficient. For a table of length n, an insertion moves n/2 elements on average and a deletion moves (n-1)/2 elements on average
Time complexity of inserting elements in the sequence table: O(1) for inserting at the end; O(n) for inserting at the front; average complexity O(n)
Time complexity of deleting elements in the sequence table: O(1) for deleting the last element; O(n) for deleting the first element; average complexity O(n)
Time complexity of finding elements: the element is accessed directly by its array subscript, so it is O(1)
• Linear table chain storage
Data elements are stored in nodes linked by pointers; each node is divided into a data field and a pointer field. The node addresses need not be contiguous, and node space is allocated only when needed
If each node has only one pointer field, it is called a linear linked list or a singly linked list
Head node: stores no data element (although it can store the length of the linked list), only the address of the first node of the linked list
Head pointer, tail pointer: a tail pointer gives direct access to the tail node, which changes the time complexity of operations at the tail
The time complexity of inserting elements in chain storage: O(1) at the first position; O(n) at the last position; average complexity O(n)
The time complexity of deleting elements in chain storage: O(1) at the first position; O(n) at the last position; average complexity O(n)
The time complexity of finding elements in chain storage: O(1) for the first position; O(n) for the last position; average complexity O(n)
Circular singly linked list: based on the singly linked list, the pointer of the tail node points back to the head node; the time complexity is the same as that of the singly linked list
Double-linked list: each node pointer not only points to the subsequent node, but also points to the predecessor node, that is, a node knows the address of the previous node and the address of the next node
• stack
Definition of stack: A linear data structure that can only store and retrieve data by accessing one end of it
The stack is modified on a last in, first out (first in, last out) basis. The end where insertion and deletion take place is called the top of the stack, the other end is called the bottom of the stack, and a stack without elements is called an empty stack
Understanding: the stack can be imagined as a cup: first in, last out, similar to the way recursive calls execute
Chained storage of the stack: a stack that uses a linked list as its storage structure, also known as a linked stack. No head node is needed; the head pointer of the linked list is the top-of-stack pointer
• queue
Definition of queue: a first-in-first-out linear table that only allows insertion of values at one end of the table and deletion of elements at the other end
Sequential queue: For queues that use sequential storage, you need to set the queue head pointer and queue tail pointer
Circular queue: solves the false overflow and out-of-bounds insertion of a sequential queue. Only the queue head and queue tail pointers need to change, which avoids moving elements the way insertion into a linear table does
• Queue chain storage
Double-ended queue: entry and exit can be performed at both ends
Two stacks can simulate a queue; two queues can also simulate a stack, although less efficiently
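A minimal sketch of a queue simulated with two stacks, using Python lists as the stacks. The class name `TwoStackQueue` and its method names are illustrative, not taken from the exam material.

```python
class TwoStackQueue:
    """FIFO queue built from two LIFO stacks (Python lists used as stacks)."""

    def __init__(self):
        self._in = []    # receives newly enqueued items
        self._out = []   # holds items in reversed order, ready to be dequeued

    def enqueue(self, item):
        self._in.append(item)

    def dequeue(self):
        if not self._out:
            # Pour the input stack into the output stack; this reverses the
            # order, so the oldest item ends up on top.
            while self._in:
                self._out.append(self._in.pop())
        if not self._out:
            raise IndexError("dequeue from empty queue")
        return self._out.pop()
```

Each item is pushed and popped at most twice, so a sequence of n operations costs O(n) in total.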
• String
A string is a special linear table whose data elements are characters, that is, a finite sequence of characters, for example: 'abc'
Empty string: has length zero and contains no characters
Substring: a sequence of consecutive characters of any length within the string. 'ab' is a substring of 'abc', but 'ac' is not
String comparison: when comparing two strings, the ASCII code value of the character is used as the basis
String pattern matching: finding the position of a substring in the main string, the same effect as a.indexOf(b) in JS
Complexity of basic (brute-force) matching: main string length n, substring length m
Best case (the match succeeds at the first position): complexity O(m)
Worst case (the match succeeds only at the last m characters): complexity O(n*m), from (n-m+1)*m comparisons
Average complexity: O(n+m)
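The best and worst cases above can be checked with a small sketch of brute-force matching that also counts character comparisons. The function name `naive_index` is illustrative.

```python
def naive_index(main, pattern):
    """Brute-force matching.

    Returns (index of the first match or -1, number of character comparisons).
    """
    n, m = len(main), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if main[start + j] != pattern[j]:
                break
            j += 1
        if j == m:          # every character of the pattern matched
            return start, comparisons
    return -1, comparisons
```

For example, matching 'aab' against 'aaaab' only succeeds at the last three characters and needs (5-3+1)*3 = 9 comparisons.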
• String pattern matching KMP algorithm
String prefix: a substring containing the first character but not the last character
String suffix: a substring containing the last character but not the first character
KMP: It can improve the pattern matching efficiency of strings, and the time complexity is: O(n+m)
Calculating the KMP next values: the next value of the i-th character = (the length of the longest prefix of the string before the i-th character that is equal to a suffix of that string) + 1, where next[1] = 0
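The next values can be computed by transcribing the definition directly. This sketch favours readability over speed (it is not the usual linear-time construction), and the function name `kmp_next` is illustrative.

```python
def kmp_next(pattern):
    """Return the next values for characters 1..m of the pattern (1-based)."""
    m = len(pattern)
    nxt = [0] * (m + 1)                 # nxt[0] is unused; nxt[1] stays 0
    for i in range(2, m + 1):
        before = pattern[:i - 1]        # the string before the i-th character
        best = 0
        # Try every proper prefix length and keep the longest prefix that
        # equals the suffix of the same length.
        for length in range(1, len(before)):
            if before[:length] == before[-length:]:
                best = length
        nxt[i] = best + 1
    return nxt[1:]
```

For the pattern 'abaabcac' this gives 0 1 1 2 2 3 1 2.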
• One-dimensional array
LOC: indicates the first address of the first element; L: indicates the size of each element
Address of the element ai in the array (i counted from 0): ai = LOC + i*L
• Two-dimensional array
A two-dimensional array is stored either row first (the second row is stored consecutively after the first row) or column first (the second column is stored consecutively after the first column)
LOC: indicates the first address of the first element; L: indicates the size of each element; N: the number of rows; M: the number of columns;
Address of the element in row i, column j = LOC + (number of elements stored before it) * L
Row-first storage: LOC + (i*M + j) * L
Column-first storage: LOC + (j*N + i) * L
When N == M and i == j, the row-first and column-first addresses are the same, and so are the offsets
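The two formulas translate directly into code. In this sketch the function names are illustrative and i and j are counted from 0.

```python
def row_major_address(loc, i, j, cols, size):
    """Address of a[i][j] when rows are stored one after another (M = cols)."""
    return loc + (i * cols + j) * size


def column_major_address(loc, i, j, rows, size):
    """Address of a[i][j] when columns are stored one after another (N = rows)."""
    return loc + (j * rows + i) * size
```

With LOC = 1000, N = 3, M = 4 and L = 2, the element in row 1, column 2 is at 1012 in row-first storage and at 1014 in column-first storage.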
• Symmetric matrix
Every element in the matrix satisfies Ai,j = Aj,i
The main diagonal divides the matrix into an upper triangular area and a lower triangular area
When storing, you only need to store the lower triangle + the main diagonal, generally in a one-dimensional array
Stored by row (i and j counted from 0, positions counted from 1): when i >= j, Ai,j is stored at position (i+1)i/2 + j + 1; when i < j, because the matrix is symmetric about the main diagonal, compute the position of Aj,i instead
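A short sketch of this compressed storage. The function names are illustrative; i and j start at 0 and positions start at 1, which is the convention under which the formula (i+1)i/2 + j + 1 holds.

```python
def packed_position(i, j):
    """1-based position of A[i][j] in the packed one-dimensional array."""
    if i < j:
        i, j = j, i                     # symmetry: use A[j][i] instead
    return i * (i + 1) // 2 + j + 1


def pack_lower(matrix):
    """Store the lower triangle plus the main diagonal row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i + 1)]
```

An n-by-n symmetric matrix then needs only n(n+1)/2 stored values.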
• Tridiagonal matrix
Only the main diagonal and the diagonals immediately on either side of it hold values; all other elements are 0
When storing, only the values in this middle band are stored, not the 0 positions, and they are kept in a one-dimensional array
Store by row: Ai,j is stored at position 2i+j+1
• sparse matrix
The matrix is very large, but there are very few non-zero elements stored
Compressed storage method: use triple sequence table to store [i, j, data] [row, column, value]
Another Compression Method: Cross Linked List
• tree
A very important non-linear structure, an element can have zero or more successor elements
A tree is a finite set of n nodes. When n=0, it is called an empty tree; a non-empty tree has exactly one root node.
Sibling nodes: nodes with the same parent
Degree of a node: the number of subtrees of the node
Leaf nodes: terminal nodes, nodes with no children, nodes with degree 0
Internal nodes: branch nodes, nodes whose degree is not 0
Level of a node: the root is on the first level, its children are on the second level, and so on
Tree height: the maximum level of any node in the tree, also called the depth of the tree
The degree of the tree: the maximum value of the degrees of all nodes in the tree
• Properties of the tree
Total number of nodes in the tree = sum of degrees of all nodes + 1
In a tree of degree m, the i-th level has at most m^(i-1) nodes; the maximum is reached when every node on the levels above has m children
A tree of height h and degree m has at most (m^h - 1)/(m-1) nodes; the maximum is reached when every node on the first h-1 levels has m children
The minimum height of a tree with n nodes and degree m is log_m (n(m-1) + 1), rounded up; the minimum height is reached when every level is as full as possible
• Binary tree
A finite set of n nodes: when n=0 it is an empty tree; otherwise it consists of a root node and two disjoint binary trees, called the left subtree and the right subtree
Difference Between Tree and Binary Tree
The subtree in the binary tree is divided into the left subtree and the right subtree, even if there is only one subtree, the left and right must be distinguished
The maximum degree of a node in a binary tree is 2
• Properties of binary trees
The i-th level of a binary tree has at most 2^(i-1) nodes; this is the tree formula with degree m = 2 substituted in
A binary tree of height h has at most 2^h - 1 nodes; in the maximal case, the number of nodes on each level is the total number of nodes on all previous levels + 1
For any binary tree, the number of leaf nodes (degree 0) = the number of nodes with degree 2 + 1; this follows from "the total number of nodes in the tree = the sum of the degrees of all nodes + 1"
The height of a complete binary tree with n nodes is ⌊log2 n⌋ + 1, or equivalently ⌈log2 (n+1)⌉
• full binary tree
A binary tree with a height of k, if there are 2^k -1 nodes, it is a full binary tree, which can be numbered from top to bottom and from left to right
• complete binary tree
Except for the last layer, all other layers are "full", and the nodes of the last layer are also placed in order from left to right; in this case, each node in the complete binary tree can correspond to the full binary tree of the same depth
• Catalan number
Number of distinct binary trees with n nodes: C(2n, n)/(n+1), where C(n, m) = n!/(m!(n-m)!)
• Sequential storage of binary trees
Use a set of consecutive storage units to store the nodes in the binary tree
Relationship between tree node and number i
Find the parent node: if i=1, the node is the root and has no parent; if i>1, the parent of the node is numbered ⌊i/2⌋
Find the left child node: if 2i<=n, the left child of the node is numbered 2i; otherwise there is no left child node
Find the right child node: if 2i+1<=n, the right child of the node is numbered 2i+1; otherwise there is no right child node
Sequential storage suits a complete binary tree best; for an ordinary binary tree, many "virtual nodes" have to be stored to keep these numbering relationships
Single-branch tree: except for the leaf node, every node has degree 1
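The numbering relationships for sequential storage can be written as three small helpers. The names are illustrative; nodes are numbered from 1 and n is the number of nodes.

```python
def parent(i):
    """Number of the parent of node i, or None for the root."""
    return None if i == 1 else i // 2


def left_child(i, n):
    """Number of the left child of node i, or None if it has none."""
    return 2 * i if 2 * i <= n else None


def right_child(i, n):
    """Number of the right child of node i, or None if it has none."""
    return 2 * i + 1 if 2 * i + 1 <= n else None
```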
• Binary tree chain storage
Binary linked list storage, each binary linked list node stores [the data element of the current node, the left child node pointer, the right child node pointer], if there is no corresponding child node, it will store NULL
The number of effective pointer fields in the binary linked list storage: that is, the number of effective associations in the tree structure, each child node has only one parent node (except the root node), so the effective number = total number of nodes - 1
Trinary linked list: adds a pointer field pointing to the parent node to each node of the binary linked list
• Binary tree traversal
Preorder traversal: visit in the order root, left, right
Inorder traversal: visit in the order left, root, right
Postorder traversal: visit in the order left, right, root
Level-order (hierarchical) traversal: starting from the root node, each level is visited from left to right
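A minimal sketch of the four traversals on a binary linked list. The `Node` class and the function names are illustrative.

```python
from collections import deque


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    """Visit level by level, left to right, using a queue."""
    result = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result
```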
• Restore the binary tree
A single traversal result cannot restore the tree
The first element of a preorder or level-order (hierarchical) traversal is the root node, and the last element of a postorder traversal is also the root node, so the inorder traversal combined with any one of the other traversals can restore the binary tree
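A sketch of restoring a tree from its preorder and inorder sequences, assuming all node values are distinct. The tree is returned as nested tuples (value, left, right); the function names are illustrative.

```python
def rebuild(preorder, inorder):
    """Rebuild a binary tree from preorder + inorder sequences.

    Returns nested tuples (value, left, right), or None for an empty subtree.
    Assumes every value appears exactly once.
    """
    if not preorder:
        return None
    root = preorder[0]                  # first preorder element is the root
    k = inorder.index(root)             # inorder[:k] is the left subtree
    left = rebuild(preorder[1:k + 1], inorder[:k])
    right = rebuild(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)


def postorder_of(tree):
    """Postorder sequence of a tuple tree, used to check the result."""
    if tree is None:
        return []
    value, left, right = tree
    return postorder_of(left) + postorder_of(right) + [value]
```

For preorder ABDECF and inorder DBEACF, the rebuilt tree has the postorder sequence DEBFCA.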
• Balanced binary tree
The difference between the left and right subtree heights of any node in a binary tree is no more than 1, and a complete binary tree must be a balanced binary tree
• Binary sort tree
Also called a binary search tree
Root node key: the value stored in the root node
The key of the root node is greater than the keys of all nodes in the left subtree
The key of the root node is less than the keys of all nodes in the right subtree
The left and right subtrees are also binary sorted trees, recursively
The inorder traversal of a binary sort tree is an ordered sequence
Calculation problem: a key sequence is given. The first element of the key sequence becomes the root node. Each following key is compared with the root: if it is larger it goes to the right, and if it is smaller it goes to the left. If that position is empty, the key is inserted there directly; if it is not empty, the key is compared with the node there and moves down to the next level. The remaining keys are inserted into the binary sort tree in turn in the same way
The efficiency of searching a binary sort tree depends on the number of levels searched: the more levels, the worse the efficiency
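A minimal sketch of building a binary sort tree by repeated insertion, with nodes represented as dictionaries. The first key inserted into the empty tree becomes the root; the function names are illustrative.

```python
def bst_insert(tree, key):
    """Insert key and return the root; a node is {'key', 'left', 'right'}."""
    if tree is None:
        return {"key": key, "left": None, "right": None}
    if key < tree["key"]:
        tree["left"] = bst_insert(tree["left"], key)
    elif key > tree["key"]:
        tree["right"] = bst_insert(tree["right"], key)
    return tree                          # equal keys are ignored


def bst_inorder(tree):
    if tree is None:
        return []
    return bst_inorder(tree["left"]) + [tree["key"]] + bst_inorder(tree["right"])


def bst_height(tree):
    if tree is None:
        return 0
    return 1 + max(bst_height(tree["left"]), bst_height(tree["right"]))
```

Inserting 50, 30, 70, 20, 40 gives a tree of height 3, while inserting the already sorted keys 1, 2, 3, 4 gives a single-branch tree of height 4, which is the inefficient case.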
• Optimal Binary Tree
Also known as the Huffman tree, it is a tree with the shortest weighted path length
Path: the route from one node of the tree to another
Path length: the number of branches (lines) on the path
Path length of the tree: the sum of the path lengths from the root node to every leaf node; multiplying each leaf's path length by its weight and summing gives the weighted path length of the tree
• Optimal binary tree construction
Question: build a binary tree from a set of weights (for example: {1,3,3,4})
Construction method:
Scanning from front to back, find the two smallest weights
The smaller of the two is used as the left subtree, and the larger one is used as the right subtree to construct a new binary tree. The weight of the root of this binary tree is the sum of the two
Remove the two chosen weights from the set and add the new root weight to the end of the set
Repeat the steps above until only one weight is left in the set
The ones with larger weights are closer to the root node, and the ones with smaller weights are farther away from the root node
The optimal binary tree only has nodes with degree 0 and nodes with degree 2
Total number of nodes = (number of weights * 2) - 1
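The construction can be sketched with a min-heap in place of the front-to-back scan (a substitution made here for brevity; it picks the same two smallest weights, although ties may be broken differently). The weighted path length equals the sum of the weights of all the newly created roots. The function name `huffman_wpl` is illustrative.

```python
import heapq


def huffman_wpl(weights):
    """Return (weighted path length, total number of nodes) of a Huffman tree."""
    heap = list(weights)
    heapq.heapify(heap)
    wpl = 0
    nodes = len(heap)                    # the leaves
    while len(heap) > 1:
        a = heapq.heappop(heap)          # the two smallest weights
        b = heapq.heappop(heap)
        wpl += a + b                     # weight of the new root
        nodes += 1
        heapq.heappush(heap, a + b)
    return wpl, nodes
```

For the weights {1,3,3,4} the merges produce roots 4, 7 and 11, so the weighted path length is 22 and the tree has 4*2 - 1 = 7 nodes.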
• Huffman coding
Equal-length encoding: assign every character a binary code of the same length. For example, the 26 English letters need 5 bits each, because 2^4 < 26 <= 2^5
With equal-length encoding, the receiver divides the message into groups of 5 bits and decodes each group by lookup
Huffman coding is not equal-length coding
Question: generally a string of characters is given together with the weight of each character
Draw the Huffman tree according to the weights, and label the leaf nodes with the corresponding characters
Mark every branch: the branch from a node to its left child is 0, and the branch to its right child is 1
The code of a character consists of the 0s and 1s on the path from the root node to that character's node
Huffman compression ratio: how much shorter the encoding becomes when each character goes from its equal-length code to its Huffman code
Huffman coding is based on a greedy strategy
• Threaded binary tree
In an ordinary binary tree stored as a binary linked list, some pointer fields are null; a threaded binary tree uses these null pointer fields to store the predecessor and successor information of the node
• Graph
In the graph, there may be a relationship between any two nodes, and a node may have multiple predecessors or multiple successors
A graph is denoted G(V,E), where V is a non-empty finite set of vertices and E is a finite set of edges
Directed graph: every edge has a direction; the vertex relationship is written <v1,v2>, with v1 as the start point and v2 as the end point
Undirected graph: edges have no direction; the vertex relationship is written (v1,v2)
Complete graph: a graph in which every vertex has an edge to every other vertex
Assuming that the undirected complete graph has n vertices, then the complete graph has a total of n(n-1)/2 edges
The total number of edges in a directed complete graph is n(n-1), because there are two edges between every two vertices
Degree of an undirected graph vertex: the number of edges associated with the vertex
Out-degree and in-degree of a directed graph: Out-degree – the number of edges pointing out from the vertex; in-degree – the number of edges pointing to the vertex; total degree = out-degree + in-degree
Total degree of graph = number of edges * 2
Path: the sequence of edges that leads from one vertex to another; the path length is the number of edges or arcs on the path
A path whose first vertex is the same as its last vertex is called a loop or cycle
Simple path: a path on which no vertex is repeated, except that the start point and the end point may be the same
• connected graph
Connected graph: an undirected graph in which there is at least one path between any two vertices
A connected undirected graph with n vertices has at least n-1 edges and at most n(n-1)/2
Strongly connected graph: a directed graph in which, for any two vertices, there is a path in each direction between them
A strongly connected directed graph with n vertices has at least n edges and at most n(n-1)
• Graph storage structure
Adjacency matrix notation: use a matrix to represent the relationships between the vertices in the graph. For a graph with n vertices, its adjacency matrix is of order n. An entry is 1 where there is an edge and 0 where there is none.
The adjacency matrix of an undirected graph is symmetric, but not necessarily for a directed graph
Vertex degree from the adjacency matrix of an undirected graph: the degree of vertex vi is the number of non-zero elements in the i-th row
Vertex degree from the adjacency matrix of a directed graph: the out-degree of vertex vi is the number of non-zero elements in the i-th row; the in-degree is the number of non-zero elements in the i-th column
Adjacency linked list representation: build a singly linked list for each vertex in the graph; the details depend on the specific graph, so they are not explained here
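In the adjacency linked list of a directed graph, the number of list nodes equals the number of edges; in the adjacency linked list of an undirected graph, n list nodes correspond to n/2 edges
Dense graph and sparse graph: a graph with many edges is dense, and a graph with few edges is sparse
The adjacency matrix representation suits dense graphs, and the adjacency linked list suits sparse graphs
Network: a graph whose edges or arcs carry weights is called a network; in the adjacency matrix of a network, an entry with an edge holds the weight, and an entry without an edge holds ∞ (infinity)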
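• Graph traversal
Depth-first search: from a vertex A, follow an outgoing edge to a vertex B, then from B follow an outgoing edge to a vertex C, and so on, traversing from the start of a path to its end; then backtrack and traverse another path
Time complexity of depth-first search: with n vertices and e edges, O(n^2) with adjacency matrix storage and O(n+e) with adjacency linked list storage; implemented with a stack
Breadth-first search: first visit all the vertices reached by the outgoing edges of a vertex A, then all the vertices reached by their outgoing edges, and so on, which amounts to traversing level by level
Time complexity of breadth-first search: with n vertices and e edges, O(n^2) with adjacency matrix storage and O(n+e) with adjacency linked list storage; implemented with a queue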
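A minimal sketch of both traversals on an adjacency-list graph, stored here as a Python dict that maps each vertex to the list of vertices its outgoing edges reach. The function names are illustrative.

```python
from collections import deque


def dfs(graph, start):
    """Depth-first search with an explicit stack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push neighbours in reverse so the first listed one is explored first.
        for w in reversed(graph[v]):
            if w not in visited:
                stack.append(w)
    return order


def bfs(graph, start):
    """Breadth-first search with a queue."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order
```

On the graph A->B, A->C, B->D, C->D, depth-first search from A visits A, B, D, C, while breadth-first search visits A, B, C, D.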
• Topological sort
AOV network: a kind of directed acyclic graph
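In an AOV network, the tail of an arc is the predecessor and the vertex the arc points to is the successor; the predecessor constrains the successor
Topological sort: a linear sequence of all the vertices of the AOV network such that, for any path in the network from vi to vj, vi comes before vj in the sequence
If an AOV network is the plan of a project, then a topological sort of the network is a feasible order in which the project can be completed
Topological sort procedure:
Select a vertex with in-degree 0 in the AOV network and output it
Delete that vertex and all the arcs associated with it from the network
Repeat the two steps above until no vertex with in-degree 0 remains
For example, given the topological sequence 614325, consider 6 and 4: the arc 6->4 may exist and the arc 4->6 cannot exist; a path from 6 to 4 may exist, and a path from 4 to 6 definitely does not exist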
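The procedure can be sketched as follows. Instead of physically deleting arcs, the sketch decrements the in-degree of each successor, which has the same effect. The function name and the sample arcs are illustrative, chosen so that 6 1 4 3 2 5 is the only valid order.

```python
from collections import deque


def topological_sort(vertices, arcs):
    """Return one topological order; arcs are (tail, head) pairs."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for tail, head in arcs:
        successors[tail].append(head)
        indegree[head] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()              # a vertex with in-degree 0
        order.append(v)
        for w in successors[v]:          # "delete" the arcs leaving v
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(vertices):
        raise ValueError("the graph has a cycle, so it is not an AOV network")
    return order
```

If some vertices are never output, the remaining vertices all have predecessors, which means the graph contains a cycle.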
• Search
Lookup table: a collection composed of elements of the same type, and there is a completely loose relationship between the elements of the collection
The static lookup table only performs the following two operations
Query whether a specific element is in the lookup table
Retrieve various properties of a specific element
The dynamic lookup table, in addition to the function of the static lookup table, also performs the following operations
Insert a data element into the lookup table
Delete a data element from the lookup table
Keyword (key): the value of a data item of a data element, which can be used to identify the data element
Search methods for static lookup tables: sequential search, binary search, block search
Dynamic lookup tables: binary sort tree, balanced binary tree, B-tree, hash table
The basic operation of lookup: compare the key of the record with the given value
Sequential search: search from left to right, does not need to be ordered, suitable for sequential storage and chain storage, the average search length is (n+1)/2
• Binary search
Also known as half-interval search: compare the given value with the middle element of the lookup table. The middle position is computed from the subscripts and rounded down to an integer; after the comparison, the middle element is excluded from the next round of comparison
For example, when searching for the 10th of 10 values, the middle positions are taken in sequence: (1+10)/2 => 5; (6+10)/2 => 8; (9+10)/2 => 9; (10+10)/2 => 10
Sequential storage is required, and the elements must be stored in order
When a binary search succeeds, the given value is compared at most ⌊log2 n⌋ + 1 times
The average search length of a binary search is approximately log2 (n+1) - 1
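A minimal sketch that uses 1-based positions, like the example above, and also counts the comparisons. The function name `binary_search` is illustrative, and the table is assumed to be sorted in ascending order.

```python
def binary_search(table, key):
    """Return (1-based position of key or 0 if absent, number of comparisons)."""
    low, high = 1, len(table)
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2          # rounded down
        comparisons += 1
        if table[mid - 1] == key:
            return mid, comparisons
        if table[mid - 1] < key:
            low = mid + 1                # discard mid and everything before it
        else:
            high = mid - 1               # discard mid and everything after it
    return 0, comparisons
```

Searching for the 10th of 10 values probes positions 5, 8, 9 and 10, that is 4 comparisons, which matches ⌊log2 10⌋ + 1 = 4.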
• hash table
Hash table: the storage address of a record is obtained by computing a function (the hash function) with the record's keyword as the argument
According to the chosen hash function H(key) and the method of dealing with conflicts, a set of keywords is mapped to a limited set of continuous addresses
For a hash function, when two different keywords are mapped to the same address by the hash function, this is called a conflict, and keywords with the same hash function value are called synonyms
In general, conflicts can only be reduced as much as possible, not avoided entirely. To reduce conflicts, keywords should be mapped to the storage units of the storage area as evenly as possible.
For the hash table, there are two main considerations, one is how to construct a hash function, and the other is how to resolve conflicts
Hash function construction method:
When constructing a hash function, the address is generally computed from the keyword, and every component of the keyword should contribute as much as possible
Division-remainder method: H(key) = key % m = address; the remainder of the keyword key divided by m is used as the address, where m is a prime number close to n but not greater than n, n is the length of the hash table, and addresses generally start from 0
Methods of conflict resolution
Resolving a conflict means finding another address in which to store the conflicting keyword
Linear probing: if H(key) conflicts, compute another address with Hi = (H(key) + i) % m, where i = 1, 2, 3, ...; if the new address still conflicts, increase i and compute again until there is no conflict
Quadratic probing: if H(key) conflicts, compute another address with Hi = (H(key) + di) % m, where di = 1^2, -1^2, 2^2, -2^2, ...; if the new address still conflicts, try the next di in that order until there is no conflict. Compared with linear probing, it probes back and forth on both sides of the address H(key)
Load factor: α = number of records loaded in the table / length of the hash table. α indicates how full the hash table is; the larger α is, the greater the probability of conflict
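A small sketch of the division-remainder hash function with linear probing. It assumes that the table length equals the prime m, so that a single modulus serves both formulas; the function name and the sample keys are illustrative.

```python
def insert_linear_probing(table, key):
    """Insert key into table (a list with None for empty slots).

    Returns the address finally used. Assumes len(table) == m.
    """
    m = len(table)
    home = key % m                       # H(key) = key % m
    for i in range(m):
        address = (home + i) % m         # Hi = (H(key) + i) % m
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")
```

With m = 11 and the keys 22, 41, 53, 46, 30, 13, 1, 67 inserted in that order, the keys 30, 13 and 67 conflict and land at addresses 10, 3 and 4; the load factor afterwards is 8/11.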
• heap
A sequence {k1,k2,...kn} composed of n key codes, which satisfies the following relationship, is called a heap
Small top heap: ki <= k2i && ki <= k(2i+1). Viewing the sequence as the level-order traversal of a binary tree, every local root node is no larger than its child nodes
Large top heap: ki >= k2i && ki >= k(2i+1). Viewing the sequence as the level-order traversal of a binary tree, every local root node is no smaller than its child nodes
Adjusting a sequence into a large top heap or a small top heap
First restore the sequence to a binary tree, treating it as the result of a level-order traversal
Then check upwards, starting from the leaf nodes: for a small top heap, for example, each local root node must be no larger than its child nodes, and is swapped with a child if it is not
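A minimal sketch of adjusting a sequence into a small top heap. It uses 0-based list indices, so the children of position i are 2i+1 and 2i+2 rather than 2i and 2i+1, and it starts from the last node that has a child, because leaf nodes need no adjustment. The function names are illustrative.

```python
def sift_down(seq, root, size):
    """Move seq[root] down until neither child is smaller than it."""
    while True:
        child = 2 * root + 1
        if child >= size:
            return                       # root is a leaf
        if child + 1 < size and seq[child + 1] < seq[child]:
            child += 1                   # pick the smaller child
        if seq[root] <= seq[child]:
            return
        seq[root], seq[child] = seq[child], seq[root]
        root = child


def build_min_heap(seq):
    """Adjust seq in place into a small top heap and return it."""
    for root in range(len(seq) // 2 - 1, -1, -1):
        sift_down(seq, root, len(seq))
    return seq
```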
• Sort
By sorting, the keywords satisfy the ascending or descending relationship
Stable: if Ri and Rj have the same keyword and Ri is before Rj in the original sequence, a sorting algorithm that always keeps Ri before Rj is stable; otherwise it is unstable
Homing: each pass of the sort places at least one element in its final sorted position. For example, if Ri belongs at position 3 after sorting but no pass can be relied on to put it there for good, the algorithm is non-homing
• Direct insertion sort
Start the new sequence with R1, then traverse the original sequence R and compare each Ri in turn with the keywords of the new sequence, starting from its end. If Ri is larger, it is inserted directly behind the compared keyword; if it is smaller, the comparison moves on towards the front until the insertion position is found
Direct insertion sort, stable, non-homing, average complexity O(n^2), maximum complexity O(n^2), minimum complexity O(n), space complexity O(1)
Suitable for sequences that are already basically ordered
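A minimal sketch of direct insertion sort. It works on a copy, with the front part of the list playing the role of the new sequence; the function name is illustrative.

```python
def insertion_sort(seq):
    """Return a sorted copy of seq using direct insertion sort."""
    result = list(seq)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Compare from the end of the sorted part towards the front,
        # shifting larger keywords one position back.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```

On an already ordered sequence the inner loop never runs, which is where the O(n) minimum comes from.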
• Shell sort
It is an improvement of the direct insertion sort algorithm
Method:
Set a sequence of increments, e.g. 5, 3, 1
Split the original sequence into groups according to each increment in turn. For example, when the increment is 5, the elements at positions 0, 5, and 10 form one group, the elements at positions 1, 6, and 11 form another, and so on
The elements within each group are sorted by direct insertion sort, and the sorted values are written back to their positions in the original sequence
Repeat for each increment in the sequence until the increment 1 has been processed
Shell sort: unstable, non-homing, average complexity O(n^1.3), maximum complexity O(n^2), minimum complexity O(n), space complexity O(1)
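A minimal sketch of the method above (commonly known as Shell sort) with the increments 5, 3, 1 as the default. The function name and the `increments` parameter are illustrative; the last increment must be 1 for the result to be fully sorted.

```python
def shell_sort(seq, increments=(5, 3, 1)):
    """Return a sorted copy of seq; the last increment must be 1."""
    result = list(seq)
    for gap in increments:
        # Insertion sort within each group of elements that are gap apart.
        for i in range(gap, len(result)):
            current = result[i]
            j = i - gap
            while j >= 0 and result[j] > current:
                result[j + gap] = result[j]
                j -= gap
            result[j + gap] = current
    return result
```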
• counting sort
Suitable for sorting with a small amount of data
Count how many times each value occurs among the numbers to be sorted, then output each value that many times in order
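A minimal sketch of counting sort, assuming the keys are integers. The function name is illustrative.

```python
def counting_sort(seq):
    """Return a sorted copy of a list of integers."""
    if not seq:
        return []
    low, high = min(seq), max(seq)
    counts = [0] * (high - low + 1)      # one counter per possible value
    for value in seq:
        counts[value - low] += 1
    result = []
    for offset, count in enumerate(counts):
        result.extend([low + offset] * count)
    return result
```

The counter array has one slot per possible value, so the extra space grows with the range of the values.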
• Simple selection sort
Starting from the first position, compare the keywords after the current position with each other in turn, select the smallest one, and swap it into the current position; repeat until the last position
Simple selection sort: unstable, homing, average complexity O(n^2), maximum complexity O(n^2), minimum complexity O(n^2), space complexity O(1)
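A minimal sketch of simple selection sort on a copy of the input. The function name is illustrative.

```python
def selection_sort(seq):
    """Return a sorted copy of seq using simple selection sort."""
    result = list(seq)
    for i in range(len(result) - 1):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        # After this swap, position i holds its final value (homing).
        result[i], result[smallest] = result[smallest], result[i]
    return result
```

The inner loop always scans the whole remaining part, which is why even the minimum complexity is O(n^2).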
• Heap sort
First treat the original sequence as the level-order traversal of a binary tree; then build a large root heap or a small root heap; swap the root of the heap with the last element of the heap, then rebuild the heap from the remaining elements; repeat until the entire heap has become the sorted sequence
Heap sort: unstable, homing, average complexity O(nlog2 n), maximum complexity O(nlog2 n), minimum complexity O(nlog2 n), space complexity O(1)
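The steps above can be sketched with a max-heap stored in the list itself, where the children of index i sit at 2i+1 and 2i+2. The helper names `sift_down` and `heap_sort` are assumptions made for illustration.

```python
def sift_down(a, root, end):
    """Restore the max-heap property for the subtree at root within a[:end]."""
    while True:
        child = 2 * root + 1            # left child in level order
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1                  # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)
    # Build the max-heap, starting from the last non-leaf node.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Move the current maximum to the end, then rebuild the smaller heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

A max-heap yields ascending order; using a min-heap in the same way would yield descending order.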
• Bubble sort
Starting from the first position of the original sequence, compare each ki with k(i+1) and swap them if ki is larger. When the pass reaches the end, the last position is guaranteed to hold the largest key. Repeat the pass over the remaining unsorted part until the whole sequence is sorted
Bubble sort: stable, homing; average complexity O(n^2), worst-case complexity O(n^2), best-case complexity O(n); space complexity O(1)
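A minimal sketch of the passes described above. The early exit when a pass makes no swap is an assumption added here, since it is what makes the O(n) best case possible; the function name `bubble_sort` is hypothetical.

```python
def bubble_sort(a):
    """Sort list a in place in ascending order and return it."""
    for last in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(last):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        # After this pass a[last] holds its final value (homing).
        if not swapped:
            break       # no swap means the sequence is already sorted
    return a
```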
• Quick Sort
Based on the idea of divide and conquer: one partitioning pass divides the records to be sorted into two parts (a front part and a back part) such that no key in the front part is greater than any key in the back part; quick sort is then applied to each of the two parts recursively
Quick sort is least efficient on a sequence that is already basically sorted (when the first record is taken as the pivot), where it reaches its worst-case time complexity of O(n^2)
Quick sort: unstable, homing; average complexity O(nlog2 n), worst-case complexity O(n^2), best-case complexity O(nlog2 n); space complexity O(log2 n)
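The partition-then-recurse idea can be sketched as follows. Taking the first record as the pivot is an assumption of this sketch (it is the choice that makes nearly sorted input the worst case); the names `partition` and `quick_sort` are hypothetical.

```python
def partition(a, low, high):
    """Place the pivot a[low] in its final position and return that index."""
    pivot = a[low]
    while low < high:
        # Scan from the right for a key smaller than the pivot.
        while low < high and a[high] >= pivot:
            high -= 1
        a[low] = a[high]
        # Scan from the left for a key larger than the pivot.
        while low < high and a[low] <= pivot:
            low += 1
        a[high] = a[low]
    a[low] = pivot
    return low


def quick_sort(a, low=0, high=None):
    """Sort a[low..high] in place in ascending order and return a."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quick_sort(a, low, p - 1)       # front part: keys <= pivot
        quick_sort(a, p + 1, high)      # back part: keys >= pivot
    return a
```

Each partition fixes the pivot in its final position, which is why quick sort counts as homing; the recursion stack accounts for the extra space.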
• Merge sort
Based on the idea of divide and conquer: split the sequence in two, split each half in two again, and so on recursively until every piece has length 1. Then work from the bottom up: merge the pieces pairwise into sorted runs of two items, merge those into sorted runs of four, and continue upward until the outermost level is a single sorted sequence
Merge sort: stable, non-homing; average complexity O(nlog2 n), worst-case complexity O(nlog2 n), best-case complexity O(nlog2 n); space complexity O(n)
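A minimal recursive sketch of the split-and-merge process described above. The names `merge` and `merge_sort` are assumptions, and this version returns a new list, which reflects the O(n) extra space.

```python
def merge(left, right):
    """Merge two ascending lists into one ascending list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left run on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(a):
    """Return a new list with the elements of a in ascending order."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```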
• Backtracking
N Queens problem: given an N * N chessboard, place N queens on it so that no two queens share the same row, column, or diagonal
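Same-column check: col(Qi) == col(Qj); same-diagonal check: |row(Qi) - row(Qj)| == |col(Qi) - col(Qj)|
Solving the N Queens problem in code
Non-recursive (loop, iteration)
Recursive
Depth-first
This is mainly tested as a C-language programming question in the afternoon paper; skipped here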
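As an illustration of the column and diagonal checks above, here is a minimal recursive depth-first sketch. It is written in Python for brevity rather than the C used in the exam, and the names `conflicts` and `solve_n_queens` are assumptions made for the example. One queen is placed per row, so the row check is implicit.

```python
def conflicts(cols, row, col):
    """True if a queen at (row, col) attacks any queen already placed.

    cols[r] is the column of the queen in row r.
    """
    for r, c in enumerate(cols):
        if c == col or abs(r - row) == abs(c - col):
            return True
    return False


def solve_n_queens(n):
    """Return all solutions; each is a list of column indices, one per row."""
    solutions = []
    cols = []

    def place(row):
        if row == n:                    # every row has a queen
            solutions.append(list(cols))
            return
        for col in range(n):
            if not conflicts(cols, row, col):
                cols.append(col)        # try this column
                place(row + 1)          # go one row deeper
                cols.pop()              # backtrack

    place(0)
    return solutions
```

For example, `len(solve_n_queens(8))` gives 92, the well-known number of solutions on the standard board.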
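• Divide and conquer
Implemented with recursion
Recursion means a procedure calling itself, directly or indirectly. It has two basic elements: 1. a boundary condition (the recursion exit), 2. a recursive pattern (the recursion body)
Basic idea of divide and conquer: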
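The smaller the problem, the less time it takes to solve and the easier it is to handle
Break a large problem that is hard to solve into smaller problems of the same kind, so that they can be defeated one by one: divide and conquer
The three steps of a divide-and-conquer algorithm:
1. Divide: break the original problem into subproblems
2. Solve: solve each subproblem recursively
3. Merge: combine the solutions of the subproblems into the solution of the original problem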
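The three steps can be seen in a very small example: finding the largest element of a list. This is only a sketch to show the pattern; the function name `max_divide_conquer` is hypothetical, and the list is assumed to be non-empty.

```python
def max_divide_conquer(a, low, high):
    """Return the largest element of a[low..high] (inclusive)."""
    if low == high:                     # boundary condition: one element
        return a[low]
    mid = (low + high) // 2             # 1. divide into two halves
    left = max_divide_conquer(a, low, mid)          # 2. solve each half
    right = max_divide_conquer(a, mid + 1, high)
    return left if left >= right else right         # 3. merge the answers
```

Calling `max_divide_conquer(data, 0, len(data) - 1)` covers the whole list.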
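• Dynamic programming
Similar to divide and conquer: the basic idea of both is to break the problem into a number of subproblems, solve the subproblems, and then obtain the solution of the original problem from the subproblem solutions
Difference: in problems suited to dynamic programming, the subproblems produced by the decomposition are usually not independent (the same subproblem appears more than once)
In practice, dynamic programming uses a table to record the answers of all subproblems solved so far; if the same subproblem comes up again later, the stored answer is looked up directly, which avoids repeated computation
Dynamic programming is usually used to solve problems that ask for some optimal property (a globally optimal solution)
Two characteristics of problems suited to dynamic programming:
Optimal substructure: the optimal solution of a problem contains the optimal solutions of its subproblems (note that greedy algorithms also have this property)
Overlapping subproblems: a recursive algorithm for the original problem would solve the same subproblems over and over; instead, each subproblem is solved only once, saved in a table, and looked up when needed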
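The table-lookup idea can be shown on a deliberately simple case with overlapping subproblems, the Fibonacci numbers. This is not an optimization problem, so it illustrates only the "solve once, store, look up" part; the function name `fib` and its `table` parameter are assumptions made for the example.

```python
def fib(n, table=None):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if table is None:
        table = {}                      # answers of solved subproblems
    if n <= 1:
        return n
    if n not in table:
        # Solve each subproblem only once and record the answer.
        table[n] = fib(n - 1, table) + fib(n - 2, table)
    return table[n]
```

Without the table, plain recursion would recompute the same values an exponential number of times; with it, each value is computed once.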
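• 0-1 knapsack problem
Problem statement: there are n items; item i has value vi and weight wi; the knapsack has capacity W. How should the items be packed so that the total value in the knapsack is maximized?
0-1 means that each item is either packed whole or not packed at all
Code implementation: skipped here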
Knapsack problem time complexity: O(n * W); space complexity: O(n * W)
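For reference, a minimal dynamic programming sketch for the 0-1 knapsack problem that uses an (n+1) by (W+1) table, which matches the O(n * W) time and space figures. The function name `knapsack_01` and its parameter names are assumptions made for the example, and the weights and capacity are assumed to be non-negative integers.

```python
def knapsack_01(values, weights, capacity):
    """Return the maximum total value that fits in the given capacity."""
    n = len(values)
    # best[i][c]: best value using only the first i items with capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]             # leave item i out
            if w <= c:                              # or pack it whole
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)
    return best[n][capacity]
```

With values 60, 100, 120, weights 10, 20, 30 and capacity 50, the best choice is the second and third items, for a total value of 220.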
• Matrix chain multiplication
Solved with the dynamic programming method
Time complexity: O(n^3); space complexity: O(n^2)
Calculation method:
Multiplying matrices A(m*n) and B(n*p) takes m * n * p multiplications
The product can likewise be written as AB(m*p); multiplying it by C(p*k) takes a further m * p * k multiplications
Therefore, computing (AB)C for A(m*n), B(n*p), C(p*k) takes m * n * p + m * p * k multiplications
When multiplying several matrices, a common shortcut is to multiply first the pair that eliminates the largest of the dimensions m, n, p, k. This is a rule of thumb; the dynamic programming method is what guarantees the optimal order
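A minimal dynamic programming sketch for the minimum number of multiplications, using the cost formula above. The function name `matrix_chain_cost` and the `dims` encoding (matrix i has size dims[i] by dims[i+1]) are assumptions made for the example.

```python
def matrix_chain_cost(dims):
    """Return the minimum number of scalar multiplications for the chain.

    Matrix i has dims[i] rows and dims[i + 1] columns.
    """
    n = len(dims) - 1                   # number of matrices
    if n < 1:
        return 0
    # cost[i][j]: cheapest way to multiply matrices i..j (inclusive).
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):      # chain length, shortest first
        for i in range(n - length + 1):
            j = i + length - 1
            # Try every split point k: (i..k) times (k+1..j).
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]
```

The three nested loops (length, start, split point) give the O(n^3) time, and the table gives the O(n^2) space. For matrices of sizes 10*30, 30*5 and 5*60, computing (AB)C costs 10*30*5 + 10*5*60 = 4500, against 27000 for A(BC).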
• Greedy method
The greedy method is similar to dynamic programming and is also used to solve optimization problems, but its strategy differs: at each step it makes the locally optimal choice instead of considering the overall optimum
Two characteristics of problems suitable for greedy methods:
Optimal substructure: the optimal solution of a problem contains the optimal solutions of its subproblems (note that dynamic programming also has this property)
Greedy choice property: the overall optimal solution to the problem can be reached through a series of locally optimal choices, that is, greedy choices
Fractional (partial) knapsack problem
Same as the 0-1 knapsack problem, except that an item may be packed into the knapsack partially
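A minimal greedy sketch for the fractional knapsack problem: take items in descending order of value per unit weight, and take a fraction of the first item that no longer fits. The function name `fractional_knapsack` is an assumption, and the weights are assumed to be positive.

```python
def fractional_knapsack(values, weights, capacity):
    """Return the maximum total value when items may be packed partially."""
    # Greedy choice: best value per unit of weight first.
    items = sorted(zip(values, weights),
                   key=lambda item: item[0] / item[1], reverse=True)
    total = 0.0
    remaining = capacity
    for value, weight in items:
        if remaining <= 0:
            break
        take = min(weight, remaining)   # whole item, or the part that fits
        total += value * take / weight
        remaining -= take
    return total
```

With values 60, 100, 120, weights 10, 20, 30 and capacity 50, the greedy choice packs the first two items whole and two thirds of the third, for a total of 240. The same greedy rule does not solve the 0-1 version, where the best total for this data is 220.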
• Branch and Bound
Similar to backtracking, it is also a method that searches for the solution of a problem on the problem's solution space tree T.
Used to find one solution that satisfies the constraints, or the best such solution under some objective function
The search proceeds breadth-first or least-cost-first (best-first)
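As a rough illustration of the best-first variant, here is a sketch that applies branch and bound to the 0-1 knapsack problem. The choice of problem, the bounding function (a fractional-knapsack estimate), and the name `knapsack_branch_and_bound` are all assumptions made for the example; weights are assumed to be positive.

```python
import heapq


def knapsack_branch_and_bound(values, weights, capacity):
    """Return the best 0-1 knapsack value using best-first branch and bound."""
    # Order items by value density so the bound is easy to compute.
    items = sorted(zip(values, weights),
                   key=lambda item: item[0] / item[1], reverse=True)
    n = len(items)

    def bound(level, value, weight):
        """Optimistic estimate: fill the remaining room, fractions allowed."""
        room = capacity - weight
        estimate = value
        for v, w in items[level:]:
            if w <= room:
                room -= w
                estimate += v
            else:
                estimate += v * room / w
                break
        return estimate

    best = 0
    # Heap entries: (-bound, level, value, weight); largest bound pops first.
    heap = [(-bound(0, 0, 0), 0, 0, 0)]
    while heap:
        neg_bound, level, value, weight = heapq.heappop(heap)
        if -neg_bound <= best or level == n:
            continue                    # prune: cannot beat the best so far
        v, w = items[level]
        if weight + w <= capacity:      # branch 1: pack this item
            best = max(best, value + v)
            heapq.heappush(heap, (-bound(level + 1, value + v, weight + w),
                                  level + 1, value + v, weight + w))
        # Branch 2: leave this item out.
        heapq.heappush(heap, (-bound(level + 1, value, weight),
                              level + 1, value, weight))
    return best
```

Replacing the priority queue with a plain FIFO queue would give the breadth-first variant; the bound is what lets either variant discard branches without exploring them.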