Summary of Unity Development Experience (1): C# Fundamentals

1. C# Language Basics

1. Value types and reference types in C#

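Value types (structs, enums and the built-in numeric types) derive from System.ValueType. A local value-type variable usually lives on the stack, or inline inside the object that contains it. Creating one therefore needs no heap allocation and is fast. A struct is a value type.

Reference types (classes) are allocated on the managed heap and derive from System.Object. Every new of a reference type is a heap allocation. It shows up as GC Alloc in the Unity Profiler and must eventually be reclaimed by the garbage collector.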
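The practical difference shows up on assignment: assigning a struct copies the value, while assigning a class copies only the reference. The sketch below demonstrates this. The type names are illustrative and do not come from the original project.

```csharp
// Illustrative types: identical fields, one value type and one reference type.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class ValueVsReference
{
    // Assigning a struct copies the whole value, so 'b' is independent of 'a'.
    public static int StructCopy()
    {
        var a = new PointStruct { X = 1 };   // local struct: no heap allocation
        var b = a;
        b.X = 2;
        return a.X;                           // still 1
    }

    // Assigning a class copies only the reference; both variables see one heap object.
    public static int ClassCopy()
    {
        var a = new PointClass { X = 1 };    // heap allocation (GC Alloc in the Profiler)
        var b = a;
        b.X = 2;
        return a.X;                           // now 2
    }
}
```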

2. ArrayList and List

ArrayList stores its elements as object, so it can hold anything. The cost is that adding or reading value types involves boxing and unboxing. Boxing converts a value type into a reference type (object) by copying it into a new heap object. Unboxing converts it back into a value type.

List<T> is generic, avoids boxing for value types, and performs better. It is the recommended choice.

ArrayList and List<T> are both backed by a contiguous array. Don't confuse them with LinkedList<T>, whose nodes are allocated separately.
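The sketch below sums the same integers through both containers, and the results are identical. The difference is in allocation. Every Add into ArrayList boxes an int into a new heap object, and every read unboxes it. List<int> stores the ints directly in its backing array.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int SumArrayList(int n)
    {
        var list = new ArrayList();
        for (int i = 0; i < n; i++) list.Add(i);   // each int is boxed into a heap object
        int sum = 0;
        foreach (object o in list) sum += (int)o;  // each element is unboxed
        return sum;
    }

    public static int SumList(int n)
    {
        var list = new List<int>(n);
        for (int i = 0; i < n; i++) list.Add(i);   // stored directly in an int[]; no boxing
        int sum = 0;
        foreach (int v in list) sum += v;
        return sum;
    }
}
```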

3. To delete elements while looping through a List, the common approach is to traverse it in reverse order. In a forward index loop, removing element i moves every later element one slot toward the front. The index then advances to i + 1, so the element that just moved into slot i is skipped. A reverse loop does not have this problem, because a removal only shifts elements that have already been visited.
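Here is a minimal sketch of the reverse-order deletion described above. The helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ListRemoval
{
    // Removes every element matching 'shouldRemove' by walking from the end.
    // RemoveAt(i) only shifts elements after i, and those were already checked.
    public static void RemoveWhereReverse<T>(List<T> list, Predicate<T> shouldRemove)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(list[i]))
                list.RemoveAt(i);
        }
    }
}
```

When you only need to filter, List<T>.RemoveAll(predicate) does the same job in a single O(n) pass. Also note that removing from a List<T> inside a foreach does not silently skip elements: it throws an InvalidOperationException.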

4. Make good use of data structures. For example, we keep our player list in two structures at once: a List for traversal and a Dictionary for lookup.

When queries are frequent, use a HashSet or Dictionary for membership checks and lookups (O(1) on average) instead of scanning a List (O(n)). This greatly improves performance.

At the data sizes of a typical game, LinkedList is almost never needed. Even when elements are inserted and deleted, List is generally faster, unless the list holds hundreds of objects.
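The two-structure player list can be sketched as follows. This assumes players are keyed by a numeric ID. PlayerInfo and PlayerRegistry are hypothetical names, not the project's actual classes.

```csharp
using System.Collections.Generic;

// Hypothetical player record for illustration.
public class PlayerInfo
{
    public long Id;
    public string Name;
}

public class PlayerRegistry
{
    private readonly List<PlayerInfo> _list = new List<PlayerInfo>();                  // ordered traversal
    private readonly Dictionary<long, PlayerInfo> _byId = new Dictionary<long, PlayerInfo>(); // O(1) lookup

    public IReadOnlyList<PlayerInfo> All => _list;

    public bool Add(PlayerInfo p)
    {
        if (_byId.ContainsKey(p.Id)) return false;   // keep both structures in sync
        _byId.Add(p.Id, p);
        _list.Add(p);
        return true;
    }

    public PlayerInfo Find(long id) => _byId.TryGetValue(id, out var p) ? p : null;

    public bool Remove(long id)
    {
        if (!_byId.TryGetValue(id, out var p)) return false;
        _byId.Remove(id);
        _list.Remove(p);   // O(n), acceptable because removals are rarer than lookups
        return true;
    }
}
```

The trade-off is a little extra memory and the discipline of updating both structures together. In exchange, iteration stays cache-friendly and lookups avoid linear scans.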

2. Threads, coroutines and processes

(1) Thread

1. Multi-threading makes good use of multiple cores and improves execution efficiency.

2. Computation on a worker thread does not block the main thread. Complex calculations can therefore be moved to worker threads to reduce frame hitches.

3. Some blocking code, such as downloading and socket communication, needs to run on worker threads.

4. Worker threads have many restrictions: most Unity APIs cannot be called from them.

5. Multi-threaded code requires careful thread synchronization. A little carelessness can lead to deadlocks, logic errors, or even crashes caused by thread-unsafe access.

6. If a large number of threads would be created, a thread pool can be used instead. This is common on servers and less common on clients.

7. Too many threads can cause frequent context switching. The system saves the state of one thread, runs another, and then restores the first thread's state to continue it. This is an expensive operation and should be avoided where possible.
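Here is a minimal sketch of moving CPU-heavy work off the main thread. It assumes plain .NET Task.Run, which queues the work onto the thread pool rather than creating a new thread each time. The prime count is a hypothetical stand-in for a real calculation, and the worker must not touch Unity APIs.

```csharp
using System.Threading.Tasks;

public static class WorkerThreadSketch
{
    // Hypothetical CPU-heavy job: counts primes below 'limit' by trial division.
    // Pure computation only, so it is safe to run off the main thread.
    public static int CountPrimes(int limit)
    {
        int count = 0;
        for (int n = 2; n < limit; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) count++;
        }
        return count;
    }

    // Task.Run schedules the work on a thread-pool thread and returns immediately.
    public static Task<int> CountPrimesInBackground(int limit) =>
        Task.Run(() => CountPrimes(limit));
}
```

One option in a MonoBehaviour is to keep the returned Task and check IsCompleted in Update. Only once it is complete, read Result and apply it on the main thread, so the main thread never blocks waiting.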

(2) Coroutine

1. The advantage of coroutines is that asynchronous logic can be written as sequential code, avoiding complex callback code.

2. Although a coroutine runs asynchronously, it executes on the main thread. It therefore still cannot do work that would freeze the main thread, such as instantiating a large number of objects at once. Such work should be split across multiple frames.

3. Calling StartCoroutine directly still has some overhead, so coroutines should be consolidated where possible, for example with a CoroutineManager.
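The sketch below illustrates point 2 in plain C#, so it runs outside Unity. The iterator spreads a hypothetical "spawn" job across frames by yielding after each batch. RunFrames is a simplified stand-in for Unity's coroutine scheduler that calls MoveNext once per frame. It is not Unity's actual implementation.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CoroutineSketch
{
    // Spawns 'total' objects, 'perFrame' at a time, yielding between batches.
    public static IEnumerator SpawnInBatches(int total, int perFrame, List<int> spawned)
    {
        for (int i = 0; i < total; i++)
        {
            spawned.Add(i);                  // stand-in for Instantiate(prefab)
            if ((i + 1) % perFrame == 0)
                yield return null;           // in Unity: resume on the next frame
        }
    }

    // Simplified scheduler: one MoveNext per simulated frame. Returns frames used.
    public static int RunFrames(IEnumerator routine)
    {
        int frames = 0;
        bool running = true;
        while (running)
        {
            frames++;                        // one simulated frame
            running = routine.MoveNext();    // resume until the next yield
        }
        return frames;
    }
}
```

With 10 objects and a batch size of 3, the work spans 4 frames instead of stalling a single frame. Inside Unity, the same iterator would be passed to StartCoroutine.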

(3) Process

1. Mutex can be used as a cross-process mutual-exclusion lock. When a piece of logic must only be executed by one process at a time, a named Mutex can control it.
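Here is a minimal sketch using System.Threading.Mutex with a name, which makes it visible to other processes on the same machine. The mutex name and the TryAcquire helper are hypothetical.

```csharp
using System.Threading;

public static class SingleInstanceGuard
{
    // Tries to create and own a named mutex. Returns it if this caller became
    // the owner, or null if the named mutex already existed (someone else holds it).
    public static Mutex TryAcquire(string name)
    {
        bool createdNew;
        var mutex = new Mutex(true, name, out createdNew);   // initiallyOwned = true
        if (createdNew) return mutex;
        mutex.Dispose();                                      // we did not get ownership
        return null;
    }
}
```

A caller that gets a non-null result holds the lock until it calls ReleaseMutex from the same thread, or until the process exits. A null result means another holder already has it.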

(4) lock

1. Syntax: lock (obj) { ... }

2. For example, in network communication the main thread keeps reading a buffer while a worker thread keeps writing to it. If the reads are not locked, they can race with the worker thread and produce wrong results or corrupted data.

3. A Lua state is not thread-safe. Suppose a worker thread calls into a Lua state created by the main thread, for example to call a Lua function. The Lua stack can be corrupted, which is very likely to crash the game. A mistake we made before was notifying Lua of network errors directly from the socket.BeginReceive callback thread. This is very dangerous.
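One common way to apply points 2 and 3 together is a lock-protected queue. The socket callback thread only enqueues messages. The main thread drains them once per frame and is the only thread that calls into Unity or Lua. MainThreadQueue is a hypothetical helper, not part of Unity.

```csharp
using System;
using System.Collections.Generic;

public class MainThreadQueue<T>
{
    private readonly object _sync = new object();
    private List<T> _pending = new List<T>();   // written by worker threads under the lock
    private List<T> _draining = new List<T>();  // only touched by the main thread

    // Safe to call from any thread, e.g. a socket receive callback.
    public void Enqueue(T item)
    {
        lock (_sync)
        {
            _pending.Add(item);
        }
    }

    // Call once per frame on the main thread; 'handle' may call Unity or Lua.
    public int Drain(Action<T> handle)
    {
        lock (_sync)
        {
            // Swap buffers so the lock is held only for the swap, not while handling.
            var tmp = _pending;
            _pending = _draining;
            _draining = tmp;
        }
        int count = _draining.Count;
        foreach (var item in _draining) handle(item);
        _draining.Clear();
        return count;
    }
}
```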

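(5) JobSystem

1. The Job System runs multi-threaded code scheduled by Unity, which is more efficient. Unity's job safety checks detect data races, so you largely don't have to manage thread safety yourself.

2. IJob (the basic job interface)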

(6) ECS

1. Entity–Component–System

An Entity holds Components.

A Component is a container for data.

A System processes Components. It can be understood as a collection of functions and holds no object state of its own.

2. Overwatch uses an ECS architecture, which made it convenient to implement battle rollback.

3. ComponentSystem, NativeArray

4. Unity's ECS lays data out contiguously in memory, which improves CPU cache hit rates and therefore performance.
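The following is a toy sketch of the Entity-Component-System split in plain C#, not Unity's ECS API. Entities are just indices, and components are plain structs stored in contiguous lists. The system is a static function that iterates over the component data. The type names are illustrative.

```csharp
using System.Collections.Generic;

// Components: plain data, no behavior.
public struct Position { public float X, Y; }
public struct Velocity { public float X, Y; }

// A toy world: an entity is only an index into parallel component lists.
public class ToyWorld
{
    public readonly List<Position> Positions = new List<Position>();
    public readonly List<Velocity> Velocities = new List<Velocity>();

    public int CreateEntity(Position p, Velocity v)
    {
        Positions.Add(p);
        Velocities.Add(v);
        return Positions.Count - 1;   // the entity ID
    }
}

// System: a function over component data; it holds no per-entity state.
public static class MovementSystem
{
    public static void Update(ToyWorld world, float dt)
    {
        for (int i = 0; i < world.Positions.Count; i++)
        {
            var p = world.Positions[i];
            var v = world.Velocities[i];
            p.X += v.X * dt;
            p.Y += v.Y * dt;
            world.Positions[i] = p;   // write back: the List indexer returns a copy
        }
    }
}
```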

(7) DOTS

1. DOTS = ECS + JobSystem + Burst

3. Others

1. XML

1.1. There are two ways to parse XML: the document object model (DOM) and the streaming model.

1.2. DOM code is relatively simple to write and allows random access to any content in the XML. The disadvantage is the higher cost: the entire XML file must be loaded into memory at once.

1.3. The streaming model is read-only and can only read the XML sequentially. Its advantage is good performance.
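The sketch below shows both models with the standard System.Xml classes. XmlDocument loads the whole tree and allows arbitrary XPath queries. XmlReader walks the document forward-only. The item/name schema is a made-up example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlSketch
{
    // DOM: parse the whole document into memory, then query it in any order.
    // 'id' is assumed to be trusted input (it is spliced into the XPath).
    public static string ReadNameWithDom(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode($"/items/item[@id='{id}']");
        return node?.Attributes["name"]?.Value;
    }

    // Stream: forward-only and read-only; only the current node is in memory.
    public static List<string> ReadAllNamesWithReader(string xml)
    {
        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                    names.Add(reader.GetAttribute("name"));
            }
        }
        return names;
    }
}
```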

2. JSON

2.1. The advantage is fast parsing. The downside is that comments are not supported.

2.2. To parse JSON configuration, we mostly use Unity's JsonUtility API directly. It is fast, but its format is fixed: it cannot parse into arbitrary structures such as a Dictionary.

2.3. When we need to handle arbitrary structures returned by the server, we use LitJson or MiniJson.

3. YAML

3.1. Unity serializes its text assets as YAML: prefabs, materials, .unity scene files and so on. Setting Asset Serialization to Force Text in the editor forces these assets to be serialized as text. Otherwise they may be stored in binary.

4. AES encryption

4.1. The Advanced Encryption Standard (AES) is a popular symmetric encryption algorithm: encryption and decryption use the same secret key. DES is also common, but it is relatively slow and its 56-bit key is no longer considered secure, so it is not recommended.

4.2. The client encrypts plaintext into ciphertext using the secret key and an initialization vector (IV). After receiving the ciphertext, the server decrypts it back into plaintext with the same key and IV.

4.3. AES is very fast. Both our network messages and our Lua files are encrypted with AES.
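Here is a minimal sketch of the key + IV flow in 4.2 using .NET's System.Security.Cryptography.Aes. The original does not say which cipher mode or padding is used, so CBC with PKCS7 is assumed. How the key and IV are shared between client and server is outside the sketch.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class AesSketch
{
    // Assumed settings: CBC mode, PKCS7 padding, UTF-8 plaintext.
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;                 // 16, 24 or 32 bytes: AES-128/192/256
            aes.IV = iv;                   // 16 bytes, the AES block size
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] data = Encoding.UTF8.GetBytes(plainText);
                return encryptor.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    // The receiver needs the same key, IV, mode and padding.
    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.IV = iv;
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            using (var decryptor = aes.CreateDecryptor())
            {
                byte[] data = decryptor.TransformFinalBlock(cipherText, 0, cipherText.Length);
                return Encoding.UTF8.GetString(data);
            }
        }
    }
}
```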

4.4. There are also asymmetric encryption algorithms, where the key is split into a public key and a private key. The most common is RSA. Its advantage is security: it is extremely difficult to crack. Its disadvantage is that it is much slower, so ordinary games rarely use RSA. However, similar certificate-based encryption is used by push notification servers, HTTPS servers and other places. The Git user authentication we use every day also relies on this kind of key pair. Our public key is configured on the server, and the private key stays on the local machine. The private key signs, and the server verifies the signature with the public key.
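The sketch below shows the sign/verify pattern described above, using .NET's RSA class. It assumes SHA-256 and PKCS#1 v1.5 padding, since the original names no parameters. Only the public parameters reach the verifier, mirroring a server that stores just the public key.

```csharp
using System.Security.Cryptography;

public static class RsaSignSketch
{
    // The private-key holder signs the data.
    public static byte[] Sign(RSA privateKey, byte[] data) =>
        privateKey.SignData(data, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

    // Anyone with only the public parameters can check the signature.
    public static bool Verify(RSAParameters publicKey, byte[] data, byte[] signature)
    {
        using (var rsa = RSA.Create())
        {
            rsa.ImportParameters(publicKey);
            return rsa.VerifyData(data, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        }
    }
}
```

Use key.ExportParameters(false) to obtain the public half without the private exponent. Changing even one byte of the data makes verification fail.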

5. LZ4 compression

5.1. Common compression algorithms include Deflate (zlib, zip), LZMA (7z) and LZ4. Other algorithms largely overlap with these three in their characteristics, so they are not covered here.

5.2. zlib is a very common compression library that almost every game engine and game uses. It implements the Deflate algorithm. Common archive formats such as zip and tar.gz mostly use this algorithm.

As a side note, PNG also uses this algorithm. In fact, zlib was originally written for PNG. PNG is a lossless format, and its image data is ultimately compressed with zlib.
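Here is a round-trip sketch with .NET's built-in DeflateStream. LZ4 and LZMA are not part of the .NET base library, so Deflate is used instead. Note that DeflateStream writes raw Deflate data without zlib's header and checksum.

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateSketch
{
    public static byte[] Compress(byte[] data)
    {
        var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal))
        {
            deflate.Write(data, 0, data.Length);
        } // disposing flushes the final block (and closes 'output')
        return output.ToArray();   // ToArray still works on a closed MemoryStream
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new DeflateStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```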

5.3. LZMA, the format used by 7z, has a very high compression ratio. The disadvantage is that compression and decompression are relatively slow. LZMA is one of the compression options for Unity AssetBundles (ab packages); before Unity 5.0 there were only two options, LZMA and uncompressed. The drawback of LZMA for AssetBundles is that the whole bundle is decompressed in memory. If a bundle is 100 MB, decompressing it takes at least 100 MB of memory.

5.4. LZ4. Its advantage is that it is extremely fast, especially at decompression, which can be several to dozens of times faster than zlib. It is not commonly used as an archive format, but it is widely used in games. After encrypting our Lua files, we compress them with LZ4. Unity's ChunkBasedCompression AssetBundle option uses LZ4. Besides speed, its advantage is that loading a bundle costs almost no memory beyond the file header. Only the chunks belonging to the resources actually used are decompressed. The design of our resource management module relies heavily on this feature.

Without LZ4, one alternative would be to build AssetBundles uncompressed and zip them into the APK when packaging. On the game's first launch, they are extracted to a writable directory. Many games used to take this approach, mainly for loading speed and memory usage.

6. Character encoding

6.1. ASCII. Codes 0-127 represent common English letters, digits and so on.

6.2. ANSI standards. ASCII has only 128 codes, or 256 including the extended codes, which obviously cannot represent all the world's writing systems.

China's ANSI-style encoding standard is GB2312. It contains more than 7,000 Chinese characters and symbols.

GBK is fully compatible with GB2312 and contains more than 20,000 Chinese characters.

GB18030 further expands the Chinese characters in GBK and adds characters for minority languages such as Tibetan and Mongolian.

6.3. The Unicode standard. Under ANSI, each country's encoding assigns different meanings to the same byte values. Unicode, in contrast, is a single worldwide standard that covers the writing systems of the whole world.

6.4. MBCS (Multi-Byte Character Set). In a multi-byte character set such as GBK, ASCII characters occupy one byte and the vast majority of Chinese characters occupy two bytes.

6.5. Code page. This is also a very common concept. Setting the code page tells the operating system which country's or region's ANSI encoding to use. For example, on Windows code page 65001 means UTF-8 and code page 936 means GBK.

6.6. UTF-8, UTF-16, UTF-32

Unicode is a character set standard. Several encodings exist that specify how its characters are stored as bytes on a computer.

UTF-8 is variable-length and compatible with ASCII. Chinese characters generally take 3 bytes, and any character takes at most 4 bytes.

UTF-16 uses 2 or 4 bytes per character. Commonly used characters, such as ASCII and most Chinese characters, take 2 bytes.

UTF-32 uses a fixed 4 bytes per character, so each character maps directly to its Unicode code point.
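The Encoding class offers a quick way to check the byte counts above. C# strings are themselves UTF-16, so string.Length counts UTF-16 code units. A character outside the Basic Multilingual Plane, such as an emoji, has Length 2. This is the surrogate-pair case discussed under 6.8.

```csharp
using System.Text;

public static class EncodingSizes
{
    // GetByteCount does not include a BOM/preamble.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);  // Encoding.Unicode = UTF-16LE
    public static int Utf32Bytes(string s) => Encoding.UTF32.GetByteCount(s);
}
```

For example, "A" takes 1/2/4 bytes in UTF-8/16/32, and the Chinese character "\u4E2D" takes 3/2/4. The emoji "\U0001F600" takes 4 bytes in all three encodings.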

6.7. On Windows, wchar_t is 2 bytes and represents one UTF-16 code unit, which means Windows adopts the UTF-16 scheme.

Many Windows APIs come in pairs, such as SetWindowTextA and SetWindowTextW, which are the ANSI and Unicode versions respectively.

In a Chinese-language environment, the default character set of Windows is GBK. For example, file names passed to the ANSI APIs are expected to be GBK-encoded. If the code passes UTF-8 encoded names, the created files may have garbled names. This is common when moving files between Windows and Mac.

Similarly, the default code page of the cmd window is 936, so cmd output is GBK-encoded, while Jenkins uses UTF-8 by default. When Jenkins executes a .bat file on Windows, the output may therefore be garbled. The fix is to run chcp 65001 to switch the code page to UTF-8 before executing the bat commands.

6.8. On Linux, the default character set is UTF-8, and wchar_t is 4 bytes (UTF-32), so one wchar_t always holds one complete character. With the UTF-16 scheme chosen by Windows, most characters fit in 2 bytes, and characters that need 4 bytes are very rare. When such a character appears, it is stored as 2 wchar_t (a surrogate pair), and most functions, such as those that compute string length, give wrong results. Because these characters are extremely rare, this causes few problems in practice.

Linux uses UTF-8 for ordinary strings, and most Chinese characters take 3 bytes in UTF-8. There was therefore no reason to follow Windows in using a 2-byte wchar_t that covers most text, so wchar_t on Linux uses 4 bytes instead.

6.9. Handling character encodings in C# is relatively simple: just use the Encoding class. However, to convert Chinese text from UTF-8 to GBK in Unity, you need to copy several DLLs into the project and mark them in link.xml so they are not stripped. After that, the conversion works normally:

I18N.dll, I18N.West.dll, I18N.CJK.dll

Our code obtains the converter with Encoding.GetEncoding(936). Nothing references these DLLs directly, so unless they are listed in link.xml they are stripped as unused.

6.10. Finally, the character encoding we use with Lua is UTF-8. When C# calls a C/C++ interface, CharSet can be specified to indicate the character encoding of the string parameters being passed, for example:

[DllImport(LUADLL, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Unicode)]

7. Byte order

7.1. Byte order (endianness) describes how a type that occupies more than one byte is stored in memory.

Little endian: the low-order byte is stored at the lowest memory address. Common PC and mobile platforms are all little endian.

Big endian: the high-order byte is stored at the lowest address. Many embedded platforms are big endian.

7.2. Assume memory addresses grow from left to right, from low to high.

For example, the carriage return \r in UTF-16 is 0x000D. The 00 is the high-order byte, the more significant one, just as the 1 in 102 is the most significant digit.

The little-endian memory layout is 0D 00 and the big-endian layout is 00 0D, which matches the definitions above.

This is useful when inspecting the binary encoding of a file: the left-to-right display order in a hex viewer corresponds to increasing memory addresses.

7.3. UTF-16 uses 2 bytes per code unit, which can be stored big endian or little endian. A marker is therefore added at the start of the file to indicate its byte order: the BOM (Byte Order Mark).

FE FF indicates UTF-16BE, a big-endian file.

FF FE indicates UTF-16LE, a little-endian file.

UTF-32 has its own BOMs, but they are rarely used. These two UTF-16 marks are the most widely used.
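The sketch below makes the layouts in 7.2 and 7.3 visible using Encoding.Unicode (UTF-16LE) and Encoding.BigEndianUnicode (UTF-16BE). BitConverter.ToString prints bytes in memory order, left to right.

```csharp
using System;
using System.Text;

public static class ByteOrderSketch
{
    // Hex string of bytes in memory order, e.g. "0D-00".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    // "\r" (U+000D) encoded as UTF-16 little endian and big endian.
    public static string CarriageReturnLE() => Hex(Encoding.Unicode.GetBytes("\r"));           // "0D-00"
    public static string CarriageReturnBE() => Hex(Encoding.BigEndianUnicode.GetBytes("\r"));  // "00-0D"

    // The BOM each encoding writes at the start of a file.
    public static string BomLE() => Hex(Encoding.Unicode.GetPreamble());            // "FF-FE"
    public static string BomBE() => Hex(Encoding.BigEndianUnicode.GetPreamble());  // "FE-FF"

    // How this machine stores a 2-byte integer: "0D-00" on little-endian CPUs.
    public static string NativeLayout() => Hex(BitConverter.GetBytes((ushort)0x000D));
}
```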

7.4. UTF-8 also has a BOM, EF BB BF, but it is mainly a Microsoft convention. UTF-8, like GBK, has no byte order problem, because both are byte-oriented multi-byte encodings. That is why some languages, such as Go, did not support UTF-8 source files with a BOM.

7.5. Our Lua files are UTF-8 without a BOM, while our C# and C++ source files are UTF-8 with a BOM.

We use UTF-8 because VSCode and Sublime open files as UTF-8 by default. GBK source files must be converted first, otherwise they display as garbled text. GBK source files also appear garbled on Mac, so UTF-8 is the more common choice.

Our Lua files have no BOM because, although LuaJIT accepts Lua files with a BOM, native Lua 5.1 does not. If we use native Lua, the BOM must therefore be removed before loading the code: if the first three bytes of the file are EF BB BF, drop them.
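Here is a minimal sketch of the BOM check described above. How the resulting bytes are handed to Lua, for example through a custom loader, depends on the Lua binding and is not shown. The helper name is hypothetical.

```csharp
using System;

public static class BomSketch
{
    // Returns the bytes without a leading UTF-8 BOM (EF BB BF); otherwise returns them unchanged.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        {
            var result = new byte[bytes.Length - 3];
            Buffer.BlockCopy(bytes, 3, result, 0, result.Length);
            return result;
        }
        return bytes;
    }
}
```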

Our C# and C++ files have a BOM because the tools we use to view and compile them are Microsoft's. Without a BOM, Microsoft's compiler may misinterpret the UTF-8 file and report errors when it encounters Chinese text.

8. Line breaks

Although this topic is simple, it comes up constantly and has some pitfalls, so I mention it separately.

8.1. The carriage return \r has ASCII code 13 (0x0D).

The line feed \n has ASCII code 10 (0x0A).

8.2. Windows uses \r\n as its line ending, written CRLF.

Linux uses \n, written LF.

Classic Mac OS (before OS X) used \r, written CR. Modern macOS uses LF, like Linux.

Because line endings differ between systems, a text file edited on Linux may show no line breaks when opened in some Windows text editors. VSCode can also change the line ending of the current file.
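Here is a small sketch for normalizing line endings in C#, for example before comparing or writing text files. The helper name is hypothetical. The regex tries CRLF first, so a CRLF pair is not split into two line breaks.

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Converts CRLF, lone CR and lone LF all to 'newline' (LF by default).
    public static string Normalize(string text, string newline = "\n") =>
        Regex.Replace(text, @"\r\n|\r|\n", newline);
}
```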

8.3. By default, Git may convert line endings on checkout and normalize them on commit. This often causes pitfalls. When we set up the development environment, we therefore configure Git to turn off autoCrlf, so that code is checked out and committed exactly as it is.

Sometimes a new repository is created without turning off autoCrlf. Later, after copying files or similar operations, Git may report that a file has changed even though nobody modified it, and the diff shows no visible changes. The content of the file has not changed, but its line endings have.

8.4. Because we all develop on Windows, the line endings of our code and configuration files are uniformly CRLF.

8.5. Unity lets you set the line endings for new scripts in Project Settings > Editor. The default seems to be LF.


Origin blog.csdn.net/s10141303/article/details/127305571