Lua Study Notes (Introductory): Strings (Part 1)

   A string in Lua can contain a single letter or an entire book. Programs that handle strings of 100K or 1M characters are not unusual in Lua.

   In Lua, a string is a sequence of bytes. The Lua core does not care how these bytes encode text. Lua is eight-bit clean: a string can contain bytes with any numeric value, including embedded zeros. This means that we can store arbitrary binary data in a string. We can also store Unicode strings in any representation (UTF-8, UTF-16, etc.); however, as discussed later, there are several good reasons to use UTF-8 whenever possible. The standard string library assumes one-byte characters, but it handles UTF-8 strings quite reasonably. Furthermore, since version 5.3, Lua ships with a small library for working with the UTF-8 encoding.
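   As a small illustration of the "eight-bit clean" point, the following sketch stores a zero byte and a non-ASCII byte in a string. The variable name is only for illustration, and the \x escape assumes Lua 5.2 or later.

```lua
-- A string holding arbitrary bytes, including an embedded zero.
-- (\x escapes assume Lua 5.2 or later.)
local s = "a\0b\xFF"

print(#s)             --> 4
print(s:byte(1, -1))  --> 97  0  98  255 (tab-separated)
```

   The zero byte does not terminate the string, unlike in C.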

 


    

   Strings in Lua are immutable. We cannot change a single character inside a string, as we can in C. Instead, we create a new string with the desired modifications:

a = "one string"
b = string.gsub(a, "one", "another")  -- replace part of the string
print(a)    --> one string
print(b)    --> another string

   

   Strings in Lua are subject to automatic memory management, like all other Lua objects (tables, functions, etc.). This means that we do not have to worry about allocating and freeing strings; Lua handles it for us.


 

  • Getting the length of a string

   We can get the length of a string with the length operator (written as "#").

a = "hello"
print(#a)             --> 5
print(#"good bye")    --> 8

  

   The string.len() function can also be used:

a = "hello"
print(string.len(a))           --> 5
print(string.len("good bye"))  --> 8

 

   Both approaches give the same result.

   The result is always the length in bytes, which may differ from the actual number of characters, depending on the encoding.
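   A minimal sketch of the byte/character difference, assuming Lua 5.3 or later (for the \u escape and the utf8 library) and a UTF-8 encoded string:

```lua
-- "héllo": the letter é (U+00E9) takes two bytes in UTF-8.
local s = "h\u{e9}llo"

print(#s)           --> 6  (bytes)
print(utf8.len(s))  --> 5  (characters)
```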


 

  • String concatenation

   We can concatenate two strings with the concatenation operator ".." (two dots). If one of the operands is a number, Lua converts it to a string before concatenating:

print("Hello " .. "World")  --> Hello World

print("result is " .. 3)    --> result is 3

   (Some languages use the plus sign to concatenate strings, but in Lua 3 + 5 and 3 .. 5 are different.)
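   To make the difference concrete, here is a short sketch. Note the spaces around ".." when the operands are numeric literals; without them Lua would try to read "3..5" as a malformed number.

```lua
print(3 + 5)         --> 8
print(3 .. 5)        --> 35
print(type(3 .. 5))  --> string
```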

 

   Remember that strings in Lua are immutable. Concatenation always creates a new string and never modifies its operands:

a = "Hello"
print(a .. " World")  --> Hello World
print(a)              --> Hello

 


  • String literals

   We can delimit a literal string with matching single or double quotes; the content is whatever lies between the quotes:

a = "a line"

b = 'another line'

   The two forms are completely equivalent; the only difference is that inside a string delimited by single quotes, double quotes can be used without escaping, and vice versa.

   As a matter of style, most programmers always use the same kind of quotes for the same kind of strings, where the "kinds" of strings depend on the program. For example, a program that manipulates XML might reserve single-quoted strings for XML fragments, because those fragments often contain double quotes.

   Strings in Lua support the following C-style escape sequences:
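   A short sketch of the XML case mentioned above; the variable names are only for illustration. Both literals produce the same string, but the single-quoted one needs no escapes:

```lua
local single = '<a href="http://www.lua.org">Lua</a>'
local double = "<a href=\"http://www.lua.org\">Lua</a>"

print(single == double)  --> true
```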

 

   \a    bell
   \b    backspace
   \f    form feed
   \n    newline
   \r    carriage return
   \t    horizontal tab
   \v    vertical tab
   \\    backslash
   \"    double quote
   \'    single quote

 

   Some examples of escape sequences:

print("one line\nnext line\n\"in quotes\", 'in quotes'")
--> one line
--> next line
--> "in quotes", 'in quotes'

print('a backslash inside quotes: \'\\\'')
--> a backslash inside quotes: '\'

print("a simpler way: '\\'")
--> a simpler way: '\'

   (Enough to make your head spin...)

  

   We can also specify a character in a literal string by its numeric value, using the escape sequences \ddd and \xhh, where ddd is a sequence of up to three decimal digits and hh is exactly two hexadecimal digits.

   As a somewhat artificial example, the two literals "ALO\n123\"" and '\x41LO\10\04923"' have the same value on a system using ASCII: 0x41 (65 in decimal) is the code for A, 10 is the code for newline, and 49 is the code for the digit 1. (Here 49 must be written with three digits, as \049, because it is followed by another digit; otherwise Lua would read the escape as \492.)

   We could also write the same string as '\x41\x4c\x4f\x0a\x31\x32\x33\x22', representing each character by its hexadecimal code.

   (My translation may be off; the original reads as follows:

We can specify a character in a literal string also by its numeric value through the escape sequences \ddd and \xhh, where ddd is a sequence of up to three decimal digits and hh is a sequence of exactly two hexadecimal digits. As a somewhat artificial example, the two literals "ALO\n123\"" and '\x41LO\10\04923"' have the same value in a system using ASCII: 0x41 (65 in decimal) is the ASCII code for A, 10 is the code for newline, and 49 is the code for the digit 1. (In this example we must write 49 with three digits, as \049, because it is followed by another digit; otherwise Lua would read the escape as \492.) We could also write that same string as '\x41\x4c\x4f\x0a\x31\x32\x33\x22', representing each character by its hexadecimal code.)
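   The equivalence of the three spellings can be checked directly. A minimal sketch, assuming Lua 5.2 or later (for the \x escape) and illustrative variable names:

```lua
local a = "ALO\n123\""
local b = '\x41LO\10\04923"'
local c = '\x41\x4c\x4f\x0a\x31\x32\x33\x22'

print(a == b)  --> true
print(b == c)  --> true
print(#a)      --> 8
```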

  

   Since Lua 5.3, we can also specify UTF-8 characters with the escape \u{h...h}; any number of hexadecimal digits can be written inside the braces to give the character's code point:

print("\u{3b1} \u{3b2} \u{3b3}")  --> α β γ

   (The example above assumes a UTF-8 terminal.)
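   To see what the \u escape actually stores, the sketch below inspects the bytes of a single character. It assumes Lua 5.3 or later; the variable name is only for illustration.

```lua
-- U+03B1 (Greek alpha) is encoded in UTF-8 as the two bytes 0xCE 0xB1.
local alpha = "\u{3b1}"

print(#alpha)                     --> 2
print(alpha == "\xCE\xB1")        --> true
print(utf8.codepoint(alpha))      --> 945
print(alpha == utf8.char(0x3b1))  --> true
```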

 


  • Long strings

   We can also delimit a literal string with matching double square brackets, just as we do with multi-line comments.

   A string written this way can span multiple lines, and escape sequences inside it are not interpreted. In addition, if the first character of the string is a newline, that newline is ignored. This form is very convenient for writing long strings, for example:

page = [[
<html>
    <head>
        <title>An HTML Page</title>
    </head>
    <body>
        <a href="http://www.lua.org">Lua</a>
    </body>
</html>
]]
io.write(page)
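   The two rules above (the leading newline is dropped, and escapes are not interpreted) can be seen in a small sketch; the variable name is only for illustration:

```lua
local s = [[
line one
line two\n]]

-- The newline right after [[ is skipped, and the trailing \n stays
-- as two literal characters: a backslash followed by the letter n.
print(s == "line one\nline two\\n")  --> true
```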

 

   Sometimes the text we want to enclose itself contains two consecutive closing brackets, as in a = b[c[i]] (note the ]] at the end), or contains code that already has a multi-line comment in it. To handle such cases, we can add any number of equals signs between the two opening brackets, as in [===[. The string then ends only at closing brackets with the same number of equals signs in between (]===] in this case); the scanner ignores any bracket pairs with a different number of equals signs. By choosing an appropriate number of equals signs, we can enclose any piece of text without modifying it, which resolves the bracket conflict.
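   For example, a piece of code containing ]] can be enclosed with level-2 brackets. This is only a sketch; the variable name is illustrative:

```lua
local code = [==[
a = b[c[i]]
]==]

-- The ]] inside the text does not end the string; only ]==] does.
print(code == "a = b[c[i]]\n")  --> true
```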

   The same mechanism works for multi-line comments. For example, a comment that starts with --[=[ extends until the next ]=]. This makes it possible to comment out a piece of code that already contains multi-line comments.
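   A minimal sketch of commenting out a region that already contains a long comment (the variable x is only for illustration):

```lua
local x = 1
--[=[
x = 2
--[[ an inner long comment that was already in the code
x = 3
]]
x = 4
]=]
print(x)  --> 1
```

   The inner ]] does not close the outer comment, so none of the assignments inside it run.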

   

   Long strings are ideal for including literal text in our code, but they should not be used for non-text data.

   Although a literal string in Lua can contain arbitrary bytes, embedding binary data directly in a long string is not a good idea (for example, the text editor may have trouble with it); moreover, end-of-line sequences such as "\r\n" may be normalized to "\n" when the source is read. It is better to write binary data with numeric escape sequences, such as "\x13\x01\xA1\xBB". However, for long data this produces very long lines in the source code. (I did not fully understand this part.) For this situation, since version 5.2 Lua provides the escape sequence \z, which skips all subsequent whitespace characters in the string up to the first non-whitespace character. For example:

data = "\x00\x01\x02\x03\x04\x05\x06\x07\z
        \x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"

   The \z at the end of the first line skips the line break that follows it and the leading whitespace of the second line, up to the next \x08; in other words, the byte \x08 comes directly after \x07 in the resulting string. This lets us write a long string across several source lines.
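   A shorter sketch of the same idea, checking that no whitespace ends up in the string. It assumes Lua 5.2 or later; the variable name is illustrative.

```lua
local bytes = "\x00\x01\x02\x03\z
               \x04\x05\x06\x07"

print(#bytes)         --> 8
print(bytes:byte(4))  --> 3
print(bytes:byte(5))  --> 4
```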

 

Origin www.cnblogs.com/guoyujam/p/11420627.html