Basic data types: sets, and character encoding

Part one: sets

1. Definition

  Inside {}, multiple elements are separated by commas. The elements must satisfy the following conditions:

    a, elements of a set must be of an immutable type

      print(set([1, 1, 1, 1, 1, 1, [11, 222]]))  # error: TypeError, a list is mutable (unhashable)

    b, the elements of a set are unordered

    c, elements of a set cannot repeat; a repeated element is counted only once
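
The three conditions above can be checked directly in the interpreter. This is only a small sketch; the variable names are made up for illustration and do not come from the original notes.

```python
# a) elements must be immutable (hashable): a list inside a set is rejected
try:
    set([1, 1, [11, 222]])
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True
print(unhashable_rejected)

# a tuple is immutable, so it is accepted as an element
with_tuple = {1, (11, 222)}
print((11, 222) in with_tuple)

# b) sets are unordered: equality ignores the order the elements were written in
same_regardless_of_order = {1, 2, 3} == {3, 2, 1}
print(same_regardless_of_order)

# c) repeated elements are kept only once
deduplicated = set([1, 1, 1, 1, 1, 1])
print(deduplicated)
```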

2. Built-in methods

============== Relational operators =============

friends1 = {"zero","kevin","jason","egon"}
friends2 = {"Jy","ricky","jason","egon"}

  a, intersection: the elements the two sets have in common

    res = friends1 & friends2

  b, union: all the elements of the two sets

    res = friends1 | friends2

  c, difference: remove the elements the two sets have in common; what remains of one set is its difference

    unique to friends1: res = friends1 - friends2

    unique to friends2: res = friends2 - friends1

  d, symmetric difference: the elements unique to each of the two sets, i.e. everything except the common elements

    res = friends1 ^ friends2

  e, superset / subset

    1, s1 = {1,2,3}

       s2 = {1,2,4} # no containment relationship exists

    print(s1>s2)  # False

    2, s1 is a superset of s2 only when every element of s2 is also in s1

      When s1 == s2, they are each other's superset and subset (s1 >= s2 and s1 <= s2 are both True)
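
The relational operators a to e can be tried together on the two friends sets. A minimal sketch follows; the result names (common, everyone, and so on) are illustrative only.

```python
friends1 = {"zero", "kevin", "jason", "egon"}
friends2 = {"Jy", "ricky", "jason", "egon"}

common = friends1 & friends2        # intersection
everyone = friends1 | friends2      # union
only_1 = friends1 - friends2        # difference: unique to friends1
only_2 = friends2 - friends1        # difference: unique to friends2
not_shared = friends1 ^ friends2    # symmetric difference

# each operator also has an equivalent method form
same_as_method = common == friends1.intersection(friends2)

# superset / subset comparisons
s1 = {1, 2, 3}
s2 = {1, 2, 4}
print(s1 > s2)                    # False: 4 is not in s1
print({1, 2, 3} > {1, 2})         # True: proper superset
print({1, 2, 3} >= {1, 2, 3})     # True: equal sets contain each other
print({1, 2, 3} > {1, 2, 3})      # False: > means strictly larger
```

Note that > and < test for a proper superset or subset, so two equal sets only compare True with >= and <=.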

================ deduplication ===============

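  Deduplicating with a set only works for immutable types and cannot guarantee the original order. A list of mutable elements such as dicts can be deduplicated with a loop: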

l=[
    {'name':'lili','age':18,'sex':'male'},
    {'name':'jack','age':73,'sex':'male'},
    {'name':'tom','age':20,'sex':'female'},
    {'name':'lili','age':18,'sex':'male'},
    {'name':'lili','age':18,'sex':'male'},
]
new_l=[]
for dic in l:
    if dic not in new_l:
        new_l.append(dic)

print(new_l)
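
For comparison, here is a sketch of the set-based approach and of two order-preserving alternatives. The helper dedupe_dicts is hypothetical (not from the original notes) and assumes every value in the dicts is itself hashable.

```python
names = ["lili", "jack", "tom", "lili", "lili"]

# set() removes duplicates but does not promise the original order
unique_unordered = set(names)

# dict.fromkeys keeps the first occurrence of each element, in order
# (dicts preserve insertion order in Python 3.7+)
unique_ordered = list(dict.fromkeys(names))
print(unique_ordered)  # ['lili', 'jack', 'tom']


def dedupe_dicts(records):
    """Hypothetical helper: drop repeated dicts, keeping the first of each."""
    seen = set()
    result = []
    for record in records:
        # a sorted tuple of items is an immutable stand-in for the dict
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


l = [
    {'name': 'lili', 'age': 18, 'sex': 'male'},
    {'name': 'jack', 'age': 73, 'sex': 'male'},
    {'name': 'tom', 'age': 20, 'sex': 'female'},
    {'name': 'lili', 'age': 18, 'sex': 'male'},
    {'name': 'lili', 'age': 18, 'sex': 'male'},
]
new_l = dedupe_dicts(l)
print(new_l)
```

The loop with `dic not in new_l` compares against every kept element, while the seen-set version does one hash lookup per record, which may matter for long lists.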
 

============== Other built-in methods =============

s={1,2,3}
Built-in methods you need to master
  1:discard
    s.discard(4) # removing an element that does not exist does nothing
    print(s)
    s.remove(4) # removing an element that does not exist raises an error (KeyError)
  2:update
    s.update({1,3,5})
    print(s)
  3:pop
    res=s.pop()
    print(res)
  4:add
    s.add(4)
    print(s)
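
The four methods can be run in sequence as below. This is a sketch: the flag remove_failed and the snapshot variables are illustrative names, and the try/except is added so the script keeps running past the failing remove.

```python
s = {1, 2, 3}

# 1: discard vs remove on a missing element
s.discard(4)                # 4 is absent: nothing happens
try:
    s.remove(4)             # 4 is absent: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)

# 2: update merges another collection into the set
s.update({1, 3, 5})
after_update = set(s)       # snapshot: {1, 2, 3, 5}
print(after_update)

# 3: pop removes and returns an arbitrary element
res = s.pop()
after_pop = set(s)
print(res)

# 4: add inserts a single element
s.add(4)
print(s)
```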
 

Part two: character encoding

For details, refer to: https://zhuanlan.zhihu.com/p/108805502

##Analysis process

  x="上"
                          RAM
  上 ------ translate ------> 0101010
  上 <----- translate ------- 0101010
 
A character encoding table is a table of correspondences between characters and numbers
 
 
a-00
b-01
c-10
d-11
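
The toy table above can be written as a dict to show both directions of the translation. A minimal sketch; code_table, encode and decode are illustrative names, not a real codec.

```python
# the toy table: every character maps to a 2-bit number
code_table = {"a": "00", "b": "01", "c": "10", "d": "11"}
reverse_table = {bits: char for char, bits in code_table.items()}


def encode(text):
    """Characters -> bit string, looked up in the table."""
    return "".join(code_table[ch] for ch in text)


def decode(bits):
    """Bit string -> characters, reading 2 bits at a time."""
    return "".join(reverse_table[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = encode("abcd")
decoded = decode(encoded)
print(encoded)  # 00011011
print(decoded)  # abcd
```

Decoding only works because both sides use the same table, which is the root of the garbled-text problem discussed later.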
 
ASCII table:
    1, only supports English characters
    2, an 8-bit binary number corresponds to one English character
 
GBK table:
    1, supports English characters and Chinese characters
    2,
    an 8-bit (8bit = 1Bytes) binary number corresponds to one English character
    a 16-bit (16bit = 2Bytes) binary number corresponds to one Chinese character
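
The byte counts for ASCII and GBK can be confirmed with str.encode. A short sketch, with illustrative variable names:

```python
english = "a"
chinese = "上"

ascii_bytes = english.encode("ascii")
gbk_english = english.encode("gbk")
gbk_chinese = chinese.encode("gbk")
print(len(ascii_bytes), len(gbk_english), len(gbk_chinese))  # 1 1 2

# ASCII has no entry for Chinese characters, so encoding fails
try:
    chinese.encode("ascii")
    ascii_accepts_chinese = True
except UnicodeEncodeError:
    ascii_accepts_chinese = False
print(ascii_accepts_chinese)
```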
 
 

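unicode (memory uniformly uses unicode):

    1,
        compatible with the characters of all nations
        has a correspondence with the characters of every nation
    2,
    a 16-bit (16bit = 2Bytes) binary number corresponds to one Chinese character
    a few rare characters take 4Bytes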
 
 
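    unicode table:
                                  RAM
        human characters ---------numbers in unicode format----------
                                     |                     |
                                     |                     |
                                     |
                                    disk                   |
                                     |
                                     |                     |
                                     |                     |
                          binary in GBK format     binary in Shift-JIS format
 
        All of the older character encodings can be converted to and from unicode, but they cannot in general be converted into one another through unicode, since each one only covers its own set of characters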
 
 
utf-8:
    English character -> 1Bytes
    Chinese character -> 3Bytes
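
The unicode number of a character and its size under different encodings can be inspected as follows. A sketch with illustrative variable names; utf-16-be is used here only as one concrete 2-byte form of the unicode number.

```python
# the unicode number (code point) of a character
code_point = ord("上")
print(hex(code_point))  # 0x4e0a

# utf-8 uses a variable number of bytes per character
utf8_english = "a".encode("utf-8")
utf8_chinese = "上".encode("utf-8")
print(len(utf8_english), len(utf8_chinese))  # 1 3

# the same character stored as a 2-byte unicode number
utf16_chinese = "上".encode("utf-16-be")
print(len(utf16_chinese))  # 2

# decoding with the encoding that was used for encoding restores the text
round_trip = utf8_chinese.decode("utf-8")
print(round_trip)
```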
 
 
 

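Conclusions:

    1, memory always uses unicode; what we can change is the format used when saving to disk
        English + Chinese characters -> unicode -> gbk
        English + Japanese characters -> unicode -> shift-jis
        characters of all nations -> unicode -> utf-8
 
    2, garbled text when saving and reading text files
        garbled on save: the fix is to set the encoding to one that supports the characters in the file
        garbled on read: the fix is to read the file into memory with the same encoding it was saved to disk with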
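
Both kinds of garbling can be reproduced with a temporary file. This is a sketch under assumptions: the file name demo.txt and the choice of ascii, latin-1 and utf-8 as the "wrong" encodings are illustrative, not from the original notes.

```python
import os
import tempfile

text = "上"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# garbled on save: ascii has no entry for the character, so writing fails
try:
    with open(path, "w", encoding="ascii") as f:
        f.write(text)
    saved_with_ascii = True
except UnicodeEncodeError:
    saved_with_ascii = False

# save with an encoding that supports the character
with open(path, "w", encoding="gbk") as f:
    f.write(text)

# garbled on read: latin-1 maps the same bytes to unrelated characters
with open(path, encoding="latin-1") as f:
    garbled = f.read()

# another wrong encoding may refuse the bytes outright
try:
    with open(path, encoding="utf-8") as f:
        wrong = f.read()
except UnicodeDecodeError:
    wrong = None

# reading with the encoding used for saving restores the text
with open(path, encoding="gbk") as f:
    right = f.read()

print(saved_with_ascii, repr(garbled), repr(wrong), right)
```

Passing encoding= explicitly to open() on both the write and the read avoids depending on the platform default.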
 
 
 

Origin www.cnblogs.com/NevMore/p/12482319.html