The Road to Big Data, Part [X]: The Kafka Messaging System

I. Introduction

1. Introduction

• Kafka is a messaging system that LinkedIn open-sourced in December 2010.
• It is a distributed messaging system based on the publish/subscribe model.

2. Features

- Message persistence: messages are persisted to disk through a data structure that offers O(1) access.
- High throughput: message reads and writes at the megabits-per-second level.
- Distributed: the cluster can be scaled out.
- Multi-client support: Java, PHP, Python, C++, ...
- Real-time: a message written by a producer is immediately visible to consumers.

3. Basic components

• Broker: each machine in the cluster is called a Broker.
• Producer: the producer of log messages; it writes data.
• Consumer: the consumer of messages; it reads data.
• Topic: different consumers read from the Topics they specify, and different producers write to different Topics.
• Partition: supported only in newer versions; a further subdivision within a Topic.

II. Dynamic module import (reflection on the current module's members)

Note:

• Kafka is distributed internally; a Kafka cluster typically contains multiple Brokers.
• Load balancing: each Topic is divided into multiple Partitions, and each Broker stores one or more Partitions.
• Multiple Producers and Consumers produce and consume messages at the same time.

4. Topic

• A Topic is a category or feed name to which messages are published. For each topic, the Kafka cluster keeps a partitioned log; each partition is an ordered, immutable sequence of messages.

• Messages are continually appended to this commit log. Each message in a partition is assigned a sequential id, called the offset, that uniquely identifies the message within that partition.

 

• Example: if two topics, topic1 and topic2, are created with 13 and 19 partitions respectively, the whole cluster generates a total of 32 folders.
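The folder count in this example can be checked with a short sketch. It assumes the usual Kafka on-disk convention of one log directory per partition, named after the topic and the partition index; the helper name partition_dirs is hypothetical.

```python
def partition_dirs(topics):
    """Return one directory name per partition, e.g. 'topic1-0'."""
    return [
        f"{topic}-{index}"
        for topic, partitions in topics.items()
        for index in range(partitions)
    ]


dirs = partition_dirs({"topic1": 13, "topic2": 19})
print(len(dirs))            # 13 + 19 directories
print(dirs[0], dirs[-1])    # first and last directory names
```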

Note:

• Regardless of whether a message has been consumed, Kafka retains it for a configurable period of time.

• Each consumer persists its own offset in the log. Normally the offset grows linearly as the consumer reads messages, but the position is actually controlled by the consumer, which can consume messages in any order. For example, it can reset to an older offset to reprocess messages.

• Each partition represents a unit of parallelism.
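The notes on offsets can be illustrated with a toy in-memory model. This is only a sketch of the idea, not Kafka's implementation: PartitionLog and SimpleConsumer are hypothetical names, and retention is ignored.

```python
class PartitionLog:
    """Append-only list of messages; the list index plays the role of the offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to this message

    def read(self, offset):
        return self._messages[offset]

    def end_offset(self):
        return len(self._messages)


class SimpleConsumer:
    """Tracks its own position; reading never modifies the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        if self.offset >= self.log.end_offset():
            return None  # nothing new to read
        message = self.log.read(self.offset)
        self.offset += 1
        return message

    def seek(self, offset):
        self.offset = offset


log = PartitionLog()
for text in ["a", "b", "c"]:
    log.append(text)

consumer = SimpleConsumer(log)
first_pass = [consumer.poll() for _ in range(3)]
consumer.seek(1)  # reset to an older offset to reprocess
replayed = [consumer.poll() for _ in range(2)]
print(first_pass, replayed)
```

Because the position lives in the consumer, a second consumer over the same log would start from offset 0 without affecting the first.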

 

5. Message

• A message is the basic unit of communication. Each producer can publish messages to a topic. If a consumer has subscribed to that topic, newly published messages are broadcast to it.
• Message format:
- message length: 4 bytes (value: 1 + 4 + n)
- "magic" value: 1 byte
- crc: 4 bytes
- payload: n bytes
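The layout above maps naturally onto Python's struct module. The sketch below makes several assumptions that the text does not state: big-endian byte order, a CRC-32 computed over the payload only, and a magic value of 0. The names encode_message and decode_message are hypothetical.

```python
import struct
import zlib

# length (4 bytes), magic (1 byte), crc (4 bytes); '>' means big-endian, no padding
HEADER = struct.Struct(">IBI")


def encode_message(payload, magic=0):
    length = 1 + 4 + len(payload)  # magic + crc + payload
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(length, magic, crc) + payload


def decode_message(data):
    length, magic, crc = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length - 5]
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("crc mismatch")
    return magic, payload


raw = encode_message(b"hello")
print(len(raw))             # 4 + 1 + 4 + 5 bytes
print(decode_message(raw))
```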

6. Producer

• A producer publishes data to the topics it chooses and can decide which partition of the topic each message is assigned to (for example, simple round-robin distribution across the partitions, or a semantic partitioning function that maps a message to a specific partition).
• The producer sends each message directly to the broker that holds the target partition, without any routing layer.
• Batch sending: messages are sent once a certain number have accumulated or after waiting a certain time.
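The two partitioning strategies mentioned above can be sketched as follows. This is an illustration only: the class names are hypothetical, and zlib.crc32 stands in for whatever hash a real client uses.

```python
import itertools
import zlib


class RoundRobinPartitioner:
    """Cycles through the partitions to spread load evenly."""

    def __init__(self, num_partitions):
        self._cycle = itertools.cycle(range(num_partitions))

    def partition(self, key=None):
        return next(self._cycle)


class KeyHashPartitioner:
    """Maps the same key to the same partition every time."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def partition(self, key):
        return zlib.crc32(key) % self.num_partitions


rr = RoundRobinPartitioner(3)
rr_choices = [rr.partition() for _ in range(5)]

kh = KeyHashPartitioner(3)
same_key = {kh.partition(b"user-42") for _ in range(10)}
print(rr_choices, same_key)
```

Key-based partitioning keeps all messages for one key in one partition, which is what makes per-key ordering possible.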

7. Consumer

• Kafka offers a more abstract consumption model: the Consumer Group.
• This model covers both traditional queuing and publish/subscribe:
    - Each consumer first labels itself with a consumer group name. Each message is delivered to one consumer instance within each consumer group.
    - If all consumer instances belong to the same consumer group, this works like a traditional queue.
    - If all consumer instances belong to different consumer groups, this works like traditional publish/subscribe.
    - A consumer group is like a logical subscriber; each such subscriber is made up of many consumer instances (for scalability or fault tolerance).
• Compared with traditional messaging systems, Kafka has stronger ordering guarantees.
• Because topics are partitioned, Kafka can preserve ordering and balance load across multiple Consumer processes.
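The queue and publish/subscribe behaviours of consumer groups can be simulated in a few lines. This sketch assumes a simple per-message rotation inside each group, which is a simplification: it does not model how Kafka assigns partitions to consumers. The function name deliver is hypothetical.

```python
from collections import defaultdict


def deliver(messages, groups):
    """Deliver each message to exactly one instance in every group."""
    received = defaultdict(list)
    for i, message in enumerate(messages):
        for group, instances in groups.items():
            instance = instances[i % len(instances)]  # rotate within the group
            received[(group, instance)].append(message)
    return dict(received)


messages = ["m0", "m1", "m2", "m3"]

# One group with two instances: the messages are split, like a queue.
queue_like = deliver(messages, {"g1": ["c1", "c2"]})

# Two groups with one instance each: every group sees every message, like pub/sub.
pubsub_like = deliver(messages, {"g1": ["c1"], "g2": ["c2"]})

print(queue_like)
print(pubsub_like)
```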

 

III. Wrapping standard types

Wrapping: Python provides standard data types along with a rich set of built-in methods, but in many scenarios we need to build our own data types on top of the standard ones, adding or overriding methods. This uses the inheritance/derivation knowledge we just learned: the standard types can be subclassed and reworked in the following way.

class List(list):
    def append(self, p_object):
        # Only strings are accepted
        if type(p_object) is str:
            # self.append(p_object)  # would call itself and recurse forever
            super().append(p_object)
        else:
            print('Only strings can be appended')

    def show_middle(self):
        # Return the element at the middle index
        mid_index = int(len(self) / 2)
        return self[mid_index]


# l2 = List('hello world')
# print(l2, type(l2))

l1 = List('helloworld')
# print(l1, type(l1))
l1.append(123456)  # rejected: not a string
l1.append('sb')
print(l1)

 

IV. isinstance(obj, cls) and issubclass(sub, super)

isinstance(obj, cls) checks whether obj is an instance of the class cls.

class Foo(object):
    pass

obj = Foo()

isinstance(obj, Foo)

issubclass(sub, super) checks whether the class sub is a derived class of the class super.

class Foo:
    pass


class Bar(Foo):
    pass
f1=Foo()
print(isinstance(f1,Foo))
print(issubclass(Bar,Foo))

V. __getattribute__

class Foo:
    def __init__(self, x):
        self.x = x

    def __getattr__(self, item):
        print('__getattr__ is running')

    def __getattribute__(self, item):
        print('__getattribute__ is running')
        raise AttributeError('raised')


f1 = Foo(11)
# f1.x
f1.xxxxxx  # __getattribute__ runs on every access; when it raises AttributeError, __getattr__ is triggered

VI. __setitem__, __getitem__, __delitem__

class Foo:
    def __getitem__(self, item):
        print('getitem')
        # return self.__dict__[item]

    def __setitem__(self, key, value):
        print('setitem')
        self.__dict__[key] = value

    def __delitem__(self, key):
        print('delitem')
        self.__dict__.pop(key)


f1 = Foo()
print(f1.__dict__)
# f1.name = 'simon'
f1['name'] = 'simon'
f1['age'] = 28
print('===========>', f1.__dict__)

# del f1.name
# print(f1.__dict__)
# print(f1.age)
del f1['name']
print(f1.__dict__)

VII. __str__, __repr__

# ########## str ##########
class Foo:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return 'name is [%s] age [%s]' % (self.name, self.age)


f1 = Foo('simon', 18)
print(f1)  # str(f1) --> f1.__str__()

x = str(f1)
print(x)

# ########## repr ##########
# When __str__ and __repr__ both exist, print uses __str__ first
class Foo:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # def __str__(self):
    #     return 'this is str'

    def __repr__(self):
        return 'name is [%s] age [%s]' % (self.name, self.age)


f1 = Foo('simon', 20)
# repr(f1) -----> f1.__repr__()
print(f1)  # str(f1) ---->> f1.__str__() ------> f1.__repr__()

Remarks:

'''
str() or print() ---> obj.__str__()
repr() or the interactive interpreter ---> obj.__repr__()
If __str__ is not defined, __repr__ is used for the output instead.
Note: both methods must return a string; otherwise an exception is raised.
'''
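The remarks above can be checked directly. In this sketch the class names OnlyRepr, Both, and BadStr are made up for illustration.

```python
class OnlyRepr:
    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    def __str__(self):
        return 'str of Both'

    def __repr__(self):
        return 'Both()'


class BadStr:
    def __str__(self):
        return 123  # not a string


print(str(OnlyRepr()))            # no __str__, so __repr__ is used
print(str(Both()), repr(Both()))  # each function uses its own method
print([Both()])                   # containers show their items with __repr__

try:
    str(BadStr())
except TypeError as exc:
    error = exc
    print('TypeError:', exc)
```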

VIII. __format__

format_dic = {
    'ymd': '{0.year}{0.mon}{0.day}',
    'm-d-y': '{0.mon}-{0.day}-{0.year}',
    'y:m:d': '{0.year}:{0.mon}:{0.day}'
}


class Date:
    def __init__(self, year, mon, day):
        self.year = year
        self.mon = mon
        self.day = day

    def __format__(self, format_spec):
        print('__format__ is running')
        print('------->', format_spec)
        # Fall back to 'ymd' for an empty or unknown format spec
        if not format_spec or format_spec not in format_dic:
            format_spec = 'ymd'
        fm = format_dic[format_spec]
        return fm.format(self)


d1 = Date(2018, 12, 30)
format(d1)  # d1.__format__('')
print(format(d1))
print(format(d1, 'int '))
print(format(d1, 'y:m:d'))
print(format(d1, 'm-d-y'))
print(format(d1, 'fsdrerewr'))

IX. __slots__

1. What is __slots__: a class variable whose value can be a list, a tuple, or another iterable, or a single string (meaning every instance has only one data attribute).
2. Background: accessing an attribute with the dot operator essentially looks it up in the __dict__ attribute dictionary of the class or the object (the class's dictionary is shared by all instances, while each instance has its own).
3. Why use __slots__: dictionaries take up a lot of memory. If a class has only a few attributes but many instances, __slots__ can replace the per-instance __dict__ to save memory. When you define __slots__, instances use a more compact internal representation: each instance is built on a small fixed-size array instead of a dictionary, much like a tuple or list. The attribute names listed in __slots__ are mapped internally to specific indices of this array. A drawback of __slots__ is that new attributes can no longer be added to instances; only the attribute names defined in __slots__ can be used.
4. Note: many Python features depend on the ordinary dictionary-based implementation. In addition, a class that defines __slots__ no longer supports some common class features, such as multiple inheritance (combining several bases that each define non-empty __slots__ raises an error). In most cases, you should define __slots__ only on classes that are used heavily as data structures, for example when a program needs to create millions of instances of a class. A common misunderstanding is that __slots__ is an encapsulation tool for preventing users from adding new attributes to instances. Although __slots__ can achieve this, that was never its original purpose; it is mainly a memory optimization tool.
class Foo:
    __slots__ = ['name', 'age']


f1 = Foo()
f1.name = 'alex'
f1.age = 18
print(f1.__slots__)

f2 = Foo()
f2.name = 'egon'
f2.age = 19
print(f2.__slots__)

print(Foo.__dict__)
# Neither f1 nor f2 has a __dict__ attribute dictionary any more; their attributes are managed by __slots__, which saves memory
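The restriction described in point 3 can be observed directly. This is a small sketch; the class name Point is illustrative.

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(hasattr(p, '__dict__'))  # instances of a slotted class have no __dict__

try:
    p.z = 3  # 'z' is not listed in __slots__
except AttributeError as exc:
    slot_error = exc
    print('AttributeError:', exc)
```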

X. __doc__

class Foo:
    'I am the description'
    pass


class Bar(Foo):
    pass


# print(Foo.__doc__)
print(Bar.__dict__)
print(Bar.__doc__)  # this attribute is not inherited by subclasses

XI. __module__ and __class__

__module__ indicates which module the object being operated on belongs to.

__class__ indicates which class the object being operated on belongs to.

lib/simon.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_

class C:
    def __init__(self):
        self.name = 'simon'
        

aa.py

from lib.simon import C

obj = C()
print(obj.__module__)  # prints lib.simon, i.e. the module
print(obj.__class__)   # prints <class 'lib.simon.C'>, i.e. the class

XII. __del__

class Foo:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print('__del__ is running')


f1 = Foo('simon')
# del f1      # deleting the instance triggers __del__
del f1.name   # deleting an instance attribute does not trigger __del__
print('--------------------->')
# When the program finishes, memory is reclaimed automatically, which triggers __del__

XIII. __call__

Appending parentheses to an object triggers the method's execution.

Note: the constructor is triggered by creating an object, i.e. obj = ClassName(); the __call__ method is triggered by appending parentheses to an object, i.e. obj() or ClassName()().

class Foo:
    def __call__(self, *args, **kwargs):
        print('the instance is being called: obj()')


f1 = Foo()
f1()   # runs the __call__ defined on f1's class, Foo
Foo()  # runs the __call__ defined on Foo's own class (its metaclass, type by default)
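One practical use of __call__ is an object that behaves like a function but keeps state between calls. A minimal sketch, with the hypothetical class name Counter:

```python
class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self, step=1):
        # Invoked by counter(...)
        self.count += step
        return self.count


counter = Counter()
counter()
counter(5)
print(counter.count)
print(callable(counter), callable(Counter))  # the instance and the class are both callable
```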

XIV. __next__ and __iter__: implementing the iterator protocol

1. The iterator protocol: an object must provide a __next__ method; calling it either returns the next item of the iteration or raises a StopIteration exception to end the iteration (it can only move forward, never back).

2. Iterable object: an object that implements the iterator protocol (how: define an __iter__() method inside the object).

3. A protocol is a convention: iterable objects implement the iterator protocol, and Python's built-in tools (such as for loops, sum, min, and max) use the iterator protocol to access objects.

class Foo:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 13:
            raise StopIteration('terminated')
        self.n += 1
        return self.n


# l = list('simon')
# for i in l:
#     print(i)

f1 = Foo(10)
# print(f1.__next__())
# print(f1.__next__())
# print(f1.__next__())
for i in f1:  # iter(f1) ------------> f1.__iter__()
    print(i)

Fibonacci sequence:

class Fib:
    def __init__(self):
        self._a = 1
        self._b = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._a > 100:
            raise StopIteration('terminated')
        self._a, self._b = self._b, self._a + self._b
        return self._a


f1 = Fib()
print(next(f1))
print(next(f1))
print(next(f1))
print(next(f1))
print(next(f1))
print('============================')
for i in f1:
    print(i)
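Point 3 of the protocol description says that built-in tools consume objects through the iterator protocol. The sketch below shows list, sum, and max working on a hand-written iterator; the class name Countdown is illustrative.

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))
print(sum(Countdown(4)))
print(max(Countdown(5)))

it = Countdown(2)
first = list(it)
second = list(it)  # the iterator only moves forward, so it is already exhausted
print(first, second)
```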

XV. Descriptors (__get__, __set__, __delete__)

1. What is a descriptor

A descriptor is essentially a new-style class that implements at least one of __get__(), __set__(), and __delete__(); this is also called the descriptor protocol.

__get__(): triggered when the attribute is accessed

__set__(): triggered when the attribute is assigned

__delete__(): triggered when the attribute is deleted with del

class Foo:
    def __get__(self, instance, owner):
        print('================> get method')

    def __set__(self, instance, value):
        print('================> set method')

    def __delete__(self, instance):
        print('================> delete method')


class Bar:
    x = Foo()  # where and when are the methods triggered?


b1 = Bar()
b1.x = 1   # triggers __set__
b1.x       # triggers __get__
del b1.x   # triggers __delete__

# Attribute operations on an instance of the descriptor class itself do not trigger these methods
f1 = Foo()
f1.name = 'simon'
print(f1.name)
 
 
Class attributes > data descriptors

# Descriptor Str
class Str:
    def __get__(self, instance, owner):
        print('Str get...')

    def __set__(self, instance, value):
        print('Str set...')

    def __delete__(self, instance):
        print('Str delete...')


class People:
    name = Str()

    def __init__(self, name, age):  # name is proxied by the Str class, age by the Int class
        self.name = name
        self.age = age


# From the demonstration above we know that a descriptor is defined as a class attribute of another class,
# so it lives in the class's attribute dictionary rather than in the instance's.

# Since the descriptor is defined as a class attribute, it can be accessed directly through the class name.
People.name  # accessing the class attribute name essentially calls the descriptor Str and triggers __get__()

People.name = 'egon'  # assignment does not trigger __set__()
del People.name       # del does not trigger __delete__() either
# Conclusion: descriptors have no effect on class-level operations --------> a naive conclusion

'''
Reason: a descriptor is used by being defined as a class attribute of another class, so it is disguised as a class attribute. A real class attribute has higher priority than a class attribute disguised by a descriptor.

People.name           # accesses the class attribute name; no plain attribute is found, so the lookup reaches the descriptor disguised as a class attribute and triggers __get__()
People.name = 'egon'  # assigns a class attribute directly; this has higher priority and effectively overwrites the descriptor, so the descriptor's __set__() is not triggered
del People.name       # same as above
'''

Data descriptors > instance attributes

# Descriptor Str
class Str:
    def __get__(self, instance, owner):
        print('Str get...')

    def __set__(self, instance, value):
        print('Str set...')

    def __delete__(self, instance):
        print('Str delete...')


class People:
    name = Str()

    def __init__(self, name, age):  # name is proxied by the Str class, age by the Int class
        self.name = name
        self.age = age


p1 = People('egon', 18)

# If the descriptor is a data descriptor (i.e. it has both __get__ and __set__), then accessing and
# assigning p1.name both trigger the descriptor and have nothing to do with p1 itself;
# this is equivalent to overriding the instance attribute.
p1.name = 'egonnnnnn'
p1.name
print(p1.__dict__)  # name is not in the instance's attribute dictionary: name is a data descriptor, which has higher priority than instance attributes, so viewing/assigning/deleting all go through the descriptor and not the instance
del p1.name

Instance attributes > non-data descriptors

class Foo:
    def func(self):
        print('I, Hu Hansan, am back')


f1 = Foo()
f1.func()  # calling a method of the class can also be seen as calling a non-data descriptor
# A function is a non-data descriptor object (everything is an object)
print(dir(Foo.func))
print(hasattr(Foo.func, '__set__'))
print(hasattr(Foo.func, '__get__'))
print(hasattr(Foo.func, '__delete__'))
# One might ask: aren't descriptors classes? A function is an object, so how can it be a descriptor?
# Descriptors are indeed classes, but when used they are instantiated as class attributes.
# A function is an object instantiated from a non-data descriptor class.
# Yes, and strings are likewise objects instantiated from a class.

f1.func = 'this is an instance attribute'
print(f1.func)

del f1.func  # removes the instance attribute, so the non-data descriptor is visible again
f1.func()

Re-verification: instance attributes > non-data descriptors

class Foo:
    def __set__(self, instance, value):
        print('set')

    def __get__(self, instance, owner):
        print('get')


class Room:
    name = Foo()

    def __init__(self, name, width, length):
        self.name = name
        self.width = width
        self.length = length


# name is a data descriptor, because name = Foo() and Foo implements both get and set,
# so it has higher priority than instance attributes.
# Operations on the instance attribute trigger the descriptor.
r1 = Room('WC', 1, 1)
r1.name
r1.name = 'kitchen'


class Foo:
    def __get__(self, instance, owner):
        print('get')


class Room:
    name = Foo()

    def __init__(self, name, width, length):
        self.name = name
        self.width = width
        self.length = length


# name is a non-data descriptor, because name = Foo() but Foo does not implement set,
# so it has lower priority than instance attributes.
# Operations on the instance attribute go to the instance itself.
r1 = Room('WC', 1, 1)
r1.name
r1.name = 'kitchen'

Summary:

1. A descriptor itself should be defined as a new-style class, and the class it proxies should also be a new-style class.

2. A descriptor must be defined as a class attribute of the proxied class; it must not be defined in the constructor.

3. The priority order must be followed strictly, from high to low: class attributes, data descriptors, instance attributes, non-data descriptors.
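A common use of the data-descriptor priority is type checking on assignment. The sketch below is one possible design, not the author's: the class name Typed is hypothetical, and it relies on __set_name__ (available since Python 3.6) to learn the attribute name. Values are stored in the instance's __dict__, but every access still goes through the descriptor because data descriptors outrank instance attributes.

```python
class Typed:
    """Data descriptor that checks the type of assigned values."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.name = name  # attribute name on the owning class

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed through the class, not an instance
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f'{self.name} must be {self.expected_type.__name__}')
        instance.__dict__[self.name] = value


class People:
    name = Typed(str)
    age = Typed(int)

    def __init__(self, name, age):
        self.name = name  # goes through Typed.__set__
        self.age = age


p1 = People('egon', 18)
print(p1.name, p1.age)

try:
    p1.age = 'eighteen'
except TypeError as exc:
    type_error = exc
    print('TypeError:', exc)
```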

 

XVI. __enter__ and __exit__

When working with file objects, we can write:

with open('text.txt', 'r') as f:
    'code block'

This is called the context management protocol, i.e. the with statement. For an object to be compatible with the with statement, its class must define the __enter__ and __exit__ methods.

class Foo:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('running __enter__')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('running __exit__')


# f = Open('a.txt')
with Foo('a.txt') as f:
    print(f)
    print(f.name)
print('00000000000000000000')

Code analysis:

class Foo:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('running __enter__')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('running __exit__')
        print(exc_type)
        print(exc_val)
        print(exc_tb)
        return True


# f = Open('a.txt')
with Foo('a.txt') as f:
    print(f)
    print(asfdreevergewafa)  # undefined name: raises NameError and triggers __exit__
    print(f.name)            # never reached
print('00000000000000000000')

# with obj as f:
#     'code block'
#
# 1. with obj ---->> triggers obj.__enter__() and obtains its return value
# 2. as f ---------> f = that return value
# 3. with obj as f is equivalent to f = obj.__enter__()
# 4. The code block runs:
#    (1) With no exception, __exit__ is triggered after the whole block finishes, and its three parameters are all None
#    (2) With an exception, __exit__ is triggered directly from the point where the exception occurs:
#        a: if __exit__ returns True, the exception is swallowed
#        b: if __exit__ does not return True, the exception is re-raised
#        c: when __exit__ finishes, the whole with statement is finished

Summary:

1. The purpose of the with statement is to place a code block inside with so that cleanup happens automatically when the block ends, without manual intervention.

2. In programming environments that need to manage resources such as files, network connections, and locks, you can customize the automatic resource-release mechanism in __exit__ so that you no longer need to worry about this problem, which is very useful.
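The automatic cleanup described in the summary can be sketched with a small resource wrapper. The class name ManagedResource and the is_open flag are illustrative; a real resource would open and close a file, connection, or lock at the marked places.

```python
class ManagedResource:
    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True   # acquire the resource here
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.is_open = False  # release the resource; runs with or without an exception
        return False          # do not swallow exceptions


resource = ManagedResource('a.txt')
with resource as r:
    inside = r.is_open

try:
    with resource:
        raise ValueError('boom')
except ValueError:
    propagated = True  # __exit__ returned False, so the exception reached the caller

print(inside, resource.is_open, propagated)
```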

 


Origin www.cnblogs.com/hackerer/p/11430484.html