Study notes _Python Network Programming 3_ (8)

Chapter 8: Caches and Message Queues.

8.0 Two basic techniques commonly used when service load is heavy: caches and message queues

The preceding chapters covered the socket API and the use of basic IP networking from Python to build communication channels. The following chapters turn to particular protocols built on top of sockets: how to fetch documents from the web, send email, and submit commands to remote servers.

8.0.1 Caches and message queues share some common characteristics:

8.0.1.1. Using Memcached or a message queue is less about implementing a protocol to interoperate with other tools, and more about writing services that solve specific problems.

8.0.1.2. Both technologies solve problems that are usually internal to an organization. From the outside, you often cannot tell which cache, which message queue, or which load-distribution tools a particular website or web service is using.

8.0.1.3. Tools such as HTTP and SMTP are tailored to a specific payload (hypertext documents for HTTP, email messages for SMTP), but caches and message queues need not understand the data they carry.

8.1 Using Memcached: the "memory cache daemon" combines the free RAM of the servers on which it is installed into one large Least Recently Used (LRU) cache. Memcached also teaches an important concept of modern networking: partitioning, or sharding.

8.1.1 Practical steps for using Memcached:

8.1.1.1. Run a Memcached daemon on every server that has some spare memory.

8.1.1.2. Gather the IP addresses and ports of all the Memcached daemons into a list, and distribute that list to all of the clients that will use Memcached.

8.1.1.3. The client programs now have access to an organization-wide, fast cache of keys, like one huge Python dictionary shared by all the servers. The cache operates on an LRU basis: if some items have not been accessed for a long time, they are discarded to make room for newly accessed entries and for entries that are accessed frequently.

http://code.google.com/p/memcached/wiki/Clients lists many Python clients for Memcached.

In a virtual environment, install a Python 3 client from the Python Package Index:

$ pip3 install python3-memcached
>>> import memcache
>>> mc = memcache.Client(['127.0.0.1:11211'])
>>> mc.set('user:19', 'Simple is better than complex.')
0
>>> mc.get('user:19')
'Simple is better than complex.'    # (in my own run, though, no value came back)

(A falsy return from set(), like the 0 above, means the value was not stored, for example because no Memcached daemon was reachable at that address; that would also explain get() returning nothing.)

The interface here resembles a Python dict. When a str is passed to set(), it is written into Memcached directly as UTF-8-encoded text, and decoded again when fetched later with get(). Writing any Python object other than a str automatically triggers the memcache module to pickle it and store the binary pickle in Memcached. Only values stored as str can be read directly by clients written in other languages.

8.1.2. Only data that can safely be discarded should be stored in Memcached. Its purpose is to record the results of repeated, computationally expensive operations so they can be served faster; it must never be the sole storage area for the data. If you run the commands above while the Memcached instance is busy, or leave too long an interval between the set() and get() operations, the get() may find that the cached string has outlived its validity and been discarded.

Listing 8-1 shows the basic pattern for using Memcached from Python. Before performing the expensive operation of squaring an integer, it first checks whether Memcached already holds a previously computed answer; if so, the answer can be returned immediately.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# 8-1 squares.py: using Memcached to speed up an expensive operation


import memcache, random, time, timeit

def compute_square(mc, n):
    value = mc.get('sq:%d' % n)
    if value is None:
        time.sleep(0.001) # pretend that computing a square is expensive
        value = n * n
        mc.set('sq:%d' % n, value)
    return value

def main():
    mc = memcache.Client(['127.0.0.1:11211'])

    def make_request():
        compute_square(mc, random.randint(0, 5000))

    print('Ten successive runs:')
    for i in range(1, 11):
        print(' %.2fs' % timeit.timeit(make_request, number=2000), end='')
    print()


if __name__ == '__main__':
    main()

To run this example, a Memcached daemon must be listening on port 11211 of the local machine. For the first few hundred requests, the program runs at its normal speed: the first time the square of a particular integer is requested, it is not yet in the cache and must be computed. As the program keeps running and the cache accumulates the squares of more and more integers, it speeds up.

$ python3 squares.py
Ten successive runs:
2.87s 2.94s 1.50s 1.18s 0.95s 0.73s 0.64s 0.56s 0.48s 0.45s
# my own results, however: 7.11s 6.41s 9.51s 14.44s 7.45s 5.91s 10.94s 6.46s 5.74s 6.51s
# or: 4.61s 3.70s 3.64s 3.63s 3.63s 3.76s 3.82s 3.75s 4.76s 3.70s

This shows the general characteristic of a cache: as it accumulates more key-value pairs, it gets faster. Once Memcached is full, or the squares of all possible inputs have been computed, speed stops improving noticeably. What data should a real program hope to write to the cache?

8.1.3. Put the results of expensive lower-level calls into the cache: database queries, file system I/O, and queries to external services.

However, once information is placed in the cache, you must decide how long it may safely remain there.

8.1.4. Keys must be unique, so developers tend to use prefixes and numbers to distinguish the classes of objects they store. A key can contain at most 250 characters, but query support for longer identifying strings can be provided by using a strong hash function as the key. Values stored in Memcached may be longer than keys, but no larger than 1 MB.
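One way to honor the 250-character limit for arbitrarily long identifying strings, as just suggested, is to substitute a strong hash. A minimal sketch (make_key and the 'sha256' marker are conventions invented for this note, not part of any Memcached client):

```python
import hashlib

MAX_KEY_LEN = 250   # Memcached's documented key-length limit

def make_key(prefix, identifier):
    """Build a cache key, hashing identifiers too long to fit."""
    key = '%s:%s' % (prefix, identifier)
    if len(key) <= MAX_KEY_LEN:
        return key
    # Substitute a strong hash for the over-long identifier; the prefix
    # survives, so the class of object the key names stays recognizable.
    digest = hashlib.sha256(identifier.encode('utf-8')).hexdigest()
    return '%s:sha256:%s' % (prefix, digest)

print(make_key('user', '19'))                                        # user:19
print(len(make_key('url', 'http://example.com/?q=' + 'x' * 300)))    # 75
```

Because the hash is deterministic, every client computes the same key for the same long identifier, so lookups still hit the same cache entry.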

8.1.5. Memcached is only a cache, and what it stores is temporary. It uses RAM as its medium, and everything is lost on restart. A program should always be able to recover and rebuild any data that has been lost.

8.1.6. Make sure that cached data is never returned after it has been stale for too long, so that the data returned to users is accurate. How old is too old depends on the problem being solved.

Three approaches can be used to tackle the stale-data problem, ensuring that expired data is cleaned up promptly and never returned:

1) Memcached allows an expiration time to be set on each cache entry; when that time arrives, Memcached takes responsibility for quietly discarding the entry.
2) If you can build a mapping from each piece of identifying information to the cache keys it affects, you can actively remove cache entries the moment their underlying data becomes stale.
3) When a cached record becomes invalid, it can be rewritten with new content instead of simply being removed. This is very useful for records that are hit dozens of times per second: clients never find the entry missing, only the rewritten version found in the cache. For the same reason, pre-loading the cache when a program first starts up is an important technique for large websites.
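Approach 1 can be sketched in miniature. With the python3-memcached client you would simply pass an expiration time to set() (it accepts a lifetime in seconds as an extra argument); the toy class below simulates the discard-on-expiry behavior with a plain dict so that the sketch is self-contained and needs no running daemon:

```python
import time

class ExpiringCache:
    """Toy stand-in for Memcached's per-entry expiration (approach 1)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, ttl):
        self._data[key] = (value, time.time() + ttl)
    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, deadline = item
        if time.time() >= deadline:
            del self._data[key]   # expired: discard, as Memcached would
            return None
        return value

cache = ExpiringCache()
cache.set('user:19', 'Simple is better than complex.', ttl=0.05)
print(cache.get('user:19'))   # fresh: the value comes back
time.sleep(0.1)
print(cache.get('user:19'))   # expired: None
```

The real daemon does the discarding lazily on its own side; the point is only that an expired entry is never returned to a caller.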

8.1.7. Python decorators can wrap a function without changing its name or signature, so decorators are a popular way to add caching in Python. The Python Package Index offers many decorator-based caching libraries built on Memcached.
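A decorator-based cache in the spirit of those libraries might look like the following sketch. A plain dict wrapper stands in for memcache.Client (which offers the same get()/set() calls), and the names cached and DictCache are invented here:

```python
import functools

def cached(cache, key_prefix):
    """Check the cache before calling the wrapped one-argument function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(n):
            key = '%s:%d' % (key_prefix, n)
            value = cache.get(key)
            if value is None:
                value = func(n)
                cache.set(key, value)
            return value
        return wrapper
    return decorator

class DictCache:
    """Stand-in for memcache.Client, which offers the same get()/set()."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

calls = []   # record every time the real computation runs

@cached(DictCache(), 'sq')
def square(n):
    calls.append(n)
    return n * n

print(square(12), square(12))   # 144 144: second call comes from the cache
print(len(calls))               # 1: the expensive function ran only once
```

Because functools.wraps is used, square keeps its original name and docstring, exactly the property the text describes.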

8.2. Hashing and Partitioning

When a Memcached client is given a list containing several Memcached instances, it shards the Memcached database based on the hash of each key's string value: the computed hash determines which server in the Memcached cluster is used to store a particular record.
Listing 8-1, for example, might store the key sq:42 with the value 1764. To make the fullest use of available RAM, the Memcached cluster wants to store each key-value pair exactly once; to keep the data service fast, it also wants to avoid redundancy without requiring any communication or cooperation among the different servers and clients.
That means each client is armed with nothing more than the configured list of Memcached servers and the key itself, and needs some mechanism for determining, from those two pieces of information alone, which server stores a particular record.
If clients did not map the same key to the same server, the same key-value pair would be copied onto multiple servers, reducing the total available memory; and a client trying to remove an invalid record from one server would leave stale copies alive on the others.
The solution is for all of the clients to implement a single, stable algorithm that converts a key into an integer n, which selects one server from the list for the key's actual storage. They do this with a "hash" algorithm, which mixes a string's bits when computing an integer so that, ideally, any patterns in the strings are eliminated.
Listing 8-2 loads the words of an English dictionary and, treating the words as keys, distributes them across four servers.
The first algorithm divides the alphabet into four roughly equal parts and distributes each key according to the first letter of the word;
the other two algorithms use hash functions.

# 8-2 hashing.py: two mechanisms for assigning data to servers, patterns in the data versus bits of a hash
import hashlib

def alpha_shard(word):
    """Do a poor job of assigning data to servers by using first letters."""
    if word[0] < 'g':
        return 'server0'
    elif word[0] < 'n':
        return 'server1'
    elif word[0] < 't':
        return 'server2'
    else:
        return 'server3'

def hash_shard(word):
    """Assign data to servers using Python's built-in hash() function."""
    return 'server%d' % (hash(word) % 4)

def md5_shard(word):
    """Assign data to server using a public hash algorithm."""
    data = word.encode('utf-8')
    return 'server%d' % (hashlib.md5(data).digest()[-1] % 4)

if __name__ == '__main__':
    words = open('/usr/share/dict/words').read().split()
    for function in alpha_shard, hash_shard, md5_shard:
        d = {'server0': 0, 'server1': 0, 'server2': 0, 'server3':0}
        for word in words:
            d[function(word.lower())] += 1
        print(function.__name__[:-6])
        for key, value in sorted(d.items()):
            print(' {} {} {:.2}'.format(key, value, value / len(words)))

The hash() function is Python's own built-in hash, used internally to implement dictionary lookup, and it is fast. The MD5 algorithm is more elaborate; it is in fact now too weak to be used for cryptography, but it remains fine for distributing load across servers.

8.2.1. Running the program shows that distributing load by a method that directly exposes patterns in the data can be quite dangerous.

$ python hashing.py
alpha 
    server0 35285 0.36
...
md5
    server0 14777 0.25

1) When load is distributed by first letter, each server is assigned a roughly equal slice of the alphabet; yet even though server0 handles only 6 initial letters while server3 handles 7, server0 ends up with nearly three times the load of server3.
2) The two hash algorithms perform essentially perfectly. Without relying on any pattern in the words, they distribute the keys evenly across the four servers.

8.2.2. Whenever load or data must be assigned automatically to the nodes of a cluster, in such a way that multiple identical clients reach the same allocation for the same input data, a hash algorithm is a useful tool to reach for in your code.
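Applying the md5_shard idea from Listing 8-2 to an actual server list gives a deterministic chooser that any identical client reproduces. A minimal sketch (the addresses are made up):

```python
import hashlib

SERVERS = ['10.0.0.1:11211', '10.0.0.2:11211',
           '10.0.0.3:11211', '10.0.0.4:11211']   # hypothetical addresses

def server_for(key, servers=SERVERS):
    """Map a key to a server with a stable, pattern-free hash, so that
    every client sharing the same server list picks the same server."""
    digest = hashlib.md5(key.encode('utf-8')).digest()
    return servers[digest[-1] % len(servers)]

print(server_for('sq:42'))
print(server_for('sq:42') == server_for('sq:42'))   # True: deterministic
```

One caveat worth noting: with this simple modulo scheme, adding or removing a server remaps most keys, which is why real Memcached clients often use consistent hashing instead.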

8.3 Message Queuing

Message queue protocols let us send reliable chunks of data that are called messages rather than datagrams, since the term datagram implies an unreliable service in which data can be lost, duplicated, or reordered in transit.

8.3.1. A message queue guarantees reliable, all-or-nothing transport: a message is either delivered to its destination intact or not delivered at all. The message queue protocol takes care of framing, so clients using a message queue never have to sit in a loop calling recv() over and over until a complete message has arrived.
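To appreciate the framing that a message queue performs for you, here is the receive loop that raw-TCP clients must write themselves, sketched with an invented 4-byte length-prefix scheme (the helper names are this note's own, not any library's API):

```python
import socket, struct

def recv_exactly(sock, n):
    """The loop raw-TCP clients must write (and message-queue clients
    never do): keep calling recv() until n bytes have arrived."""
    blocks = []
    while n:
        block = sock.recv(n)
        if not block:
            raise EOFError('socket closed mid-message')
        blocks.append(block)
        n -= len(block)
    return b''.join(blocks)

def send_message(sock, message):
    """Frame a message with a 4-byte big-endian length prefix."""
    sock.sendall(struct.pack('!I', len(message)) + message)

def recv_message(sock):
    """Read one whole framed message: length prefix, then body."""
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return recv_exactly(sock, length)

# Demonstrate with a local socket pair standing in for a network link.
a, b = socket.socketpair()
send_message(a, b'an entire message, never a fragment')
print(recv_message(b))   # b'an entire message, never a fragment'
```

A message queue client simply calls its equivalent of recv() once and gets the whole message; all of the above is hidden inside the protocol.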

8.3.2. Message queues provide a mechanism quite different from the point-to-point connections of IP and TCP: clients that use a message queue can arrange themselves in a variety of topologies.

Some possible applications of message queues:

8.3.2.1. When you register a new account with your email address on a website, the page returns "Thank you for registering" immediately. The user does not have to wait while the site delivers the email through the email service provider, which could take several minutes. The common practice is for the site to place the email address on a message queue; a back-end server pulls the address from the queue when it is ready to establish an outgoing SMTP connection. If the attempt fails, the address simply goes back on the queue for a retry after a longer interval.

8.3.2.2. A message queue can serve as the basis for a custom remote procedure call (RPC) service, letting busy front-end servers hand hard work off to back-end servers. The front ends place requests on the message queue; tens or hundreds of back-end servers listen on the queue, process the requests, and return the responses to the waiting front ends.

8.3.2.3. High-volume event data often needs to be stored, summarized, and analyzed centrally as a stream of small, efficient messages on a message queue. On some websites, message queues have entirely replaced older log transport mechanisms such as syslog and log files on local disk.

8.3.3. An important feature of message-queue programming is the ability to mix and match all of the client and server (or publisher and subscriber) processes that connect to the same message-queue system.

8.3.4. The use of message queues has brought revolutionary advances to program design. A typical traditional program kept all of its functionality in a single executable, composed of layer upon layer of APIs, with a single thread of control responsible for calling them all.

For example, a single control thread might read HTTP data from a socket, authenticate it, parse the request, call an API for some specialized image processing, and finally write the result to disk. Every API that thread used had to live on the same machine and be loaded into the same Python runtime instance.

Once message queues became available, people started to wonder: why should a computation-heavy, specialized task like image processing, invisible to the network, have to share a CPU and disk with the front-end HTTP service?
1) Instead of building services out of a few big, powerful machines with many different libraries installed, you can dedicate machines to single purposes and gather them into clusters that together provide a service.
2) As long as the operations staff understand the message-queue transport topology and make sure no messages are lost while a set of servers is detached, they can take down, upgrade, and reattach the image-processing servers without affecting the front-end HTTP servers sitting behind the load balancer.
Message queues generally support a variety of topologies:

8.3.4.1. The pipeline topology is probably the pattern that most resembles the intuitive picture of a queue: producers create messages and submit them to the queue, from which consumers receive them.

For example, the front-end servers of a photo-sharing site might place each user-uploaded image on an internal queue dedicated to receiving files. A machine room full of thumbnail generators would then read from the queue, each image-processing server receiving one message at a time (containing an image that needs a thumbnail) and generating thumbnails for it. The queue might grow long while the site is busy and shrink or empty again while the site is quiet; busy or not, the front-end servers can immediately return a response to the waiting client, telling the user that the upload succeeded and the photo will soon appear in their photo stream.
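The pipeline above can be sketched in a single process with the standard library's queue.Queue, which gives the same deliver-to-exactly-one-consumer behavior inside one process (a real deployment would put a network message queue between machines; all names here are invented for the sketch):

```python
import queue, threading

uploads = queue.Queue()   # stands in for the site's internal file queue

def front_end(photo):
    """Enqueue the upload, then answer the user immediately."""
    uploads.put(photo)
    return 'Upload received! It will appear in your stream shortly.'

def thumbnail_worker(results):
    """Consume uploads one at a time, as a thumbnail machine would."""
    while True:
        photo = uploads.get()
        if photo is None:           # sentinel: shut the worker down
            break
        results.append('thumbnail-of-' + photo)

results = []
worker = threading.Thread(target=thumbnail_worker, args=(results,))
worker.start()
print(front_end('cat.jpg'))
print(front_end('dog.jpg'))
uploads.put(None)
worker.join()
print(results)   # each photo was processed by exactly one consumer
```

The front end never waits on the slow work; it only pays the cost of a put() before responding, which is the whole point of the pipeline topology.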

8.3.4.2. The publisher-subscriber, or fan-out, topology looks similar to a pipeline, with one important difference: a pipeline queue guarantees that each message is delivered to exactly one consumer (sending the same image to two image servers would be wasteful), whereas subscribers usually want to receive all of the messages in the queue. Alternatively, a subscriber can set a filter, in some specific format, that defines the range of messages it is interested in. This kind of queue can be used when events need to be pushed out to the wider world.

A server room can use such a queue system to let machines announce which systems are coming up and which are going down for maintenance, and even to publish the IP addresses of other message queues as they are created and destroyed.

8.3.4.3. The request-reply pattern is the most complex, because messages have to make a round trip. In the first two patterns the message producer does very little work: it connects to the queue and sends its message. A client that initiates a request, however, has to stay connected and wait for the response to arrive. To support this, the queue must offer some kind of addressing scheme by which, out of the thousands of clients still connected and waiting, it can find the right one and deliver the response to it.
This complexity is exactly what makes request-reply the most powerful pattern. It lets the requests of dozens or hundreds of clients be spread evenly across a large number of servers, with no work needed beyond setting up the message queue.
A good message queue also lets servers attach to and detach from the queue without a single message being lost, which keeps the queue's behavior invisible to client machines when a server has to be taken down for maintenance.

8.3.5. A request-reply queue is a good way to connect the many lightweight threads that can run on a single machine (such as the numerous worker threads of a web front end) to a database client or file server that the front-end servers need to call on for certain high-cost operations.

Request-reply is a natural fit for RPC mechanisms, with an extra benefit that plain RPC systems do not offer: many consumers or producers can all be attached to the same queue, in fan-in or fan-out work patterns, without either set of clients being aware of the difference.

8.3.6 Using message queues from Python

8.3.6.1. The most popular message queues are implemented as standalone servers. All of the components chosen to build a program (producers, consumers, filters, RPC services) can then attach to the message queue, without knowing one another's addresses or even one another's identities.

The AMQP protocol is one of the most widely implemented cross-language message queue protocols, and many open-source servers that speak it can be installed, such as RabbitMQ and Apache Qpid.

8.3.6.2. Many programmers never learn a messaging protocol themselves. Instead, they rely on third-party libraries that wrap the most important features a message queue provides in an easy-to-use API.

For example, many Python programmers who use Django drive the Celery distributed task queue without ever learning AMQP. Such libraries can also support alternative back-end services, so they are not tied to one particular protocol.

Celery, for instance, can use the simple Redis key-value store as its message queue, rather than a dedicated messaging protocol.

8.3.7. To keep the discussion here sharply focused, it will be more convenient to use examples that do not require installing and running a full-featured standalone message-queue server. ∅MQ (Zero Message Queue), created by the same company that proposed AMQP, moves the smart messaging machinery into each client process instead of into a centralized server process.

Simply embedding the ∅MQ library in each of your programs lets them construct a messaging fabric together, with no centralized server at all. This differs in many respects from broker-based architectures with regard to reliability, redundancy, retransmission, and persistent storage on disk.
Listing 8-3 is a simple example that estimates π using a (not especially efficient) Monte Carlo method. The messaging topology matters; Figure 8-1 shows its structure:

    bitsource --publish/subscribe, filter '00' ---------> always_yes --push/pull--> tally
    bitsource --publish/subscribe, filters '01','10','11'-> judge -----push/pull--> tally
                                                            judge <--request/reply--> pythagoras

1) The bitsource program generates strings of 2B binary digits (0s and 1s), where the odd-numbered digits give an x coordinate and the even-numbered digits give a y coordinate, both B-bit unsigned integers.
Does the point with those coordinates fall within the quarter circle of radius 2^B centered on the origin of that quadrant?
2) A publisher-subscriber structure connects bitsource to two listening modules that monitor the stream of binary strings it generates.
3) The always_yes listener accepts only strings beginning with 00 and directly produces the answer Y, pushing it on to the tally module.
4) That shortcut works because, if the first digit of each coordinate is 0, then the x and y coordinates are both less than half of the maximum coordinate value, so the corresponding point necessarily lies inside the quarter circle.
5) Strings beginning with 01, 10, or 11 must instead be handled by the judge module, which performs the real test.
6) judge asks the pythagoras module to compute the sum of the squares of the two integer coordinates, determines whether the corresponding point lies inside the quarter circle, and pushes the resulting Y or N to its output queue.
7) The tally module at the bottom receives the Y or N produced for each random string; from the ratio of the number of Ys to the total number of answers it can estimate the value of π.
Listing 8-3 implements this topology of five modules, runs it continuously for 30 seconds, and requires ∅MQ:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# 8-3 queuecrazy.py: ∅MQ messaging connecting five different modules
import random, threading, time, zmq


B = 32 # number of bits of precision in each random integer

def ones_and_zeros(digits):
    """Express 'n' in at least 'd' binary digits, with no special prefix."""
    return bin(random.getrandbits(digits)).lstrip('0b').zfill(digits)

def bitsource(zcontext, url):
    """Produce random points in the unit square."""
    zsock = zcontext.socket(zmq.PUB)
    zsock.bind(url)
    while True:
        zsock.send_string(ones_and_zeros(B * 2))
        time.sleep(0.01)

def always_yes(zcontext, in_url, out_url):
    """Coordinates in the lower-left quadrant are inside the unit circle."""
    isock = zcontext.socket(zmq.SUB)
    isock.connect(in_url)
    isock.setsockopt(zmq.SUBSCRIBE, b'00')
    osock = zcontext.socket(zmq.PUSH)
    osock.connect(out_url)
    while True:
        isock.recv_string()
        osock.send_string('Y')

def judge(zcontext, in_url, pythagoras_url, out_url):
    """Determine whether each input coordinate is inside the unit circle."""
    isock = zcontext.socket(zmq.SUB)
    isock.connect(in_url)
    for prefix in b'01', b'10', b'11':
        isock.setsockopt(zmq.SUBSCRIBE, prefix)
    psock = zcontext.socket(zmq.REQ)
    psock.connect(pythagoras_url)
    osock = zcontext.socket(zmq.PUSH)
    osock.connect(out_url)
    unit = 2 ** (B * 2)
    while True:
        bits = isock.recv_string()
        n, m = int(bits[::2], 2), int(bits[1::2], 2)
        psock.send_json((n, m))
        sumsquares = psock.recv_json()
        osock.send_string('Y' if sumsquares < unit else 'N')

def pythagoras(zcontext, url):
    """Return the sum-of-squares of number sequences."""
    zsock = zcontext.socket(zmq.REP)
    zsock.bind(url)
    while True:
        numbers = zsock.recv_json()
        zsock.send_json(sum(n * n for n in numbers))

def tally(zcontext, url):
    """Tally how many points fall within the unit circle, and print pi."""
    zsock = zcontext.socket(zmq.PULL)
    zsock.bind(url)
    p = q = 0
    while True:
        decision = zsock.recv_string()
        q += 1
        if decision == "Y":
            p += 4
        print(decision, p / q)

def start_thread(function, *args):
    thread = threading.Thread(target=function, args=args)
    thread.daemon = True # so you can easily Ctrl-C the whole program
    thread.start()

def main(zcontext):
    pubsub = "tcp://127.0.0.1:6700"
    reqrep = "tcp://127.0.0.1:6701"
    pushpull = "tcp://127.0.0.1:6702"
    start_thread(bitsource, zcontext, pubsub)
    start_thread(always_yes, zcontext, pubsub, pushpull)
    start_thread(judge, zcontext, pubsub, reqrep, pushpull)
    start_thread(pythagoras, zcontext, reqrep)
    start_thread(tally, zcontext, pushpull)
    time.sleep(30)

if __name__ == '__main__':
    main(zmq.Context())

Each thread creates one or more sockets of its own for communication, because trying to share a single messaging socket between two threads is unsafe.
The threads do, however, share a single context object, which guarantees that they all live within the same URL and message space; you typically create exactly one ∅MQ context per process.

8.3.8. Although these sockets offer method names such as recv() and send(), familiar from ordinary network sockets, the semantics differ: messages are kept in order and never duplicated, and the continuous stream is delivered as separate, framed messages, none of which are lost.

This example manages to illustrate, with only a few lines of code, several of the message patterns typically provided by the major message queues.
The connections from bitsource to always_yes and judge form a publish-subscribe system, in which every connected client receives its own copy of each message the publisher sends (subject to its filter: messages of no interest are never delivered to a subscriber). Each ∅MQ SUB socket is given one or more filters and then receives every message whose initial characters match one of them. Every string that bitsource generates is guaranteed to be received by one of the two subscribers, because the filters '00', '01', '10', and '11' cover all four possible two-character beginnings.

8.3.9. The relationship between judge and pythagoras is a classic RPC-style request-reply pairing. The client must create a REQ socket and speak first, sending a request to one of the agents waiting on the bound socket; the messaging machinery implicitly attaches a return address to the request. Once an agent has finished its work and sent an appropriate reply, the return address lets the REP socket deliver the response to the correct client, even if dozens or hundreds of clients are attached to the socket.

8.3.10. Finally, the tally worker process illustrates the push-pull pattern, which guarantees that each item pushed is received by one, and only one, of the agents connected to the pulling socket. If several tally processes were running, each piece of upstream data would be delivered to just one of them, and each would compute its own independent estimate of π.

∅MQ does not care about the order in which bind() and connect() are called. If an endpoint described by a URL has not started yet, ∅MQ keeps retrying connect() implicitly, polling on a timed schedule, so the system stays robust even if an agent goes offline while the program is running.
$ python queuecrazy.py
...
Y 3.14...

If message delivery must be guaranteed, messages that cannot yet be processed need to be saved persistently; and some flow control is needed to make sure that, when an agent is slow, the queue holds only a bounded number of messages in the waiting state.
These needs usually call for more complex patterns. Full-featured message queues such as RabbitMQ, Qpid, or Celery over a Redis back end handle them with only a small amount of extra work on your part and can guarantee a very low error rate.

8.4. Summary

8.4.1. Memcached builds one big LRU cache out of all the idle RAM on the servers where it is installed. As long as a program removes or replaces records that become obsolete, or lets records expire after a fixed and predictable interval, Memcached can greatly reduce the load on databases and other back-end stores. It can be inserted at many different points in processing.

For example, instead of saving the result of an expensive database query, it may be better to cache the final generated web interface element built from it.

8.4.2. Message queues offer another mechanism for cooperation and integration between the different parts of a program, parts that may require different hardware, load-balancing techniques, platforms, or even programming languages. While an ordinary TCP socket can only offer a point-to-point connection, a message queue can deliver messages to several clients or servers that are waiting for them, and it can use a database or other persistent storage to guarantee that messages are not lost while a server is down.

8.4.3. Message queues also offer resilience and flexibility, because when some part of the system becomes a temporary performance bottleneck, messages simply wait in the queue for service. By hiding the population of servers or processes that serve a particular kind of request, message queues let servers be disconnected, upgraded, restarted, and reconnected without the rest of the system being informed.

8.4.4. Message queues can be used through friendly APIs, such as Celery in the Django world, which can use Redis as its back end. Redis, like Memcached, maintains key-value pairs, but it can persist keys to storage, like a database, and it resembles a message queue in supporting FIFO queues.

8.4.5. The hottest technologies are always strongly represented on Stack Overflow; as some solutions become obsolete and new methods appear, the answers there are continually updated.


Origin www.cnblogs.com/wangxue533/p/12162139.html