[Study notes] Python 3 Core Technology and Practice, Chapter 7: Input and Output

Answer to the Chapter 6 exercise, for reference only:

# coding:utf-8
import time

# Method 1: repeated string concatenation
start_time = time.perf_counter()
s = ''
for n in range(0, 100000):
    s += str(n)
end_time = time.perf_counter()
print('time elapse:{}'.format(end_time - start_time))

print('*' * 50)

# Method 2: append the pieces to a list, then join once
start_time = time.perf_counter()
l = []
for n in range(0, 100000):
    l.append(str(n))

l = ''.join(l)
end_time = time.perf_counter()
print('time elapse:{}'.format(end_time - start_time))

print('*' * 50)

# Method 3: join directly over map()
start_time = time.perf_counter()
s = ''.join(map(str, range(0, 100000)))
end_time = time.perf_counter()
print('map elapse:{}'.format(end_time - start_time))

Output:

time elapse:0.31916597257704393
**************************************************
time elapse:0.1134648720658673
**************************************************
map elapse:0.04524811079303531

The third method is the fastest here, so it is the preferred one.
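A single perf_counter() measurement can be noisy, so here is a minimal sketch that repeats the comparison with the standard timeit module. The function names are my own and only wrap the three methods above; the timings you get will differ from machine to machine.

```python
import timeit

N = 100000  # same range as in the measurements above


def concat_plus():
    # Method 1: repeated += on a string
    s = ''
    for n in range(N):
        s += str(n)
    return s


def concat_join_list():
    # Method 2: collect the pieces in a list, then join once
    parts = []
    for n in range(N):
        parts.append(str(n))
    return ''.join(parts)


def concat_join_map():
    # Method 3: join directly over map()
    return ''.join(map(str, range(N)))


if __name__ == '__main__':
    for func in (concat_plus, concat_join_list, concat_join_map):
        # Best of three runs, one call per run
        best = min(timeit.repeat(func, number=1, repeat=3))
        print('{:<18} {:.4f} s'.format(func.__name__, best))
```

Taking the minimum of several runs filters out most of the interference from other processes.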

Input and output

Around the turn of the century, there was a catchphrase on internet forums: on the Internet, nobody knows you're a dog. The Internet was just emerging then. A cable ran into your home, information arrived on your screen through that high-speed line, you replied quickly to a friend's message on the keyboard, and the information flew back through the cable into the intricate virtual world and on into your friend's home.

Viewed abstractly, each computer is a black box. Once a black box has input and output, it has the necessary conditions for a Turing machine to operate.

A Python program is a black box too: data comes in through the input stream, and the processed data goes out through the output stream. Is there a person hiding behind the Python interpreter, or a Slytherin? Nobody cares.

I Basic input and output

The simplest and most direct input comes from the keyboard, as in the following example.

name = input('your name: ')
gender = input('you are a boy? (y/n) ')

###### input ######
your name: Jack
you are a boy? (y/n) y

welcome_str = 'Welcome to the matrix {prefix} {name}.'
welcome_dic = {
    'prefix': 'Mr.' if gender == 'y' else 'Mrs',
    'name': name
}

print('authorizing...')
print(welcome_str.format(**welcome_dic))

########## output ##########
authorizing...
Welcome to the matrix Mr. Jack.

The input() function pauses the program and waits for keyboard input until Enter is pressed. Its argument is the prompt, and the value it returns is always a string (str). Beginners often slip up here, as the next example shows. The print() function accepts strings, numbers, dictionaries, lists, and even instances of custom classes.

Let's look at the following example.

a = input()
1
b = input()
2

print('a + b = {}'.format(a + b))
########## output ##########
a + b = 12
print('type of a is {}, type of b is {}'.format(type(a), type(b)))
########## output ##########
type of a is <class 'str'>, type of b is <class 'str'>
print('a + b = {}'.format(int(a) + int(b)))
########## output ##########
a + b = 3

[Note] To convert a str to an integer, use int(); to convert it to a floating-point number, use float(). When you do such conversions in production, remember to add try/except (error and exception handling, which a later article in this column covers).
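As a minimal sketch of that advice, the helper below wraps int() in try/except. The name parse_int and its default parameter are my own, not part of the course code.

```python
def parse_int(raw, default=None):
    """Convert user input to int; return `default` if it is not a valid integer."""
    try:
        return int(raw)
    except (ValueError, TypeError):
        # ValueError: text such as 'abc' or '3.5'; TypeError: a value such as None
        return default


print(parse_int('12'))              # a valid integer string
print(parse_int(' 7 '))             # int() ignores surrounding whitespace
print(parse_int('3.5'))             # not an integer literal, falls back to None
print(parse_int('abc', default=0))  # falls back to the given default
```

Returning a default keeps the calling code simple. In a real program you might prefer to ask the user again or log the bad value.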

Python places no upper limit on int (by contrast, a 32-bit C++ int tops out at 2147483647, and going past that overflows), but float still has limited precision. Pay attention to these properties not only in algorithm contests but also in production, so that fuzzy boundary checks do not lead to bugs or even 0-days (critical security vulnerabilities).
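A short illustration of both properties, using only the standard library (the variable names are mine):

```python
import sys

# Python ints grow as needed; this value would overflow a 32-bit signed int
big = 2147483647 + 1
print(big)
print(2 ** 100)

# floats are IEEE 754 doubles, so precision is limited
print(0.1 + 0.2 == 0.3)     # False: neither side is stored exactly
print(sys.float_info.max)   # largest finite float

# Above 2**53, not every integer can be represented as a float
same = float(2 ** 53) == float(2 ** 53 + 1)
print(same)                 # True: both round to the same double
```

When you compare floats, math.isclose() is usually a safer choice than ==.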

Let's look back at the cryptocurrency world. At around 11:30 on April 23, 2018, the BEC token smart contract was attacked. Exploiting an integer overflow vulnerability, the attackers hit the smart contract of BeautyChain (BEC), a company cooperating with Meitu, and transferred an astronomical amount of BEC tokens to two addresses. Massive amounts of BEC were then dumped on the market, the token's value fell to nearly zero, and BEC trading was dealt a devastating blow.
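Python's own int never overflows, but the effect can be imitated by masking a result to a fixed width. The sketch below is a simplified illustration, not the contract's actual code: the 256-bit width, the helper name mul_uint256, and the concrete numbers are all assumptions chosen to show the wraparound.

```python
UINT256_MAX = 2 ** 256 - 1  # assumed width of the contract's unsigned integers


def mul_uint256(a, b):
    """Multiply like a fixed-width 256-bit unsigned integer: keep only the low 256 bits."""
    return (a * b) & UINT256_MAX


receivers = 2        # hypothetical number of receiving addresses
value = 2 ** 255     # hypothetical per-address amount picked to trigger the overflow

total = mul_uint256(receivers, value)
print(total)         # 0: a check such as "balance >= total" passes trivially

# Without the fixed width, the true total is astronomically large
print(receivers * value)
```

The point is that the overflowed total and the amounts actually credited no longer agree, which is exactly the kind of unclear boundary condition the paragraph above warns about.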

So although input and output handling looks simple, we must be careful. After all, a considerable share of security vulnerabilities come from careless I/O handling.

II File input and output

Command-line input and output is only the most basic way to interact with Python, suitable for simple interactive programs. In production-level Python code, most I/O comes from files, the network, messages from other processes, and so on.

Next, let's look in detail at reading and writing a text file. Suppose we have a text file, in.txt, with the following content:

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character. I have a dream today.

I have a dream that one day down in Alabama, with its vicious racists, . . . one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers. I have a dream today.

I have a dream that one day every valley shall be exalted, every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed, and all flesh shall see it together.

This is our hope. . . With this faith we will be able to hew out of the mountain of despair a stone of hope. With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day. . . .

And when this happens, and when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God's children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual: "Free at last! Free at last! Thank God Almighty, we are free at last!"

Now let's do a simple NLP (natural language processing) task. It doesn't matter if you are not familiar with NLP; I will take you through the task step by step.

First, we need to know the basic steps of this NLP task, which are the following four:

Read the file;
remove all punctuation and line breaks, and convert all uppercase letters to lowercase;
merge identical words, count how often each word occurs, and sort the words by frequency in descending order;
write the results to the file out.txt, one per line.

You can first think about how you would solve this in Python. Here is my code, with detailed comments. Let's read it together.

import re

# you don't need to care too much about this function
def parse(text):
    # use a regular expression to remove punctuation and line breaks
    text = re.sub(r'[^\w ]', ' ', text)

    # convert to lowercase
    text = text.lower()

    # generate a list of all the words
    word_list = text.split(' ')

    # remove empty words
    word_list = filter(None, word_list)

    # build a dictionary of words and their frequencies
    word_cnt = {}
    for word in word_list:
        if word not in word_cnt:
            word_cnt[word] = 0
        word_cnt[word] += 1

    # sort by word frequency, highest first
    sorted_word_cnt = sorted(word_cnt.items(), key=lambda kv: kv[1], reverse=True)

    return sorted_word_cnt

with open('in.txt', 'r') as fin:
    text = fin.read()

word_and_freq = parse(text)

with open('out.txt', 'w') as fout:
    for word, freq in word_and_freq:
        fout.write('{} {}\n'.format(word, freq))

########## output (long intermediate results omitted) ##########

and 15
be 13
will 11
to 11
the 10
of 10
a 8
we 8
day 6

...

old 1
negro 1
spiritual 1
thank 1
god 1
almighty 1
are 1

You don't need to worry about how parse() is implemented. All you need to know is that it turns the input text string into the sorted word-frequency statistics we need. sorted_word_cnt is a list of tuples.
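For comparison, the same counting step can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not the course's implementation; the name parse_with_counter is my own.

```python
import re
from collections import Counter


def parse_with_counter(text):
    # Replace everything that is not a word character or a space with a space
    text = re.sub(r'[^\w ]', ' ', text).lower()
    # split() without arguments already discards empty strings
    counts = Counter(text.split())
    # most_common() returns (word, count) tuples, highest count first
    return counts.most_common()


result = parse_with_counter('Free at last! Free at last!')
print(result)
```

Like parse(), it returns a list of (word, frequency) tuples, so the code that writes out.txt would not need to change.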

First, we need some basics of how computers access files. In reality, the way the operating system kernel handles files is fairly complex, involving kernel mode, the virtual file system, locks, pointers, and a series of other concepts. I won't go into those here and will only cover some basic knowledge that is enough to work with.

We first use the open() function to get a file object (the "pointer" to the file). Its first argument specifies the file's location (a relative or absolute path). The second argument is the mode: 'r' means read and 'w' means write. To both read and write an existing file, use 'r+'; note that 'rw' is not a valid mode in Python 3 and raises ValueError. 'a' is a less common but useful mode that means append: a file opened this way is written starting from the end of the existing content.
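Here is a small demonstration of the modes on a scratch file. The temporary directory and the file name demo.txt are placeholders of mine. It also shows that Python 3 rejects 'rw' with a ValueError, and that 'r+' is the mode that opens an existing file for both reading and writing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # scratch file for the demo

with open(path, 'w') as f:    # 'w' creates the file, or truncates an existing one
    f.write('first\n')

with open(path, 'a') as f:    # 'a' writes from the end of the existing content
    f.write('second\n')

with open(path, 'r+') as f:   # 'r+' reads and writes without truncating
    content = f.read()        # after read(), the position is at the end
    f.write('third\n')

with open(path, 'r') as f:    # 'r' is read-only
    final = f.read()
print(final)

try:
    open(path, 'rw')          # not a valid mode string
except ValueError as e:
    print('invalid mode:', e)
```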

A side note: in my work at Facebook, permission management in code is very important. If you only need to read a file, don't request write permission. To some extent, this lowers the risk that a bug poses to the whole system.

Back to the topic. Once we have the file object, we can read the entire file with the read() function. The line text = fin.read() reads the whole content of the file into memory and assigns it to the variable text. This naturally has pros and cons:

The advantage is convenience: we can then simply call parse() to analyze the text;
the drawback is that if the file is too large, reading it all at once can exhaust memory.

In that case, we can pass a size argument to read() to set the maximum length to read at a time. We can also use readline() to read one line at a time. This is common in data cleaning for data mining and feels very lightweight when writing small programs. If the lines are independent of one another, it also reduces memory pressure. The write() function writes its string argument to the file and is just as easy to understand.
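The two reading styles can be sketched as follows. The scratch file and the tiny chunk size of 4 characters are only for demonstration; a real program would use a much larger size.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # scratch file for the demo
with open(path, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

# Style 1: read at most `size` characters per call
chunks = []
with open(path, 'r') as f:
    while True:
        chunk = f.read(4)
        if not chunk:          # an empty string means end of file
            break
        chunks.append(chunk)

# Style 2: iterate line by line (equivalent to calling readline() repeatedly)
lines = []
with open(path, 'r') as f:
    for line in f:
        lines.append(line.rstrip('\n'))

print(chunks)
print(lines)
```

In both styles only a small piece of the file is held in memory at any moment.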

I should also briefly mention the with statement here (it is covered in detail later). open() has a counterpart, close(): if you open a file, you should close it as soon as you finish with it. If you use the with statement, you don't need to call close() explicitly. Once the code inside the with block has run, close() is called automatically, and the code is much cleaner.
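To make the comparison concrete, here are the explicit form and the with form side by side. The file path is a scratch placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')  # scratch file for the demo

# Explicit form: close() must run even if write() raises
f = open(path, 'w')
try:
    f.write('hello\n')
finally:
    f.close()

# with form: close() is called automatically when the block exits
with open(path, 'a') as g:
    g.write('world\n')

print(f.closed, g.closed)  # both files are closed at this point
```

The with form closes the file even when an exception is raised inside the block, just like the try/finally version.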

Finally, note that all I/O should have error handling. I/O operations can run into all sorts of situations, and a robust program must cope with them instead of crashing (unless crashing is the intended design).
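One possible shape for such handling is sketched below. The helper read_text and the choice to return None on failure are my own assumptions; depending on the program, re-raising or logging may be more appropriate.

```python
import os
import tempfile


def read_text(path):
    """Return the file's content, or None if it cannot be read."""
    try:
        with open(path, 'r', encoding='utf-8') as fin:
            return fin.read()
    except FileNotFoundError:
        print('file not found: {}'.format(path))
    except PermissionError:
        print('no permission to read: {}'.format(path))
    except UnicodeDecodeError:
        print('not valid UTF-8: {}'.format(path))
    except OSError as e:
        # any other operating-system level I/O failure
        print('I/O error on {}: {}'.format(path, e))
    return None


demo_dir = tempfile.mkdtemp()
missing = read_text(os.path.join(demo_dir, 'missing.txt'))  # prints a message

existing_path = os.path.join(demo_dir, 'in.txt')
with open(existing_path, 'w', encoding='utf-8') as fout:
    fout.write('I have a dream today.')
found = read_text(existing_path)
print(found)
```

The specific exceptions come before the general OSError, because FileNotFoundError and PermissionError are subclasses of it.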

III JSON serialization in practice

Finally, let me cover something closely tied to real-world applications.

JSON (JavaScript Object Notation) is a lightweight data interchange format. Its design intent is to represent everything as a string, which makes information convenient to transmit over the Internet and also easy for people to read (compared with some binary protocols). JSON is used very widely on today's Internet and is a skill every Python programmer should master.

Imagine a scenario: you want to buy a certain amount of a stock from an exchange. You need to submit a series of parameters such as the ticker symbol, the side (buy/sell), the order type (market/limit), the price (if it is a limit order), and the quantity. This data includes strings, integers, floating-point numbers, and even booleans, and if everything is mixed together it is hard for the exchange to unpack.

So what should we do?

The JSON we are talking about solves exactly this problem. You can think of it simply as two black boxes:

The first takes in this assorted information, for example a Python dictionary, and outputs a string;
the second takes in that string and outputs a Python dictionary containing the original information.

The code looks like this:

import json

params = {
    'symbol': '123456',
    'type': 'limit',
    'price': 123.4,
    'amount': 23
}

params_str = json.dumps(params)

print('after json serialization')
print('type of params_str = {}, params_str = {}'.format(type(params_str), params_str))

original_params = json.loads(params_str)

print('after json deserialization')
print('type of original_params = {}, original_params = {}'.format(type(original_params), original_params))

########## output ##########

after json serialization
type of params_str = <class 'str'>, params_str = {"symbol": "123456", "type": "limit", "price": 123.4, "amount": 23}
after json deserialization
type of original_params = <class 'dict'>, original_params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}

Here, json.dumps() takes a basic Python data type and serializes it into a string;

json.loads() takes a valid JSON string and deserializes it into a basic Python data type.

Simple, isn't it?

But again, remember to add error handling. Otherwise, if json.loads() is given even one invalid string and you don't catch the exception, the program will crash.
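A minimal sketch of that error handling: json.loads() raises json.JSONDecodeError for malformed input, which we can catch. The wrapper name safe_loads and the None return value are my own choices.

```python
import json


def safe_loads(raw):
    """Deserialize a JSON string; return None instead of crashing on bad input."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        print('invalid JSON: {}'.format(e))
        return None


ok = safe_loads('{"symbol": "123456", "price": 123.4}')
bad = safe_loads("{'symbol': '123456'}")  # single quotes are not valid JSON

print(ok)
print(bad)
```

Note that the printed form of a Python dict (single quotes) is not JSON, which is an easy mistake to make when building strings by hand.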

At this point you may be wondering: what if I want to write the string to a file, or read a JSON string from a file?

Yes, you can still use open() and read()/write() as described above: read the string into memory or write it out, and do the JSON encoding/decoding yourself. That is a bit cumbersome, though, so the json module also provides dump() and load(), which work directly on file objects:

import json

params = {
    'symbol': '123456',
    'type': 'limit',
    'price': 123.4,
    'amount': 23
}

# json.dump() writes straight to the file and returns None
with open('params.json', 'w') as fout:
    json.dump(params, fout)

with open('params.json', 'r') as fin:
    original_params = json.load(fin)

print('after json deserialization')
print('type of original_params = {}, original_params = {}'.format(type(original_params), original_params))

########## output ##########

after json deserialization
type of original_params = <class 'dict'>, original_params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}


And so we have cleanly implemented reading and writing JSON strings. When you develop a third-party application, you can write the user's personal configuration to a file as JSON so that it is read back automatically the next time the program starts. This is a mature approach in widespread use.
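The configuration use case might look roughly like this. Everything here is hypothetical: the setting names in DEFAULTS, the file name config.json, and the policy of falling back to defaults when the file is missing or broken.

```python
import json
import os
import tempfile

DEFAULTS = {'theme': 'light', 'font_size': 12}  # hypothetical settings


def load_config(path):
    """Start from the defaults and overlay whatever the user saved last time."""
    config = dict(DEFAULTS)
    try:
        with open(path, 'r', encoding='utf-8') as fin:
            config.update(json.load(fin))
    except FileNotFoundError:
        pass  # first start: no saved configuration yet
    except json.JSONDecodeError:
        print('config file is corrupted, using defaults')
    return config


def save_config(path, config):
    with open(path, 'w', encoding='utf-8') as fout:
        json.dump(config, fout, indent=2)


config_path = os.path.join(tempfile.mkdtemp(), 'config.json')

first_run = load_config(config_path)   # nothing saved yet, so defaults only
first_run['theme'] = 'dark'
save_config(config_path, first_run)

second_run = load_config(config_path)  # simulates the next program start
print(second_run)
```

Passing indent=2 to json.dump() keeps the file readable for users who want to edit it by hand.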

So is JSON the only choice? Obviously not; it is just one of the most convenient options for lightweight applications. As far as I know, Google has a similar tool called Protocol Buffers. Google has fully open-sourced it, so you can look up how to use it yourself.

Compared with JSON, its advantage is that it produces optimized binary data, so performance is better. On the other hand, the resulting binary sequence cannot be read directly by humans. It is widely used in TensorFlow and many other systems with performance requirements.

IV Summary

In this lesson, we mainly covered Python's ordinary I/O and file I/O, learned the basics of JSON serialization, and reinforced them with concrete examples. Let me stress a few points again:

Be careful with I/O operations: do thorough error handling and code carefully to avoid vulnerabilities;
when coding, estimate memory usage and disk footprint thoroughly, so that the cause is easier to find when something goes wrong;
JSON serialization is a very handy tool; practice it a lot in real work;
keep code as concise and clear as possible; even as a beginner, aim to carry yourself like a master.

V Questions

Question 1: Can you implement the NLP word-count example again? This time, in.txt may be very, very large (meaning you cannot read it into memory in one go), while out.txt will not be large (meaning many words are repeated).

Tip: You may need to read a fixed-length piece of the file at a time, process it, and then read the next piece. But if you simply cut by length, you may split a word in two, so this boundary case needs careful handling.
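One way to handle the boundary case from the tip is to carry the last, possibly incomplete, word of each chunk over to the next chunk. The sketch below is my own attempt, not the author's reference answer. The function name count_words, the chunk size, and the in-memory sample text (standing in for a huge in.txt) are all assumptions.

```python
import io
import re
from collections import Counter


def count_words(stream, chunk_size=1024):
    """Count words from a text stream while holding only one chunk in memory."""
    counts = Counter()
    leftover = ''  # last, possibly incomplete, word of the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Same cleaning as parse(): punctuation and line breaks become spaces
        text = leftover + re.sub(r'[^\w ]', ' ', chunk).lower()
        words = text.split(' ')
        # The final piece may have been cut by the chunk boundary; keep it for later
        leftover = words.pop()
        counts.update(w for w in words if w)
    if leftover:
        counts[leftover] += 1
    return counts


sample_text = 'Free at last! Free at last!\nThank God Almighty, we are free at last!'
counts = count_words(io.StringIO(sample_text), chunk_size=5)
print(counts.most_common(3))
```

With a real file you would pass the object returned by open('in.txt', 'r') instead of the StringIO, and then write counts.most_common() to out.txt as before.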

Question 2: You have probably used cloud storage such as Baidu Netdisk or Dropbox, which may have limited space (say 5 GB). Suppose one day you plan to transfer 100 GB of data from home to your company, but you did not bring a USB drive. So you come up with an idea:

Each time, the home computer writes no more than 5 GB of data to the Dropbox drive. As soon as the company computer detects new data, it copies the data to local storage and then deletes it from the network drive. Once the home computer detects that this batch has fully arrived on the company computer, it writes the next batch, and so on until all the data has been transferred.

Following this idea, you plan to write a server.py at home and a client.py at the company to implement this.

Tip: Assume that no single file exceeds 5 GB.

You can control the synchronization state by writing to a file (config.json). Design the states carefully, though, because a race condition is possible.
You can also synchronize state by directly checking whether files have been created or deleted, which is the simplest approach.

Don't worry about the difficulty of the problem. Feel free to write down your thoughts; I will prepare the final code for you.

 


Origin www.cnblogs.com/tianyu2018/p/10965469.html