Automated analysis of log file data

https://mp.weixin.qq.com/s?__biz=MjM5NjE2MTIyMw==&mid=2257483803&idx=1&sn=efe24b040397cde3c16b890faf7f7717&chksm=a597abb392e022a5c1af95448abd4447a565e35199c2cd3a2f7b8891e52a20075b6ac312477d&mpshare=1&scene=1&srcid=&key=b79bd25d83f240ad4ead35697faece9905fec7160f80f3e6376d128d62c40b2d2cd2c2dfd66f8e9d1e0d9883cc9b6c1ed121ab9fb6fd2735650d82881c2aa4cdb5466c8ff2a9d42e0f950b87b7d3d2e1&ascene=1&uin=MjkxOTg1MjM0MQ%3D%3D&devicetype=Windows+10&version=62060833&lang=zh_CN&pass_ticket=S2vPzhfsZuo41GgVm%2Bek%2FliLi7nmgHlTEw39G2Lj6C55DEWxmX9T49C45ZAKKswr

 

1. The problem

 

Various problems come up during product testing, such as a machine terminating abnormally or quality problems in the product itself that reduce yield. Testing normally produces a test record, and the required data is extracted from the logs and organized into reports. Each log is a csv file compressed as a zip. There are many logs, and each one records different information. The log file name contains a timestamp, and the log contents include the time, machine number, unit, product batch, reason, fail classification, and control information.

The goal: generate graphical reports on a regular schedule and automatically e-mail them to the relevant personnel.

2. Solution

A bat script uses the timestamp to copy the last two months of log files to a specified folder. Python then unzips them, and pandas extracts and processes the data. The company's own database management software filters the raw data and, through its integration with R, Python, and other tools, generates the graphical reports. The whole tool chain is then deployed on the server and sends a report on a regular schedule.

3. Implementation

Bat script that copies the log files for the last two months:

@echo ON
xcopy /s /y "d:\event\log\log_201906*.zip" "C:\Users\sanmy\project\logs\"
xcopy /s /y "d:\event\log\log_201905*.zip" "C:\Users\sanmy\project\logs\"
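
The month stamps in the bat script have to be edited by hand as time passes. A minimal sketch, assuming the file names start with `log_YYYYMM` as in the script above, of how the same copy could derive the prefixes from the current date. `month_prefixes` and `copy_recent_logs` are hypothetical names, and this is not the script the author used.

```python
import shutil
from datetime import date
from pathlib import Path


def month_prefixes(today, months=2):
    """Return log-name prefixes for the current month and the months before it."""
    prefixes = []
    year, month = today.year, today.month
    for _ in range(months):
        prefixes.append(f"log_{year}{month:02d}")
        month -= 1
        if month == 0:  # step back from January to December of the previous year
            year, month = year - 1, 12
    return prefixes


def copy_recent_logs(src_dir, dst_dir, today=None, months=2):
    """Copy recent zip logs from src_dir (including subfolders) flat into dst_dir."""
    today = today or date.today()
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for prefix in month_prefixes(today, months):
        for src in sorted(Path(src_dir).rglob(f"{prefix}*.zip")):
            shutil.copy2(src, dst / src.name)
            copied.append(src.name)
    return copied
```

Calling `copy_recent_logs(r"d:\event\log", r"C:\Users\sanmy\project\logs")` would then pick up the current and previous month without any edits. Unlike `xcopy /s`, this sketch does not recreate subfolders in the destination.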

Python unzips the zip files:

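import os
from zipfile import ZipFile

import pandas as pd

def file_name(file_dir):
    """Return the paths of all .zip files under file_dir."""
    L = []
    for root, dirs, files in os.walk(file_dir):
        for file in files:
            if os.path.splitext(file)[1] == '.zip':
                L.append(os.path.join(root, file))
    return L

# A raw string cannot end in a backslash, so the trailing one is dropped.
file_dirs = r'C:\Users\sanmy\project\logs'
t = file_name(file_dirs)
for i in t:
    myzip = ZipFile(i)
    # The csv inside is assumed to have the same base name as the zip.
    f = myzip.open(os.path.splitext(os.path.basename(i))[0] + '.csv')
    file = pd.read_csv(f)
    # get_data() reads the module-level name `file`;
    # it must be defined before this loop runs.
    get_data()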
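
The loop above relies on the csv inside each archive being named after the zip file. A sketch, assuming only that each archive holds one or more csv files, that asks the archive for its member names instead. `read_zip_csvs` is a hypothetical helper, not part of the author's code.

```python
from zipfile import ZipFile

import pandas as pd


def read_zip_csvs(zip_source):
    """Return {member name: DataFrame} for every .csv file inside the archive."""
    frames = {}
    with ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # pandas can read directly from the open archive member
                with archive.open(name) as f:
                    frames[name] = pd.read_csv(f)
    return frames
```

`zip_source` can be a path or a file-like object, since `ZipFile` accepts both.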

Pandas extracts the data (code omitted here ......)

def get_data():
    ……
    ……
    log=file[['MC','A','action','time','year','month','day','times','dates','Fail']]
    log.to_csv(r'.\logs.csv',mode='a')
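
Because `to_csv` is called with `mode='a'` once per log file, each call by default also writes a header row and an index column into `logs.csv`. A sketch of one way to avoid repeated headers, writing the header only when the file does not exist yet. `append_log` is hypothetical, and the omitted code may already handle this differently.

```python
import os

import pandas as pd


def append_log(frame, out_path):
    """Append frame to out_path, writing the header row only for a new file."""
    write_header = not os.path.exists(out_path)
    frame.to_csv(out_path, mode="a", header=write_header, index=False)
```

With this in place the combined file can be read back with a single `pd.read_csv` call.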

4. Organizing the report and sending mail

Finally, the extracted data is collated and charted, the graphics files are generated, and the software sends them to the relevant personnel.
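
The author sends the reports through the company's software. For readers without such a tool, here is a sketch of the same step using only Python's standard library. The addresses, the SMTP host, the PNG format, and the names `build_report_mail` and `send_report` are all placeholders, not details from the original setup.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report_mail(sender, recipients, image_path):
    """Build a mail with one PNG report attached (PNG is an assumption)."""
    image_path = Path(image_path)
    msg = EmailMessage()
    msg["Subject"] = "Machine log report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The latest log report is attached.")
    msg.add_attachment(
        image_path.read_bytes(),
        maintype="image",
        subtype="png",
        filename=image_path.name,
    )
    return msg


def send_report(msg, host="smtp.example.com", port=25):
    """Send the message through a placeholder SMTP server."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Keeping message construction separate from sending makes the first half easy to check without a mail server.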

5. Results and reporting

The picture here was made in Excel with made-up data and is only an example. The real system generates a number of graphical reports.

As the figure shows, machine MC1 has the worst scrap rate, so that machine may have a problem.
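
A ranking like the one in the figure can be computed from the extracted columns. This is a sketch with made-up rows, assuming `Fail` is 1 for a failed record and 0 otherwise (the source does not say how `Fail` is encoded). `fail_rate_by_machine` is a hypothetical name.

```python
import pandas as pd


def fail_rate_by_machine(log):
    """Return the share of failed records per machine, worst machine first."""
    return log.groupby("MC")["Fail"].mean().sort_values(ascending=False)


# Made-up example rows, not real production data.
log = pd.DataFrame({
    "MC": ["MC1", "MC1", "MC1", "MC2", "MC2", "MC3"],
    "Fail": [1, 1, 0, 0, 1, 0],
})
rates = fail_rate_by_machine(log)
print(rates.index[0])  # machine with the highest fail rate
```

Sorting the result before plotting also puts the bars in order of value rather than in X-axis label order.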


Summary:

I ran into many problems in this project that I had not met before. One was how to copy the files: the same folder holds many other files with different names and types. I ended up using a bat script, which meets the basic need, but the timestamp in the copy command has to be changed after a while. It was also unclear how well the company's database processing software supports Python, and I asked a lot of people before I got it working. Generating the chart reports with the R integration in the database software raised a sorting problem as well: the chart was not sorted by the size of the values but by the X axis. I experimented on my own and slowly figured it out.

This is not a big project, but from start to finish it took almost three months. I did it mainly in my spare time, and I learned a lot. Most of the time went into processing the csv files with pandas. That part is only about two hundred lines of code, but it involves a lot of basics as well as things I had never used before.

Another time sink was deploying all of these functions to the server. The runtime environment and configuration differ there, so debugging took a long time and I stepped on plenty of mines. With patience, study, and asking questions, all of it got done.

The most tortuous part: one data source was originally an already-tidied JMP file (if you don't know JMP, look it up on Baidu). JMP can chart the file directly, but the company's database software does not support the format directly, so the JMP file had to be converted to csv. I later got a JSL script running in JMP on the server to do the conversion, analyzed the result with the company's database software, and ended up with the same functionality. Only a few days after I got this working, the company stopped maintaining the JMP file, so all that effort was wasted.

The final program runs at 8:00 every day and sends the report to the relevant staff on schedule. It lets them watch for problem machines or other abnormal indicators and act in time, improving product yield while reducing maintenance costs.

 


Origin www.cnblogs.com/zkwarrior/p/11087888.html