Team Statistics

Displaying Data with Charts

1. Data Preprocessing

Data preprocessing is the work that must be done before data are classified or grouped. It includes data review, filtering, and sorting.

1.1 Data Review

Data completeness: check whether any unit or individual that should have been surveyed was omitted, and whether every survey item was filled in.
Data accuracy: check the data for errors and for abnormal values. Outliers should be examined carefully.

Second-hand data: also check applicability and timeliness.
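
The completeness and accuracy checks above can be sketched in pandas. This is only a minimal illustration: the table is hypothetical, and the valid range of 0 to 100 for a score is an assumption, not something stated in this post.

```python
import pandas as pd

# Hypothetical survey scores: one value is missing and one (750) is a likely typo.
scores = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "score": [69, 91, None, 81, 750, 83],
})

# Completeness: count the missing values in each column.
missing_per_column = scores.isna().sum()

# Accuracy: flag values outside the assumed valid range of 0 to 100.
out_of_range = scores[(scores["score"] < 0) | (scores["score"] > 100)]

print(missing_per_column)
print(out_of_range)
```

Flagged rows still need to be checked by hand; a value outside the expected range may be an entry error or a genuine extreme value.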

1.2 Data Filtering

Filtering picks out the records that meet specific criteria.
Filtering on a single condition

import pandas as pd
df1 = pd.read_csv('three_test.csv')
# Keep the rows whose statistics score is above 75
df1[df1['统计学成绩'] > 75]

   姓名  统计学成绩  数学成绩  英语成绩  经济学成绩
1  王翔     91    75    95     94
3  李华     81    60    86     64
5  宋媛     83    72    66     71
7  陈风     87    76    92     77

Filtering on multiple conditions

import pandas as pd
df1 = pd.read_csv('three_test.csv')
# Combine conditions with & and wrap each one in parentheses
df1[(df1['统计学成绩'] > 75) & (df1['数学成绩'] > 75)]

   姓名  统计学成绩  数学成绩  英语成绩  经济学成绩
7  陈风     87    76    92     77

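1.3 Data Sorting

Sorting arranges the data in a specified order, here by statistics score from highest to lowest.

# Sort the table loaded above (df1) by statistics score, highest first
df = df1.sort_values('统计学成绩', ascending=False)
df

   姓名  统计学成绩  数学成绩  英语成绩  经济学成绩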
1  王翔     91    75    95     94
7  陈风     87    76    92     77
5  宋媛     83    72    66     71
3  李华     81    60    86     64
4  赵颖     75    96    81     83
6  袁方     75    58    76     90
0  张松     69    68    84     86
2  田雨     54    88    67     78
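
The sorted table above contains a tie (two students with a statistics score of 75). As a small sketch, ties can be broken by passing a second sort key. The scores are copied from the table above (only two score columns are kept for brevity); using the mathematics score as the tie-breaker is an illustrative choice, not part of the original example.

```python
import pandas as pd

# Scores copied from the table shown above (two score columns only).
df1 = pd.DataFrame({
    "姓名": ["张松", "王翔", "田雨", "李华", "赵颖", "宋媛", "袁方", "陈风"],
    "统计学成绩": [69, 91, 54, 81, 75, 83, 75, 87],
    "数学成绩": [68, 75, 88, 60, 96, 72, 58, 76],
})

# Sort by statistics score (descending); break ties by mathematics score (descending).
ranked = df1.sort_values(["统计学成绩", "数学成绩"], ascending=[False, False])
print(ranked)
```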

2. Organizing and Displaying Qualitative Data

2.1 Frequency and Frequency Distribution

Frequency: the number of data values that fall into a particular category or group.
Frequency distribution: a table that lists every category together with its corresponding frequency.
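
A frequency distribution of this kind can be built with `value_counts`. The following is a minimal sketch; the beverage data are hypothetical and serve only to illustrate the table layout.

```python
import pandas as pd

# Hypothetical categorical data: the beverage chosen by 10 customers.
drinks = pd.Series(["tea", "juice", "tea", "water", "tea",
                    "juice", "water", "tea", "milk", "juice"])

# Frequency of each category, most common first.
freq = drinks.value_counts()

# Frequency distribution table: one row per category, with counts and percentages.
table = pd.DataFrame({
    "frequency": freq,
    "percent": (freq / freq.sum() * 100).round(1),
})
print(table)
```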

2.2 Charts for Categorical Data

1. Bar chart
2. Pareto chart
3. Pie chart
4. Doughnut chart
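
Of the charts listed above, the Pareto chart takes the most preparation: the bars are ordered by descending frequency and a cumulative-percentage line is drawn on a second axis. The following is a rough sketch using matplotlib, which this post does not otherwise use; the complaint counts are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical complaint counts by category.
counts = pd.Series({"delivery": 12, "price": 30, "quality": 45,
                    "service": 8, "other": 5})

# Order the categories by descending frequency and accumulate the percentages.
ordered = counts.sort_values(ascending=False)
cum_pct = ordered.cumsum() / ordered.sum() * 100

fig, ax = plt.subplots()
ax.bar(ordered.index, ordered.values)  # bars: frequency per category
ax.set_ylabel("frequency")

ax2 = ax.twinx()  # second y-axis for the cumulative percentage line
ax2.plot(ordered.index, cum_pct.values, color="tab:red", marker="o")
ax2.set_ylim(0, 105)
ax2.set_ylabel("cumulative %")
```

Calling `fig.savefig("pareto.png")` afterwards writes the chart to a file. A plain bar chart is the same code without the second axis, and `ax.pie(ordered.values, labels=ordered.index)` draws a pie chart from the same counts.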

3. Organizing and Displaying Numerical Data

3.1 Data Grouping

When grouping by class intervals, follow the principle of no overlap and no omission: every value must fall into exactly one group.
A value x assigned to the group with limits a and b therefore satisfies a <= x < b (closed on the left, open on the right).
The suitable chart depends on the kind of data:
1. Grouped data: histogram
2. Ungrouped data: stem-and-leaf plot and box plot
3. Time series data: line chart
4. Multivariate data: scatter plot, bubble chart, radar chart
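
The left-closed, right-open grouping rule can be sketched with `pd.cut`. The scores are the statistics scores from the table in section 1; the class limits (width 10, from 50 to 100) are an assumption chosen for illustration.

```python
import pandas as pd

# Statistics scores from the sample table in section 1.
scores = pd.Series([69, 91, 54, 81, 75, 83, 75, 87], name="统计学成绩")

# Assumed class limits: five groups of width 10.
bins = [50, 60, 70, 80, 90, 100]

# right=False makes each interval closed on the left and open on the right: a <= x < b.
groups = pd.cut(scores, bins=bins, right=False)

# Count the values in each group and list the groups in interval order.
freq = groups.value_counts().sort_index()
print(freq)
```

With `right=False` a score of exactly 100 would fall outside the last interval, so the upper limit has to be raised if such values can occur. The resulting counts are what a histogram of the grouped data would plot.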

4. Using Charts Appropriately

From the statistics study group, organized periodically by the public account [Wood East by the lay public].
Anyone interested in data is welcome to join and learn together.

Because of time constraints, some parts still lack code implementations; these will be added later.

Origin www.cnblogs.com/youchi/p/11789742.html