Python data processing tools: pandas (Series and DataFrame structures)

The core objects that the pandas module operates on are the Series and the DataFrame. A Series can be understood as a single field (column) of a data set, and a DataFrame is a data set made up of at least two fields (Series).

 

Constructing a Series

1. From a homogeneous list or tuple

2. From a dictionary

3. From a one-dimensional numpy array

4. From a single column of a DataFrame

For example:

import pandas as pd
import numpy as np
gdp1 = pd.Series([2.8, 3.01, 8.99, 8.59, 5.18])  # from a list
gdp2 = pd.Series({'Beijing': 2.8, 'Shanghai': 3.01, 'Guangdong': 8.99, 'Jiangsu': 8.59, 'Zhejiang': 5.18})  # from a dictionary
gdp3 = pd.Series(np.array((2.8, 3.01, 8.99, 8.59, 5.18)))  # from a one-dimensional array
print(gdp1)
print(gdp2)
print(gdp3)

out:

0    2.80
1    3.01
2    8.99
3    8.59
4    5.18
dtype: float64
Beijing      2.80
Shanghai     3.01
Guangdong    8.99
Jiangsu      8.59
Zhejiang     5.18
dtype: float64
0    2.80
1    3.01
2    8.99
3    8.59
4    5.18
dtype: float64

As shown above, a Series built from a list, a tuple, or a one-dimensional array prints in the same style, with two columns: the first is the index of the Series (it can be read as a row number, starting automatically from 0), and the second holds the actual values. A Series built from a dictionary prints in a second style. It still has two columns, but the first holds row names (labels) taken from the dictionary keys instead of row numbers, and the second holds the values taken from the dictionary values.
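The fourth construction route listed above, taking one column of a DataFrame, is not covered by the example. A minimal sketch follows; the small province/gdp table is made up for illustration and is not part of the original post. It also shows that the index and the values of a Series can be inspected separately.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration.
df = pd.DataFrame({
    'province': ['Beijing', 'Shanghai', 'Guangdong'],
    'gdp': [2.8, 3.01, 8.99],
})

# Selecting a single column of a DataFrame yields a Series.
gdp_col = df['gdp']
print(type(gdp_col))
print(gdp_col)

# A Series keeps its index (row numbers or labels) apart from its values.
gdp2 = pd.Series({'Beijing': 2.8, 'Shanghai': 3.01})
print(gdp2.index.tolist())   # row labels, taken from the dictionary keys
print(gdp2.values.tolist())  # the stored values
```

The column keeps the row numbers of the DataFrame as its index, so it behaves like the first print style described above.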

 

A Series is very similar to a one-dimensional array: the indexing methods used to retrieve elements of a one-dimensional array also work on a Series, and so do the mathematical and statistical functions for arrays. A Series also offers further ways of working with the data, as follows:

import pandas as pd
import numpy as np
gdp1 = pd.Series([2.8, 3.01, 8.99, 8.59, 5.18])
gdp2 = pd.Series({'Beijing': 2.8, 'Shanghai': 3.01, 'Guangdong': 8.99, 'Jiangsu': 8.59, 'Zhejiang': 5.18})
gdp3 = pd.Series(np.array((2.8, 3.01, 8.99, 8.59, 5.18)))
# print(gdp1)
# print(gdp2)
# print(gdp3)
print('Series with row numbers:\n', gdp1[[0, 3, 4]])  # take the 1st, 4th and 5th elements of gdp1
# take the 1st, 4th and 5th elements of gdp2 by position; .iloc makes the positional
# lookup explicit (plain gdp2[[0, 3, 4]] is deprecated in newer pandas versions)
print('Series with row names:\n', gdp2.iloc[[0, 3, 4]])
print('Series with row names:\n', gdp2[['Shanghai', 'Jiangsu', 'Zhejiang']])  # take values by label, i.e. by dictionary key
print('Via a numpy function:\n', np.log(gdp1))  # numpy functions can be applied to a Series directly
print('Via a numpy function:\n', np.mean(gdp1))
print('Via a Series method:\n', gdp1.mean())  # the Series offers the same statistic as a method

out:

Series with row numbers:
 0    2.80
3    8.59
4    5.18
dtype: float64
Series with row names:
 Beijing     2.80
Jiangsu     8.59
Zhejiang    5.18
dtype: float64
Series with row names:
 Shanghai    3.01
Jiangsu     8.59
Zhejiang    5.18
dtype: float64
Via a numpy function:
 0    1.029619
1    1.101940
2    2.196113
3    2.150599
4    1.644805
dtype: float64
Via a numpy function:
 5.714
Via a Series method:
 5.714

A few points about the code above are worth noting. If the Series has row names, it can be indexed both by position (row number) and by label (row name); the explicit forms are .iloc for positions and .loc for labels. If a mathematical function needs to be applied to a Series, the numpy module is generally preferred, because pandas is comparatively lacking in this area. For statistical computations, either the numpy functions or the Series methods can be used. The Series methods are generally the better choice because they are richer: a Series can compute skewness and kurtosis, for example, while numpy has no such functions.
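The three points above can be sketched as follows. This is a minimal illustration that reuses the GDP figures from the example, not a complete tour of the API: .iloc and .loc are the explicit positional and label-based indexers, np.log stands in for any numpy mathematical function, and skew() and kurt() are the Series methods for skewness and kurtosis.

```python
import numpy as np
import pandas as pd

gdp2 = pd.Series({'Beijing': 2.8, 'Shanghai': 3.01, 'Guangdong': 8.99,
                  'Jiangsu': 8.59, 'Zhejiang': 5.18})

# Point 1: the same three elements, selected by position and by label.
by_position = gdp2.iloc[[0, 3, 4]]
by_label = gdp2.loc[['Beijing', 'Jiangsu', 'Zhejiang']]
print(by_position.equals(by_label))

# Point 2: a numpy function applied to a Series keeps the row labels.
logged = np.log(gdp2)
print(logged)

# Point 3: statistics that exist as Series methods but not as numpy functions.
print('skewness:', gdp2.skew())
print('kurtosis:', gdp2.kurt())
```

Using .iloc and .loc instead of bare square brackets removes any ambiguity about whether an integer is meant as a position or as a label.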

 

Constructing a DataFrame

   A DataFrame is essentially a data set: each row represents one observation and each column represents one variable. A DataFrame can hold columns of different types, such as integer, floating-point, string, and datetime, whereas arrays and Series lack this advantage because each of them holds data of a single type. A DataFrame can be constructed in the following ways:

1. From nested lists or tuples

2. From a dictionary

3. From a two-dimensional array

4. By reading external data

Example:

import pandas as pd
import numpy as np
df1 = pd.DataFrame([['Joe Smith', 23, 'M'], ['John Doe', 27, 'F'], ['Wang Wu', 26, 'F']])  # nested list
df2 = pd.DataFrame({'name': ['Joe Smith', 'John Doe', 'Wang Wu'], 'age': [23, 27, 26], 'sex': ['M', 'F', 'F']})  # dictionary
df3 = pd.DataFrame(np.array([['Joe Smith', 23, 'M'], ['John Doe', 27, 'F'], ['Wang Wu', 26, 'F']]))  # two-dimensional array
print('DataFrame from a nested list:\n', df1)
print('DataFrame from a dictionary:\n', df2)
print('DataFrame from a two-dimensional array:\n', df3)

out:

DataFrame from a nested list:
            0   1  2
0  Joe Smith  23  M
1   John Doe  27  F
2    Wang Wu  26  F
DataFrame from a dictionary:
         name  age sex
0  Joe Smith   23   M
1   John Doe   27   F
2    Wang Wu   26   F
DataFrame from a two-dimensional array:
            0   1  2
0  Joe Smith  23  M
1   John Doe  27  F
2    Wang Wu  26  F

Constructing a DataFrame requires the DataFrame function of the pandas module. When building from a nested list or tuple, each inner list or tuple holds one row (observation) of the DataFrame. When building from a two-dimensional array, each row of the array becomes one row of the DataFrame. When building from a dictionary, the keys become the variable names and the values become the observations of each variable. All three approaches in the code above produce a DataFrame, but one converted from a nested list, tuple, or two-dimensional array has no meaningful variable names, only column numbers running from 0 to N. So when a DataFrame has to be built by hand, the dictionary approach is generally preferred.
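If a nested list is already at hand, the missing variable names can also be supplied at construction time. The sketch below assumes the same three rows as the example and uses the columns argument of pd.DataFrame; the variable names df_default and df_named are illustrative only.

```python
import pandas as pd

rows = [['Joe Smith', 23, 'M'], ['John Doe', 27, 'F'], ['Wang Wu', 26, 'F']]

# Without names, the columns are simply numbered from 0.
df_default = pd.DataFrame(rows)
print(df_default.columns.tolist())

# The columns argument attaches variable names to the same data.
df_named = pd.DataFrame(rows, columns=['name', 'age', 'sex'])
print(df_named.columns.tolist())

# Each column keeps its own type: text for name and sex, integer for age.
print(df_named.dtypes)
```

The dictionary approach remains the most direct way to get named columns, but this form avoids reshaping data that is already organised row by row.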

 

Reading external data

Building a DataFrame by reading external data involves a lot more material, so it will be covered in the next post.
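As a small preview, here is a minimal sketch that assumes the external data is CSV text. It uses pd.read_csv on an in-memory string (the csv_text content is made up for illustration) so that it runs without any file on disk; a file path could be passed in the same place.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an external file.
csv_text = "name,age,sex\nJoe Smith,23,M\nJohn Doe,27,F\nWang Wu,26,F\n"

# read_csv takes the header line as the variable names.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (number of rows, number of columns)
```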

 

 


Origin www.cnblogs.com/tinglele527/p/11760822.html