Reading Excel or CSV files with pandas

import pandas as pd

"""
pandas doc:
df.dtypes: show the data type of each column
id      int64
x0    float64
df.shape: number of rows and columns, here (569, 21). Printing df.reindex without calling it only shows the bound method's repr, which ends with [569 rows x 21 columns]>; reindex itself conforms the frame to a new index
df.reindex_axis: likewise showed the data together with its dimensions when printed uncalled; it was removed in pandas 1.0, so use print(df) instead
df.info(): show the non-null count and dtype of each column
df.head(1).values: the first row as a 2-D array; head() returns the first 5 rows by default
[[ 1.33000e+02  4.49512e-01          nan  4.13178e-01  3.03781e-01
  -1.23848e-01 -1.84227e-01 -2.19076e-01  2.68537e-01  1.59960e-02
  -7.89267e-01 -3.37360e-01 -7.28193e-01 -4.42587e-01 -2.72757e-01
  -6.08018e-01 -5.77235e-01 -5.01126e-01  1.43371e-01 -4.66431e-01
  -5.54102e-01]]
......
......
df.tail(1): show the last row
df[n:n+k].values: rows n to n+k-1 as a 2-D array (the end of the slice is excluded)
Converting df to NumPy:
df.to_numpy()[:2] takes the first 2 rows:
[[ 1.330000e+02 4.495120e-01 nan 4.131780e-01 3.037810e-01
-1.238480e-01 -1.842270e-01 -2.190760e-01 2.685370e-01 1.599600e-02
-7.892670e-01 -3.373600e-01 -7.281930e-01 -4.425870e-01 -2.727570e-01
-6.080180e-01 -5.772350e-01 -5.011260e-01 1.433710e-01 -4.664310e-01
-5.541020e-01]
[ 2.730000e+02 -1.245485e+00 -8.423170e-01 -1.255026e+00 nan
-4.263010e-01 -1.088781e+00 -9.763920e-01 -8.988980e-01 9.834960e-01
4.570200e-02 -4.936390e-01 3.486200e-01 -5.524830e-01 -5.268770e-01
2.253098e+00 -8.276200e-01 -7.807390e-01 -3.769970e-01 -3.102390e-01
1.763010e-01]]
......
......
......
df.describe(): summary statistics for each column: count, mean, standard deviation, min, max, and the 25%, 50% and 75% percentiles
               id            x0  ...           x18           x19
count  569.000000  5.690000e+02  ...  5.690000e+02  5.690000e+02
mean   284.000000  2.811951e-08  ... -5.272408e-09  1.230228e-08
std    164.400426  1.000880e+00  ...  1.000880e+00  1.000880e+00
min      0.000000 -2.029648e+00  ... -1.096968e+00 -1.532890e+00
25%    142.000000 -6.893850e-01  ... -6.516810e-01 -5.851180e-01
50%    284.000000 -2.150820e-01  ... -2.194300e-01 -2.299400e-01
75%    426.000000  4.693930e-01  ...  3.556920e-01  2.886420e-01
max    568.000000  3.971288e+00  ...  7.071917e+00  9.851593e+00

Transposing data (rows become columns): df.T
df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last'): sort by values
With the default axis=0, rows are sorted by the values of the column(s) named in by; with axis=1, columns are sorted and by takes row index labels instead
ascending: bool or list of bools. True sorts in ascending order; a list such as [True, False] sorts the first key ascending and the second descending
......
......
Data selection:
df['x0']: select a single column
df[:n]: the first n rows (df[:2] gives the first two)
df.loc[0]: the row whose index label is 0
df.loc[0:1]: the rows with index labels 0 through 1 (both ends included)
df.loc[:, "x0":"x3"]: all rows, columns x0 through x3 inclusive; for a single column use df.loc[:, "x3"]
To take only certain columns, such as x0 and x7: df.loc[:, ["x0", "x7"]]
df.iloc[0]: the row at position 0
df.iloc[0:4]: the rows at positions 0 to 3 (left-closed, right-open)
Columns with df.iloc: print(df.iloc[:, 0:4]) prints the first four columns; iloc is position-based and does not accept column names
    id        x0        x1        x2
0  133  0.449512       NaN  0.413178
1  273 -1.245485 -0.842317 -1.255026
2  175 -1.549664 -1.126219 -1.546652
Take a single value, here the row at position 1 and the column at position 2 (x1):
print(df.iloc[1, 2]) or, by label, print(df.at[1, "x1"])
......
......
Filling missing values:
print(df.fillna(value=5))
0  133  0.449512  5.000000  0.413178  ... -0.501126  0.143371 -0.466431 -0.554102
1  273 -1.245485 -0.842317 -1.255026  ... -0.780739 -0.376997 -0.310239  0.176301
......
......
Viewing the mean:
print(df.mean()): the mean of each column (axis=0, the default); df.mean(axis=1) gives the mean of each row

Column sums (appended as a new row labelled rows_sum):
df.loc["rows_sum"] = df.apply(lambda x: x.sum())
print(df)
567            2.0  1.579888  1.424827  ...  0.237036  0.293559  0.456187
568           39.0 -0.183840  0.356123  ...  0.133639 -0.819980 -0.229940
rows_sum  161596.0  0.000016  1.247216  ... -0.000005 -0.000003  0.000007
Row sums (added as a new column named rows_sum):
df["rows_sum"]=df.apply(lambda x:x.sum(),axis=1)
print(df)
id x0 x1 ... x18 x19 rows_sum
0 133 0.449512 NaN ... -0.466431 -0.554102 128.790148
1 273 -1.245485 -0.842317 ... -0.310239 0.176301 266.205423
2 175 -1.549664 -1.126219 ... 0.795207 -0.149751 160.581688
......
......
"""

# Load the dataset, then add a column holding the sum of each row
df = pd.read_csv('./datasets/breast_a.csv')
df["rows_sum"] = df.apply(lambda x: x.sum(), axis=1)
print(df)
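
The notes in the docstring can be tried without the CSV file. The sketch below is only an illustration: it builds a tiny hypothetical frame (three rows, columns id, x0, x1, x2, with made-up values) in place of breast_a.csv, then compares label-based and position-based selection, fills missing values, and computes column and row sums. It assumes the default integer index that read_csv produces, so a row's label and position coincide.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ./datasets/breast_a.csv (values are made up).
df = pd.DataFrame(
    {
        "id": [133, 273, 175],
        "x0": [0.5, -1.25, -1.5],
        "x1": [np.nan, -0.75, -1.0],
        "x2": [0.25, -1.25, np.nan],
    }
)

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# The same cell, addressed by position and by label.
by_position = df.iloc[1, 2]  # row position 1, column position 2 (x1)
by_label = df.at[1, "x1"]    # row label 1, column label "x1"

# loc slices include both endpoints; iloc slices exclude the right one.
loc_cols = list(df.loc[:, "x0":"x2"].columns)
iloc_cols = list(df.iloc[:, 0:2].columns)

# fillna returns a new frame; df itself still contains NaN.
filled = df.fillna(value=5)

# Sums skip NaN by default (skipna=True).
col_sums = df.sum()        # one value per column, like df.apply(lambda x: x.sum())
row_sums = df.sum(axis=1)  # one value per row, like df.apply(lambda x: x.sum(), axis=1)

print(n_rows, n_cols)
print(by_position, by_label)
print(loc_cols, iloc_cols)
print(filled)
print(col_sums)
print(row_sums)
```

Because NaN is skipped when summing, a row with missing values still gets a numeric total, which is why the rows_sum column in the output above has no NaN even though x1 does.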

Origin www.cnblogs.com/SunshineKimi/p/11653886.html