I have been using pandas for a long time, but every time I need it I still end up searching the web (*╹▽╹*).
So here I am tidying up the frequently used functions that are easy to confuse...
1. The difference between loc and iloc, and how to use them
# These two functions are very basic but used all the time, and they can be combined in many different ways ~
# Take this data set as an example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
                  index=range(6), columns=['A', 'B', 'C', 'D', 'E'])
   A  B  C  D  E
0  a  b  c  d  1
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
4  a  b  c  d  0
5  b  a  d  c  0
# Normal operation
[1]
df.loc[1]
A    a
B    b
C    c
D    d
E    1
df.iloc[1]
A    a
B    b
C    c
D    d
E    1
# When only a single row is returned, loc and iloc give the same result (here the index labels 0..5 coincide with the row positions)
[2]
df.loc[1:4]
   A  B  C  D  E
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
4  a  b  c  d  0
df.iloc[1:4]
   A  B  C  D  E
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
# This time there is a small difference: loc returns a closed interval (both ends included), while iloc returns a half-open interval (end excluded), which matches ordinary Python slicing
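The closed versus half-open behaviour is easy to check for yourself. A minimal sketch, rebuilding the same example data (the names `by_label` and `by_position` are just for illustration):

```python
import numpy as np
import pandas as pd

# Same data as above: 6 rows, columns A..E
df = pd.DataFrame(
    np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
    index=range(6), columns=['A', 'B', 'C', 'D', 'E'])

by_label = df.loc[1:4]      # label slice: the end label 4 is included
by_position = df.iloc[1:4]  # position slice: the end position 4 is excluded

print(list(by_label.index))     # [1, 2, 3, 4]
print(list(by_position.index))  # [1, 2, 3]
```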
[3]
df.loc[1:4, :3]  # raises an error: the column labels are strings, so they cannot be sliced with an integer
df.loc[1:4, ['A', 'B']]  # columns can be selected by label
   A  B
1  a  b
2  a  b
3  a  b
4  a  b
df.iloc[1:4, :3]
   A  B  C
1  a  b  c
2  a  b  c
3  a  b  c
df.iloc[1:4, ['A', 'B']]
# raises an error: iloc does not accept labels
df.iloc[1:4, [3]]  # but a column can be selected by its integer position, either as a list or as a slice
   D
1  d
2  d
3  d
Summary:
loc: indexes and slices by row labels and column labels
iloc: indexes and slices by integer row positions and column positions
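In the examples so far the labels 0..5 happen to equal the positions, which hides the difference. A small sketch with a hypothetical frame `df2` (not part of the data above) whose integer labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame: the integer labels do not match the row positions
df2 = pd.DataFrame({'A': ['x', 'y', 'z']}, index=[2, 0, 1])

row_by_label = df2.loc[0]      # the row whose index label is 0
row_by_position = df2.iloc[0]  # the first row, whatever its label is

print(row_by_label['A'])     # y
print(row_by_position['A'])  # x
```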
df.iloc[:, df.columns != 'E']
df.loc[:, df.columns != 'E']  # both accept this form, which is handy for separating the features from the label; since loc slices are closed intervals, loc may be the more suitable of the two
df.columns != 'E'  # returns a Boolean array!
df.loc[1:3, [True, True, True, False, False]]
# so a Boolean list can be used for indexing too (one entry per column)
   A  B  C
1  a  b  c
2  a  b  c
3  a  b  c
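As a sketch of the feature/label split mentioned above, assuming column E is the label (the names `mask`, `X` and `y` are only a common convention, not something pandas requires):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
    index=range(6), columns=['A', 'B', 'C', 'D', 'E'])

mask = df.columns != 'E'  # Boolean array, one entry per column
X = df.loc[:, mask]       # feature columns A..D
y = df['E']               # label column (values are strings here)

print(mask.tolist())     # [True, True, True, True, False]
print(list(X.columns))   # ['A', 'B', 'C', 'D']
print(y.tolist())        # ['1', '1', '0', '1', '0', '0']
```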
2. Randomly shuffling the rows of a DataFrame
# Take this data as an example
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
                  index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'])
pandas provides a sample method:
df = df.sample(frac=1).reset_index(drop=True)  # reset_index(drop=True) renumbers the shuffled rows from 0; leave it out if you want to keep the shuffled index
The result looks like this (the order is random; the index is shown as it is before resetting):
   a  b  c  d  e  f
4  b  f  n  a  e  b
2  a  b  c  d  a  c
1  a  c  e  e  b  c
3  b  e  a  a  b  c
0  a  b  c  d  c  b
You can also set the fraction of rows to return. For example, if df has 10 rows and you only want 40% of them back, use frac=0.4. With the 5-row df above, that gives 2 rows:
df = df.sample(frac=0.4).reset_index(drop=True)
   a  b  c  d  e  f
0  a  c  e  e  b  c
1  a  b  c  d  a  c
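Because sample is random, the rows shown above will differ from run to run. If you need the same shuffle every time, one option is the random_state parameter of sample. A minimal sketch (the seed 0 is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
    index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'])

# random_state fixes the seed, so both calls produce the same order
shuffled = df.sample(frac=1, random_state=0)
again = df.sample(frac=1, random_state=0)
subset = df.sample(frac=0.4, random_state=0)

print(shuffled.equals(again))                     # True
print(sorted(shuffled.index) == list(df.index))   # True: same rows, new order
print(len(subset))                                # 2 (40% of 5 rows)
```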
Extension: there are other ways too, such as using the sklearn library or numpy
from sklearn.utils import shuffle
df_sk = shuffle(df).reset_index()  # without drop=True, the old index is kept as an 'index' column
df_np = df.iloc[np.random.permutation(len(df))]
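The numpy route can also be written with a seeded Generator, which is my own variation on the line above rather than anything pandas requires. A sketch, where `rng`, `order` and `df_perm` are illustrative names and the seed 0 is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
    index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'])

rng = np.random.default_rng(0)    # seeded generator for a repeatable shuffle
order = rng.permutation(len(df))  # a random ordering of the positions 0..4
df_perm = df.iloc[order].reset_index(drop=True)  # reorder rows, then renumber

print(sorted(order.tolist()))  # [0, 1, 2, 3, 4]
print(df_perm.shape)           # (5, 6)
```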
To be continued...