Common pandas operations: everything you need is here...

I have used pandas for a long time, but every time I need it I still end up searching the web (*╹▽╹*).

This post collects the frequently used functions that are easy to confuse...

 

1. The difference between loc and iloc, and how to use them

# These two indexers are very basic, but they are used all the time and can be combined in many different ways.

# Take this data set as an example
import pandas as pd
import numpy as np

df = pd.DataFrame(np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
                  index=range(6), columns=['A', 'B', 'C', 'D', 'E'])

   A  B  C  D  E
0  a  b  c  d  1
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
4  a  b  c  d  0
5  b  a  d  c  0

# Basic operations
[1]
df.loc[1]
A    a
B    b
C    c
D    d
E    1

df.iloc[1]
A    a
B    b
C    c
D    d
E    1
# When only a single row is returned, loc and iloc give the same result
# (here the row labels 0..5 coincide with the row positions).

[2]
df.loc[1:4]
   A  B  C  D  E
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
4  a  b  c  d  0

df.iloc[1:4]
   A  B  C  D  E
1  a  b  c  d  1
2  a  b  c  d  0
3  a  b  c  d  1
# This time the results differ: loc slices a closed interval (both endpoints are included),
# while iloc slices a half-open interval (the end is excluded), which matches ordinary Python slicing.
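The examples above use the default index 0..5, where labels and positions happen to coincide. Here is a minimal sketch of how the two indexers diverge once they do not. The string row labels r0..r5 are made up for illustration and are not part of the original data set:

```python
import numpy as np
import pandas as pd

# Same data as above, but with hypothetical string row labels r0..r5
# so that labels and positions can no longer be confused.
df_demo = pd.DataFrame(
    np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
    index=['r0', 'r1', 'r2', 'r3', 'r4', 'r5'],
    columns=['A', 'B', 'C', 'D', 'E'],
)

by_label = df_demo.loc['r1':'r4']   # label slice: both endpoints included
by_position = df_demo.iloc[1:4]     # position slice: end excluded

print(list(by_label.index))      # ['r1', 'r2', 'r3', 'r4']
print(list(by_position.index))   # ['r1', 'r2', 'r3']
```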
[3]
df.loc[1:4, :3]  # raises an error: loc expects column labels, not integer positions
df.loc[1:4, ['A', 'B']]  # specific columns can be selected by label
   A  B
1  a  b
2  a  b
3  a  b
4  a  b

df.iloc[1:4, :3]
   A  B  C
1  a  b  c
2  a  b  c
3  a  b  c
df.iloc[1:4, ['A', 'B']]
# raises an error: iloc expects integer positions, not labels

df.iloc[1:3, [3]]  # but a single column can be selected by its integer position, either in a list or as a slice
   D
1  d
2  d
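If you do need to mix the two styles, one option is to convert between labels and positions explicitly. The sketch below rebuilds the same df; the variable names are illustrative only, and since the exact exception type raised by the failing calls may differ between pandas versions, several types are caught:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
    index=range(6), columns=['A', 'B', 'C', 'D', 'E'],
)

errors = []
# loc with an integer column slice and iloc with column labels both fail.
for name, attempt in [
    ('loc', lambda: df.loc[1:4, :3]),
    ('iloc', lambda: df.iloc[1:4, ['A', 'B']]),
]:
    try:
        attempt()
    except (TypeError, IndexError, KeyError, ValueError) as exc:
        errors.append(name)
        print(f'{name} failed: {type(exc).__name__}')

# Positions -> labels: take the first three column labels, then use loc.
first_three = df.loc[1:4, df.columns[:3]]

# Labels -> positions: look up the integer positions, then use iloc.
positions = df.columns.get_indexer(['A', 'B'])
a_and_b = df.iloc[1:4, positions]
```

Note that the loc version still returns rows 1 to 4 and the iloc version rows 1 to 3, because the row slices keep their own interval rules.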

Summary:

loc: indexes and slices by row labels and column labels

iloc: indexes and slices by integer row positions and column positions


df.iloc[:, df.columns != 'E']
df.loc[:, df.columns != 'E']  # both accept this kind of selection, which is handy for separating the features from the label column; since loc slices a closed interval, loc may be the more convenient of the two

df.columns != 'E'  # returns a Boolean array, one entry per column

df.loc[1:3, [True, True, True, False, False]]
# so a Boolean list (one entry per column) can be used for indexing as well

   A  B  C
1  a  b  c
2  a  b  c
3  a  b  c
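A typical use of the Boolean mask is splitting a table into feature columns and a label column. Here is a minimal sketch, assuming column E is the label; the names mask, X and y are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcd1abcd1abcd0abcd1abcd0badc0')).reshape(6, 5),
    index=range(6), columns=['A', 'B', 'C', 'D', 'E'],
)

mask = df.columns != 'E'      # True for A..D, False for E
X = df.loc[:, mask]           # every column except E
y = df['E']                   # the label column on its own

print(list(X.columns))        # ['A', 'B', 'C', 'D']
```

df.drop(columns='E') gives the same X and may read more clearly when only one column is excluded.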

 

2. Randomly shuffling the rows of a DataFrame

# The data for this example
import pandas as pd
import numpy as np

df = pd.DataFrame(np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
                  index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'])

pandas provides the sample method
df = df.sample(frac=1).reset_index(drop=True)  # reset_index restores the normal 0, 1, 2, ... index after the random shuffle; leave it off if you want to keep the shuffled index
One possible result (shown here with the shuffled index kept, i.e. without reset_index):
   a  b  c  d  e  f
4  b  f  n  a  e  b
2  a  b  c  d  a  c
1  a  c  e  e  b  c
3  b  e  a  a  b  c
0  a  b  c  d  c  b

You can also set the fraction of rows to return. For example, if df had 10 rows and I only wanted 40% of them, I would set frac=0.4. With the 5-row df here, that returns 2 rows:
df = df.sample(frac=0.4).reset_index(drop=True)
   a  b  c  d  e  f
0  a  c  e  e  b  c
1  a  b  c  d  a  c
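Each call to sample produces a different order. If a repeatable shuffle is needed, sample accepts a random_state argument. Here is a small sketch on the same data; the seed value 42 is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
    index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'],
)

# The same seed gives the same shuffled order on every call.
first = df.sample(frac=1, random_state=42)
second = df.sample(frac=1, random_state=42)
print(first.equals(second))                    # True

# Shuffling only reorders rows: every original row label is still present.
print(sorted(first.index) == list(df.index))   # True

# frac=0.4 of 5 rows returns 2 rows.
subset = df.sample(frac=0.4, random_state=42).reset_index(drop=True)
print(len(subset))                             # 2
```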

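Extension: there are other ways to do this, such as using the sklearn library or numpy
from sklearn.utils import shuffle
df_sk = shuffle(df).reset_index()  # without drop=True, the old index is kept as an extra column

df_np = df.iloc[np.random.permutation(len(df))]  # do not name the result np, or it will shadow numpy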
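The numpy variant can be made repeatable in a similar way with a seeded generator. This is only a sketch on the same data; the seed 0 and the variable names are arbitrary choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('abcdcbaceebcabcdacbeaabcbfnaeb')).reshape(5, 6),
    index=range(5), columns=['a', 'b', 'c', 'd', 'e', 'f'],
)

# A seeded generator yields the same permutation of row positions each run.
rng = np.random.default_rng(0)
order = rng.permutation(len(df))

# iloc takes the positions in the permuted order; reset_index renumbers the rows.
shuffled = df.iloc[order].reset_index(drop=True)

# A second generator with the same seed reproduces the same order.
order_again = np.random.default_rng(0).permutation(len(df))
print((order == order_again).all())   # True
```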

 

 

To be continued...


Origin www.cnblogs.com/lmcltj/p/11105411.html