import numpy as np
import pandas as pd
from pandas import Series, DataFrame
Sorting a Series
s1 = Series(np.random.rand(10))
s1
0 0.324583
1 0.528829
2 0.922022
3 0.050265
4 0.069271
5 0.447179
6 0.595703
7 0.518557
8 0.695466
9 0.685736
dtype: float64
s1.values
array([0.32458288, 0.52882927, 0.92202246, 0.05026548, 0.06927059,
       0.44717888, 0.59570299, 0.51855686, 0.69546586, 0.68573564])
s1.index
RangeIndex(start=0, stop=10, step=1)
s2 = s1.sort_values()
s2
3 0.050265
4 0.069271
0 0.324583
5 0.447179
7 0.518557
1 0.528829
6 0.595703
9 0.685736
8 0.695466
2 0.922022
dtype: float64
s2.sort_index()
0 0.324583
1 0.528829
2 0.922022
3 0.050265
4 0.069271
5 0.447179
6 0.595703
7 0.518557
8 0.695466
9 0.685736
dtype: float64
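The cells above sort in ascending order and never modify `s1` itself. As a minimal sketch (the seeded generator and the names `s`, `desc`, and `restored` are illustrative, not from the notebook), descending order is a matter of passing `ascending=False`, and `sort_index()` brings back the original row order:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (an assumption; the notebook uses np.random.rand)
rng = np.random.default_rng(0)
s = pd.Series(rng.random(10))

# sort_values returns a new Series; s keeps its original order
desc = s.sort_values(ascending=False)

# Sorting by index undoes the value sort
restored = desc.sort_index()
```

Because both calls return new objects, the result has to be assigned to a name (as the notebook does with `s2`) to keep it.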
Sorting a DataFrame
df1 = DataFrame(np.random.randn(40).reshape(8, 5), columns=['A', 'B', 'C', 'D', 'E'])
df1
          A         B         C         D         E
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
6  2.199338  0.191928  0.278917 -0.388502  0.611719
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
df1['A'].sort_values()
3 -1.049244
5 0.088234
4 0.572511
0 1.069063
7 1.260192
1 1.520675
2 1.828228
6 2.199338
Name: A, dtype: float64
df2 = df1.sort_values('A')
df2
          A         B         C         D         E
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
6  2.199338  0.191928  0.278917 -0.388502  0.611719
df2.sort_index()
          A         B         C         D         E
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
6  2.199338  0.191928  0.278917 -0.388502  0.611719
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
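The notebook sorts by a single column. `sort_values` also accepts a list of columns with a matching list of directions, which helps when the first column contains ties. The sketch below uses a small hand-written frame (the values and the names `df`, `by_a_then_b`, and `renumbered` are made up for illustration) rather than the random `df1`:

```python
import pandas as pd

# Tiny illustrative frame with ties in column A (values are invented for the example)
df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [0.5, 0.1, 0.9, 0.7]})

# Sort by A ascending, then break ties by B descending
by_a_then_b = df.sort_values(["A", "B"], ascending=[True, False])

# Optionally discard the old row labels and renumber from 0
renumbered = by_a_then_b.reset_index(drop=True)
```

Here `by_a_then_b` keeps the original row labels, so `sort_index()` could still restore the input order; `renumbered` gives that up in exchange for a clean 0..n-1 index.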
Read the CSV file, sort the movies by rating in descending order, and write a new CSV
csv_input = '/Users/bennyrhys/Desktop/数据分析可视化-数据集/homework/movie_metadata.csv'
pd.read_csv(csv_input).head()
   color      director_name  num_critic_for_reviews  duration  director_facebook_likes
0  Color      James Cameron                   723.0     178.0                      0.0
1  Color     Gore Verbinski                   302.0     169.0                    563.0
2  Color         Sam Mendes                   602.0     148.0                      0.0
3  Color  Christopher Nolan                   813.0     164.0                  22000.0
4    NaN        Doug Walker                     NaN       NaN                    131.0

   actor_3_facebook_likes      actor_2_name  actor_1_facebook_likes        gross                           genres
0                   855.0  Joel David Moore                  1000.0  760505847.0  Action|Adventure|Fantasy|Sci-Fi
1                  1000.0     Orlando Bloom                 40000.0  309404152.0         Action|Adventure|Fantasy
2                   161.0      Rory Kinnear                 11000.0  200074175.0        Action|Adventure|Thriller
3                 23000.0    Christian Bale                 27000.0  448130642.0                  Action|Thriller
4                     NaN        Rob Walker                   131.0          NaN                      Documentary

   ...  num_user_for_reviews  language  country  content_rating       budget
0  ...                3054.0   English      USA           PG-13  237000000.0
1  ...                1238.0   English      USA           PG-13  300000000.0
2  ...                 994.0   English       UK           PG-13  245000000.0
3  ...                2701.0   English      USA           PG-13  250000000.0
4  ...                   NaN       NaN      NaN             NaN          NaN

   title_year  actor_2_facebook_likes  imdb_score  aspect_ratio  movie_facebook_likes
0      2009.0                   936.0         7.9          1.78                 33000
1      2007.0                  5000.0         7.1          2.35                     0
2      2015.0                   393.0         6.8          2.35                 85000
3      2012.0                 23000.0         8.5          2.35                164000
4         NaN                    12.0         7.1           NaN                     0

5 rows × 28 columns
pd.read_csv(csv_input)[['movie_title', 'imdb_score']].sort_values('imdb_score', ascending=False).head()
                   movie_title  imdb_score
2765          Towering Inferno         9.5
1937  The Shawshank Redemption         9.3
3466             The Godfather         9.2
4409      Kickboxer: Vengeance         9.1
2824                   Dekalog         9.1
pd.read_csv(csv_input)[['movie_title', 'imdb_score']].sort_values('imdb_score', ascending=False).to_csv('imdb.csv')
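By default `to_csv` also writes the row labels, which produces an unnamed first column in the output file. If those labels are not needed, passing `index=False` leaves them out. The following sketch reads the data once and reuses the frame; it substitutes a three-row in-memory CSV (built from titles and scores shown above) for the real `movie_metadata.csv`, and the names `sample_csv`, `movies`, `ranked`, and `out` are illustrative:

```python
import io
import pandas as pd

# In-memory stand-in for movie_metadata.csv (an assumption so the example runs without the file)
sample_csv = io.StringIO(
    "movie_title,imdb_score\n"
    "The Godfather,9.2\n"
    "Towering Inferno,9.5\n"
    "The Shawshank Redemption,9.3\n"
)

# Read once, then select and sort
movies = pd.read_csv(sample_csv)
ranked = movies[["movie_title", "imdb_score"]].sort_values("imdb_score", ascending=False)

# index=False omits the row labels from the output
out = io.StringIO()
ranked.to_csv(out, index=False)
csv_text = out.getvalue()
```

With a real file, `out` would simply be a path such as `'imdb.csv'`. Keeping the index, as the notebook does, is also reasonable when the original row numbers are needed to trace records back to the source file.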
!ls
02file.ipynb
4-1 DataFrame的简单数学计算.ipynb
4-2 Series和DataFrame的排序.ipynb
4-3 重命名Dataframe的index.ipynb
7B4349AB-7282-428F-A780-CB538E0517A3.dmp
Applications
Creative Cloud Files
Desktop
Documents
Downloads
Hadoop_VM
Java.gitignore
Library
Movies
Music
NumPy-排序.ipynb
Numpy-3.4数组读写.ipynb
Numpy1.ipynb
Pandas.ipynb
Pictures
Postman
PromotionRes
Public
Untitled Folder
Untitled Folder 1
Untitled.ipynb
Untitled1.ipynb
Virtual Machines.localized
WeChatProjects
ap.plist
apps.plist
bt.plist
eclipse-workspace
history.plist
iCloud 云盘(归档)
imdb.csv
install
nadarray.ipynb
opt
sell
vue-demo01
vue-sell-cube
vue-selll
输出1.spv
数据分析-分组 聚合 可视化.ipynb
班级成绩.ipynb
!more imdb.csv
,movie_title,imdb_score
2765,Towering Inferno ,9.5
1937,The Shawshank Redemption ,9.3
3466,The Godfather ,9.2
4409,Kickboxer: Vengeance ,9.1
2824,Dekalog ,9.1
3207,Dekalog ,9.1
66,The Dark Knight ,9.0
2837,The Godfather: Part II ,9.0
3481,Fargo ,9.0
339,The Lord of the Rings: The Return of the King ,8.9
4822,12 Angry Men ,8.9
4498,"The Good, the Bad and the Ugly ",8.9
3355,Pulp Fiction ,8.9
1874,Schindler's List ,8.9
683,Fight Club ,8.8
836,Forrest Gump ,8.8
270,The Lord of the Rings: The Fellowship of the Ring ,8.8
2051,Star Wars: Episode V - The Empire Strikes Back ,8.8
97,Inception ,8.8
1842,It's Always Sunny in Philadelphia ,8.8
459,Daredevil ,8.8
1620,Friday Night Lights ,8.7
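In the output above each title is followed by a whitespace character before the comma, so the `movie_title` values appear to carry trailing whitespace. If that is unwanted, one option is to strip it before writing the file. This is a sketch on two titles copied from the output, not a step the notebook performs; `titles` and `clean` are illustrative names:

```python
import pandas as pd

# Two titles as they appear in the written CSV, including the trailing space
titles = pd.Series(["Towering Inferno ", "The Godfather "], name="movie_title")

# .str.strip() removes leading and trailing whitespace from every element
clean = titles.str.strip()
```

On the full frame the same call would be applied to the `movie_title` column and assigned back before `to_csv`.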