[Data Analysis and Visualization] Sorting Series and DataFrames

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

Sorting a Series

s1 = Series(np.random.rand(10))
s1
0    0.324583
1    0.528829
2    0.922022
3    0.050265
4    0.069271
5    0.447179
6    0.595703
7    0.518557
8    0.695466
9    0.685736
dtype: float64
s1.values
array([0.32458288, 0.52882927, 0.92202246, 0.05026548, 0.06927059,
       0.44717888, 0.59570299, 0.51855686, 0.69546586, 0.68573564])
s1.index
RangeIndex(start=0, stop=10, step=1)
# Sort by value; ascending=True is the default, pass ascending=False for descending order
s2 = s1.sort_values()
s2
3    0.050265
4    0.069271
0    0.324583
5    0.447179
7    0.518557
1    0.528829
6    0.595703
9    0.685736
8    0.695466
2    0.922022
dtype: float64
# Sort by index
s2.sort_index()
0    0.324583
1    0.528829
2    0.922022
3    0.050265
4    0.069271
5    0.447179
6    0.595703
7    0.518557
8    0.695466
9    0.685736
dtype: float64
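
The `ascending` option mentioned in the comment above can be sketched as follows. This is a minimal example on a small hand-written Series; the name `s` and its values are assumptions made for illustration and are not the random data shown above.

```python
import pandas as pd

# Hypothetical small Series, used only for illustration
s = pd.Series([0.32, 0.92, 0.05, 0.69])

# Descending order by value; each value keeps its original index label
desc = s.sort_values(ascending=False)
print(desc.index.tolist())      # [1, 3, 0, 2]

# sort_values returns a new Series; s itself is left unchanged
print(s.index.tolist())         # [0, 1, 2, 3]

# Sorting by index puts the rows back in their original order
restored = desc.sort_index()
print(restored.index.tolist())  # [0, 1, 2, 3]
```

Because neither method modifies the Series in place by default, the result has to be assigned to a name (as with `s2` above) if it is needed later.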

Sorting a DataFrame

df1 = DataFrame(np.random.randn(40).reshape(8,5),columns=['A','B','C','D','E'])
df1
          A         B         C         D         E
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
6  2.199338  0.191928  0.278917 -0.388502  0.611719
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
# Sorting a single column returns only that column (a Series), not the whole DataFrame
df1['A'].sort_values()
3   -1.049244
5    0.088234
4    0.572511
0    1.069063
7    1.260192
1    1.520675
2    1.828228
6    2.199338
Name: A, dtype: float64
# Sort the whole DataFrame by the given column; every column is kept
df2 = df1.sort_values('A')
df2
          A         B         C         D         E
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
6  2.199338  0.191928  0.278917 -0.388502  0.611719
df2.sort_index()
          A         B         C         D         E
0  1.069063  0.266594 -0.129437 -0.361949 -1.491594
1  1.520675  1.673761  0.310567 -1.535689  0.388416
2  1.828228  0.221382 -0.092250 -0.111522 -1.187931
3 -1.049244 -0.093515  0.175138  0.627553 -0.357136
4  0.572511 -0.871314  1.142248 -0.489059  0.677733
5  0.088234 -0.786141 -0.222611  0.087407 -0.221874
6  2.199338  0.191928  0.278917 -0.388502  0.611719
7  1.260192 -0.001860  0.144536 -0.312155  1.664181
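
The difference between sorting one column and sorting the whole frame, plus sorting by more than one key, might look like the sketch below. The frame `df` and its values are hypothetical and chosen so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({
    'A': [2, 1, 2, 1],
    'B': [0.5, 0.1, 0.3, 0.9],
})

# Sorting one column gives back a Series containing only that column
col_sorted = df['A'].sort_values()
print(type(col_sorted).__name__)    # Series

# Sorting the frame by that column keeps every column
by_a_desc = df.sort_values('A', ascending=False)
print(type(by_a_desc).__name__)     # DataFrame

# Several keys: A ascending, ties in A broken by B descending
by_a_then_b = df.sort_values(['A', 'B'], ascending=[True, False])
print(by_a_then_b.index.tolist())   # [3, 1, 0, 2]
```

Passing a list to `ascending` sets the direction per key, which is useful when one column is only there to break ties.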

Read the CSV file, sort the movie ratings in descending order, and write a new CSV

# Read the data
csv_input = '/Users/bennyrhys/Desktop/数据分析可视化-数据集/homework/movie_metadata.csv'
pd.read_csv(csv_input).head()
color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 0.0 855.0 Joel David Moore 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 563.0 1000.0 Orlando Bloom 40000.0 309404152.0 Action|Adventure|Fantasy ... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 0.0 161.0 Rory Kinnear 11000.0 200074175.0 Action|Adventure|Thriller ... 994.0 English UK PG-13 245000000.0 2015.0 393.0 6.8 2.35 85000
3 Color Christopher Nolan 813.0 164.0 22000.0 23000.0 Christian Bale 27000.0 448130642.0 Action|Thriller ... 2701.0 English USA PG-13 250000000.0 2012.0 23000.0 8.5 2.35 164000
4 NaN Doug Walker NaN NaN 131.0 NaN Rob Walker 131.0 NaN Documentary ... NaN NaN NaN NaN NaN NaN 12.0 7.1 NaN 0

5 rows × 28 columns

pd.read_csv(csv_input)[['movie_title','imdb_score']].sort_values('imdb_score',ascending=False).head()
movie_title imdb_score
2765 Towering Inferno 9.5
1937 The Shawshank Redemption 9.3
3466 The Godfather 9.2
4409 Kickboxer: Vengeance 9.1
2824 Dekalog 9.1
# Sort and write the new CSV in a single line
pd.read_csv(csv_input)[['movie_title','imdb_score']].sort_values('imdb_score',ascending=False).to_csv('imdb.csv')
!ls
02file.ipynb
4-1 DataFrame的简单数学计算.ipynb
4-2 Series和DataFrame的排序.ipynb
4-3 重命名Dataframe的index.ipynb
7B4349AB-7282-428F-A780-CB538E0517A3.dmp
Applications
Creative Cloud Files
Desktop
Documents
Downloads
Hadoop_VM
Java.gitignore
Library
Movies
Music
NumPy-排序.ipynb
Numpy-3.4数组读写.ipynb
Numpy1.ipynb
Pandas.ipynb
Pictures
Postman
PromotionRes
Public
Untitled Folder
Untitled Folder 1
Untitled.ipynb
Untitled1.ipynb
Virtual Machines.localized
WeChatProjects
ap.plist
apps.plist
bt.plist
eclipse-workspace
history.plist
iCloud 云盘(归档)
imdb.csv
install
nadarray.ipynb
opt
sell
vue-demo01
vue-sell-cube
vue-selll
输出1.spv
数据分析-分组 聚合 可视化.ipynb
班级成绩.ipynb
!more imdb.csv
,movie_title,imdb_score
2765,Towering Inferno             ,9.5
1937,The Shawshank Redemption ,9.3
3466,The Godfather ,9.2
4409,Kickboxer: Vengeance ,9.1
2824,Dekalog             ,9.1
3207,Dekalog             ,9.1
66,The Dark Knight ,9.0
2837,The Godfather: Part II ,9.0
3481,Fargo             ,9.0
339,The Lord of the Rings: The Return of the King ,8.9
4822,12 Angry Men ,8.9
4498,"The Good, the Bad and the Ugly ",8.9
3355,Pulp Fiction ,8.9
1874,Schindler's List ,8.9
683,Fight Club ,8.8
836,Forrest Gump ,8.8
270,The Lord of the Rings: The Fellowship of the Ring ,8.8
2051,Star Wars: Episode V - The Empire Strikes Back ,8.8
97,Inception ,8.8
1842,It's Always Sunny in Philadelphia             ,8.8
459,Daredevil             ,8.8
1620,Friday Night Lights             ,8.7
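
In the file contents shown above, the first column has no name (it is the original row index) and the titles are padded with trailing whitespace. One possible way to tidy both is sketched below. It uses a small in-memory CSV as a stand-in for movie_metadata.csv, so the rows, the `duration` values, and the variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical stand-in for movie_metadata.csv; note the padded titles
raw = io.StringIO(
    "movie_title,imdb_score,duration\n"
    "Fargo             ,9.0,98\n"
    "The Godfather ,9.2,175\n"
    "Inception ,8.8,148\n"
)

# Read the file once and load only the two columns that are needed
movies = pd.read_csv(raw, usecols=['movie_title', 'imdb_score'])

# Remove the whitespace padding around the titles
movies['movie_title'] = movies['movie_title'].str.strip()

# Highest score first
ranked = movies.sort_values('imdb_score', ascending=False)

# index=False leaves the original row labels out of the written file
out = io.StringIO()
ranked.to_csv(out, index=False)
print(out.getvalue())
```

With a real file, `raw` would be replaced by the path to the CSV and `out` by the output filename. Whether to keep the index column depends on whether the original row numbers are still needed afterwards.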

Origin blog.csdn.net/weixin_43469680/article/details/105617666