Basic operations of pandas

These are notes from working through the "intro_to_pandas" exercise of the Machine Learning Crash Course (MLCC).

Basic concepts

DataFrame: a relational data table made up of rows and named columns
Series: a single column of data; a DataFrame contains one or more Series

Create data

Create column data

import pandas as pd

city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])

Build table data

cities = pd.DataFrame({ 'City name': city_names, 'Population': population })
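To check what the constructor produced, the following minimal sketch rebuilds the same table and inspects it. It assumes nothing beyond the data shown above; when no index is given, pandas assigns a default integer index starting at 0.

```python
import pandas as pd

city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
cities = pd.DataFrame({'City name': city_names, 'Population': population})

print(type(cities['City name']))  # each column is a pandas Series
print(cities.shape)               # (3, 2): three rows, two columns
print(list(cities.index))         # [0, 1, 2]: default integer index
```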

Access data

Load a table from a CSV file

california_housing_dataframe = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")
california_housing_dataframe.describe()

View the first few rows

california_housing_dataframe.head(10)
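The same three calls can be tried without a network connection. The sketch below is only an illustration: the CSV text, the column names rooms and price, and the variable names are hypothetical stand-ins, not the housing dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; not real housing data.
csv_text = "rooms,price\n3,100\n4,150\n5,200\n"
df = pd.read_csv(io.StringIO(csv_text), sep=",")

print(df.head(2))        # first two rows only

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary.loc['mean', 'price'])  # 150.0
```

Since describe() returns a DataFrame itself, individual statistics can be picked out with the usual accessors, as the last line does.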

Access a column

cities['City name']

Access rows

cities.iloc[0]  # cities[0] would look for a column labeled 0 and raise KeyError
cities[0:2]
cities['City name'][1]
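Row access is easier to follow when position and label are kept apart. The sketch below uses iloc for position-based access and loc for label-based access on the same example data; the variable names are illustrative only.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

first_row = cities.iloc[0]             # row at position 0, returned as a Series
first_two = cities.iloc[0:2]           # rows at positions 0 and 1, as a DataFrame
san_jose = cities.loc[1, 'City name']  # row label 1, column 'City name'

print(first_row['City name'])  # San Francisco
print(len(first_two))          # 2
print(san_jose)                # San Jose
```

With the default integer index, positions and labels coincide; they diverge as soon as the rows are reordered or filtered.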

Manipulate data

Basic operations

cities['Population'] / 100

Complex calculations for columns

cities['Population'].apply(lambda val: val > 1000000)
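For a simple comparison like this one, apply is not strictly needed: the comparison operator works on the whole Series at once. The sketch below runs both forms on the same population values and checks that they agree; apply remains useful when the per-value logic cannot be written as a single vectorized expression.

```python
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

via_apply = population.apply(lambda val: val > 1000000)  # calls the lambda per value
via_compare = population > 1000000                       # vectorized comparison

print(via_apply.tolist())                          # [False, True, False]
print(via_apply.tolist() == via_compare.tolist())  # True
```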

Add data columns

cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])
cities['Population density'] = cities['Population'] / cities['Area square miles']
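Put together as a self-contained sketch, the two assignments look like this. Note that a Series assigned as a new column is matched to the rows by index label, and that the division is carried out element by element. The rounding at the end is for display only and is an addition for this example.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# The new Series has index 0..2, so it lines up with the existing rows.
cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])

# Element-wise division: each row's population over that row's area.
cities['Population density'] = cities['Population'] / cities['Area square miles']

# Round for display only; the stored values keep full precision.
print(cities[['City name', 'Population density']].round(1))
```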

Reorder rows by index

cities.reindex([2, 0, 1])
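Two details of reindex are worth seeing in action: it returns a new DataFrame rather than changing cities in place, and an index label that does not exist produces a row of missing values instead of an error. A minimal sketch with the same example data (variable names are illustrative):

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# Rows are returned in the order of the labels passed in.
reordered = cities.reindex([2, 0, 1])
print(reordered['City name'].tolist())  # ['Sacramento', 'San Francisco', 'San Jose']

# Label 3 does not exist, so its row is filled with NaN.
extended = cities.reindex([0, 3])
print(extended['Population'].isna().tolist())  # [False, True]

# The original DataFrame is unchanged.
print(list(cities.index))  # [0, 1, 2]
```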
