This article is a set of notes from the MLCC (Machine Learning Crash Course) "intro_to_pandas" exercise.
Basic concepts
DataFrame: a relational data table made of rows and named columns
Series: a single column of data
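A minimal sketch of how the two types relate, assuming pandas is installed: selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (one column of data).
population = pd.Series([852469, 1015785, 485199])

# A DataFrame is a table of rows and columns; each column is a Series.
cities = pd.DataFrame({'Population': population})

column = cities['Population']
print(type(column).__name__)  # Series
print(type(cities).__name__)  # DataFrame
```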
Create data
Create column data
import pandas as pd
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
Build a DataFrame from the columns
cities = pd.DataFrame({ 'City name': city_names, 'Population': population })
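The dict passed to pd.DataFrame maps column names to Series, and rows are lined up by the Series index. The sketch below checks the resulting shape and also shows what happens when the Series differ in length; the short 'Area' column is a hypothetical example added only for illustration.

```python
import pandas as pd

city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])

# Each dict key becomes a column name; rows are aligned on the Series index (0, 1, 2).
cities = pd.DataFrame({'City name': city_names, 'Population': population})
print(cities.shape)          # (3, 2)
print(list(cities.columns))  # ['City name', 'Population']

# Hypothetical shorter Series: missing positions are filled with NaN.
short = pd.Series([46.87, 176.53])
partial = pd.DataFrame({'City name': city_names, 'Area': short})
print(partial['Area'].isna().tolist())  # [False, False, True]
```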
Access data
Load a table from a CSV file
california_housing_dataframe = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")
california_housing_dataframe.describe()
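Loading the URL above needs network access. To try read_csv and describe() offline, a small CSV string can be read through io.StringIO. The column names mirror the housing data, but the rows are made-up values, not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the real California housing CSV.
csv_text = '''longitude,latitude,median_house_value
-122.0,37.0,200000
-121.0,38.0,150000
-118.0,34.0,300000
-117.0,33.0,250000
'''

df = pd.read_csv(io.StringIO(csv_text), sep=',')

# describe() reports count, mean, std, min, quartiles and max for numeric columns.
summary = df.describe()
print(summary)
```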
View the first few rows
california_housing_dataframe.head(10)
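head(n) returns the first n rows as a new DataFrame, and n defaults to 5. The 20-row table here is a hypothetical stand-in so the sketch runs without downloading anything.

```python
import pandas as pd

# Hypothetical 20-row table used only to show head().
df = pd.DataFrame({'row_id': range(20)})

first_ten = df.head(10)  # first 10 rows
first_five = df.head()   # default n=5
print(len(first_ten), len(first_five))  # 10 5
```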
Access a column
cities['City name']
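Single brackets return one column as a Series, while a list of column names returns a smaller DataFrame. Attribute access (cities.Population) also works, but only when the column name is a valid Python identifier, so 'City name' with its space still needs brackets.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# Single brackets: one column as a Series.
names = cities['City name']
# Double brackets (a list of labels): a DataFrame with just those columns.
subset = cities[['City name', 'Population']]
# Attribute access: only for names that are valid identifiers.
pop = cities.Population

print(type(names).__name__, type(subset).__name__)  # Series DataFrame
```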
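Access rows and single values
cities.iloc[0]  # first row by position, as a Series (cities[0] would look for a column named 0)
cities[0:2]  # first two rows, as a DataFrame
cities['City name'][1]  # a single value: row 1 of the 'City name' column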
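pandas separates position-based access (.iloc) from label-based access (.loc). With the default 0, 1, 2 index both give the same rows, but they differ once the index is reordered or uses other labels.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

first_row = cities.iloc[0]             # row by integer position -> Series
first_two = cities[0:2]                # positional row slice -> DataFrame
by_label = cities.loc[1, 'City name']  # row label 1, column 'City name' -> scalar

print(first_row['City name'])  # San Francisco
print(len(first_two))          # 2
print(by_label)                # San Jose
```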
Manipulate data
Basic arithmetic operations
cities['Population'] / 100
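Arithmetic on a Series is applied element-wise and returns a new Series, and NumPy universal functions such as np.log accept a Series directly. A short sketch:

```python
import numpy as np
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

# Element-wise division returns a new Series; the original is unchanged.
scaled = population / 1000
# NumPy ufuncs also take a Series and keep its index.
logged = np.log(population)

print(scaled.tolist())  # [852.469, 1015.785, 485.199]
print(logged)
```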
More complex per-element calculations with apply
cities['Population'].apply(lambda val: val > 1000000)
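The apply call above produces a boolean Series, which can then be used to select rows. For a simple comparison like this one, the vectorized form gives the same mask without calling a Python function per element.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# apply() calls the lambda once per element and returns a boolean Series.
is_large = cities['Population'].apply(lambda val: val > 1000000)
# The same mask as a vectorized comparison.
is_large_vec = cities['Population'] > 1000000

# A boolean Series selects the rows where it is True.
large_cities = cities[is_large]
print(large_cities['City name'].tolist())  # ['San Jose']
```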
Add columns
cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])
cities['Population density'] = cities['Population'] / cities['Area square miles']
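Column arithmetic lines up values row by row on the index, and comparison results can be combined with & (and) or | (or), each comparison wrapped in parentheses. The 'Large and dense' column and its thresholds are hypothetical, chosen only to illustrate combining conditions.

```python
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})
cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])

# Division between two columns is aligned row by row on the index.
cities['Population density'] = cities['Population'] / cities['Area square miles']

# Hypothetical derived column: combine two boolean conditions with &.
cities['Large and dense'] = (cities['Population'] > 500000) & (cities['Population density'] > 10000)
print(cities)
```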
Reorder rows by index labels
cities.reindex([2, 0, 1])
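reindex() returns the rows in the order of the labels it is given; it does not sort. Passing a random permutation of the index shuffles the rows, labels missing from the original index produce rows of NaN rather than an error, and sort_index() performs an actual sort.

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# Rows come back in the order of the given labels.
reordered = cities.reindex([2, 0, 1])

# A random permutation of the index shuffles the rows.
shuffled = cities.reindex(np.random.permutation(cities.index))

# Labels 4 and 5 do not exist, so those rows are filled with NaN.
extended = cities.reindex([0, 4, 5, 2])

# sort_index() sorts rows by their index labels.
restored = reordered.sort_index()
```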