Learning pandas: Creating Objects

Copyright notice: all rights reserved by https://blog.csdn.net/thfyshz. Original article: https://blog.csdn.net/thfyshz/article/details/83540226

1. Creating a Series

class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Parameter  Meaning
data       array-like, dict, or scalar value. The data stored in the Series.
index      array-like or Index. The labels for the data.
dtype      numpy.dtype or None. If None, the dtype is inferred.
copy       boolean, default False. Whether to copy the input data.
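
The parameters above can be tried out in a short script. This is a minimal sketch rather than a complete tour of the constructor; the variable names (values, s_float, s_copy) and the name 'demo' are made up for illustration.

```python
import numpy as np
import pandas as pd

# dtype overrides inference: integer input is stored as float64.
# name attaches a label to the Series itself.
s_float = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='demo')

# copy=True gives the Series its own data, so a later change to the
# source array does not show up in the Series.
values = np.array([1, 2, 3])
s_copy = pd.Series(values, copy=True)
values[0] = 99

print(s_float)
print(s_copy)
```

With copy=True the first element of s_copy stays 1 even though values[0] was overwritten afterwards.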

1.1 From a list

>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1,3,5,np.nan,6,8])
>>> s
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
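>>> # Random integers in [0, 7): the values below differ from run to run.
>>> # Index labels may repeat (here 'D' appears twice).
>>> s = pd.Series(np.random.randint(0,7,size=5), index=list('ABCDD'), name='Hello')
>>> s
A    3
B    1
C    4
D    4
D    2
Name: Hello, dtype: int32
>>> # The dtype may be int64 instead, depending on platform and NumPy version.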

1.2 From a dict

>>> # 'c' is in index but not in the dict, so it becomes NaN and the dtype is float64.
>>> s = pd.Series({'a':2,'b':4,'d':5}, index=list('abcd'))
>>> s
a    2.0
b    4.0
c    NaN
d    5.0
dtype: float64

1.3 From a scalar value

>>> pd.Series(5, index=list('abcd'))
a    5
b    5
c    5
d    5
dtype: int64

2. Creating a DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

2.1 From a dict

>>> df = pd.DataFrame({'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...                    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

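Passing index and columns lets you choose the order of the rows and columns, provided those labels already exist in the data. A label that does not exist produces a row or column filled entirely with NaN.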
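
A small sketch of that behaviour, reusing the dict of Series from this section. The column label 'three' is deliberately one that does not exist in the data, and the variable names are only illustrative.

```python
import pandas as pd

d = {
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
}

# index picks and reorders existing row labels;
# columns picks existing columns and adds 'three', which is not in d.
df = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

print(df)
```

The rows come out in the order d, b, a, the 'two' column keeps its values, and 'three' holds only NaN.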

2.2 From lists

>>> d = {'one' : [1., 2., 3., 4.],'two' : [4., 3., 2., 1.]}
>>> pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0

2.3 From a list of dicts

>>> data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
>>> pd.DataFrame(data2)
   a   b     c
0  1   2   NaN
1  5  10  20.0
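
Each dict in the list becomes one row, and a key missing from a dict shows up as NaN in that row. The sketch below additionally passes index and columns to the same data; the row labels 'first' and 'second' are made up for illustration.

```python
import pandas as pd

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# One row per dict; index names the rows, columns keeps only 'a' and 'c'.
df = pd.DataFrame(data2, index=['first', 'second'], columns=['a', 'c'])

print(df)
```

The first dict has no 'c' key, so the 'c' value in the row labelled 'first' is NaN.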

2.4 Importing external data

Example: read_table

>>> users = pd.read_table('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user',sep='|', index_col='user_id')
>>> users
         age gender     occupation zip_code
user_id                                    
1         24      M     technician    85711
2         53      F          other    94043
3         23      M         writer    32067
...
941       20      M        student    97229
942       48      F      librarian    78209
943       22      M        student    77841

[943 rows x 4 columns]
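
The same call can be tried without network access by reading from an in-memory string. This is a sketch under assumptions: the text below copies only the first three rows of the output above, and the dtype argument for zip_code is an addition that is not part of the original call.

```python
import io

import pandas as pd

# A pipe-separated sample in the same layout as u.user (first three rows only).
text = (
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "3|23|M|writer|32067\n"
)

# sep='|' sets the delimiter, index_col uses user_id as the row labels,
# and dtype keeps zip codes as text so leading zeros would not be lost.
users = pd.read_table(io.StringIO(text), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

print(users)
```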

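pandas offers readers for many other formats as well, such as read_csv, read_excel, read_json, read_sql and read_html.

Reading a file can fail for several reasons: non-ASCII (for example Chinese) characters in the path, unescaped backslashes in the path, or a wrong encoding. The recommended pattern is: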

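import pandas as pd

file_path = '***.csv'
# Opening the file ourselves and handing pandas the file object sidesteps
# path handling inside read_csv; the with block closes the file even on error.
with open(file_path, encoding='utf-8') as f:
    df = pd.read_csv(f)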
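
An alternative sketch for the same three problems, assuming a reasonably recent pandas that accepts pathlib.Path objects. The file name 数据.csv and its contents are invented for the demonstration, and the file is written to a temporary directory so the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # pathlib joins path parts itself, so no hand-written backslashes are needed.
    csv_path = Path(tmp) / '数据.csv'
    csv_path.write_text('name,city\n张三,北京\n李四,上海\n', encoding='utf-8')

    # Stating the encoding avoids relying on the platform default.
    df = pd.read_csv(csv_path, encoding='utf-8')

print(df)
```

When a Windows path has to be written as a string literal, a raw string such as r'C:\data\file.csv' (a hypothetical path) keeps the backslashes from being read as escape sequences.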
