pandas basics: Series and DataFrame operations

  pandas package

  # Import packages

  import pandas as pd

  import numpy as np

  import matplotlib.pyplot as plt

  Series

  A Series is a labeled one-dimensional array that can hold any data type (integers, floats, strings, Python objects). The basic way to create one is:

  s = pd.Series(data, index=index)

  Here index is a list of axis labels, and data can be one of several types:

  Python dictionary

  ndarray objects

  A scalar value, such as 5
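  The three data types listed above can be sketched as follows; the labels and values are arbitrary examples. When data is a dict and index is also given, only the listed keys are kept, and keys missing from the dict become NaN.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# From a dict plus an explicit index: 'c' is not a key, so it becomes NaN
s_subset = pd.Series({'a': 1, 'b': 2}, index=['b', 'c'])

# From an ndarray: index must have the same length as the array
s_arr = pd.Series(np.array([10.0, 20.0, 30.0]), index=['x', 'y', 'z'])

# From a scalar: the value is repeated for every label in index
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

print(s_dict['b'])        # 2
print(list(s_arr.index))  # ['x', 'y', 'z']
print(s_scalar.tolist())  # [5, 5, 5]
```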

  Creating a Series

  s = pd.Series([1, 3, 5, np.nan, 6, 8])

  Creating a date range

  # Generate six daily dates from 2013-01-01 to 2013-01-06

  dates = pd.date_range('20130101', periods=6)

  # DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04','2013-01-05', '2013-01-06'], dtype='datetime64[ns]', freq='D')
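  A date range is often used directly as the index of a Series. A minimal sketch, reusing the dates above with made-up values 0 to 5:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# Use the dates as the index of a Series
s = pd.Series(range(6), index=dates)

# Exact lookup by timestamp
value = s.loc[pd.Timestamp('2013-01-03')]

# Slicing with date strings includes both endpoints
window = s.loc['2013-01-02':'2013-01-04']

print(value)             # 2
print(window.tolist())   # [1, 2, 3]
```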

  Creating a Series from a list

  # A Series of 11 letters: the index runs from 0 to 10, the values are a b b c d a b a c a d

  s = pd.Series(list('abbcdabacad'))

  # Distinct values

  s.unique()

  # Count how often each value appears

  s.value_counts()

  # Check whether each value is in the list (returns a boolean Series)

  s.isin(['a', 'b', 'c'])
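  To make the results of these three calls concrete, here is the same list with the values it produces spelled out. value_counts sorts by count, so the order of tied values (here 'c' and 'd') should not be relied on.

```python
import pandas as pd

s = pd.Series(list('abbcdabacad'))

distinct = s.unique()           # array, in order of first appearance
counts = s.value_counts()       # counts per value, sorted by count (descending)
mask = s.isin(['a', 'b', 'c'])  # boolean Series with the same length as s

print(list(distinct))  # ['a', 'b', 'c', 'd']
print(counts['a'])     # 4
kept = s[mask]         # only the values that are 'a', 'b' or 'c'
```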

  Series Index

  # Five random numbers indexed by the labels a to e

  s = pd.Series(np.random.rand(5), index=list('abcde'))

  # The index of s (an Index object)

  s.index

  # Name the index 'alpha'

  s.index.name = 'alpha'

  # Return the value labeled 'a' (a Series if the label is duplicated)

  s['a']

  # True if the index contains no duplicate labels

  s.index.is_unique

  # Return the index labels without duplicates

  s.index.unique()

  # Group by index label and sum each group

  s.groupby(s.index).sum()
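  The index checks above only become interesting when labels repeat. A small sketch with a hypothetical duplicated index:

```python
import pandas as pd

# Hypothetical data with repeated index labels
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'a', 'b'])
s.index.name = 'alpha'

print(s['a'].tolist())         # [1, 3] (a duplicated label returns a Series)
print(s.index.is_unique)       # False
print(list(s.index.unique()))  # ['a', 'b']

# Sum the values that share each label
sums = s.groupby(s.index).sum()
print(sums.to_dict())          # {'a': 4, 'b': 6}
```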

  DataFrame

  A DataFrame is a two-dimensional labeled data structure with row and column labels. You can think of it as an Excel spreadsheet or a SQL table, or as a dictionary of Series objects. It is the most commonly used pandas data structure.
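  The "dictionary of Series" view can be sketched directly: each key becomes a column, and the Series are aligned on their index labels. The column names and values here are invented for illustration.

```python
import pandas as pd

# Each dict value becomes a column; the row index is the union of the Series indexes
df = pd.DataFrame({
    'price': pd.Series([1.5, 2.0, 3.25], index=['x', 'y', 'z']),
    'qty': pd.Series([10, 20], index=['x', 'y']),
})

print(df.shape)  # (3, 2)

# 'z' is missing from the 'qty' Series, so that cell is NaN
missing = df.loc['z', 'qty']

# Selecting a single column returns a Series
price = df['price']
```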

  Creating a DataFrame

  df = pd.DataFrame(np.random.randn(4, 6), index=list('ADFH'), columns=['one', 'two', 'three', 'four', 'five', 'six'])

  # Reindex rows; labels missing from the original index get NaN values

  df2 = df.reindex(index=list('ABCDEFGH'))

  # Reindex columns; the new column 'seven' is filled with NaN

  df.reindex(columns=['one', 'three', 'five', 'seven'])

  # Fill missing values with 0 instead of NaN

  df.reindex(columns=['one', 'three', 'five', 'seven'], fill_value=0)

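  # Fill methods such as 'ffill' require a monotonic (sorted) index. These column
  # labels are not sorted, so the following call raises a ValueError:

  # df.reindex(columns=['one', 'three', 'five', 'seven'], method='ffill')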

  # Reindex rows with forward fill: each new label copies the previous existing row

  df.reindex(index=list('ABCDEFGH'), method='ffill')
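  Because random numbers make the reindexing results hard to inspect, here is the same idea on a small deterministic DataFrame (arange values instead of randn, two columns instead of six):

```python
import numpy as np
import pandas as pd

# Rows A, D, F, H hold [0, 1], [2, 3], [4, 5], [6, 7]
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=list('ADFH'), columns=['one', 'two'])

# New labels B, C, E, G get NaN
with_nan = df.reindex(index=list('ABCDEFGH'))

# New labels get 0 instead
with_zero = df.reindex(index=list('ABCDEFGH'), fill_value=0)

# New labels copy the previous existing row (the index ADFH is sorted, so method= works)
ffilled = df.reindex(index=list('ABCDEFGH'), method='ffill')

# fill_value also works for new columns
new_col = df.reindex(columns=['one', 'three'], fill_value=0)
```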

  DataFrame operations

  df = pd.DataFrame(np.random.randn(4, 6), index=list('ADFH'), columns=['one', 'two', 'three', 'four', 'five', 'six'])

  # Set the value at row 'A', column 'one' to 100 (a single .loc call avoids chained assignment)

  df.loc['A', 'one'] = 100

  # Drop the row labeled 'A' (returns a new DataFrame; df itself is unchanged)

  df.drop('A')

  # Drop columns 'two' and 'four'

  df2 = df.drop(['two', 'four'], axis=1)

  # Set the value at integer position row 0, column 0 to 100

  df.iloc[0, 0] = 100

  # Get column 'one' ('one' is a column label, not a row label, so df.loc['one'] would raise a KeyError)

  df.loc[:, 'one']
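  A deterministic sketch of label-based (.loc) versus position-based (.iloc) assignment, plus drop. It uses a single .loc[row, col] call rather than chained indexing such as df.loc['A']['one'], which may modify a temporary copy and leave df unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 3)), index=list('ADFH'), columns=['one', 'two', 'three'])

df.loc['A', 'one'] = 100  # label-based: row 'A', column 'one'
df.iloc[1, 2] = 7         # position-based: second row ('D'), third column ('three')

dropped_row = df.drop('A')               # new DataFrame without row 'A'
dropped_cols = df.drop(['two'], axis=1)  # axis=1 drops columns
```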

  DataFrame computation

  df = pd.DataFrame(np.arange(12).reshape(4, 3), index=['one', 'two', 'three', 'four'], columns=list('ABC'))

  # Pass each column to the lambda as a Series

  df.apply(lambda x: x.max() - x.min())

  # Pass each row to the lambda as a Series (axis=1)

  df.apply(lambda x: x.max() - x.min(), axis=1)

  # Return a Series from the function to get several values per row

  def min_max(x):
      return pd.Series([x.min(), x.max()], index=['min', 'max'])

  df.apply(min_max, axis=1)

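  # applymap applies a function element by element; here each value is formatted with 2 decimal places
  # (pandas 2.1 renamed DataFrame.applymap to DataFrame.map; applymap still works but is deprecated)

  formatter = '{0:.02f}'.format

  df.applymap(formatter)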
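  Using the same arange DataFrame, the results of apply are easy to check by hand: every column spans 9 and every row spans 2. The formatting step here uses apply with Series.map instead of applymap, which sidesteps the applymap deprecation in pandas 2.1 and later; this substitution is our choice, not the original author's.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['one', 'two', 'three', 'four'], columns=list('ABC'))

col_range = df.apply(lambda x: x.max() - x.min())          # one value per column
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)  # one value per row

def min_max(x):
    # Returning a Series makes apply build a DataFrame with columns 'min' and 'max'
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

stats = df.apply(min_max, axis=1)

# Element-wise formatting: map a format function over each column's values
formatter = '{0:.02f}'.format
formatted = df.apply(lambda col: col.map(formatter))
```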

  Selecting, adding and deleting DataFrame columns

  df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'])

  # Set column 'three' to the sum of columns 'one' and 'two'

  df['three'] = df['one'] + df['two']

  # Add a boolean column 'flag': True where 'one' is greater than 0, otherwise False

  df['flag'] = df['one'] > 0

  # Delete column 'three'

  del df['three']

  # Remove column 'four' and return it as a Series ('three' was already deleted above)

  four = df.pop('four')

  # Add column 'five'; the scalar 5 is broadcast to every row

  df['five'] = 5

  # Assign only the first two values of 'one'; rows without a matching index label get NaN

  df['one_trunc'] = df['one'][:2]

  # Insert a copy of column 'one' as column 'bar' at position 1

  df.insert(1, 'bar', df['one'])
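  The column operations above, replayed on a small fixed DataFrame so that each step is visible. The values are made up, and a 'four' column is included so that pop has something to remove.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, -2.0, 3.0],
                   'two': [10.0, 20.0, 30.0],
                   'four': [0.0, 0.0, 0.0]})

df['three'] = df['one'] + df['two']  # element-wise sum, aligned on the index
df['flag'] = df['one'] > 0           # boolean column
del df['three']                      # remove in place
four = df.pop('four')                # remove in place and return it as a Series
df['five'] = 5                       # scalar broadcast to every row
df['one_trunc'] = df['one'][:2]      # shorter Series: row 2 has no match and becomes NaN
df.insert(1, 'bar', df['one'])       # insert at column position 1

print(list(df.columns))  # ['one', 'bar', 'two', 'flag', 'five', 'one_trunc']
```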

  Using assign() to add new columns

  df = pd.DataFrame(np.random.randint(1, 5, (6, 4)), columns=list('ABCD'))

  # New column 'Ratio' = df['A'] / df['B']; assign returns a new DataFrame and leaves df unchanged

  df.assign(Ratio = df['A'] / df['B'])

  # Callables receive the DataFrame: AB_Ratio = A / B and CD_Ratio = C - D

  df.assign(AB_Ratio = lambda x: x.A / x.B, CD_Ratio = lambda x: x.C - x.D)
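  A sketch of assign with fixed integers in place of random ones. Within a single assign call, a later callable can also refer to a column created earlier in the same call (supported since pandas 0.23).

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4], 'B': [1, 2], 'C': [5, 9], 'D': [3, 4]})

out = df.assign(Ratio=df['A'] / df['B'])

out2 = df.assign(AB_Ratio=lambda x: x.A / x.B,
                 CD_Ratio=lambda x: x.C - x.D)

# 'F' uses the column 'E' created earlier in the same call
chained = df.assign(E=lambda x: x.A + 1, F=lambda x: x.E * 2)
```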

  Sorting a DataFrame

  df = pd.DataFrame(np.random.randint(1, 10, (4, 3)), index=list('ABCD'), columns=['one', 'two', 'three'])

  # Sort rows by the values in column 'one'

  df.sort_values(by='one')

  # Rank the values of Series s (tied values get the average of their ranks by default)

  s.rank()
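  A deterministic sketch of sorting and ranking with hand-picked values. Sorting by two columns makes the order of the tie in 'one' predictable, and the rank example shows how tied values share an averaged rank unless another method is chosen.

```python
import pandas as pd

df = pd.DataFrame({'one': [3, 1, 3, 2], 'two': [9, 8, 7, 6]}, index=list('ABCD'))

# Sort by 'one', breaking ties with 'two'
by_one = df.sort_values(by=['one', 'two'])

# Descending sort on a single column
by_two_desc = df.sort_values(by='two', ascending=False)

# Sort by the row labels instead of the values
by_index = df.sort_index(ascending=False)

# Ties in 'one' (rows A and C) share the average of ranks 3 and 4
avg_rank = df['one'].rank()
min_rank = df['one'].rank(method='min')
```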

  DataFrame arithmetic

  Arithmetic between DataFrames automatically aligns the data on both row and column labels. The result's index and columns are the union of the two operands' labels, and cells that exist in only one operand become NaN.

  df1 = pd.DataFrame(np.random.randn(10, 4), index=list('abcdefghij'), columns=['A', 'B', 'C', 'D'])

  df2 = pd.DataFrame(np.random.randn(7, 3), index=list('cdefghi'), columns=['A', 'B', 'C'])

  df1 + df2

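  # Subtract the first row from every row (the Series is broadcast across rows, matched on column labels)

  df1 - df1.iloc[0]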
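  A small sketch of label alignment with hand-picked values, including DataFrame.add with fill_value, which treats a value missing from only one operand as 0 instead of producing NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [100, 200]}, index=['b', 'c'])

# Union of labels: rows a, b, c and columns A, B; unmatched cells are NaN
total = df1 + df2

# Missing values on one side are treated as 0 before adding
filled = df1.add(df2, fill_value=0)

# Row 'a' ([1, 10]) is subtracted from every row
centered = df1 - df1.iloc[0]
```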



Origin blog.51cto.com/14335413/2457711