Data Study Room, Week 2 · Python Data Analysis Basics (0722-0728)

Table of Contents

  1. NumPy (Numerical Python)
  2. Pandas
  3. Supplement

NumPy (Numerical Python)

NumPy is a powerful Python library intended primarily for working with multi-dimensional arrays. It provides a large number of library functions for matrix processing, image computation, and many other mathematical tasks (for example, calculus-style numerical work), and it serves as a fast, Python-based alternative to MATLAB. The code below shows some of what NumPy can do:

# Create arrays with NumPy
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # one-dimensional array
data_d = np.array([[1, 2, 3], [3, 4, 5]])  # two-dimensional array
d1 = np.zeros(10, dtype=int)  # array of length 10, all values 0
d2 = np.ones(10, dtype=int)  # array of length 10, all values 1
d3 = np.arange(0, 10, 1)  # sequence over [0, 10) with step 1 (10 is excluded)
d4 = np.eye(3)  # 3x3 identity matrix
d5 = np.random.randint(0, 10, 10)  # 10 random integers drawn from [0, 10)
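The constructors above differ mainly in the shape, dtype, and value range of what they return. A minimal sketch for checking those properties follows; the seeded generator `rng` and the seed value 0 are assumptions added here for reproducibility and are not part of the original post.

```python
import numpy as np

data_d = np.array([[1, 2, 3], [3, 4, 5]])
d3 = np.arange(0, 10, 1)
d4 = np.eye(3)

# Seeded generator so the random draw is reproducible (seed 0 is arbitrary).
rng = np.random.default_rng(0)
d5 = rng.integers(0, 10, size=10)  # upper bound 10 is excluded

print(data_d.shape)  # (2, 3): two rows, three columns
print(d3[-1])        # 9: np.arange excludes the stop value
print(d4.dtype)      # float64: np.eye defaults to floating point
print(d5.min() >= 0 and d5.max() < 10)  # True
```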

# Compute descriptive statistics with NumPy (partial list; details are in the diagram linked from the original post)
from numpy import mean, median
from scipy.stats import mode
data_mean = mean(data)
data_median = median(data)
data_mode = mode(data)
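Every value in `data` occurs exactly once, so its mode is not very informative. The sketch below uses a small made-up sample with repeated values instead, and it finds the mode with `np.unique` rather than `scipy.stats.mode`, purely to keep the example NumPy-only; the name `sample` is hypothetical.

```python
import numpy as np

sample = np.array([1, 2, 2, 3, 3, 3, 4, 10])  # made-up values for illustration

sample_mean = np.mean(sample)      # 28 / 8 = 3.5
sample_median = np.median(sample)  # average of the two middle values (3 and 3)

# Mode: the distinct value with the highest count.
values, counts = np.unique(sample, return_counts=True)
sample_mode = values[np.argmax(counts)]

print(sample_mean, sample_median, sample_mode)
```

Note how the single large value 10 pulls the mean above the median, which is one reason to report both.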

Pandas

Pandas is a powerful toolkit for analyzing structured data. It is built on NumPy (which provides the high-performance matrix operations) and is used for data analysis and data mining; it also provides data cleaning functionality. The code below shows some of what Pandas can do:

# Create a Series and a DataFrame with Pandas
import numpy as np
import pandas as pd
data = pd.Series(100, index=range(4))  # Series with the default integer index
d1 = pd.Series(np.random.rand(5), index=list("abcde"))  # custom index
# pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
d2 = pd.DataFrame(np.random.randn(8, 5)) * 5  # 8x5 matrix of random data
d3 = pd.read_csv("your_file.csv")  # read a CSV file (placeholder path)
d4 = pd.read_excel("your_file.xlsx")  # read an Excel file (placeholder path)
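`pd.read_csv` needs a file path or a file-like object. As a minimal sketch that runs without any file on disk, the example below feeds it an in-memory string through `io.StringIO`; the column names and values are made up for illustration.

```python
import io
import pandas as pd

csv_text = "name,score\nann,90\nbob,85\n"  # made-up data standing in for a file

# read_csv accepts any file-like object, so StringIO works in place of a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (2, 2): two rows, two columns
print(df["score"].mean())  # 87.5
```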

# Compute descriptive statistics with Pandas (partial list; details are in the diagram linked from the original post)
data_var = data.var()
data_std = data.std()
data_iqr = data.quantile(0.75) - data.quantile(0.25)
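Because `data` holds the constant 100, all three statistics above come out as 0. A minimal sketch on a made-up Series with varying values gives more telling numbers, and it also shows that pandas uses the sample variance (`ddof=1`) by default while NumPy uses the population variance (`ddof=0`); the name `s` is hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # made-up values

s_var = s.var()                    # sample variance: 82.5 / 9
np_var = np.var(s.to_numpy())      # population variance: 82.5 / 10 = 8.25
s_std = s.std()                    # square root of the sample variance
s_iqr = s.quantile(0.75) - s.quantile(0.25)  # 7.75 - 3.25 = 4.5

print(s_var, np_var, s_std, s_iqr)
```

To make the two libraries agree, pass `ddof` explicitly, for example `s.var(ddof=0)`.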

Supplement

# Connect to a database from Python
import pymysql  # MySQL client library for Python 3.x
import pandas as pd
conn = pymysql.connect(host='your database address', user='username', password='password', db='database name', charset='utf8')  # connect to the database
sql_query1 = '''select * from table1 where ...'''  # the SQL statement to run
data = pd.read_sql(sql_query1, con=conn)  # run the query and read the result into a DataFrame
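The MySQL code above needs a running server and real credentials. As a stand-in that runs anywhere, the sketch below swaps in Python's built-in `sqlite3` with an in-memory database; the table layout, the rows, and the `value > 2` filter are all made up for illustration and say nothing about the author's actual `table1`.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 1.5), (2, 2.5), (3, 3.5)],  # made-up rows
)
conn.commit()

# Same pattern as above: pass the SQL text and the open connection.
result = pd.read_sql("SELECT * FROM table1 WHERE value > 2", con=conn)
conn.close()

print(result)
```

The pandas documentation lists SQLAlchemy connectables and `sqlite3` connections as the supported inputs for `read_sql`, so with MySQL it may be worth wrapping the connection in a SQLAlchemy engine.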
 

 


Origin blog.csdn.net/CCESARE/article/details/97624343