How to read the contents of an Excel table in Python

Reading data from an Excel sheet with Python

Suppose an Excel table stores the data shown below, where x1-x6 are features and y_label is the class label for each row of features. To analyze this data in Python, the first step is to read it out of the Excel file. Here we mainly use the pandas library.
[Figure: the Excel table, with feature columns x1-x6 and the label column y_label]
First, determine where the Excel file is stored. For example, my path is 'E:\relate_code\svm\dataset\data.xlsx'.

import pandas as pd

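file_path = r'E:\relate_code\svm\dataset\data.xlsx'   # the r prefix makes this a raw string, so the backslashes in the Windows path are not treated as escape sequences
raw_data = pd.read_excel(file_path, header=0)  # header=0: the first row is the header, so it becomes the column names instead of a data row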
print(raw_data)

This reads the data in, and the output is shown in the figure below. Because of header=0, the header row (x1 and so on) is used for the column names and is not counted among the data rows.
[Figure: printed output of raw_data]
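Before converting anything, it can help to confirm what pandas actually loaded. Here is a minimal sketch that uses a small hand-made DataFrame as a stand-in for the result of pd.read_excel (the values are invented for illustration and are not taken from data.xlsx), then checks the column names, the shape, and the row index.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_excel(...);
# the numbers below are made up for illustration only.
raw_data = pd.DataFrame(
    {
        "x1": [0.1, 0.4, 0.7],
        "x2": [1.0, 2.0, 3.0],
        "y_label": [0, 1, 0],
    }
)

column_names = list(raw_data.columns)  # the header row becomes the column names
n_rows, n_cols = raw_data.shape        # neither the header nor the index is counted
row_index = list(raw_data.index)       # the default index starts at 0

print(column_names)
print(n_rows, n_cols)
print(row_index)
```

On the real file, the same three checks should show the x1-x6 and y_label column names and an index running from 0 upward.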

However, the far left also shows an extra column of row numbers 0-169, which is the DataFrame's index rather than data from the sheet. The following command extracts just the table contents into an array.

data = raw_data.values     # extract only the table contents as a NumPy array
print(data)

Output result:
[Figure: printed output of data]
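To see that the index and the column names really are left behind, here is a small sketch on a made-up DataFrame (again a stand-in, not the real spreadsheet). It also shows to_numpy(), the method form that the pandas documentation recommends over the .values attribute; for a plain table like this the two give the same array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_data; values are invented for illustration.
raw_data = pd.DataFrame({"x1": [0.1, 0.4], "x2": [1.0, 2.0], "y_label": [0, 1]})

data = raw_data.values          # NumPy array holding only the cell values
data_alt = raw_data.to_numpy()  # method form recommended by the pandas docs

print(type(data))
print(data.shape)               # 2 rows, 3 columns: no index column, no header row
print(np.array_equal(data, data_alt))
```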
The data is now stored as an array, and we can select the parts we want. For example, to separate x and y (one holds the features, the other the labels), we can use the following code.

features = data[:, 0:6]  # 2-D array: the first colon selects all rows, and 0:6 keeps only the first six columns
labels = data[:, -1]     # labels: only the last column
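A quick way to check the split is to look at the shapes. The sketch below assumes a made-up array with 4 samples, 6 feature columns, and 1 label column, standing in for the real data.

```python
import numpy as np

# Hypothetical array: 4 samples, 6 feature columns plus 1 label column.
data = np.arange(28).reshape(4, 7)

features = data[:, 0:6]  # all rows, columns 0..5
labels = data[:, -1]     # all rows, last column only

print(features.shape)    # (4, 6)
print(labels.shape)      # (4,)
print(labels)            # [ 6 13 20 27]
```

Note that labels comes out one-dimensional, because a single integer index drops the column axis.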

1. You can also select individual features. To take only the feature in the fourth column:

feature_4 = data[:, 3:4]  # the result is still a 2-D array, which is convenient for later feature operations

2. If you want every feature except the fourth, you can select the two groups of columns and join them with the numpy library:

import numpy as np

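feature1_3 = data[:, 0:3]   # the first three feature columns
feature5_6 = data[:, 4:6]   # the 5th and 6th feature columns
feature_choose = np.hstack((feature1_3, feature5_6))   # join the two blocks column-wise; hstack takes a single tuple of arrays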
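The two selections above can be checked on a small made-up array. This sketch compares an integer index with a slice for the fourth column, and also shows np.delete as one possible alternative to slicing and stacking when dropping a single column. The array is hypothetical, and np.delete is my suggestion rather than something the steps above require.

```python
import numpy as np

# Hypothetical array: 4 samples, 6 feature columns plus 1 label column.
data = np.arange(28).reshape(4, 7)
features = data[:, 0:6]

col_1d = features[:, 3]    # integer index drops the column axis -> shape (4,)
col_2d = features[:, 3:4]  # slice keeps the column axis -> shape (4, 1)

# Two ways to keep every feature except the fourth (column index 3).
stacked = np.hstack((features[:, 0:3], features[:, 4:6]))
deleted = np.delete(features, 3, axis=1)

print(col_1d.shape, col_2d.shape)
print(stacked.shape)
print(np.array_equal(stacked, deleted))
```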

A few more words on the np.hstack() and np.vstack() functions:

First, np.vstack(), which stacks arrays vertically. The arrays must have the same number of columns (three in both cases here; the 1-D arr2 is treated as a single row). The result is as follows.

import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([7, 8, 9])
print(np.vstack((arr1, arr2)))

[Figure: output of np.vstack((arr1, arr2))]
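As a rough check of the column rule, this sketch stacks the same two arrays and then tries a deliberately mismatched pair. The two-element array is made up just to trigger the error.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([7, 8, 9])  # 1-D, treated as a single row of 3 columns

stacked = np.vstack((arr1, arr2))
print(stacked.shape)        # (3, 3)

# A row with only 2 columns does not match arr1's 3 columns.
try:
    np.vstack((arr1, np.array([7, 8])))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```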
Next, np.hstack(), which stacks arrays horizontally. The arrays must have the same number of rows (two in both cases here).

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(np.hstack((arr1, arr2)))

[Figure: output of np.hstack((arr1, arr2))]
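For 2-D arrays like these, horizontal stacking is the same as concatenating along the column axis. The short sketch below compares np.hstack with np.concatenate(..., axis=1); the comparison is an extra illustration on my part.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

wide = np.hstack((arr1, arr2))
same = np.concatenate((arr1, arr2), axis=1)  # equivalent for 2-D inputs

print(wide.shape)  # (2, 5)
print(np.array_equal(wide, same))
```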
That's all for now. The next article introduces code for machine learning, so stay tuned!

These are my daily study notes; feel free to exchange ideas and discuss!

Origin blog.csdn.net/WYKB_Mr_Q/article/details/122999267