8 Efficient Python Techniques for Data Analysis: How Many Do You Know?

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please include the original source link and this statement.
Article link: https://blog.csdn.net/weixin_45523154/article/details/102761726

This article describes eight Python techniques for data analysis. They not only make your work more efficient but can also make your code more "beautiful."

Defining a List in One Line of Code

Writing a for loop every time you need to define a list is cumbersome. Fortunately, Python has a built-in way to solve this in one line of code: the list comprehension.

Below, the same list is built first with a for loop and then with one line of code.

x = [1,2,3,4]
out = []
for item in x:
    out.append(item**2)
print(out)
[1, 4, 9, 16]
# vs.
x = [1,2,3,4]
out = [item**2 for item in x]
print(out)
[1, 4, 9, 16]
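
A one-line list definition can also filter while it transforms. The following is a minimal sketch, not part of the original example; the names even_squares and square_by_item are made up for illustration.

```python
x = [1, 2, 3, 4]

# The `if` clause keeps only the even items before squaring them.
even_squares = [item**2 for item in x if item % 2 == 0]
print(even_squares)  # [4, 16]

# The same one-line pattern also builds dictionaries.
square_by_item = {item: item**2 for item in x}
print(square_by_item)  # {1: 1, 2: 4, 3: 9, 4: 16}
```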

Lambda expressions

Tired of defining functions that you only use a few times? Lambda expressions are your savior! A lambda expression creates a small, one-off, anonymous function object in Python.

The basic syntax of a lambda expression is:

lambda arguments: expression

Note that a lambda expression can do anything a regular function can do, as long as the body can be written as a single expression. The following example gives a feel for the power of lambda expressions:


double = lambda x: x * 2
print(double(5))
10
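
Lambdas are often most useful when passed straight to another function instead of being given a name. As a small sketch, here one serves as a sort key; the records list is hypothetical data invented for this example.

```python
# Hypothetical (name, score) records, made up for illustration.
records = [("ann", 72), ("bob", 95), ("cho", 88)]

# Sort by the second field of each record, highest score first.
ranked = sorted(records, key=lambda rec: rec[1], reverse=True)
print(ranked)  # [('bob', 95), ('cho', 88), ('ann', 72)]
```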

Map and Filter

Once you have mastered lambda expressions, learning to combine them with the map and filter functions gives you even more power.

Specifically, map takes a list and transforms it into a new list by performing an operation on each element. In this example, it goes through each element and multiplies it by 2 to build the new list. Note that the list() function simply converts the output to a list type.

# Map
seq = [1, 2, 3, 4, 5]
result = list(map(lambda var: var*2, seq))
print(result)
[2, 4, 6, 8, 10]

The filter function takes a list and a rule, much like map, but it returns a subset of the original list by testing each element against a Boolean filtering rule.

# Filter
seq = [1, 2, 3, 4, 5]
result = list(filter(lambda x: x > 2, seq))
print(result)
[3, 4, 5]
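
The two functions can be chained, since both return lazy iterators in Python 3. The sketch below reuses seq from the examples above; the variable names chained and comprehension are illustrative only. It also shows the equivalent list comprehension for comparison.

```python
seq = [1, 2, 3, 4, 5]

# Keep the values greater than 2, then double each of them.
chained = list(map(lambda x: x * 2, filter(lambda x: x > 2, seq)))
print(chained)  # [6, 8, 10]

# The same result written as a list comprehension.
comprehension = [x * 2 for x in seq if x > 2]
print(comprehension == chained)  # True
```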

Arange and Linspace

Arange returns an evenly spaced array with a given step. Its three parameters, start, stop, and step, are the start value, the end value, and the step size. Note that stop is a "cut-off" value, so it is not included in the output array.

import numpy as np

# np.arange(start, stop, step)
np.arange(3, 7, 2)
array([3, 5])

Linspace is very similar to arange, with a slight difference. Linspace divides an interval evenly into a specified number of points. Given the start and end of the interval and the number of points num, linspace returns them as a NumPy array. This is particularly useful for data visualization and for declaring axes when plotting.

# np.linspace(start, stop, num)
np.linspace(2.0, 3.0, num=5)
array([2.  , 2.25, 2.5 , 2.75, 3.  ])
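
To make the difference concrete, the sketch below covers the same interval with both functions. The interval and variable names are chosen for illustration; endpoint is a standard np.linspace parameter.

```python
import numpy as np

# arange: fixed step size, stop value excluded.
a = np.arange(0, 1, 0.25)
print(a.tolist())  # [0.0, 0.25, 0.5, 0.75]

# linspace: fixed number of points, stop value included by default.
b = np.linspace(0, 1, num=5)
print(b.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False makes linspace exclude the stop value as well.
c = np.linspace(0, 1, num=4, endpoint=False)
print(np.allclose(a, c))  # True
```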

What Does Axis Represent?

You may have run into axis when dropping a column in Pandas or summing values in a NumPy matrix. Take dropping a column (or a row) as an example:


df.drop('Column A', axis=1)
df.drop('Row A', axis=0)

If you want to work on columns, set axis to 1; if you want to work on rows, set it to 0. But why? Recall the shape attribute in Pandas:

df.shape
(# of Rows, # of Columns)

Calling the shape attribute on a Pandas DataFrame returns a tuple whose first value is the number of rows and whose second value is the number of columns. Indexed as in Python, the rows sit at index 0 and the columns at index 1, which matches how we declare axis values.
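
A small sketch may help tie axis to shape. The array m and the column names a, b, c are made up for illustration. Note that for reductions such as sum, the axis you pass is the one that gets collapsed, so axis=0 yields one result per column.

```python
import numpy as np
import pandas as pd

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)                 # (2, 3): 2 rows, 3 columns
print(m.sum(axis=0).tolist())  # [5, 7, 9]: rows collapsed, one sum per column
print(m.sum(axis=1).tolist())  # [6, 15]: columns collapsed, one sum per row

df = pd.DataFrame(m, columns=["a", "b", "c"])
without_col = df.drop("a", axis=1)  # drop the column labeled "a"
without_row = df.drop(0, axis=0)    # drop the row labeled 0
print(without_col.shape)  # (2, 2)
print(without_row.shape)  # (1, 3)
```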

Concat, Merge, and Join

If you are familiar with SQL, these concepts will probably come easier to you. In any case, these functions are all essentially ways of combining DataFrames. It can be hard to keep track of which one is best to use when, so let's recap.

Concat lets you append one or more DataFrames either below or beside a table (depending on how you define the axis).

Merge combines multiple DataFrames on rows that share a specified common key.

Join, like Merge, combines two DataFrames. Rather than merging on a specified key column, it joins on the index by default.
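
The three functions can be compared side by side on two tiny tables. This is a sketch under assumed data: the frames left and right and their columns key, x, and y are hypothetical.

```python
import pandas as pd

# Hypothetical tables sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: stack tables below (axis=0) or beside (axis=1) each other.
stacked = pd.concat([left, left], axis=0, ignore_index=True)
print(stacked.shape)  # (6, 2)

# merge: SQL-style join on a key column; an inner join by default.
merged = pd.merge(left, right, on="key")
print(merged["key"].tolist())  # ['b', 'c']

# join: combines on the index; a left join by default.
joined = left.set_index("key").join(right.set_index("key"))
print(joined.index.tolist())  # ['a', 'b', 'c']
```

Because join defaults to a left join, the row for key "a" is kept and its y value is missing (NaN), while merge's default inner join drops it.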

Pandas Apply

Apply is designed for Pandas DataFrames and, more specifically, for Series. If you are not familiar with Series, you can think of them as similar to NumPy arrays.

Apply sends a function to every row or column along the axis you specify. With apply you can format and manipulate the values of a DataFrame column (a Series) without writing a loop, which is very useful!

import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
   A  B
0  4  9
1  4  9
2  4  9
df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
df.apply(np.sum, axis=0)
A    12
B    27
df.apply(np.sum, axis=1)
0    13
1    13
2    13
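
Apply also accepts your own functions, including lambdas. The sketch below rebuilds the same DataFrame; the names labels and diff are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Series.apply calls the function once per element of the column.
labels = df["A"].apply(lambda v: f"value={v}")
print(labels.tolist())  # ['value=4', 'value=4', 'value=4']

# DataFrame.apply with axis=1 passes each row to the function as a Series.
diff = df.apply(lambda row: row["B"] - row["A"], axis=1)
print(diff.tolist())  # [5, 5, 5]
```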

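Pivot Tables

Last are pivot tables. If you are familiar with Microsoft Excel, you have probably heard of pivot tables. Pandas' built-in pivot_table function creates a spreadsheet-style pivot table as a DataFrame, which helps us quickly view the data in certain columns. Here are a few examples. The first neatly groups the data by "Manager":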

pd.pivot_table(df, index=["Manager", "Rep"])

Alternatively, you can restrict the table to selected value columns:

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"])
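
The calls above assume a DataFrame with Manager, Rep, and Price columns, which the article does not show. The sketch below builds a hypothetical one (all rows are invented) so the calls can be run; aggfunc is a standard pivot_table parameter and defaults to the mean.

```python
import pandas as pd

# Hypothetical sales data; only the column names come from the article.
df = pd.DataFrame({
    "Manager": ["Debra", "Debra", "Debra", "Fred"],
    "Rep": ["Craig", "Craig", "Daniel", "Cedric"],
    "Price": [30000, 10000, 5000, 35000],
})

# The default aggregation is the mean of each (Manager, Rep) group.
mean_price = pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"])
print(mean_price.loc[("Debra", "Craig"), "Price"])  # 20000.0

# aggfunc switches the aggregation, here to a total per manager.
total_price = pd.pivot_table(df, index="Manager", values="Price", aggfunc="sum")
print(total_price["Price"].to_dict())  # {'Debra': 45000, 'Fred': 35000}
```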

Summary

I hope the descriptions above help you discover some handy Python functions and concepts.
