Tableau Collection 2: Building a Word Cloud with Table Extension and Python

1. Introduction

Tableau was updated a few days ago with a new feature, Table Extension, which makes it easier to process data tables and, in particular, offers better support for third-party tools such as Python and R.
Previously, Tableau could only add calculated columns to the source table or remove rows through data source filters; it could not process the data in a way that adds rows. Table Extension makes this possible. A classic use case is the word cloud.
In earlier versions, making a word cloud usually meant taking a data table, segmenting the relevant text with Python, saving the result as a new table, and then loading that new table into Tableau (in fact, it is more convenient to draw the word cloud directly in Python!).
Now, with Table Extension, word clouds are easier to build in Tableau. (Of course, it can do far more than that; various algorithms can also be run through it.)

2. Environment setup

2.1 Download and install Tableau 2022.3

Download Tableau 2022.3 from the download link and install it.
image.png
image.png
Error 1310 may appear during installation: "Cannot write to the xxx.dll file; confirm that you have access to this directory" (I forgot to take a screenshot, but this is roughly what it says).
Solution: Quit 360 before installing, and reopen it after the installation finishes.

2.2 Install TabPy

TabPy is the tool that connects Tableau and Python.
Make sure you have a Python environment before installing!
Open a command prompt or terminal. On Windows, type the following command and press Enter; on Mac, use pip3 instead.

pip install tabpy

image.png

2.3 Test connection

**TabPy must be running before you can connect!** To start it, type tabpy at the command prompt or terminal and press Enter.
This is similar to launching Jupyter Notebook from the terminal. Once it is running, every operation is logged to the command prompt or terminal.
image.png
After starting TabPy, switch to Tableau and follow the steps below.
image.png
A window will pop up. Enter the following information, click Test Connection, click Save once the connection succeeds, and then close the window.

Hostname: 127.0.0.1 or localhost
Port: 9004

image.png
image.png
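If Test Connection fails, it can help to first confirm that something is actually listening on the TabPy port. The following is a minimal sketch using only the Python standard library; the function name is_listening is my own, and the host and port are simply the values entered above. It only checks that a TCP connection can be opened, not that the service behind the port is really TabPy.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 9004, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Port 9004 reachable:", is_listening())
```

If this prints False, go back to the terminal and check that the tabpy command is still running.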

3. Data preparation and processing

3.1 Prepare data

3.1.1 Source data acquisition

Data source: the Notice of Five Departments on Issuing the "Action Plan for the Integration and Development of Virtual Reality and Industry Applications (2022-2026)". A link to the processed data is given at the end of the article. The data structure is as follows:

insert image description here

Open Tableau 2022.3 and load the file.
image.png
Since the file has only one worksheet, it is placed in the application area (the canvas) by default. A new Table Extension button now appears here; double-click it.
Note: At this point there is already a table in the application area, so adding the new table extension table requires editing the relationship. (See Section 3.4 of this article for details.)

insert image description here


After double-clicking, the interface is as follows:
image.png
Next, drag Sheet1 onto the "Drag the worksheet here" area. The interface shown below appears, with four numbered regions:

1. Where data tables are added; multiple tables can be related here.
2. The script that processes the data.
3. Select a table to view its structure, or select the relationship to edit how fields relate before and after processing. Because the extension table is the second table in the application area, its relationship with the first table, Sheet1, must be confirmed before you can refresh and preview data in the output table.
4. Where the data of the input table and output table can be viewed.

insert image description here

3.1.2 Script test

The officially recommended starter script is return _arg1. To test it, type the following code into the script box and click Apply.

print(type(_arg1))
print(_arg1)
return _arg1

After clicking Apply, the interface changes as shown in the figure below. The script ran successfully: the data structure produced by the table extension is displayed to the left of the red box in the figure. The output of the two print() calls can be viewed in the command prompt or terminal.
image.png

In the terminal, the data is printed as expected. _arg1 is a dictionary in column-oriented form: each key is a column name and each value is the list of that column's values. (This is explained in the official API documentation and discussed further in Section 4, "Expansion".)
Note: Because the data I use is fairly long, the output may be hard to read. You can test with a simpler dataset so the printed results are easier to see. (Section 4, "Expansion", shows another way to build a table, which can also be used for a print test.)
image.png
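When the table is long, printing all of _arg1 floods the terminal. A small helper that prints only the column names, the row count, and the first few rows may be easier to read. The sketch below is plain Python; summarize_columns and the sample data are my own illustrations, and it assumes the column-oriented dictionary layout described above.

```python
def summarize_columns(columns, max_rows=3):
    """Summarize a column-oriented dict: column names, row count, first rows."""
    names = list(columns)
    # All columns are assumed to have the same length; use the first one
    n_rows = len(columns[names[0]]) if names else 0
    preview = [
        {name: columns[name][i] for name in names}
        for i in range(min(max_rows, n_rows))
    ]
    return {"columns": names, "rows": n_rows, "preview": preview}


# Hypothetical stand-in for _arg1
sample = {"id": [1, 2, 3, 4], "text": ["a", "b", "c", "d"]}
print(summarize_columns(sample, max_rows=2))
```

Inside a table extension script, the same function could be defined at the top and called as print(summarize_columns(_arg1)) before return _arg1.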

3.1.3 Error Code 03D52C7A Handling

Supplement: After clicking Update Now, you may get error code 03D52C7A (shown below).
image.png

Unexpected Error
Internal Error - An unexpected error occurred and the operation could not be completed.
Error code: 03D52C7A:

To fix it, click the Refresh Data Source button, which is next to the Save button, as shown below.
image.png

3.2 Processing data - word segmentation

Next, let's start processing the dataset.
Segment the content with jieba, keeping only nouns, verbs, and adjectives, then click Apply to view the results.

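import pandas as pd         # pandas for data processing
import jieba.posseg as psg  # jieba for word segmentation with part-of-speech tags

# Convert _arg1 to a DataFrame for further processing
df_input = pd.DataFrame.from_dict(_arg1)


# Word segmentation function
def values_cut(arg):
    # Keep only words tagged as noun (n), verb (v), or adjective (a)
    result = [x.word for x in psg.cut(arg) if x.flag in ['n', 'v', 'a']]
    return result


# Column names come from the source data:
# '内容' = content, '顺序编号' = sequence number, '内容分词' = segmented content
df_input['内容分词'] = df_input['内容'].map(values_cut)  # segment the content column into a list of words
df_middle = df_input[['顺序编号', '内容分词']]           # keep only the sequence number and the word list to avoid redundant data
df_middle = df_middle.explode('内容分词')                # expand each word list so that every word becomes its own row

# Convert the DataFrame to a dict of the form {"column 1": ["value 1", "value 2"], "column 2": ["value 1", "value 2"]}
df_return = df_middle.to_dict(orient='list')
return df_return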

image.png
Note: Set up the relationship first, then click Output Table > Update Now to refresh the data.
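The key step in the script is the row expansion: one input row holding a list of words becomes one output row per word, which is exactly the "adding rows" that was not possible before. To make that step concrete without pandas or jieba, here is a plain-Python sketch; explode_column and the English sample data are my own illustrations, not part of the original script. Unlike pandas' explode, which keeps a row with a missing value when a list is empty, this version simply drops rows whose list is empty.

```python
def explode_column(columns, id_col, list_col):
    """Turn each (id, [w1, w2, ...]) row into one (id, w) row per word."""
    out = {id_col: [], list_col: []}
    for row_id, items in zip(columns[id_col], columns[list_col]):
        for item in items:
            out[id_col].append(row_id)   # repeat the id for every word
            out[list_col].append(item)
    return out


# Hypothetical table after segmentation: two rows, each with a word list
sample = {"id": [1, 2], "tokens": [["virtual", "reality"], ["industry"]]}
print(explode_column(sample, "id", "tokens"))
# {'id': [1, 1, 2], 'tokens': ['virtual', 'reality', 'industry']}
```

The result has three rows where the input had two, and the repeated id is what lets Tableau relate each word back to its source row in Sheet1.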

3.3 Visualization

At this point, the word segmentation results can be visualized in a worksheet.
image.png
The focus stands out clearly: the document is about virtual reality, with an emphasis on applications and on creating immersive experiences through technology and content. There is also a chain of support: the government encourages and supports enterprise innovation, teaching and research institutions supply talent, and enterprises drive technology iteration and adoption, together promoting the integration of virtual reality technology with various industries and fields.
You can also filter by first-level heading to see the focus of each one.

3.4 Supplement: table extensions with multiple tables

Close the table extension interface (using the control marked by the red box in the figure below) to return to the previous level.
image.png
Here you can see the relationship between Sheet1 and the table extension table. (Double-click either table to continue editing it.)
image.png
As mentioned at the beginning, the Excel file contained only one worksheet, so it was added to the application area automatically when the file was opened. Sheet1 is therefore the first table, and any table added afterwards needs a relationship set up.
If the file has multiple worksheets (in which case none is added to the application area by default), or if you remove Sheet1 so that the application area is empty, and then double-click New Table Extension, only the table extension table is shown (as the first table). In that case no relationship has to be set up when creating the table extension table, and the data is output directly once the script has run.
Note: Every time a new table is added, its relationship needs to be edited.
image.png

Rebuilding the table extension table.
To add another table extension table, click a table that is not a table extension, then double-click "New Table Extension" to add a second extension table.
image.png

4. Expansion

4.1 Table calculations and table extensions

In this section I want to cover how Tableau table extensions work. First, two terms must be distinguished: table calculations and table extensions (Table Extensions).
I did not pay attention to the distinction at first and took some detours as a result, because there is a great deal of material on table calculations and very little on table extensions. Until I understood the difference between the two, I only grew more confused!
Table calculations are how earlier versions already "talked" to Python. There are currently 8 functions for this: SCRIPT_BOOL, SCRIPT_INT, SCRIPT_REAL, SCRIPT_STR, MODEL_EXTENSION_BOOL, MODEL_EXTENSION_INT, MODEL_EXTENSION_REAL, MODEL_EXTENSION_STR. Where are these 8 used? In the "Worksheet" view, right-click a field, choose Create > Calculated Field, and search for the function name to see its description. I won't go into their use here (in truth, I have not studied them closely).
This article focuses on table extensions, a feature added in 2022.3 that is used on the "Data Source" page.
To use a table extension, you pass in a table and the code that processes its data (Python, R, and so on are supported). Tableau and the corresponding extension service (TabPy, Rserve, etc.) then process the data and return an output table.

4.2 Working principle

With that difference clear, go back to the terminal.
In the terminal output, one thing caught my attention: def _user_script(tabpy, _arg1):.
image.png
Presumably, the complete function looks like this:

def _user_script(tabpy, _arg1):
    print(type(_arg1))
    print(_arg1)
    return _arg1

With a little Python background, you can see that this is a function: the code typed into Tableau is wrapped in the _user_script() function, which is why the script can use a return statement.
After reading the Tableau Analytics Extensions API documentation, I have some guesses. They may not be accurate, but they help in understanding the data flow:
Tableau converts the table I pass in into a dictionary of the form {"column name 1": ["column value 1", "column value 2"], "column name 2": ["column value 1", "column value 2"]}. When the POST request is sent to TabPy, Tableau passes this dictionary as the parameter _arg1. A script that simply does return _arg1 hands the data back unchanged, so the output table is identical to the input table.
Note: The data returned to Tableau must also be a dictionary of the form {"column name 1": ["column value 1", "column value 2"], "column name 2": ["column value 1", "column value 2"]}.
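To try a script without opening Tableau, the wrapping described above can be imitated locally. The sketch below only mimics what the terminal output suggests and is not how TabPy is actually implemented; build_user_script and is_columnar are hypothetical helper names of my own. It wraps a script body in a function named _user_script, calls it with a small column-oriented dictionary, and checks that the result still has the dictionary-of-lists shape.

```python
import textwrap


def build_user_script(body):
    """Wrap a script body in 'def _user_script(tabpy, _arg1):' and return the function."""
    source = "def _user_script(tabpy, _arg1):\n" + textwrap.indent(body, "    ")
    namespace = {}
    exec(source, namespace)  # defines _user_script inside namespace
    return namespace["_user_script"]


def is_columnar(table):
    """True if table is a dict whose values are lists of equal length."""
    if not isinstance(table, dict):
        return False
    if not all(isinstance(values, list) for values in table.values()):
        return False
    return len({len(values) for values in table.values()}) <= 1


# The same kind of body that would be typed into the Tableau script box
script_body = "print(type(_arg1))\nreturn _arg1\n"
user_script = build_user_script(script_body)

result = user_script(None, {"x": [1, 2], "y": ["a", "b"]})
print(is_columnar(result))
```

Because the body ends up inside a function, the bare return statement is valid, which matches the behaviour seen in Tableau.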

4.3 More Possibilities: Self-built Tables and API Calls

Going a step further.
To produce processed output, work with the _arg1 parameter (as in Section 3.2 of this article). The only requirement is that the returned data has the structure Tableau expects.
You may wonder: since the function body is executed by TabPy, can I simply build a table myself or read data from somewhere else? The answer is yes!
For example, applying the following script produces output normally.

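import pandas as pd

# Build a table from scratch: x = 0.0, 0.5, ..., 4.5
df = pd.DataFrame([i/2 for i in range(10)], columns=['x'])
df['y2'] = df.x.apply(lambda x: x*x)   # y2 is x squared
df_return = df.to_dict(orient='list')  # column-oriented dict, the form returned to Tableau
print(type(df_return))
print(df_return)
return df_return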

image.png
In this case _arg1 is an empty dictionary {}, because no table was dragged into the table extension; that is, there is no input table.

In other words, this feature lets you create a table yourself. There is also something more imaginative: calling an API.
Some external data is not stored in a database. It can be fetched through an open API and related to existing tables, which helps break down data silos.
Note: Keep the security of account names and passwords in mind.
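Many APIs return JSON as a list of records (one object per row), while the table extension has to return a dictionary of columns. A conversion step is therefore usually needed. The sketch below uses only the standard library; the payload string stands in for a response body from some hypothetical API, and records_to_columns is my own helper name. Keys missing from a record are filled with None; how Tableau displays such values is not something I have verified.

```python
import json


def records_to_columns(records):
    """Convert a list of row dicts into a column-oriented dict of lists."""
    names = []
    for record in records:
        for key in record:
            if key not in names:  # preserve first-seen column order
                names.append(key)
    return {name: [record.get(name) for record in records] for name in names}


# Hypothetical JSON body, as an API might return it
payload = '[{"city": "A", "value": 1}, {"city": "B", "value": 2, "note": "x"}]'
table = records_to_columns(json.loads(payload))
print(table)
# {'city': ['A', 'B'], 'value': [1, 2], 'note': [None, 'x']}
```

In a real script, the payload would come from an HTTP request, and any credentials should be kept out of the script text where possible.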

4.4 A small complaint: compared with Power BI

Finally, I still have to complain! In terms of Python support, Tableau seems to lag a little. Power BI has long supported drawing visuals with Python and R: in the visualization area, select Py to use the Python extension (Figure 1 below). The Python extension can also be used to pull in data tables (Figure 2 below).
image.png
image.png

5. Summary

Table Extension-Python.png

Click here to download the related resources (free, no points required).
image.png

Reference content:
Gan Zheng-How to use the Table Extension of Tableau 2022.3?
Tableau Analytics Extensions API Documentation
Tableau Official Table Extension User Manual

Origin blog.csdn.net/qq_45476428/article/details/127722480