Splitting delimited data into rows in MySQL (similar to the array-splitting behavior of Hive's explode function)

Requirement background

Background description

In Hive, the "explode" function splits an array-type column into multiple rows so that each element of the array can be processed on its own. MySQL has no direct equivalent, but a few tricks let us simulate it and split a delimited value into rows within a query. This article shows how to implement array splitting similar to Hive's "explode" function in MySQL.

Scenario: suppose we have a table named wow_info with a column tianfu that holds a list of talents separated by vertical bars (|), and we want to split each talent into its own row for querying.

For example, a row of the original data looks like this:

1	fs	法师	fashi	布甲	冰法|火法|奥法

The goal is to split the last column, tianfu, on |, so that each value gets its own row. The target result is:

1	法师	冰法
1	法师	火法
1	法师	奥法

Such scenarios are generally handled in the data warehouse, but occasionally the processing has to happen upstream, before the data gets there. The implementation approach is as follows.

Implementation approach

Use MySQL's built-in string functions to implement behavior similar to Hive's "explode". SUBSTRING_INDEX does the actual splitting; FIND_IN_SET is a related function for searching comma-separated lists.

  1. SUBSTRING_INDEX:
    • SUBSTRING_INDEX(str, delim, count) returns the substring of str before count occurrences of the delimiter delim.
    • This function can be used for splitting and truncating strings. It accepts three parameters: str is the string to be processed, delim is the delimiter, and count specifies which occurrence of the delimiter to cut at.
    • If count is positive, everything to the left of the count-th delimiter (counting from the left) is returned. If count is negative, everything to the right of the count-th delimiter (counting from the right) is returned.
  2. FIND_IN_SET:
    • FIND_IN_SET(str, str_list) finds the position of the string str in the comma-separated list of strings str_list.
    • This function can be used to check whether a given string exists in a comma-separated list and to return its position. The return value is the 1-based position if a match is found, or 0 otherwise.
    • It takes two arguments: str is the string to look for, and str_list is a comma-separated list of strings.
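As a minimal sketch of the behavior described above, the following statements can be run in any MySQL client. The literal values are borrowed from this article's sample data, and the results shown in the comments assume a UTF-8 connection character set.

```sql
-- Positive count: everything to the left of the 2nd '|' from the left
SELECT SUBSTRING_INDEX('冰法|火法|奥法', '|', 2);   -- 冰法|火法

-- Negative count: everything to the right of the 1st '|' from the right
SELECT SUBSTRING_INDEX('冰法|火法|奥法', '|', -1);  -- 奥法

-- Nesting the two calls isolates the 2nd element
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('冰法|火法|奥法', '|', 2), '|', -1);  -- 火法

-- FIND_IN_SET returns the 1-based position in a comma-separated list, or 0
SELECT FIND_IN_SET('火法', '冰法,火法,奥法');  -- 2
SELECT FIND_IN_SET('神牧', '冰法,火法,奥法');  -- 0

-- A '|'-separated list has to be converted to commas first
SELECT FIND_IN_SET('火法', REPLACE('冰法|火法|奥法', '|', ','));  -- 2
```

FIND_IN_SET only recognizes commas as separators, which is why the last statement replaces the vertical bars before searching.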

Both functions are useful in data manipulation and querying, particularly for splitting and searching strings, and they can be combined with other MySQL functions and query statements.

Implementing the requirement

The following dummy data serves as an example; the principle is the same for real data.

use wow;

CREATE TABLE `wow_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'role id',
  `role` varchar(255) DEFAULT NULL COMMENT 'role abbreviation',
  `role_cn` varchar(255) DEFAULT NULL COMMENT 'role type',
  `role_pinyin` varchar(255) DEFAULT NULL COMMENT 'role pinyin',
  `zhuangbei` varchar(255) DEFAULT NULL COMMENT 'equipment type',
  `tianfu` varchar(255) DEFAULT NULL COMMENT 'talent type',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8;

INSERT INTO `wow_info` VALUES (1, 'fs', '法师', 'fashi', '布甲', '冰法|火法|奥法');
INSERT INTO `wow_info` VALUES (2, 'ms', '牧师', 'mushi', '布甲', '神牧|戒律|暗牧');
INSERT INTO `wow_info` VALUES (3, 'ss', '术士', 'shushi', '布甲', '毁灭|痛苦|恶魔');
INSERT INTO `wow_info` VALUES (4, 'dz', '盗贼', 'daozei', '皮甲', '狂徒|刺杀|敏锐');
INSERT INTO `wow_info` VALUES (5, 'ws', '武僧', 'wuseng', '皮甲', '酒仙|踏风|织雾');
INSERT INTO `wow_info` VALUES (6, 'xd', '德鲁伊', 'xiaode', '皮甲', '恢复|平衡|野性|守护');
INSERT INTO `wow_info` VALUES (7, 'dh', '恶魔猎手', 'emolieshou', '皮甲', '复仇|浩劫');
INSERT INTO `wow_info` VALUES (8, 'lr', '猎人', 'lieren', '锁甲', '兽王|生存|射击');
INSERT INTO `wow_info` VALUES (9, 'sm', '萨满', 'saman', '锁甲', '恢复|增强|元素');
INSERT INTO `wow_info` VALUES (10, 'long', '龙人', 'longren', '锁甲', '湮灭|恩护|增辉');
INSERT INTO `wow_info` VALUES (11, 'dk', '死亡骑士', 'siwangqishi', '板甲', '鲜血|冰霜|邪恶');
INSERT INTO `wow_info` VALUES (12, 'zs', '战士', 'zhanshi', '板甲', '武器|狂暴|防护');
INSERT INTO `wow_info` VALUES (13, 'sq', '圣骑士', 'shengqi', '板甲', '神圣|防护|惩戒');

The SQL implementation:

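-- Emit one row per '|'-separated value in tianfu
SELECT id
	, role_cn
	-- Keep the first n values, then take the last of them: the n-th value
	, SUBSTRING_INDEX(SUBSTRING_INDEX(tianfu, '|', numbers.n), '|', -1) AS tianfu
FROM wow.wow_info
	JOIN (
		-- Index table: one row per possible position (at most 4 values here)
		SELECT 1 AS n
		UNION ALL
		SELECT 2
		UNION ALL
		SELECT 3
		UNION ALL
		SELECT 4
	) numbers
	-- Delimiter count + 1 = value count, so keep only positions n <= value count
	ON CHAR_LENGTH(tianfu) - CHAR_LENGTH(REPLACE(tianfu, '|', '')) >= numbers.n - 1
ORDER BY id, numbers.n;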

/*
Source rows in wow_info (id, role, role_cn, role_pinyin, zhuangbei, tianfu):
1	fs	法师	fashi	布甲	冰法|火法|奥法
2	ms	牧师	mushi	布甲	神牧|戒律|暗牧
3	ss	术士	shushi	布甲	毁灭|痛苦|恶魔
4	dz	盗贼	daozei	皮甲	狂徒|刺杀|敏锐
5	ws	武僧	wuseng	皮甲	酒仙|踏风|织雾
6	xd	德鲁伊	xiaode	皮甲	恢复|平衡|野性|守护
7	dh	恶魔猎手	emolieshou	皮甲	复仇|浩劫
8	lr	猎人	lieren	锁甲	兽王|生存|射击
9	sm	萨满	saman	锁甲	恢复|增强|元素
10	long	龙人	longren	锁甲	湮灭|恩护|增辉
11	dk	死亡骑士	siwangqishi	板甲	鲜血|冰霜|邪恶
12	zs	战士	zhanshi	板甲	武器|狂暴|防护
13	sq	圣骑士	shengqi	板甲	神圣|防护|惩戒
*/
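To see why the join condition is needed, the individual pieces of the query can be tried on single values. This is an illustrative sketch using literals from the sample data; the results in the comments follow from the function definitions given earlier.

```sql
-- Number of delimiters = number of values - 1
SELECT CHAR_LENGTH('恢复|平衡|野性|守护')
     - CHAR_LENGTH(REPLACE('恢复|平衡|野性|守护', '|', '')) AS delimiter_count;  -- 3

-- When count exceeds the number of delimiters, SUBSTRING_INDEX returns the whole string
SELECT SUBSTRING_INDEX('复仇|浩劫', '|', 3);  -- 复仇|浩劫

-- So without the ON filter, n = 3 would repeat the last value of a two-value row
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('复仇|浩劫', '|', 3), '|', -1);  -- 浩劫
```

The ON clause therefore keeps only those positions n that actually exist in each row's list.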

Query result:

id	role_cn	tianfu
1	法师	冰法
1	法师	火法
1	法师	奥法
2	牧师	神牧
2	牧师	戒律
2	牧师	暗牧
3	术士	毁灭
3	术士	痛苦
3	术士	恶魔
4	盗贼	狂徒
4	盗贼	刺杀
4	盗贼	敏锐
5	武僧	酒仙
5	武僧	踏风
5	武僧	织雾
6	德鲁伊	恢复
6	德鲁伊	平衡
6	德鲁伊	野性
6	德鲁伊	守护
7	恶魔猎手	复仇
7	恶魔猎手	浩劫
8	猎人	兽王
8	猎人	生存
8	猎人	射击
9	萨满	恢复
9	萨满	增强
9	萨满	元素
10	龙人	湮灭
10	龙人	恩护
10	龙人	增辉
11	死亡骑士	鲜血
11	死亡骑士	冰霜
11	死亡骑士	邪恶
12	战士	武器
12	战士	狂暴
12	战士	防护
13	圣骑士	神圣
13	圣骑士	防护
13	圣骑士	惩戒

Summary

Note that the subquery in the above example (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) must provide one row for each possible position, up to the maximum number of elements in any array (four in the sample data). Elements beyond that position are silently dropped, so extend the subquery as needed to accommodate longer arrays.
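On MySQL 8.0 or later, one way to avoid writing the UNION ALL chain by hand is a recursive common table expression. The following is a sketch under that version assumption; the upper bound of 10 is an arbitrary illustrative value, not something taken from the sample data, and the table alias w is introduced here only for readability.

```sql
-- Requires MySQL 8.0+ (recursive CTEs are not available in 5.7)
WITH RECURSIVE numbers (n) AS (
	SELECT 1
	UNION ALL
	SELECT n + 1 FROM numbers WHERE n < 10  -- assumed maximum list length
)
SELECT w.id
	, w.role_cn
	, SUBSTRING_INDEX(SUBSTRING_INDEX(w.tianfu, '|', numbers.n), '|', -1) AS tianfu
FROM wow.wow_info w
	JOIN numbers
	ON CHAR_LENGTH(w.tianfu) - CHAR_LENGTH(REPLACE(w.tianfu, '|', '')) >= numbers.n - 1
ORDER BY w.id, numbers.n;
```

Changing the single bound in the WHERE clause is then enough to support longer lists.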

If the maximum number of elements is large, query performance is likely to suffer, because the join condition has to be evaluated for each combination of table row and subquery row.

Conclusion: by combining MySQL's built-in functions with a few tricks, we can implement array splitting similar to Hive's "explode" function in MySQL. Although this method may not perform as well as Hive's native function, it is enough to achieve similar data operations in simple scenarios.

In actual use, depending on specific needs and performance requirements, we may need to consider other storage engines or more complex data models for handling array data. For simple queries and operations, however, the method above provides a workable way to get similar functionality.
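As one possible example of a more complex data model, the talents could be stored in a separate child table with one row per value, populated once with the splitting query. This is only a sketch: the table wow_talent and its columns role_id, pos and talent are hypothetical names that do not appear in the original schema, and the statements assume the wow database is selected.

```sql
-- Hypothetical child table: one row per (role, talent)
CREATE TABLE wow_talent (
	role_id int NOT NULL,          -- refers to wow_info.id
	pos int NOT NULL,              -- position of the talent in the original list
	talent varchar(255) NOT NULL,
	PRIMARY KEY (role_id, pos),
	KEY idx_talent (talent)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Populate it once using the same splitting technique
INSERT INTO wow_talent (role_id, pos, talent)
SELECT w.id
	, numbers.n
	, SUBSTRING_INDEX(SUBSTRING_INDEX(w.tianfu, '|', numbers.n), '|', -1)
FROM wow_info w
	JOIN (
		SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
	) numbers
	ON CHAR_LENGTH(w.tianfu) - CHAR_LENGTH(REPLACE(w.tianfu, '|', '')) >= numbers.n - 1;

-- Lookups by talent no longer need any string splitting
SELECT role_id FROM wow_talent WHERE talent = '恢复' ORDER BY role_id;  -- 6 and 9 with the sample data
```

With this layout the split is paid once at load time instead of on every query, at the cost of keeping the child table in sync with wow_info.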

Origin blog.csdn.net/wt334502157/article/details/131592000