Exploding delimited data into rows in MySQL (emulating the array-splitting behavior of Hive's "explode" function)
Requirement background
Background description
In Hive, the "explode" function splits an array-type column into multiple rows so that each element of the array can be processed on its own. MySQL has no direct equivalent, but a few tricks with built-in string functions can simulate it, making it possible to split a delimited list and query its elements in MySQL. This article shows how to implement array splitting similar to Hive's "explode" function in MySQL.
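For comparison, this is roughly what the Hive side looks like. The snippet is HiveQL rather than MySQL, and the second statement assumes, purely for illustration, that a Hive table with the same name and columns as the MySQL table used later in this article exists.

```sql
-- HiveQL, not MySQL: explode() emits one row per array element.
SELECT explode(array('a', 'b', 'c')) AS item;

-- With a delimited string column, split() first turns the string into an array.
-- split() takes a regular expression, so the vertical bar has to be escaped.
-- Assumption: a Hive table wow_info with columns id and tianfu.
SELECT id, talent
FROM wow_info
LATERAL VIEW explode(split(tianfu, '\\|')) t AS talent;
```

The rest of the article reproduces the effect of the second statement using only MySQL built-ins.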
Scenario simulation: Suppose we have a table named wow_info, which contains a column tianfu holding a list of talent names separated by vertical bars (|), and we want to split each talent into its own row for querying.
For example, in the original data the last column, tianfu, holds several values separated by |. The goal is to split that column on | so that each value appears on its own row. The sample data and the target result are listed with the implementation below.
Such transformations are usually handled in the data warehouse, but occasionally the data has to be pre-processed in MySQL before it gets there. The implementation approach is as follows.
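Implementation strategy
MySQL's built-in string functions can be combined to emulate Hive's "explode" function. Two useful ones, SUBSTRING_INDEX and FIND_IN_SET, are described here; the final query uses SUBSTRING_INDEX together with CHAR_LENGTH and REPLACE.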
- SUBSTRING_INDEX: SUBSTRING_INDEX(str, delim, count)
  The function returns the substring of the string str that lies before or after count occurrences of the delimiter delim.
  - This function can be used for string splitting and extraction. It accepts three parameters: str is the string to be processed, delim is the delimiter, and count specifies how many occurrences of the delimiter to count.
  - A positive count returns everything to the left of the count-th occurrence of delim, counting from the left; a negative count returns everything to the right of the count-th occurrence of delim, counting from the right.
- FIND_IN_SET: FIND_IN_SET(str, str_list)
  The function finds the position of the string str in the comma-separated list of strings str_list.
  - This function can be used to check whether a given string exists in a comma-separated list and to return its position. The return value is the 1-based position if a match is found, or 0 otherwise.
  - It takes two arguments: str is the string to look for, and str_list is a comma-separated list of strings.
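The behavior described above can be checked with a few literal calls. This is a minimal sketch on made-up strings, not on the article's table; the comments give the value each statement returns.

```sql
-- Positive count: everything to the left of the 2nd '|' from the left.
SELECT SUBSTRING_INDEX('a|b|c', '|', 2);    -- a|b

-- Negative count: everything to the right of the 1st '|' from the right.
SELECT SUBSTRING_INDEX('a|b|c', '|', -1);   -- c

-- If the delimiter occurs fewer than count times, the whole string comes back.
SELECT SUBSTRING_INDEX('a|b', '|', 3);      -- a|b

-- FIND_IN_SET only understands commas as the separator.
SELECT FIND_IN_SET('b', 'a,b,c');           -- 2
SELECT FIND_IN_SET('d', 'a,b,c');           -- 0

-- For a '|'-separated list, the separator can be swapped first.
SELECT FIND_IN_SET('b', REPLACE('a|b|c', '|', ','));   -- 2
```

The third statement matters later: asking for more elements than the list contains does not fail, it just returns the full string.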
These functions are useful in data manipulation and querying, especially for splitting and searching strings. They can be combined with other MySQL functions and query statements, which makes them flexible and convenient.
Implementation
The following dummy data serves as an example; the principle is the same for real data.
use wow;
CREATE TABLE `wow_info` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'role id',
`role` varchar(255) DEFAULT NULL COMMENT 'role abbreviation',
`role_cn` varchar(255) DEFAULT NULL COMMENT 'role type',
`role_pinyin` varchar(255) DEFAULT NULL COMMENT 'role pinyin',
`zhuangbei` varchar(255) DEFAULT NULL COMMENT 'equipment type',
`tianfu` varchar(255) DEFAULT NULL COMMENT 'talent type',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8;
INSERT INTO `wow_info` VALUES (1, 'fs', '法师', 'fashi', '布甲', '冰法|火法|奥法');
INSERT INTO `wow_info` VALUES (2, 'ms', '牧师', 'mushi', '布甲', '神牧|戒律|暗牧');
INSERT INTO `wow_info` VALUES (3, 'ss', '术士', 'shushi', '布甲', '毁灭|痛苦|恶魔');
INSERT INTO `wow_info` VALUES (4, 'dz', '盗贼', 'daozei', '皮甲', '狂徒|刺杀|敏锐');
INSERT INTO `wow_info` VALUES (5, 'ws', '武僧', 'wuseng', '皮甲', '酒仙|踏风|织雾');
INSERT INTO `wow_info` VALUES (6, 'xd', '德鲁伊', 'xiaode', '皮甲', '恢复|平衡|野性|守护');
INSERT INTO `wow_info` VALUES (7, 'dh', '恶魔猎手', 'emolieshou', '皮甲', '复仇|浩劫');
INSERT INTO `wow_info` VALUES (8, 'lr', '猎人', 'lieren', '锁甲', '兽王|生存|射击');
INSERT INTO `wow_info` VALUES (9, 'sm', '萨满', 'saman', '锁甲', '恢复|增强|元素');
INSERT INTO `wow_info` VALUES (10, 'long', '龙人', 'longren', '锁甲', '湮灭|恩护|增辉');
INSERT INTO `wow_info` VALUES (11, 'dk', '死亡骑士', 'siwangqishi', '板甲', '鲜血|冰霜|邪恶');
INSERT INTO `wow_info` VALUES (12, 'zs', '战士', 'zhanshi', '板甲', '武器|狂暴|防护');
INSERT INTO `wow_info` VALUES (13, 'sq', '圣骑士', 'shengqi', '板甲', '神圣|防护|惩戒');
The SQL implementation:
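-- Explode tianfu: one output row per '|'-separated value.
SELECT w.id
     , w.role_cn
     -- Keep the first n values, then take the last of them: the n-th value.
     , SUBSTRING_INDEX(SUBSTRING_INDEX(w.tianfu, '|', numbers.n), '|', -1) AS tianfu
FROM wow.wow_info w
JOIN (
    SELECT 1 AS n
    UNION ALL
    SELECT 2
    UNION ALL
    SELECT 3
    UNION ALL
    SELECT 4
) numbers
-- Delimiter count + 1 = number of values, so only n up to that count joins.
ON CHAR_LENGTH(w.tianfu) - CHAR_LENGTH(REPLACE(w.tianfu, '|', '')) >= numbers.n - 1
-- Sort explicitly so the rows come back in the order listed in the result.
ORDER BY w.id, numbers.n;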
/* Source rows in wow_info (id, role, role_cn, role_pinyin, zhuangbei, tianfu):
1 fs 法师 fashi 布甲 冰法|火法|奥法
2 ms 牧师 mushi 布甲 神牧|戒律|暗牧
3 ss 术士 shushi 布甲 毁灭|痛苦|恶魔
4 dz 盗贼 daozei 皮甲 狂徒|刺杀|敏锐
5 ws 武僧 wuseng 皮甲 酒仙|踏风|织雾
6 xd 德鲁伊 xiaode 皮甲 恢复|平衡|野性|守护
7 dh 恶魔猎手 emolieshou 皮甲 复仇|浩劫
8 lr 猎人 lieren 锁甲 兽王|生存|射击
9 sm 萨满 saman 锁甲 恢复|增强|元素
10 long 龙人 longren 锁甲 湮灭|恩护|增辉
11 dk 死亡骑士 siwangqishi 板甲 鲜血|冰霜|邪恶
12 zs 战士 zhanshi 板甲 武器|狂暴|防护
13 sq 圣骑士 shengqi 板甲 神圣|防护|惩戒
*/
Query result:
id role_cn tianfu
1 法师 冰法
1 法师 火法
1 法师 奥法
2 牧师 神牧
2 牧师 戒律
2 牧师 暗牧
3 术士 毁灭
3 术士 痛苦
3 术士 恶魔
4 盗贼 狂徒
4 盗贼 刺杀
4 盗贼 敏锐
5 武僧 酒仙
5 武僧 踏风
5 武僧 织雾
6 德鲁伊 恢复
6 德鲁伊 平衡
6 德鲁伊 野性
6 德鲁伊 守护
7 恶魔猎手 复仇
7 恶魔猎手 浩劫
8 猎人 兽王
8 猎人 生存
8 猎人 射击
9 萨满 恢复
9 萨满 增强
9 萨满 元素
10 龙人 湮灭
10 龙人 恩护
10 龙人 增辉
11 死亡骑士 鲜血
11 死亡骑士 冰霜
11 死亡骑士 邪恶
12 战士 武器
12 战士 狂暴
12 战士 防护
13 圣骑士 神圣
13 圣骑士 防护
13 圣骑士 惩戒
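To see why the query produces exactly these rows, its two building blocks can be tried on a single value. The sketch below uses the tianfu value of row 6 and a user variable @s, which is introduced here only for illustration.

```sql
SET @s = '恢复|平衡|野性|守护';

-- Number of delimiters; the number of values is this plus one.
SELECT CHAR_LENGTH(@s) - CHAR_LENGTH(REPLACE(@s, '|', '')) AS delimiter_count;  -- 3

-- Inner call for n = 3: the first three values.
SELECT SUBSTRING_INDEX(@s, '|', 3) AS first_three;  -- 恢复|平衡|野性

-- Outer call: the last of those, which is the 3rd value.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@s, '|', 3), '|', -1) AS third_value;  -- 野性

-- Why the join condition is needed: a two-value list asked for n = 4
-- returns its last value again instead of nothing.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('复仇|浩劫', '|', 4), '|', -1) AS repeated;  -- 浩劫
```

Without the ON condition, every row with fewer than four values would repeat its last value for each unused n.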
Summary
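Note that the subquery in the above example
(SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
must cover the maximum number of elements in any array. It generates 1 through 4 because the longest tianfu value has four elements, and any elements beyond the largest n are silently left out of the result. You can extend the subquery to accommodate longer arrays as needed. With a large number of elements, the join produces more candidate rows, and query performance is likely to suffer.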
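When the maximum length is not known in advance, writing the UNION ALL chain by hand becomes awkward. One possible alternative, sketched here under the assumption that the server is MySQL 8.0 or later, is to generate the numbers with a recursive common table expression. The upper bound of 10 is an arbitrary choice for illustration.

```sql
-- Assumption: MySQL 8.0+ (WITH RECURSIVE is not available in 5.7).
WITH RECURSIVE numbers (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10   -- 10 is an assumed upper bound
)
SELECT w.id
     , w.role_cn
     , SUBSTRING_INDEX(SUBSTRING_INDEX(w.tianfu, '|', numbers.n), '|', -1) AS tianfu
FROM wow.wow_info w
JOIN numbers
  ON CHAR_LENGTH(w.tianfu) - CHAR_LENGTH(REPLACE(w.tianfu, '|', '')) >= numbers.n - 1
ORDER BY w.id, numbers.n;

-- Checks how many numbers are actually needed; returns 4 for the sample data.
SELECT MAX(CHAR_LENGTH(tianfu) - CHAR_LENGTH(REPLACE(tianfu, '|', '')) + 1) AS max_elements
FROM wow.wow_info;
```

Running the second statement first shows whether the chosen bound is large enough for the data at hand.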
Conclusion: By combining MySQL's built-in string functions with a small numbers subquery, we can split a delimited column into rows much as Hive's "explode" function does. This method may not perform as well as Hive's native function, but for simple scenarios it achieves the same result.
In actual use, depending on specific needs and performance requirements, we may need to consider other storage engines or a more complex data model for array data. For simple queries and operations, however, the method above is sufficient.
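As one illustration of a different data model, the exploded values could be stored once in a child table instead of being split on every query. The table name wow_talent and its columns are hypothetical and not part of the original schema. The primary key assumes that no talent is repeated within one tianfu value, which holds for the sample data.

```sql
-- Hypothetical normalized table: one row per (role, talent) pair.
CREATE TABLE wow.wow_talent (
    role_id INT NOT NULL,
    talent  VARCHAR(64) NOT NULL,
    PRIMARY KEY (role_id, talent)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Populate it once with the same splitting technique.
INSERT INTO wow.wow_talent (role_id, talent)
SELECT w.id
     , SUBSTRING_INDEX(SUBSTRING_INDEX(w.tianfu, '|', numbers.n), '|', -1)
FROM wow.wow_info w
JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) numbers
  ON CHAR_LENGTH(w.tianfu) - CHAR_LENGTH(REPLACE(w.tianfu, '|', '')) >= numbers.n - 1;

-- Lookups by talent become plain equality filters.
-- For the sample data this returns role_id 12 and 13.
SELECT role_id, talent FROM wow.wow_talent WHERE talent = '防护';
```

The trade-off is that the child table has to be kept in sync whenever tianfu changes in wow_info.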