Optimizing a one-to-many SQL search with grouping and sorting

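This came up in a project scenario.

There are four tables, A, B, C and D. A has a one-to-many relationship with both B and C. A is the main business table; B and C are business detail tables (each stores the various codes that come from table D, plus a foreign key to A); D holds user information (including the user's various codes).

The business requirement is to search the main records in table A and sort them by the following rules: (1) records where the user's codes match both B's code and C's code come first; (2) records where the user's code matches B's code come second; (3) records where the user's code matches C's code come third; (4) records that A opens to the whole system (B's code is a special value) come fourth.

Early in the project, to ship the feature quickly, the query result sets were stitched together with many UNIONs. Heavy use of UNION puts a large load on the database and hurts the performance of the platform. After the platform had been running for a while and data had accumulated, the many UNIONs made the query extremely slow, and the SQL was also hard to maintain.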

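Based on these problems and on what the existing framework supports, there are two optimization approaches:

Approach 1: split the complex, inefficient SQL into several efficient statements. Following the priorities, the business data is broken into subsets (first, second, third and fourth priority), and the row count of each level is queried as well. The business code then decides which levels each request should read from.

For example, the first request reads from the first level and, when the first level runs out, continues with the second level; the second request continues from where the previous one stopped, and so on, until the business requirement is met.

Pros: 1. each SQL statement is simple and has a clear purpose; 2. the business code is also fairly clear; 3. developers only need to maintain the business code, so the maintenance cost is low.

Cons: the business code needs a clear design up front, and implementing it is somewhat tedious.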
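The tier-by-tier paging of Approach 1 could be sketched roughly as follows. This is only an illustration, not the project's actual code: `TierPager`, `Slice`, `plan` and `tierCounts` are hypothetical names, and the sketch assumes the per-level queries return disjoint rows (a row counted in a higher level is excluded from the lower ones) and that each level has a stable sort order. Given the row count of every level and the global `beginRow` / `pageSize` of the request, it works out which levels to query and with what offset and limit.

```java
import java.util.ArrayList;
import java.util.List;

final class TierPager {
    /** One slice to fetch from a single priority level. */
    static final class Slice {
        final int tier;    // 0-based level index, 0 is the highest priority
        final long offset; // rows to skip inside that level
        final long limit;  // rows to read from that level

        Slice(int tier, long offset, long limit) {
            this.tier = tier;
            this.offset = offset;
            this.limit = limit;
        }
    }

    /**
     * Maps a global page (beginRow, pageSize) onto per-level slices.
     * tierCounts[i] is the number of rows in level i (hypothetical input,
     * obtained from one COUNT query per level).
     */
    static List<Slice> plan(long[] tierCounts, long beginRow, long pageSize) {
        List<Slice> slices = new ArrayList<>();
        long skip = beginRow;      // rows still to skip before the page starts
        long remaining = pageSize; // rows still needed to fill the page
        for (int tier = 0; tier < tierCounts.length && remaining > 0; tier++) {
            long count = tierCounts[tier];
            if (skip >= count) {
                // The page starts after this level: skip the level entirely.
                skip -= count;
                continue;
            }
            long take = Math.min(count - skip, remaining);
            slices.add(new Slice(tier, skip, take));
            remaining -= take;
            skip = 0; // later levels are read from their beginning
        }
        return slices;
    }
}
```

Each returned slice would then become one simple `LIMIT offset, limit` query against the matching level, and the results are concatenated in level order.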

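Approach 2: optimize at the SQL level. 1. Use a redundancy table: join the codes from B and the codes from C into comma-separated strings and store them in a single table F. 2. At query time, join the user's codes into a regular expression, use REGEXP to find the matching rows in F, and then use REGEXP again to label each result with its priority level, which satisfies the business requirement.

Original SQL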
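To make the level labelling of Approach 2 concrete, here is a minimal Java sketch of the same idea outside the database. It is an illustration under assumptions, not the project's code: `TaskLevelSketch`, `anyCodeIn` and `taskLevel` are hypothetical names, and the redundancy row is assumed to hold comma-separated code strings. The score is 2 for an area match plus 1 for an industry match, so 3 means both match, 2 area only, 1 industry only. Note that an unanchored regular expression also matches substrings (code `4401` would match inside `14401`), so this sketch anchors each code on commas or the ends of the string.

```java
import java.util.List;
import java.util.regex.Pattern;

final class TaskLevelSketch {
    /** Builds a pattern that matches when any of the codes is a whole item of a comma-separated list. */
    static Pattern anyCodeIn(List<String> codes) {
        if (codes.isEmpty()) {
            return Pattern.compile("(?!)"); // never matches
        }
        StringBuilder alternation = new StringBuilder();
        for (String code : codes) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(code)); // treat the code as literal text
        }
        // (^|,) and (,|$) keep "4401" from matching inside "14401".
        return Pattern.compile("(^|,)(" + alternation + ")(,|$)");
    }

    /** Mirrors the weighting 2 * (area match) + (industry match). */
    static int taskLevel(String areaCodes, String industryCodes,
                         Pattern userArea, Pattern userIndustry) {
        int areaHit = userArea.matcher(areaCodes).find() ? 1 : 0;
        int industryHit = userIndustry.matcher(industryCodes).find() ? 1 : 0;
        return 2 * areaHit + industryHit;
    }
}
```

Sorting by this score in descending order yields the priority order described in the requirement, with rows that match neither code (score 0) last.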

<select id="selectPageBySql" resultType="com.youx.qd.app.task.dto.app.TaskInfoIndexListDto">
		SELECT tb_all.id,	tb_all.cname,		tb_all.platform_type AS platformType,	tb_all.first_image AS firstImage,resource AS resource,	tb_all.industry_name AS industryName,	tb_all.high AS high,	tb_all.is_recommend AS isRecommend,tb_all.unit AS unit
		FROM
		(
				<![CDATA[( SELECT	*	FROM		A WHERE	    is_recommend = 1 AND `status` >40  AND is_delete = 0 AND start_time <=#{params.nowTime} and #{params.nowTime} <=end_time 		ORDER BY status, start_time DESC 	LIMIT 5)  ]]>
		UNION
		<foreach collection="params.industryCodeList" item="industryCode" index="myIndex"
			separator="UNION">
			(
					SELECT  A.* 	FROM	A LEFT JOIN B ON A.id=B.task_id	LEFT JOIN C ON A.id =C.task_id
					WHERE
					B.area_code = #{params.areaCode}
					<if test="industryCode.industryCodeList.size() > 0">
						 AND C.industry_code in
						<foreach close=")" collection="industryCode.industryCodeList"
							item="listItem" open="(" separator=",">
							#{listItem}
						</foreach>
					</if>
					<![CDATA[AND  A.platform_type = #{industryCode.platform}  AND   A.`status` >40	  AND  A.is_delete = 0 AND 	 A.start_time <=#{params.nowTime} and #{params.nowTime} <= A.end_time  ORDER BY  A.status, A.start_time DESC  LIMIT 99999 ]]>
			) 
		</foreach>
		UNION
	  (	
				SELECT	A.* FROM A,B
				WHERE
				<![CDATA[A.id =B.task_id AND B.area_code= #{params.areaCode} AND   A.`status` >40 AND  A.is_delete = 0 AND 	 A.start_time <=#{params.nowTime} and #{params.nowTime} <= A.end_time  
				ORDER BY A.status,A.start_time DESC  LIMIT 99999 ]]>
			) 
		UNION
		<foreach collection="params.industryCodeList" item="industryCode" index="myIndex"
			separator="UNION">
	    (
					SELECT	A.* FROM 	A,C	WHERE	A.id =C.task_id 
					<if test="industryCode.industryCodeList.size() > 0">
							 AND	C.industry_code in
							<foreach close=")" collection="industryCode.industryCodeList"
								item="listItem" open="(" separator=",">
								#{listItem}
							</foreach>
					</if>
					<![CDATA[AND	A.platform_type = #{industryCode.platform} AND   A.`status` >40 AND  A.is_delete = 0 AND 	 A.start_time <=#{params.nowTime} and #{params.nowTime} <= A.end_time  ORDER BY  A.status,A.start_time DESC  LIMIT 99999 ]]>
			) 
		</foreach>
		UNION
		<![CDATA[( SELECT *	FROM		A  WHERE	 `status` >40	and  `status` !=80 AND is_delete = 0 AND 	start_time <=#{params.nowTime} and #{params.nowTime} <=end_time	 ORDER BY	status  ,start_time desc LIMIT 99999)]]>
		) AS tb_all
		<where>
			<if test="searchForm.taskType != null">
				AND tb_all.task_type=#{searchForm.taskType}
			</if>
			<if test="searchForm.chargingType != null">
				AND tb_all.charging_type=#{searchForm.chargingType}
			</if>

		</where>
		  LIMIT #{params.beginRow} ,#{params.pageSize}
	</select>

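The SQL above uses four foreach loops, some of them nested, so as the number of codes grows, the number of UNION branches multiplies. In the end the statement executes slowly and drags down the whole platform.

After optimization: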

SELECT	tti.id,tti.cname,tti.platform_type AS platformType,tti.first_image AS firstImage,cover_resource AS coverResource,	tti.industry_name AS industryName,	tti.high_income AS highIncome,	a.taskLevel
				FROM tb_task_info tti INNER JOIN
				(SELECT ttair.task_id,(2*(ttair.area_code REGEXP CONCAT(#{areaCode})))+(ttair.industry_code REGEXP #{industryCode}) as taskLevel FROM `tb_task_area_industry_redundancy` ttair WHERE ttair.area_code REGEXP CONCAT(#{areaCode},'|86')
				 <!--<if test="industryCode != null">
				 		Self-media industry targeting
					 AND ttair.industry_code REGEXP #{industryCode}
				 </if>-->
				 ) AS a
				ON tti.id = a.task_id
ORDER BY tti.`status` ASC,tti.is_recommend DESC,a.taskLevel DESC

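Pros: 1. the SQL runs considerably faster; 2. the business code is cheaper to maintain, since it only calls a single query interface and no longer needs repeated business-side processing.

Cons: the SQL statement is somewhat complex.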
