Spark SQL: the second argument to LIMIT cannot be parsed

Problem

select * from ep02  where deadline >= '2020.01.31' and deadline <= '2022.08.31' sort by addAsymptomatic desc limit '0','10'
----------------------------------------------------------------------------------------------------------------------^^^

Details

In a Spring Boot application that integrates Spark, I use Spark SQL to query data. I wanted to pass two arguments to the LIMIT clause, the start offset and the number of records to take, so I executed the following SQL statement:

select * from ep02  where deadline >= '2020.01.31' and deadline <= '2022.08.31' sort by addAsymptomatic desc limit '0','10'

The console then reported this error:

2023-02-06 22:14:05.629 ERROR 23036 --- [nio-9090-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input ',' expecting <EOF>(line 1, pos 118)

== SQL ==
select * from ep02  where deadline >= '2020.01.31' and deadline <= '2022.08.31' sort by addAsymptomatic desc limit '0','10'
----------------------------------------------------------------------------------------------------------------------^^^
] with root cause

org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input ',' expecting <EOF>(line 1, pos 118)

== SQL ==
select * from ep02  where deadline >= '2020.01.31' and deadline <= '2022.08.31' sort by addAsymptomatic desc limit '0','10'
----------------------------------------------------------------------------------------------------------------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) ~[spark-sql_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) ~[spark-sql_2.11-2.4.4.jar:2.4.4]
	at com.haut.edu.epidemicstatisticsbackend2.controller.Test_Hive.selectAsymptomaticByPage(Test_Hive.java:105) ~[classes/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_261]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_261]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_261]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_261]
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:117) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1067) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:963) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:655) ~[tomcat-embed-core-9.0.56.jar:4.0.FR]
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883) ~[spring-webmvc-5.3.15.jar:5.3.15]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[tomcat-embed-core-9.0.56.jar:4.0.FR]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) ~[tomcat-embed-websocket-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.springframework.web.filter.CorsFilter.doFilterInternal(CorsFilter.java:91) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.15.jar:5.3.15]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.15.jar:5.3.15]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.15.jar:5.3.15]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) ~[spring-web-5.3.15.jar:5.3.15]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.15.jar:5.3.15]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197) ~[tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:540) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:382) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:895) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1732) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.56.jar:9.0.56]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_261]

Solution

Original statement

s = spark.sql("select * from ep002 where deadline >= '" + startTime + "' and deadline <= '" + endTime + "' sort by addAsymptomatic desc limit '" + pageNum + "','" + pageSize+ "'").collectAsList();

Fixed statement

s = spark.sql("select * from (select *,row_number() over (order by accAsymptomatic/accConfirmedCases desc) as rank from ep002 where deadline >= '" + startTime + "' and deadline <= '" + endTime + "' ) temp where " + pageNum + " < rank and rank <= " + pageNum + pageSize).collectAsList();

Cause

Unlike the two-argument LIMIT found in some SQL dialects such as MySQL (start offset plus number of records to take), the LIMIT clause in Spark SQL accepts only one argument: the number of records to return. That makes it awkward to fetch the records in a given range. To address this, Spark SQL has provided the row_number function since Spark 1.5; it is a window function.
Its syntax is:

row_number() over (partition by xx order by yy desc) rank

It means: rows are first grouped by the partition by columns, then ordered within each group by the order by columns, and every row in a group is assigned a row number starting from 1.
If the partition by clause is omitted, row numbers are assigned over the whole DataFrame.
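
Putting this together, here is a minimal sketch of the paginated query, assuming a live SparkSession named spark, the ep002 table used above, and numeric pageNum/pageSize:

import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class PageQuery {
    // Minimal sketch: page through ep002 ordered by addAsymptomatic.
    // pageNum is the number of rows to skip, pageSize the page length.
    static List<Row> pageByRowNumber(SparkSession spark, int pageNum, int pageSize) {
        String sql = "SELECT * FROM ("
                   + " SELECT *, row_number() OVER (ORDER BY addAsymptomatic DESC) AS rank"
                   + " FROM ep002"
                   + ") t WHERE rank > " + pageNum + " AND rank <= " + (pageNum + pageSize);
        return spark.sql(sql).collectAsList();
    }
}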

Other solutions

Option 1

In fact, because my query involves both ordering (order by) and a range filter (where), it is relatively complex. If all you need is to page through the data, the following query also works:

s = spark.sql("WITH count_ep002 AS (SELECT *, monotonically_increasing_id() AS count FROM ep002) SELECT * FROM count_ep002  WHERE count > "+ pageNum +" AND count < "+ pageNum + pageSize).collectAsList();

That is, give every record an increasing id, then select the rows whose id lies between the start offset and the start offset plus the page length. (Strictly speaking, monotonically_increasing_id guarantees unique, monotonically increasing ids, but not consecutive ones across partitions, so a page is not guaranteed to hold exactly pageSize rows.)

The query above returns the rows between position pageNum and position pageNum + pageSize.
Unfortunately, the statement only seems to support plain pagination. I tried to modify it so it would also handle ordering (order by) and a range filter (where):

s = spark.sql("WITH count_ep02 AS (SELECT *, monotonically_increasing_id() AS count FROM ep02 where deadline >= '" + startTime + "' and deadline <= '" + endTime +") SELECT * FROM count_ep02 WHERE count > 0 AND count < 5").collectAsList();
s = spark.sql("WITH count_ep02 AS (SELECT *, monotonically_increasing_id() AS count FROM ep02) SELECT * FROM count_ep02  WHERE count > "+ pageNum +" AND count < "+ pageNum + pageSize + " and deadline >= '" + startTime + "' and deadline <= '" + endTime).collectAsList();

Neither version ran as expected (Spark SQL could not parse them; note that, as written, both statements also drop the closing single quote after endTime, which by itself makes the generated SQL unparsable). So when you need ordering (order by) plus a range filter (where), I recommend just using the row_number function, as in the sketch below.
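
If you would rather not assemble SQL strings by hand at all, the same row_number pagination can be expressed through the DataFrame API. A sketch under the same assumptions (a live SparkSession, the ep002 table and columns used above):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

class DataFramePageQuery {
    // Sketch: the same WHERE range + ORDER BY + row_number pagination, without SQL strings.
    static Dataset<Row> page(SparkSession spark, String startTime, String endTime,
                             int pageNum, int pageSize) {
        return spark.table("ep002")
                .where(col("deadline").geq(startTime).and(col("deadline").leq(endTime)))
                .withColumn("rank", row_number().over(Window.orderBy(col("addAsymptomatic").desc())))
                .where(col("rank").gt(pageNum).and(col("rank").leq(pageNum + pageSize)));
    }
}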

Option 2

Almost identical to Option 1: change the schema of the database (or warehouse) table to add an id column, then query a specified range by id, as sketched below. But the problem is the same as in Option 1: it does not support ordering (order by) together with a range filter (where).
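
A sketch of that query, assuming (hypothetically) that ep002 has been rebuilt with a consecutive id column named id:

// hypothetical: assumes ep002 now carries a consecutive `id` column
s = spark.sql("SELECT * FROM ep002 WHERE id > " + pageNum + " AND id <= " + (pageNum + pageSize)).collectAsList();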

